|
| 1 | +--- |
| 2 | +title: Stream Editing |
| 3 | +date: 2015-05-17 |
| 4 | +tags: cli-options, golf, globals, one-liner |
| 5 | +--- |
| 6 | + |
| 7 | +One of Ruby's goals was to replace popular Unix *stream editors* like `awk` and `sed`, both of which process files line by line. Ruby has the `-n` option for this: |
| 8 | + |
| 9 | + Causes Ruby to assume the following loop around your script, which makes it |
| 10 | + iterate over file name arguments somewhat like sed -n or awk. |
| 11 | + |
| 12 | + while gets |
| 13 | + ... |
| 14 | + end |
| 15 | + |
| 16 | +And its sibling `-p`: |
| 17 | + |
| 18 | + Acts mostly same as -n switch, but print the value of variable $_ at the each |
| 19 | + end of the loop. |
| 20 | + For example: |
| 21 | + |
| 22 | + % echo matz | ruby -p -e '$_.tr! "a-z", "A-Z"' |
| 23 | + MATZ |
| 24 | + |
| 25 | +What you need to know is that the [special global variable](http://idiosyncratic-ruby.com/9-globalization.html) `$_` contains the last read input. When using `-n` or `-p`, this usually means the current line. Another thing to keep in mind: `gets` reads from [`ARGF`](http://readruby.io/io#argf), not from `STDIN`, so you can pass arguments that will be interpreted as the names of the files to process. Equipped with this knowledge, you can build a very basic example, which just prints out the given file: |
| 26 | + |
| 27 | + $ ruby -ne 'print $_' filename |
| 28 | + |
| 29 | +Since `print` without arguments implicitly prints out `$_`, this can be shortened to: |
| 30 | + |
| 31 | + $ ruby -ne 'print' filename |
| 32 | + |
| 33 | +If you use `-p` instead of `-n`, no code is required, because `-p` calls `print` implicitly: |
| 34 | + |
| 35 | + $ ruby -pe '' filename |
| 36 | + |
| 37 | +Now let's modify each line: |
| 38 | + |
| 39 | + $ ruby -pe '$_.reverse!' filename |
| 40 | + |
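| 41 | +This prints the file with the characters of each line reversed. Since `$_` still includes the trailing newline, the newline ends up at the front of each reversed line; add the `-l` option to strip it on input and append it again on output. |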
| 42 | + |
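To make the implicit loop less magical, here is a minimal sketch of roughly what `-p` does, written as an ordinary Ruby script. `emulate_p` is a hypothetical helper invented for this illustration, and it hands the current line to a block instead of going through `$_`; it is not how Ruby implements the switch.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby): mimics `ruby -pe` for one
# IO-like input. The block plays the role of the -e script and receives
# the current line, which the real switch exposes as $_.
def emulate_p(input, output)
  while (line = input.gets)   # the implicit `while gets` loop
    yield line                # run the "script" on the current line
    output.print line         # -p prints the (possibly modified) line
  end
end

out = StringIO.new
emulate_p(StringIO.new("matz\nruby\n"), out) { |line| line.tr!("a-z", "A-Z") }
puts out.string
```

Leaving out the `output.print line` step gives the behavior of `-n`, where printing is up to the script.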
| 43 | +Here is another example, which will print every line in a random ANSI color: |
| 44 | + |
| 45 | + $ ruby -ne 'print "\e[3#{rand(8)}m#$_"' filename |
| 46 | + |
| 47 | +There is more to assist you in writing these short line manipulation scripts: |
| 48 | + |
| 49 | +## The Ruby One-Liner Toolbox |
| 50 | + |
| 51 | +* CLI Options: `-n` `-p` `-0` `-F` `-a` `-i` `-l` |
| 52 | +* Global Variables: `$_` `$/` `$\` `$;` `$F` `$.` |
| 53 | +* Methods that implicitly operate on `$_`: `print` `~` |
| 54 | +* The special `BEGIN{}` and `END{}` blocks |
| 55 | + |
| 56 | +## Running Code Before or After Processing the Input |
| 57 | + |
| 58 | +You can run code before the loop starts with `BEGIN` and after the loop ends with `END`. For example, this will count characters: |
| 59 | + |
| 60 | + $ ruby -ne 'BEGIN{ count = 0 }; count += $_.size; END{ print count }' filename |
| 61 | + |
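Spelled out as a regular script, the character count above might look like the following sketch. `count_characters` is a hypothetical name chosen for this example, and the input is a `StringIO` rather than a file so that the snippet is self-contained.

```ruby
require "stringio"

# Hypothetical helper: the long form of
#   ruby -ne 'BEGIN{ count = 0 }; count += $_.size; END{ print count }'
def count_characters(input)
  count = 0                    # BEGIN{ ... }: runs once, before the first line
  while (line = input.gets)    # the loop that -n wraps around the script
    count += line.size         # the script body, run for every line
  end
  count                        # END{ ... }: runs once, after the last line
end

puts count_characters(StringIO.new("ab\ncd\n")) # newlines are counted too
```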
| 62 | +## Using Line Numbers |
| 63 | + |
| 64 | +`$.` contains the current line number. A use-case would be counting the lines of a file: |
| 65 | + |
| 66 | + $ ruby -ne 'END{p$.}' filename |
| 67 | + |
| 68 | +## String Matching |
| 69 | + |
| 70 | +Now let's do some conditional processing and only print a line if it contains a digit: |
| 71 | + |
| 72 | + $ ruby -ne 'print if ~/\d/' filename |
| 73 | + |
| 74 | +The message to take away: The `~` method implicitly matches the regex against `$_`. |
| 75 | + |
| 76 | +But it gets even better: |
| 77 | + |
| 78 | + $ ruby -ne 'print if /\d/' filename |
| 79 | + |
| 80 | +You thought a condition with a truthy value would always execute the `if` branch? It will not if the truthy value is a non-matching regex literal! |
| 81 | + |
| 82 | +This also works when using the ternary operator for conditions: |
| 83 | + |
| 84 | + $ ruby -ne 'puts "#$.: #{ /\d/ ? "first digit: #$&" : "no digit" }"' filename |
| 85 | + |
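For comparison, here is a sketch of the same report without the implicit globals. `describe_digits` is a hypothetical helper; it uses an explicit index in place of `$.` and a `MatchData` object in place of `$&`, which is an assumption about how one might write it by hand, not what Ruby does internally.

```ruby
require "stringio"

# Hypothetical helper: an explicit version of the ternary one-liner above.
def describe_digits(input)
  input.each_line.with_index(1).map do |line, number|  # number stands in for $.
    match = line.match(/\d/)                           # explicit form of the bare /\d/
    detail = match ? "first digit: #{match[0]}" : "no digit"
    "#{number}: #{detail}"
  end
end

puts describe_digits(StringIO.new("abc\nx7y9\n"))
```

The one-liner is much shorter because the regex literal, `$.` and `$&` all refer to the current line without being told to.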
| 86 | +## In-place Editing of Files |
| 87 | + |
| 88 | +Using the `-i` option, you can modify files directly (just like `sed`'s `-i` mode). For example, removing all trailing whitespace: |
| 89 | + |
| 90 | + $ ruby -ne 'puts $_.rstrip' -i filename |
| 91 | + |
| 92 | +Like in `sed`, you can provide a file extension to the `-i` option which will be used to create a backup file before processing: |
| 93 | + |
| 94 | + $ ruby -pe '$_.upcase!' -i.original filename |
| 95 | + |
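Conceptually, in-place editing with a backup extension can be approximated in plain Ruby as shown below. This is only a sketch under simplifying assumptions: `edit_in_place` is a hypothetical helper, it reads the whole file into memory, and it makes no claim about how Ruby's `-i` handles the files internally.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: approximates `ruby -pe '...' -i.ext filename`.
# The block receives each line and returns its replacement.
def edit_in_place(path, backup_extension = nil)
  FileUtils.cp(path, path + backup_extension) if backup_extension  # keep the original
  edited = File.readlines(path).map { |line| yield(line) }         # transform every line
  File.write(path, edited.join)                                    # overwrite the file
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.write(path, "hello  \nworld\n")
  edit_in_place(path, ".original") { |line| line.rstrip + "\n" }
  p File.read(path)                # the edited file
  p File.read(path + ".original")  # the untouched backup
end
```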
| 96 | +## Auto-splitting Lines |
| 97 | + |
| 98 | +The `-a` option will run `$F = $_.split` for every line: |
| 99 | + |
| 100 | + $ ruby -nae 'puts $F.reverse.join(" ")' filename |
| 101 | + |
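The effect of `-a` on a single line can be sketched like this. `reverse_fields` is a hypothetical helper, and the example assumes the default field separator, so `split` breaks the line on runs of whitespace.

```ruby
# Hypothetical helper: the long form of
#   ruby -nae 'puts $F.reverse.join(" ")'
def reverse_fields(line)
  fields = line.split            # what -a assigns to $F for every line
  fields.reverse.join(" ")       # the script body from the one-liner
end

puts reverse_fields("one two  three\n")
```

Because `split` without arguments ignores leading and trailing whitespace, the trailing newline never ends up in a field here.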
| 102 | +## Specifying the Line Format |
| 103 | + |
| 104 | +You might not always want to use `\n` as the character that separates lines. Fortunately, Ruby has [record separators](http://idiosyncratic-ruby.com/16-changing-the-rules.html#change-a-global-default-separator), and you can set some of them via command-line options: |
| 105 | + |
| 106 | +Option | Variable | Description |
| 107 | +-------|-----------|------------ |
| 108 | +`-0` | `$/` | Sets the *input record separator*, which is used by `Kernel#gets`. The character must be given as an [octal number](http://en.wikipedia.org/wiki/Octal). If no number is given (`-0`), null bytes are used as the separator. Using `-0777` will read in the whole file at once. Another special value is `-00`, which will set `$/` to `"\n\n"` (paragraph mode). |
| 109 | +`-F` | `$;` | Sets the *input field separator*, which is used by `String#split`. Useful in combination with the `-a` option. |
| 110 | +`-l` | `$\` | Sets the *output record separator* to the value of the *input record separator* (`$/`). Also runs [String#chop!](http://ruby-doc.org/core-2.2.2/String.html#method-i-chop-21) on every line! |
| 111 | +{:.table-10-10-X} |
| 112 | + |
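What a different input record separator means for reading can be tried out without any command-line switches by passing the separator to `gets` directly. In this sketch, `read_records` is a hypothetical helper that collects everything `gets` returns for a given separator.

```ruby
require "stringio"

# Hypothetical helper: reads all records from a string, using the given
# separator the way gets would use $/ after the -0 option changed it.
def read_records(text, separator)
  input = StringIO.new(text)
  records = []
  while (record = input.gets(separator))  # nil at end of input stops the loop
    records << record
  end
  records
end

p read_records("a\0b\0", "\0")   # like -0:    one record per null byte
p read_records("a\nb\n", nil)    # like -0777: the whole input as one record
```

Each record keeps its separator at the end, just like a line read with the default separator keeps its `\n`.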
| 113 | +## Further Reading |
| 114 | + |
| 115 | +- [sed](https://en.wikipedia.org/wiki/Sed) |
| 116 | +- [un](http://idiosyncratic-ruby.com/6-run-ruby-run.html) |
| 117 | +- [pru](https://github.com/grosser/pru) |