
# Chapter 04: `sed` Tutorial
___

`sed` is a stream editor: it performs text transformations on an input stream (a text file or input from a pipeline).

Here is the basic command synopsis for `sed`:
```bash
sed SCRIPT INPUTFILE
```

The full synopsis, including options, is:
```bash
sed [OPTIONS] SCRIPT INPUTFILE
```

### Synopsis
```bash
sed -n -e '{COMMAND}' FILENAME  ## no address; the braces can be omitted in this case
sed -n -e 'ADDR1{COMMAND}' FILENAME  ## a single address
sed -n -e 'ADDR1,ADDR2{COMMAND}' FILENAME ## a pair of addresses, denoting a range
```
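The three forms above can be tried on a small inline input. This is a minimal sketch that assumes GNU `sed` and uses `printf` in place of a real file; the braces group several commands (here `=` and `p`) so that they all run on the same address.

```bash
# No address: the command runs on every line -> a b c
printf 'a\nb\nc\n' | sed -n -e 'p'

# Single address with a group: print the line number, then the line -> 2 b
printf 'a\nb\nc\n' | sed -n -e '2{=;p}'

# Address range with a group: applies to lines 2 through 3 -> 2 b 3 c
printf 'a\nb\nc\n' | sed -n -e '2,3{=;p}'
```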

<font color='red'>NOTE</font>: Although the `-e` option is not compulsory, we recommend using it, especially when you have multiple commands (scripts) to run.

```bash
sed -n -e 'p' re/numbers.txt | head -6
# equivalent to sed -e '' re/numbers.txt: without -n, sed prints the pattern space automatically
```

```
zero
one
two
three
four
five
```


### Addresses

1. When no address is given, the command is executed on every input line.
2. When only one address is given (`3p`), the command is executed on the matching line(s); the address can also be a regular expression (`/^t/p`).
3. When two addresses are given (`3,5p`), the command is executed on the range from the line matching the first address through the line matching the second, inclusive; each address can be either __numeric__ or a __regular expression__.

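A quick way to try the address forms is to feed `sed` the output of `seq`, which prints one number per line. This sketch assumes GNU `sed` and coreutils `seq`; it does not use `re/numbers.txt`.

```bash
# No address: p runs on every line -> 1 2 3 4 5
seq 5 | sed -n 'p'

# One numeric address: only line 3 -> 3
seq 5 | sed -n '3p'

# One regex address: every line starting with "1" -> 1 10
seq 10 | sed -n '/^1/p'

# Two numeric addresses: lines 3 through 5, inclusive -> 3 4 5
seq 10 | sed -n '3,5p'

# Two regex addresses: from the first line matching /^2/ to the next line matching /^4/ -> 2 3 4
seq 10 | sed -n '/^2/,/^4/p'
```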


#### <font color='blue'>Exercise</font>
1. Consult the `sed` manual and explain what each of the following commands does:
```bash
sed -n -e 'p' re/numbers.txt     # no address
sed -n -e '3p' re/numbers.txt    # single address
sed -n -e '/^t/p' re/numbers.txt # single address, but defined by a regex
sed -n -e '3,5p' re/numbers.txt  # two addresses, starting from 3, ending at 5
sed -n -e '/^o/, /^t/p' re/numbers.txt # two regexes as starting and ending address
sed -n -e '0~5p' re/numbers.txt  # first~step: starting from line 0, every 5th line
sed -n -e '1,+5p' re/numbers.txt # line 1 and the 5 lines that follow it
sed -n -e '1,~5p' re/numbers.txt # from line 1 to the next line whose number is a multiple of 5
sed -n -e '1,1p' re/numbers.txt  # starting from 1, ending at 1
```
2. Which of the following commands are valid? Why?
```bash
sed -n -e '0,1p' re/numbers.txt      # the first address is 0, second is also a number
sed -n -e '0,/^z/p' re/numbers.txt   # the first address is 0, second is regex
sed -n -e '/^o/,/^o/p' re/numbers.txt # the two addresses are the same regex
sed -n -e '/^t/,/^t/p' re/numbers.txt
```
3. Predict what the following two commands will print, and explain why.
```bash
sed -n -e '/^t/{p;q}' re/numbers.txt
sed -n -e '/^t/p;q' re/numbers.txt
```
4. Write down the output of the following command:
```bash
sed -n -e '8,~5!p' re/numbers.txt
```

### Options

`sed` accepts the following commonly used options:

| Option | Description |
| --- | --- |
| -n, --quiet, --silent | Suppress automatic printing of the pattern space at the end of each cycle. |
| -e SCRIPT, --expression=SCRIPT | Add SCRIPT to the commands to be executed. Typically used when several different commands apply to different addresses (conditions). |
| -f SCRIPT-FILE, --file=SCRIPT-FILE | Read the commands from SCRIPT-FILE. |
| -i[SUFFIX], --in-place[=SUFFIX] | Edit files in place instead of writing the result to standard output; if SUFFIX is supplied, the original file is kept as a backup. |
| -l N, --line-length=N | Specify the desired line-wrap length for the `l` command. |
| -r, -E, --regexp-extended | Use extended regular expressions (ERE) in the script. |
| -s, --separate | Treat multiple input files as separate streams rather than as a single continuous stream. |
| -u, --unbuffered | Load minimal amounts of data from the input files and flush the output buffers more often. |
| -z, --null-data | Separate lines by NUL characters. |

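The following sketch exercises `-e`, `-s` and `-i` on two scratch files. It assumes GNU `sed` (the `-i.bak` and `-s` forms) and creates its files in a temporary directory from `mktemp -d`, which is left in place so you can inspect it afterwards.

```bash
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'one\ntwo\nthree\n' > a.txt
printf 'four\nfive\n' > b.txt

# Two -e expressions are joined into one script -> one / three
sed -n -e '1p' -e '3p' a.txt

# Without -s the files form one stream, so $ is only the last line of b.txt -> five
sed -n '$p' a.txt b.txt

# With -s each file is separate, so $ matches the last line of each file -> three / five
sed -s -n '$p' a.txt b.txt

# -i.bak edits a.txt in place and keeps the original as a.txt.bak
sed -i.bak 's/two/TWO/' a.txt
cat a.txt      # one / TWO / three
cat a.txt.bak  # one / two / three
```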

### The two data buffers of `sed`

`sed` maintains two data buffers:
1. __pattern space__
2. __hold space__

#### Execution cycle
1. Read one line from the input stream, remove its trailing newline, and place it in the pattern space.
2. Execute each command whose address (condition) matches the current line.
3. When the end of the script is reached, print the contents of the pattern space to the output stream (unless the `-n` option is used), adding back the trailing newline if it was removed.
4. Go back to step 1.

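Step 3 of the cycle is easy to observe. In this sketch (assuming any POSIX `sed`), an explicit `p` combined with auto-print prints every line twice, while `-n` leaves only the explicit `p`.

```bash
# Auto-print (step 3) plus an explicit p: each line appears twice -> a a b b
printf 'a\nb\n' | sed 'p'

# -n disables step 3, so only the explicit p prints -> a b
printf 'a\nb\n' | sed -n 'p'

# The newline is removed in step 1 and added back in step 3,
# so s/$/!/ puts the "!" before the newline -> a! b!
printf 'a\nb\n' | sed 's/$/!/'
```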
#### Note
1. Unless special commands (like `D`) are used, the pattern space is cleared between two cycles.
2. The hold space, on the other hand, keeps its data between cycles (see the commands `h`, `H`, `x`, `g` and `G`, which move data between the two buffers).

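The persistence of the hold space can be seen with `x`, which swaps the two buffers: since the hold space starts empty and survives from one cycle to the next, each cycle prints the previous line. The second command accumulates every line with `H` and prints them once at the end. This sketch assumes GNU `sed` (for `\n` in the regex).

```bash
# x swaps the buffers; the hold space keeps the previous line -> (empty) 1 2
seq 3 | sed 'x'

# H appends "\n" + line to the hold space on every cycle; on the last line,
# swap it in, drop the leading newline, and join the rest with commas -> 1,2,3
seq 3 | sed -n 'H;${x;s/^\n//;s/\n/,/g;p}'
```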

### Commands for `sed` SCRIPTS

`sed` provides the following commands, grouped by the number of addresses they accept:

##### 1. Commands that take no address

| Command | Description | Example |
| --- | --- | --- |
| : label | Label for the `b`, `t` and `T` commands | : loop |
| #comment | The comment extends until the next newline | |

##### 2. Commands that accept zero or one address

GNU sed also accepts an address range for some of these commands, as in the `1,~5=` example.

| Command | Description | Example |
| --- | --- | --- |
| = | Print the current input line number, followed by a newline | `sed -n '1,~5=' test` |
| i \TEXT   | Insert TEXT before the current line | `sed '1~5i \NEW' test` |
| a \TEXT   | Append TEXT after the current line | `sed '1~5a \NEW' test` |
| q [EXIT_CODE] | Quit without processing any more commands or input; the current pattern space is printed first unless auto-print is disabled with `-n` | `sed -e '1~5p' -e '14q' test` |
| Q [EXIT_CODE] | Quit immediately without processing any more commands or input, and without printing the pattern space, even if `-n` is not used | `sed -e '1~5p' -e '14Q' test` |
| r FILENAME | Queue the contents of FILENAME to be written to the output at the end of the current cycle | `sed -n -e '1~5p;1~5r test2' test` |
| R FILENAME | Queue the next line of FILENAME to be written to the output at the end of the current cycle | `sed -n -e '1~5p;1~5R test2' test` |

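The one-address commands can be checked on `seq` output. This sketch assumes GNU `sed`, which accepts the one-line forms `2i NEW` / `2a NEW` and the `Q` command and exit code as extensions; the file passed to `r` is a hypothetical temporary file created on the spot.

```bash
# = prints the line number on its own line; on $ with -n it counts lines -> 3
seq 3 | sed -n '$='

# i and a insert text before / after line 2 -> 1 NEW 2 3 and 1 2 NEW 3
seq 3 | sed '2i NEW'
seq 3 | sed '2a NEW'

# q prints the current line before quitting; Q does not -> 1 2 3 versus 1 2
seq 10 | sed '3q'
seq 10 | sed '3Q'

# q can set the exit status -> prints 1, then "status: 5"
seq 10 | sed '1q5'; echo "status: $?"

# r queues a file's contents after each selected line -> 1 X 2 X
f=$(mktemp); printf 'X\n' > "$f"
seq 2 | sed "r $f"
rm -f "$f"
```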
##### 3. Commands that accept address ranges

| Command | Description | Example |
| --- | --- | --- |
| c \TEXT   | Replace the selected line(s) with TEXT; for a two-address range, TEXT is printed once, at the end of the range | `sed '1,5c \NEW' test` |
| d | Delete the pattern space and start a new cycle | `sed -n -e '1,5d;p' test`  |
| D | If the pattern space contains no newline, behave like `d`; otherwise delete up to and including the first newline and restart the cycle with what remains, without reading new input | `sed -e '1,5{N;D}' test` |
| n | If auto-print is enabled, print the pattern space; then replace it with the next line of input | `sed -e 'n;p' test` |
| N | Append a newline and the next line of input to the pattern space | `sed -n -e 'N;p' test` |
| g | Copy the hold space to the pattern space, overwriting it |                        |
| G | Append a newline and the hold space to the pattern space | `sed -n -e '1!G;h;$p' test3` |
| h | Copy the pattern space to the hold space, overwriting it | `sed -n -e '1!G;h;$p' test3`  |
| H | Append a newline and the pattern space to the hold space |                       |
| x | Exchange the contents of the pattern space and the hold space |                    |
| b LABEL | Branch to LABEL (defined with `:LABEL`), typically to build a loop; if LABEL is omitted, branch to the end of the script |        |
| t LABEL | If an `s///` has made a successful substitution since the last input line was read and since the last `t` or `T` command, branch to LABEL; if LABEL is omitted, branch to the end of the script |             |
| T LABEL | If no `s///` has made a successful substitution since the last input line was read and since the last `t` or `T` command, branch to LABEL; if LABEL is omitted, branch to the end of the script |              |
| s/REGEX/REPLACE/ | Replace the text matching REGEX with REPLACE |                    |
| y/SRC/DEST/  | Transliterate each character in SRC to the corresponding character in DEST (ranges such as `a-z` are not supported) | `sed 'y/abc/ABC/' test` |
| p | Print the current pattern space | `sed -n -e '1~5N;p' test` |
| P | Print the pattern space up to the first newline | `sed -n -e '1~5N;P' test` |
| w FILENAME | Write the pattern space to FILENAME |  `sed -n -e '1~5w test3' test`  |
| W FILENAME | Write the first line of the pattern space to FILENAME |  `sed -e '1~5N;W test3' test` |

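A few of the multi-line and buffer commands in action, assuming GNU `sed` and `seq` input. The `$!N;$!D` idiom keeps a two-line window and behaves like `tail -n 2`; the `y` examples show that `a-z` is taken literally rather than as a range.

```bash
# n prints the current line (if auto-print is on) and loads the next one;
# with -n only the explicit p prints, so every second line survives -> 2 4 6
seq 6 | sed -n 'n;p'

# N appends "\n" + next line, so pairs of lines can be joined -> "1 2" "3 4" "5 6"
seq 6 | sed 'N;s/\n/ /'

# N/D sliding window: D drops the oldest line and restarts the cycle
# without reading input; on the last line D is skipped -> 4 5
seq 5 | sed '$!N;$!D'

# G and h build the lines in reverse order in the hold space (like tac) -> 3 2 1
seq 3 | sed -n '1!G;h;$p'

# y needs every character spelled out -> HELLO
echo hello | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'

# y/a-z/A-Z/ only maps a->A, '-'->'-', z->Z -> "A-Z m"
echo 'a-z m' | sed 'y/a-z/A-Z/'
```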

<font color="red">NOTE</font>: Placing "!" between an address and a command negates the address: the command is then executed on every line that does *not* match the address (or range).

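Two small illustrations of `!`, assuming any POSIX `sed`: the first prints every line outside a range, the second deletes every line except the last (similar to `tail -n 1`).

```bash
# 2,4!p prints every line outside the range 2-4 -> 1 5
seq 5 | sed -n '2,4!p'

# $!d deletes every line except the last -> 5
seq 5 | sed '$!d'
```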

### <font color='blue'>Exercise</font>
#### 1. What is the difference between `sed -n -e '1,5{p;N;D}' test` and `sed -n -e '1,5{p;N;d}' test`?
#### 2. Here is a complete example of *commify* (adding commas as thousands separators)

(1) This is the file `numbers.txt`:
```
1
12
123
1234
12345
123456
1234567
12345678
123456789
1234567890
1234567890.1234
+1234567890.1234
-1234567890.1234
$1234567890.1234
```

(2) `sed` command:
```bash
sed -r ':a; s@(^|[^0-9.])([0-9]+)([0-9]{3})@\1\2,\3@g;t a' numbers.txt
```
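Running the script on a few lines in the style of `numbers.txt` shows its behaviour: the group `(^|[^0-9.])` requires a run of digits to start at the beginning of the line or after a character that is neither a digit nor `.`, so the digits after the decimal point are never touched. This sketch assumes GNU `sed` (for `-r`).

```bash
# Feed a few sample lines through the commify script.
printf '%s\n' 1234 1234567890.1234 '-1234567890.1234' '$1234567890.1234' |
  sed -r ':a; s@(^|[^0-9.])([0-9]+)([0-9]{3})@\1\2,\3@g;t a'
# Expected: 1,234 / 1,234,567,890.1234 / -1,234,567,890.1234 / $1,234,567,890.1234
```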

#### 3. Rewrite the following file so that every number has exactly two decimal places (for example, `45` becomes `45.00` and `50.5` becomes `50.50`).
```
name marks grade
abc 50.5 CB
def 45 CC
ghhi 55 CA
jkl 85 A
mno 75.0 BA
pqr 77 BA
stu 89.50 A
```

#### 4. A file contains the numbers 1 to 100, one per line. Rewrite the file so that all the numbers appear on a single line, separated by tabs.

#### 5. Explain the differences between the following commands:
```bash
sed '1,+5!c\NEWLINE' re/numbers.txt
sed '1,~5!c\NEWLINE' re/numbers.txt
sed '1,~5!a\NEW' re/numbers.txt
sed '1,~5!i\NEW' re/numbers.txt
```

```bash
sed '1,~5!c\NEW' re/numbers.txt
```

```
zero
one
two
three
four
NEW
NEW
NEW
NEW
NEW
```


```bash
sed -n -e '1!G;$!h;${s/\n/ /g;p}' re/numbers.txt
```

```
nine eight seven six five four three two one zero
```


Here is the explanation for the above command:
* For the first line, `1!G` is skipped and `$!h` copies the pattern space to the hold space.
* For every other line except the last, `G` appends a newline and the hold space to the pattern space (so the current line comes first), and `$!h` copies the result back to the hold space.
* For the last line, `G` appends the accumulated hold space, `s/\n/ /g` replaces all the newlines with spaces, and `p` prints the pattern space.

```bash
sed -n -e '1h;1!H;${g;s/\n/ /g;p}' re/numbers.txt
```

```
zero one two three four five six seven eight nine
```

Here is the explanation:
* For line 1, `1h` copies the pattern space to the hold space (`1!H` is skipped).
* For every other line, `1!H` appends a newline and the pattern space to the hold space.
* For the last line (`$`), after `H` has run, `g` copies the hold space to the pattern space, `s/\n/ /g` replaces all the newlines with spaces, and `p` prints the pattern space.

```bash
echo "-234567890.333" | sed -r ':a; s/\B[0-9]{3}\b/,&/; ta'
```

```
-234,567,890.333
```


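Here is the explanation:
* `:a`: define a label named `a`;
* `s/\B[0-9]{3}\b/,&/`: find the leftmost group of three digits that ends at a word boundary (`\b`) but does not start at one (`\B`), i.e. the last three digits of a longer run of digits, and insert a comma before it (`&` stands for the matched text);
* `ta`: if the substitution succeeded, branch back to label `a`; otherwise fall through to the end of the script.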

### Note
* To use extended regular expressions, pass the `-r` option to `sed` (GNU sed also accepts `-E`).
* If you want Perl-style regular expressions (for example, __zero-width assertions__) in sed, download [ssed](http://sed.sourceforge.net/grabbag/ssed/sed-3.62.tar.gz) and run `ssed` with the `-R` option.

## Reference
* [sed FAQ](http://sed.sourceforge.net/sedfaq3.html)