# Understanding SED

<hr>

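If you're using OSX (mac) I recommend installing GNU sed. The sed version that comes preinstalled with OSX is the BSD version, which lacks some of the GNU features used in this tutorial (several examples further down fail with it).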

> `$ brew install gnu-sed`

Then edit your `~/.bash_profile` and add:

> `alias sed='gsed'`

Save the file and then 'refresh' the profile by typing:

> `$ source ~/.bash_profile`

(or close and open the terminal again).

<hr>

### Intro

_I'll use `sed`, SED, sed interchangeably in this tutorial._

<hr>

# <center> ! </center>

If you only need to search for or extract patterns, without editing the text, I'd recommend sticking to the simpler `grep`. 

<br>

<hr>

`sed` is a special type of text editor. It lets you edit text files automatically (e.g., substitute text) using pattern matching (regular expressions), optionally restricted by conditions.

`sed` stands for **s**tream **ed**itor (for filtering and transforming text). It reads a text file, performs operations (like substitutions or deletions) and outputs the results. 

In its simplest form, sed reads the text file line by line; if the pattern is matched, it performs an operation, and then it prints the line, changed or not.

With sed you can convert DNA sequences to RNA or vice versa, and change complete words and even sentences in text files.
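
As a small sketch of the DNA-to-RNA idea (the sequence is made up for illustration; the `s` command and the `g` flag are explained in the sections that follow):

```shell
# Transcribe a DNA sequence into RNA by replacing every T with U.
# The sequence is a made-up example, not taken from a real file.
dna="ATGCTTGA"
rna=$(printf '%s\n' "$dna" | sed 's/T/U/g')
echo "$rna"
```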

### Getting started with `sed`

In [58]:
sed 's/is/isnt/' zen.txt | head # > newZen.txt # Add this to redirect to a new file.

The Zen of Python, by Tim Peters

Beautiful isnt better than ugly.
Explicit isnt better than implicit.
Simple isnt better than complex.
Complex isnt better than complicated.
Flat isnt better than nested.
Sparse isnt better than dense.
Readability counts.
Special cases aren't special enough to break the rules.


In [15]:
## Change only lines 1-4
sed '1,4s/is/isnt/g'<zen.txt | head -n6 # > newZen.txt # Add this to redirect to a new file.

The Zen of Python, by Tim Peters

Beautiful isnt better than ugly.
Explicit isnt better than implicit.
Simple is better than complex.
Complex is better than complicated.


In [77]:
## Change all lines  EXCEPT 1-4
sed '1,4!s/is/isnt/g'<zen.txt | head -n6 # > newZen.txt # Add this to redirect to a new file.

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple isnt better than complex.
Complex isnt better than complicated.


Unless told otherwise, `sed` only substitutes the first occurrence (if any) on each line:

In [63]:
sed 's/e/E/' zen.txt | head # > newZen.txt # Add this to redirect to a new file.

ThE Zen of Python, by Tim Peters

BEautiful is better than ugly.
Explicit is bEtter than implicit.
SimplE is better than complex.
ComplEx is better than complicated.
Flat is bEtter than nested.
SparsE is better than dense.
REadability counts.
SpEcial cases aren't special enough to break the rules.


To instruct `sed` to make changes in ALL occurrences use __g__ (global):

In [70]:
sed 's/e/u/g' zen.txt | head # > newZen.txt # Add this to redirect to a new file.

Thu Zun of Python, by Tim Puturs

Buautiful is buttur than ugly.
Explicit is buttur than implicit.
Simplu is buttur than complux.
Complux is buttur than complicatud.
Flat is buttur than nustud.
Sparsu is buttur than dunsu.
Ruadability counts.
Spucial casus arun't spucial unough to bruak thu rulus.


## <center> ! </center>

Caution, `sed` makes changes whenever it finds a matching pattern so you have to be careful to specify the pattern so it won't have any unintended consequences:

(example from the [grymoire](http://www.grymoire.com/Unix/sed.html))

In [125]:
echo Sunday | sed 's/day/night/'

Sunnight
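
One way to avoid this, sketched here with made-up input, is to anchor the pattern so it only matches when `day` is the whole line:

```shell
# ^ and $ anchor the pattern to the start and end of the line,
# so "Sunday" is left alone and only the bare word "day" changes.
out=$(printf 'Sunday\nday\n' | sed 's/^day$/night/')
echo "$out"
```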


<hr>

#### Formatting files with  `sed`

Sometimes a delimited text file will have whitespace, either leading (at the start of a line) or trailing (at the end), or (for any reason) empty lines. With `sed` we can delete the whitespace and empty lines that might affect how a program processing the file behaves.

In [56]:
## This file has leading and trailing whitespaces and an empty line.
# trailing whitespaces are harder to notice but if they exist in some lines they can affect the response of some programs.
head sed_test.txt

 rowA		1	9	

 . 	2	7	10
file3	3	6	20
 file4	4	5	
line_a	12	13	144
 line_b			177


In [35]:
## Delete leading spaces and tabs (\t inside [ ] is understood by GNU sed; other versions may treat it literally)
sed 's/^[ \t]*//' sed_test.txt

rowA		1	9	

. 	2	7	10
file3	3	6	20
file4	4	5	
line_a	12	13	144
line_b			177


In [37]:
## Delete both leading and trailing whitespaces.
# sed can accept multiple patterns separated by semicolons:
# sed 's/pattern1/substitution1/;s/pattern2/substitution2/'
sed 's/^[ \t]*//;s/[ \t]*$//' sed_test.txt

rowA		1	9	

. 	2	7	10
file3	3	6	20
file4	4	5	
line_a	12	13	144
line_b			177


In [39]:
# How do we know this works? Instead of deleting add a character
sed 's/^[ \t]*//;s/[ \t]*$/.../' sed_test.txt

rowA		1	9	...
...
. 	2	7	10...
file3	3	6	20...
file4	4	5	...
line_a	12	13	144...
line_b			177...
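
In the output above a tab still sits before the `...` on some lines, which suggests that the `sed` used here did not treat `\t` inside brackets as a tab (GNU sed does; other versions may not). A more portable sketch, assuming your `sed` supports POSIX character classes, uses `[[:space:]]` instead. The input line is made up:

```shell
# [[:space:]] matches spaces and tabs alike, so no \t escape is needed.
line=$(printf ' \trowA\t1\t9\t ')
trimmed=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
# The brackets make any leftover whitespace visible.
printf '[%s]\n' "$trimmed"
```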


In [41]:
# We can delete blank lines:
# sed '/pattern/d'
sed '/^$/d' sed_test.txt

 rowA		1	9	
 . 	2	7	10
file3	3	6	20
 file4	4	5	
line_a	12	13	144
 line_b			177
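
`/^$/` only matches lines that are completely empty. A possible variation, assuming POSIX character classes are available, also drops lines that contain nothing but spaces or tabs (made-up input):

```shell
# Delete lines that are empty or hold only whitespace.
out=$(printf 'a\n\n \t\nb\n' | sed '/^[[:space:]]*$/d')
echo "$out"
```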


## How does sed work?


A basic `sed` instruction consists of a command (e.g. ``s``), a __pattern__ and a __replacement__ separated by a `delimiter`, and (optionally) flags or another command.

> `sed 's/is/isnt/' zen.txt | head`


> `command/pattern/replacement/flags`

The pattern and replacement need to be enclosed within __three__ delimiters. Conventionally they're slashes (__/__), but almost any other single character will work, as long as it is used consistently:


> `sed 's*is*isnt*' zen.txt | head` <br>
> `sed 's_is_isnt_' zen.txt | head`<br>
> `sed 's:is:isnt:' zen.txt | head`<br>
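
A different delimiter is most useful when the pattern itself contains slashes, as file paths do. A minimal sketch with a made-up path:

```shell
# With / as the delimiter every slash in the path needs escaping: s/\/usr\/local/\/opt/
# Using | as the delimiter keeps the expression readable.
path="/usr/local/bin"
new_path=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')
echo "$new_path"
```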



## Actions

__d__ for delete.

We can delete specific lines or those that match a pattern

In [127]:
head -n 6 zen.txt 

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.


In [128]:
# Delete first 4 lines
sed '1,4d' <zen.txt | head -n 6

Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.


In [129]:
# Delete ALL except first 4 lines
sed '1,4!d' <zen.txt | head -n 6

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.


### no printing and printing

The __`-n`__ option suppresses the automatic printing of every line. To see any output we must then print explicitly with the __`p`__ command (or the `p` flag of `s`) placed after the pattern delimiter.

This mode of `sed` mimics output from `grep`.

In [293]:
# Without -n every line is printed once, and lines matching the pattern are printed a second time
sed '/better/p' zen.txt | wc -l

      29


From the sed manual:

<pre>
        -n    By default, each line of input is echoed to the standard output
              after all of the commands have been applied to it.  The -n option
              suppresses this behavior. 
</pre>

In [295]:
## Only matched lines will be printed.
sed -n '/better/p' zen.txt  | wc -l
echo "---"
sed -n '/better/p' zen.txt 

       8
---
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Now is better than never.
Although never is often better than *right* now.


In [296]:
# Same results with grep:
grep 'better' zen.txt

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Now is better than never.
Although never is often better than *right* now.
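
Unlike `grep`, `sed -n` combined with `p` can also select lines by position. A minimal sketch with made-up input:

```shell
# Print only lines 2 to 3; -n suppresses everything else.
out=$(printf 'one\ntwo\nthree\nfour\n' | sed -n '2,3p')
echo "$out"
```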


In [320]:
## Duplicate the first 6 lines
sed '1,6 p' zen.txt | head

The Zen of Python, by Tim Peters
The Zen of Python, by Tim Peters


Beautiful is better than ugly.
Beautiful is better than ugly.
Explicit is better than implicit.
Explicit is better than implicit.
Simple is better than complex.
Simple is better than complex.


In [322]:
## Duplicate every line from line 11 to the end of the file
sed '11,$ p' zen.txt | tail

Now is better than never.
Now is better than never.
Although never is often better than *right* now.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Namespaces are one honking great idea -- let's do more of those!


### delete

Will 'remove' (not print) lines matching a pattern or address. This can be useful to trim files, delete the first N lines, etc.

In [312]:
## Delete lines from 6 to end of file
sed '6,$ d' zen.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.


In [133]:
# Ignore lines with a pattern.
sed '/better/d' zen.txt

The Zen of Python, by Tim Peters

Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [134]:
# Same results with grep:
grep -v 'better' zen.txt

The Zen of Python, by Tim Peters

Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [135]:
## Print only the header of a SAM file (header lines start with @; anchoring with ^ avoids matching an @ inside a quality string)
sed -n '/^@/p' SAM_header.sam

@HD	VN:1.0	SO:unsorted
@SQ	SN:Chr1	LN:30427671
@SQ	SN:Chr2	LN:19698289
@SQ	SN:Chr3	LN:23459830
@SQ	SN:Chr4	LN:18585056
@SQ	SN:Chr5	LN:26975502
@SQ	SN:chloroplast	LN:154478
@SQ	SN:mitochondria	LN:366924
@PG	ID:Bowtie	VN:1.2.1.1	CL:"bowtie-align --wrapper basic-0 -a --best --strata -n 1 -m 1 -p 4 --sam --tryhard athGenome/bwt1_genome/athIndex - -S AlignedReads/bwt1_genome/testSeq.sam"


In [136]:
## Remove the header of a SAM file (only lines that start with @)
sed '/^@/d' SAM_header.sam | head

SRR1463325.44	16	Chr1	25048610	255	48M	*	0	0	GTGCAACCGAACAAGGGAAGCTTCCACATTGTCCAGTACCGTCCATCA	FFIIIIIIIIIIIIIIIIIIIIIIFFIIIIIFIIIIIFIIIIIIIIFF	XA:i:0	MD:Z:48	NM:i:0	XM:i:2
SRR1463325.45	16	Chr4	8917612	255	48M	*	0	0	TAACGATGAGAGTTTTGGCTTTGGTCCTAACAATGGTAGCTGCGACAG	IIIIIIIIFIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFF	XA:i:0	MD:Z:48	NM:i:0	XM:i:2
SRR1463325.49	0	chloroplast	38699	255	48M	*	0	0	TTCGTTCTATACATATGACCCGCAATGAGGAAAAGAATTGCGATAGCT	FFFFFIIFFIIFIFFFFIFFFFFIIFFFBFFIFIIIIIIFBFIIIIII	XA:i:0	MD:Z:48	NM:i:0	XM:i:2
SRR1463325.52	16	chloroplast	52634	255	48M	*	0	0	CGGAGTCAGTACACAAAGATTTAAGGTCATTTCTTCAATTTACTCTCC	IIIIFFIIIIFIIIIIIIIIIIIIIIIIIIIIFIIIIIFIFBIFIFFF	XA:i:0	MD:Z:48	NM:i:0	XM:i:2
SRR1463325.58	0	chloroplast	76111	255	48M	*	0	0	ACGCTATTCCGGTAATAGGATCACCTCTTGTAGAATTATTACGCGGAA	FFFIIIIIIIIIIIIIIIIIIIIIIBFFIIBFFIIIIIIIIFFFIFFF	XA:i:0	MD:Z:48	NM:i:0	XM:i:2
SRR1463325.63	0	Chr4	16355475	255	48M	*	0	0	TGTCTGGAATAGCAACACTCTCTCCACTGATGACTTCATTGGCAATGC	FFIIIIIIIIIIIIIFFIIIIIIIIIIIIFFII

In [137]:
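## Print lines 1 and 2 of every 4 (FASTQ header and sequence); first~step is a GNU extension, so OSX sed fails
sed -n '1~4p;2~4p' testQ32.fastq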

sed: 1: "1~4p;2~4p": invalid command code ~
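
The `first~step` address is a GNU extension, which is why the preinstalled OSX `sed` rejects it. A portable sketch that should give the same selection (lines 1 and 2 of every group of four, i.e. the header and sequence of each FASTQ record) uses `n` to step through the record. The two records below are made up:

```shell
# p prints the current line, n loads the next one (without printing, because of -n).
# Per record: print header, print sequence, skip the '+' line and the quality line.
fastq=$(printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF')
out=$(printf '%s\n' "$fastq" | sed -n 'p;n;p;n;n')
echo "$out"
```

This sketch assumes the file holds complete four-line records.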


### Modify the matched pattern
If you want to modify the matched text rather than replace it, __`&`__ acts as a placeholder for the matched string:

In [141]:
sed 's/better/&er/' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful is betterer than ugly.
Explicit is betterer than implicit.
Simple is betterer than complex.
Complex is betterer than complicated.
Flat is betterer than nested.
Sparse is betterer than dense.
Readability counts.
Special cases aren't special enough to break the rules.


In [142]:
# Duplicate the pattern matched
sed 's/is/&&/' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful isis better than ugly.
Explicit isis better than implicit.
Simple isis better than complex.
Complex isis better than complicated.
Flat isis better than nested.
Sparse isis better than dense.
Readability counts.
Special cases aren't special enough to break the rules.


### Remembering part of the patterns.

We can enclose parts of a regular expression in escaped parentheses, `\(` and `\)`, to split it into groups. The text matched by each group is remembered and can be recalled in the replacement with `\1`, `\2`, and so on.


The following regular expression captures a word and then matches the rest of the line:

> `\([a-zA-Z0-9]*\).*`

* `\(` and `\)` delimit the group to remember.
* `[`  `]` specify a set of characters.
    * In this case any run of characters in `a-z`, `A-Z` or `0-9`.
* `.*` matches any characters, as many as possible (it is greedy), so it runs to the end of the line unless something after it still has to match.

The following regular expression captures a single letter:
> `\([a-zA-Z]\)`


A group enclosed in `\(pattern\)` can be recalled using `\N`, where `N` is the group number. Up to 9 groups can be '_remembered_':

In [219]:
# Capture the first word and, because .* is greedy, the last word, then swap them
sed 's/\([a-zA-Z0-9]*\).* \([a-zA-Z0-9]*\).*/\2 \1/' zen.txt | head

Peters The

ugly Beautiful
implicit Explicit
complex Simple
complicated Complex
nested Flat
dense Sparse
counts Readability
rules Special
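
Because `.*` is greedy, the command above ends up pairing the first word with the last one. A sketch that swaps the first two words instead uses `[^ ]*` (any run of non-space characters) and no `.*`. The input is one line of the Zen:

```shell
# \1 = first word, \2 = second word; the rest of the line is left untouched.
out=$(echo 'Beautiful is better than ugly.' | sed 's/^\([^ ]*\) \([^ ]*\)/\2 \1/')
echo "$out"
```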


In [220]:
# Without * after the brackets and without .*: each group captures a single character, so the first two characters are swapped
sed 's/\([a-zA-Z0-9]\)\([a-zA-Z0-9]\)/\2\1/' zen.txt | head

hTe Zen of Python, by Tim Peters

eBautiful is better than ugly.
xEplicit is better than implicit.
iSmple is better than complex.
oCmplex is better than complicated.
lFat is better than nested.
pSarse is better than dense.
eRadability counts.
pSecial cases aren't special enough to break the rules.


In [230]:
## Spaces between the patterns are important to tell sed the word breaks 
sed 's/\([a-zA-Z0-9]*\).* \([a-zA-Z0-9]*\).* \([a-zA-Z0-9]*\).*/\3 \2 \1/' zen.txt | head

Peters Tim The

ugly than Beautiful
implicit than Explicit
complex than Simple
complicated than Complex
nested than Flat
dense than Sparse
Readability counts.
rules the Special


In [231]:
## No spaces between the first and second pattern:
sed 's/\([a-zA-Z0-9]*\).*\([a-zA-Z0-9]*\).* \([a-zA-Z0-9]*\).*/\3 \2 \1/' zen.txt | head

Peters  The

ugly  Beautiful
implicit  Explicit
complex  Simple
complicated  Complex
nested  Flat
dense  Sparse
counts  Readability
rules  Special


In [239]:
# Using extended regular expressions:
sed -E 's/([a-zA-Z]+) ([a-zA-Z]+) ([a-zA-Z]+)/\3 \2 \1/' zen.txt | head # Older GNU sed versions may need -r instead of -E

of Zen The Python, by Tim Peters

better is Beautiful than ugly.
better is Explicit than implicit.
better is Simple than complex.
better is Complex than complicated.
better is Flat than nested.
better is Sparse than dense.
Readability counts.
aren cases Special't special enough to break the rules.
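
Extended syntax also makes repetition counts easier to write. A small sketch, with a made-up date, that reorders an ISO date into day/month/year:

```shell
# With -E, ( ) and { } need no backslashes; \/ is a literal slash in the replacement.
out=$(echo '2024-01-31' | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/')
echo "$out"
```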


## Specifying occurrences



In [245]:
# By default sed only acts on the first match on each line
sed 's/[^ ]*/(&)/' zen.txt | head

(The) Zen of Python, by Tim Peters
()
(Beautiful) is better than ugly.
(Explicit) is better than implicit.
(Simple) is better than complex.
(Complex) is better than complicated.
(Flat) is better than nested.
(Sparse) is better than dense.
(Readability) counts.
(Special) cases aren't special enough to break the rules.


In [244]:
# The g (global) flag specifies to match ALL the patterns
sed 's/[^ ][^ ]*/(&)/g' zen.txt | head

(The) (Zen) (of) (Python,) (by) (Tim) (Peters)

(Beautiful) (is) (better) (than) (ugly.)
(Explicit) (is) (better) (than) (implicit.)
(Simple) (is) (better) (than) (complex.)
(Complex) (is) (better) (than) (complicated.)
(Flat) (is) (better) (than) (nested.)
(Sparse) (is) (better) (than) (dense.)
(Readability) (counts.)
(Special) (cases) (aren't) (special) (enough) (to) (break) (the) (rules.)


In [261]:
# We can specify the occurrence to affect using /{NUMBER}
sed 's/[a-zA-Z]* /"SECOND" /2' zen.txt | head

The "SECOND" of Python, by Tim Peters

Beautiful "SECOND" better than ugly.
Explicit "SECOND" better than implicit.
Simple "SECOND" better than complex.
Complex "SECOND" better than complicated.
Flat "SECOND" better than nested.
Sparse "SECOND" better than dense.
Readability counts.
Special "SECOND" aren't special enough to break the rules.
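
The same numeric flag is handy for delimited data. A sketch with a made-up CSV line, changing only the second comma:

```shell
# The trailing 2 tells s to replace only the second match on the line.
out=$(echo 'a,b,c,d' | sed 's/,/;/2')
echo "$out"
```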


In [253]:
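# A number combined with g (replace from the 2nd match onward) is a GNU extension, so OSX sed fails
sed 's/[a-zA-Z]* /DELETED /2g' zen.txt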

sed: 1: "s/[a-zA-Z]* /DELETED /2g": more than one number or 'g' in substitute flags


### Write to file

Using the `w file` flag we can instruct sed to write the lines where a substitution was made to a file.

In [264]:
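## Intended to get lines whose first word starts with a vowel. As written, [AEIOU]*[a-zA-Z] followed by a space only matches first words made of capital vowels plus one letter, such as In and If: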
sed -n 's/^[AEIOU]*[a-zA-Z] /&/p' zen.txt

In the face of ambiguity, refuse the temptation to guess.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.


In [270]:
# The file doesn't exist yet
ls vowel.txt

ls: vowel.txt: No such file or directory


In [271]:
sed -n 's/^[AEIOUaeiou]*[a-zA-Z] /&/w vowel.txt' <zen.txt 



In [272]:
head vowel.txt

In the face of ambiguity, refuse the temptation to guess.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
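
A self-contained sketch of the same idea, using made-up input and a temporary file whose name (`sed_w_demo`) is only an illustration:

```shell
# Write the lines where the substitution succeeded to a temporary file.
# Everything after "w " up to the end of the expression is taken as the file name.
tmp="${TMPDIR:-/tmp}/sed_w_demo.$$"
printf 'If a\nNow b\nIn c\n' | sed -n "s/^I[a-z] /&/w $tmp"
written=$(cat "$tmp")
rm -f "$tmp"
echo "$written"
```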


### Case sensitive/insensitive

SED will match __exactly__ what we tell it to search: `[a-z]` is different from `[A-Z]`; to have both we need `[a-zA-Z]`.

Let's be lazy: use the `I` flag (a GNU extension) to make the search case insensitive.

In [15]:
# Note, this won't work on OSX sed, use GNU-sed:
echo -e "sed:\n"
sed -n '/^[aeiou]/I p' zen.txt
echo -e "\n---\ngnu-sed:\n"
gsed -n '/^[aeiou]/I p' zen.txt

sed:

sed: 1: "/^[aeiou]/I p": invalid command code I

---
gnu-sed:

Explicit is better than implicit.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
Although that way may not be obvious at first unless you're Dutch.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
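
If GNU sed is not available, one workaround is to list both cases inside the brackets. A sketch with made-up input:

```shell
# Match lines starting with a vowel in either case, without the I flag.
out=$(printf 'apple\nBanana\nOrange\n' | sed -n '/^[aeiouAEIOU]/p')
echo "$out"
```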


### Combining flags

We can combine different flags (when it makes sense).

For example, substitute only the second occurrence of a letter, print only the lines that changed, and write them to a file:


In [285]:
ls MatchModifyPrint.txt
##
echo "--"
sed -n 's/a/A/2pw MatchModifyPrint.txt' zen.txt
echo "--"
##
ls MatchModifyPrint.txt

MatchModifyPrint.txt
--
Beautiful is better thAn ugly.
Complex is better than complicAted.
Flat is better thAn nested.
Sparse is better thAn dense.
ReadAbility counts.
Special cAses aren't special enough to break the rules.
Although practicAlity beats purity.
In the face of Ambiguity, refuse the temptation to guess.
There should be one-- and preferAbly only one --obvious way to do it.
Although that wAy may not be obvious at first unless you're Dutch.
If the implementation is hArd to explain, it's a bad idea.
If the implementation is eAsy to explain, it may be a good idea.
NamespAces are one honking great idea -- let's do more of those!
--
MatchModifyPrint.txt


### Multiple arguments

We can chain multiple commands with the `-e` (``--expression``) option:

In [289]:
sed -e 's/a/Y/' -e 's/b/X/' zen.txt

The Zen of Python, Xy Tim Peters

BeYutiful is Xetter than ugly.
Explicit is Xetter thYn implicit.
Simple is Xetter thYn complex.
Complex is Xetter thYn complicated.
FlYt is Xetter than nested.
SpYrse is Xetter than dense.
ReYdaXility counts.
SpeciYl cases aren't special enough to Xreak the rules.
Although prYcticality Xeats purity.
Errors should never pYss silently.
Unless explicitly silenced.
In the fYce of amXiguity, refuse the temptation to guess.
There should Xe one-- Ynd preferably only one --obvious way to do it.
Although thYt way may not Xe obvious at first unless you're Dutch.
Now is Xetter thYn never.
Although never is often Xetter thYn *right* now.
If the implementYtion is hard to explain, it's a Xad idea.
If the implementYtion is easy to explain, it may Xe a good idea.
NYmespaces are one honking great idea -- let's do more of those!


In [288]:
## Or, more compactly, separate the commands with semicolons (;)
sed -e 's/a/Y/;s/b/X/' zen.txt

The Zen of Python, Xy Tim Peters

BeYutiful is Xetter than ugly.
Explicit is Xetter thYn implicit.
Simple is Xetter thYn complex.
Complex is Xetter thYn complicated.
FlYt is Xetter than nested.
SpYrse is Xetter than dense.
ReYdaXility counts.
SpeciYl cases aren't special enough to Xreak the rules.
Although prYcticality Xeats purity.
Errors should never pYss silently.
Unless explicitly silenced.
In the fYce of amXiguity, refuse the temptation to guess.
There should Xe one-- Ynd preferably only one --obvious way to do it.
Although thYt way may not Xe obvious at first unless you're Dutch.
Now is Xetter thYn never.
Although never is often Xetter thYn *right* now.
If the implementYtion is hard to explain, it's a Xad idea.
If the implementYtion is easy to explain, it may Xe a good idea.
NYmespaces are one honking great idea -- let's do more of those!


In [290]:
# We can input multiple files at the same time
sed -e 's/a/Y/;s/b/X/' zen.txt copyZen.txt

The Zen of Python, Xy Tim Peters

BeYutiful is Xetter than ugly.
Explicit is Xetter thYn implicit.
Simple is Xetter thYn complex.
Complex is Xetter thYn complicated.
FlYt is Xetter than nested.
SpYrse is Xetter than dense.
ReYdaXility counts.
SpeciYl cases aren't special enough to Xreak the rules.
Although prYcticality Xeats purity.
Errors should never pYss silently.
Unless explicitly silenced.
In the fYce of amXiguity, refuse the temptation to guess.
There should Xe one-- Ynd preferably only one --obvious way to do it.
Although thYt way may not Xe obvious at first unless you're Dutch.
Now is Xetter thYn never.
Although never is often Xetter thYn *right* now.
If the implementYtion is hard to explain, it's a Xad idea.
If the implementYtion is easy to explain, it may Xe a good idea.
NYmespaces are one honking great idea -- let's do more of those!
The Zen of Python, Xy Tim Peters

BeYutiful is Xetter than ugly.
Explicit is Xetter thYn implicit.
Simple is Xetter th

### Ranges 

So far we've only seen how to match and substitute on a line-by-line basis. `sed` can also perform more complicated operations based on context, like:

* Action by row index (line number).
* Action between row indices (range of lines).
* Lines matching a regular expression/pattern.
* Lines __up to__ a regular expression/pattern (ie from the start of file until the match).
* Lines __from__ a regular expression/pattern (ie from the match until the end of file).
* Lines between two matching patterns (a block of lines delimited by a regular expression).
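
The address forms listed above can be sketched as follows. The input and the `START`/`END` markers are made up for illustration:

```shell
input=$(printf 'one\ntwo\nSTART\nthree\nEND\nfour')

by_number=$(printf '%s\n' "$input" | sed -n '2p')            # line 2 only
by_range=$(printf '%s\n' "$input" | sed -n '2,3p')           # lines 2 to 3
by_pattern=$(printf '%s\n' "$input" | sed -n '/START/p')     # lines matching START
up_to=$(printf '%s\n' "$input" | sed -n '1,/START/p')        # start of input up to START
from=$(printf '%s\n' "$input" | sed -n '/END/,$p')           # END to the last line
between=$(printf '%s\n' "$input" | sed -n '/START/,/END/p')  # block from START to END

printf '%s\n' "$between"
```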


In [333]:
## Delete the first character on line 3
sed '3 s/[A-Za-z0-9]//' zen.txt | head -n 5

The Zen of Python, by Tim Peters

eautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.


In [334]:
## Delete the first word on line 3
sed '3 s/[A-Za-z0-9]* //' zen.txt | head -n 5

The Zen of Python, by Tim Peters

is better than ugly.
Explicit is better than implicit.
Simple is better than complex.


In [346]:
## We can use a regular expression as the address, so the substitution only runs on lines that match it
# The regular expression must be written between delimiters (in this case /regex/)
sed -n '/^[AEIOU]/ s/[A-Za-z0-9]*/(&)/p' zen.txt

# In principle the spaces aren't necessary but help to make the command more readable

(Explicit) is better than implicit.
(Although) practicality beats purity.
(Errors) should never pass silently.
(Unless) explicitly silenced.
(In) the face of ambiguity, refuse the temptation to guess.
(Although) that way may not be obvious at first unless you're Dutch.
(Although) never is often better than *right* now.
(If) the implementation is hard to explain, it's a bad idea.
(If) the implementation is easy to explain, it may be a good idea.


In [None]:
# Same output as above.
sed -n '/^[AEIOU]/s/[A-Za-z0-9]*/(&)/p' zen.txt

In [384]:
# -n: no auto-print; 1,/Flat/: from line 1 to the first line matching Flat; wrap the leading capital letter in parentheses; p: print changed lines
sed -n '1,/Flat/ s/^[A-Z]/(&)/p' zen.txt

(T)he Zen of Python, by Tim Peters
(B)eautiful is better than ugly.
(E)xplicit is better than implicit.
(S)imple is better than complex.
(C)omplex is better than complicated.
(F)lat is better than nested.


### Range of patterns

We can specify a range of patterns to perform an action.

In [617]:
# From the first empty line to the first occurrence of the pattern n.t (here the n't in aren't)
sed -n '/^$/,/n.t/ s/[A-Za-z0-9]*/(&)/p' zen.txt

()
(Beautiful) is better than ugly.
(Explicit) is better than implicit.
(Simple) is better than complex.
(Complex) is better than complicated.
(Flat) is better than nested.
(Sparse) is better than dense.
(Readability) counts.
(Special) cases aren't special enough to break the rules.


In [615]:
## Modify things between the patterns (except the patterns)
sed '/Zen/,/Simple/ {/Zen/n; /Simple/ !{s/[A-Za-z]/(&)/g;};};' zen.txt

The Zen of Python, by Tim Peters

(B)(e)(a)(u)(t)(i)(f)(u)(l) (i)(s) (b)(e)(t)(t)(e)(r) (t)(h)(a)(n) (u)(g)(l)(y).
(E)(x)(p)(l)(i)(c)(i)(t) (i)(s) (b)(e)(t)(t)(e)(r) (t)(h)(a)(n) (i)(m)(p)(l)(i)(c)(i)(t).
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
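
A shorter way to act only on the lines strictly between two patterns, sketched with made-up markers `START` and `END`, is to delete the boundary lines inside the range and print the rest:

```shell
# Inside the range, d drops the two marker lines and p prints whatever is left.
out=$(printf 'one\nSTART\na\nb\nEND\ntwo\n' | sed -n '/START/,/END/{/START/d;/END/d;p;}')
echo "$out"
```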


## Append, insert or change an entire line


In [43]:
gsed '/is/ a\----This is a new line AFTER the matching pattern' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
----This is a new line AFTER the matching pattern
Explicit is better than implicit.
----This is a new line AFTER the matching pattern
Simple is better than complex.
----This is a new line AFTER the matching pattern
Complex is better than complicated.
----This is a new line AFTER the matching pattern


In [42]:
gsed '/is/ i\----This is a new line BEFORE the matching pattern' zen.txt | head

The Zen of Python, by Tim Peters

----This is a new line BEFORE the matching pattern
Beautiful is better than ugly.
----This is a new line BEFORE the matching pattern
Explicit is better than implicit.
----This is a new line BEFORE the matching pattern
Simple is better than complex.
----This is a new line BEFORE the matching pattern
Complex is better than complicated.
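
The one-line `a\text` and `i\text` forms used here are GNU extensions, hence `gsed`. The classic form puts the text on its own line after the backslash and should work with other versions too. A sketch with made-up input:

```shell
# The text to append starts on the line after "a\".
out=$(printf 'one\ntwo\nthree\n' | sed '/two/a\
after two')
echo "$out"
```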


In [49]:
gsed '/^[AEIOU].*/ c\----This modifies the entire line with a matching pattern' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
----This modifies the entire line with a matching pattern
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.


In [58]:
## GNU sed interprets escapes such as \t in the text given to c, i and a
## (the first backslash belongs to c\, the second one starts the \t escape):
gsed '/^[AEIOU].*/ c\\tThis modifies the \t entire line with \t a matching pattern' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
	This modifies the 	 entire line with 	 a matching pattern
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.


In [66]:
# OSX sed does not turn \t in the replacement into a tab; it prints a literal t:
sed -e 's/^[^AEIOU].*/(\t&)/' zen.txt | head -n 8

(tThe Zen of Python, by Tim Peters)

(tBeautiful is better than ugly.)
Explicit is better than implicit.
(tSimple is better than complex.)
(tComplex is better than complicated.)
(tFlat is better than nested.)
(tSparse is better than dense.)


In [71]:
# Modify lines that don't start with a capital vowel: wrap them in parentheses and add a tab
gsed -e 's/^[^AEIOU].*/(\t&)/' zen.txt | head -n 8

(	The Zen of Python, by Tim Peters)

(	Beautiful is better than ugly.)
Explicit is better than implicit.
(	Simple is better than complex.)
(	Complex is better than complicated.)
(	Flat is better than nested.)
(	Sparse is better than dense.)


In [86]:
## Add more than one line: use newline character \n
gsed '/Explicit/ i\----This is a new line BEFORE the matching pattern \n\t newline and tab ccccombo!' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
----This is a new line BEFORE the matching pattern 
	 newline and tab ccccombo!
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.


In [1]:
## Combine with ranges:
gsed '/Zen/,/Explicit/ a\**Line added after**' zen.txt | head


The Zen of Python, by Tim Peters
**Line added after**

**Line added after**
Beautiful is better than ugly.
**Line added after**
Explicit is better than implicit.
**Line added after**
Simple is better than complex.
Complex is better than complicated.


In [3]:
gsed '/Zen/,/Explicit/ i\**Line added before**' zen.txt | head

**Line added before**
The Zen of Python, by Tim Peters
**Line added before**

**Line added before**
Beautiful is better than ugly.
**Line added before**
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.


In [120]:
## The c (change) command replaces the whole block with a single line:
gsed '/Zen/,/Explicit/ c\**CENSORED**' zen.txt | head

**CENSORED**
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.


In [4]:
# We can negate the address with ! to change every line outside the range:
gsed '/Zen/,/Explicit/ !c\**CENSORED**' zen.txt | head

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
**CENSORED**
**CENSORED**
**CENSORED**
**CENSORED**
**CENSORED**
**CENSORED**


### Line Number

The `=` command prints the line number to the standard output on a line of its own. This means that we can't print both the line and its number on the same line :( .

In [5]:
gsed -n '/[AEIOU]/ =' zen.txt | head

4
11
12
13
14
16
18
19
20


In [27]:
# The number from = is printed on its own line, separate from the result of the substitution
gsed -n '/Zen/,/Explicit/ s/^./(&)/p ; =' zen.txt | head

(T)he Zen of Python, by Tim Peters
1
2
(B)eautiful is better than ugly.
3
(E)xplicit is better than implicit.
4
5
6
7


...or can we?

In [1]:
## If we pipe the output of = into a second sed and use N (append next line) to pair each number with its line, we can replace the \n between the two with a space:
sed '=' text.txt | \
sed '{
	N
	s/\n/ /
}'

1 Consult Section 3.1 in the Owner and Operator Guide
2 Consult Section 3.1 in the Owner and Operator Guide
3 for a description of the tape drives
4 available on your system.
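
The same trick should also work when only the matching lines are numbered, a bit like `grep -n`. This is a minimal sketch on a made-up four-line sample (the sample text and the `numbered` variable are assumptions for illustration): the first sed prints the line number and the line for each match, and the second sed joins each pair.

```shell
# Hypothetical sample; lines 2 and 4 contain "beta".
sample=$(printf 'alpha\nbeta\ngamma\nbeta again\n')

# First sed: for matching lines print the number (=) and then the line (p).
# Second sed: N pairs each number with its line, s joins them with ": ".
numbered=$(printf '%s\n' "$sample" | sed -n '/beta/{=;p;}' | sed 'N; s/\n/: /')

printf '%s\n' "$numbered"
```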


#### Transform upper/lower case

In [151]:
gsed 'y/abcdef/ABCDEF/' zen.txt | head

ThE ZEn oF Python, By Tim PEtErs

BEAutiFul is BEttEr thAn ugly.
ExpliCit is BEttEr thAn impliCit.
SimplE is BEttEr thAn ComplEx.
ComplEx is BEttEr thAn CompliCAtED.
FlAt is BEttEr thAn nEstED.
SpArsE is BEttEr thAn DEnsE.
READABility Counts.
SpECiAl CAsEs ArEn't spECiAl Enough to BrEAk thE rulEs.


In [157]:
# On a single line
gsed '3 y/abcdef/ABCDEF/' zen.txt | head -4

The Zen of Python, by Tim Peters

BEAutiFul is BEttEr thAn ugly.
Explicit is better than implicit.


In [155]:
# On a single pattern
gsed '/Beautiful/ y/abcdef/ABCDEF/' zen.txt | head -4

The Zen of Python, by Tim Peters

BEAutiFul is BEttEr thAn ugly.
Explicit is better than implicit.


In [153]:
# Within a range of patterns
gsed '/Beautiful/,/Explicit/ y/abcdef/ABCDEF/' zen.txt | head

The Zen of Python, by Tim Peters

BEAutiFul is BEttEr thAn ugly.
ExpliCit is BEttEr thAn impliCit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
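
`y` maps characters one to one, so upper-casing a whole line with it means listing the full alphabet twice. As a hedged alternative, GNU sed (assumed here, these escapes are GNU extensions) accepts `\U` and `\u` in the replacement part of `s`. The variable names below are only for illustration.

```shell
# \U upper-cases everything that follows it in the replacement.
upper=$(printf 'Beautiful is better than ugly.\n' | sed 's/.*/\U&/')

# \u upper-cases only the next character.
capital=$(printf 'beautiful is better than ugly.\n' | sed 's/^./\u&/')

printf '%s\n%s\n' "$upper" "$capital"
```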


## Multiple Lines

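`n` prints the current line (unless `-n` is set) and replaces the pattern space with the next input line; `N` appends the next line to the pattern space, separated by a `\n`. See also:

https://stackoverflow.com/questions/25946273/sed-next-next-command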

In [252]:
head text.txt

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.


In [250]:
sed '/Operator/{
 n
 s/Owner and Operator Guide/Installation Guide/
 }' text.txt


Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.


In [251]:
sed '/Operator/{
 N
 s/Owner and Operator Guide/Installation Guide/
 }' text.txt


Consult Section 3.1 in the Installation Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.
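
To make the difference between the two cells above easier to see, here is a small sketch on a made-up three-line input (the input and variable names are assumptions). Both commands wrap the pattern space in brackets after reading one more line: with `n` only the second line is left in the pattern space, with `N` both lines are.

```shell
# Hypothetical three-line input.
lines=$(printf 'first\nsecond\nthird\n')

# n: the pattern space is REPLACED by the next line, so only "second" is wrapped.
with_n=$(printf '%s\n' "$lines" | sed -n '/first/{n;s/^/[/;s/$/]/;p;}')

# N: the next line is APPENDED, so the brackets wrap "first\nsecond".
with_N=$(printf '%s\n' "$lines" | sed -n '/first/{N;s/^/[/;s/$/]/;p;}')

printf 'n:\n%s\n\nN:\n%s\n' "$with_n" "$with_N"
```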


In [268]:
## Match a pattern, append the N(ext) line to the pattern space, then replace the \n between the two lines to join them.
gsed '
/Explicit/ {
	N
	s/\n/ -Pasted- /
}' zen.txt | head 

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit. -Pasted- Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.


In [324]:
## Look for two consecutive lines that each match one of two patterns:
sed -n '
/Beautiful/ {
# found the first pattern: append the next line
	N
# look for the second pattern on the appended line
# and print both lines if it is there.
	/\n.*implicit/ p
}' zen.txt

Beautiful is better than ugly.
Explicit is better than implicit.


In [284]:
## Look for two consecutive lines that each match one of two patterns:
# If found, replace everything from the first pattern to the second with "One Two"
gsed '
/Beautiful/ {
	N
	/\n.*implicit/ {s/Beautiful.*\n.*implicit/One Two/}
}' zen.txt | head

The Zen of Python, by Tim Peters

One Two.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.


In [313]:
## Look for a match and blank it out along with the next 2 lines (an empty line is left in their place):
gsed '
/Beautiful/ {
	N
	N
	# Use as many Ns as extra lines you want to remove
	s/^.*\n.*//
}' zen.txt | head
echo -e "\n--\n"
head zen.txt

The Zen of Python, by Tim Peters


Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.

--

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
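
If the goal is to drop the lines completely, without the empty line left behind, a range address may be simpler. This is a sketch assuming GNU sed, since the `addr,+N` form is a GNU extension; the five-letter sample is made up for illustration.

```shell
# Hypothetical sample: five single-letter lines.
letters=$(printf 'a\nb\nc\nd\ne\n')

# Delete the matching line plus the 2 lines after it.
trimmed=$(printf '%s\n' "$letters" | sed '/b/,+2d')

printf '%s\n' "$trimmed"
```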


In [338]:
## Look for a pattern that spans the end of one line and the start of the next one. If found, replace the line break (\n)
# with a space to concatenate both lines.
gsed '
/drives/ {
# append the next line
	N
	s/\n/ /
}' text.txt
echo -e "\n- Original: -\n"
cat text.txt

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives available on your system.

- Original: -

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.


In [31]:
### Look for a pattern that spans two lines. If found, delete the first line.
gsed '
/drives/ {
# append the next line
	N
# if the second line contains "available", delete up to the first newline (the first line)
	/\n.*available/ D
}' text.txt
echo -e "\n- Original: -\n"
cat text.txt

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
available on your system.

- Original: -

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.


In [33]:
### Look for a pattern that spans two lines. If found, Print first line only.
gsed -n '
# if first pattern
/drives/ {
# append a line
	N
# if second pattern found, print the first line
	/\n.*available/ P
}' text.txt
echo -e "\n- Original: -\n"
cat text.txt

for a description of the tape drives

- Original: -

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.


In [38]:
### Same as the previous cell, with the comments written inline.
gsed -n '
/drives/ { # if first pattern
	N # append a line
	/\n.*available/ P # if second pattern found, print the first line
}' text.txt
echo -e "\n- Original: -\n"
cat text.txt

for a description of the tape drives

- Original: -

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.


In [48]:
### Look for a pattern that spans two lines. If found, join both lines and print the result.
gsed -n '
/drives/ { # If the first pattern matches
	N # append the next line
	/drives.*available/ { # If both patterns are now in the pattern space
		s/drives.*available/drives available/ # replace everything between them (the line break) with a space
		P # Print up to the first newline (here, the whole joined line)
		D # Delete what was printed and start the next cycle
	}
}' text.txt
echo -e "\n- Original: -\n"
cat text.txt

for a description of the tape drives available on your system.

- Original: -

Consult Section 3.1 in the Owner and Operator Guide
Consult Section 3.1 in the Owner and Operator Guide
for a description of the tape drives
available on your system.
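
The `N`, `P`, `D` trio can also act as a sliding two-line window over the file. As a hedged sketch, this well-known one-liner drops consecutive duplicate lines (like `uniq`). The input below is re-created from the `text.txt` contents shown above, and the variable names are made up.

```shell
# Same four lines as text.txt above, re-created so the example is self-contained.
text=$(printf '%s\n' \
  'Consult Section 3.1 in the Owner and Operator Guide' \
  'Consult Section 3.1 in the Owner and Operator Guide' \
  'for a description of the tape drives' \
  'available on your system.')

# $!N  : unless on the last line, append the next line.
# !P   : print the first line only if the two lines are NOT identical.
# D    : delete the first line and restart with the second one still loaded.
deduped=$(printf '%s\n' "$text" | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```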


In [55]:
# Using shell variables: put the sed script in double quotes so the shell expands them
match1="Guide"
change="and"
final="or"
sed -n "/$match1/ s/$change/$final/p" text.txt

Consult Section 3.1 in the Owner or Operator Guide
Consult Section 3.1 in the Owner or Operator Guide
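
When a variable holds a `/` (a path, for example) it collides with the delimiter of `s/.../.../`. A minimal sketch of one way around it is to pick another delimiter such as `|`. The paths and variable names are made up, and the sketch assumes the values contain neither `|` nor regex metacharacters.

```shell
# Hypothetical values containing slashes.
old_path="/usr/local/bin"
new_path="/opt/tools/bin"
line="PATH=/usr/local/bin:/usr/bin"

# Any character can follow s as the delimiter; | avoids escaping every /.
result=$(printf '%s\n' "$line" | sed "s|$old_path|$new_path|")

printf '%s\n' "$result"
```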


In [306]:
## Now that N is familiar, it is easier to see how = combined with N yields a line-numbered file
sed '=' text.txt | \
sed '{
	N
	s/\n/ /
}'

1 Consult Section 3.1 in the Owner and Operator Guide
2 Consult Section 3.1 in the Owner and Operator Guide
3 for a description of the tape drives
4 available on your system.


# `sed` IRL

Convert a multiline FASTA file into single-line FASTA (each sequence on one line, without line breaks)

In [350]:
gsed ' />/ !{ :Flow N; />/ !{ s/\n// ;} ; tFlow ; P ; D ;} ' multiline.fa

>Gen001
ggatttagcgcttattgttgggccttttttttttttttgctctgatggtttgtcagaagattattcgttaatgaattatgcgtttgttattgctgttgtcttcattttgaatgttggctctgaatttgattgaagctg
>Gen002
agaatggttggtatttgttgtgtccttctgcctcaagaagcttcaatctcaagatccttttgttttattttctgcagattgctttgttgatggtttattaggtatttttatatttgcagcagtctttttagcattgcagat
>Gen003
agaatggttggtatttgttgtgtccttctgcctcaagaagcttcaatctcaagat
>Gen004a
>Gen004b
agaatggttggtatttgttgtgtccttctgcctcaagaagcttcaatctcaagat

In [351]:
gsed '
    />/ !{  # Skip sequence headers; the block only runs on sequence lines
    :flow # Label used by the t (branch) command below
    N; # Append the next line to the pattern space
        />/ !{ # If the pattern space contains no header
        s/\n// ;}; # remove the line break to join the lines
    tflow ; P ; D ;} # Loop back to :flow after a join; otherwise Print and Delete the finished line
' multiline.fa

>Gen001
ggatttagcgcttattgttgggccttttttttttttttgctctgatggtttgtcagaagattattcgttaatgaattatgcgtttgttattgctgttgtcttcattttgaatgttggctctgaatttgattgaagctg
>Gen002
agaatggttggtatttgttgtgtccttctgcctcaagaagcttcaatctcaagatccttttgttttattttctgcagattgctttgttgatggtttattaggtatttttatatttgcagcagtctttttagcattgcagat
>Gen003
agaatggttggtatttgttgtgtccttctgcctcaagaagcttcaatctcaagat
>Gen004a
>Gen004b
agaatggttggtatttgttgtgtccttctgcctcaagaagcttcaatctcaagat
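
Since the contents of `multiline.fa` are not shown, here is a small sketch of the same script on a made-up input, so the before and after are both visible. The sequences and variable names are assumptions. I also wrote `$!N` instead of `N` so the script does not depend on how GNU sed handles `N` on the last line.

```shell
# Hypothetical multi-line FASTA: seq1 is split over three lines.
fasta=$(printf '%s\n' '>seq1' 'acgt' 'ttga' 'cc' '>seq2' 'ggaa')

# Same logic as above, one command per line.
flat=$(printf '%s\n' "$fasta" | sed '/>/!{
:flow
$!N
/>/!{
s/\n//
}
t flow
P
D
}')

printf '%s\n' "$flat"
```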

In [352]:
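# Strip the quality information from a FASTQ file (the headers keep their leading @, so this is not yet strict FASTA)
gsed '/+/,+1d' testQ32.fastq
# Find a line containing + (the separator before the quality string). From that line (,) through the next one (+1), delete (d)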

@SRR1463325.1 HS2:447:C2DFYACXX:5:1101:1336:2178 length=59
ATGTTAGTAACCGAACCTTCTTCAAAAAGGGCTAAGGGATAAGCTACATACGCAATAAA
@SRR1463325.2 HS2:447:C2DFYACXX:5:1101:1364:2181 length=59
ACGCATTTATTAGATAAAAGGTCGACGCGGGCTCTGCCCGTTGCTCTGATGATTCATGA
@SRR1463325.3 HS2:447:C2DFYACXX:5:1101:1499:2208 length=59
AGGACCTCTTTAGTATTTTTGTTGATGACCAAAGCACCAGCACCTACAACATGAGAAGC
@SRR1463325.4 HS2:447:C2DFYACXX:5:1101:1648:2157 length=59
NTGTAGAATCTATGTTGAATCACCATTTAGCAGGGCTACTAGGACTTGGGTCCCTTTCT
@SRR1463325.5 HS2:447:C2DFYACXX:5:1101:1776:2228 length=59
AGCCTCTTTCCGATCTTCTCAACTCCAAGGCTCTCAACGAACTTCCTCACTTCATCATC
@SRR1463325.6 HS2:447:C2DFYACXX:5:1101:1956:2235 length=59
AGAGTCAATAATTTTATATGAGGAACTACTGAACTCAATCACTTGCTGCCGTTACTCTT
@SRR1463325.7 HS2:447:C2DFYACXX:5:1101:2058:2150 length=59
NTGTTTGAGGGGGAGGTCATAAGCGTCTATACCGTAAAATAGATTTTCGACGAAATGCA
@SRR1463325.8 HS2:447:C2DFYACXX:5:1101:2251:2171 length=59
CTAAGGGTGGGTTGATAACCCACAGCAGAAGGCATTCTACCCAATAAGGCGGATACCTC
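
To get FASTA-style `>` headers as well, one option is to select lines by position instead of by pattern. This is a sketch that assumes GNU sed (the `first~step` address is a GNU extension) and strict 4-line FASTQ records. The two reads below are made up; the second quality string starts with `@` on purpose, to show why matching on `@` or `+` alone can be fragile.

```shell
# Hypothetical two-record FASTQ.
fastq=$(printf '%s\n' \
  '@read1 length=4' 'ACGT' '+' 'IIII' \
  '@read2 length=4' 'TTGA' '+' '@+II')

# 1~4: lines 1, 5, 9, ... are headers, so swap the leading @ for > and print.
# 2~4: lines 2, 6, 10, ... are sequences, so print them as they are.
fasta=$(printf '%s\n' "$fastq" | sed -n '1~4s/^@/>/p; 2~4p')

printf '%s\n' "$fasta"
```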


In [353]:
## Transcribe to RNA
gsed '
    />/ !{  # Skip sequence headers; the block only runs on sequence lines
    :flow # Label used by the t (branch) command below
    y/t/U/; # Transcribe: replace every t with U
    N; # Append the next line to the pattern space
        />/ !{ # If the pattern space contains no header
        s/\n//}; # remove the line break to join the lines
    tflow ; P ; D ; } # Loop back to :flow after a join; otherwise Print and Delete the finished line

' multiline.fa

>Gen001
ggaUUUagcgcUUaUUgUUgggccUUUUUUUUUUUUUUgcUcUgaUggUUUgUcagaagaUUaUUcgUUaaUgaaUUaUgcgUUUgUUaUUgcUgUUgUcUUcaUUUUgaaUgUUggcUcUgaaUUUgaUUgaagcUg
>Gen002
agaaUggUUggUaUUUgUUgUgUccUUcUgccUcaagaagcUUcaaUcUcaagaUccUUUUgUUUUaUUUUcUgcagaUUgcUUUgUUgaUggUUUaUUaggUaUUUUUaUaUUUgcagcagUcUUUUUagcaUUgcagaU
>Gen003
agaaUggUUggUaUUUgUUgUgUccUUcUgccUcaagaagcUUcaaUcUcaagaU
>Gen004a
>Gen004b
agaaUggUUggUaUUUgUUgUgUccUUcUgccUcaagaagcUUcaaUcUcaagaU

In [354]:
## Reverse complement
# Caveat: the reversal runs again after every N, so only records whose sequence sits on a single line
# (Gen004b below) come out right; Gen003 holds the same sequence but is split over two lines and differs.
# Flatten the file first (previous recipe) to get correct results for multi-line records.
gsed '
    />/ !{  # Skip sequence headers; the block only runs on sequence lines
    :flow # Label used by the t (branch) command below
    y/actg/TGAC/; # Complement each base (a->T, c->G, t->A, g->C)
    # Reverse the line, taken from http://www.catonmat.net/blog/sed-one-liners-explained-part-one/
    /\n/ !G # If the pattern space has no newline yet, append one (from the empty hold space)
       # The s and //D below form a loop
       s/\(.\)\(.*\n\)/&\2\1/ # 1. Capture the first character as \1 and the rest up to the newline as \2,
       # then rewrite the match as: whole match, \2, \1
       # ACTG\n --> ACTG\nCTG\nA
       //D # 2. If the same regex still matches, delete up to the first newline and restart the script
       # So ACTG\nCTG\nA --> CTG\nA
       # The loop repeats until only \nGTCA is left:
            #ACTG\n
            #CTG\nA
            #TG\nCA
            #G\nTCA
            #\nGTCA
       s/.// # Remove the first character of the reversed sequence (the leftover newline)
            #GTCA
    # Done reversing
    N; # Append the next line to the pattern space
        />/ !{ # If the pattern space contains no header
        s/\n//}; # remove the line break to join the lines
    tflow ; P ; D ; } # Loop back to :flow after a join; otherwise Print and Delete the finished line
' multiline.fa

>Gen001
CAGCTTCAATCAAATTCAGAGCCAACATTCAAAATGAAGACAACAGATCAGAGCAAAAAAAAAAAAAAGGCCCAACAATAAGCGCTAAATCCCCAAACAGTCTTCTAATAAGCAATTACTTAATACGCAAACAATAAC
>Gen002
ATCTGCAATGCTAAAAAGACTGCTGCAAATATAAAAATACCTAATAAATTGAAGCTTCTTGAGGCAGAAGGACACAACAAATACCAACCATTCTGAGTTCTAGGAAAACAAAATAAAAGACGTCTAACGAAACAACTACCA
>Gen003
ATCTTGAGATTGAAGCTTCTTGAGGCATCTTACCAACCATAAACAACACAGGAAG
>Gen004a
>Gen004b
ATCTTGAGATTGAAGCTTCTTGAGGCAGAAGGACACAACAAATACCAACCATTCT


In [355]:
## Reverse complement a DNA multi-FASTA as a one-liner (only correct when each sequence is on a single line, e.g. Gen004b)
gsed '/>/ !{ :flow y/actg/TGAC/; /\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//; N; />/ !{ s/\n//}; t flow ; P ; D ; } ' multiline.fa

>Gen001
CAGCTTCAATCAAATTCAGAGCCAACATTCAAAATGAAGACAACAGATCAGAGCAAAAAAAAAAAAAAGGCCCAACAATAAGCGCTAAATCCCCAAACAGTCTTCTAATAAGCAATTACTTAATACGCAAACAATAAC
>Gen002
ATCTGCAATGCTAAAAAGACTGCTGCAAATATAAAAATACCTAATAAATTGAAGCTTCTTGAGGCAGAAGGACACAACAAATACCAACCATTCTGAGTTCTAGGAAAACAAAATAAAAGACGTCTAACGAAACAACTACCA
>Gen003
ATCTTGAGATTGAAGCTTCTTGAGGCATCTTACCAACCATAAACAACACAGGAAG
>Gen004a
>Gen004b
ATCTTGAGATTGAAGCTTCTTGAGGCAGAAGGACACAACAAATACCAACCATTCT
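
In the output above, Gen003 and Gen004b hold the same sequence but get different results, because the reversal is repeated every time `N` pulls in another chunk of a multi-line record. A hedged way around this is to do the work in two passes: flatten first, then complement and reverse each finished line once. The sketch below assumes GNU sed and lowercase input; the sample sequences and variable names are made up.

```shell
# Hypothetical multi-line FASTA: seq1 is "aaccgt" split over two lines.
fasta=$(printf '%s\n' '>seq1' 'aacc' 'gt' '>seq2' 'ggaa')

revcomp=$(printf '%s\n' "$fasta" |
  # Pass 1: join every sequence onto a single line (same logic as the flatten recipe).
  sed '/>/!{
:flow
$!N
/>/!{
s/\n//
}
t flow
P
D
}' |
  # Pass 2: complement, then reverse each sequence line exactly once.
  sed '/>/!{
y/actg/TGAC/
/\n/!G
s/\(.\)\(.*\n\)/&\2\1/
//D
s/.//
}')

printf '%s\n' "$revcomp"
```

The reverse complement of `aaccgt` is `ACGGTT`, which is what the second pass should give for seq1 even though the record was split across lines.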


# Resources

* www.grymoire.com/Unix/sed.html
* https://www.gnu.org/software/sed/manual/sed.html
* https://www.ibm.com/developerworks/library/l-sed1/index.html
* https://www.gnu.org/software/sed/manual/html_node/Branching-and-flow-control.html