## Read and write sequence files using Biopython

### Sequence objects

In [2]:
from Bio.Seq import Seq

A `Seq` instance is basically a string of letters with some additional functionality, such as complementing, transcribing, and translating the sequence.

In [5]:
s = Seq("GATTACA")
s

Seq('GATTACA')

In [8]:
s.complement()

Seq('CTAATGT')

In [6]:
s.reverse_complement()

Seq('TGTAATC')

In [9]:
s.transcribe()

Seq('GAUUACA')

In [7]:
s.translate()



Seq('DY')
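`GATTACA` has seven nucleotides, so the last one does not complete a codon. Biopython translates the two full codons (`GAT` and `TAC`) and, in recent versions, issues a `BiopythonWarning` about the partial codon. The following is a minimal sketch of one way to avoid the warning, assuming that dropping the trailing nucleotide is acceptable for your data; the variable names are illustrative only.

```python
from Bio.Seq import Seq

s = Seq("GATTACA")

# Keep only complete codons: 7 % 3 == 1, so one trailing nucleotide is dropped.
n_complete = len(s) - len(s) % 3
trimmed = s[:n_complete]  # slicing a Seq returns a new Seq

protein = trimmed.translate()
print(protein)  # DY
```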

In [11]:
s + s

Seq('GATTACAGATTACA')

In [12]:
str(s)

'GATTACA'
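Because a `Seq` behaves much like a string, many familiar string operations work on it directly. The sketch below shows a few of them on the same example sequence; the variable names are chosen here for illustration.

```python
from Bio.Seq import Seq

s = Seq("GATTACA")

length = len(s)           # number of letters in the sequence
first_three = s[:3]       # slicing returns a new Seq
n_a = s.count("A")        # number of non-overlapping occurrences of "A"
pos = s.find("TTA")       # index of the first match, or -1 if absent
has_motif = "TTA" in s    # membership test, as with a string

print(length, first_three, n_a, pos, has_motif)
```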

### Sequence annotation objects

Sequence annotation objects, `SeqRecord`, allow attaching further information to a `Seq` object.

They have, among others, the following attributes:

- `id`
- `name`
- `description`

In [13]:
from Bio.SeqRecord import SeqRecord

In [15]:
sr = SeqRecord(s, id="test")
sr

SeqRecord(seq=Seq('GATTACA'), id='test', name='<unknown name>', description='<unknown description>', dbxrefs=[])

In [16]:
sr.seq

Seq('GATTACA')

In [17]:
sr.id

'test'
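The other attributes can be set in the same way when the record is created. As a small sketch (the `name` and `description` values below are made up for illustration), `SeqRecord.format("fasta")` shows how `id` and `description` end up in a FASTA header line:

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(
    Seq("GATTACA"),
    id="test",
    name="test_name",                 # hypothetical value
    description="an example record",  # hypothetical value
)

# The FASTA header is built from the id followed by the description.
fasta_text = record.format("fasta")
print(fasta_text)
```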

### Sequence input/output

This section shows how to read sequences from and write sequences to FASTA files.

In [19]:
from Bio import SeqIO

To read sequences (as `SeqRecord`s) from a FASTA file, we can use the `SeqIO.parse(filename, 'fasta')` function. It returns an iterator of `SeqRecord`s.

One can use an iterator directly in a `for` loop, or retrieve the next element using `next`. To convert the iterator to a list, one can use the `list` function. Note that an iterator can only be consumed once.
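The three ways of consuming the iterator can be sketched as follows. To keep the example self-contained, it parses FASTA text from an in-memory `StringIO` handle instead of a file on disk (`SeqIO.parse` accepts either); the FASTA content is made up for illustration.

```python
from io import StringIO
from Bio import SeqIO

fasta_text = ">seq1\nACTG\n>seq2\nGATTACA\n"  # illustrative FASTA content

records = SeqIO.parse(StringIO(fasta_text), "fasta")

# next() retrieves a single record from the iterator.
first = next(records)

# A for loop consumes whatever is left in the iterator.
remaining_ids = []
for rec in records:
    remaining_ids.append(rec.id)

# The iterator above is now exhausted, so we parse again before calling list().
all_records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

print(first.id, remaining_ids, len(all_records))
```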

To write `SeqRecord` objects to a FASTA file, one can use the `SeqIO.write(records, filename, 'fasta')` function, which returns the number of records written. Here `records` can be a list or another iterable such as a generator expression; a generator does not require all the records to be in memory at once.
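As a sketch of the generator variant, the example below builds the records lazily from a dictionary and writes them to an in-memory `StringIO` handle, which `SeqIO.write` accepts in place of a filename. The dictionary and its contents are hypothetical.

```python
from io import StringIO
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

sequences = {"seq1": "ACTG", "seq2": "GATTACA"}  # hypothetical input data

# Generator expression: each SeqRecord is created only when SeqIO.write asks for it.
records = (
    SeqRecord(Seq(seq), id=seq_id, description="")
    for seq_id, seq in sequences.items()
)

handle = StringIO()
n_written = SeqIO.write(records, handle, "fasta")
fasta_text = handle.getvalue()

print(n_written)
print(fasta_text)
```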

In [21]:
seq1 = SeqRecord(Seq("ACTG"), id="seq1", description="")
seq2 = SeqRecord(Seq("GATTACA"), id="seq2", description="")
SeqIO.write([seq1, seq2], "seq.fasta", "fasta")

2

In [24]:
# Avoid naming the variable `iter`, which would shadow the Python built-in.
records = SeqIO.parse("seq.fasta", "fasta")
list(records)

[SeqRecord(seq=Seq('ACTG'), id='seq1', name='seq1', description='seq1', dbxrefs=[]),
 SeqRecord(seq=Seq('GATTACA'), id='seq2', name='seq2', description='seq2', dbxrefs=[])]

### Exercise

In notebook 02 you created a function that returns the amino-acid sequence of a given chain. Now we can write that sequence to a FASTA file.

Consider '9ds2' again.

- figure out the names of the heavy and the light chain (using ab_ag.tsv)
- copy the function from notebook 02 into a code cell below
- extract the amino-acid sequence for those two chains up to residue_numbers 109 for the light chain and 113 for the heavy chain
- write them into separate FASTA files 'VH.fa' and 'VL.fa'