This tutorial assumes that you already know basic Python operations and data structures.

In [23]:
from Bio.Seq import Seq
from Bio import SeqIO

### Sequence

In [29]:
seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")

In [30]:
seq

Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC')

In [31]:
len(seq)

32

We can also turn the Seq object into a string:

In [59]:
str(seq)

'GATCGATGGGCCTATATAGGATCGAAAATCGC'

If a sequence mixes upper- and lower-case letters, the upper() and lower() methods return a copy of it in a single case:

In [64]:
seq2 = Seq('actggATCC')
seq2.upper()

Seq('ACTGGATCC')

In [65]:
seq2.lower()

Seq('actggatcc')

### Sequence slicing

There are two ways to slice a sequence. The first is ordinary Python slicing, where you specify the start and end positions; the start position is included and the end position is excluded:

In [63]:
seq[4:20]

Seq('GATGGGCCTATATAGG')
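As a quick check of the start/end rule, here is a minimal sketch. It assumes a plain Python string (the name dna is made up for this example) holding the same letters as seq, so that it runs without Biopython; Seq objects follow the same slicing rules.

```python
# Plain string with the same letters as seq (a stand-in for the Seq object)
dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

fragment = dna[4:20]           # positions 4 to 19; position 20 is excluded
print(fragment)                # GATGGGCCTATATAGG
print(len(fragment))           # 16, i.e. end - start

# Omitting a bound means "from the beginning" or "to the end"
head, tail = dna[:4], dna[4:]
print(head + tail == dna)      # True

# Negative positions count from the end of the sequence
print(dna[-4:])                # TCGC
```

Because the end position is excluded, two slices that share a boundary (here 4) never overlap and never leave a gap.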

The second way adds a step size (stride) to the slice. In Python, the start position, end position and step size are written like so:


seq[start:end:step]

In [43]:
# Let's have a look at the sequence again
seq

Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC')

In this example, the start and end are omitted, so the slice begins at position 0, runs to the end of the sequence, and takes every 3rd item (take note of the double colon):

In [52]:
seq[::3]

Seq('GCTGTAGTAAG')
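To see exactly which positions a step slice picks, the sketch below resolves the slice into explicit indices. It assumes a plain Python string named dna (a stand-in for seq, so the example runs without Biopython) and uses the built-in slice.indices() method.

```python
# Plain string with the same letters as seq (a stand-in for the Seq object)
dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

# slice.indices() fills in the omitted bounds for a sequence of this length
start, stop, step = slice(None, None, 3).indices(len(dna))
positions = list(range(start, stop, step))
print(positions)    # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

# Picking those positions one by one gives the same result as the slice
picked = "".join(dna[i] for i in positions)
print(picked)                # GCTGTAGTAAG
print(picked == dna[::3])    # True
```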

In another example, we start at position 2 and use a step size of 4:

In [51]:
seq[2::4]

Seq('TTCTGCAG')
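All three parts of the slice can also be given together. The following sketch, again assuming a plain string named dna in place of seq, shows that the end position stays excluded when a step is used.

```python
# Plain string with the same letters as seq (a stand-in for the Seq object)
dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

# Start at position 2, stop before position 20, take every 4th letter
print(dna[2:20:4])    # TTCTG  (positions 2, 6, 10, 14, 18)

# Position 18 is only included when the end is greater than 18
print(dna[2:18:4])    # TTCT
print(dna[2:19:4])    # TTCTG
```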

Because Seq objects support the same slicing as strings, a step size of -1 reverses the sequence:

In [57]:
seq[::-1]

Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG')
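Note that a reversed sequence is not the same thing as the reverse complement of a DNA strand. Biopython's Seq objects offer a reverse_complement() method for that; the sketch below only illustrates the difference with plain strings. The names dna and complement_table are made up for this example, and the table assumes an upper-case sequence containing only A, C, G and T.

```python
# Plain string with the same letters as seq (a stand-in for the Seq object)
dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

# Step 1: reverse the order of the letters
reverse = dna[::-1]

# Step 2: swap each base for its partner (A<->T, C<->G)
# Assumption: only upper-case A, C, G, T occur in the sequence
complement_table = str.maketrans("ACGT", "TGCA")
reverse_complement = reverse.translate(complement_table)

print(reverse)               # CGCTAAAAGCTAGGATATATCCGGGTAGCTAG
print(reverse_complement)    # GCGATTTTCGATCCTATATAGGCCCATCGATC
```

Applying both steps a second time returns the original sequence, which is a handy sanity check.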

### Example of parsing and reading sequences

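SeqIO.parse reads a file record by record and yields one SeqRecord object per entry. The example below prints the main attributes of every record in a FASTA file:

In [27]:
# Open the FASTA file in a with block so that it is closed after parsing
with open("../data/test-data.fasta") as data:
    # SeqIO.parse returns an iterator of SeqRecord objects
    for record in SeqIO.parse(data, "fasta"):
        print(f"Id: {record.id}")
        print(f"Name: {record.name}")
        print(f"Description: {record.description}")
        print(f"Annotations: {record.annotations}")
        print(f"Sequence Data: {record.seq}")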

Id: sp|P25730|FMS1_ECOLI
Name: sp|P25730|FMS1_ECOLI
Description: sp|P25730|FMS1_ECOLI CS1 fimbrial subunit A precursor (CS1 pilin)
Annotations: {}
Sequence Data: MKLKKTIGAMALATLFATMGASAVEKTISVTASVDPTVDLLQSDGSALPNSVALTYSPAVNNFEAHTINTVVHTNDSDKGVVVKLSADPVLSNVLNPTLQIPVSVNFAGKPLSTTGITIDSNDLNFASSGVNKVSSTQKLSIHADATRVTGGALTAGQYQGLVSIILTKSTTTTTTTKGT
Id: sp|P15488|FMS3_ECOLI
Name: sp|P15488|FMS3_ECOLI
Description: sp|P15488|FMS3_ECOLI CS3 fimbrial subunit A precursor (CS3 pilin)
Annotations: {}
Sequence Data: MLKIKYLLIGLSLSAMSSYSLAAAGPTLTKELALNVLSPAALDATWAPQDNLTLSNTGVSNTLVGVLTLSNTSIDTVSIASTNVSDTSKNGTVTFAHETNNSASFATTISTDNANITLDKNAGNTIVKTTNGSQLPTNLPLKFITTEGNEHLVSGNYRANITITSTIKGGGTKKGTTDKK
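
The output above shows how a FASTA header maps onto a record: the id is the first whitespace-delimited word of the header, and the description is the whole header line. The following is a simplified, pure-Python sketch of that idea. It is not Biopython's implementation; the names read_fasta, make_record and toy_fasta, as well as the toy records, are hypothetical and exist only for illustration.

```python
from io import StringIO


def make_record(header, chunks):
    """Build an (identifier, description, sequence) tuple from one entry."""
    # The identifier is the first word of the header line
    identifier = header.split(None, 1)[0] if header else ""
    return identifier, header, "".join(chunks)


def read_fasta(handle):
    """Yield (identifier, description, sequence) tuples from a FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield make_record(header, chunks)
            header, chunks = line[1:], []  # start a new entry
        elif header is not None:
            chunks.append(line)            # sequence may span several lines
    if header is not None:
        yield make_record(header, chunks)  # emit the last entry


# Made-up toy data, not taken from test-data.fasta
toy_fasta = StringIO(">seq1 first toy record\nACGT\nACGT\n>seq2 second toy record\nGGCC\n")
records = list(read_fasta(toy_fasta))
for identifier, description, sequence in records:
    print(identifier, "|", description, "|", sequence)
```

For real work, SeqIO.parse remains the better choice, since it handles many file formats and returns full SeqRecord objects.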
