
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec17

Chapter 3  Sequence objects:

3.1  Sequences act like strings

3.2  Slicing a sequence

3.3  Turning Seq objects into strings

3.4  Concatenating or adding sequences

3.5  Changing case

3.6  Nucleotide sequences and (reverse) complements




In [2]:
try:
    import google.colab
    # Running on Google Colab, so install Biopython first
    !pip install biopython
except ImportError:
    pass

Collecting biopython
  Downloading https://files.pythonhosted.org/packages/76/02/8b606c4aa92ff61b5eda71d23b499ab1de57d5e818be33f77b01a6f435a8/biopython-1.78-cp36-cp36m-manylinux1_x86_64.whl (2.3MB)
Installing collected packages: biopython
Successfully installed biopython-1.78


# 3.1 Sequences act like strings

In [3]:
# Sequences act like strings
from Bio.Seq import Seq
my_seq = Seq("GATCG")
for index, letter in enumerate(my_seq):
    print("%i %s" % (index, letter))

0 G
1 A
2 T
3 C
4 G


In [4]:
len(my_seq)

5

In [5]:
# You can access elements of the sequence in the same way as for strings (but remember, Python counts from zero!):
my_seq[0]

'G'

In [7]:
# The Seq object has a .count() method, just like a string.
# As with str.count(), the count is non-overlapping:
Seq("AATTGTCA").count("AT")

1
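
Because `.count()` is non-overlapping, a motif that overlaps itself is under-counted from a biological point of view. The following is a minimal pure-Python sketch of an overlapping count; the helper name `count_overlapping` is hypothetical (it is not part of Biopython), and it only assumes that its argument can be converted with `str()`, which holds for plain strings and for `Seq` objects.

```python
def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps.

    Hypothetical helper for illustration; not a Biopython function.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    text = str(sequence)  # works for plain strings and for Seq objects
    count = 0
    start = text.find(motif)
    while start != -1:
        count += 1
        # Step forward by one letter only, so overlapping hits are found.
        start = text.find(motif, start + 1)
    return count


print("AAAA".count("AA"))               # non-overlapping count: 2
print(count_overlapping("AAAA", "AA"))  # overlapping count: 3
```

For single-letter searches the two counts always agree, since a single letter cannot overlap itself.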

In [12]:
# .count() gives a non-overlapping count. For some biological uses,
# you may actually want an overlapping count.
# When searching for single letters, this makes no difference.

# Calculate the GC percentage:
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
GC_percent = 100 * (my_seq.count("G") + my_seq.count("C")) / len(my_seq)
print(GC_percent)

46.875


In [11]:
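# The Bio.SeqUtils module already provides several GC functions.
# GC() is available in Biopython 1.78 (the version installed above); newer
# releases replace it with gc_fraction(), which returns a fraction rather
# than a percentage. For example: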

from Bio.Seq import Seq
from Bio.SeqUtils import GC
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
GC(my_seq)

46.875
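
To make the arithmetic behind the GC percentage explicit, here is a small pure-Python sketch. The function name `gc_percent` is hypothetical, and the sketch is deliberately simpler than the library routine: it ignores case, returns 0.0 for an empty sequence (an assumption made here to avoid a division by zero), and does not account for ambiguity codes.

```python
def gc_percent(sequence):
    """Return the percentage of G and C letters in a sequence.

    Hypothetical helper for illustration; not Biopython's implementation.
    """
    text = str(sequence).upper()  # treat "g" and "G" alike
    if not text:
        return 0.0  # assumption: an empty sequence has 0% GC
    gc = text.count("G") + text.count("C")
    return 100 * gc / len(text)


print(gc_percent("GATCGATGGGCCTATATAGGATCGAAAATCGC"))  # 46.875
print(gc_percent("gatcgatgggcctatatAGGATCGAAAATCGC"))  # 46.875
```

The sequence has 32 letters, of which 9 are G and 6 are C, so the result is 100 * 15 / 32 = 46.875.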

# 3.2  Slicing a sequence

In [22]:
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
# my_seq[-1] would give the last letter; a slice returns a new Seq object:
my_seq[:5]


Seq('GATCG')

In [23]:
# Also like a Python string, you can do slices with a start, stop and stride
# (the step size, which defaults to one). For example, we can get the first,
# second and third codon positions of this DNA sequence:

my_seq[0::3]


Seq('GCTGTAGTAAG')

In [25]:
my_seq[1::3]

Seq('AGGCATGCATC')

In [24]:
my_seq[2::3]

Seq('TAGCTAAGAC')
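
The stride slices above pick out one position from every codon. A related pattern is to cut the sequence into the codons themselves. The sketch below uses a plain string so that it runs without Biopython; the same slicing should work on a `Seq` object, since `Seq` supports the slice syntax shown above. The helper name `split_codons` is hypothetical.

```python
def split_codons(sequence):
    """Split a sequence into consecutive chunks of three letters.

    Hypothetical helper for illustration. The last chunk may be shorter
    than three letters if the length is not a multiple of three.
    """
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
codons = split_codons(dna)
print(codons)

# The first letter of each codon reproduces the dna[0::3] slice.
print("".join(codon[0] for codon in codons) == dna[0::3])
```

Here the sequence has 32 letters, so the final chunk is the incomplete codon `GC`.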

In [26]:
# Another stride trick you might have seen with a Python string is the use of a -1 stride to reverse the string.
#  You can do this with a Seq object too:

my_seq[::-1]


Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG')

# 3.3  Turning Seq objects into strings

In [28]:
print(my_seq)
str(my_seq)

GATCGATGGGCCTATATAGGATCGAAAATCGC


'GATCGATGGGCCTATATAGGATCGAAAATCGC'

In [32]:
# The %s placeholder calls str() on the Seq object implicitly
fasta_format_string = ">Name\n%s\n" % my_seq
print(fasta_format_string)
f = "mariya %s is good girl" % my_seq
print(f)

>Name
GATCGATGGGCCTATATAGGATCGAAAATCGC

mariya GATCGATGGGCCTATATAGGATCGAAAATCGC is good girl
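
The FASTA string above puts the whole sequence on one line, which is fine for short sequences. FASTA files usually wrap long sequences over several lines, so here is a minimal sketch of that idea. The function name `to_fasta` and the default width of 60 are assumptions chosen for illustration; for real files, Biopython's `Bio.SeqIO` module is the usual route.

```python
def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record, wrapping lines at `width`.

    Hypothetical helper for illustration; `width=60` is an assumed default.
    """
    text = str(sequence)  # works for plain strings and for Seq objects
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return ">%s\n%s\n" % (name, "\n".join(lines))


record = to_fasta("Name", "GATCGATGGGCCTATATAGGATCGAAAATCGC", width=20)
print(record)
```

With `width=20` the 32-letter sequence is split into one line of 20 letters and one line of 12.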


# 3.4  Concatenating or adding sequences

In [33]:
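# Biopython does not check the sequence contents, so adding a protein
# sequence to a DNA sequence raises no error (although it is likely a mistake):
protein_seq = Seq("EVRNAK")
dna_seq = Seq("ACGT")
protein_seq + dna_seq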

Seq('EVRNAKACGT')

In [36]:
list_of_seqs = [Seq("ACGT"), Seq("AACC"), Seq("GGTT")]
concatenated = Seq("")
for s in list_of_seqs:
    concatenated += s
print(concatenated)

ACGTAACCGGTT


In [40]:
# Biopython's Seq also has a .join() method, like a Python string:

list_of_seqs = [Seq("ACGT"), Seq("AACC"), Seq("GGTT")]
empty = Seq("")
empty.join(list_of_seqs)

Seq('ACGTAACCGGTT')

In [41]:
list_of_seqs = [Seq("ACGT"), Seq("AACC"), Seq("GGTT")]
spacer = Seq("N"*10)
spacer.join(list_of_seqs)

Seq('ACGTNNNNNNNNNNAACCNNNNNNNNNNGGTT')

# 3.5  Changing case

In [43]:
from Bio.Seq import Seq
dna_seq = Seq("acgtACGT")
print(dna_seq.upper())
print(dna_seq.lower())

ACGTACGT
acgtacgt


In [44]:
# upper() and lower() are useful for case-insensitive matching.
# The mixed-case sequence does not contain "GTAC" as written:
"GTAC" in dna_seq

False

In [46]:
"GTAC" in dna_seq.upper()

True
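
The two cells above can be folded into one small helper that normalises both the sequence and the motif before comparing. This is only a sketch: the name `contains_motif` is hypothetical, and it relies on `str()` so that it accepts plain strings as well as `Seq` objects.

```python
def contains_motif(sequence, motif):
    """Case-insensitive test for whether motif occurs in sequence.

    Hypothetical helper for illustration; not part of Biopython.
    """
    return str(motif).upper() in str(sequence).upper()


print(contains_motif("acgtACGT", "GTAC"))  # True
print(contains_motif("acgtACGT", "gtac"))  # True
print(contains_motif("acgtACGT", "TTTT"))  # False
```

Upper-casing both sides means the caller does not have to remember which case the motif was typed in.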

# 3.6  Nucleotide sequences and (reverse) complements

In [47]:
# For nucleotide sequences, you can easily obtain the complement or reverse complement of a Seq object using its built-in methods:
from Bio.Seq import Seq
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
my_seq.complement()


Seq('CTAGCTACCCGGATATATCCTAGCTTTTAGCG')

In [48]:
my_seq.reverse_complement()

Seq('GCGATTTTCGATCCTATATAGGCCCATCGATC')

In [50]:
# As mentioned earlier, an easy way to just reverse a Seq object (or a Python string) is to slice it with a step of -1:
my_seq[::-1]

Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG')
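
To show how reversing, complementing and reverse-complementing relate to each other, here is a pure-Python sketch that reproduces the three results above. It is not Biopython's implementation: the function names are hypothetical, and the translation table is an assumption that covers only the unambiguous letters A, C, G and T in either case, leaving every other character unchanged.

```python
# Assumed lookup table: unambiguous DNA letters only, upper and lower case.
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Return the complement of a DNA string (hypothetical helper)."""
    return str(dna).translate(_COMPLEMENT_TABLE)


def reverse_complement(dna):
    """Return the reverse complement: complement first, then reverse."""
    return complement(dna)[::-1]


dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(complement(dna))          # CTAGCTACCCGGATATATCCTAGCTTTTAGCG
print(reverse_complement(dna))  # GCGATTTTCGATCCTATATAGGCCCATCGATC
print(dna[::-1])                # CGCTAAAAGCTAGGATATATCCGGGTAGCTAG
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check for any implementation.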