# Python for Biologists

Exercises for the book written by Dr. Martin Jones.

## Chapter 2: Printing and manipulating text

Some methods modify the object they are called on in place, while others return a new object and leave the original untouched.
String methods like `.lower()` and `.replace()` never modify the string they are called on, because **strings and integers are immutable**, while **lists are mutable**.

A call to `my_list.reverse()`, by contrast, reverses the list in place, so no re-assignment is needed.
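
As a small illustration of the difference, here is a sketch with made-up variable names (`bases`, `dna`; neither comes from the book). In-place list methods return `None`, whereas string methods return a new string and leave the original alone.

```python
# Hypothetical example data, not taken from the book.
bases = ['A', 'C', 'T', 'G']
result = bases.reverse()      # reverses the list in place
print(result)                 # None: the in-place method returns nothing useful
print(bases)                  # ['G', 'T', 'C', 'A']

dna = 'ACTG'
lowered = dna.lower()         # returns a new string, dna itself is unchanged
print(dna, lowered)           # ACTG actg
```

A common slip is writing `bases = bases.reverse()`, which replaces the list with `None`.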

### Strings are immutable

In [1]:
mystring = 'ACTG'
mystring.lower()

'actg'

In [2]:
mystring

'ACTG'

In [3]:
mystring.replace('AC', 'TG')

'TGTG'

In [4]:
mystring

'ACTG'

### Exercises

Calculating AT content

In [5]:
seq = 'ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA'
(seq.count('A') + seq.count('T'))/len(seq)

0.7021276595744681
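
The same calculation can be wrapped in a reusable function. This is only a sketch: the name `at_content` and the handling of lower-case input and empty sequences are assumptions added here, not part of the exercise.

```python
def at_content(seq):
    """Return the fraction of A and T bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()            # count lower-case bases too
    if not seq:
        return 0.0               # avoid dividing by zero
    return (seq.count('A') + seq.count('T')) / len(seq)

print(at_content('ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA'))
print(at_content('aattgc'))
```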

Complementing DNA

In [6]:
new_seq = ''
for i in seq:
    if i == 'A': new_seq += 'T'
    elif i == 'T': new_seq += 'A'
    elif i == 'G': new_seq += 'C'
    else: new_seq += 'G'  # assumes the only remaining base is 'C'
print(seq + '\n' + new_seq)

ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA
TGACTAGCTTAAGTGCATATATATAAAGTATATATCGATCGATCGAT
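
One possible alternative to the loop is a translation table built with `str.maketrans` and applied with `str.translate`. The names `COMPLEMENT` and `complement` are chosen for this sketch. Unlike the loop above, characters that are not in the table (for example `N`) are passed through unchanged instead of being turned into `G`.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')

def complement(seq):
    """Return the complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)

print(complement('ACTGATCG'))   # TGACTAGC
```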


Restriction fragment lengths. Write a program that recognizes the _EcoRI_ restriction site `G*AATTC` (the `*` marks the cut position) and returns the lengths of the resulting fragments.
This solution handles only a single restriction site, not multiple sites.

In [7]:
def restrict(seq, site='GAATTC', cut=1):
    # find the start of the site, then offset by the cut position within it
    # note: seq.find() returns -1 if the site is absent; that case is not handled here
    cut_site = seq.find(site) + cut
    print('total length: {}'.format(len(seq)))
    print('fragment length: {0}, {1}'.format(cut_site, len(seq) - cut_site))

restrict(seq)

total length: 47
fragment length: 8, 39
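
A possible extension to several restriction sites is sketched below. The function name `fragment_lengths` and the second test sequence are invented for illustration. The idea is to collect every cut position with repeated `str.find` calls and then take the differences between neighbouring boundaries.

```python
def fragment_lengths(seq, site='GAATTC', cut=1):
    """Return the fragment lengths after cutting seq at every occurrence of site."""
    cut_positions = []
    start = seq.find(site)
    while start != -1:
        cut_positions.append(start + cut)
        start = seq.find(site, start + 1)   # keep searching after this hit
    # Fragment boundaries: sequence start, each cut, sequence end.
    bounds = [0] + cut_positions + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

print(fragment_lengths('ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA'))  # [8, 39]
print(fragment_lengths('AAGAATTCAAGAATTCAA'))                               # [3, 8, 7]
```

If the site does not occur at all, the function returns a single fragment spanning the whole sequence.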


Splicing out introns, part one.

From the following sequence, remove one intron spanning nt 15 to 30 (boundary nucleotides included). Note that Python counts from zero and a slice excludes its end index. Nt 15 therefore has index 14, and to include nt 30 (index 29) the slice must end at 30.

In [8]:
intron = seq[14:30]
# splitting on the intron sequence leaves the two flanking exons as a list
exon = seq.split(intron)
print(''.join(exon))

ACTGATCGAATTCAATATAGCTAGCTAGCTA
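
Splitting on the intron sequence only works as intended if that exact sequence occurs once. A sketch that slices by coordinates instead avoids this assumption; the names `start`, `end`, `exon1` and `exon2` are introduced here for illustration.

```python
seq = 'ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA'
start, end = 14, 30          # 0-based, end-exclusive: nt 15 to 30

exon1 = seq[:start]          # everything before the intron
intron = seq[start:end]      # the intron itself
exon2 = seq[end:]            # everything after the intron
print(exon1 + exon2)         # ACTGATCGAATTCAATATAGCTAGCTAGCTA
```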


Splicing out introns, part two.

Calculate the proportion of the DNA sequence that is coding, i.e. made up of exons. The result below is a fraction; multiply by 100 for a percentage.

In [9]:
len(''.join(exon))/len(seq)

0.6595744680851063

Splicing out introns, part three.

Print coding sequences (exons) in upper case and non-coding (introns) in lower case.

In [10]:
print(seq)
print(exon[0] + intron.lower() + exon[1])

ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA
ACTGATCGAATTCAcgtatatatatttcatATATAGCTAGCTAGCTA
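
For more than one intron, the same idea can be driven by a list of coordinates. The sketch below assumes a hypothetical helper `mark_introns` that takes 0-based, end-exclusive `(start, end)` pairs, sorted and non-overlapping.

```python
def mark_introns(seq, introns):
    """Return seq with intron regions in lower case and the rest in upper case.

    introns is a list of (start, end) pairs, 0-based and end-exclusive,
    assumed to be sorted and non-overlapping.
    """
    parts = []
    pos = 0
    for start, end in introns:
        parts.append(seq[pos:start].upper())   # exon before this intron
        parts.append(seq[start:end].lower())   # the intron itself
        pos = end
    parts.append(seq[pos:].upper())            # trailing exon
    return ''.join(parts)

print(mark_introns('ACTGATCGAATTCACGTATATATATTTCATATATAGCTAGCTAGCTA', [(14, 30)]))
# ACTGATCGAATTCAcgtatatatatttcatATATAGCTAGCTAGCTA
```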


## Chapter 3: Reading and writing files

### Exercises

Splitting genomic DNA. Write intron and exon sequences to two separate files.

In [11]:
# mode 'x' creates a new file and raises FileExistsError if it already exists
with open('./intron.txt', 'x') as f:
    f.write(intron)

with open('./exon.txt', 'x') as f:
    f.write('\n'.join(exon))

Write a `.fasta` file with three arbitrary sequences.

In [12]:
with open('./example.fasta', 'x') as f:
    for i, j in enumerate(exon + [intron]):
        f.write('>sequence_' + str(i) + '\n')
        f.write(j + '\n')

Read the files back and print their contents.

In [13]:
for fil in ['./intron.txt', './exon.txt', './example.fasta']:
    with open(fil, 'r') as f:
        print('FILE: {}'.format(fil))
        print(f.read(), '\n')

FILE: ./intron.txt
CGTATATATATTTCAT 

FILE: ./exon.txt
ACTGATCGAATTCA
ATATAGCTAGCTAGCTA 

FILE: ./example.fasta
>sequence_0
ACTGATCGAATTCA
>sequence_1
ATATAGCTAGCTAGCTA
>sequence_2
CGTATATATATTTCAT
 



In [14]:
# remove test files again
import os
for f in ['./intron.txt', './exon.txt', './example.fasta']:
    os.remove(f)
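
An alternative that avoids manual cleanup is to write into a temporary directory. The sketch below also parses the FASTA file back into a dictionary. The helper names `write_fasta` and `read_fasta` are assumptions made for this example, and mode `'w'` is used instead of `'x'` so that re-running the cell does not fail.

```python
import os
import tempfile

def write_fasta(path, records):
    """Write (name, sequence) pairs to path in FASTA format."""
    with open(path, 'w') as f:            # 'w' overwrites an existing file
        for name, sequence in records:
            f.write('>' + name + '\n' + sequence + '\n')

def read_fasta(path):
    """Read a FASTA file into a dict mapping name to sequence."""
    records = {}
    name = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                name = line[1:]           # header line: start a new record
                records[name] = ''
            elif line and name is not None:
                records[name] += line     # join sequences wrapped over several lines
    return records

with tempfile.TemporaryDirectory() as tmp:    # directory is deleted on exit
    path = os.path.join(tmp, 'example.fasta')
    write_fasta(path, [('sequence_0', 'ACTGATCGAATTCA'),
                       ('sequence_1', 'ATATAGCTAGCTAGCTA')])
    parsed = read_fasta(path)

print(parsed)
```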

## Chapter 4: Lists and Loops

### Exercises