# An introduction to solving biological problems with Python

## Session 2.2: Exercises


## Bio exercises

### 1) Translate RNA sequence to protein sequence

Write a function that translates an RNA sequence into a sequence of amino acids. The function should take two arguments: an RNA sequence and a dictionary that defines the standard genetic code.

To map codons to amino acids, you can use the dictionary `standardGeneticCode` defined below. Note that its keys are upper-case strings only, so make sure that `codon` is in upper case before you look it up. You can convert a codon to upper case with the `upper()` method: `codon = codon.upper()`. The three stop codons map to `None`.

`standardGeneticCode = { 
          'UUU':'Phe', 'UUC':'Phe', 'UCU':'Ser', 'UCC':'Ser',
          'UAU':'Tyr', 'UAC':'Tyr', 'UGU':'Cys', 'UGC':'Cys',
          'UUA':'Leu', 'UCA':'Ser', 'UAA':None,  'UGA':None,
          'UUG':'Leu', 'UCG':'Ser', 'UAG':None,  'UGG':'Trp',
          'CUU':'Leu', 'CUC':'Leu', 'CCU':'Pro', 'CCC':'Pro',
          'CAU':'His', 'CAC':'His', 'CGU':'Arg', 'CGC':'Arg',
          'CUA':'Leu', 'CUG':'Leu', 'CCA':'Pro', 'CCG':'Pro',
          'CAA':'Gln', 'CAG':'Gln', 'CGA':'Arg', 'CGG':'Arg',
          'AUU':'Ile', 'AUC':'Ile', 'ACU':'Thr', 'ACC':'Thr',
          'AAU':'Asn', 'AAC':'Asn', 'AGU':'Ser', 'AGC':'Ser',
          'AUA':'Ile', 'ACA':'Thr', 'AAA':'Lys', 'AGA':'Arg',
          'AUG':'Met', 'ACG':'Thr', 'AAG':'Lys', 'AGG':'Arg',
          'GUU':'Val', 'GUC':'Val', 'GCU':'Ala', 'GCC':'Ala',
          'GAU':'Asp', 'GAC':'Asp', 'GGU':'Gly', 'GGC':'Gly',
          'GUA':'Val', 'GUG':'Val', 'GCA':'Ala', 'GCG':'Ala', 
          'GAA':'Glu', 'GAG':'Glu', 'GGA':'Gly', 'GGG':'Gly'}`
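
As a starting point, the lookup described above could be sketched as follows. This is only one possible approach, not the reference solution: the function name `translate_rna` and the four-entry `demoCode` dictionary are hypothetical, and the choice to stop at the first stop codon (a codon mapped to `None`) and to ignore trailing bases that do not fill a codon is an assumption that the exercise does not prescribe.

```python
def translate_rna(seq, geneticCode):
    """Translate an RNA sequence into a list of three-letter amino acid codes.

    Hypothetical helper: translation stops at the first stop codon
    (a codon mapped to None), and trailing bases that do not fill
    a complete codon are ignored.
    """
    protein = []
    # Step through the sequence one codon (three bases) at a time
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i+3].upper()       # the dictionary keys are upper case
        aminoAcid = geneticCode[codon]   # raises KeyError for an unknown codon
        if aminoAcid is None:            # stop codon reached
            break
        protein.append(aminoAcid)
    return protein

# A small subset of the genetic code, for illustration only
demoCode = {'AUG': 'Met', 'UUU': 'Phe', 'GGC': 'Gly', 'UAA': None}
print(translate_rna('augUUUGGCUAAUUU', demoCode))
```

With the full `standardGeneticCode` dictionary passed as the second argument, the same function would handle any RNA sequence made of A, C, G and U.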

### 2) Sliding window analysis of GC content along a sequence

Write a function that calculates overlapping sliding windows along a DNA sequence, from start to end, and returns the GC content in each window. The function should take two arguments, the DNA sequence and the size of the sliding window.

You can use the `calculate_windows` function defined below:

In [None]:
def calculate_windows(seq, winSize):
    """This function takes a given sequence (seq) and a sliding window size (winSize)
    and returns all sub-sequences according to the size of the sliding window.
    Notice that the sub-sequences are overlapping and their size is fixed according to winSize.
    """ 
    if winSize <= 0:
        raise Exception("Window size must be a positive integer")
    if winSize > len(seq): 
        raise Exception("Window size is larger than sequence length")
    result = []
    nrWindows = len(seq)-winSize+1
    for i in range(nrWindows):
        subSeq = seq[i:i+winSize]
        result.append(subSeq)
    return result
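
To illustrate how the windows relate to GC content, here is a minimal sketch assuming Python 3 division. The helper name `gc_content` and the example sequence are made up for illustration; the list comprehension builds the same overlapping windows that `calculate_windows(seq, winSize)` returns, so that the snippet runs on its own.

```python
def gc_content(subSeq):
    """Return the fraction of G and C bases in subSeq (hypothetical helper)."""
    subSeq = subSeq.upper()
    return (subSeq.count('G') + subSeq.count('C')) / len(subSeq)

seq = 'ATGCGCTA'   # made-up example sequence
winSize = 4

# Overlapping windows, each shifted by one base relative to the previous one
windows = [seq[i:i+winSize] for i in range(len(seq) - winSize + 1)]

# One GC fraction per window
gcResults = [gc_content(w) for w in windows]

print(windows)    # ['ATGC', 'TGCG', 'GCGC', 'CGCT', 'GCTA']
print(gcResults)  # [0.5, 0.75, 1.0, 0.75, 0.5]
```

A sequence of length 8 with a window of size 4 gives 8 - 4 + 1 = 5 windows, which matches `nrWindows` in `calculate_windows`.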

__For plotting:__

You can plot the GC content along the DNA sequence using the `matplotlib` plotting library:

In [None]:
%matplotlib inline

gcResults = [0.5, 0.5, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.5, 0.6, 0.5, 0.6, 0.6, 0.6, 0.6, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.8, 0.7, 0.6, 0.7, 0.7, 0.7, 0.7]
from matplotlib import pyplot
pyplot.plot( gcResults )
pyplot.axis([0, 50, .45, .85])
pyplot.ylabel('%GC')
pyplot.title('GC plot')
pyplot.text(12, .7, "this is some text!")
pyplot.show()

### 3) Sliding window analysis of hydrophobicity

As in the exercise above, write a function that calculates overlapping sliding windows along a protein sequence, from start to end, and returns the `sum` of the amino acid hydrophobicity values in each window. You can use the same `calculate_windows` function defined in the previous exercise.

The dictionary `gesScale` contains the hydrophobicity value of each amino acid, keyed by its one-letter code:

`gesScale = {'F': -3.7, 'M': -3.4, 'I': -3.1, 'L': -2.8, 'V': -2.6,
            'C': -2.0, 'W': -1.9, 'A': -1.6, 'T': -1.2, 'G': -1.0,
            'S': -0.6, 'P': 0.2, 'Y': 0.7, 'H': 3.0, 'Q': 4.1,
            'N': 4.8, 'E': 8.2, 'K': 8.8, 'D': 9.2, 'R': 12.3}`
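
The per-window sum could be sketched as below. The helper name `window_score`, the four-entry `demoScale` dictionary and the example peptide are hypothetical and serve only to keep the snippet short and runnable on its own; in the exercise you would pass the full `gesScale` dictionary instead.

```python
# Subset of the hydrophobicity scale, for illustration only
demoScale = {'F': -3.7, 'L': -2.8, 'K': 8.8, 'D': 9.2}

def window_score(subSeq, scale):
    """Sum the hydrophobicity values of all residues in subSeq (hypothetical helper)."""
    return sum(scale[aa] for aa in subSeq.upper())

protein = 'FLKDF'  # made-up example peptide
winSize = 3

# Overlapping windows, built the same way as in calculate_windows
windows = [protein[i:i+winSize] for i in range(len(protein) - winSize + 1)]

# One summed score per window
scores = [window_score(w, demoScale) for w in windows]
print(scores)
```

Because the values are floating-point numbers, the printed sums may show small rounding artefacts, just like the `scores` list used in the plotting example.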

__For plotting:__

You can plot the hydrophobicity scores using the `matplotlib` plotting library:

In [None]:
%matplotlib inline

scores = [11.499999999999998, 23.4, 7.400000000000001, 12.700000000000001, 16.700000000000003, 14.000000000000004, 12.400000000000002, 6.600000000000002, 18.2, 11.3, 7.400000000000001, 9.0, 9.0, 4.300000000000001, 15.899999999999999, 31.6, 31.599999999999998, 35.5, 39.6, 39.599999999999994, 42.3, 45.8, 53.39999999999999, 42.400000000000006, 45.400000000000006, 46.00000000000001, 44.10000000000001, 46.300000000000004, 39.4, 35.4, 27.199999999999996]
from matplotlib import pyplot
pyplot.plot( scores )
pyplot.grid(True)
pyplot.show()