# An introduction to solving biological problems with Python

## Session 2.2: Exercises and Modules

- [Excercises 2.2.1](#Excercises-2.2.1)
- [Excercises 2.2.2](#Excercises-2.2.2)
- [Excercises 2.2.3](#Excercises-2.2.3)
- [Modules](#Modules)
- [Excercises 2.2.4](#Excercises-2.2.4)

## Excercises 2.2.1

### Translate DNA sequence into protein sequence

Write a function that translates a DNA sequence into a protein, a sequence of amino acids. The function should take 2 arguments, a DNA sequence and a dictionary that defines the standard genetic code.

To map RNA codons to amino acids you can use the dictionary `standardGeneticCode` defined below. Notice that its keys are upper-case strings only, so make sure that `codon` is in upper case before you look it up. You can convert a codon to upper case with the string method `upper()`. Notice also that the dictionary maps RNA codons, not DNA ones, and that stop codons map to `None`.

First, loop over the sequence three bases at a time, using either a `for` loop or a `while` loop, until you reach the end of the sequence or a stop codon.

Then convert the DNA into an RNA sequence by replacing every T base with U. Make sure that the codon corresponds to an amino acid. Convert the RNA codon into an amino acid using the dictionary provided and return the protein sequence as a list of amino acids.

In [None]:
standardGeneticCode = { 
          'UUU':'Phe', 'UUC':'Phe', 'UCU':'Ser', 'UCC':'Ser',
          'UAU':'Tyr', 'UAC':'Tyr', 'UGU':'Cys', 'UGC':'Cys',
          'UUA':'Leu', 'UCA':'Ser', 'UAA': None, 'UGA': None,
          'UUG':'Leu', 'UCG':'Ser', 'UAG': None, 'UGG':'Trp',
          'CUU':'Leu', 'CUC':'Leu', 'CCU':'Pro', 'CCC':'Pro',
          'CAU':'His', 'CAC':'His', 'CGU':'Arg', 'CGC':'Arg',
          'CUA':'Leu', 'CUG':'Leu', 'CCA':'Pro', 'CCG':'Pro',
          'CAA':'Gln', 'CAG':'Gln', 'CGA':'Arg', 'CGG':'Arg',
          'AUU':'Ile', 'AUC':'Ile', 'ACU':'Thr', 'ACC':'Thr',
          'AAU':'Asn', 'AAC':'Asn', 'AGU':'Ser', 'AGC':'Ser',
          'AUA':'Ile', 'ACA':'Thr', 'AAA':'Lys', 'AGA':'Arg',
          'AUG':'Met', 'ACG':'Thr', 'AAG':'Lys', 'AGG':'Arg',
          'GUU':'Val', 'GUC':'Val', 'GCU':'Ala', 'GCC':'Ala',
          'GAU':'Asp', 'GAC':'Asp', 'GGU':'Gly', 'GGC':'Gly',
          'GUA':'Val', 'GUG':'Val', 'GCA':'Ala', 'GCG':'Ala', 
          'GAA':'Glu', 'GAG':'Glu', 'GGA':'Gly', 'GGG':'Gly'}
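
The steps described above can be sketched as follows. This is only one possible approach, not a reference solution: the function name `translate` and the reduced table `example_code` are assumptions made for illustration, and your own function should be called with the full `standardGeneticCode` dictionary.

```python
# Reduced codon table for illustration only (an assumption of this sketch);
# the exercise itself should use the complete standardGeneticCode dictionary.
example_code = {'AUG': 'Met', 'UUU': 'Phe', 'GGA': 'Gly', 'UAA': None}


def translate(dna_sequence, genetic_code):
    """Return the protein as a list of amino acids, stopping at a stop codon."""
    # Convert to upper case and replace T with U to obtain the RNA sequence.
    rna_sequence = dna_sequence.upper().replace('T', 'U')
    protein = []
    # Step through the sequence three bases at a time; a trailing partial
    # codon (one or two bases) is ignored.
    for start in range(0, len(rna_sequence) - 2, 3):
        codon = rna_sequence[start:start + 3]
        amino_acid = genetic_code.get(codon)
        if amino_acid is None:
            # Stop codon, or a codon missing from the table: end translation.
            break
        protein.append(amino_acid)
    return protein


print(translate('atgtttggataattt', example_code))  # ['Met', 'Phe', 'Gly']
```

Using `get()` rather than `genetic_code[codon]` means an unknown codon ends the translation instead of raising a `KeyError`; you may prefer to report such codons explicitly.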

## Excercises 2.2.2

### Calculate the GC content of a DNA sequence

Write a function that calculates the GC content of a DNA sequence by re-using the code written for the [Exercises 1.4.2](Introduction_to_python_day_1_session_4.ipynb#Exercises-1.4.2) yesterday.
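
As a starting point, here is a minimal sketch of such a function. It is not necessarily the code from Exercises 1.4.2: the name `gc_content`, the choice to return a percentage, and the handling of an empty sequence are all assumptions.

```python
def gc_content(dna_sequence):
    """Return the percentage of G and C bases in a DNA sequence."""
    sequence = dna_sequence.upper()
    if not sequence:
        # Assumption: an empty sequence has a GC content of 0.
        return 0.0
    gc_count = sequence.count('G') + sequence.count('C')
    return 100.0 * gc_count / len(sequence)


print(gc_content('ATGC'))  # 50.0
```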

## Excercises 2.2.3

### Extract the list of all overlapping sub-sequences
Write a function that extracts a list of overlapping sub-sequences for a given window size from a given sequence. Do not forget to test it on a given DNA sequence.
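
One possible sketch, assuming a positive window size and a step of one base between consecutive windows (the name `sliding_windows` is hypothetical):

```python
def sliding_windows(sequence, window_size):
    """Return all overlapping sub-sequences of length window_size."""
    windows = []
    # The last window starts at len(sequence) - window_size, so the range
    # is empty when the window is longer than the sequence.
    for start in range(len(sequence) - window_size + 1):
        windows.append(sequence[start:start + window_size])
    return windows


print(sliding_windows('ATGCA', 3))  # ['ATG', 'TGC', 'GCA']
```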

## Modules

So far we have been writing Python code in files as executable scripts, without knowing that these files are also modules whose functions we can call from other code.

A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Create a file called `my_first_module.py` in the current directory with the following contents:

In [None]:
def say_hello(user):
    print('hello', user, '!')

Now start the Python interpreter from the directory in which you created the `my_first_module.py` file, and import the `say_hello` function from this module with the following commands:

```bash
python3
Python 3.5.2 (default, Jun 30 2016, 18:10:25) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_first_module import say_hello
>>> say_hello('Anne')
hello Anne !
>>> 
```

A module called `my_first_module.py` is already stored in the course directory. To import it into this notebook, run the cell below. If you edit this file to change the code or add another function, you will have to restart the kernel, using the restart button in the menu bar, for the changes to take effect.

In [None]:
from my_first_module import say_hello
say_hello('Anne')

A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement. 
They are also run if the file is executed as a script.

Comment out these executable statements if you do not want them to be executed when your module is imported.
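
A common alternative, described in the Python tutorial on modules, is to place such statements under an `if __name__ == '__main__':` test. The sketch below shows what `my_first_module.py` might look like with this guard; the call inside the guard is only an example.

```python
def say_hello(user):
    print('hello', user, '!')


# This block runs only when the file is executed as a script
# (for example with `python3 my_first_module.py`), not when it is imported.
if __name__ == '__main__':
    say_hello('Anne')
```

With this guard in place, `from my_first_module import say_hello` defines the function without printing anything.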

For more information about modules, see https://docs.python.org/3/tutorial/modules.html.

## Excercises 2.2.4
### Calculate GC content along the DNA sequence
Combine the two functions written above to calculate the GC content of each overlapping sliding window along a DNA sequence, from start to end.

From the two files you wrote, import the functions written for exercises 2.2.2 and 2.2.3.
The new function should take two arguments, the DNA sequence and the size of the sliding window, and re-use the functions written previously to calculate the GC content of a DNA sequence and to extract the list of all overlapping sub-sequences. It should return a list of GC percentages, one per window, along the DNA sequence.
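
The combination might look like the sketch below. To keep the example self-contained, it defines stand-in versions of the two helper functions; in your own solution you would import them from the files you wrote instead. All function names here are assumptions.

```python
# Stand-ins for the functions from exercises 2.2.2 and 2.2.3. In your own
# solution, replace these definitions with imports from your own modules.
def gc_content(dna_sequence):
    """Return the percentage of G and C bases in a DNA sequence."""
    sequence = dna_sequence.upper()
    if not sequence:
        return 0.0
    return 100.0 * (sequence.count('G') + sequence.count('C')) / len(sequence)


def sliding_windows(sequence, window_size):
    """Return all overlapping sub-sequences of length window_size."""
    return [sequence[start:start + window_size]
            for start in range(len(sequence) - window_size + 1)]


def gc_along_sequence(dna_sequence, window_size):
    """Return the GC percentage of each sliding window, from start to end."""
    return [gc_content(window)
            for window in sliding_windows(dna_sequence, window_size)]


print(gc_along_sequence('ATGCGC', 2))  # [0.0, 50.0, 100.0, 100.0, 100.0]
```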

## Next session

Go to our next notebook: [Introduction_to_python_day_2_session_3](Introduction_to_python_day_2_session_3.ipynb)