# Bioinformatics exercises for Python (Informatics I)
##### I.   Manipulate strings
##### II.  Use lists and dictionaries
##### III. Generate reverse complementary DNA


## Introduction

Author: Jurre Hageman
Date: 2017-09-15

For this lesson, we will write a small program that takes a DNA sequence and generates the reverse strand, the complementary strand and the reverse complementary strand.

DNA is a double-stranded helix and can be depicted as follows (strongly simplified):<br>
GACCATGGAC<br>
CTGGTACCTG<br>

Bioinformaticians often store only one strand in databases. This saves considerable space.
When one strand is given, the other strand can be generated, because a T always pairs with an A and a C always pairs with a G.<br>
If we look at the strand: <br>
GACCATGGAC<br>
Then we can generate the following: <br>
reverse strand: CAGGTACCAG<br>
complementary strand: CTGGTACCTG<br>
reverse complementary strand: GTCCATGGTC <br>

Online tools exist to convert DNA, such as [this tool](http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html).

Your task is to write a similar DNA conversion tool. However, for simplicity, we will code it as a command line tool.


Let's first think of what the program should do:
- it should read a DNA string
- it should reverse the string
- it should complement the string
- it should reverse-complement the string


Open IDLE3.
First, create a variable named dna and assign the string "atcg" to it:


In [1]:
dna = "atcg"
print(dna)

atcg


Let's first generate an upper case version of the string:

In [2]:
dna_caps = dna.upper()
print(dna_caps)

ATCG


We can reverse a sequence by slicing.
The format for slicing is: [x:y:z] where x represents the start index, y represents the stop index (this index itself is not included) and z represents the step size. Try to invert the dna string using slicing and store the result in a new variable named dna_rev.
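
As a minimal illustration of the [x:y:z] notation, the sketch below slices an arbitrary example string. The variable name `word` and its value are made up for this illustration and are not part of the exercise. It shows a negative step with explicit bounds; inverting a whole string is left for you to work out.

```python
# 'word' is a made-up example string, not part of the exercise
word = "python"

first_three = word[0:3]    # start at index 0, stop before index 3
every_second = word[::2]   # omitted start and stop mean "the whole string", step size 2
backwards_part = word[4:1:-1]  # negative step: walk from index 4 down to (not including) index 1

print(first_three)     # pyt
print(every_second)    # pto
print(backwards_part)  # oht
```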

In order to generate a complementary sequence, we need a dictionary. Dictionaries store key:value pairs. Using a dictionary, you will be able to select the complementary base:

In [3]:
bases = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}

Try to output the base 'A' from the dictionary named bases using the correct key in IDLE.  
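
As a hint, here is a small sketch of how a dictionary lookup works, using a different base than the one asked for above. The variable name `partner_of_g` is only chosen for this illustration.

```python
bases = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# Square brackets with a key return the value stored under that key
partner_of_g = bases['G']
print(partner_of_g)  # C

# The 'in' operator checks whether a key is present in the dictionary
print('N' in bases)  # False
```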

In order to generate the complementary DNA string, you will need to loop through the original DNA sequence. Because for loops are not covered yet, we will help you a bit more.
Have a close look at the following code:

In [4]:
dna = 'ATCG'
for base in dna:
    print(base)

A
T
C
G


As you can see, the for loop loops through the string. On each iteration, the placeholder 'base' is overwritten with the next base in dna.

Now we will loop through the dna sequence and add each base to a new string. Before we do so, we need to define an empty string:

In [5]:
dna = 'ATCG'
new_dna = '' # generate empty string
for base in dna:
    new_dna += base #this is a shorthand of new_dna = new_dna + base
    print(new_dna)

A
AT
ATC
ATCG


As you can see, the print call is inside the for loop, so you can watch the string concatenation in progress: the string grows with each iteration.
If we put the print call after the loop, we only see the end result:

In [6]:
dna = 'ATCG'
new_dna = ''
for base in dna:
    new_dna += base #this is a shorthand of new_dna = new_dna + base
print(new_dna)

ATCG


This seems to work fine, but concatenating strings in a loop is inefficient. Since strings are immutable, Python generally has to create a new string object on each iteration. It is therefore better to collect the bases in a list. Lists are mutable, so items can be appended to the same object in memory. This is the equivalent code using a list to store the bases:

In [7]:
dna = 'ATCG'
new_dna = [] #generate empty list
for base in dna:
    new_dna.append(base) #this will add the base to the end of the list
print(new_dna)

['A', 'T', 'C', 'G']
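
The difference between immutable strings and mutable lists can be seen in a minimal sketch like the one below. The names `sequence` and `sequence_list` are made up for this illustration, and the try/except construct is only used here to show the error without stopping the program.

```python
sequence = "ATCG"              # a string: immutable
sequence_list = list("ATCG")   # a list with the same characters: mutable

# Changing a list item in place works
sequence_list[0] = "G"
print(sequence_list)  # ['G', 'T', 'C', 'G']

# Changing a character of a string in place raises a TypeError
try:
    sequence[0] = "G"
except TypeError as error:
    print("Strings cannot be changed in place:", error)

print(sequence)  # ATCG (unchanged)
```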


To convert a list to a string, we can use the str.join() method. This method accepts a list of strings as argument and "stringifies" the list. The string on which the method is called is placed between the items.

In [8]:
dna = ['A', 'T', 'C', 'G']
dna_string = "".join(dna) #This line stringifies the list.
print(dna_string)

ATCG
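
To show the role of the string in front of .join(), here is a short sketch that joins the same list with two different separators. The variable names `dashed` and `plain` are only chosen for this illustration.

```python
dna = ['A', 'T', 'C', 'G']

dashed = "-".join(dna)  # the separator "-" is placed between the items
plain = "".join(dna)    # an empty separator glues the items together directly

print(dashed)  # A-T-C-G
print(plain)   # ATCG
```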


## Exercise: DNA converter

Now we come to the final exercise:
Code a program that reads a DNA sequence from the command line. Remember that you can read command line arguments using the sys module and its sys.argv attribute. This yields a list with the command line arguments:

In [9]:
import sys
args = sys.argv #this will provide you with a list of arguments. Use indexing to select the correct item.
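
As a hedged sketch of how indexing into sys.argv could look: the first item of the list is the script name, and the arguments typed after it start at index 1. The helper name `first_argument` and the script name `show_args.py` are hypothetical and only used for this illustration; this is one possible approach, not the course solution.

```python
import sys

def first_argument(argv):
    """Return the first command line argument, or None if there is none."""
    # argv[0] is the script name, so user arguments start at index 1
    if len(argv) > 1:
        return argv[1]
    return None

print(first_argument(sys.argv))
```

If this were saved as show_args.py and started with `python3 show_args.py atcg`, sys.argv would be the list ['show_args.py', 'atcg'].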

Now write code to print the following to the screen:
- The original sequence in upper case
- The reverse string in upper case
- The complement string in upper case
- The reverse-complement string in upper case.

## Solutions

<p><a href="L4_solutions/dna_convert_solution.py">dna_converter_solution.py</a></p>

