# Bioinformatics exercises for Python (Informatics I)
##### Organizing a script

## Introduction

Author: Jurre Hageman <br>
Date: 2017-10-10

This lesson covers how to organize a Python script.

For this lesson we will write a simple program that cuts DNA with a restriction enzyme. Restriction enzymes are proteins that cleave DNA at very specific sites. The restriction enzyme EcoRI cuts the sequence G|AATTC: it searches for the sequence GAATTC and cuts between the G and the A.

Some examples of restriction enzymes and their recognition sites:
<br>
<table style="width:50%">
  <tr>
    <th>Enzyme</th>
    <th>Sequence</th> 
  </tr>
  <tr>
    <td>EcoRI</td>
    <td>G|AATTC</td> 
  </tr>
  <tr>
    <td>BamHI</td>
    <td>G|GATCC</td> 
  </tr>
  <tr>
    <td>HindIII</td>
    <td>A|AGCTT</td> 
  </tr>
</table>
<br>
It is your task to write a program that checks whether a DNA fragment contains a restriction site. 
Also write a function that returns the fragment sizes. 
For instance:
The sequence CCCGAATTCTTA has an EcoRI site.
The fragment lengths after cutting with EcoRI are 4 and 8 bp.

Important note: for simplicity, we will not deal with multiple occurrences of the same site.
The sequence CCCGAATTCTTAGAATTCGGA has two EcoRI sites. In reality, the enzyme would cut this DNA into 3 fragments. We will only deal with the first occurrence of the site.
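
The arithmetic behind these fragment lengths can be sketched with `str.find`, which returns the index of the first occurrence only (or -1 when the site is absent). The names `site`, `cut_offset` and `results` are assumptions made for this illustration; they are not part of the exercise template.

```python
# EcoRI recognizes GAATTC and cuts after the first base (G|AATTC).
site = "GAATTC"
cut_offset = 1  # number of bases of the site that stay on the left fragment

results = []
for sequence in ("CCCGAATTCTTA", "CCCGAATTCTTAGAATTCGGA"):
    site_start = sequence.find(site)        # index of the FIRST occurrence
    cut_position = site_start + cut_offset  # number of bases left of the cut
    left_length = cut_position
    right_length = len(sequence) - cut_position
    results.append((left_length, right_length))
    print(sequence, left_length, right_length)
```

For the second sequence only the first site is used, so the right-hand fragment still contains the second GAATTC.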

Let's (again) first make a list of what our program should do:
- it should accept a DNA sequence and a restriction enzyme as command-line arguments.
- it should contain a function has_restriction_site that returns a boolean.
- it should contain a function that returns a tuple with the fragment lengths.
- it should contain a main function that accepts the command-line arguments and calls the other two functions.
- it should contain a pretty_print function that nicely prints the results.

## Organization of a script

In [1]:
#!/usr/bin/env python3
#template for script:

#imports
import sys

#global variables


def function_1(params):
    #Describe what this function will do
    pass


def function_2(params):
    #Describe what this function will do
    pass


def pretty_print(params):
    #prints the results to the terminal
    pass


def main():
    #main function:
    #catch command line arguments
    args = sys.argv
    #check if a sequence is given
    if len(args) < 2:
        print("please provide a sequence")
        print("Program stopping...")
        sys.exit()
    input_sequence = args[1]
    
    #call functions
    function_1(args)
    function_2(args)
    pretty_print(args)
    return
    
#call the main function
main()

In [2]:
main()

def main():
    print("main running")

Running the above script raises the error: 
NameError: name 'main' is not defined.
In Python, functions are not hoisted.
The interpreter executes a script from top to bottom, and a def statement only ties the function to its name at the moment that statement is executed. When main() is called on the first line, the def below it has not run yet, so the name main does not exist.
Therefore you need to call the function AFTER you have defined it:

In [3]:
def main():
    print("main running")

main()

main running
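
The same ordering rule can be observed without crashing the script by catching the error. This is a minimal sketch; the function name `greet` and the variables `called_early` and `called_late` are hypothetical and only serve the illustration.

```python
# Calling a function before its def statement has run fails with a NameError.
try:
    greet()
    called_early = True
except NameError:
    called_early = False  # the name 'greet' does not exist yet


def greet():
    return "hello"


# After the def statement has been executed, the call works.
called_late = greet()
print(called_early, called_late)
```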


Because a function name is only looked up at the moment the call is actually executed, the code below is valid:

In [4]:
def function_1():
    print("running function_1")
    function_2()


def function_2():
    print("running function_2")


def main():
    print("main running")
    function_1()


main()

main running
running function_1
running function_2


Note that function_2 is called in function_1, although function_2 is defined after function_1. This works because main() is only called after all three definitions have been executed. 
To simplify:
- Remember to call functions AFTER the function definition.
- Within functions, you can call any other function, even if the definition of that function is below the function call.
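
The second rule only holds if the called function has been defined by the time the call actually runs, which is why the call to main() belongs at the very bottom of the script. A minimal sketch, using the hypothetical function names `first` and `second`:

```python
def first():
    # 'second' is looked up when first() runs, not when first() is defined.
    return second()


# At this point 'second' has not been defined yet, so the call fails.
try:
    first()
    result_before = "ok"
except NameError:
    result_before = "NameError"


def second():
    return "second ran"


# Now both definitions have been executed and the same call succeeds.
result_after = first()
print(result_before, result_after)
```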

## Exercise: Build a virtual DNA cutter

Now we come to the final exercise: <br>
Let's recall what the program should do: <br>
- it should accept a DNA sequence and a restriction enzyme (ecor1, bamh1 or hind3) as command-line arguments.
- it should contain a function has_restriction_site that returns a boolean.
- it should contain a function get_fragments that returns a tuple with the fragment lengths.
- it should contain a main function that accepts the command-line arguments and calls the other two functions.
- it should contain a pretty_print function that nicely prints the results.
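
One possible way to fill the global variables section is a dictionary that maps the command-line names to a recognition site and a cut offset. This is only a sketch of one design: the names `RESTRICTION_ENZYMES` and `lookup_enzyme` are assumptions and do not appear in the template.

```python
# Hypothetical lookup table: command-line name -> (recognition site, cut offset).
# The cut offset is the number of bases of the site left of the cut.
RESTRICTION_ENZYMES = {
    "ecor1": ("GAATTC", 1),  # G|AATTC
    "bamh1": ("GGATCC", 1),  # G|GATCC
    "hind3": ("AAGCTT", 1),  # A|AGCTT
}


def lookup_enzyme(name):
    """Return (site, cut_offset) for an enzyme name, or None if it is unknown."""
    return RESTRICTION_ENZYMES.get(name.lower())


print(lookup_enzyme("EcoR1"))
print(lookup_enzyme("xho1"))
```

Lowercasing the name makes the lookup tolerant of input such as ECOR1, and returning None for an unknown name lets main print a clear message instead of crashing.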

Use the template below to write the script.

In [5]:
#!/usr/bin/env python3
#template for script:

#imports
import sys

#global variables


def has_restriction_site(params):
    #Describe what this function will do
    pass


def get_fragments(params):
    #Describe what this function will do
    pass


def pretty_print(params):
    #prints the results to the terminal
    pass


def main():
    #main function:
    #catch command line arguments
    args = sys.argv
    #check if a sequence and an enzyme are given
    if len(args) < 3:
        print("please provide a sequence followed by an enzyme (bamh1, ecor1, hind3)")
        print("Program stopping...")
        sys.exit()
    input_sequence = args[1]
    enzyme = args[2]
    
    #call functions
    site_present = has_restriction_site(args)
    fragments = None
    if site_present:
        fragments = get_fragments(args)
    pretty_print(args)
    return
    
#call the main function
main()

Example output:<br>
Sequence: CCCCGAATTCAGGAGAGAG <br>
Enzyme ECOR1 creates fragments of: <br>
5 bp <br>
14 bp <br>

Example output when no site was found (for instance, when the site of the chosen enzyme does not occur in the sequence): <br>
Sequence: CCCCGAATTCAGGAGAGAG <br>
No cut site found <br>
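
The two outputs above could be produced by building the report as a string first and printing it afterwards, which also makes the formatting easy to check. A minimal sketch, assuming a hypothetical helper `format_report` that receives the sequence, the enzyme name and either a tuple of fragment lengths or None:

```python
def format_report(sequence, enzyme, fragments):
    """Build the report text; fragments is a tuple of lengths, or None."""
    lines = ["Sequence: " + sequence]
    if fragments is None:
        lines.append("No cut site found")
    else:
        lines.append("Enzyme " + enzyme.upper() + " creates fragments of:")
        for length in fragments:
            lines.append(str(length) + " bp")
    return "\n".join(lines)


report = format_report("CCCCGAATTCAGGAGAGAG", "ecor1", (5, 14))
print(report)
```

A pretty_print function could then simply call print on the returned text.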

## Solutions

Needs to be updated!

<p><a href="L2_solutions/excercise01.py">excercise01.py</a></p>

