# Lab 8 : Functions and Modules

## Learning Objectives

* Functions
* Modules

## 8.1 Functions

In this session we'll concentrate on our last fundamental programming concept for the course. So far, we've written all of our program logic in the main body of our scripts, and we've seen how built-in Python functions like len() and input() operate on variables and their values. In this session, we'll learn how to write functions of our own and how to properly document them for ourselves and other users.

If you properly leverage a well-designed function, writing the main logic of your programs becomes almost-too-easy. Instead of writing out meticulous logical statements and loops for every task, you just call forth your previously-crafted logic, which you've vested in well-made functions. These functions are called user-defined functions.

### Defining a Function

Here are simple rules to define a function in Python.

* Function blocks begin with the keyword def followed by the function name and parentheses ().
* Any input parameters should be listed within these parentheses.
* The first statement of the function body can optionally be a documentation string, or docstring, that describes what the function does.
* The def line ends with a colon (:), and the code block within the function is indented.
* The statement return [expression] exits a function, optionally passing back an expression to the caller. A return statement with no arguments is the same as return None.
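As an illustration of these rules, the sketch below defines a hypothetical function gc_content() (it is not part of the original lab). It has a parameter, a triple-quoted docstring (the usual convention) and a return statement. Python stores the docstring in the function's __doc__ attribute, and help() displays that same text. The sketch assumes Python 3, where / always performs true division.

```python
def gc_content(DNA):
    """Return the fraction of bases in DNA that are G or C."""
    # count G and C separately; count() looks for an exact substring
    # assumes DNA is not empty (an empty string would divide by zero)
    gc = DNA.count('G') + DNA.count('C')
    return gc / len(DNA)

print(gc_content('ATGC'))    # prints 0.5
print(gc_content.__doc__)    # prints the docstring
```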

### Syntax of a Function

The basic syntax is 

<pre>
def functionname( parameters ):
   "function_docstring"
   function_suite
   return [expression]
</pre>

<pre>
#!/usr/bin/env python

# Example 8.1
# Name: function_example.py
# Description: A program to demonstrate how a function works

def DNA_length(DNA):
    "A function that calculates the length of DNA"
    length = len(DNA)
    return length

# use the function
gene = 'ATGAGACGTAGTGCCAGTAGCGCGATGTAGCGATGACGCATGACGCG'
print(DNA_length(gene))
</pre>

Output:

<pre>
47
</pre>


To define a function, you use the keyword def. Then comes the function name, in this case DNA_length, followed by parentheses containing any input arguments the function needs. Here DNA_length() takes a single argument called DNA. After that, the function does its work by executing the indented block of code immediately below; in this case, it calculates the length of DNA. Finally, it returns that length to the rest of the program.

Technically speaking, a function does not need to explicitly return anything, although you will rarely write one that doesn't. If you don't return something explicitly, Python returns the special object None. None is logically false (in if statements), and although it is not the empty string, the interactive interpreter displays nothing when an expression evaluates to None. Calling print(None), however, prints the word None. Forgetting to return a value is an easy mistake, so it is a good first thing to check when your functions don't work as expected.
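A minimal sketch of the missing-return mistake, using a hypothetical variant of DNA_length() that calculates the length but never returns it:

```python
def DNA_length_no_return(DNA):
    "Calculates the length of DNA but forgets to return it"
    length = len(DNA)
    # no return statement, so the function returns None

result = DNA_length_no_return('ATGC')
print(result)            # prints None
print(result is None)    # prints True
if not result:
    print('None is logically false')
```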

Note that the variable names are different on the inside and the outside of the function: I give it gene, although it takes DNA, and it returns length which we printed. I did this on purpose, as I want to emphasize that the function only knows to expect something, which it internally refers to as DNA, and then to give something else back. In fact, there is some insulation against the outside world, as you can see in this example:

<pre>
#!/usr/bin/env python

# Example 8.2
# Name: function_example2.py
# Description: A program to demonstrate how a function works

def DNA_length(DNA):
    "A function that calculates the length of DNA"
    length = len(DNA)
    print("DNA length inside the function is %s" % (length))
    return length

# use the function
gene = 'ATGAGACGTAGTGCCAGTAGCGCGATGTAGCGATGACGCATGACGCG'
DNA_length(gene)

# length only exists inside DNA_length(), so this line fails
print("DNA length outside the function is %s" % (length))
</pre>

Output:

<pre>
DNA length inside the function is 47
NameError: name 'length' is not defined
</pre>

### Scope of Variables

Variables created inside a function occupy their own namespace, separate from variables outside the function, so you can reuse a name inside a function without worrying about clashing with the same name elsewhere. Python has two basic scopes for variables: global and local. Variables assigned inside a function body have local scope, and those defined outside any function have global scope. Local variables can be accessed only inside the function in which they are assigned, whereas global variables can be read anywhere in the program, including inside functions.

In the example above, length is never defined outside the function, so printing it outside the function raises an error, while the print statement inside the function works fine. This insulation means you can use functions written by other people without keeping track of which variable names they use internally.


<pre>
#!/usr/bin/env python

# Example 8.3
# Name: function_example3.py
# Description: A program to demonstrate how a function works

length = 10

def DNA_length(DNA):
    "A function that calculates the length of DNA"
    length = len(DNA)
    print("DNA length inside the function is %s" % (length))
    return length

# use the function
gene = 'ATGAGACGTAGTGCCAGTAGCGCGATGTAGCGATGACGCATGACGCG'
DNA_length(gene)

print('DNA length outside the function is %s' % (length))
</pre>

Output:

<pre>
DNA length inside the function is 47
DNA length outside the function is 10
</pre>


However, if we do not assign a value to length inside the function, Python looks the name up in the global scope, so the function uses the global value:

<pre>
#!/usr/bin/env python

# Example 8.4
# Name: function_example4.py
# Description: A program to demonstrate how a function works

length = 10

def DNA_length(DNA):
    "A function that calculates the length of DNA"
    # the assignment is commented out, so length below refers to the global length
#    length = len(DNA)
    print("DNA length inside the function is %s" % (length))
    return length

# use the function
gene = 'ATGAGACGTAGTGCCAGTAGCGCGATGTAGCGATGACGCATGACGCG'
DNA_length(gene)

print("DNA length outside the function is %s" % (length))
</pre>

Output:

<pre>
DNA length inside the function is 10
DNA length outside the function is 10
</pre>
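Example 8.4 shows that a function can read a global variable. Assigning to a name inside a function, however, creates a new local variable and leaves the global one untouched. If you really do need to change a global variable from inside a function, Python's global statement declares that the name refers to the global variable. The sketch below uses a hypothetical counter, sequences_seen. In most programs, passing values in as arguments and returning results is easier to follow than modifying globals.

```python
sequences_seen = 0

def count_without_global():
    # this assignment creates a new local variable called sequences_seen
    sequences_seen = 1
    return sequences_seen

def count_with_global():
    # the global statement makes the assignment change the global variable
    global sequences_seen
    sequences_seen += 1
    return sequences_seen

count_without_global()
print(sequences_seen)   # prints 0: the global was not changed
count_with_global()
print(sequences_seen)   # prints 1: the global was changed
```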


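A function must be defined before it is called. Python runs a script from top to bottom, and the def statement is what creates the function. The following example raises an error because DNA_length() is called before its def statement has been executed.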

<pre>
#!/usr/bin/env python

# Example 8.5
# Name: function_example5.py
# Description: A program to demonstrate how a function works

# use the function before it has been defined
gene = 'ATGAGACGTAGTGCCAGTAGCGCGATGTAGCGATGACGCATGACGCG'
print("DNA length is %s" % (DNA_length(gene)))

def DNA_length(DNA):
    "A function that calculates the length of DNA"
    length = len(DNA)
    return length
</pre>

Output:

<pre>
NameError: name 'DNA_length' is not defined
</pre>
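What matters is the order in which statements run, not only where they appear in the file. A function body may call another function that is defined further down, as long as both def statements have been executed before the first function is actually called. In the sketch below, report() is a hypothetical helper; the name DNA_length inside it is looked up only when report() runs.

```python
def report(DNA):
    # DNA_length is looked up when report() runs, not when report is defined
    message = "DNA length is %s" % (DNA_length(DNA))
    print(message)
    return message

def DNA_length(DNA):
    "A function that calculates the length of DNA"
    return len(DNA)

# both def statements have run by now, so this call works
report('ATGC')   # prints DNA length is 4
```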


### Function Arguments

Last week we entered multiple arguments on the command line when we ran a program. Functions can also take multiple arguments and return multiple values.


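<pre>
#!/usr/bin/env python

# Example 8.6
# Name: function_sum.py
# Description: A program that takes 2 inputs for a function

# named add rather than sum so that Python's built-in sum() is not hidden
def add(x,y):
    z = x + y
    return z

print(add(2,5))
print(add(7,10))
</pre>

Output:

<pre>
7
17
</pre>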


Required arguments are arguments that must be passed to a function in the correct positional order. The number of arguments in the function call must match the number of parameters in the function definition, so the following example raises an error:

<pre>
#!/usr/bin/env python

# Example 8.7
# Name: function_sum2.py
# Description: A program that takes 2 inputs for a function

def add(x,y):
    z = x + y
    return z

# only one argument is passed, but add() requires two
print(add(5))
</pre>

Output:

<pre>
TypeError: add() missing 1 required positional argument: 'y'
</pre>
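One way to make an argument optional is to give it a default value in the def line. Python then uses the default whenever the caller leaves that argument out. In the sketch below the default of 0 is an arbitrary choice for illustration. Parameters with defaults must come after those without defaults.

```python
def add(x, y=0):
    # y defaults to 0 if the caller passes only one argument
    z = x + y
    return z

print(add(5))      # prints 5
print(add(5, 2))   # prints 7
```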

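Order also matters, because arguments are matched to parameters by position, not by variable name. In the example below the variables are passed in the opposite order: subtract(y,x) assigns the value of y (2) to the parameter x and the value of x (5) to the parameter y, so the function returns 2 - 5.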

<pre>
#!/usr/bin/env python

# Example 8.8
# Name: function_subtract.py
# Description: A program that takes 2 inputs for a function

def subtract(x,y):
    z = x - y
    return z

x = 5
y = 2
print(subtract(y,x))
</pre>

Output:

<pre>
-3
</pre>
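Python also lets you pass arguments by name, as keyword arguments. Each value is then matched to the parameter with that name, so the order in the call no longer matters. A short sketch using the same subtract() function:

```python
def subtract(x, y):
    z = x - y
    return z

x = 5
y = 2
print(subtract(y, x))       # positional: prints -3
print(subtract(y=y, x=x))   # keyword: prints 3, despite the order
```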


### How to pack and unpack multiple values from returns or multiple assignments

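Just as with arguments from the command line, multiple values returned from a function can be accessed individually. When a function returns several values together, Python packs them into a single tuple, and you can index that tuple to get each value.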

<pre>
#!/usr/bin/env python

# Example 8.9
# Name: function_protein_stats.py
# Description: A program with a function that returns 2 values

def protein_stats(protein):
    "A function to calculate protein statistics"
    protein_length = len(protein)
    # count each positively charged residue separately;
    # protein.count('R'+'H'+'K') would count only the substring 'RHK'
    sum_pos = protein.count('R') + protein.count('H') + protein.count('K')
    # count each negatively charged residue separately
    sum_neg = protein.count('D') + protein.count('E')
    charge = sum_pos - sum_neg
    return (protein_length, charge)

gene = 'MKSLIQEKWNEILEFLKIEYNVTEVSYKTWLLPLKVYDVKDNVIKLSVDDTKIGANSLDFIKNKYSQFLK'
print(protein_stats(gene))

protein_results = protein_stats(gene)

print(protein_results)
print(protein_results[0])
print(protein_results[1])
</pre>

Output:

<pre>
(70, 1)
(70, 1)
70
1
</pre>


We can also return the values as a list, with very similar results. In the example above the results are returned as a tuple, which is like a list except that it cannot be modified. In the example below the values are returned as a list, so we can use list methods such as append() and sort() on the result.

<pre>
#!/usr/bin/env python

# Example 8.10
# Name: function_protein_stats_list.py
# Description: A program with a function that returns 2 values as a list

def protein_stats(protein):
    "A function to calculate protein statistics"
    protein_length = len(protein)
    # count each charged residue separately rather than a joined substring
    sum_pos = protein.count('R') + protein.count('H') + protein.count('K')
    sum_neg = protein.count('D') + protein.count('E')
    charge = sum_pos - sum_neg
    return [protein_length, charge] # The square brackets specify that we are returning a list

gene = 'MKSLIQEKWNEILEFLKIEYNVTEVSYKTWLLPLKVYDVKDNVIKLSVDDTKIGANSLDFIKNKYSQFLK'

protein_results = protein_stats(gene)

print(protein_results)
print(protein_results[0])
print(protein_results[1])
</pre>

Output:

<pre>
[70, 1]
70
1
</pre>
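Instead of indexing the returned tuple or list, you can unpack it directly into separate variables in a single assignment. The same packing and unpacking works for any multiple assignment. The sketch below repeats protein_stats() so that it runs on its own, counting each charged residue with a separate count() call; the short peptide 'MKHRDE' in the test is made up for illustration.

```python
def protein_stats(protein):
    "A function to calculate protein length and net charge"
    protein_length = len(protein)
    sum_pos = protein.count('R') + protein.count('H') + protein.count('K')
    sum_neg = protein.count('D') + protein.count('E')
    charge = sum_pos - sum_neg
    return protein_length, charge   # packed into a tuple

gene = 'MKSLIQEKWNEILEFLKIEYNVTEVSYKTWLLPLKVYDVKDNVIKLSVDDTKIGANSLDFIKNKYSQFLK'

# unpack the returned tuple into two variables in one assignment
length, charge = protein_stats(gene)
print(length)   # prints 70
print(charge)   # prints 1

# multiple assignment: the right side is packed, then unpacked on the left
x, y = 5, 2
print(x, y)     # prints 5 2
```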


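Here is another example of returning a list. The function below translates a DNA sequence in each of the three forward reading frames and returns the three protein sequences as a list.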

<pre>
#!/usr/bin/env python

# Example 8.11
# Name: function_translate_DNA_3frames.py
# Description: A program with a function that translates DNA in the first 3 frames

def translate_3frames(DNA):
    "A function to translate DNA into a protein sequence in 3 frames"

    codon_table = {'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

    # Initialize the list that will hold one protein sequence per frame
    list_protein = []

    # Get the length of the DNA to use in the loop
    length = len(DNA)

    # frame is the starting position of the first codon: 0, 1 or 2
    for frame in range(3):

        # Initialize the amino acid string for this frame
        protein = ''

        # A loop to get each codon and move the DNA position forward by 3
        x = frame
        while x + 3 <= length:
            codon = DNA[x:x+3]
            AA = codon_table[codon]
            protein += AA
            x += 3
        list_protein.append(protein)
    return list_protein

gene = 'ATGAGTTGTAATGAGGCTGCCGTGATACGATTACGGCATCATTTAAAGGGCAGGAGGTAG'
print(translate_3frames(gene))
</pre>

Output:

<pre>
['MSCNEAAVIRLRHHLKGRR_', '_VVMRLP_YDYGII_RAGG', 'EL__GCRDTITASFKGQEV']
</pre>
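In translate_3frames(), the line AA = codon_table[codon] raises a KeyError if the sequence contains anything other than uppercase A, C, G and T, for example an N for an unknown base. One way to handle this, which also shows how a long function can be split into smaller ones, is a hypothetical helper translate_codon() that uses the dictionary method get() with a default value. The 'X' placeholder and the three-codon table below are assumptions for illustration only; in practice you would pass in the full codon table.

```python
def translate_codon(codon, codon_table, unknown='X'):
    "Translate one codon, returning unknown if it is not in codon_table"
    # upper() lets lowercase sequences match the uppercase table keys;
    # get() returns unknown instead of raising a KeyError
    return codon_table.get(codon.upper(), unknown)

# a small fragment of the codon table, for illustration only
codon_table = {'ATG': 'M', 'TGG': 'W', 'TAA': '_'}

print(translate_codon('ATG', codon_table))   # prints M
print(translate_codon('atg', codon_table))   # prints M
print(translate_codon('ANG', codon_table))   # prints X
```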


## 8.2 Modules

In all of the examples above, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without tiresomely writing them all out again. But wouldn't it be nice to share functions across programs, too? For example, working with genomic data often means getting sequences out of FASTA files, and shuttling those sequences from program to program. Many of the programs we work with overlap to a significant degree, as they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, for example -- all of which means that many of them share functions. And if the same function exists in two or more different programs, we hit the same problems that we hit before: complex debugging, decreased readability, and, of course, too much typing.

Modules solve these problems. In short, they're collections of functions and variables (and often objects, which we'll get to towards the end of the course) that are kept together in a single file that can be imported by any number of programs. We have already started working with modules such as re and sys, which are part of Python's standard library. We can also create our own modules.

Any file of Python code with a .py extension can be imported as a module from your script. The first time a program imports a module, all the statements in that module are executed. You have to be careful with your file names, though. For instance, in Lab 4 we created a program called random.py and then imported random. Our program loaded our own random.py rather than the random module that comes with Python, because Python searches the directory containing your script before it searches the standard library.
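If you suspect that one of your own files is shadowing a standard module, you can check which file Python actually loaded. A minimal sketch: a module's __file__ attribute holds the path it was imported from, and sys.path lists the directories Python searches, in order. The printed paths depend on your system, so no expected output is shown.

```python
import random
import sys

# the path of the file the random module was loaded from;
# if this points to your own folder, your random.py is shadowing Python's
print(random.__file__)

# the first directory Python searches for modules,
# usually the directory containing the running script
print(sys.path[0])
```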

<pre>
# My simple math module

def add(x,y):
    z = x + y
    return z

def subtract(x,y):
    z = x - y
    return z
</pre>

Save the text above as my_simple_math_module.py, in the same directory as the program below, so that the program can import it.

<pre>
#!/usr/bin/env python

# Example 8.12
# Name: test_my_simple_math_module.py
# Description: A program that tests our modules

import my_simple_math_module

print(my_simple_math_module.add(10,5))

print(my_simple_math_module.subtract(10,5))
</pre>

Output:

<pre>
15
5
</pre>
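Because importing a module executes all of its top-level statements, any test code placed at the bottom of my_simple_math_module.py would also run every time another program imports it. A common Python idiom guards such code with if __name__ == '__main__':, which is true only when the file is run directly, not when it is imported. A sketch of the module with a guarded self-test (the test lines are an illustration, not part of the original lab):

```python
# My simple math module

def add(x, y):
    z = x + y
    return z

def subtract(x, y):
    z = x - y
    return z

# runs only when this file is executed directly, not when it is imported
if __name__ == '__main__':
    print(add(10, 5))        # prints 15
    print(subtract(10, 5))   # prints 5
```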


The syntax can be cleaned up somewhat by importing a specific function from the module, so that it can be called without the module name in front of it:

<pre>
#!/usr/bin/env python

# Example 8.13
# Name: test_my_simple_math_module2.py
# Description: A program that tests our modules

from my_simple_math_module import add

print(add(10,5))
</pre>

Output:

<pre>
15
</pre>
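You can import several names at once by separating them with commas. The sketch below uses Python's standard math module so that it runs without my_simple_math_module.py; with that file saved, the line from my_simple_math_module import add, subtract should work the same way.

```python
# import two specific names from the standard library math module
from math import sqrt, pi

print(sqrt(16))   # prints 4.0
print(pi)         # prints 3.141592653589793
```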


## Exercises

Use the following DNA sequence in exercises 1-3.

ATGAGTTGTAATGAGGCTGCCGTGATACGATTACGGCATCATTTAAAGGGCAGGAGGTAG

1. Write a program with a function codon_number that divides the length of a gene by 3 and returns the value as an integer to the main program for printing. Save this function in a module and call the function from the saved module.
2. Write a program with a function revcomp that takes as input a DNA sequence and returns the reverse complement (as DNA).
3. Write a program using functions that translates the above DNA sequence in all six reading frames (three on the given strand and three on its reverse complement), stores the results in a dictionary, and then prints the dictionary.


* Next - <a href="http://nbviewer.ipython.org/github/jeffreyblanchard/EvoGenV5/blob/master/EvoGen5_Lab7.ipynb">Lab 7 : Dictionaries</a>
* Previous - <a href="http://nbviewer.ipython.org/github/jeffreyblanchard/EvoGenV5/blob/master/EvoGenV5_Lab9.ipynb">Lab 9 : Regular Expressions</a> 