## Notebook 2.2: Python functions

This notebook corresponds to chapter 4 of the official Python tutorial: https://docs.python.org/3/tutorial/.

### Learning objectives: 

By the end of this exercise you should:

1. Be able to write conditional Python clauses.
2. Become familiar with Python functions. 
3. Understand the use of indentation in structuring Python code.

### What are functions?
A function is used to perform a task based on a particular input. Functions are the bread and butter of any programming language. We have already used many functions that are built into the objects we have interacted with. For example, we saw that `string` objects have functions to capitalize letters, add spacing, or query their length. Similarly, `list` objects have functions to search for elements or to sort them.

The next step in our journey is to begin writing our own functions. This is only an introduction, as we will continue over time to learn many new ways to write more advanced functions.  

### The basic structure of a function
In Python, functions are defined using the keyword `def`. Optionally, we can have the function return a result by ending it with a `return` statement. This is not required, but it is usually desirable if we want to assign the result of the function to a variable.

In [None]:
## a simple function to add 100 to the input object
def myfunc(x):
    return x + 100

In [None]:
## let's run our function on an integer
myfunc(200)
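
To see why `return` matters, here is a minimal sketch comparing a function that returns its result with one that only prints it. The names `add_100_return` and `add_100_print` are hypothetical and used only for this illustration.

```python
# hypothetical function that returns its result
def add_100_return(x):
    return x + 100

# hypothetical function that prints its result but has no return statement
def add_100_print(x):
    print(x + 100)

# a returned value can be stored in a variable
result1 = add_100_return(200)

# a function without a return statement gives back None
result2 = add_100_print(200)

print(result1)
print(result2)
```

The second function still shows 300 on the screen, but the variable `result2` holds `None`, so the value cannot be reused later in the code.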

### More structure: doc string
So the basic elements of a function include an input variable and a return value. The next important thing is to add some documentation to our function. This explains what the function is for and lets other users know how to use it. A documentation string, or docstring, should be entered as a string on the first line of the function body, directly below the `def` line.

In [None]:
def myfunc2(x):
    "This function adds 100 to an int or float and returns the result"
    return x + 100

In [None]:
myfunc2(300.3)
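
Once a function has a docstring, other users can read it without opening the source code. The sketch below repeats the definition of `myfunc2` so that it runs on its own, and then shows two standard ways to view the docstring.

```python
def myfunc2(x):
    "This function adds 100 to an int or float and returns the result"
    return x + 100

# the docstring is stored on the function object itself
print(myfunc2.__doc__)

# help() prints the function signature together with its docstring
help(myfunc2)
```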

### Multiple inputs
Of course we often want to write functions that take multiple inputs. This is easy. 

In [None]:
def sumfunc1(arg1, arg2):
    "returns the sum of two input args"
    return arg1 + arg2

In [None]:
sumfunc1(10, 20)
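
Arguments can be matched to a function's inputs either by position or by name. The sketch below repeats the definition of `sumfunc1` so that it runs on its own; the variable names `by_position`, `by_name`, and `joined` are chosen only for this illustration.

```python
def sumfunc1(arg1, arg2):
    "returns the sum of two input args"
    return arg1 + arg2

# arguments matched by position
by_position = sumfunc1(10, 20)

# arguments matched by name; the order no longer matters
by_name = sumfunc1(arg2=20, arg1=10)

# the + operator also works on other types, such as strings
joined = sumfunc1("AC", "GT")

print(by_position, by_name, joined)
```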

### Writing a useful function
Let's write a function that will calculate the frequency of each base in a DNA string or genome. In addition to the docstring of a function, which is intended for the user to see, you can also add comments to the function code to remind yourself what each element of the code is doing. You can find many comments describing the detailed action of the function below.

In [None]:
def base_frequency(string):
    "returns the frequency of A, C, G, and T as a list"
    
    # create an empty list to store results
    freqs = []
    
    # get the total length of the input string
    slen = len(string)
    
    # iterate over each letter in A,C,G,T
    for base in "ACGT":
        
        # count the letter's occurrence in the input string
        # divided by the total length of the input string
        frequency = string.count(base) / slen
        
        # store the measured frequency in the result list
        freqs.append(frequency)
        
    # return the result list
    return freqs

In [None]:
# test the function
base_frequency("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")

### Many ways to accomplish the same task
The task above can be accomplished in many possible ways. There is more than one way to count the frequency of an element in a string or list. Among the many ways to accomplish a task some might be faster than others, but a good rule of thumb is to make your code as readable and comprehensible as possible. This is the best way to avoid mistakes.

Below is an alternative implementation of our `base_frequency()` function, which I name `base_frequency2()`. It returns the same result, though the code runs in a slightly different way.

In [None]:
def base_frequency2(string):
    "returns the frequency of A, C, G, and T in order"
    slen = len(string)
    freqA = string.count("A") / slen
    freqC = string.count("C") / slen
    freqG = string.count("G") / slen
    freqT = string.count("T") / slen
    return [freqA, freqC, freqG, freqT]

In [None]:
# test the function
base_frequency2("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
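
One way to convince yourself that two implementations really do the same thing is to run both on the same input and compare the results. The sketch below repeats both function definitions (without the comments) so that it runs on its own. It also checks that the four frequencies sum to 1, which should hold whenever the input contains only A, C, G, and T.

```python
def base_frequency(string):
    "returns the frequency of A, C, G, and T as a list"
    freqs = []
    slen = len(string)
    for base in "ACGT":
        freqs.append(string.count(base) / slen)
    return freqs

def base_frequency2(string):
    "returns the frequency of A, C, G, and T in order"
    slen = len(string)
    freqA = string.count("A") / slen
    freqC = string.count("C") / slen
    freqG = string.count("G") / slen
    freqT = string.count("T") / slen
    return [freqA, freqC, freqG, freqT]

seq = "ACACTGATCGACGAGCTAGCTAGCTAGCTGAC"
result1 = base_frequency(seq)
result2 = base_frequency2(seq)

# True if both implementations return the same list
print(result1 == result2)

# True if the frequencies sum to 1, allowing for tiny floating point error
print(abs(sum(result1) - 1) < 1e-9)
```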

### Reading and understanding functions

It can be a very useful exercise to look at code and functions that are written by others to try to learn common and useful techniques, and to try to understand what they are trying to accomplish and how they go about it. As an example, try to understand the function below and answer the questions following the demonstrated example of the function. 

In [None]:
def mystery_function(string):
    "no hint on this one"
    
    # code block 1
    ag = 0
    ct = 0
    
    # code block 2
    for element in string:
        if element in ["A", "G"]:
            ag += 1
        elif element in ["C", "T"]:
            ct += 1
            
    # code block 3
    freq_ag = ag / len(string)
    freq_ct = ct / len(string)
    
    return [freq_ag, freq_ct]

In [None]:
# test the function
mystery_function("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")

<div class="alert alert-success">
    <b>Question:</b> Describe the mystery_function() function above by annotating the comment strings in the code itself. For each comment line add a description of what you think that part of the code is doing. Finally, in the text cell below describe what you think the function does. Describe why this type of calculation is useful for a DNA sequence. If you are unsure why, try googling.
</div>

<div class="alert alert-warning">
    <h3>Response:</h3>

Write your answer using Markdown in the cell. 


</div>

### The standard library

You can optionally read chapter 6.2 if you wish, but otherwise we will just discuss it here because I think it covers too many irrelevant details. This chapter introduces the *Python standard library*, and also what it means to import a library. The take-home message is that there exists a large library of packages that are included in Python and can be accessed by *importing* them. We will learn about several common packages in the next few weeks. Let's learn about one of these packages now by using it: the `random` library.

Note: Usually, import statements should be put at the top of a Python script or Jupyter notebook.

In [None]:
import random

The `random` module includes a large number of functions that can be viewed in the Python documentation, or by simply exploring the module object interactively in your notebook using tab-completion. Try this out yourself. The example below uses the function `randint` to randomly sample an integer.

In [None]:
# draw a random integer between 0 and 3 (both endpoints are included)
random.randint(0, 3)
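
Besides tab-completion, the built-in `dir()` function offers another way to explore what a module contains. The sketch below lists the public names in the `random` module; the variable `names` is chosen only for this illustration.

```python
import random

# collect every name in the module that does not start with an underscore
names = [name for name in dir(random) if not name.startswith("_")]

# show the available names
print(names)

# the two functions used in this notebook are among them
print("randint" in names, "choice" in names)
```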

Here are several more complex examples. Execute each one multiple times. Note that the results are different each time. This is because the random module is used to draw *random samples*. This is a very useful feature for scientific programming which we will use many times. 

In [None]:
# draw 10 random numbers between 0 and 3
[random.randint(0, 3) for i in range(10)]

In [None]:
# draw a random element from an iterable
random.choice("Columbia University")

In [None]:
# draw 10 random elements from an iterable
[random.choice("Columbia University") for i in range(10)]
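
Sometimes you want the same "random" results on every run, for example to reproduce an analysis. The `random` module supports this through `random.seed()`. A minimal sketch, where the seed value 123 is an arbitrary choice:

```python
import random

# seed the random number generator, then draw 5 integers
random.seed(123)
draw1 = [random.randint(0, 3) for i in range(5)]

# re-seeding with the same value restarts the same sequence
random.seed(123)
draw2 = [random.randint(0, 3) for i in range(5)]

# the two draws are identical
print(draw1 == draw2)
```

Without the calls to `random.seed()`, the two lists would usually differ, as in the examples above.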

<div class="alert alert-success">
    <b>Action:</b> Write a function that (1) uses the random package to randomly draw values to generate a random sequence of DNA (As, Cs, Gs, and Ts); (2) takes an argument specifying the length of the DNA sequence; and (3) returns the result as a string. (Note: you can store the result internally within your function as a list if you want, but it should be returned as a string. Feel free to google for help if you get stuck, but first try to solve this yourself based on what you have learned.)
    
(4) Finally, demonstrate the function by generating a random 20 base pair long sequence of DNA. 
</div>