# Lab 6
## Importing and Creating Functions

Today we will learn about how to import and use functions in Python. A function is a block of code that performs a specific action, and can be referenced multiple times. Functions are important because they allow you to re‐use pieces of code with different inputs and parameters. Python has many built‐in and pre‐loaded functions (e.g. ```len()```, ```range()```, ```sum()```, ```min()```, and ```max()```); however, there are also many other functions that are stored separately in libraries, which are not automatically loaded. In addition, it is possible to write user‐defined functions that can be embedded within a script, or stored in separate user‐created libraries. You will learn how to import functions that are not automatically loaded by Python. You will also learn how to write your own functions and store and call them within a script.

You will follow along in a Jupyter notebook and run the code as we go through it together.
Feel free to experiment as we progress to get a better feel for how things work.

#### REMEMBER, THE INTERNET IS YOUR FRIEND: [Python.org Tutorials](http://docs.python.org/tutorial/)

#### Part A: Importing Functions

Python automatically loads some built‐in functions for users, e.g. ```len()```, ```range()```, ```sum()```, ```min()```, and ```max()```. However, there are also many other functions and objects that are stored in libraries that Python does not load automatically. These have many uses, such as interacting with the operating system or the user (e.g. ```sys```, ```os```); string operations (e.g. ```re```); advanced computation (e.g. ```math```, ```random```); and accessing new data types (e.g. ```datetime```, ```collections```). These are included in the default Python installation and do not have to be downloaded and installed separately.

To access these functions and objects, we need to import the library. We use the ```import``` statement to load libraries that are not loaded by default. In a script, this is typically done at the top of the file, immediately after the ```#!/usr/bin/python``` line. The library can be renamed using ```import ... as ...```, and functions and objects in the library can be accessed using a dot (.). Let’s try importing and using some functions.

Follow the directions below and enter the missing information. When you are ready to run the code, press "```CTRL+ENTER```" and Jupyter will run your code for you:

In [6]:
#!/usr/bin/python

#importing libraries
import math
import random
import glob, sys #import multiple libraries on one line
import re as regex #import and rename a library

#example of built-in functions
#note that we did not import them!
number_list = [1,2,3,4,5]

#print the sum
print ('sum:', sum(number_list))
#get the length of the list with the len() function
print ('len:', len(number_list))

#example of imported functions
#access objects and functions in a library with a dot (.)
five_cubed = math.pow(5,3)
print ('5 to the 3 power:', five_cubed)
#get the square root of the number 10 with the sqrt() function
square_root_10 = math.sqrt(10)
print ('Square root of 10:', round(square_root_10, 2))
#note that round is a built-in function!

sum: 15
len: 5
5 to the 3 power: 125.0
Square root of 10: 3.16
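
The cell above imports ```re``` under the alias ```regex``` but never calls it. As a small sketch of how an aliased import and a selective ```from ... import ...``` behave (the example string and variable names are illustrative and not part of the lab):

```python
import math
import re as regex       # in this script the library is reached as "regex"
from math import sqrt    # bring a single function into the current namespace

# call a function through the alias
digits = regex.findall(r'\d+', 'chr1:100-250')

# sqrt can now be called with or without the "math." prefix
same_value = (sqrt(16) == math.sqrt(16))

print('digits found:', digits)
print('sqrt(16):', sqrt(16))
print('same result either way:', same_value)
```

Importing a single name saves typing, but keeping the library prefix makes it easier to see where a function comes from.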


#### Part B: Using ```random```

Now that we’ve learned how to import and access specific functions, let’s work with functions in the ```random``` library. This library implements various functions for generating pseudo‐random numbers or sequences by uniformly selecting from a range of numbers or elements, by generating a random permutation of a list, or by randomly sampling items from a list without replacement.

In [7]:
#randomly generate a number between 0 and 1
print ( 'random number between 0 and 1: ', random.random())

#randomly generate an integer between (a) and (b), with both ends included
#format: random.randint(a,b)
print( 'random integer between 1 and 100: ', random.randint(1, 100))

#randomly rolling a die
#format: randrange(start, stop, step)
#STOP itself is never returned, so set it to 1 more than the largest value you want
print ('random die roll: ', random.randrange(1,7))

random number between 0 and 1:  0.7223739809078673
random integer between 1 and 100:  40
random die roll:  3


__Note__: because the values are being generated randomly, it is unlikely that your output will look the same every time. Try running the code multiple times – you should see the values changing!

You can see now how ```random``` can be incredibly useful for making decisions, e.g. picking a number randomly for the lottery, rolling a die for a game, or choosing between "heads" or "tails" for a coin toss!
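
The description of ```random``` at the start of Part B also mentions permutations and sampling without replacement, which the cell above does not demonstrate. A minimal sketch of both, using the standard ```random.sample()``` and ```random.shuffle()``` functions (the list contents are arbitrary choices for illustration):

```python
import random

bases = ['A', 'T', 'G', 'C']

# random.sample returns a new list of k distinct items; the input list is unchanged
two_bases = random.sample(bases, 2)

# random.shuffle rearranges a list in place and returns None
die_faces = list(range(1, 7))
random.shuffle(die_faces)

print('two bases without replacement:', two_bases)
print('shuffled die faces:', die_faces)
```

Because ```random.shuffle()``` changes the list itself, make a copy first if you still need the original order.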

#### Part C: Using ```random``` to Create a Random DNA Sequence

Now that we’ve looked at some basic functions in ```random```, let’s use the function ```random.choice()``` to randomly choose an element from a list of elements. We can use this to build a DNA sequence by adding one random nucleotide at a time.

In [19]:
#randomly flipping a coin
#format: random.choice(list) - literally a list of choices!
print ('random coin toss: ', random.choice(['heads', 'tails']))

#use the example above to print a random nucleotide
print ('random nucleotide: ', )

#let's build a random DNA sequence of length 10
#initialize a variable to store your sequence
sequence = ''
seq2=''
#use range() and for to have 10 turns (length 10)
for i in range(10):
    #add a randomly generated nucleotide to your sequence
    sequence = sequence + random.choice(['A', 'T','G','C'])
    seq2=seq2+random.choice(['A', 'T','G','C'])
#print the random DNA sequence
print ('random 10-mer: ', sequence,seq2)

random coin toss:  tails
random nucleotide: 
random 10-mer:  GAGCACTTAT GGTTCGATAG


Try running your script multiple times!
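
The output changes on every run because the generator starts from a different internal state each time. If you ever need a repeatable result (for example, to debug), one option is to seed the generator. The sketch below assumes a hypothetical helper ```ten_mer()``` and an arbitrary seed of 42; neither is part of the lab.

```python
import random

def ten_mer(rng):
    # build a 10-base sequence using the supplied generator
    return ''.join(rng.choice(['A', 'T', 'G', 'C']) for _ in range(10))

# two generators started from the same seed produce the same sequence
first = ten_mer(random.Random(42))
second = ten_mer(random.Random(42))

print(first, second)
print('identical:', first == second)
```

Using a separate ```random.Random``` object keeps the seeding local, so the rest of the notebook still produces different values on each run.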

#### Part D: Creating Your Own Functions

Now we know how to import libraries and functions, but what if there is no pre‐existing package that does what we want? We can write our own functions! Writing functions lets you organize your code into discrete “blocks”, which makes reading and running code easier. Also, if you have to do the same thing over and over again, with different inputs or parameters, it is useful to write a function. For example, if we want to generate 5 random DNA sequences, instead of copying and pasting our code 5 times, we could just write a function and then call it 5 times.

Functions are defined with ```def```, followed by a name for the function (e.g. ```generate_random_DNA```), any parameters in parentheses, and finally a colon (:). All code associated with the function is indented below ```def```. Later, the function can be called by its name, plus any necessary parameters. Functions typically produce some kind of output, which is passed back to the caller with ```return``` so that it can be stored in a variable or used in other code.

In [86]:
print ('Using our own functions:')

#create a function
#IMPORTANT: all code associated with the function is INDENTED!
#LENGTH is a parameter that the user will supply later in the code
def generate_random_DNA(length):
    #initialize a variable to store the sequence
    sequence = ''
    #make the number of turns the length of the sequence
    for i in range(length):
        #add a randomly generated nucleotide to the end
        sequence = sequence + random.choice(['A', 'T', 'C', 'G'])
    #return the result to the user
    return sequence

#now let's use our function!
#using a function is called CALLING the function
#we will call our function 5 times, to make 5 random sequences
for i in range(5):
    print ('random 10-mer: ', generate_random_DNA(10))

#we can also save the output of generate_random_DNA(10)
rand_DNA = generate_random_DNA(10)
#then we can change it using other functions, or print it
print ("Saved sequence: ", rand_DNA)
rand_DNA_last5 = rand_DNA[-5:]
print ("Last five bases in rand_DNA: ", rand_DNA_last5)

Using our own functions:
random 10-mer:  TAGTCTGTCG
random 10-mer:  ATCGAGGTGT
random 10-mer:  TGTCGAACGT
random 10-mer:  TCAGTTGCAT
random 10-mer:  GTAGTATATT
Saved sequence:  GGAGGAATCT
Last five bases in rand_DNA:  AATCT
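
A common point of confusion is the difference between printing a value inside a function and returning it. The sketch below uses two hypothetical functions, ```print_square()``` and ```return_square()```, to show why ```generate_random_DNA()``` uses ```return```:

```python
def print_square(x):
    # shows the value on screen but hands nothing back to the caller
    print('square:', x * x)

def return_square(x):
    # hands the value back so the caller can store or reuse it
    return x * x

shown = print_square(4)    # prints the square; shown ends up as None
kept = return_square(4)    # kept holds the number 16

print('shown:', shown)
print('kept + 1:', kept + 1)
```

Only the returned value can be saved and processed further, which is what allowed us to slice ```rand_DNA``` in the cell above.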


# Lab Task

In __Part C__ of this lab, you used ```random``` to generate a random DNA sequence; in __Part D__, you learned how to wrap this in your own function that can be called repeatedly. For today’s lab task, you’ll use these concepts to generate a biased DNA sequence, and then compare it to a randomly generated DNA sequence. You will make use of ```random```, write your own functions, and import pre‐existing functions as well.

Here are some functions that you will need. Be sure to run this block of code before you start work on your lab task:

In [None]:
import math

#This function returns the average and standard deviation of a list of numbers
def avg_std(numbers):
    N = len(numbers)
    total = sum(numbers)
    avg = float(total) / N
    stdev = 0.0
    for x in numbers:
        stdev = stdev + (x - avg)**2
    stdev = math.sqrt(stdev / N)
    return "%.2f +/- %.2f" % (avg, stdev)

#This function returns the frequency of adenines (As) in a single DNA sequence
def freq_A(sequence):
    num_A = sequence.count('A')
    len_seq = len(sequence)
    frequency = (float(num_A) / len_seq)
    return frequency
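
Before using these helpers in the lab task, it can help to check them on inputs small enough to verify by hand. The sketch below repeats the two definitions so that it runs on its own; the sample inputs are arbitrary, and ```statistics.pstdev()``` from the standard library is used only as a cross-check. Note that ```avg_std()``` divides by N (a population standard deviation) and returns a formatted string rather than two numbers.

```python
import math
import statistics

def avg_std(numbers):
    N = len(numbers)
    avg = float(sum(numbers)) / N
    stdev = 0.0
    for x in numbers:
        stdev = stdev + (x - avg)**2
    stdev = math.sqrt(stdev / N)
    return "%.2f +/- %.2f" % (avg, stdev)

def freq_A(sequence):
    return float(sequence.count('A')) / len(sequence)

# 2 of the 4 bases are A
a_fraction = freq_A('AATG')
# mean is 2.5; squared deviations sum to 5.0, so stdev = sqrt(5.0 / 4)
summary = avg_std([1, 2, 3, 4])
# population standard deviation from the standard library, for comparison
check = statistics.pstdev([1, 2, 3, 4])

print('freq_A:', a_fraction)
print('avg_std:', summary)
print('pstdev:', round(check, 2))
```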

**OBJECTIVES:**

In the Python cell below:

1. Using ```random```, create your own function that generates random DNA sequences, exactly like we did in __Part D__ of the lab. The length of the sequence should be __variable__. __Hint__: don’t forget to ```import random```!
2. Write a new function, called ```generate_biased_DNA```, using ```random``` to generate a DNA sequence that is enriched in As relative to C/T/G. The length of the sequence should be variable. __Hint__: There are many ways to do this. Ultimately your goal is to have a sequence that is approximately 50% As, with the other 50% split equally among C/T/G. __IMPORTANT__: make sure your functions are defined before calling them. It is good practice to write functions at the start of your code, not in the middle.
3. Use a ```for``` loop to generate a random sequence of __length 20__, followed by a biased sequence of __length 20__. Make the ```for``` loop run for __10 turns__.
4. Now we will make use of the functions ```avg_std()``` and ```freq_A()``` defined above. Within the ```for``` loop that you wrote in step 3, use the function ```freq_A()``` to count the frequency of adenines (As) in the random and biased DNA sequences. During each iteration, save the output of ```freq_A()``` for each DNA type. __Hint__: Declare lists outside the ```for``` loop, then use ```.append()``` to add the numbers to the respective lists; make sure to have one list for random DNAs, and one list for biased DNAs.
5. Finally, use the function ```avg_std()``` to compute the average and standard deviation for each of your lists. The average frequency of adenines in the biased DNA set should be higher than in the random DNA set. Is this what you observe? Do you ever observe the numbers being similar enough that the biased DNA set looks random?
6. When your code is working properly, save a copy of this .ipynb file and submit it on Blackboard for credit. __Don't forget to write comments!__
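
As the hint in objective 2 says, there are many ways to bias a sequence. One possible sketch, which is not necessarily the intended solution, is to repeat 'A' in the list of choices so that ```random.choice()``` picks it about half the time. The function name ```biased_by_repetition``` is hypothetical.

```python
import random

def biased_by_repetition(length):
    # 'A' fills 3 of the 6 slots, so it is drawn about half the time;
    # T, C and G each fill 1 slot, so they share the other half equally
    pool = ['A', 'A', 'A', 'T', 'C', 'G']
    return ''.join(random.choice(pool) for _ in range(length))

# a long sequence makes the underlying 50% bias easy to see
long_seq = biased_by_repetition(10000)
fraction_A = long_seq.count('A') / len(long_seq)

print('length:', len(long_seq))
print('fraction of A:', round(fraction_A, 3))
```

Short sequences, such as the 20-mers used in this task, will scatter much more widely around 50% than this long one does.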

In [1]:
import random
import math
def generate_random_DNA(length):
    #initialize a variable to store the sequence
    results=''
    #make the number of turns the length of the sequence
    for i in range(length):
        #add a randomly generated nucleotide to the end
        results= results+random.choice(['A', 'T', 'C', 'G'])
    #return the result to the user
    return results
    
def generate_biased_DNA(length):
    a=['A','T','C','G']
    #weights make 'A' three times as likely as each other base (about 50% As)
    results=random.choices(a,weights=[3,1,1,1], k=length) # use the length parameter
    return(results)

#This function returns the frequency of adenines (As) in a single DNA sequence
def freq_A(sequence):
    num_A = sequence.count('A')
    len_seq = len(sequence)
    frequency = (float(num_A) / len_seq)
    return frequency
ra=[] # frequencies of A in the random sequences
bia=[] # frequencies of A in the biased sequences
for i in range(10):  # repeat 10 times
    seq=generate_random_DNA(20)
    ra.append(freq_A(seq))             # collection of random DNA frequency

    seq2=generate_biased_DNA(20)
    seq2=''.join(str(x) for x in seq2) # join the list of bases into one string
    bia.append(freq_A(seq2))           # collection of biased DNA frequency
    print(i,":",seq+"   "+seq2)

def avg_std(numbers):
    N = len(numbers)
    total = sum(numbers)
    avg = float(total) / N
    stdev = 0.0
    for x in numbers:
        stdev = stdev + (x - avg)**2
    stdev = math.sqrt(stdev / N)
    return "%.2f +/- %.2f" % (avg, stdev) 
print("avg and standard deviation of random",avg_std(ra))
print("avg and standard deviation of biased",avg_std(bia))


0 : AAAAGACACAGCATCCCTGA   CACGGAAAACAAACGCTCAA
1 : GCCGAAAGCAGTTGTGAGCG   TACGAAGAATTAATTCATAA
2 : ATGGTACACGTTCGTGGTTT   TCGCAAACAAACAGCGAACG
3 : AAGCGCCCGGCCGGAGGAAC   ACGGCACAGATTAAAGAAAA
4 : AAGGGGGATCCCGGGAGACA   AACCCATCACTGCGAAACAT
5 : ATCAGAACTCGAGGCCAAAA   AAATAGAGAACAATTGAAAT
6 : TGGCGCCATCCTGCACTTGC   AGCATCCAATTGTAAAACAG
7 : ACTGTTGCCTCGGAAGCCTT   GGATCAAAGATGAACCAAGT
8 : AACAAAGCTCGAGCTATCTA   GAGAAGATGAAGGCAATAGA
9 : GATAACTTATGGCCGTATCT   TCCTAAAAACTTAAAAATAA
avg and standard deviation of random 0.28 +/- 0.12
avg and standard deviation of biased 0.50 +/- 0.06
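
Objective 5 asks whether a biased set can ever look random. With sequences of only 20 bases, the frequency of A varies a lot from one sequence to the next. The sketch below, which assumes a hypothetical helper ```a_fraction_spread()``` and an arbitrary fixed seed, compares that spread for short and long uniformly random sequences.

```python
import random
import statistics

def a_fraction_spread(length, trials, rng):
    # standard deviation of the A frequency across many uniform random sequences
    fractions = []
    for _ in range(trials):
        seq = ''.join(rng.choice('ATCG') for _ in range(length))
        fractions.append(seq.count('A') / length)
    return statistics.pstdev(fractions)

rng = random.Random(0)   # fixed seed, chosen arbitrarily, so the run is repeatable
spread_short = a_fraction_spread(20, 200, rng)
spread_long = a_fraction_spread(2000, 200, rng)

print('spread for length 20:  ', round(spread_short, 3))
print('spread for length 2000:', round(spread_long, 3))
```

The longer the sequences, the smaller the spread, so the random and biased sets become easier to tell apart.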
