# Session 6 - Random Mutations

## Learning Objectives

* Model mutation with random numbers by learning how to randomly select a nucleotide in DNA and then mutate it to some other (random) nucleotide.
* Use random numbers to generate DNA sequence data sets, which can be used to study the extent of randomness in actual genomes.
* Repeatedly mutate DNA to study the effect of mutations accumulating over time during evolution.

## 6.1 Random - generating pseudo-random numbers and patterns

### Simulation
Most of our labs in this course are driven by data, but simulation is one of the major tools of computational research. How do we know whether a motif or any other pattern has biological significance? The first step is to determine whether it occurs more frequently than expected in a randomized set of sequences or other data. Simulation studies are already an important part of ecology, epidemiology and evolution, where models are needed to predict trajectories of environmental change and where simulations are used to test theory that cannot be tested with current laboratory methods. Increasingly, simulation is also used to study basic cellular processes such as the cell cycle and protein export, and it is an important tool in the growing field of systems biology, in which computer models of cells and organisms are developed to understand and predict drug response. In many of these models, mutation is a fundamental component.

### Random Number Generators

A random number generator produces a sequence of numbers or symbols that lacks any pattern, i.e. appears random. However, Python's generator, like most random number generators, is not truly random; it relies on an algorithm that generates a sequence of numbers approximating the properties of random numbers. Such generators are therefore often called pseudorandom number generators. This is fine for us because we are not developing software for security purposes.
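
Because the generator is an algorithm, starting it from the same value (a "seed") reproduces the same sequence of numbers. The sketch below illustrates this with `random.seed`; the seed value 42 and the variable names are arbitrary choices made for this illustration.

```python
import random

# Seeding the generator fixes its starting state, so the same
# "random" sequence is produced every time.
random.seed(42)   # 42 is an arbitrary seed chosen for this illustration
first_run = [random.randint(1, 6) for _ in range(5)]

random.seed(42)   # reset to the same starting state
second_run = [random.randint(1, 6) for _ in range(5)]

print(first_run)
print(second_run)
print(first_run == second_run)   # True: the two runs are identical
```

Setting a seed is useful when a simulation has to be repeatable, for example while debugging. Leave the seed unset to get a different sequence on each run.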

In [1]:
#!/usr/bin/env python

# Example 6.1
# Name: random_exercise.py
# Description: A program to test randomization methods

# Import the random module from the Python standard library.
import random

# random - returns a floating point number in the range [0, 1)
# (that is, between 0 and 1, including 0.0 but always smaller than 1.0).
a = random.random()
print('return of random.random =', a)

# randint - random.randint(low, high) returns a random integer N
# such that low <= N <= high (both endpoints included).
b = random.randint(1, 3)
print('return of random.randint =', b)

# randrange - random.randrange(start, stop) returns a random integer N
# such that start <= N < stop (stop is excluded), so this gives 1 or 2.
c = random.randrange(1, 3)
print('return of random.randrange =', c)

# randrange with a step - random.randrange(start, stop[, step]) picks from
# start, start + step, ... below stop, so this gives 0 or 5.
d = random.randrange(0, 10, 5)
print('return of random.randrange =', d)

# Choice - returns a random element from a list.
bases = ['A', 'C', 'G', 'T']
e = random.choice(bases)
print ('return of random.choice =', e)

# Shuffle - reorders the elements of a list in place.
# It returns None, so print the list itself to see the new order.
bases = ['A', 'C', 'G', 'T']
random.shuffle(bases)
print('return of random.shuffle =', bases)



return of random.random = 0.13329501915834008
return of random.randint = 3
return of random.randrange = 2
return of random.randrange = 5
return of random.choice = G
return of random.shuffle = ['G', 'C', 'A', 'T']


Rerun Example 6.1 five times and compare the output of each run, so that you understand what each function returns. The random module also contains many other specialized generators; see https://docs.python.org/3/library/random.html. Below are some example programs that use randomization.
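
As one illustration of the other functions in the module, the sketch below uses `random.choices` (drawing with replacement, optionally weighted) and `random.sample` (drawing without replacement). The weights, which assume a 60% GC content, and the sequence length of 20 are made-up values for this example.

```python
import random

bases = ['A', 'C', 'G', 'T']

# choices - draws k elements with replacement; the optional weights make
# some elements more likely (here C and G, an assumed 60% GC content).
gc_rich = random.choices(bases, weights=[0.2, 0.3, 0.3, 0.2], k=20)
print('return of random.choices =', ''.join(gc_rich))

# sample - draws k distinct items without replacement,
# here three different positions in a sequence of length 20.
sites = random.sample(range(20), 3)
print('return of random.sample =', sites)
```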

In [2]:
#!/usr/bin/env python

# Example 6.2
# Name: random_DNA_synthesis.py
# Description: A program that creates a random piece of DNA 
# between 10 and 100 nucleotides in length

import random

length = random.randint(10,100)
sequence = ''
bases = ['A', 'C', 'G', 'T']
for i in range(length):
    sequence += random.choice(bases)
print (sequence)

CTAGGCCTCCGGATATACAAGAAAAATCCGGCTTAGGG
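
Random sequences like this one can serve as the randomized data set described in the Simulation section. The sketch below counts how often a motif appears in many random sequences and compares the average with the count expected by chance. The motif `GATC`, the sequence length, the number of sequences, and the helper names `random_dna` and `count_motif` are all assumptions made for illustration, and the model assumes all four bases are equally likely.

```python
import random

def random_dna(length):
    """Return a random DNA string of the given length (uniform base frequencies)."""
    return ''.join(random.choice('ACGT') for _ in range(length))

def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    k = len(motif)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == motif)

motif = 'GATC'        # hypothetical motif chosen for illustration
seq_length = 1000     # assumed sequence length
n_sequences = 200     # assumed number of simulated sequences

counts = [count_motif(random_dna(seq_length), motif) for _ in range(n_sequences)]
mean_count = sum(counts) / n_sequences

# With independent, equally likely bases, the expected count is
# (number of start positions) * (1/4) ** (motif length).
expected = (seq_length - len(motif) + 1) / 4 ** len(motif)

print('mean simulated count =', mean_count)
print('expected count       =', expected)
```

A motif that appears in a real sequence far more often than in the simulated ones is a candidate for further study. Real genomes do not have uniform base frequencies, so this is only a first approximation.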


In [3]:
#!/usr/bin/env python

# Example 6.3
# Name: string_to_list.py
# Description: An example where a string is turned into a list
# and one element is randomly selected

import random

# A string can be turned into a list with each nucleotide as an item
# in the list using the list function
DNA = list('ACGTACGTACGTACGTACGT')
base1 = random.choice(DNA)
print (base1)
base2 = random.choice(DNA)
print (base2)
base3 = random.choice(DNA)
print (base3)

T
G
C
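
Converting to a list matters when the sequence has to be modified, because strings cannot be changed in place. For selection alone, `random.choice` also accepts a string directly, as this small sketch shows. The variable names `base` and `position` are chosen for illustration.

```python
import random

DNA = 'ACGTACGTACGTACGTACGT'

# random.choice works on a string without converting it to a list
base = random.choice(DNA)
print(base)

# Choosing an index instead also records where the base came from
position = random.randrange(len(DNA))
print(position, DNA[position])
```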


In [4]:
#!/usr/bin/env python

# Example 6.4
# Name: mutate_DNA.py
# Description: Create a point mutation in a DNA sequence at a random site

import random

# A string can be turned into a list with each nucleotide as an item
# in the list using the list function

original_DNA = 'ACGTACGTACGTACGTACGT'
DNA = list(original_DNA)

# randomly choose the site to mutate

DNA_length = len(DNA)
mutation_site = random.randrange(0,DNA_length)

# remove the nucleotide 
original_base = DNA.pop(mutation_site)

# randomly choose the new nucleotide 
# (and make sure the original base is not a candidate)
bases = ['A', 'C', 'G', 'T']
bases.remove(original_base)
new_base = random.choice(bases)

# insert the new base at the site
DNA.insert(mutation_site, new_base)

# turn the DNA list back into a string
new_DNA = ''.join(DNA)

print (DNA_length, mutation_site, original_base, new_base)
print (original_DNA)
print (new_DNA)

20 10 G C
ACGTACGTACGTACGTACGT
ACGTACGTACCTACGTACGT
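
The same steps can be wrapped in a function so that they are easy to reuse. The sketch below is one possible variant, not the only way to write it: `point_mutation` is a hypothetical helper name, and it replaces the base by index assignment instead of the `pop` and `insert` calls used in Example 6.4.

```python
import random

def point_mutation(sequence):
    """Return a copy of sequence with one randomly chosen base substituted.

    point_mutation is a hypothetical helper name used for this sketch.
    """
    dna = list(sequence)
    site = random.randrange(len(dna))
    # candidates are the three bases that differ from the current one
    candidates = [b for b in 'ACGT' if b != dna[site]]
    dna[site] = random.choice(candidates)   # replace in place of pop/insert
    return ''.join(dna)

original_DNA = 'ACGTACGTACGTACGTACGT'
new_DNA = point_mutation(original_DNA)

# count the positions where the two sequences differ
differences = sum(1 for a, b in zip(original_DNA, new_DNA) if a != b)

print(original_DNA)
print(new_DNA)
print('differences =', differences)   # always 1 after a single point mutation
```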


## Exercises

1. Write a program that randomly generates a nucleotide sequence with a length between 40 and 60.

2. Write a program that mutates the DNA sequence 10 times and prints the resulting sequence aligned with the original sequence, with the polymorphisms noted (as in Lab 4).

3. Calculate the number of substitutions that accumulate after the above random sequence is mutated 10, 20, 30, 40 and 50 times. Does the number of substitutions equal the number of mutations? Why or why not?

* Next - <a href="http://nbviewer.ipython.org/github/jeffreyblanchard/EvoGenV5/blob/master/EvoGenV5_Lab7.ipynb">Session 7 : Dictionaries</a>
* Previous - <a href="http://nbviewer.ipython.org/github/jeffreyblanchard/EvoGenV5/blob/master/EvoGenV5_Lab5.ipynb">Session 5 : Lists</a>
