# Day 2 notebook

The objectives of this notebook are to practice
* the biological concepts of 
  * the translation of RNA to protein
  * the consequences of mutations in protein-coding DNA
  * gene regulatory networks
* the Python concepts of 
  * creating and using Python modules
  * reading input from files
  * calling functions on lists of arguments using unpacking

## Translation of RNA to protein

### PROBLEM 1 - RNA fragment translation (1 POINT)

Continuing our work on functions for DNA, RNA, and protein sequences, let us write a function that translates all codons of a string representing an RNA fragment. We again assume that the first three bases of the RNA form a codon and that the length of the RNA is a multiple of three. At this point, we will not require that the first codon is a start codon or that the last codon is a stop codon.

You will likely want to copy some of your functions from the Day 1 notebook here to help with implementing this function.  If you wish, you may create a Python module with those functions (see the related Problem 8 below).  Also, as in the previous in-class activities, you will likely find the `join` method of the empty string convenient for concatenating all of the strings in a list together.  For example,

In [1]:
bases = ["A", "C", "G", "T"]
''.join(bases)

'ACGT'
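
The same `join` idiom pairs naturally with slicing a sequence into fixed-size pieces. The sketch below is only an illustration (the variable names are made up for this example): it cuts an RNA string into three-base codons and joins them back together with a separator so that the codon boundaries are visible.

```python
rna = "UUUGCGACUUAU"

# Step through the string three bases at a time and slice out each codon
codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
print(codons)

# join works with any separator string, not only the empty string
print("-".join(codons))
```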

In [2]:
# Codon table mapping each RNA codon to a one-letter amino acid symbol ('*' marks a stop codon)
genetic_code = {
 'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAU': 'N',
 'ACA': 'U', 'ACC': 'U', 'ACG': 'U', 'ACU': 'U',
 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGU': 'S',
 'AUA': 'I', 'AUC': 'I', 'AUG': 'M', 'AUU': 'I',
 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAU': 'H',
 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCU': 'P',
 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGU': 'R',
 'CUA': 'L', 'CUC': 'L', 'CUG': 'L', 'CUU': 'L',
 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAU': 'D',
 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCU': 'A',
 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGU': 'G',
 'GUA': 'V', 'GUC': 'V', 'GUG': 'V', 'GUU': 'V',
 'UAA': '*', 'UAC': 'Y', 'UAG': '*', 'UAU': 'Y',
 'UCA': 'S', 'UCC': 'S', 'UCG': 'S', 'UCU': 'S',
 'UGA': '*', 'UGC': 'C', 'UGG': 'W', 'UGU': 'C',
 'UUA': 'L', 'UUC': 'F', 'UUG': 'L', 'UUU': 'F'}
###

def translate_rna_fragment(rna):
###
### YOUR CODE HERE
    # Split the RNA into consecutive three-base codons
    codons = [rna[i:i+3] for i in range(0, len(rna), 3)]
    # Look up each codon and concatenate the resulting amino acids
    return ''.join(genetic_code[codon] for codon in codons)



In [3]:
# tests for translate_rna_fragment
assert translate_rna_fragment("UUUGCGACUUAU") == "FAUY", "Failed on input 'UUUGCGACUUAU'"
assert translate_rna_fragment("ACG") == "U", "Failed on input 'ACG'"
assert translate_rna_fragment("") == "", "Failed on the empty string"
print("SUCCESS: translate_rna_fragment passed all tests!")

SUCCESS: translate_rna_fragment passed all tests!


### PROBLEM 2 - RNA translation (1 POINT)

In this problem we step up the complexity and perform RNA translation of an entire RNA molecule.  Recall that translation of a full RNA molecule involves:
1. Scanning for the first occurrence of a start codon ("AUG")
2. Translation of the start codon and successive codons in the same reading frame
3. Termination of translation when a stop codon (in the same reading frame as the prior codons) is reached

You may assume that the RNA molecule contains a start codon as well as a downstream, in-frame, stop codon.
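
To make the role of the reading frame concrete, here is a minimal sketch that only collects the codons that would be translated, without looking up any amino acids. The helper name `in_frame_codons` is hypothetical and not part of the assignment; it relies on the same assumption as above that a start codon and a downstream in-frame stop codon are present.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def in_frame_codons(rna):
    """Return the codons from the first AUG up to (not including) the first in-frame stop."""
    start = rna.find("AUG")  # index of the first start codon, or -1 if absent
    if start == -1:
        return []
    codons = []
    # Advance three bases at a time so every codon shares the start codon's frame
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

example = "UAAUGGCAUCUAAGUGACU"
print(in_frame_codons(example))
```

In this example the `UAA` at the very beginning lies upstream of the start codon, and the `UAA` that begins at index 10 is out of frame, so neither terminates translation; the in-frame `UGA` does.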

In [4]:
genetic_code = {
 'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAU': 'N',
 'ACA': 'U', 'ACC': 'U', 'ACG': 'U', 'ACU': 'U',
 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGU': 'S',
 'AUA': 'I', 'AUC': 'I', 'AUG': 'M', 'AUU': 'I',
 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAU': 'H',
 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCU': 'P',
 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGU': 'R',
 'CUA': 'L', 'CUC': 'L', 'CUG': 'L', 'CUU': 'L',
 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAU': 'D',
 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCU': 'A',
 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGU': 'G',
 'GUA': 'V', 'GUC': 'V', 'GUG': 'V', 'GUU': 'V',
 'UAA': '*', 'UAC': 'Y', 'UAG': '*', 'UAU': 'Y',
 'UCA': 'S', 'UCC': 'S', 'UCG': 'S', 'UCU': 'S',
 'UGA': '*', 'UGC': 'C', 'UGG': 'W', 'UGU': 'C',
 'UUA': 'L', 'UUC': 'F', 'UUG': 'L', 'UUU': 'F'}

def translate_rna(rna):
### YOUR CODE HERE
    # Scan for the first occurrence of the start codon
    start = rna.find("AUG")
    if start == -1:
        return ""
    amino_acids = []
    # Translate successive in-frame codons until a stop codon is reached
    for i in range(start, len(rna) - 2, 3):
        amino_acid = genetic_code[rna[i:i+3]]
        if amino_acid == "*":
            break
        amino_acids.append(amino_acid)
    return "".join(amino_acids)


In [5]:
# tests for translate_rna
assert translate_rna("AUGACUUAUUAG") == "MUY", "Failed on input 'AUGACUUAUUAG'"
assert translate_rna("AAUGACUUAUUAG") == "MUY", "Failed on input 'AAUGACUUAUUAG'"
assert translate_rna("AUGACUUAUUAGCU") == "MUY", "Failed on input 'AUGACUUAUUAGCU'"
assert translate_rna("AAUGACUUAUUAGCU") == "MUY", "Failed on input 'AAUGACUUAUUAGCU'"
assert translate_rna("AAUGGCAUAUUAACU") == "MAY", "Failed on input 'AAUGGCAUAUUAACU'"
assert translate_rna("UAAUGGCAUCUAAGUGACU") == "MASK", "Failed on input 'UAAUGGCAUCUAAGUGACU'"
print("SUCCESS: translate_rna passed all tests!")

SUCCESS: translate_rna passed all tests!


### PROBLEM 3 - Consequences of the deltaF508 mutation in CFTR (2 POINTS)

We will now revisit the CFTR gene fragment that we saw in the day 1 notebook and examine how the deltaF508 mutation impacts the resulting amino acid sequence of the encoded protein.  Below is again the sequence of the CFTR gene fragment (sense strand).

In [6]:
cftr_gene_fragment = ("ACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAA"
                      "AATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTA"
                      "TGCCTGGCACCATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAA"
                      "TATAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAG")
print(cftr_gene_fragment)

ACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAG


And here is the sequence of the mutated CFTR gene fragment, which has bases 129, 130, and 131 (1-based coordinates) removed.

In [7]:
deltaf508_fragment = cftr_gene_fragment[:128] + cftr_gene_fragment[131:]
print(deltaf508_fragment)

ACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAG


Recall that the removed bases were

In [8]:
print(cftr_gene_fragment[128:131])

CTT


Using your translation function and considering the coordinates of the deltaF508 mutation, determine
1. how many codons of the non-mutant fragment overlap with this deletion.  In other words, how many codons in the original gene sequence have bases removed by the deletion.
2. which single amino acid is effectively deleted from the protein fragment encoded by the mutant sequence

Note that this CFTR sequence is only a fragment (substring) of the entire CFTR gene and so will not begin with a start codon or end with a stop codon.  You will also need to know that the first three bases of the fragment are a codon.
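
One way to reason about question 1 is pure coordinate arithmetic. Because the first three bases of the fragment form a codon, the base at 0-based index `i` belongs to codon number `i // 3`. The sketch below applies this to a 1-based, inclusive range of deleted bases; the function name `codons_overlapping` is an assumption made for this illustration.

```python
def codons_overlapping(first_base, last_base):
    """Return the 0-based codon numbers touched by a deletion of bases
    first_base..last_base (1-based, inclusive), assuming the reading
    frame starts at the first base of the sequence."""
    # Convert 1-based positions to 0-based indices, then map each index to its codon
    indices = range(first_base - 1, last_base)
    return sorted({i // 3 for i in indices})

print(codons_overlapping(129, 131))
```

A three-base deletion that is not aligned to a codon boundary touches two codons, while an aligned one touches exactly one.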

Answer the above two questions by assigning your answers to the variables in the code below.

In [9]:
print(translate_rna_fragment(deltaf508_fragment.replace('T','U')))
print(translate_rna_fragment(cftr_gene_fragment.replace('T','U')))
#------------------------------
# Assign to the variable 'num_codons_overlapping_with_deletion' an integer
###
num_codons_overlapping_with_deletion = 2
###
# Assign to the variable 'amino_acid_deleted' a single character string
###
amino_acid_deleted = "F"
###


USLLMVIMGELEPSEGKIKHSGRISFCSQFSWIMPGUIKENIIGVSYDEYRYRSVIKACQLEE
USLLMVIMGELEPSEGKIKHSGRISFCSQFSWIMPGUIKENIIFGVSYDEYRYRSVIKACQLEE


In [10]:
# test for num_codons_overlapping_with_deletion
assert isinstance(num_codons_overlapping_with_deletion, int), "num_codons_overlapping_with_deletion is not an integer"
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [11]:
# test for amino_acid_deleted
assert isinstance(amino_acid_deleted, str), "amino_acid_deleted is not a string"
###
### AUTOGRADER TEST - DO NOT REMOVE
###


## The lac operon gene regulatory network
For the remainder of this notebook, we will practice with very simple modeling of the lac operon gene regulatory network that was described in lecture.  

We will add one more component to this network, which is the *CAP* protein.  In an environment with low levels of glucose the CAP protein binds to the regulatory region for the lac operon and increases transcription of the lac operon, but only when *lacI* is not bound to the regulatory region.  This allows the cell to try to make up for the lack of glucose by increasing its metabolism of lactose instead.

The entire network, along with the deterministic functions/circuits for each downstream variable, is defined by the figure below.  The `lactose`, `lacI`, `glucose`, and `CAP` input variables take the value `True` when they are present and the value `False` when they are absent.  The `lacI_bound` and `CAP_bound` variables are also boolean valued and depend on a subset of the input variables.  The `lacZ` variable represents the transcription level of the lac operon, and can take one of three values: "absent", "low", and "high".

![Lac circuit](network.png)

We will define this network in Python using a series of very simple functions, which can then be composed.

### PROBLEM 4 - lacI_bound (1 POINT)
Write a function that specifies the lacI-bound circuit, as defined in the figure above.  This should be a simple one-liner that uses a boolean expression (e.g., using the operators `and`, `or`, and `not`).

In [12]:
def lacI_bound(lactose, lacI):
### YOUR CODE HERE
    return not lactose and lacI


In [13]:
# tests for function lacI_bound
assert lacI_bound(False, False) == False, "Failed on input False, False"
assert lacI_bound(False, True) == True, "Failed on input False, True"
assert lacI_bound(True, False) == False, "Failed on input True, False"
assert lacI_bound(True, True) == False, "Failed on input True, True"
print("SUCCESS: lacI_bound passed all tests!")

SUCCESS: lacI_bound passed all tests!


### PROBLEM 5 - CAP_bound (1 POINT)
Write a function that specifies the CAP-bound circuit, as defined in the figure above.  This should also be a simple one-liner that uses a boolean expression.

In [14]:
def CAP_bound(glucose, CAP):
### YOUR CODE HERE
    return not glucose and CAP



In [15]:
# tests for function CAP_bound
assert CAP_bound(False, False) == False, "Failed on input False, False"
assert CAP_bound(False, True) == True, "Failed on input False, True"
assert CAP_bound(True, False) == False, "Failed on input True, False"
assert CAP_bound(True, True) == False, "Failed on input True, True"
print("SUCCESS: CAP_bound passed all tests!")

SUCCESS: CAP_bound passed all tests!


### PROBLEM 6 - lacZ (1 POINT)
Write a function that specifies the lacZ circuit, as defined in the figure above.

In [16]:
def lacZ(is_lacI_bound, is_CAP_bound):
### YOUR CODE HERE
    # A bound lacI repressor blocks transcription regardless of CAP
    if is_lacI_bound:
        return "absent"
    # Without the repressor, bound CAP raises transcription from low to high
    return "high" if is_CAP_bound else "low"


In [17]:
# tests for function lacZ
assert lacZ(False, False) == "low", "Failed on input False, False"
assert lacZ(False, True) == "high", "Failed on input False, True"
assert lacZ(True, False) == "absent", "Failed on input True, False"
assert lacZ(True, True) == "absent", "Failed on input True, True"
print("SUCCESS: lacZ passed all tests!")

SUCCESS: lacZ passed all tests!


### PROBLEM 7 - lacZ full circuit (1 POINT)
Finally, let us write a function that gives the value of lacZ depending on the input variables `lactose`, `lacI`, `glucose`, and `CAP`.  Use composition of the functions above to accomplish this in a simple one-liner.

In [18]:
def lacZ_full_circuit(lactose, lacI, glucose, CAP):
### YOUR CODE HERE
    return lacZ(lacI_bound(lactose, lacI), CAP_bound(glucose, CAP))


In [19]:
# tests for function lacZ_full_circuit
# note that these are only a subset of the possible inputs
assert lacZ_full_circuit(False, False, False, False) == "low", "Failed on input False, False, False, False"
assert lacZ_full_circuit(False, False, False, True) == "high", "Failed on input False, False, False, True"
assert lacZ_full_circuit(False, True, False, True) == "absent", "Failed on input False, True, False, True"
assert lacZ_full_circuit(False, True, True, True) == "absent", "Failed on input False, True, True, True"
print("SUCCESS: lacZ_full_circuit passed all tests!")

SUCCESS: lacZ_full_circuit passed all tests!
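
Since the tests above check only a subset of the inputs, it can be useful to enumerate the full truth table. The sketch below uses `itertools.product` to generate all 16 input combinations. To keep it self-contained, it restates the three circuit functions compactly under underscore-prefixed names (these names are assumptions for this example; in the notebook you would call `lacZ_full_circuit` directly).

```python
from collections import Counter
from itertools import product

# Compact restatements of the circuits, so this block runs on its own
def _lacI_bound(lactose, lacI):
    return lacI and not lactose

def _CAP_bound(glucose, CAP):
    return CAP and not glucose

def _lacZ(is_lacI_bound, is_CAP_bound):
    if is_lacI_bound:
        return "absent"
    return "high" if is_CAP_bound else "low"

def _full_circuit(lactose, lacI, glucose, CAP):
    return _lacZ(_lacI_bound(lactose, lacI), _CAP_bound(glucose, CAP))

# All 2**4 assignments of (lactose, lacI, glucose, CAP)
table = {}
for lactose, lacI, glucose, CAP in product([False, True], repeat=4):
    table[(lactose, lacI, glucose, CAP)] = _full_circuit(lactose, lacI, glucose, CAP)

for inputs, level in table.items():
    print(inputs, level)

# Tally how many input combinations give each transcription level
print(Counter(table.values()))
```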


### PROBLEM 8 - creating and using a module (1 POINT)
We will now practice with creating a Python module.  Modules make it easy to reuse your code and keep related functions, variables, and classes in a separate namespace.

For this problem, you are to do the following:
1. Create a new file `lac_network.py` in your workspace by clicking on the "File" menu and selecting "New -> Python file".
2. Copy the Python code for each of your lac circuit functions that you have written above into the `lac_network.py` file.
3. Import your `lac_network` module within this notebook and test that you can call functions defined within the module.

*Important note regarding module import in notebooks*: If you have already run the import statement for a module in a notebook and then make a change to the source code of the module, using the `import` statement again will not use the updated code.  You can either restart the Python kernel and re-run the cell with the `import` statement (recommended) or use the `reload` function from the [`importlib`](https://docs.python.org/3.7/library/importlib.html) standard module. 
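
The behavior described in this note can be reproduced with a throwaway module. The sketch below is only a demonstration under stated assumptions: `demo_module` and its `greeting` function are hypothetical names, and the module is written to a temporary directory rather than your workspace. With your own module you would simply call `importlib.reload(lac_network)` after editing `lac_network.py`.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical throwaway module into a temporary directory
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "demo_module.py")
with open(module_path, "w") as f:
    f.write("def greeting():\n    return 'first version'\n")

# Make the directory importable and import the module for the first time
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import demo_module
before = demo_module.greeting()

# Edit the source file on disk
with open(module_path, "w") as f:
    f.write("def greeting():\n    return 'second version, edited'\n")

# A second import statement reuses the already-loaded module object
import demo_module
still_old = demo_module.greeting()

# reload re-executes the module's source, picking up the edit
importlib.reload(demo_module)
after = demo_module.greeting()

print(before, still_old, after, sep=" | ")
```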

In [20]:
# tests for the lac_network module
import lac_network
assert lac_network.lacZ_full_circuit(False, False, False, False) == "low", "lac_network.lacZ_full_circuit failed"
assert lac_network.lacI_bound(False, False) == False, "lac_network.lacI_bound failed"
assert lac_network.CAP_bound(False, False) == False, "lac_network.CAP_bound failed"
assert lac_network.lacZ(False, False) == "low", "lac_network.lacZ failed"
print("SUCCESS: lac_network module passed all tests!")

SUCCESS: lac_network module passed all tests!


### PROBLEM 9 - reading from a file (1 POINT)
We will now practice with reading data from a text file.  In particular, we will read a file that contains some input values for the `lacZ_full_circuit`, with a different set of input values on each line.  The input values will be specified for the variables `lactose`, `lacI`, `glucose`, and `CAP`, and in that order.  On each line, the values will be separated by one or more whitespace characters.

Write a function that takes as input the name of a file containing input values for the `lacZ_full_circuit` function and outputs a list of lists of the input values.  The first inner list should correspond to the input values from the first line of the file, and so on for the other lines in the file.

You will likely find the following function, which converts the strings "True" and "False" to Python boolean values, of use.

You will also likely find the `split` method of strings helpful for converting each line of the file into a list of tokens.

In [21]:
def str_to_bool(s):
    if s == 'True':
         return True
    elif s == 'False':
         return False
    else:
         raise ValueError("Input is neither 'True' nor 'False': " + s)
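
As a quick illustration of `split`: when called with no arguments, it splits on any run of whitespace and discards leading and trailing whitespace, including the newline at the end of a line read from a file. The example line below is made up for this sketch.

```python
line = "False   True\tFalse True\n"

# With no arguments, split() treats any run of spaces, tabs, or newlines as one separator
tokens = line.split()
print(tokens)

# Each token is still a string; it must be converted before use as a boolean
print([token == "True" for token in tokens])
```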

In [51]:
def read_lacZ_full_circuit_inputs(filename):
    ### YOUR CODE HERE
    # Open the file in a with-block so that it is closed even if parsing fails
    with open(filename) as f:
        # Each line becomes one list of booleans, in file order
        return [[str_to_bool(token) for token in line.split()] for line in f]

In [52]:
# tests for read_lacZ_full_circuit_inputs
assert read_lacZ_full_circuit_inputs("lac_network_inputs.txt") == [[False, False, False, False],
                                                                   [False, False, False, True],
                                                                   [False, False, True, False],
                                                                   [False, False, True, True]]
print("SUCCESS: read_lacZ_full_circuit_inputs passed all tests!")

SUCCESS: read_lacZ_full_circuit_inputs passed all tests!


### PROBLEM 10 - calling functions with arguments in a list (1 POINT)
Finally, we will practice calling a function that takes multiple arguments when the arguments are packed within a single list or tuple.  For this we will use Python's [unpacking](https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists) syntax.

For example, suppose we have a number of words in a list, and we want to use the `print` function to print all of the words on a single line, separated by a single space character.  We can use unpacking to make this easy.

In [53]:
some_words = ["Here", "are", "some", "words"]

In [54]:
print(some_words)

['Here', 'are', 'some', 'words']


In [55]:
print(*some_words)

Here are some words


Notice the difference in the output from the two `print` calls above.  In the first call, `print` is given a single argument, a list of strings.  In the second call, `print` is given multiple arguments, each an element from the list `some_words`.
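
The same syntax works for any function, not just `print`. In the sketch below, `describe` is a hypothetical three-parameter function invented for this example, and the values in `record` are made up; unpacking a three-element list supplies the three arguments positionally.

```python
def describe(name, length, organism):
    return f"{name}: {length} bases ({organism})"

record = ["toy_gene", 12, "E. coli"]

# Equivalent to describe(record[0], record[1], record[2])
print(describe(*record))

# Passing the list itself supplies only one argument and raises a TypeError
try:
    describe(record)
except TypeError as err:
    print("TypeError:", err)
```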

Use this argument unpacking technique to write a function that calls `lacZ_full_circuit` on each of a list of input value lists, such as is returned by the `read_lacZ_full_circuit_inputs` function you defined in the previous problem.  The function should return a list of the corresponding outputs from the `lacZ_full_circuit` function.

In [81]:
def call_lacZ_full_circuit_on_list(inputs_list):
    ### YOUR CODE HERE
    # Unpack each inner list into the four arguments of lacZ_full_circuit
    return [lacZ_full_circuit(*inputs) for inputs in inputs_list]


In [82]:
# tests for call_lacZ_full_circuit_on_list
test1_list = [[False, False, False, False],
              [False, False, False, True],
              [False, False, True, False],
              [False, False, True, True]]
assert call_lacZ_full_circuit_on_list(test1_list) == ['low', 'high', 'low', 'low'], "Failed on test1_list"
test2_list = [[False, False, False, True]]
assert call_lacZ_full_circuit_on_list(test2_list) == ['high'], "Failed on test2_list"
print("SUCCESS: call_lacZ_full_circuit_on_list passed all tests!")

SUCCESS: call_lacZ_full_circuit_on_list passed all tests!


## Submitting your notebook

Congratulations, you have reached the end of this notebook!  To double check that you have completed all of the problems correctly, you are encouraged to restart Python and run all of the cells in the notebook from beginning to end.  You can do so automatically by running the command "Restart Kernel and Run All Cells" from the "Kernel" menu.  After running this command, look through your notebook and check that all of the tests ran successfully.

To submit your work, click on the "Submit" button in the upper right corner.  You may submit as many times as you wish and your final grade will be based on your most recent submission.  After you submit, a grade report will become available telling you how many points you received on each problem in the notebook.