# An Introduction to Dictionaries and Functions in Python

In this background section, we will cover:
1. The central dogma of biology 
2. An introduction to dictionaries in Python 
3. An introduction to functions 
4. A video putting everything together

If you're already familiar with any of the concepts covered in 1-3, feel free to skip straight to the video in section 4. 

## 1. Central Dogma 
--------
The central dogma of biology describes information flow in biological systems. It can be described by the following: DNA makes RNA, and RNA makes protein. <br>

Watch the video below to get an overview of this process: 

In [1]:
# Imports the functionality that we need to display YouTube videos in a Jupyter notebook.  
# You need to run this cell before you run ANY of the YouTube videos.
from IPython.display import YouTubeVideo
YouTubeVideo("itsb2SqR-R0",width=640,height=360)

#### If you need more background on the central dogma, please see the following resources: 
* [Khan Academy introduction to central dogma](https://www.khanacademy.org/science/high-school-biology/hs-molecular-genetics/hs-rna-and-protein-synthesis/a/intro-to-gene-expression-central-dogma)
* [Khan Academy MCAT review of central dogma](https://www.khanacademy.org/test-prep/mcat/biomolecules/amino-acids-and-proteins1/v/central-dogma-of-molecular-biology-2)

## 2. Dictionaries
-----
### What is a dictionary?
Imagine an English-language dictionary. If we want to know the meaning of the word "science", we can look up that word in the dictionary. The word, "science", is a *key*, and its definition is a *value*; together they form a *key:value pair*. Python dictionaries are just like this: they store each piece of information under a key, and the key lets us look up that information (the value) whenever we need it. <br>

To continue with our example, we can even make a small English-language dictionary in Python, using just the first few words of each definition for visual clarity:


In [9]:
EnglishDict = {'science':'the intellectual and practical activity...','humanities':'learning or literature...'}

If I want to look up words in my dictionary, I can:

In [10]:
EnglishDict['science']

'the intellectual and practical activity...'

### Why use a dictionary?
In order to illustrate why we would use a dictionary, let's think about the differences between dictionaries and what we've learned in previous classes: lists. Before today, if we wanted to put a bunch of values into one object to reference later, we put them in a list. Lists are indexable: that is, each element is associated with a number, from `0` to `n-1`, where `n` is the length of the list. If we want to access an element of the list, we either have to know its numerical index, or iterate over the list until we find it. <br>

Keeping that in mind, let's talk about the two benefits of dictionaries: **keys** and **speed**. <br>

As is probably evident from the above example, there are clear instances in which a dictionary's use of keys makes a dictionary the right choice to solve a given problem. If we have identifiers that refer to some data, a dictionary very intuitively helps us keep the identifiers and data paired together correctly. In a list, we can only use integers as identifiers, and this isn't helpful if there's some meaning to the identifier, like how a word refers to a definition. If we tried to encode our English dictionary as a list, for example, we wouldn't be able to look up definitions by words, nor know which definitions go with which word: <br>



In [11]:
englishList = ['the intellectual and practical activity...','learning or literature...']

However, there is another important benefit to dictionaries that isn't immediately obvious: looking up an item by its key in a dictionary is much faster than searching for an item in a list. This benefit is due to how Python stores dictionaries. In a list, if we know the numerical index of an item, we can retrieve it directly; but if we only know the value we are looking for, we have to iterate through the list until we find it. This means that, if the time it takes to look at one element in a list is 1 arbitrary time unit, it can take up to `n` time units to find an item in a list of length `n`. For example, if the item we want happens to be the 6th element in the list, we will spend 6 units of time looking for it. In a dictionary, on the other hand, we can directly look up the element we want by its key, without having to go through all the other elements. No matter where the element is in the dictionary, it will take roughly 1 time unit to find it. For the curious, the implementation in Python that allows this is explained [here](http://www.jessicayung.com/how-python-implements-dictionaries/). From Python 3.6 onwards, dictionaries also use a more compact in-memory representation than in earlier versions, although a dictionary still generally takes more memory than a list of the same values. Therefore, it is sometimes useful to use a dictionary even when it's not obvious why a list wouldn't suffice. This comes in handy when working with very large datasets, where there are thousands or more elements to search through. Just something to keep in mind if your code is running super slow! <br>
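As a rough illustration of the speed difference, the sketch below times a search by value in a list against a lookup by key in a dictionary, using the standard-library `timeit` module. The names `words`, `word_list`, and `word_dict` and the made-up data are assumptions for this example only, and the exact timings will vary from machine to machine.

```python
import timeit

# Hypothetical data: 100,000 made-up "words" with placeholder definitions.
n = 100_000
words = ['word{}'.format(i) for i in range(n)]
word_list = words                                      # a plain list of words
word_dict = {w: 'definition of ' + w for w in words}   # word -> definition

target = 'word{}'.format(n - 1)  # the last word: the worst case for a list scan

# Membership test: the list checks elements one by one, the dict goes straight to the key.
list_time = timeit.timeit(lambda: target in word_list, number=50)
dict_time = timeit.timeit(lambda: target in word_dict, number=50)

print('list search: {:.5f} s'.format(list_time))
print('dict lookup: {:.5f} s'.format(dict_time))
```

On most machines the dictionary lookup should come out far faster, and the gap should widen as `n` grows.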

### How to use a dictionary
Making a dictionary in Python is as simple as the following line of code:

    myDict = {'key':'value'}
    
The great thing about dictionaries is that the value can be anything: a string, int, float, or *mutable* (changeable) data types such as lists, or even other dictionaries! Really the only rule about dictionaries is that the keys have to be *immutable*: you can't use a list as your key, but you can use a string, because you can change the elements of a list, but you can't change the characters in a string. And this makes sense -- because if you could change the keys in your dict, how could you know what they are in order to look them up? <br>
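To make the rule about keys concrete, here is a small sketch showing which kinds of keys Python accepts. The dictionary `example_keys` and its contents are invented for illustration.

```python
# Strings, numbers, and tuples are immutable, so they can be dictionary keys.
example_keys = {'ATG': 'a string key', 42: 'an int key', ('A', 'T', 'G'): 'a tuple key'}
print(example_keys[('A', 'T', 'G')])

# A list is mutable, so Python refuses to use it as a key.
error_message = None
try:
    bad_dict = {['A', 'T', 'G']: 'a list key'}
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```

If you need a sequence as a key, converting the list to a tuple with `tuple(my_list)` is one common workaround.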

How do we use dictionaries we've made? Some common operations, using our English example: <br>

**Looking up a value by its key:** Two methods that give the same result when the key exists <br>

    

In [12]:
EnglishDict['humanities']

'learning or literature...'

In [13]:
EnglishDict.get('humanities')

'learning or literature...'
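The two lookup methods only differ when the key is missing. The sketch below rebuilds a small dictionary (named `english_dict` here so it stands on its own) and looks up a word that is not in it.

```python
english_dict = {'science': 'the intellectual and practical activity...',
                'humanities': 'learning or literature...'}

# .get() returns None (or a default that we choose) when the key is missing.
print(english_dict.get('art'))
print(english_dict.get('art', 'no definition found'))

# Square brackets raise a KeyError for a missing key.
try:
    english_dict['art']
except KeyError:
    print('KeyError: "art" is not in the dictionary')
```

Square brackets are a reasonable choice when a missing key means something has gone wrong; `.get()` is handy when a missing key is expected and a fallback value is acceptable.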

**Getting a list of all keys:**

In [14]:
list(EnglishDict.keys())

['science', 'humanities']

**Add a new key:value pair:**

In [15]:
EnglishDict['engineering'] = 'concerned with the design, building...'
EnglishDict

{'science': 'the intellectual and practical activity...',
 'humanities': 'learning or literature...',
 'engineering': 'concerned with the design, building...'}

## 3. Functions
_____
### What is a function?
In programming, a function allows us to run the same block of code over and over on different inputs. A function consists of an organized block of code that performs some modular (meaning it works independently) task, which we can then apply in different sections of our code. Functions should also have a short piece of documentation within their definition, called a *docstring*, that allows users to understand exactly what the function does and how it is used. <br>

We use the statement `return` within the body of a function to pass a value out of the function once we've produced it; if we don't return the value, it will disappear forever once the function is done running!<br>



In [10]:
myList = [1,2,3,4]

def returnThirdElement(alist):
    """
    Returns the third element in a list.
    
    parameters:
    alist: a list of length three or greater
    """
    return alist[2]

thirdElmt = returnThirdElement(myList)
print('thirdElmt: {}'.format(thirdElmt))
print('myList: {}'.format(myList))

thirdElmt: 3
myList: [1, 2, 3, 4]


This is an example of a custom Python function. We can define functions to do anything we want! This will be discussed in the section **How to use a function**.
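To see what happens when `return` is left out, compare the two small functions below. Both names (`add_one_no_return` and `add_one`) are made up for this sketch.

```python
def add_one_no_return(x):
    """Adds 1 to x but forgets to return the result."""
    result = x + 1  # result only exists inside the function


def add_one(x):
    """Adds 1 to x and returns the result."""
    result = x + 1
    return result


print(add_one_no_return(3))  # None
print(add_one(3))            # 4
```

A function that never reaches a `return` statement hands back the special value `None`, which is why the first call prints `None`.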

### Why use a function?
Why bother with functions? It might seem counterintuitive to define functions for such simple tasks that only take a few lines of code to execute. But what if I want to perform the same task over and over on different input data? What if I want to execute a procedure that's 15 lines of code instead of 5? That is what functions are perfect for: avoiding repetitive code. We often perform the same procedures multiple times in a given script, and functions allow us to reduce the repetition in our code, making it easier to read and edit, and helping us avoid copy-and-paste errors, since we only have to write the same section of code one time. Here are some examples of these benefits:<br>

**Ease of editing:** Imagine that I want to do the procedure in `returnThirdElement` multiple times, but I haven't written a function. My code might look something like this:

    firstList = [1,2,3,4]
    secondList = [5,6,7,8]
    thirdList = [10,11,12,13]
    
    thirdEltFirst = firstList[2] 
    thirdEltSecond = secondList[2]
    thirdEltThird = thirdList[2] 
    
But what if I decide I want the first element, not the third? Now I have to manually change every instance of \[2\] in my code: 

    firstList = [1,2,3,4]
    secondList = [5,6,7,8]
    thirdList = [10,11,12,13]
    
    thirdEltFirst = firstList[0] 
    thirdEltSecond = secondList[0]
    thirdEltThird = thirdList[0]
    
Whereas if I had used my function `returnThirdElement` the switch would have been as simple as changing \[2\] to \[0\] in the body of my custom function one time, and changing the function name to make sure it still describes what the function does: 

    def returnThirdElement(alist):
        """
        Returns the third element in a list.

        parameters:
        alist: a list of length three or greater
        """
        return alist[2]

Changes to: 

    def returnFirstElement(alist):
        """
        Returns the first element in a list.

        parameters:
        alist: a list of length one or greater
        """
        return alist[0]
        
This might seem trivial in this example, because I only repeated myself three times; but as you write more complex code and start to repeat the same procedures over and over, only needing to change the value one time becomes much more useful. Another example would be if I used the same argument (a value that I pass to the function) multiple times within the function: I would only have to change the value of the argument once, in the function call, instead of at every place it appears in the code.
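One further option, not used in the examples above, is to turn the index itself into a parameter so that the function never needs to be edited or renamed. The function `returnElement` below is a hypothetical variation on `returnThirdElement`.

```python
def returnElement(alist, index=2):
    """
    Returns one element of a list.

    parameters:
    alist: a list long enough to contain the requested index
    index: int, position of the element to return. default = 2 (third element)
    """
    return alist[index]


firstList = [1, 2, 3, 4]
print(returnElement(firstList))           # 3
print(returnElement(firstList, index=0))  # 1
```

With this design, switching from the third element to the first is a change to the function call rather than to the function body.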

**Avoiding copy-paste errors:** Imagine I wanted to get the third elements of ten or twenty lists, instead of 3. The more I copy-paste and change variable names, the more likely I am to make an error and not notice: 

    firstList = [1,2,3,4]
    secondList = [5,6,7,8]
    thirdList = [10,11,12,13]
    fourthList = [1,2,3,4]
    fifthList = [5,6,7,8]
    sixthList = [10,11,12,13]
    seventhList = [1,2,3,4]
    eighthList = [5,6,7,8]
    ninthList = [10,11,12,13]
    tenthtList = [1,2,3,4]
    
    thirdEltFirst = firstList[2] 
    thirdEltSecond = secondList[2] 
    thirdEltThird = thirdList[2]
    thirdEltFourth = ourthList[2] 
    thirdEltFift = fifthList[2] 
    thirdEltSixth = thirdList[2] 
    thirdEltSeventh = sevethList[2]
    thirdEltEighth = eighthList[2]
    thirdEltNinth = ninthList[2
    thirdEltTenth = tenthList[2]

Can you spot the copy-paste and typing errors? If I had used a function, my code might look like this:

    lists = [firstList, secondList, thirdList, fourthList, fifthList, sixthList, seventhList, eighthList, ninthList, tenthtList]
    
    thirdEltsLists = []
    for alist in lists:
        thirdElt = returnThirdElement(alist)
        thirdEltsLists.append(thirdElt)

### How to use a function
There are two parts to using a custom function. The first is defining it, and the second is applying it. <br>

#### 1. Defining a custom function
Defining a function is as simple as the following: 


In [14]:
def myFunc(x,y=5):
    """
    this function does <description here>

    parameters:
    x: some parameter
    y: some other parameter. default = 5
    """
    # my code here


There are three elements to a function definition: the header, the docstring, and the body. <br>

Let's return to our function `returnThirdElement` to look closer at the anatomy of a function. <br>

![title](img/functionDiagram.png)


The **header**: the header is `def` followed by the name of your function, its parameters, and a colon. In parentheses are the generic variables representing the inputs (arguments) your function will accept. You can create default values for parameters by using an `=`. <br>

The **docstring**: this is one of the most important aspects of coding: making your code readable and reusable, not only by other people, but also by you! We all think we'll remember exactly what it was that we were trying to do, but a few months later it's as if we're looking at a stranger's code. A docstring is the fastest way to remind yourself what you wanted this function to do. It should contain information about what the function does or returns, and what the parameters are: what type of object they are, and what rules they have to follow, enclosed in triple quotes. Remember to write a docstring for *all* your custom functions, even if they seem simple! <br>

And last but not least, the **function body**. This is where you put the lines of code you want this function to run. Make sure you use the same variable names that you provided in the function header. 

#### 2. Using your custom function
**Calling your function**. Here are several valid ways to use the example function we just defined.
    
    x = 7
    y = 9
    myFunc(x,y)
    
    myFunc(x)
    
    myFunc(7,9)
    
    myFunc(7)
    
    myFunc(x=7,y=9)
    
Since y has a default, we don't have to pass y to the function if we don't want or need to. If we do pass a value for y, it will override the default value. However, since x doesn't have a default, we have to pass a value for x; otherwise, Python will raise a `TypeError`. Lastly, if we want to specify the arguments with keywords, as in `myFunc(x=7,y=9)`, then once we specify one argument with a keyword, all subsequent arguments must also be specified with keywords. `myFunc(x=7,9)` is a `SyntaxError`. <br>
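Because `myFunc` has an empty body, it is hard to see how its arguments were bound. The sketch below uses a hypothetical stand-in, `describe_args`, with the same header shape `(x, y=5)`, that simply returns what it received.

```python
def describe_args(x, y=5):
    """
    Returns the arguments it received, so we can see how they were bound.

    parameters:
    x: any value (required)
    y: any value. default = 5
    """
    return (x, y)


print(describe_args(7, 9))      # (7, 9)
print(describe_args(7))         # (7, 5)  y falls back to its default
print(describe_args(y=9, x=7))  # (7, 9)  keywords may come in any order

# Leaving out x raises a TypeError because x has no default.
try:
    describe_args()
except TypeError as err:
    print('TypeError:', err)
```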

**Getting help for functions**. Oftentimes, we use other people's functions, so we don't necessarily know what all the arguments are, or what exactly the function does or returns. If we want to quickly see the docstring written in the function, we can use Python's built-in `help()` function. This prints the docstring. For example, if we wanted to know what `returnThirdElement` does:

In [19]:
help(returnThirdElement)

Help on function returnThirdElement in module __main__:

returnThirdElement(alist)
    Returns the third element in a list.
    
    parameters:
    alist: a list of length three or greater



This is why it's so important to write good docstrings! This allows everyone who uses this function to quickly get info on how to use it. Additionally, if you're using a function from a widely-used Python module, you can also google the function to find the more extensive documentation. 

**One last point:** *We can use a function to return multiple values.* <br>
If we return a list or a tuple (like a list, but immutable, see [this page](https://www.tutorialspoint.com/python/python_tuples.htm) for a brief explanation), we can "catch" that output in one of two ways. <br> 

Imagine we have the following function:

    def my_func(some_input):
        """
        Adds 1, 2, and 3 to the input.
        
        parameters:
            some_input: int, some input
            
        returns: a tuple with three elements
        """
        return (some_input+1, some_input+2, some_input+3)

1. We can assign the tuple or list to one variable
   
       output = my_func(3)
   
   This will result in `output` being a tuple with three elements, `(4,5,6)`.
   
   
2. We can assign the elements of the tuple or list to individual variables

       output1, output2, output3 = my_func(3)
   
   This will result in 3 variables whose values are the integers from the returned tuple: `output1 = 4`, `output2 = 5`, `output3 = 6`.
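Both ways of catching the output can be tried in a single runnable sketch. The last part, which shows what happens when the number of variables does not match the number of returned values, is an addition for illustration.

```python
def my_func(some_input):
    """Adds 1, 2, and 3 to the input and returns the results as a tuple."""
    return (some_input + 1, some_input + 2, some_input + 3)


output = my_func(3)
print(output)     # (4, 5, 6)
print(output[0])  # 4  (tuples are indexed like lists)

output1, output2, output3 = my_func(3)
print(output1 + output2 + output3)  # 15

# The number of variables must match the number of returned values.
try:
    a, b = my_func(3)
except ValueError as err:
    print('ValueError:', err)
```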
   

## 4. Putting it together
--------
Watch the video below from the beginning to ~22 minutes, and pay attention to the following:

* How to create a dictionary
* How to retrieve dictionary values using a key
* How to use the `.get()` method
* How to define a function using `def`
* How to specify and use argument defaults
* How to change the order of arguments when using a function
* Function outputs
* How to use `return` to store variables

We will be using the genetic code as an example of a dictionary, and defining custom functions to translate the coding sequence of GFP (Green Fluorescent Protein) and Rubisco (Ribulose-1,5-bisphosphate carboxylase/oxygenase) into proteins. Pay attention to how the central dogma is demonstrated in this example!

Execute the cell below to have this data and follow along in the video.

In [1]:
genetic_code = {'TTT':'F','TTC':'F','TTA':'L','TTG':'L','CTT':'L','CTC':'L','CTA':'L','CTG':'L',
                'ATT':'I','ATC':'I','ATA':'I','ATG':'M','GTT':'V','GTC':'V','GTA':'V','GTG':'V',
                'TCT':'S','TCC':'S','TCA':'S','TCG':'S','CCT':'P','CCC':'P','CCA':'P','CCG':'P',
                'ACT':'T','ACC':'T','ACA':'T','ACG':'T','GCT':'A','GCC':'A','GCA':'A','GCG':'A',
                'TAT':'Y','TAC':'Y','TAA':'-','TAG':'-','CAT':'H','CAC':'H','CAA':'Q','CAG':'Q',
                'AAT':'N','AAC':'N','AAA':'K','AAG':'K','GAT':'D','GAC':'D','GAA':'E','GAG':'E',
                'TGT':'C','TGC':'C','TGA':'-','TGG':'W','CGT':'R','CGC':'R','CGA':'R','CGG':'R',
                'AGT':'S','AGC':'S','AGA':'R','AGG':'R','GGT':'G','GGC':'G','GGA':'G','GGG':'G'}

gfp = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCTGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA"
rubisco = "ATGGCGAGCACCTTTAGCGCGACCACCAGCAGCTGCAACCTGAGCAGCAGCGCGGCGATTAGCAGCTTTCCGCTGGCGGCGGGCAAACGCAACGCGAACAAAGTGGTGCTGCCGCGCAAAAACCGCAACGTGAAAGTGAGCGCGATGGCGAAAGAACTGCATTTTAACAAAGATGGCAGCGCGATTAAAAAACTGCAGAACGGCGTGAACAAACTGGCGGATCTGGTGGGCGTGACCCTGGGCCCGAAAGGCCGCAACGTGGTGCTGGAAAGCAAATATGGCAGCCCGAAAATTGTGAACGATGGCGTGACCGTGGCGAAAGAAGTGGAACTGGAAGATCCGGTGGAAAACATTGGCGCGAAACTGGTGCGCCAGGCGGCGGCGAAAACCAACGATCTGGCGGGCGATGGCACCACCACCAGCGTGGTGCTGGCGCAGGGCCTGATTGCGGAAGGCGTGAAAGTGGTGGCGGCGGGCGCGAACCCGGTGCTGATTACCCGCGGCATTGAAAAAACCAGCAAAGCGCTGGTGGCGGAACTGAAAAAAATGAGCAAAGAAGTGGAAGATAGCGAACTGGCGGATGTGGCGGCGGTGAGCGCGGGCAACAACCATGAAGTGGGCAACATGATTGCGGAAGCGCTGAGCAAAGTGGGCCGCAAAGGCGTGGTGACCCTGGAAGAAGGCAAAAGCGCGGAAAACAGCCTGTATGTGGTGGAAGGCATGCAGTTTGATCGCGGCTATATTAGCCCGTATTTTGTGACCGATAGCGAAAAAATGACCGTGGAATTTGAAAACTGCAAACTGCTGCTGGTGGATAAAAAAATTACCAACGCGCGCGATCTGATTAACATTCTGGAAGATGCGATTCGCAGCGGCTTTCCGATTGTGATTATTGCGGAAGATATTGAACAGGAAGCGCTGGCGACCCTGGTGGTGAACAAACTGCGCGGCAGCCTGAAAATTGCGGCGCTGAAAGCGCCGGGCTTTGGCGAACGCAAAAGCCAGTATCTGGATGATATTGCGATTCTGACCGGCGGCACCGTGATTCGCGAAGAAGTGGGCCTGACCCTGGATAAAGCGGATAAAGAAGTGCTGGGCAACGCGGCGAAAGTGGTGCTGACCAAAGATACCACCACCATTGTGGGCGATGGCAGCACCCAGGAAGCGGTGAACAAACGCGTGAGCCAGATTAAAAACCAGATTGAAGCGGCGGAACAGGAATATGAAAAAGAAAAACTGAGCGAACGCATTGCGAAACTGAGCGGCGGCGTGGCGGTGATTCAGGTGGGCGCGCAGACCGAAACCGAACTGAAAGAAAAAAAACTGCGCGTGGAAGATGCGCTGAACGCGACCAAAGCGGCGGTGGAAGAAGGCATTGTGGTGGGCGGCGGCTGCACCCTGCTGCGCCTGGCGAGCAAAGTGGATGCGATTAAAGATACCCTGGCGAACGATGAAGAAAAAGTGGGCGCGGATATTGTGAAACGCGCGCTGAGCTATCCGCTGAAACTGATTGCGAAAAACGCGGGCGTGAACGGCAGCGTGGTGAGCGAAAAAGTGCTGAGCAGCGATAACCCGAAATATGGCTATAACGCGGCGACCGGCAAATATGAAGATCTGATGGCGGCGGGCATTATTGATCCGACCAAAGTGGTGCGCTGCTGCCTGGAACATGCGAGCAGCGTGGCGAAAACCTTTCTGATGAGCGATTGCGTGGTGGTGGAAATTAAAGAACCGGAAAGCGCGCCGGTGGGCAACCCGATGGATAACAGCGGCTATGGCAACATT"
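As a preview of how a dictionary and a function can work together here, the sketch below translates a short DNA string one codon at a time. The function name `translate_sketch` and the five-codon table `mini_code` are assumptions made for this example (the codon assignments are copied from `genetic_code` above); the video may structure its functions differently.

```python
def translate_sketch(dna, codon_table):
    """
    Translates a DNA coding sequence into a protein string, codon by codon.

    parameters:
    dna: str, coding sequence whose length is a multiple of 3
    codon_table: dict mapping 3-letter codons to 1-letter amino acids

    returns: str, the protein sequence
    """
    protein = ''
    for i in range(0, len(dna) - 2, 3):  # step through the sequence 3 bases at a time
        codon = dna[i:i + 3]
        protein += codon_table[codon]    # dictionary lookup: codon -> amino acid
    return protein


# A tiny illustrative table covering only the codons used below.
mini_code = {'ATG': 'M', 'GTG': 'V', 'AGC': 'S', 'AAG': 'K', 'TAA': '-'}
print(translate_sketch('ATGGTGAGCAAGTAA', mini_code))  # MVSK-
```

The same function should work on `gfp` or `rubisco` if `genetic_code` is passed as the second argument, since that dictionary covers all 64 codons.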

In [3]:
# Watch this video to learn about functions, dictionaries, and L-systems
YouTubeVideo("HIxmLFrVBYQ",width=640,height=360)

# Introduction to Dictionaries and Functions: Putting it into practice

In the interactive part of today's lesson, we will do the following exercises: 
1. Introductory exercises for dictionaries
2. Introductory exercises for functions
3. An exercise that combines dictionaries and functions
4. A challenge problem

## 1. Introduction to Dictionaries

Let's start with a simple example of a dictionary. Let's make a dictionary of identifying information that is specific to you. Make a dictionary named "student_dict" containing the keys "name", "age", "birthday", and "netID", each with the value that applies to you.

In [1]:
# write your code here

Now to your student_dict, add a new key called "ice_cream" and add your favorite flavor.

In [3]:
# write your code here

We can now access specific information within this dictionary based on the key itself. Hopefully, you can see how the key:value functionality of a dictionary comes in handy and is intuitive to use in this situation.

Now print your netID using the dictionary key : value system.

In [1]:
# write your code here

As dictionaries grow in complexity, it may be helpful to see all of the keys. Get a list of all keys.

In [2]:
# write your code here 

Lastly, use a for loop to iterate through each of your keys and print its associated value.

In [3]:
# write your code here

## 2. Introduction to Functions

The most important use of functions is to perform the same tasks over and over with just one line of code, in order to reduce repetitiveness in our scripts. Imagine that we want to calculate the area of multiple circles. Instead of typing out the calculation over and over, we can define a function that calculates the area of a circle:

In [11]:
import math # we need a function from the math module inside our custom function

def calculate_circle_area(r):
    """
    Calculates the area of a circle.
    
    parameters:
        r: int or float, radius of the circle
    
    returns: area of the circle
    """
    circle_area = r**2 * math.pi
    return circle_area


Now we can use our function to find the area of some circles. Demonstrated below are some ways you can call your function. You can input a number directly into the function, specify with a keyword argument, or pass a variable.

In [13]:
function_example_1 = calculate_circle_area(7) # pass a value directly 
print("Area of circle 1: ", function_example_1)

function_example_2 = calculate_circle_area(r=4) # pass a value using keyword argument
print("Area of circle 2: ", function_example_2)

test_radius = 42 
function_example_3 = calculate_circle_area(test_radius) # pass a variable
print("Area of circle 3: ",function_example_3)

Area of circle 1:  153.93804002589985
Area of circle 2:  50.26548245743669
Area of circle 3:  5541.769440932395


**Your turn!** Using the function above, `calculate_circle_area`, calculate the area of a circle with a radius of your choice.

In [4]:
# write your code here

**Your turn!** Now, write a function that performs a set of statistical operations (for example, calculates mean, min, max) on a list of numbers (float, int).

*Some info you might want to know to answer this question:* <br>

**Remember!** Functions can have multiple outputs. <br>

**Python has the following built-in functions:** `sum()`, `len()`, `min()`, and `max()`. Try using `help()` to get information on how to use them!
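For a quick feel for these built-ins before writing your own function, here they are applied to a small made-up list (`numbers` is just an example name):

```python
numbers = [2, 4, 6, 8]

print(sum(numbers))  # 20
print(len(numbers))  # 4
print(min(numbers))  # 2
print(max(numbers))  # 8
```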

In [17]:
# write your code here

Now call the function you just wrote and have it calculate and return the mean, minimum and maximum for the test data provided.

##### Hint:
* Mean = 5.8888888888889
* Minimum = 1
* Maximum = 12

In [5]:
# list of test data
data = [1,2,3,5,6,7,8,9,12]

# write your code here

## 3. Synthesis Questions: Dictionaries and Functions


#### PAPER CODE THIS FIRST

Genes within the DNA of eukaryotic organisms (e.g., plants and animals) are structured so that multiple fragments of the gene (exons) are stitched together to produce the sequence that codes for the protein (pictured below). This modular structure gives the organism the advantage that one gene can have multiple possible protein sequence outputs (isoforms), which may perform different tasks within the cell. For example, if a gene has 3 exons that can code for a protein, we could have the following possible gene isoforms:

            1--2--3     ---
            1--2           |
            1--3           |
            2--3            }--- Isoforms: Structural variants of the same gene
            1              |
            2              |
            3           ---

These 7 variants of the same gene are products of *alternative splicing*, in which some exons are [included or excluded](https://en.wikipedia.org/wiki/Alternative_splicing) to make different protein products from the same gene sequence.

Exons all contain sequence that is read by the ribosome to produce a protein, and the different combinations of exons are what produce isoforms. Introns represent dead space between exons and UTRs and do not contribute to the protein structure, but instead signal for each exon "block". Introns likely aid in the regulation of the DNA --> RNA step and provide signals necessary for the alternative splicing of isoforms.  

In [4]:
from IPython.display import Image
Image(filename='img/DNA_alternative_splicing.gif') 

<IPython.core.display.Image object>

The DNA sequences in this challenge question are from the moss *Physcomitrella patens*. You can explore more about this organism [here](https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Ppatens).

### Problem:

Your challenge is to take the dictionary listed below, which contains the exon and intron elements for one gene, and write a function that will return:

1) a sequence string for the longest possible isoform (after splicing out the introns)

2) the length of the longest isoform

3) the number of possible isoforms based on the number of exons

**HINT:** the code below may help with breaking your dictionary up by exons:
   
    search_key = 'exon'
    for key in dictionary:
        if search_key in key:
            print(key)  # placeholder: replace with your own code for each exon
       
**HINT:** the number of possible isoforms is equal to ($2^{x} - 1$), where $x$ is the total number of exons
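To see where the $2^{x} - 1$ comes from, the sketch below lists every non-empty selection of three placeholder exons using `itertools.combinations` from the standard library and compares the count with the formula. The exon labels `'1'`, `'2'`, `'3'` mirror the diagram above and are not real sequences.

```python
from itertools import combinations

exons = ['1', '2', '3']  # three placeholder exons, as in the diagram above

# Collect every non-empty selection of exons, keeping their original order.
isoforms = []
for size in range(1, len(exons) + 1):
    for combo in combinations(exons, size):
        isoforms.append('--'.join(combo))

print(isoforms)
print(len(isoforms), 2**len(exons) - 1)  # 7 7
```

Each exon is either included or excluded, which gives $2^{x}$ selections; subtracting 1 removes the empty selection with no exons at all.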


In [None]:
# Gene model dictionary for developing function
Pp3c1_80V3_1 = {'exon1':'CAAAGTCGTATGGGCACGAGGCTCGTTGAATCCACCCACCTTCAACCATGCTTGCTGAAGTCCTCCTCTGGATTCTCAGTGATGCCGCTGCCATCACAATACTGTCTTGCCATGCTGGGTGCGGTGCCACTTTGCCGCTCCCTCCTCCTGCGGGTCCTCCTCCTCCTCCTCATCGAGCACTTCCCTGCTCTCCTTCCTCGCCTCTTCAACCATGAATGGTGCAGTTGGTTTGTTGTTGTCTTCTGGGACGAAGGTAGTCGGACTGGGCGTGGTCCTTGTGATGCTTGCAGTGCAGTCGCATGCGTTGACGCCAGATGGGTTCGCTCTGCTGGAGTTTAAGCAGGGGTTGTCTGCAACGGATGGTTTGCTCTAAAGCTGGAATGCCTCTGACGCATCGCCATGCGGATGGGAAGGCATTCAGTGCACCGATTCTGGTGATGTGGATTCCATCTCAGTGAAGAGTCTCAGGGGAGATTGCAGTTCGTAAGAATTCATCGAGTCTGGGAACCTGCGAGGACAGATCCCTTCGGAGCTGGCAAACTGCACCCGACTTGAGACCCTGAACCTCATGAACAATGAGTTATCTGGAAAGCTTCCGGGCGAGCTTGGGAATCTGACTGCGTTGACCAAGTTGCTAGTCTCCCGAAACAGTTTGGAGGGAGAGATTCCAATTTCAGTTGCGGCAAGTCCGAGCTTGTCTATTTTCAACCTCAGTGAGAACCTTTTCAGTGGTAGAGTTCCCAAAGCGCTGTACAATAATCTTAATCTCCAGGTGGTAAACGTCGGAGTAAATAGATTCTCTGGCGACGTTACAGCAG',
                'intron1':'GTACAATTTTACTACTGATGGTTCCTCTTTGTGTAATTCTACGCGCACAATCTCCGCAGCATTGAAAATCAAACATAGTTTTGAGACCACTCTGCAAATCCTCACATTAGTTTTCTAAATTGTTTAAGTAGTACGAGTGATTGTATTGCAATGGAAATCATGAATAAGATTAGCTCCAAGGTTACTCCACTCGATTTCCACCACTGAGACATCTGAGACGTAACGCAG',
                'exon2':'ACCTGGAAGAAATGTCGAAGTTGCCAAACATTTGGGGCATACAGATGAACGCGAACCAATTCACTGGTTCCTTGCCACCGTCGATTGGAAACCTGTCGTCGCTGCAATATCTTGACCTCAGTTTCAACAATCTGGATGGCATCATTCCAGAGTCGATTGCCAACTGCTCCTCTCTCCAGTATCTGGTCCTATCTTCAAACAAGCTTACCGGATCAATACCGCGGACTGTCGGACAATGCTCTAATCTTGAATTTGTCAACCTAGCTCAGAACTATTTGTCAGGAGATATACCGGCAGAGATAGGAAACTGCACCAAATTGCGAGTGCTTCATTTGGGGGGCAACAAGTTCAAAGGCAAATTGAAGGTGGACTTCAGTAGAGTTACATCATCCAACTTAATCCTAGGGATCTCCAACAACTCGTTTATAGGTGATATAAACTTCTTCGAAAGTATAGCCACGAATCCTAATTTCACGATTGTGAGCGCATGCTTGAACAACCTCACTGGGACGATTCCTACGAACTATGATGTGAAAAGGTTATCGAAGCTTCAGGTGCTCATGTTAGGATATAACAAGCTGGAAGGTAAGGTTCCGGAGTGGATGTGGGAGTTGCCTAGTCTACAGGTATTGGACCTCTCGAATAACAAATTGAGCGGGCCGGTGACAAGCAGCAGCAACTTTACACTACTGAATGGTTTCATCCACAAAAATGTGAAGACAGTGCCGTACAACTGCCATAAACTGGATTCTTACTGCGCCTACGGTTTCGATTTCTACCTCAATGATCGAAAATTTGAAGTGTCGATGAGCTACTTGACATACTTCAAATATCTCGACATCTCCTGTAACCAATTTAGCGGCATCATTCCACCCAGCATAGGCAAGCTTACCAACCTATCTTATTTAAATCTCTCAAATAATGCATTTACAGGTGTGATTCCCGCAGCAATGGGAAGAATCTTCAACCTGCAGTCCTTCGACGTGTCTCACAACCTTCTCACAGGACCCATTCCACAAGAATTCGCCGGCCTCAGTCAGTTGGCGGATCTAAAGATGGGAAACAATTCCCTTTCTGGCCCCATTCCTAGGAGTATCCAGTTGCAGTCATTCTCAGTTGACAGCTTCCTCCCTGGCAACGACGAGCTCTGTAACGAGCCTCTTGCCAGATTATGCATCGTTTCGAAAAACGATTCCACCACTACAGCTGATCCTGTAAACTTCAACTCAGATTCCATCGAAAATTTTATTTCGGTACTTGGCTTTGTAGTCGGCTTTGTTGCCCTCGCCATTGCCATATTTGTACGTCACTATCAAAACCACCCCAAGAAAAAGGTGGTGCCTGCTGATTCCAGCCTAACACGTAATTACGACCGCTATGGAGCCTTCAAGCTTCCAGAATAGGTCATGCAGTCTAGTCAGGCTGGCTTGCAACTAGTAAGATGCGTCAGCACATTTACAAAATGGGGATTTGTGCAGGCAGACTCGGAGCCATGTTGCAGTTTTATTGTCACAAAGCTGCTTATCGAGGGTGTCCTAGGCAGCAGAGTTGTAATAGAACCTGATGATGACTAGGAGTGTGGGATGTTCTGTTAATGGTTTAATTGCAGCACCATTGGAAAGTCAAAATAGTGGCGAATTCTCCATGCAAATTCACCTGATTCTGCACCACCATGACGAGGCATTTCCTGCATCCACAATTGAAGTTGGGGTTTTAGTGGATGAACGCCCTTGAAACCATTCATTGGTTGAACAATCTGGGTGTGTGCTCAAGCTGTCCACCTTATTGTGGGTTGGCTCCCATGATCCCTGCCCACAAATGTAGTGCTTGTATTAATGAAGAGAAATTGCTAATTTTTTTCTAAAAATTCAACCTAAAATACACAATATTAAAATTTTAAGTATTACTGTTTCTAAAATACAGAAAATAATTTTATAAAATAATAAACGATGTCTACAA'
               }

# hint
# length of longest isoform is 2777 bases
# number of possible isoforms is 3

In [None]:
# Write your function here


In [None]:
# Call your function here


In [None]:
# More gene models to test the versatility of your function
Pp3c1_60V3_1 = {'exon1':'GTGTGTTTGTTGTGAGATCGAAGAAGAAGGAAAAGGTTGGGGCATGTGGAAGCAAGGAAGTTTGAGGGGATCTGAGTGAGAGAGAGAGAGTGGGTGAGTGTGAGAGAGAGTAAGTGAAGAAGGGCCCTACTATTCTGGCAGCAGCAATAGGATAGTTAGTTAGCTGCGTGGTCTGACCAGTCCGTCGTGTTGCAGGCAGGCAGGCAGGGGTTGGAGACAGACAGACAGACAGCCAGAGGGAGGGAGGGGGGTTTAGGGTTTAGGTCTTCCCCTTTCCCTTTCTCGCTCTCCCTTCTCCTCCCCTTCCTCTCCTCCAGTTTCTCCTCCCGCGTGCAAGCAAGCTCTCTCTCTCTCTCTTCGCTGCCTTCTTTGCTGCTCAACTTCTTTTCGTTCCATCTACAGCAGTTGCGTACAACTTCACATTCAGAAGCAGGCCTTTTTCTCCTATCATCACCTAAACCCTACCCCCCCCCCCCCCTTATTCCTTGCGCTCTCCCCTTTCGCTACTTTCACGACCCTCTTTTCTTTCTCTTCTTTTTTGAAATTCCATCCCTCGCTTCGGCGCAAGATTTTGACGCTCATCCATAATGTAACAACGTACACAACACTCTTCCACGACGCTACTGCACAATCATGATGGAGCTAGTGTCCCCTTCTTTGCACCTCTTCTACTCCAAGCGCTGCTTCTTCACTCTCTCGCGCTCATTCCAACATGACCCCTACCCTCCATCCTCACTCTCTCTGATCAGTCTTCTCTGCCTTCCCCCGCTTCTTATTGCCCCCGCTCACCCAGATTATG',
                'intron1':'GTAATTGGCTTCCACACCCCTCTTGTTCCCTCCTTTTCAGGCAAACAGCTGGCCTCCGCTGCTTTTCTTTGCCTCGGGTTCTGCGTTTGGACTCATTCTCACTGCCTACAGGACCGCTCTTTCGGTTTGGCTGTCCCGATGTTAGTTTCTCTCTTCCTACCTCTTGGTAGATGCACGCCCGTTTCTGTATTCCTTGTCCACATGGATTGTTAAGCTTGTTCGGACTTAACACAGGGTTTTTCTTTGGTTTTGTCTGGTTGTTGCTCCTTTTAAATTTTTATGGAATCCTTTGACGAGGACTGGTAAGTCAAATGTGAAAGTAGCTCTCCAAAATTTGATAGGAAGTGTGGGAGTGAAGGCGCGACTGCAATTATGAATGATAGATAGGTATCAAGCCAGAGACGGGGATCACAAGTCTTCTTGTCTCGTTGCTTCTTCATCCGTCTAAATTGAATGCTGTGTTGATTTAACAGCATATTTTGGTGTCAGGTTGGTTGTTTACAACCGAGTTTTCTGTTCTCGGAACATTTCTGCGTCCAATGCACTAATAATGGACTAATTCAATGGTTTATGCAG',
                'exon2':'CTGACGAAAACAGAGGCAGTTCAACTATCTCTTGTTAGAGCCGATGCGGTTCAGAGAAGGATATACATATAGAGGGAGAGGATCTGAATATGGCTACTGCTAAGGCGAAACTCCATACGGTTATGGTCAGGTTAGGTCAGAAATGCCGGTATGAAACTTGGCAGTTGCCACACTTGGGAACAGGGTCTGTTGGTGCAGGACACCAGGAAACTGCGCTTCTTGAGTTCAGCAAGGAGGGTCCTATGTTTGCATGTAAGTTACATTTGCCCGATGGTACTACCATTCAATCCAACAGTTTTCGTCGGAAAAAGGATGCCGAGCAAGATGCAGCTCTTCATGCACTTCAGAAG',
                'intron2':'GTGCGCTTTGATGATTCAACATTTTTATATTCACGATCACATGGTGGAGGCTTTTCTGGAATTGTCCTTTTCTTCTACTTTCCTGTATTTAAAGAATATTAAATCGATGTGCTTGAAGACTGTCTACTGATAGTCTCATGCAAGTTTGGAAGCATGCTCTTTTTTAGCCCTATTGTGTGCGTTTCTGTTGTCTAATATACTCGAGTTTAAATTATGCCGTCAGACACTAGCGGAAACCCGTGCCGTTGGGCACTGCAATTTCCGGCACTGCCATCCATGCATTCTCTCGTTCATCCATGTTCCCTACTTCCCATGCCAATGAGAAGACTTCTTTGTCTTTTTTTTTCTTATACGTATGTATTTTTTCCACATATTCATGGAATACTATAAAATATATAAATGGATATAATCTTACTCTCTTTGTATTCAGGTTCAGGCCGCACAGTTAAGTTTTCATTTATGACATCGACTTGGCAACCTTGCAG',
                'exon3': 'ATGGGAATACCATATGAACCAGGTGCTTTACAGGCTACTACTGCTGAATCATGGGAGGGTCTACATCGCAAAGTTTCTTTGGCCTTCACCGATCAG',
                'intron3': 'GTTTATTCTTTCTCACTATATTAAGTGAAGCGGTATTTGAAGTAGCTGTCAGGAATCAGCCTGTAGGACTAATTATTTTATAATAATTTTGAATCTTAGATCCAATTGCTTGTTTTTTGCATTTGTTCGGCTTAAGTTACTTGTGATCGATGGGTCTGATAGGGTTATCACATCGTGAACTTCAGTCATCGTATAATGGTAGGATTTTCCATTCCTACACTGCATTATTTTACATACTTCTGGATGTCTGTTTTTTTAATAATCAGAAGATTTATGCGCCCGACTTGCACGTTCGTTTGGTCTCTGCACCGTAGTAGTATTGAGATGGAGAACGGGTTGTAGAACAAGTGTGTATATCCTTTTGGTTCAAAATCATCCACCGTAGGACTAGTACTTCTGAACTTTTGTAGCACGTTTTGTACGGCATTGTAAGTGCTAGTATTTAGAGTGGAGGAAGCATTTTTTATACTTTTCACGTTTACGAATGTAAATATAACCTGACGTTGGTTTACATTTCTGTAATTATGTAAAGGCTCTACTCGTGTGATCATGTACGAAGTGGTGATTTCGTTTATTTTTATGTAGATGGTGCTGTCTTACAAGCCACTGGCGGAGCACTTTAGAGCTGCTGTGCAGCGTAAAGGTTCACGGTTTGGGCAAGTACCAGCCATAGTGCTGACAGTGTTGGATGGCAAGATTACATCGCAGTGCAAAGCAATCGACCTACTTGCGGAAAAAAATCCTGCTTATGCTGCTGCTCTTGTTTGCAGAGCAGCTGCTGCTTGCCCATCCCTCAGGCTATCTGATGATGGCCTGTGGATTGGTCGCTCTGACCCATTTTCTCCTGAACTTGTAACGAAACTTCTTACTGATCGAGAGAACCGAGTTGAATCAGTCGCGTCGGGGAATGGAGATGAGATGCATAATCTGGAAACAACGTGTTCACAGTTCAATAGAGCATTTAGATTTGAAGCAGTATATATTCCTGCATATGTTAAAGAGGAATTAAGGACAATTAGTCTGACTGCATTACCAGATGTTTATTACCTGGATACCATAGCAAATGCACTGGGTCTGCAAGATACAGGACAAGTGTTCATGTCAAGGTGCTGTTTCTTACTATAGGAGTGGGTTTATCCTTGTTTCTTGCGGGTCTGAGATATTATGAATACCGCTGTAGTAAACGTCTCTGTCTTTGTTTCTTGGCAGTTTCGAATGTGATTCGTATCGCCTCGTTGGTGCGTAGGATCTCTTAAGTCTCATTTGTATTGTGCTTTCAG',
                'exon4':'GCAGATTGGAAAGGCACCTGATGGGTCTCGCTTATTTTGGGGAGCACGAGAGAGACTTCCTGTAGTGAAGTTTCCTGATGTTGAGCTGCTCGAGCAATTTGAACCCTCAAAAATTTATCGCGACATTTGTTCCCAGGATGGTCCTATGGAGGCATTGCGAAACACTGGAGGACGGATTTCTAGAAATGAGCGAGCTAGCTTTCTTGTGGGTCATGCTATTGATGGTGATGCTATTCTTGCTACTGTTGGATCCACCTGGACATCTGATGGAAGATGCATCTATGATAACTTCACATTGAGTTGCTTTCATAG',
                'intron4':'GTATGCTTTGTCAAATATCCTTCTTGGTTCTAGAAGCAGTTCATCACTATTTTTGAAGAGAGATTCCTGGAGACCTACGTTCCCGAACATGTGATGGATTCTGCAAGTTTATTTCGATATGCAG',
                'exon5':'GTTAATGTTGGGGAGAAACCCTTGGGGTGCATACAAGCTAAGTAGACGGTCGCTGTTAGTGGCTGATCTGCCAAAGGTCTACACCTGCAGGGCACATTGGCAAGGTGCTAGCCCAAAATCTCTTTTAGCAGATTTTTGTCACCAGCATCGCCTCTCCGAAGCACAGTATACTTGTAACGATACACAGGAGTCCTGCAATAGTACCGCAGGCGGACACTCATTAGAGATAGGTCATACTAAAGGTTTAATGTCCATGAATAACAACGGTATCTCAAAGCAGGGTAATCCTGGAAGTAGTAAGCAGGGGCCTTTTCAGTGCAAAGTTCGTGTAGGCTCTGCGGGAAACAAAGCTCCCACATACTTCCAGTCAGATGGCTTTTTCCGAAGTCGCAATGATGCAATTCAGAGTGCTGCATTGAATGCTCTTTTGAGCTATGGAAGATGGTCTGGAACAGGTTGCCTTTGCAGTTACTTTCAAAACCAAGATTGTTGCAAGTCAAATGGAGATTTCTTGGGTAGTAATCCTCAAGACTCTACTGTTTACAAGTCAGATGAAAGCAGTGGGCAGTCGGAATTTTTGTCATTCAGAGTCATTGCTGAGGAAGATACTCTTGGCGATCGGCCTCCTCCAGGTTCAATGGTATTTGTGAGCTACACTGTAAACTTGATCGACGAAGGATCTTGTTGTAATGGTGATAATTCTTCTGATATGCTATTAATACATGACTTAGAGTCTCAATCAGACTTCAAATTTGAGCTTGGTGTGGGCGCTGTAATTGGCCAAATCGATGCATGTGTTAGTCAGGCCACCGTTGGCCAAACCTTGCAATTTTGTTTACCCGTGGAGGCACTAGGTGTGTTATTTGCAGCCAGCAGCGAACTTGGCGAAAATCGGCAAG',
                'intron5':'GTATGTTATCAGCATTCTCTAACCATATGCCTGTATTGTTTTTCATGCTTCTGCAAAACAGCATGTGGAGGGGTAGTTTTTCACTGTATTAGAGATATACCCTTACATTAATTGAACATATTTGCCGTATATTGTACTGTGGCTTGGTATAG',
                'exon6':'GTCTGGTACTTGAATATACTGTGAAGTTGCTTAAATTTGAAGAAGCGATGGAGGAGAGGATAGAGTCCTCACATTTTGCACCCCCTCTTTCCAAGCAGCGGATAGAGTTCGCTCGCACAATGATAAATGCCCTGGAGGCGAAGACTCTG',
                'intron6':'GTATTGCAACTAACTTAGACTTGGTGACACTTATGAGCTAAATGATGCTTGCCGTTTTTGAAAACTTGAAAAATCGCTTTCAATTTTTATTGATCATCTTTGCAACCACCTTGCATGATATGAGGCAG',
                'exon7':'GTGGATTTGGGATGTGGGTCTGGAAGCCTTCTGGAAGCCCTTTTAAGAGAACCTAATACGCTGGAATATATGATTGGCATTGACATATCTCGGAAAGCCTTGATTCGTGGGGCCAAG',
                'intron7':'GTATGCGGACCCTTGTTCACCAATTATCAGTCACTTGTAATATTCTGGATCGTAGACTATCGGGGGTCTCTAGTTGGTATGAGGCCTCTGGTATTCCTAATTTTGCATTCCTGTAACCTTTCAGCTCCTGAACGAGATCATTGCAGATTGACTGAGTGATATGGAATGACCAAGTTTTTTTTTTTTTTTTTTTTTTTTTTGTGTGTGTTGAGCATTTCAG',
                'exon8':'TCACTGAGTGCAAGTTTAGCGAAACAGAATGCTGCTCATAGCATCCAAAGTATCACTCTCTACGAGGGCTCGATCTCTGCTATGGACTTACGTCTTCGCTCCCCAGATTTGGCAACCTGCATTGAA',
                'intron8':'GTATGTTCTCAGTTCCTCTTAGTTTTAAAATTATATATATATATATATATATATATTTATACATACATATATACATAAAAAAAATCACTTTTCACATTGCTTTTGTGTCTGAATTTGTTTTGTTTGCATCCATTAATGTCCTGTCTTCCACACAAAAGAGTCTCTTTGTAAAATTTGAAAGTGGCGATCGTAGCTTCATTATGAAGTGAGCTCGGCAAATATTATCCTGCCTAACTTGAAGAATGGATGAAG',
                'exon9':'GTGGTGGAGCACATGGATCCAGAACCTTTGCGCAAATTGGGAAAAAGTATACTTGGAAAATTGGTTCCAAAAGTGTGGTTAGTGAGCACGCCAAACATCGAGTACAATCCAGTCATTCGCGGCCTGGAGTGGGACCCTGAAAGCAACAGTCTTAATAAACCCGGTCCTACGGTTTTTCCTGAACTGACCGACTCCAAGAAGCTAATGGACATGGAAACCCAGAACCTCAGAAACCATGACCATAGATTTGAGTGGACACGAGCTGAATTTCGGGAGTGGGCCTCCCTTCTTGCTTCGCAGTACGGTTATCAAGTGCGTTTCGCAGGTGTTGGTGGAGACGGGGAAGACGATGACAACAGTCCTGGCTTCGCCACACAGATTGCTGTTTTTGCACATAATGGTGTGGTCTTCCCTACATTTTGTCAGGAAGCAGGTGTTAGTAAATGCGATGGCTCAGATAGTGTGCCAACAGCTACAGAAGTGGAGATGACAGATAAGCGATTGGAAACTGAACAGGATCCTGTTTCACAGTTAAAGGAACTTTGGCAGTGGACCTCGCCTGCACACCCTGCAGCAATTTTGTAGGGAGGATGTGGTTGTCTCTATGTTTAGAGTTTAGCTTTGATCATGCTTGTTTGTTAAGAGAAGAGCCATACGCTAAAACTCGGTCCCCTCCCCAATTTTTGATCAACGGCGTCCCAAATCATTACATTAGAGTTTTTGTGCCGACTATCATCAGTGGTGGAAAAGATGGGAGTGGTTGAGAAATTTTGACTGCCTGTAAATTGTTAGCATTTTTAAACGATATTCTGAACGGTTGTAGTAAAAGACTCGAGGAAATTATGGCAGCCCCCGTTGCGGGTTTAAGAAAAGGTATTTTTTATATTAACTCCCCATCAGGTGATATAAACTCAGTGCAAATTTTCAGGATCTCGTCAGTGCTAGGTGTTAATTTTAAAACTCCATCCTCAATGAGGAATCGAAACAAATCAAACAAAATCCGAACTCATTCTCAGATATCTGCAGAGAATTTTGATATCTTGTACTGGAAAAATATCTTATATATCTCTGATACACGTTGAAATATTA'  
               }

Pp3c1_370V3_1 = {'exon1':'GAGTGAGCCCGTTTCCATGTACCCTCTGGGAACAGGCTCAGGTCGGGAAGATCAGGCTCCGTCAGAAAGTTCGTTCGTCGAGGACTGGGGAGCAGAGGGGCAAGGGAAGGGGGAGAGGAGGAGGGTGGAAGGCAAAGCGAGCGAGGGAGGGAGGGAGAGAGTGGAGAGAGTGGAGAGGGAGAGAGCCCTCCCCTCGCTCGCTCGCTCACTCACTCTCACAAGAGAGAGAGGCGATCCCCTTCGCCGATGCCAAAACCTCCCTTCCCCAACCTCCTCCTCGTGCTCTCGAATGCCTCGCATTCCCTCTCTCTCTCACTCTCTCTTTCTCTCCCTCTCCGCTCCCCTTTCCAATTTCCGCCGCTGTTCGTGACCTGCCTTCTT',
                 'intron1':'GTAAGGCGCTCTTACCCTTGCCCTAGCTTATCGCTGTGCAAGTAAGTCTGGTGGCCAGTTCCTATGCCATTCTTCGTCGCTGCTCTATACTTTTGTCGCGACAGGAATTTGCGTTGTTGAGTTTTTAGTGGAGGCTTTTTGTTGTTATTGTTGTTGGTGGTGGTGGAGGACGTTTGGGCGAGTGTTAGAATTGTGGTATTGTGAATCTTTTCGTGAGATCGAGGTTTTATAGGGAAGCTCGAGTTGGATTGGGATGGCTGTGAATGCGGGGAATTTCGAGGGGTTGTGAAGGGAAAAGATGGCTGTGTTGTTGTCGTTTTGGTGGTGTAGAGGCGTTTGAAATGCGAGAGCTGAGGATGAGGGAGGTCGCGTTGTTTGGTTTCTGACCTCGCTTCCGAAGGGCGACAGAGTGAGCAGGATTGATTGTGGATAGCTGTCTTTGGAAGAAGAGTTTTTCATAATTTTGATTCAGCACATTTGGTTGGGTCTGGTTTACGGTTGTCAGTGGAATGAGATGGTTTCGCGGGCCCTTGTTGACCTCGTCAAGTTGTTGTGCGTCACGCTAGATTTTGCGCGCATTGGTACTTGCACCGTCTGCGCGGGTCTTTAGGGCTGAGGGCAGGCAAGTGACTGGTACAGTATGTTTCAAGTCATTTCATAGCTTTTTCATTTGTTGAGTCGTTTGCTGCGTATAGAATGTTTTATTTTGTATTTTCATTCAGAAAGGAGTCGACGTTCATGACACGTCGATTTGGACGGGTGCTGATATGATTTGAGGATTTGTATTTCACAACCTCCCATTCATCTCTGTAAAAGGTGGTCATTATATTTGGCTTCCCAGCCCTAACTGCCCGAGACTTTTTCTAACCTTGATTTAGTCCGTGCCCCGATCGTATTCTTTGTGGCGCTCAGCCGATGGTGTTGGTGCCTCGTCTCCTGGTCATCTTGCACCCATCACCATGACGTGTATTCGTATCTTGAGAAATAAATTTATTTGGATTTCCACGATATCCTGCACCCTAACTGCTCCTTAACATGGTGCTCATCTCCCGTTGTTCCCGGTTTTTGGAGACGCTGCTGCAACCCTCTGGGCTTATGATATATCTTGCTTTCTACTCCTCCGCGCCTTTGTGTATACTCTTCTTCGGCATACTCTGATAACTGTCAAGGCTGATTATTGGATTACGTGTGTTACAAGGGTGCATAATTTTTTTTCAACAATAGTGGAATTTGAATTTTACGATTTCCCTTATCATTGGAGTTCCTGCCTCCTGTTTTTTGAAAGGTATCTGTGTTTGGTACCTGCCCTGACTGTGAGGGGTAACCTGATTGTTTTTGGTTCGATTGAACTACTGAAAAGTGATGATCACGGTAAAATCTTGTGTCGTCATAACTGTGAACGATGATTATTTGCTCTCAAAGATGGATTTGTCTTCTGCCTGATTCTACTTCCTTTCCTGGTGTCTGTGATGAGCGGGACGTTATTGTTGTGTTGCTCAAATTATGTGTCATCATCTTGTTTAGCATTCTACACCTGTAGTATATAATAGGGCTGTGAAATGAGGGAATCCATCAATGTCGCTGAATGACTTGCATCTGGACCTCCTGATGTTCTGCTTATGTTAGCAGTTATAATGGTCGATAATATCCAAGGTCTCGTGCAAATAGTTGTGTCGTTCATTTTCCTTTCCGCTTCATCGCTCTTGATAAGCGGTGATTCTTAGTATTTGCTTGTTTTCATCTTAAATTTTTAATTTTAAAACAATGTGATGAACATTTCAGGCTAGCTTGGGTCTTGGTGATGGACCGAGTTTATGAACGTTTTTTCTCGTGTCGTTTGCAG',
                 'exon2':'AATCCGTCTAAGTAGAGTCTGCCCGAATAAAACAACGGAGGTAAAAGTGGAGAAGAAGAGCTCCCTATAAGTGGGGATAACTGTGAGGGAGGGGAATGGACAGCGCTGTATTAGATGATATCATAGTGCGGCTGCTCGAGGTGCGGACTGGGCGGCCGGGCAAGCAAGTGCAGCTCTCGGAGGCCGAAATTCGTCAACTTTGTGTTACGTCTAAGGATATATTTTTGTCACAGCCTAACCTCCTTGAGCTGGAGGCGCCAATCAAGATTTGTG',
                 'intron2':'GTATGTCCTCATTTTACATGGCATTGTGTCTTTGCATTTTTAGACACCAGTGATCATGACGGTTCATGTAAATGCGCAATAGGAGTAAGCTTAACAGAGATGTTTGGTGTTTTGATCAATACAAGCTTGCAACTACTAGCTTTCTATCGTTGATTGATTGGGCAGTAGCTGGATTGAAATTTTTGTATACTCATGTTCTCGAATTTCTAATTTGGAATCCTGGAACAGTGTAACATTTTGCTTGAAATGTACGTTAATACCCAGTGCCAAATTAACGACTTCTAGTTTTTTATTTTATTCTATTTTATTTTATTTTAAGCACCTACTAGTTACGTGCTGTTTCTTTTCTTGAAGCTGGTCACTGACAAGTGTTGGCATGACACTGCAG',
                 'exon3':'GTGACATCCATGGGCAGTATTCCGACCTCTTGAAACTTTTTGAATATGGTGGTTTTCCACCTGAGGCCAACTACTTGTTTTTAGGGGATTATGTTGACAGAGGAAAGCAGAGTTTAGAAACTATTTGCCTTCTCTTGGCATATAAAATCAAGTACCCAGAGAACTTTTTTTTGCTAAGAGGAAACCATGAATGTGCATCAATCAACCGAATATACGGCTTCTATGATGAATGTAAACGTCGTTTCAACATACGGTTGTGGAAGACGTTTACAGATTGTTTCAACTGCCTGCCCGTGGCTGCCTTGATTGATGAAAAGATTTTATGCATGCATGGAGGCCTTTCTCCGGAACTCAAGAACTTGGACCAGATAAAAAAGATTGCTCGGCCAACAGATGTACCTGATGCAGGCTTGTTGTGTGACCTTCTATGGGCGGATCCTGACAAAGATGTATCTGGATGGGGAAACAATGACCGAGGAGTTTCCTTCACATTTGGTCCAGAAACAGTCGCAGAGTTTCTACAAAAGCATGACTTGGATCTTATCTGCCGAGCCCATCAA',
                 'intron3':'GTACTGTGATCAACTCTCAATTTAAGTATAGCTTAAACTTCGAGTTGATTATTCATGAGGATATGTTTAGGAGTGTTTATTTTTAATCCGAAAAAGTTCAGGTCCTTGCTTGAGTATTGGTGGAACCAGTGAGGAGTGCTAGATTTTTCCAGCATTGATTAATTTACTTTTTTCGTTGGGTAGGGATCGTAGATGGTTAAACCGTAATTTAGAGATCTGATAATTAATGACTTGCGTGGGATTCCGCATGCAG',
                 'exon4':'GTAGTTGAGGATGGTTATGAATTCTTCGCCAAGCGCCAGCTGGTTACTATTTTCTCTGCACCGAATTACTGTGGTGAATTTGATAATGCGGGTGCCATGATGAGTGTTGACGATACTCTCATGTGCTCATTTCAGATACTAAAACCAGCGGAAAAGAAGCAAAAGTTTGTCTCTTATGGCAACAATCCGTCACGACCTGGAACGCCTCCTCGTGCCGCCAAG',
                 'intron4':'GTCAGCAGATTTTGAAGCACGAGAAACTTACTTGATCGAGTCACTGATTGTCGCGTGTATGTAAATGTACTATGAATGCATGGAATTGTCCATTCTATGACTTCCACGTAAGATCCTGAAATCGGTGTTTCTGGGGAGGGCCAG',
                 'exon5':'GGATAATGCCAGGTGATGGTCGTGAGCTAGAGAAACCAACCTATAAACAATGAGCGCCTTTGGAGCGGAAGCTTTATACTCTTGAGCTCCGTTGACGGATGTCAGCTACTGATGTGGGCTTGCTCTTTGGAACAGCTTCATCTAGTGGGTGACCTATAAGCAGTCCGCTGCTGAGCTGGAGTTTGTGGCCAGCAGAAAGTTGAGCATGACTTCAGAGACACACATACGTATGCATTCAACGCACGCACAAACTTAGGATCTGTAGTTATTTCGGAAGCTAGCGTGGTATCCTTGGGAGGGCGGTTTCGCACATGTCTTAACTCTCTACCGTTGCTTGAGCAAGAGGTTTGTTAGGGTGGGATTGGGGCTGTTGGGGTGTGATATGTCTGTAGTTGGTGGTCTGCAGCGGAGCGATTATGCCAATCGTAGTTAATGTACATTAGGGAGCGTGAGCACCCGGTTTCAAGTCACTCCCATCTCATTTTCAGTTTAGTCAAACCTATATTGCAATTCGCAATCTGCTTGACTAGAGTTTCTGTGTCGTCGTATTGGCTTAAATGTCTGCCTCGTTGAACATGCCTTTTCTGGTCATCGGTGTGTAGGCACCCCGCTTGGATATTCTCAAATTTCATAGCCCTCTTAGGACGCACCTCTAGTGAGCATCACCCTCGCCTTCCATTTATTTGAATTTTTTTTTTTTAGAATGTGTTCTTCCAGAATCTATTGCAGATCATTATGGTATACCTCATTTTTGACGACAGATCACTGCCACTACGTCAGTGATTTTTTTATCAATTAAGGTACATATGCATTTCCGTACATATACCAGTGTTGGACCATGCAATCAAAGTTAAAACTAAGTACGAGATCGTGTGTATTGTCATTCTGGACTCGTCATCGTTAGATCTATGGCTAAACTCAACCAGACTTCTTTACACATTTTGTGTCGAATCTGTATTCTTTCATCTTGTTTTAGTACTGGTAATGTTTGACAATTCTGAAGTGGAATTCGATGTTTTGCATTC'
                }

Pp3c1_750V3_1 = {'exon1':'ATGAACTCCATTCAATTTGTGCAGGCGATCGGGGCCTGCGCTTGTTCTGACTCCTGGCGTTCGATGTGTTCGAGGGATCAGGCTATTGCTTTTTCGTTCTCGAATTTGGATGTCAAGGCTGCGTGCCTATCCTCTTCTATACGCCTGCCCAAGGCTAGGAGTTCTCGCCCCTCACCAATTTGTAATACGATTGCGCCTTGTCGTGAACCGCTTCGAACTGAAGAATCGACCTCTGCAGTGCCAGCGGCTGCAATGAAGCTTTCTTTTCTACTTGTCTGCGATGCGTTCAACTCTATGACGCAACGGACTTTCCTCAAACTACAAGAGAATGGCCATCAAGTCATCGTGCATGAATGGAAGGATGGTGATACGATGATTGCCACTGTTGATTGTTTCGAGCCCGACGTCATTCTCTGTCCTTTTCTTACAAAACGTATTCCTGAAGCCATTTATAATGGTACCGTTCCCTGTCTTGTAATTCACCCTGGAATAGAAGGCGATCGGGGCATGTCGTCAATTGACTGGGCGCTTCAAGAGTCGCAGGAAGAGTGGGGTGTGACGATCCTGCAAGCTGAAGAGGAGATGGATGCTGGACCGATTTGGGCAACCAAAAATTTTAGAATCAGTCGAGATTTTGGAGAATCACCAACAAAGTCGAGCCTCTACCGAATTGACTGCGTCGACGCCGCTATGAAGTGTCTCGACGAAGTTCTTCATAATCTCAAGTACGACATCGCGCCTAGACCCTTGGATTACACGAACCCTTGTGTTAGAGGGTCGTTGCGGCCAAGAATGATGCAGGCAGATCGCAGACTGGATTGGAATCTTCCTGCCGACGAGCTCGCTGAGATCATCCGCGCCTCGGACTCTCAGCCTGGAACGCAGTCCGTTATTGCTGGCGAGAAGTATTTGCTCTTTGGTGCCCATGTCGAAAGGAATCCGCCGAAGTTGCATCCTGGATCGGCCCCGATGGACTTACTCGGACAGCGAGACGGCGCTGTTTTGATTCGTTGCGGCGAGGGGACAACGCTGTGGATCACCCACTTGAAAAAAACTGTAAAGTCGATCAAACTACCAGCTGTGATGGTGCTGCCAACCTCACTACTGGAAACTCTTCCACGTATCGGCGACCCTGAGATGGAGATTCCAATGAGATCTTGGCCACAAACTTTTCAGGAGATTTTCTACTGGACCCAGAACGGTGTCACTTACTTGTGGTTCGATTTCTACAACGGCGCAATGAATACGGACCAATGCCGCCGTTTGGCACAAGCGCTGGACCACATCGACCAGGTTTGCGATAGCCAGGTCCTTGTCTTGATGGGAGGAATAAATTATTTCAGTAATGGGATTCACTTGAACACAATCGAAGCTGCAACCGATTCCGCCCAGGAGACTTGGGAGAACATCAACGCGATAGATGATGTCGTGCTTCGAATCCTTAACTGCAACAAGCGAGTCACTGTCTCAGCATTTCTAGGAAGTGCTGGAGCGGGTGGCGTCATGGCCGCAATTGCGGCGGATATCGTATGGGCGCACGGAAATGTGGTGCTCCACCCAAGCTACAAGGCCATGGAATTGTATGGGTCCGAGTATTGGACGTACTCCCTACCTAAACGGGTAGGTTTCAAAATCGCAGGTCAGTTCGTGAATTCCACCGACCCAGTGTCAGCATCGCAGGCCAAGAAAGTGGGTTTAATCGACGATATATTGTCAGCTTCCAGCGTTGGCTTTGTGGAGAGTGTGATTGAGCGCGCTGAGAAGCTTGCTCGAGGGCGCCAATGCGATTTGAAAATTGCAGAGAAGAAGAAACAGCTCCCACTTCTTCAAGGCCAACTCCAGCAGCATCGACATGATGAGCTTGTTCAGATGAAAAAATGCTTCGCGTCAGACGATTATAACCGGAAGCGACAAGAATTCGTTTTAAAAATGGCCGGAACCATGCCATGCCAGGTGGATCTTATGCCTGTTAAG
GCTCAAGGCGCGATGTTGAAGAGGACCGATGTTCTAATCAGCAGGTAA'}

Pp3c1_1940V3_1 = {'exon1':'CTCGTTTCCACGTGCGACCGTTGCTGGATCTTAATCTCCCCATTCACACCCCAGCTCAGATCGTTTTGCGCACAAGTGGCGCGTTGGGTGTTACGAGATCCTTGCCGTGGTAGCTCGTGGGATGAGTATTGTCACGATCTGGATGAAGTCTGAGCGTCGGGGTTGACTCATCTAGGTACTACGCTCAATCTTGCTTGCTTTATAAGTCGACCTTGTGCTTCACATAGAAATGCGTTGTTGTGTCACTCAAACCGATCTAGTTATTGAAATTTTGTGGCACGCCGGAGATATTTCTGAGTTGAGGTGGGTCTTTCAATTGTTGAGTGGTAGTAAGAGCGTGTGTTCATGTCGGCAAACAGTGCTACCGCTCAAGGACCATGGGGAGGGTCAGGCGGGCACCCTTTCTATGATGGCAGAGGTGATGTTGTCGAGATTGATGTCACCTACACCAACGACCATGTGACCAAGCTGCAAGTCGCGTACGCGGAGAGCACCGGCAGCCGGTGGCACAGTCCTACTCATGGTTCGCATGGCGGTCATGACGAGAAG',
                  'intron1':'GTAACGAATTGAGGACTTCGCTAGCTGACTTTTAATGGATATTTGTGCGCAGTGATCTCTGTTTGTTTCGATGTTGGATGATTGGAGTTGCTGAATCGATGGCTTGTTTTGGGTTAGTAG',
                  'exon2':'ATAACGCTAGATTACCCGGAGGAGTACCTTACGCAAGTAGTGGGCACATACGGGAGGTGTATAAATTCGATCTCCTTCATCACGAACAAGGGGACGTATGGGCCTTTCGGGAACACGGAAGGGGAGGGTTTCGAATCTCCAGCGGATGTGGTTATAGTTGGATTCTTTGGTCGATCTGGTTCCATTATCGACCAGCTTGGAGTGCTCACAATCGAAGCCAGTGTCGACAATGTCCAG',
                  'intron2':'GTTTTGTATCAGTTCTACCATTCTCTTCCCCCAGATTATTTGGATCCCCAACTTCTTCCATATTCTTTCGATCTCATCGTCCTTGAATGTTCATTCTAGATGAAGCATTTCAGAGATGGTATTTTAATTAGTGGAAACATCTAATCGATATTCAG',
                  'exon3':'CTGGACAAGCCTTTGAAAAGTACCGTGATAACGCAAGGACAATGGGGAGGGCATGGAGGATATGACTTCTGCGATGGCAGAGGCGATGTTGTGGAGATCACGGTGAAATATGACGATGAATGCGTGCATTTGTTACAAGCCGAGTACCAGCACAGTGGTGATCGATTCTCAGGTGCTTGTCACGGTGAAGGAGAGGAAGGAGAAGAAGCCAAG',
                  'intron3':'GTAAATGAAATAGGAACAATATCACTCAACTGCTTTACCGAATCTTAAGCTTCAGGTTCCATGAGAATGCACAGAGAGCTTCGAATCCTGCGACAATCATTTTGTTGATATACTTGCTGGCTATTGTTGTGCAG',
                  'exon4':'GTTTCGCTGAACTTCCCAACCGAGCGTTTAATGCAAGTGAAGGGCACTTACGACCCTCGCGGCTATCTGACCTCAATCTCATTGATCACCAACAATGAAACATACGGACCTTTCGGAAACTCCCGAGGACAGCACTTTCAGTCTCTGCCTCATGGTGTTTTGGGGTTCTGTGGCAGAAGTGGTCGAGTGGTCGACCAACTGGGAGTACTCACTTACGTTGAAAACCCTTGGAATTCTCATCTCGACAAAAAG',
                  'intron4':'GTATTGGATTGTGTTGATACTCTTCAATTCTGCGAGTTTCAAGGTTGTTTGTTGATGAAGGCTATGCGCTCTATTGGTTGCATAACTGAAGCACTTGAATGTGTTTGTATGAAATGTAG',
                  'exon5':'CCTGCACGCCTTGAAATCTCGACGGTTGTCAACGGGCCATGGGGAGGCTCCGGCGGACAAGATTTCTATGATGGAAGAGGTGACGTCGTGGAGATTTTAGTGAACTTCAGCAAAGTTGCCGTAACCACGTTGCAAGTGACATATGAACAGTGCGGCACTAGATTCGAGGGCGCTCCTCATGGTGGCGCAGGCGGAGACTCCTGGAAATCCCAAATTGGGATAGGCAAAAATTTAGGTGAAGAATCCAGCAAG',
                  'intron5':'GTATATACTCCTAAACCTGATTTTGATCTGACACGAACGACAATCTAACATTCTGTGGAGCTGCTGCTCCATTACAATTGATTGGCGTTCAACCTCCCATGTGAATGCTTACACTTTGTTTCGCTCTGTGTGGGCAG',
                  'exon6':'CTTTGTCTGGAATTCCCAGAGGAGTTTCTGTTGCAAGTGAAGGGTACATATGGACCAATTCCTTCACGTACATCCGATGCAGTGACATCGCTAACCTTCGTCACGAACAAGCAAACATATGGTCCTTACGGTGTTCCCAGCGGCCAAGAGTTTGAGACTCCTGCAACTGGAGTTGTAGGATTCTTCGGCAAAGCGGGTGCCCGTCTAGATCAACTGGGGGTGTTCACCAAATTTTCTGAAAGTGCTGAATAGTAACGTGATCTTGCAGTGATGTAATTGCACATTTGGAGCCAGTCGCTACTTCAAGCCAAATCAGGTTGTTCTTTGATTATGCGGTGTTGATGGGTGACCTCCCATGCTGCCAGAGTGATGATGTCTATAATACCCACATTTCATAAATAAAATTATGTTCGCTCATCGTGTAAACATGGTTCATTTTGTGAGTTTTATTATCATTCTTGTAACATTGAGATGGCTACGTAATAGACGATTCTAGCATGTGCATCAAAACTCCTTCGTGGTTAGGCTCAAATTAGCTTAGGCTGTTGTGAAGATAACGTGCATTATCTCTGAAACTTGGCATGTTAATCAAGTTTGCTCGTAAGTTAAAGTCTGCAAGCAATGGATTGAAACTGTTCTTCCATCTTCAAGTGCAGCAAAGAAAGAAGAGGCAGTGGCATGATCTGGATTATACCATGAATAAAGAAAGAATTTTTAGTCATCATGCAAAGAGCACAGAGGGGAAGCAGTATGTCGATGTGGCAATGAATAGTAGGAACTCTATAACCATTAAATATGCTTGGA'
                 }

## Challenge Question

#### PAPER CODE FIRST

Modern approaches to gene discovery and genome assembly rely on a technology called nucleotide sequencing, in which we read small segments of DNA or RNA to determine the length and sequence of genes and genomes.

For gene discovery we often sequence mRNA, as it most directly represents the gene sequences that get translated into proteins. This approach is typically used in place of sequencing DNA because it can be unclear where genes are in a sequence of DNA: many non-gene sequences are involved in facets of the cell cycle beyond just the central dogma. Although sequencing mRNA provides a more straightforward product, as it most likely contains an entire gene, it still includes regions that are not part of the coding sequence (CDS: the portion of a gene that is translated into amino acids). These external regions are involved in signalling to the ribosome and the cellular transport machinery that translate mRNA into protein. They are referred to as untranslated regions (UTRs). UTRs play an important role in cell regulation but may not be important for the biological question in our experiment.

In order to identify the CDS of the gene, we look for the longest open reading frame (ORF) in the sequence. An ORF is simply a stretch of DNA or RNA sequence that starts with a start codon and ends with a stop codon.
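
The definition above can be sketched on a short, made-up sequence. This is only a minimal illustration, assuming we scan a single strand as given and treat every `ATG` as a possible start; the names `STOP_CODONS`, `longest_orf`, and `toy` are hypothetical and not part of this notebook.

```python
# Toy illustration only: scans one strand of a short, made-up sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return the longest stretch that starts at an ATG and ends at the
    first in-frame stop codon (empty string if there is none)."""
    best = ""
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from this start until a stop codon appears.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                break
        # Look for the next candidate start codon further along.
        start = seq.find("ATG", start + 1)
    return best

# Two ORFs are hidden here: ATGAAATAG (short) and ATGGCTTGCTTTTGA (longer).
toy = "CCATGAAATAGGATGGCTTGCTTTTGACC"
print(longest_orf(toy))  # ATGGCTTGCTTTTGA
```

Note that the first `ATG` in `toy` does not give the longest ORF, which is why every start position is checked rather than only the first one.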

Your objective is to write a function that takes a sequence of DNA, formatted in a dictionary, identifies the CDS within that sequence, and translates it to a protein sequence. This includes identifying the start codon (ATG) and the stop codon (TAA, TAG, or TGA), and translating the region in between (e.g. MTGCIDSIIDISKKDI-). A dictionary containing the codons and their corresponding amino acids is provided below.

#### In summary, your job is to write a function that takes as an input a sequence of DNA and returns an amino acid sequence from the longest ORF
##### Note: The challenge in this question includes identifying where the start codon is located, as it is not always at the beginning of the sequence. 

In [21]:
from IPython.display import Image

### Problem: 

Given a DNA sequence, find the CDS and translate it.
>
The diagram below gives you a visual explanation of the task. Your goal is to find the black arrow (the start codon), and translate until you find a stop codon.
![A diagram of the provided sequence. At the beginning and end there are thin black lines, representing the UTRs. In the middle is a blue rectangle, which represents the coding sequence. On the left side of the blue rectangle is an arrow pointing to the right, which represents the start codon and the direction of translation.](img/Example_1.png)



In [80]:
genetic_code = {'TTT':'F','TTC':'F','TTA':'L','TTG':'L',
                'CTT':'L','CTC':'L','CTA':'L','CTG':'L',
                'ATT':'I','ATC':'I','ATA':'I','ATG':'M',
                'GTT':'V','GTC':'V','GTA':'V','GTG':'V',
                'TCT':'S','TCC':'S','TCA':'S','TCG':'S',
                'CCT':'P','CCC':'P','CCA':'P','CCG':'P',
                'ACT':'T','ACC':'T','ACA':'T','ACG':'T',
                'GCT':'A','GCC':'A','GCA':'A','GCG':'A',
                'TAT':'Y','TAC':'Y','TAA':'-','TAG':'-',
                'CAT':'H','CAC':'H','CAA':'Q','CAG':'Q',
                'AAT':'N','AAC':'N','AAA':'K','AAG':'K',
                'GAT':'D','GAC':'D','GAA':'E','GAG':'E',
                'TGT':'C','TGC':'C','TGA':'-','TGG':'W',
                'CGT':'R','CGC':'R','CGA':'R','CGG':'R',
                'AGT':'S','AGC':'S','AGA':'R','AGG':'R',
                'GGT':'G','GGC':'G','GGA':'G','GGG':'G'}
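
As a rough sketch of how a codon dictionary like this can be used, the example below looks up one codon at a time and stops at the first stop codon. To keep it runnable on its own it uses a small subset of the table; `codon_subset` and `translate` are hypothetical names, and the input is assumed to already start at a start codon.

```python
# Small subset of the codon table, repeated here so the sketch runs on its own.
codon_subset = {'ATG': 'M', 'GCT': 'A', 'TGC': 'C', 'TTT': 'F', 'TGA': '-'}

def translate(cds, table):
    """Translate a coding sequence codon by codon using a dictionary lookup."""
    protein = ""
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]          # the key
        amino_acid = table[codon]     # the value stored under that key
        protein += amino_acid
        if amino_acid == '-':         # '-' marks a stop codon in this table
            break
    return protein

print(translate("ATGGCTTGCTTTTGA", codon_subset))  # MACF-
```

Here the stop codon is kept as a trailing `-`; whether you keep or drop it depends on the string you want to compare your result against.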

##### Hints:
Use this string to check your output:

In [None]:
'MEEKENGGGVKLSNATKKNKKSNIWRCFRSLDNGYPTVEQVDNHGNVDME\
SALTDKHPTHLVVMVNGLIGSDKDWRFCAKQFLKGFPNDLIVHCSKCNSA\
LATLDGVDVMGSRLADEVISVIQRYPNLQKISFIGHSLGGLIARYAVAKL\
YTQDGTNQASQQNGDVKSVASNDLCSEYISNGKIAGLEPINFITIASPHL\
GSRGHRQVPMFCGVRSLERLGFYTSVIIKRTGRHVYLKDKVNGQPPLLVQ\
MTSDSEDLKFISALQSFKRRVVYANVLSDHLVGWSTSSIRRRSELPKCKN\
LARSGRYPHILKEGAANTTEQEGSMDQEANGHKTRTATMEETMIRGLSKL\
SWERVDVSFKGSKQRYLAHNTIQVNNPWMNSDGADVIQHMIDNFSV'

This should be the translation of the longest open reading frame for this gene. It is the output you get if you put this gene into an [ORF Finder](https://www.ncbi.nlm.nih.gov/orffinder/).

In [108]:
# write your function here

In [110]:
# find the CDS within this test sequence dictionary
ANAE_DN21434_c0_g1_i1_len_1555 = { "DNA_seq_assembly": "GTCAAACAGCCAGGTTTCCCAACAGTCCAACGGCGGCGCCAAAACCCCGACAAAGATTTCATTATTACCCCCTTACATAAGTCAAACAAACTCAGCTCCAGTGGACCCCACAACTAACTAACTAACTAACTACATATATATTCAATCACATTTCATTAGAAGAATAATTATTGTTGCAATCAATCAAATGGAAGAAAAAGAAAACGGCGGTGGTGTAAAATTATCAAATGCTACAAAGAAGAATAAAAAATCAAATATTTGGAGGTGTTTTAGATCATTGGATAATGGTTATCCAACGGTTGAACAAGTTGACAATCATGGCAATGTTGATATGGAATCGGCTTTGACCGATAAACATCCCACTCATCTTGTTGTCATGGTCAATGGCTTAATTGGCAGTGATAAAGATTGGAGATTTTGTGCGAAACAGTTTTTAAAAGGGTTTCCCAATGATCTCATTGTGCACTGTAGCAAATGTAACTCTGCATTGGCTACACTTGACGGTGTTGACGTGATGGGAAGTCGGTTAGCAGATGAGGTGATATCTGTGATACAACGATATCCCAATCTTCAGAAGATCTCTTTTATAGGTCACTCACTTGGTGGCTTAATAGCAAGATATGCTGTTGCTAAGCTTTACACACAAGATGGCACAAATCAAGCATCTCAACAAAATGGAGATGTAAAATCGGTGGCATCTAATGATCTTTGTTCAGAGTATATCTCAAATGGAAAAATTGCTGGATTAGAGCCTATCAACTTTATTACTATTGCATCTCCACATCTTGGTTCTAGAGGACATAGACAGGTCCCGATGTTCTGTGGAGTTAGAAGTCTTGAAAGACTAGGGTTTTATACATCAGTTATAATTAAAAGAACAGGGAGACATGTGTATTTAAAAGATAAGGTTAATGGACAACCTCCTTTGTTGGTTCAGATGACTAGTGATTCTGAAGACCTAAAGTTCATATCTGCTTTGCAGTCCTTCAAGCGCCGAGTTGTTTATGCCAATGTGCTTTCTGACCATCTTGTGGGATGGAGCACATCATCAATTCGGCGTCGAAGTGAGCTGCCTAAGTGTAAGAATCTTGCAAGAAGTGGTAGATATCCCCATATTCTGAAGGAAGGTGCAGCAAACACTACTGAACAAGAAGGCTCTATGGACCAAGAAGCCAATGGTCACAAGACTAGGACTGCGACAATGGAAGAGACGATGATCAGAGGCCTGAGTAAATTAAGCTGGGAACGGGTTGACGTCAGCTTTAAAGGAAGTAAACAAAGATACCTTGCACACAACACGATCCAGGTAAATAATCCGTGGATGAATTCTGATGGTGCTGACGTCATACAACACATGATTGATAATTTTTCAGTTTAGATATAGAAGCTCCCGTAAGTTCATTTGTTCAGTAATATATGGATTGTATATAAACATCTACCCAATTTTTGACTCCCTGTTGTTGTATGAATTAAATATCCATTCGGCTGGTACGTGGTAGAGTAGATACTTTTAGAGTATCATATGAGAAACTGGTAAGTGTCATTTTTGCACCAG"}

# write your code here

# append your amino acid sequence to your test sequence dictionary
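
One possible way to do the last step is to assign the result to a new key. The sketch below uses a shortened, hypothetical stand-in for the test dictionary; the key name `"protein_seq"` and the placeholder value are assumptions chosen only for illustration.

```python
# Hypothetical stand-in for the test dictionary, shortened for readability.
example_gene = {"DNA_seq_assembly": "CCATGGCTTGA"}

# Placeholder value; in practice this would be the output of your function.
protein = "MA-"

# Assigning to a key that does not exist yet adds a new key:value pair.
example_gene["protein_seq"] = protein

print(list(example_gene.keys()))  # ['DNA_seq_assembly', 'protein_seq']
```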