# Bioinformatics Introduction to Coding

## Programming Basics 3

### Last lesson recap:

- functions vs. methods
- guanine/cytosine percentage exercise
- complementary nucleotide exercise
- package management

If any of the topics in the recaps are fuzzy to you, remember you can always go back through the notebooks and practice again. You should use them as a resource to familiarize yourself with the basics!

### Coming up this lesson:

- lists & indexing
- codon selection exercise
- dictionaries
- codon dictionary exercise

## Lists & Indexing

Storing individual numbers or strings is nice, but scientists often work with larger sets of data. It turns out you can store multiple pieces of information together in an object called a **data structure**. Some of these structures are more complicated and we'll learn about them later. Let's start with the simplest type of data structure you'll see in Python... **lists**!

A **list** is just a collection of pieces of data, stored one after the other... kind of like an actual list!

In [None]:
# stick a bunch of comma-separated objects between square brackets like this and voila, you made a list
my_list = [1,2,3,4,8,41,5,9,5,0]
print(my_list)

In [None]:
# items in a list might even be different data types entirely
# here we have floats, ints, and a string
another_list = [27.2,5,9,"random string"]
print(another_list)

In [None]:
# we can store values generated from other functions we might want to use
from random import random
random_list = [random(),random(),random(),random(),random(),random()]
print(random_list)

Having information grouped together can be VERY useful, but we need a way to specify individual items so we can do things with the data. You do this by specifying an **index**, or position, of what you want.

In [None]:
# Python starts counting at 0, so when we say the 0th thing that will be the initial item
# we place the index value in brackets after the data structure
print(my_list[0])
# we can also specify other positions
print(my_list[1])
print(my_list[5])

In [None]:
# we CANNOT specify an index that doesn't exist since we'll get an error...go ahead and try it anyway
print(my_list[10])

In [None]:
# Python is especially neat because it lets you count from the end of things too, using negative numbers
# The index -1 will give us the last item in the list, -2 the second from the end, and so forth
print(my_list[-1])
print(my_list[-2])
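
To tie positive and negative indexes together, here is a minimal sketch using a made-up list called `samples` (not part of the lesson's data). It assumes you are comfortable peeking at `try`/`except`, which we haven't formally covered; it is shown only as one way to react to an out-of-range index instead of letting the cell stop with an error.

```python
# hypothetical example list, separate from my_list above
samples = ["A", "C", "G", "T"]

# valid positive indexes run from 0 up to len(samples) - 1
last_index = len(samples) - 1
print(samples[last_index])   # T
print(samples[-1])           # T, the same item counted from the end

# asking for an index past the end raises IndexError
# try/except lets us react to the error instead of stopping
try:
    print(samples[len(samples)])
except IndexError as err:
    print("Index out of range:", err)
```

The takeaway is that `len(samples)` itself is never a valid index, because counting starts at 0.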

### Slicing

Perhaps you want to grab multiple items from a list at once. In Python this is called **slicing**, and you will use two indexes for this. The first index tells us where to start slicing and is inclusive. The second index tells us where to stop and is exclusive. The consequences of this inclusive-exclusive system are a little unintuitive, but you'll get the idea after some examples.

In [None]:
# the indexes are separated by a colon :
# first=inclusive, second=exclusive
# here's a few different examples
# this should give you the 0th, 1st, and 2nd items
print(my_list[0:3])
# this should give you the 2nd - 6th items
print(my_list[2:7])
# the entire list for reference
print(my_list)

In [None]:
# remember how functions can sometimes have optional arguments? You can do that with slicing too
# if you include a 3rd number, it lets us specify the step size (or how many items we want to skip)
# this means we'll print every second item
print(my_list[::2])
# this one means we start at the beginning, stop just before index 9, and only include every 3rd item
print(my_list[0:9:3])
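
If the inclusive-exclusive rule still feels odd, the sketch below may help. It uses a hypothetical list named `letters` and shows two handy consequences of the rule: a slice has `stop - start` items, and two slices that share a boundary fit back together with nothing repeated or missing.

```python
# hypothetical example list, not part of the lesson's data
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]

start, stop = 2, 6
chunk = letters[start:stop]
print(chunk)        # ['c', 'd', 'e', 'f']
print(len(chunk))   # 4, which is stop - start

# leaving out an index means "from the beginning" or "to the end"
first_half = letters[:4]
second_half = letters[4:]
print(first_half + second_half == letters)   # True

# a step can be combined with a start, and a negative step walks backwards
every_other = letters[1::2]
backwards = letters[::-1]
print(every_other)  # ['b', 'd', 'f', 'h']
print(backwards)
```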

### List methods

One last useful thing to know about lists: they have their own methods, which let us use them dynamically so the contents can change as we need them to. Collected new measurements? Add them to the old. Already used a particular piece of data and want to remove it from a collection of other data? Get rid of it!

Let's go over a few of the more popular list manipulation methods and then you'll do an exercise to practice.

In [None]:
# remember that list of random numbers from earlier? Let's see how many numbers we had i.e. length of the list
print("Number of values in list:", len(random_list))
# your advisor says your sample size needs to be larger, let's generate more numbers
random_list.append(random())
random_list.append(random())
random_list.append(random())
print("Number of values after appending:", len(random_list))

In [None]:
# pop removes the item at a given index from the list and returns that value
# useful so you can store it in another variable or whatever
print("Number of values in list:", len(random_list))
current_measure = random_list.pop(3)
print("Number of values in list after popping:", len(random_list))
print("Value of current measure", current_measure)

Remember when I mentioned that some methods and functions will mutate the objects they interact with? `append` and `pop` are examples of such methods; they implicitly **mutate** (change) the list they are attached to, which is why we don't need to re-assign the list into a variable.

The `pop()` method is also good at demonstrating the difference between a function **side effect** and a function **return value**. As a side effect, `pop()` removes the item at the given index from the list. `pop()` also **returns** the value it just removed. This is why our `current_measure` variable contains just a single value, rather than the entire list, and why `random_list` has one fewer item even though we did not re-assign it.
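
Here is a small sketch of side effects versus return values, using a hypothetical list called `measurements`. Note that `append()` has a side effect but returns `None`, which is why writing `my_list = my_list.append(x)` is a common mistake.

```python
# hypothetical measurements, just for illustration
measurements = [0.5, 1.2, 3.4]

# append: the side effect is that the list grows; the return value is None
append_result = measurements.append(9.9)
print(append_result)    # None
print(measurements)     # [0.5, 1.2, 3.4, 9.9]

# pop: the side effect is that the list shrinks; the return value is the removed item
popped = measurements.pop(0)
print(popped)           # 0.5

# with no index, pop removes and returns the last item
last = measurements.pop()
print(last)             # 9.9
print(measurements)     # [1.2, 3.4]
```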

## Codon Selection Exercise
I'm about to say something that is gonna blow your mind.

Ready?

A string is just a list of characters. (An immutable list of characters, to be pedantic)

![blow-mind-mind-blown.gif](attachment:blow-mind-mind-blown.gif)


That means you can use index values to slice a string!
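
A quick sketch of what "immutable" means in practice, using a made-up string called `word`. Indexing and slicing work just like they do on lists, but assigning to a position raises a `TypeError`, so to "change" a string you build a new one. The `try`/`except` here is only a way to show the error without stopping the cell.

```python
# hypothetical example string
word = "GATTACA"

print(word[0])      # G
print(word[-1])     # A
print(word[1:4])    # ATT
print(len(word))    # 7

# strings cannot be changed in place
try:
    word[0] = "C"
except TypeError as err:
    print("Strings can't be changed in place:", err)

# instead, build a new string from pieces of the old one
new_word = "C" + word[1:]
print(new_word)     # CATTACA
print(word)         # GATTACA, the original is untouched
```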

I'm going to give you a string of DNA data below, and you're going to find some codons for me.

In [None]:
# here's some DNA for you
DNA_data = "CTTTGCCCACGCACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAACTACAGAGCATTGGAAGAATCTCTACCTGCTTTACAAAG"
# you can specify which characters you want, and then save them to a new variable name
first_codon = DNA_data[0:3]
# now we have a string of just three nucleotides...a codon!
print(first_codon)

In [None]:
# select four contiguous codons from the sequence and save them to these variables

codon1 = DNA_data[]
codon2 = DNA_data[]
codon3 = DNA_data[]
codon4 = DNA_data[]

# we'll use these later, so make sure to double check the contents of each variable so they're what you want
print(codon1, codon2, codon3, codon4)

## Dictionaries

A list of data is cool and all, but sometimes it's better to store information in an explicitly linked way. Let's say you've always got some sort of identifying information that you'll want to associate with additional info: student IDs and students, gene IDs and sequences, etc. For situations like this, what you want is a **dictionary** (called `dict` in Python). Known as associative arrays in some other languages, Python dictionaries store **key/value** pairs. You can access the dictionary with a key and out pops the value! Let me show you what I mean:

In [None]:
# first, let's make a simple dictionary
# instead of using [] like a list we use the curly brackets {}
# keys are separated from their value by a colon :
example_dictionary = {"key1":10, "key2":15, "key3":20}
#now let's specify some keys and see what we get
print("Our first key value is", example_dictionary["key1"])
print("Our third key value is", example_dictionary["key3"])
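
Like lists, dictionaries can change after you create them. The sketch below uses a hypothetical dictionary named `gene_lengths` (the gene names and numbers are invented) to show adding a pair, overwriting a value, and checking what is inside.

```python
# hypothetical gene names and lengths, for illustration only
gene_lengths = {"geneA": 1200, "geneB": 850}

# assigning to a new key adds a key/value pair
gene_lengths["geneC"] = 430

# assigning to an existing key overwrites its value
gene_lengths["geneA"] = 1250

print(len(gene_lengths))            # 3
print("geneB" in gene_lengths)      # True, "in" checks the keys
print(list(gene_lengths.keys()))    # ['geneA', 'geneB', 'geneC']
print(gene_lengths["geneA"])        # 1250
```

Each key can appear only once, which is why the second assignment to `"geneA"` replaces the old value instead of adding a duplicate.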

There are lots of useful instances where we might use associated values like this in a biological context. One that's especially useful for molecular biology is the association between RNA codons and amino acids. Let's make another dictionary for that, then you'll write some code to wrap things up for this notebook.

In [None]:
# code courtesy of Juris Laivins's github
# make sure to give credit where credit is due!

#notice that our RNA below doesn't have any Ts because thymine is replaced with uracil in RNA
RNA_Codons = {
    # 'M' - START, '_' - STOP
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "UGU": "C", "UGC": "C",
    "GAU": "D", "GAC": "D",
    "GAA": "E", "GAG": "E",
    "UUU": "F", "UUC": "F",
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "CAU": "H", "CAC": "H",
    "AUA": "I", "AUU": "I", "AUC": "I",
    "AAA": "K", "AAG": "K",
    "UUA": "L", "UUG": "L", "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L",
    "AUG": "M",
    "AAU": "N", "AAC": "N",
    "CCU": "P", "CCC": "P", "CCA": "P", "CCG": "P",
    "CAA": "Q", "CAG": "Q",
    "CGU": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
    "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S", "AGU": "S", "AGC": "S",
    "ACU": "T", "ACC": "T", "ACA": "T", "ACG": "T",
    "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",
    "UGG": "W",
    "UAU": "Y", "UAC": "Y",
    "UAA": "_", "UAG": "_", "UGA": "_"
}

In [None]:
# let's double check that our dictionary will show us some output before you do some coding yourself
print(RNA_Codons["GCU"])
print(RNA_Codons["UUA"])
# if we ask the dictionary for a key we didn't specify, we'll get an error
# can't use those T's
print(RNA_Codons["TTA"])
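
If you would rather check for a key than trigger an error, here is one possible approach. It uses a tiny stand-in dictionary called `mini_codons` (an assumption for this sketch, so it runs on its own) together with the `in` operator and the dictionary method `get()`, which returns a fallback value when the key is missing.

```python
# small stand-in for the full RNA_Codons dictionary
mini_codons = {"AUG": "M", "UUA": "L", "UAA": "_"}

# "in" tells us whether a key exists without raising an error
print("UUA" in mini_codons)          # True
print("TTA" in mini_codons)          # False

# get() returns None for a missing key, or a fallback we choose
missing = mini_codons.get("TTA")
fallback = mini_codons.get("TTA", "?")
found = mini_codons.get("UUA", "?")
print(missing)    # None
print(fallback)   # ?
print(found)      # L
```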

## Codon Dictionary Exercise

Okay, time to put the pieces together. You have codons that you've defined. You also have a dictionary which takes codon information and returns amino acids. To finish up this notebook, you're going to create your very first peptide chain! Feel free to drop some feedback at the end of the notebook as per usual.

In [None]:
# your first problem is that some of your codons include thymine, and those codons won't be dictionary keys
# you can look at the contents of your codons and replace them like this
print(codon1)
# finish this line so you assign the new value
 = codon1.replace("T","U")
# and then you can double check to make sure everything looks okay
# sometimes I like to leave print statements like this where I can just comment them out until I need them
#print(codon1)

In [None]:
# clean up your other codons here; note that not all of yours might need this since each person's can be different
# still, it doesn't hurt if you do it to codons without T either!


In [None]:
# Okay, so you have 4 codons which you can use as keys for our dictionary we made
# let's try it out here
print(RNA_Codons[codon1])

In [None]:
# great job! Except...we need a place to stick that amino acid
# Printing something to your screen is a bad way to store data
# let's make an empty list, nothing in it yet
peptide = []

# you know that 'RNA_Codons[codon1]' gives you an amino acid from the dictionary
# use one of the list methods you've learned to append that amino acid to peptide
peptide.


In [None]:
# Now do the same thing for the 3 other codons, and print your list when you're done to finish up the workbook!
