In [None]:
# Intro to Dictionaries -- In Class Oct 15
# Your Name
# Date

# Dictionaries #

Dictionaries are data structures made up of key:value pairs. Keys and values can be any Python data type, with one restriction.

Keys must be immutable (so, you could use a tuple as a key, but not a list). Values can be any python data type.
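
A minimal sketch of the key restriction, using made-up data (the names `point_labels` and `bad_key` are illustrative only). A tuple is accepted as a key, while a list raises a TypeError because it is mutable and therefore not hashable.

```python
# Illustrative data only: a tuple (immutable) is a valid key.
point_labels = {(0, 0): "origin"}
print(point_labels[(0, 0)])

# A list (mutable) is not hashable, so Python refuses it as a key.
bad_key = [0, 0]
try:
    point_labels[bad_key] = "origin"
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print("TypeError:", err)
```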

How is this different from a tuple of two different kinds of object e.g. (str, list)? A tuple is a sequence, and it is indexed using the built-in numerical index that is common to all sequence data types.

A dictionary is not inherently indexed. Instead, it is indexed using its keys, that is, you use the key to look up the value.
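
The difference can be sketched side by side. The variable names `pair` and `translations` and their contents are illustrative assumptions, not part of the exercise.

```python
# A tuple is a sequence: items are looked up by position.
pair = ("one", ["uno", "un"])
print(pair[0])   # item at position 0
print(pair[1])   # item at position 1

# A dictionary is looked up by key instead of by position.
translations = {"one": ["uno", "un"]}
print(translations["one"])
```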

# Let's make a dict() #

We can make a dictionary manually.

my_dict = {} creates an empty dictionary. Do this in the cell below and then print the object to see what it looks like.

In [1]:
my_dict = {}
print(my_dict)

{}


# Add to a dictionary #

We can add items to a dictionary in a few different ways.

Make an empty dictionary called eng2sp. Then add a word pair to it using the assignment syntax shown below. In the cell below, add this pair and the next four pairs to the dictionary one at a time. In this dictionary, the English words are our keys and the Spanish words are the values: one/uno, two/dos, three/tres, four/cuatro, five/cinco.

eng2sp['one'] = 'uno'

Once you have populated the dictionary, use a print command to print it out. 

In [2]:
eng2sp = {}
eng2sp['one'] = 'uno'
eng2sp['two'] = 'dos'
eng2sp['three'] = 'tres'
eng2sp['four'] = 'cuatro'
eng2sp['five'] = 'cinco'
print(eng2sp)

{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco'}


# Filling a dict() manually #

We can also define a populated dictionary all at once with this syntax:

```my_dict = {'key':'value','key2':'value2', ...}```

In the cell below, make the eng2sp dictionary all in one statement.

In [12]:
eng2sp = {'one':'uno','two':'dos','three':'tres'}
print(eng2sp)

{'one': 'uno', 'two': 'dos', 'three': 'tres'}


# dict() does not have .append() #

There is NO METHOD analogous to .append() for dictionaries. We either pass in a key:value pair that gets added to our dict() as shown above, or we can update it with the contents of a second dict() using dict.update(dict2).

In the cell below, make a second dictionary containing the word pairs six/seis, seven/siete, eight/ocho, nine/nueve, ten/diez. Add it to your original dictionary using dict.update(dict2)

In [38]:
e2g = {'one':'uno','two':'dos','three':'tres','four':'cuatro','five':'cinco'}
e2g2 = {'six':'seis','seven':'siete','eight':'ocho','nine':'nueve','ten':'diez'}
e2g.update(e2g2)
print(e2g)

{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco', 'six': 'seis', 'seven': 'siete', 'eight': 'ocho', 'nine': 'nueve', 'ten': 'diez'}
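
Two details of dict.update() are easy to miss: it changes the dictionary in place (and returns None), and a key present in both dictionaries takes the value from the second. A small sketch, with the dictionaries `first` and `second` made up for illustration:

```python
first = {'one': 'uno', 'two': 'dos'}
second = {'two': 'DOS', 'three': 'tres'}   # 'two' appears in both

result = first.update(second)   # update() changes `first` in place
print(result)                   # update() itself returns None
print(first)
```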


# Extract a value from a dictionary #

To extract a value from a dictionary, put its key in square brackets after the dictionary name. The syntax looks like indexing a list, except that the value in the square brackets is your key.

dict['key']

This is, of course (because Python), not the only way to get a value from a dictionary. You can also use the .get() dictionary method:

dict.get('key')

In the cell below, use each of these ways to get the value corresponding to the key 'three' from your English to Spanish dictionary.

In [29]:
print(eng2sp['three'])
print(eng2sp.get('three'))

tres
tres


# Common dict() errors #

If you request a key that is not in the dictionary, then python will raise a KeyError exception. Try this statement with your dictionary and see how the error looks:

```eng2sp['eleven']```

The two lookup styles behave differently for a nonexistent key: dict['key'] raises a KeyError, while dict.get('key') returns the value None. Also try dict.get() below and print the result.

Using dict.get() would be the preferred way to access a value in a dictionary if you did not want your program to error out over a missing key.
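
As a hedged sketch of both behaviours: the square-bracket lookup can be wrapped in try/except, and .get() accepts an optional second argument that is returned instead of None. The fallback string 'unknown' is an arbitrary choice for illustration.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Square brackets raise KeyError for a missing key; catch it to keep going.
try:
    eleven = eng2sp['eleven']
except KeyError:
    eleven = None
    print("'eleven' is not a key in eng2sp")

# .get() accepts an optional second argument to return instead of None.
fallback = eng2sp.get('eleven', 'unknown')
print(fallback)
```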

In [31]:
print(eng2sp.get('eleven'))

None


# Keys must be unique #

What happens if you try to add a new key/value pair six/once to your eng2sp dictionary? Try it in the cell below. Print the dict() before and after you add the value.

You won't get an error if you add a non-unique key:value pair to the dictionary, but you'll overwrite the original value.

In [39]:
print(e2g)
e2g['six'] = 'once'
print(e2g)

{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco', 'six': 'seis', 'seven': 'siete', 'eight': 'ocho', 'nine': 'nueve', 'ten': 'diez'}
{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco', 'six': 'once', 'seven': 'siete', 'eight': 'ocho', 'nine': 'nueve', 'ten': 'diez'}


# Check for existing keys with in #

If you want to be sure you don't overwrite a value in your dictionary, you can check before assigning a key:value pair using the in operator. (Python 2 had a dict method called .has_key() for this, but it was removed in Python 3.) In the cell below, check your eng2sp dictionary for the keys 'three' and 'eleven'. Print the result of each check to see what type of value it returns.

In [33]:
if 'three' in eng2sp:
    print(eng2sp['three'])

tres
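
One possible way to use the membership check as a guard is sketched below. The small dictionary and the list of candidate pairs are made up for illustration.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# `in` returns a bool: True if the key exists, False otherwise.
has_three = 'three' in eng2sp
has_eleven = 'eleven' in eng2sp
print(has_three, has_eleven)

# Only assign when the key is not already present.
for key, value in [('three', 'TRES'), ('eleven', 'once')]:
    if key not in eng2sp:
        eng2sp[key] = value
print(eng2sp)
```

Here 'three' keeps its original value because the guard skips it, while 'eleven' is added.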


# Making a dictionary with zip() #

We have already used the zip() command to zip lists together into lists of tuples.

Could we do the same and make a dictionary? Of course! Python has a function for that, dict().

Just like we used list(zip(list1,list2)), we can use the dict() function with the zip() of two lists. It won't work with more than two, because dict() needs each item to be a (key, value) pair.

In the cell below, zip the two lists together into a dict. Of course if you were going to do this in real life, you'd have to be sure your dictionary's keys were all unique first.

In [44]:
english = ["one","two","three","four","five"]
spanish = ["uno","dos","tres", "cuatro","cinco"]
eng2sp = dict(zip(english,spanish))
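
The warning about unique keys can be seen in a small sketch. The lists `english2` and `spanish2` below are deliberately flawed, made-up inputs: one key is repeated and the lists differ in length.

```python
english2 = ["one", "one", "two"]
spanish2 = ["uno", "UNO", "dos", "tres"]   # one item longer than english2

# zip() stops at the shorter list, and a repeated key keeps its last value.
pairs = list(zip(english2, spanish2))
zipped = dict(pairs)
print(pairs)
print(zipped)
```

No error is raised in either case, so problems like these pass silently unless you check the inputs first.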


# Sidebar: unique keys #

How could you tell if a list's items were all unique, so that they could safely be used as keys? Think back to other list functions that we know. Here are two lists.

```english = ["one","two","three","four","five"]
english2 = ["one","one","two","three","four","four","five"]```

Can you write a snippet of code that would distinguish between a list that has a set of unique keys and one that does not?
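
One possible approach, offered as a sketch rather than the only answer: a set discards duplicates, so comparing its length with the length of the list reveals repeats. The helper name `all_unique` is hypothetical.

```python
english = ["one", "two", "three", "four", "five"]
english2 = ["one", "one", "two", "three", "four", "four", "five"]

def all_unique(items):
    """Return True if no item appears more than once."""
    return len(set(items)) == len(items)

print(all_unique(english))
print(all_unique(english2))
```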

# Populate a dict() with a for loop #

Say you have two lists with parallel indices. Using the commands we've learned above, put the key value pairs into the dict using a for loop. zip() is convenient, but that only works if we have the two lists ready made, so we want to think about how to add values to the dict one by one, without manually adding them.

In [54]:
english = ["one","two","three","four","five"]
spanish = ["uno","dos","tres", "cuatro","cinco"]
lis = {}
for i in range(len(spanish)):
    lis[english[i]] = spanish[i]
print(lis)

{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco'}


# Populate a dict() from a file #

Let's use the file aamw.txt that you are already familiar with from last week. Remember, the format of the non-comment lines is:

```I	131.1736
L	131.1736
K	146.1882```

To make a dictionary out of this file, you need to open the file, loop over the lines, and if they are non-comment lines, put the molecular weight as the value in the dictionary with the amino acid code as the key. You can do this with a normal for loop with a nested if block.

In [46]:
aamwdict = {}
with open("aamw.txt") as infile:
    for line in infile:
        if line[0] != '#':
            # split() on whitespace gives [code, weight] with no stray spaces
            fields = line.split()
            aamwdict[fields[0]] = fields[1]
print(aamwdict)

{'I': '131.1736', 'L': '131.1736', 'K': '146.1882', 'M': '149.2124', 'F': '165.1900', 'T': '119.1197', 'W': '204.2262', 'V': '117.1469', 'R': '174.2017', 'H': '155.1552', 'A': '89.0935', 'N': '132.1184', 'D': '133.1032', 'C': '121.1590', 'E': '147.1299', 'Q': '146.1451', 'G': '75.0669', 'P': '115.1310', 'S': '105.0930', 'Y': '181.1894'}
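
Note that the values above are strings. If you plan to do arithmetic with them, it can be convenient to convert to float while building the dictionary. The sketch below uses io.StringIO as a stand-in for the real file, with one assumed comment line plus the three sample lines shown earlier; the exact comment format of aamw.txt is an assumption.

```python
import io

# Stand-in for aamw.txt: an assumed comment line plus the three sample lines.
sample = io.StringIO("# code\tweight\nI\t131.1736\nL\t131.1736\nK\t146.1882\n")

aamw = {}
for line in sample:
    # Skip blank lines and comment lines.
    if line.strip() and not line.startswith('#'):
        code, weight = line.split()
        aamw[code] = float(weight)   # store a number, not a string
print(aamw)
```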


# Populate a dict() with a comprehension #

We have already made the complement of a DNA sequence in a number of ways. We used the built-in string method .replace(). We also used .maketrans() and .translate(). We can also easily use a dictionary to solve this problem.

To make a complement dictionary from the string ATGCTACG, in which each base in the first half is matched by its complement four positions later, we could use a dict comprehension:

dnac = "ATGCTACG"
dnacomp = {dnac[i]:dnac[i+4] for i in range(len(dnac)//2)}

A dict comprehension is just like any other comprehension, except that what you generate inside the curly braces is a key:value pair instead of a single value. The syntax is the same as when we put "one":"uno" into a dictionary manually above: key and value separated by a colon.
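
As another small sketch of the same syntax, a comprehension can swap keys and values to build a Spanish-to-English dictionary. It relies on the dict method .items(), which yields (key, value) pairs and has not been introduced above; the name `sp2eng` is illustrative.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Swap each key:value pair to build a Spanish-to-English dictionary.
sp2eng = {spanish: english for english, spanish in eng2sp.items()}
print(sp2eng)
```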

In the cell below, see if you can create a comprehension statement that will put the values from the aamw.txt lines into a dictionary.

In [53]:
with open("aamw.txt") as op:
    lines = op.readlines()
di1 = {i[0]:i[2:10].strip() for i in lines if i[0] != '#'}
di1

{'I': '131.1736',
 'L': '131.1736',
 'K': '146.1882',
 'M': '149.2124',
 'F': '165.1900',
 'T': '119.1197',
 'W': '204.2262',
 'V': '117.1469',
 'R': '174.2017',
 'H': '155.1552',
 'A': '89.0935',
 'N': '132.1184',
 'D': '133.1032',
 'C': '121.1590',
 'E': '147.1299',
 'Q': '146.1451',
 'G': '75.0669',
 'P': '115.1310',
 'S': '105.0930',
 'Y': '181.1894'}

# Use your dicts to solve familiar problems #

All you'll need to start using your dictionaries is the dict.get() command. In the two cells below, use the dictionaries we saw how to make above.

In the first cell, make the complement dictionary, and then use it to produce the reverse complement of the following sequence:

"ATGCATGCATGC"

In the second cell, make the amino acid molecular weight dictionary, and use it to calculate the molecular weight of insulin, like we did last time with the list of tuples. Because we do have unique keys for each value, this problem is a perfect time to use a dictionary rather than a list of tuples.

"MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN" 

In [58]:
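rev = []
dnac = "ATGCTACG"
dnacomp = {dnac[i]:dnac[i+4] for i in range(len(dnac)//2)}
# Walk the sequence backwards so the result is the reverse complement,
# not just the complement.
for i in reversed("ATGCATGCATGC"):
    rev.append(dnacomp.get(i))
print(''.join(rev))

GCATGCATGCAT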


In [59]:
insulin = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
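weight = 0
with open("aamw.txt") as op:
    lines = op.readlines()   # read every line while the file is still open
# Build the dictionary from the saved lines, not from the closed file handle.
di1 = {i[0]:i[2:10].strip() for i in lines if i[0] != '#'}
for a in insulin:
    weight += float(di1.get(a))   # look up the current residue
print(weight)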