# Exercise 2: Dictionaries

Author: Laura Gutierrez Funderburk

Created on: April 18 2018

Last modified on: April 21 2018
 
### Abstract

In this notebook, we will walk through examples involving dictionaries. 

I have created a script, called functions.py, containing a number of functions that we will use. Although we will eventually practice going from Jupyter to a .py script, for now workshop participants are encouraged to use the available functions and to focus solely on dictionaries.

### About Python dictionaries

Here you can find the basic syntax for dictionaries. A dictionary stores *keys* and their *values*: think of each key as opening a door, behind which its value is stored. We can define a dictionary as follows:

my_dictionary = {key_1: value_1, key_2: value_2}

What is the deal here? Well, if we are interested in accessing the value stored under a particular key, we simply write

my_dictionary[key_1]

and we obtain only the information associated with that key.
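
A minimal sketch of this syntax. The dictionary `rooms` and its names and numbers are made up purely for illustration: it shows how to look up a value, how assigning to a new key adds an entry, and how assigning to an existing key overwrites its value.

```python
# Hypothetical example: each person (key) is assigned a room number (value)
rooms = {"James": 101, "Sonia": 102}

# Look up the value stored under a key
print(rooms["Sonia"])   # 102

# Assigning to a new key adds an entry
rooms["Vero"] = 103

# Assigning to an existing key replaces its value (the key keeps its position)
rooms["James"] = 104

print(len(rooms))       # 3
print(rooms)            # {'James': 104, 'Sonia': 102, 'Vero': 103}
```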

But that is enough chatting. Let us dive into coding!

### Warm up Exercise

Suppose we have as many people as celestial bodies in our celestials array, and suppose each person gets to go to exactly one of them. Suppose, furthermore, that they go in the same order of appearance as in our celestials array. If we ever want to check who went where, a dictionary would do the trick!

In [None]:
# Let us bring back our celestials array
celestials = ['Moon','Sun','Neptune','Mars','Jupiter','Venus']
# Define space_travellers where each entry is a person
space_travellers = ["James","Sonia","Vero","Tom","Lily","Manny"]

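# Dictionary: each traveller (key) is paired with the celestial body at the same position (value)
space_dictionary = {space_travellers[i]: celestials[i] for i in range(len(space_travellers))}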
print(space_dictionary)
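
The same pairing can also be built with Python's built-in `zip`, which walks through both lists side by side. A minimal sketch, repeating the two lists so it runs on its own; the name `zipped_dictionary` is just for illustration:

```python
celestials = ['Moon', 'Sun', 'Neptune', 'Mars', 'Jupiter', 'Venus']
space_travellers = ["James", "Sonia", "Vero", "Tom", "Lily", "Manny"]

# zip pairs the i-th traveller with the i-th celestial body;
# dict() turns those (key, value) pairs into a dictionary
zipped_dictionary = dict(zip(space_travellers, celestials))

# Index-based comprehension, as in the cell above
index_dictionary = {space_travellers[i]: celestials[i] for i in range(len(space_travellers))}

print(zipped_dictionary == index_dictionary)   # True
```

One design note: `zip` stops at the end of the shorter list, whereas the index-based version raises an `IndexError` if `celestials` is shorter than `space_travellers`.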

Suppose we do not really need to know where everyone went, but we are rather curious about Lily in particular. We simply use her name as a key in space_dictionary, as follows.

In [None]:
print(space_dictionary["Lily"])
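
Asking for a name that is not a key raises a `KeyError`. A minimal sketch of two standard ways to handle that, the `in` operator and the `dict.get` method; the name "Ana" and the default text are hypothetical:

```python
space_dictionary = {"James": "Moon", "Sonia": "Sun", "Vero": "Neptune",
                    "Tom": "Mars", "Lily": "Jupiter", "Manny": "Venus"}

# Looking up a missing key raises a KeyError
try:
    print(space_dictionary["Ana"])
except KeyError as err:
    print("No entry for", err)

# 'in' checks whether a key exists before we look it up
print("Ana" in space_dictionary)                          # False

# .get returns a default value instead of raising an error
print(space_dictionary.get("Ana", "stayed on Earth"))     # stayed on Earth
print(space_dictionary.get("Lily"))                       # Jupiter
```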

As it turns out, dictionaries are a powerful tool for extracting specific information from files. We will now move on to more complex examples.

### Dictionaries and files

Let us add a few more files into the picture. Now that we are a bit more comfortable with list comprehensions and with extracting particular data from a file, let us now use such data as keys.

In [None]:
# For now, please just run this command. We will explore it in more detail in later exercises. 
%run -i functions.py

In [None]:
# Store directory location into variable
data_directory = "./DATA/"
# Store the location of one more file we will need: the ALL_GENE_file
ALL_GENE_FILE_DIRECTORY = "./DATA/ALL_GENE_file"

In [None]:
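# As before, we access only the files ending in .csv and store their contents in a table
data = store_data_in_table(data_directory,"csv")
# Parse the table (disect_table is defined in functions.py)
disected_data = disect_table(data)
# Extract pairs of identifiers from the parsed data (get_families is defined in functions.py)
fam_pair = get_families(disected_data)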

We can use dictionaries for a number of purposes. 

For example, the elements in our array fam_pair act as identifiers. With the help of the pre-defined function output_entries_in_ALL_GENE_FILE, we can match each identifier with its corresponding entries in the ALL_GENE_file.

In [None]:
# DICTIONARIES IN ACTION!!
# Use a list comprehension to get only the first member of each pair we generated.
# All these entries were found under CLUSTER_A in the data.csv file
CL_A_entries = [item[0] for item in fam_pair]

# Get size of CL_A_entries array
size_A = len(CL_A_entries)

# Define dictionary
CL_A_dictionary = {CL_A_entries[i]:output_entries_in_ALL_GENE_FILE(CL_A_entries[i]) for i in range(size_A)}
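
The comprehension above loops over indices. A dictionary comprehension can also loop over the identifiers directly. Because output_entries_in_ALL_GENE_FILE lives in functions.py and its output is not shown here, this minimal sketch replaces it with a hypothetical stand-in, `lookup_entry`, and uses made-up identifiers:

```python
def lookup_entry(identifier):
    # Hypothetical stand-in for output_entries_in_ALL_GENE_FILE: returns a placeholder string
    return "entries for " + identifier

# Hypothetical identifiers standing in for CL_A_entries
entries = ["ID_001", "ID_002", "ID_003"]

# Index-based form, as in the cell above
by_index = {entries[i]: lookup_entry(entries[i]) for i in range(len(entries))}

# Equivalent form that iterates over the identifiers themselves
by_item = {entry: lookup_entry(entry) for entry in entries}

print(by_index == by_item)   # True
print(by_item["ID_002"])     # entries for ID_002
```

Either form keeps only one entry per key, so if an identifier appears more than once in the list, the dictionary ends up with fewer entries than the list has elements.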

In [None]:
# Print specific info using keys
print(CL_A_dictionary["MZ22526841"])
print("\n")
print(CL_A_dictionary['MZ22522073'])

We can also use dictionaries to keep track of which file to open.

Suppose I need to use the information in the strings above to extract data from more than 16 files. I can process each string to find the appropriate keyword, and then use that keyword to look up the name of the file I am interested in.
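
The format of the strings returned by output_entries_in_ALL_GENE_FILE is not shown in this notebook, so this minimal sketch assumes a hypothetical string that mentions a species name somewhere in it. The helper `find_file` (also hypothetical) checks which keyword of the dictionary occurs in the string and returns the matching file name:

```python
file_dictionary = {'atroparvus': 'atroparvus-EBRO_SCAFFOLDS_AatrE1.fa',
                   'arabiensis': 'arabiensis-Dongola_SCAFFOLDS_AaraD1.fa',
                   'culicifacies': 'culicifacies-A37_SCAFFOLDS_AculA1.fa'}

def find_file(text, file_dictionary):
    """Return the file name whose keyword occurs in text, or None if no keyword matches."""
    for keyword, file_name in file_dictionary.items():
        if keyword in text:
            return file_name
    return None

# Hypothetical string; the real entries may be formatted differently
sample = "gene_42 | arabiensis | scaffold_7"
print(find_file(sample, file_dictionary))              # arabiensis-Dongola_SCAFFOLDS_AaraD1.fa
print(find_file("no species here", file_dictionary))   # None
```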

In [None]:
#Define dictionary
file_dictionary = {'atroparvus':'atroparvus-EBRO_SCAFFOLDS_AatrE1.fa',
                  'arabiensis':'arabiensis-Dongola_SCAFFOLDS_AaraD1.fa',
                  'culicifacies':'culicifacies-A37_SCAFFOLDS_AculA1.fa'}
# Keywords to look up (a list, so the loop below visits them in this order)
key_words = ['atroparvus', 'arabiensis', 'culicifacies']

In [None]:
for key in key_words:
    print(file_dictionary[key])
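
Since the keywords are already the keys of file_dictionary, another option is to loop over the dictionary itself with `.items()`, which yields each key together with its value. This minimal sketch also joins each file name with a directory using `os.path.join`; it assumes, hypothetically, that the .fa files sit in ./DATA/ (the notebook does not say where they live), and it only checks whether each file exists instead of opening it:

```python
import os

data_directory = "./DATA/"   # assumption: location of the .fa files

file_dictionary = {'atroparvus': 'atroparvus-EBRO_SCAFFOLDS_AatrE1.fa',
                   'arabiensis': 'arabiensis-Dongola_SCAFFOLDS_AaraD1.fa',
                   'culicifacies': 'culicifacies-A37_SCAFFOLDS_AculA1.fa'}

paths = {}
for key, file_name in file_dictionary.items():
    # Build the full path from the directory and the file name
    paths[key] = os.path.join(data_directory, file_name)
    # Only report whether the file is present; open(paths[key]) would actually read it
    status = "found" if os.path.exists(paths[key]) else "not found"
    print(key, "->", paths[key], "(" + status + ")")
```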

### Your turn

Now that you are more comfortable with list comprehensions, it is your turn to extract only those identifiers found in the CLUSTER_B column of the data.csv file.

Recall that we obtained identifiers in the CLUSTER_A column by running the command:

CL_A_entries = [item[0] for item in fam_pair]

In [None]:
#Hint:
for item in fam_pair:
    print(item[1])

In [None]:
# Define CL_B_entries 
CL_B_entries  = [_ _ _]

Get the total number of entries in CL_B_entries.

In [None]:
# Size of CL_B_entries
size_B = len(_ _ _)

Define a dictionary CL_B_dictionary where:

keys := elements in CL_B_entries 

values := the output of the function output_entries_in_ALL_GENE_FILE(CL_B_entries[i])


In [None]:
CL_B_dictionary = {_ _ _ : _ _ _ for i in range(size_B)}

Access the values associated with the keys MZ22526881 and MZ22514750, and print their contents.

In [None]:
# Access specific values within our CL_B_dictionary
print(CL_B_dictionary[_ _ _])
print("\n")
print(_ _ _)

Expand file_dictionary and key_words by adding the following keys and their corresponding files:

Files: 'albimanus-STECLA_SCAFFOLDS_AalbS1.fa', 'darlingi-Coari_SCAFFOLDS_AdarC2.fa'

Keys: 'albimanus', 'darlingi'

Then complete the following for loop.

In [None]:
# Expand dictionary
file_dictionary = {'atroparvus':'atroparvus-EBRO_SCAFFOLDS_AatrE1.fa',
                  'arabiensis':'arabiensis-Dongola_SCAFFOLDS_AaraD1.fa',
                  'culicifacies':'culicifacies-A37_SCAFFOLDS_AculA1.fa',
                  '_ _ _' : '_ _ _',
                  '_ _ _' : '_ _ _'}
# Expand key words
key_words = ['atroparvus', 'arabiensis', 'culicifacies', '_ _ _', '_ _ _']

# Complete value in for loop
for _ _ _ in key_words:
    print(file_dictionary[key])

### Review

In this exercise, we learned the basic use of dictionaries.

We then explored how to use dictionaries to extract and organize data as needed.

Examples included using identifiers from data files as dictionary keys, and using a dictionary to look up which file to open.