# Day 8: Control, Lists & Dictionaries

## Data File Background
View the file titled `data/PRJNA301554.tsv`. This is a tab-delimited file obtained from the National Center for Biotechnology Information (NCBI) [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra). This resource stores genome sequence data collected from around the world. There you can find information about the sequencing runs that have been deposited for an organism and within an experimental project. The file contains information about gene expression sequencing (cDNA sequences) from the project numbered [PRJNA301554](https://www.ncbi.nlm.nih.gov/bioproject/301554).

You do not need to understand all of the columns, but we will use the following:

| Column Header | Column Index |
| ------------- | ------------ |
| BioSample | 0 |
| genotype | 8 |
| time | 9 |
| treatment | 10 |
| Organism | 23 | 
| tissue | 28 |

## Exercise #1
Demonstrate how to get the list of keys from the following code:
```python
capitals = {}
capitals['TX'] = 'Austin'
capitals['CA'] = 'Sacramento'
capitals['GA'] = 'Atlanta'
capitals['NY'] = 'Albany'
```

How do you get a list of values?
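
If you get stuck, the following sketch shows the general pattern on a different, hypothetical dictionary (`atomic_numbers` is not part of the exercise). Note that `keys()` and `values()` return view objects, so wrapping them in `list()` is what produces an actual list.

```python
# Hypothetical dictionary, unrelated to the exercise data.
atomic_numbers = {'H': 1, 'He': 2, 'Li': 3}

key_view = atomic_numbers.keys()             # a dict_keys view, not a list
key_list = list(atomic_numbers.keys())       # an actual list of the keys
value_list = list(atomic_numbers.values())   # an actual list of the values

print(key_list)
print(value_list)
```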

## Exercise #2

We want to sum each of these two lists of numbers and store each result in a dictionary, using the name of the list variable as the key. However, the following code is not working. Can you fix it?
```python
# The lists to sum.
list1 = [1,2,3,4,5,6,7,8,9,10]
list2 = [2,4,5,6,10,12,14,16,18,20]

# Create the dictionary where we will store the sums.
results = {}

# Iterate through the first list and sum the numbers. 
# Store the growing sum in the dictionary.
for item in list1:
    results['list1'] = results['list1'] + item

# Iterate through the second list and sum the numbers. 
# Store the growing sum in the dictionary.
for item in list2:
    results['list2'] = results['list2'] + item

# Print the results
print(results)
```
The result should be:

```python
{'list1': 55, 'list2': 107}
```
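
As a hint, the sketch below reproduces the same kind of error on a separate, hypothetical dictionary (`totals`) and shows two common ways around it. It is one possible approach, not the only valid fix.

```python
# Hypothetical example, separate from the exercise code.
totals = {}

try:
    totals['apples'] = totals['apples'] + 1   # reads a key that does not exist yet
except KeyError as error:
    missing_key = error.args[0]
    print('KeyError for key:', missing_key)

# Option 1: set a starting value before accumulating.
totals['apples'] = 0
totals['apples'] = totals['apples'] + 1

# Option 2: dict.get() returns a default when the key is missing.
totals['pears'] = totals.get('pears', 0) + 1

print(totals)
```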

## Exercise #3

Write code that opens the `data/PRJNA301554.tsv` file and prints each treatment together with the number of samples that received it.

The output should look like this:

```python
{'CONTROL': 143, 'HEAT': 128, 'RECOV_HEAT': 63, 'DROUGHT': 71, 'RECOV_DROUGHT': 70}
```
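
The counting step can be sketched on its own, assuming the values of interest have already been collected. The list `observations` below is hypothetical and stands in for one column read from a file.

```python
# Hypothetical values, standing in for one column read from a file.
observations = ['red', 'green', 'red', 'blue', 'green', 'red']

counts = {}
for value in observations:
    if value not in counts:
        counts[value] = 0      # first time this value appears
    counts[value] += 1         # count this occurrence

print(counts)
```

In practice you would not need the intermediate list: the same `if` and `+=` lines can run inside the loop that reads the file.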

## Exercise #3b

Use the Pretty Print (`pprint`) module to print a more human-readable view of the dictionary from Exercise #3. The output should look like the following:
```python
{   'CONTROL': 143,
    'DROUGHT': 71,
    'HEAT': 128,
    'RECOV_DROUGHT': 70,
    'RECOV_HEAT': 63}
```
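
For reference, here is a small sketch of `pprint` from the Python standard library applied to a hypothetical dictionary (`fruit_counts`). The `indent` and `width` values are choices made for this example; `pprint` only breaks a dictionary across lines when it does not fit within `width`, and it sorts dictionary keys by default.

```python
import pprint

# Hypothetical dictionary; keys are deliberately not in alphabetical order.
fruit_counts = {'pear': 3, 'apple': 10, 'fig': 7, 'banana': 25}

# indent=4 sets the leading spaces; a small width forces one item per line.
formatted = pprint.pformat(fruit_counts, indent=4, width=20)
print(formatted)

# pprint.pprint(fruit_counts, indent=4, width=20) prints the same text directly.
```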

## Exercise #4

Using the same input file, now create a nested dictionary that includes the sample counts for both treatment and genotype.
The output should look like this when printed with the `pprint` module:

```python
{   'genotypes': {   'Azuenca (AZ; IRGC#328, Japonica)': 119,
                     'Kinandang puti (KP; IRGC#44513, Indica)': 118,
                     'Pandan wangi (PW; IRGC#35834, Japonica)': 118,
                     'Tadukan (TD; IRGC#9804, Indica)': 120},
    'treatments': {   'CONTROL': 143,
                      'DROUGHT': 71,
                      'HEAT': 128,
                      'RECOV_DROUGHT': 70,
                      'RECOV_HEAT': 63}}
```
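
The nested structure can be sketched as an outer dictionary whose values are themselves counting dictionaries. The `records` list and the category names below are hypothetical and stand in for two columns of a file.

```python
# Hypothetical records: (shape, color) pairs standing in for two file columns.
records = [('circle', 'red'), ('square', 'red'), ('circle', 'blue')]

# One outer dictionary with an inner counting dictionary per category.
summary = {'shapes': {}, 'colors': {}}

for shape, color in records:
    summary['shapes'][shape] = summary['shapes'].get(shape, 0) + 1
    summary['colors'][color] = summary['colors'].get(color, 0) + 1

print(summary)
```

Creating the inner dictionaries up front means the loop never has to check whether `'shapes'` or `'colors'` exists.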

## Exercise #5
Write code that splits the genotype string into a list of four separate data elements:
1. The cultivar (e.g. Azuenca)
2. The cultivar abbreviation (e.g. AZ)
3. The IRGC number (e.g. IRGC#328)
4. The subspecies (e.g. Japonica).  

This information is all contained in the genotype string of each line, always in the same format. For example, these strings:
```
Azuenca (AZ; IRGC#328, Japonica)
Kinandang puti (KP; IRGC#44513, Indica)
Pandan wangi (PW; IRGC#35834, Japonica)
Tadukan (TD; IRGC#9804, Indica)
```
should be converted to these lists:
```python
['Azuenca', 'AZ', 'IRGC#328', 'Japonica']
['Kinandang puti', 'KP', 'IRGC#44513', 'Indica']
['Pandan wangi', 'PW', 'IRGC#35834', 'Japonica']
['Tadukan', 'TD', 'IRGC#9804', 'Indica']
```
Be sure to do the following:
1. Write a function named `split_genotype` that
   - accepts a single argument, the genotype string.
   - returns a list of the four elements from that string.
2. Use the `str.replace()` method to help break up the genotype string.
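
As a hint, one way to combine `str.replace()` with `str.split()` is to turn every delimiter into the same character and then split once. The sketch below uses a hypothetical label format and a hypothetical function name (`split_label`), not the genotype format from this exercise.

```python
def split_label(label):
    """Split a label such as 'Chardonnay [CH | white]' into its parts."""
    # Turn every delimiter into the same character so one split() is enough.
    cleaned = label.replace(' [', '|').replace(' | ', '|').replace(']', '')
    return cleaned.split('|')

print(split_label('Chardonnay [CH | white]'))
print(split_label('Pinot noir [PN | red]'))
```

Because the space is only removed where it is part of a delimiter, multi-word names such as `Pinot noir` stay in one piece.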


## Exercise #6
Now, let's adjust our code from Exercise #4 to create a dictionary that includes the subspecies count too. This requires splitting the genotype as we did in Exercise #5 to get that information. The output should look like the following:

```python
{   'genotypes': {   'Azuenca': 119,
                     'Kinandang puti': 118,
                     'Pandan wangi': 118,
                     'Tadukan': 120},
    'subspecies': {'Indica': 238, 'Japonica': 237},
    'treatments': {   'CONTROL': 143,
                      'DROUGHT': 71,
                      'HEAT': 128,
                      'RECOV_DROUGHT': 70,
                      'RECOV_HEAT': 63}}
```