
<a name="top"></a>

# Introduction to Python Programming for Bioinformatics. Lesson 4

<details>
<summary>
About this notebook
</summary>


This notebook was originally written by [Marc Cohen](https://github.com/mco-gh), an engineer at Google. The original source can be found on [Marc's short link service](https://mco.fyi/) and starts with [Python lesson 0](https://mco.fyi/py0). I encourage you to work through that notebook if you find some details missing here.

Rob Edwards edited the notebook, adapted it for bioinformatics using some simple genetics examples, condensed it into a single notebook, and rearranged some of the lessons, so if some of it does not make sense, it is Rob's fault!

It is intended as a hands-on companion to an in-person course, and if you would like Rob to teach this course (or one of the other courses) don't hesitate to get in touch with him.

</details>
<details>
<summary>
Using this notebook
</summary>

You can download the original version of this notebook from [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_4.ipynb) and from [Rob's Google Drive]()

**You should make your own copy of this notebook by selecting File->Save a copy in Drive from the menu bar above, and then you can edit the code and run it as your own**

There are several lessons, and you can do them in any order. I've tried to organise them in the order I think most appropriate, but you may disagree!
</details>


<a name="lessons"></a>
# Lesson Links

* [Lesson 4 - Dictionaries](#Lesson-4---Dictionaries)
  * [Dictionary Operations](#Dictionary-Operations)
  * [Nested Dictionaries](#Nested-Dictionaries)
  * [Rule of thumb for truth value of lists, and dictionaries](#Rule-of-thumb-for-truth-value-of-lists,-and-dictionaries)

Previous Lesson: [GitHub](Python_Lesson_3.ipynb) | [Google Colab](https://colab.research.google.com/drive/1NbWawPfWAQV2x56rG0SvcMNpXI7sn3R0)

Next Lesson: [GitHub](Python_Lesson_5.ipynb) | [Google Colab](https://colab.research.google.com/drive/1VmGd4AAb1fBKOjmemYIKnPgu58xGE5so)


# Lesson 4 - Dictionaries

A dictionary is an organized collection of key/value pairs.
The data is organized for quick access via the key, somewhat like a real dictionary, where words are the keys and their definitions are the associated values.

Dictionaries are defined using curly braces with key:value pairs separated by commas, like this:
```
websites = {
  'Flinders': 'https://www.flinders.edu.au',
  'EdwardsLab': 'https://edwards.flinders.edu.au',
  'FAME': 'https://fame.flinders.edu.au'
}
```

This data type is known by various names in other languages:
- map (C++)
- HashMap (Java)
- hash (Perl, Ruby)
- associative array (generic term)

This object type is extremely powerful for representing indexed data. The keys in a dictionary are arranged to facilitate fast lookup by key value.
- They are optimized for direct, not sequential, access
- Since Python 3.7, keys are kept in the order they were inserted, but they are not sorted
- You can't index a dictionary by position
- But you can index dictionaries very quickly by key value, as we’ll see
- You can't take slices of a dictionary
- Dictionaries, like lists, can grow, shrink, or change over time

- Dictionary keys must be immutable, hashable values (e.g., a string, a number, or a tuple of such values), because a key that changed after being stored could no longer be found.
- Dictionary values can have any type
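
Here is a minimal sketch of the key rules above: strings, numbers, and tuples work as keys, while a list is rejected with a `TypeError`. The `positions` dictionary and everything in it are made up for illustration.

```python
# Hypothetical example data: strings, tuples, and numbers can all be keys
positions = {}
positions['geneA'] = 1500                # string key
positions[('chr1', 1500)] = 'geneA'      # tuple key
positions[42] = 'a number key'           # number key

# A list can change, so Python refuses to use it as a key
try:
    positions[['chr1', 1500]] = 'geneA'
    list_key_allowed = True
except TypeError as error:
    list_key_allowed = False
    print(f"Lists can't be keys: {error}")

print(positions)
```

Note that the values here are a mix of numbers and strings, which is fine: only the keys are restricted.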

[Python's dictionary documentation](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)

[Rob's introduction to programming dictionaries](https://youtu.be/uW8-HkmNq4Q?si=zhzlUxGut6ARE26K) (but because it's Java they are called hashes).


## Dictionary Operations


In [None]:
# Create an empty dictionary (use curly braces instead of parens or square brackets)
genetic_code = {}
print(genetic_code)

In [None]:
# Create and initialize a dictionary
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu' }
print(genetic_code)

In [None]:
# If the same key occurs multiple times, Python only keeps the last value
x = { 'UUU' : 'Leu', 'UUU' : 'Phe' }
print(x)


In [None]:
# but the same value may appear any number of times.
x = { 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
print(x)

In [None]:
# Get the size of a dictionary (returns number of key/value pairs)
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
print(len(genetic_code))
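
Dictionaries can grow, shrink, or change over time, and `len()` is an easy way to watch that happen. The following is a small sketch using the same codons as above; replacing `'Leu'` with the full name `'Leucine'` is only there to show an update.

```python
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu' }
print(len(genetic_code))

# Assigning to a new key adds a key/value pair
genetic_code['CGA'] = 'Arg'
print(len(genetic_code))

# Assigning to an existing key replaces its value
genetic_code['UUA'] = 'Leucine'
print(genetic_code['UUA'])

# del removes a key/value pair
del genetic_code['UUU']
print(len(genetic_code))
print(genetic_code)
```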

In [None]:
# Retrieve the value associated with a given key
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
codon = 'UUU'
amino_acid = genetic_code[codon]
print(f"The translation of {codon} is {amino_acid}")

In [None]:
# The value inside the square brackets may be a literal, a variable or any
# arbitrary expression. Similar syntax to list indexing but key based,
# not positional.
# Attempting to retrieve a non-existent key causes an error
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
codon = 'AAA'
genetic_code[codon]

In [None]:
# play it safe by testing for key existence before access
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
codon = 'AAA'
if codon in genetic_code:
    amino_acid = genetic_code[codon]
    print(f"The translation of {codon} is {amino_acid}")
else:
    print(f"We didn't find a translation for {codon}")

codon = 'UUU'
if codon in genetic_code:
    amino_acid = genetic_code[codon]
    print(f"The translation of {codon} is {amino_acid}")
else:
    print(f"We didn't find a translation for {codon}")

# When used with dictionaries, the in operator only checks the existence
# of keys, not values. You can also use “not in” to test for non-existence
# of a key.
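
Another way to play it safe, not covered in the cell above, is the dictionary's `get()` method, which returns a default value instead of raising an error when the key is missing. In this sketch the default `'unknown'` is just a placeholder chosen for illustration.

```python
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }

# get() returns the stored value if the key exists ...
known = genetic_code.get('UUU', 'unknown')
print(f"The translation of UUU is {known}")

# ... and the default we supply if it does not
unknown = genetic_code.get('AAA', 'unknown')
print(f"The translation of AAA is {unknown}")

# Without a default, get() returns None for a missing key
no_default = genetic_code.get('AAA')
print(no_default)
```

Whether you prefer `in` or `get()` is mostly a matter of taste: `in` is clearer when the two cases need different code, and `get()` is shorter when a default value is all you need.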

In [None]:
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
# loop through a dictionary (this iterates over the dictionary keys)
for codon in genetic_code:
    amino_acid = genetic_code[codon]
    print(f"The translation of {codon} is {amino_acid}")
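
The loop above looks up each value by its key. As a sketch of two common variations: the `items()` method hands you the key and the value together, and `sorted()` lets you visit the keys in alphabetical order.

```python
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }

# items() gives us each key and its value at the same time
for codon, amino_acid in genetic_code.items():
    print(f"The translation of {codon} is {amino_acid}")

# sorted() returns a list of the keys in alphabetical order
sorted_codons = sorted(genetic_code)
for codon in sorted_codons:
    print(f"{codon}: {genetic_code[codon]}")
```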

## Nested Dictionaries
Just like we can have nested if statements, nested loops, and nested lists, we can also have nested dictionaries.
- dictionary of lists:  ```{'key1': [1,2], 'key2': [3,4]}```
- dictionary of dictionaries:
```
{
    'key1' : {
      'key1' : [1, 2],
      'key2' : [3, 4]
    },
    'key2' : {
      'key1' : [1, 2],
      'key2' : [3, 4]  
    }
}
```
This can get arbitrarily complex (dictionaries of lists of dictionaries of...).
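
To read from a nested dictionary, chain the square brackets, working from the outside in. This is a small sketch; the `amino_acids` dictionary and its `'name'` and `'codons'` keys are invented for this example.

```python
# Hypothetical example data: a dictionary of dictionaries, each holding a list
amino_acids = {
    'Phe' : {
        'name' : 'Phenylalanine',
        'codons' : ['UUU', 'UUC']
    },
    'Tyr' : {
        'name' : 'Tyrosine',
        'codons' : ['UAU', 'UAC']
    }
}

# The first key selects the inner dictionary, the second key selects its value
full_name = amino_acids['Tyr']['name']
print(full_name)

# A third pair of brackets indexes into the list by position
first_phe_codon = amino_acids['Phe']['codons'][0]
print(first_phe_codon)

# Nested loops walk through every level
for abbreviation in amino_acids:
    for codon in amino_acids[abbreviation]['codons']:
        print(f"{codon} encodes {abbreviation}")
```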

If you want to explore nested dictionaries in more detail, have a look at [Marc's Python lesson 5](https://mco.fyi/py5), which covers these concepts in depth.



## Rule of thumb for truth value of lists, and dictionaries

Lists and dictionaries may both be used as boolean values. The rules for converting a list or dictionary into a boolean value are as follows:
- if the object is empty, it evaluates to False
- if the object is non-empty, it evaluates to True
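
A short sketch of these rules in action: an `if` statement can test a list or dictionary directly, with no need to compare its length to zero.

```python
genetic_code = {}

# An empty dictionary evaluates to False
if genetic_code:
    print("The genetic code has some codons")
else:
    print("The genetic code is empty")

empty_dict_is_true = bool(genetic_code)

# Once we add a key/value pair, it evaluates to True
genetic_code['UUU'] = 'Phe'
if genetic_code:
    print("The genetic code has some codons")

filled_dict_is_true = bool(genetic_code)

# The same rules apply to lists
empty_list_is_true = bool([])
filled_list_is_true = bool(['UUU'])
print(empty_list_is_true, filled_list_is_true)
```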

[Return to the lesson listing](#lessons)

[Return to the top of the notebook](#top)
