##### 30 Oct 2019 

# Dictionaries

This notebook introduces a new type of "container"

A container is a collection of other objects.  We've already seen one type of container
* a list is an ordered collection
* access items in a list by their _location_: `a[0]`, `a[1]`, _etc_

Note that a container is itself an object, which means we can call methods to perform operations on the collection
* example: to add an item to a list call the `append` method:  `a.append(42)`

Python has several other types of containers
* **dictionary** (access items by _name_)
* **tuples** (fixed size, immutable)
* **sets** (unordered, no duplicates)
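
As a quick sketch, here is one small example of each of the four container types. The variable names are made up for illustration; the only value taken from this notebook is the molecular weight of Alanine.

```python
# One example of each container type (all names are illustrative)
colors = ['red', 'green', 'blue']    # list: ordered, access by location
weights = {'A': 89.09}               # dictionary: access by name
point = (3, 4)                       # tuple: fixed size, immutable
letters = {'a', 'b', 'a'}            # set: unordered, duplicates are dropped

print(colors[0])       # item at location 0
print(weights['A'])    # item named 'A'
print(len(point))      # a tuple always keeps the size it was created with
print(len(letters))    # 2, because the second 'a' is a duplicate
```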

An important part of programming is to learn which "tools" are available and to choose the right one for the job...

## Review of Lists

A list is a common way to organize collections of objects in Python 

In [23]:
spectrum = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

Some operations on lists are defined by built-in functions

In [24]:
len(spectrum)

7

Other operations are performed by calling a method

In [3]:
spectrum.append('black')

In [4]:
spectrum.insert(0, 'white')

In [5]:
print(spectrum)

['white', 'red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet', 'black']


### Contents of Lists 

We can make lists of integers:

In [6]:
fib = [1, 1, 2, 3, 5, 8, 13]

We can make lists of lists:

In [7]:
house = [ [8, 7], [10, 10], [9, 13], [18, 28], [4, 7] ]

One thing we didn't mention:  Python doesn't care if we mix the kinds of objects in a list

In [8]:
a = [42, 'three', 6.7, [ 'red', 0] ]

In [9]:
a

[42, 'three', 6.7, ['red', 0]]

## List Index 

The main thing that distinguishes lists from other kinds of collections:
* lists are **ordered**

A list is a linear sequence of items
* the first item is at location 0
* the last (in a collection of $n$ items) is at location $n-1$

The syntax for accessing an item in a list is the list name followed by a location written in brackets

In [10]:
print(spectrum)

['white', 'red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet', 'black']


In [11]:
spectrum[1]

'red'

In [12]:
spectrum[4]

'green'

A negative index specifies a location from the end:

In [13]:
spectrum[-1]

'black'

In [14]:
fib[-1]

13
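
The indexing rules above can be checked with a short, self-contained sketch. It uses the original seven-color `spectrum` list (before `white` and `black` were added) and a helper variable `n` that is introduced here only for illustration.

```python
spectrum = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
n = len(spectrum)

print(spectrum[0])        # the first item is at location 0
print(spectrum[n - 1])    # the last item is at location n-1
print(spectrum[-1])       # a negative index counts from the end
print(spectrum[-1] == spectrum[n - 1])   # True: both name the last item
```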

## Aside:  Using `import` for Shared Code and Data

For the examples below I created several dictionaries and put them in files

To show how they work I'm going to import them into the IPython session for this notebook

### Importing from a Library 

We have been using Python's `import` statement to load items from various libraries

In [15]:
from math import pi, cos

In [16]:
from random import randint

The same statement is used to import definitions from our own files

### Importing from a File 

We can also import functions and variable definitions from our own Python files

The rules:
* the file you want to import from must have a name that ends with `.py`
* the file must be in the same folder as the program doing the importing (_e.g._ in the same folder as a notebook)
* every statement in the file is executed, then names defined in the file are made available to the notebook

### Example:  Load a List from a File 

A file named `example_lists.py` has the definitions of various lists of integers:
* download from the Data section of Canvas, save in the same folder as this notebook
* if you're using JupyterLab you can open that file in a separate panel

In [30]:
from example_lists import primes

ImportError: cannot import name 'primes' from 'example_lists' (/home/jovyan/Bi410/example_lists.py)

In [31]:
primes[3]

NameError: name 'primes' is not defined

In [20]:
primes[4:7]

NameError: name 'primes' is not defined

#### Important Note 

Notice how the file name appears in that `import` statement: it is **not a string**, and it **does not include the `.py` extension**

#### A Note for the Future 

It is possible to create your own library and use it in several different projects
* _e.g._ make a folder named `Python/MyLibrary/foo` in your home directory
* use `import foo` from any of your programs -- they don't have to be in the same folder as the library
* there are a lot of fuzzy details, however, so we won't attempt it in this course

### Example:  Load all the Lists in the File 

We can also import all the items defined in a file:

In [21]:
from example_lists import *

ModuleNotFoundError: No module named 'example_lists'

In [None]:
fibonacci[:5]

In [None]:
catalan[:5]

## Dictionaries 

Another kind of collection in Python is called a **dictionary**

The difference between a dictionary and a list:
* items in a list are accessed by **location**
* items in a dictionary are accessed by **name**

Example (from _PCfB_):  we can make a dictionary to hold molecular weights of amino acids
* the dictionary name will be `mol_wt`
* to get the weight of an amino acid, write `mol_wt[x]`
* with a dictionary, the index is the name of an amino acid, not its location in the list

This is how we look up the molecular weight of Alanine (abbreviation: `A`):

```
>>> mol_wt['A']
89.09
```

### Example Dictionaries 

Before we see how to create a dictionary we'll look at some motivating examples that show how dictionaries are used 

A file named `example_dictionaries.py` has the definitions of several dictionaries
* download it from Canvas, save it in the same folder as this notebook

### Example:  `mol_wt` 

This statement imports the `mol_wt` dictionary:

In [22]:
from example_dictionaries import mol_wt

ModuleNotFoundError: No module named 'example_dictionaries'

In [None]:
mol_wt['A']

In [None]:
mol_wt['P']

If we didn't have a dictionary to store this data we'd have to use a list and remember where each value was stored in the list
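
To make the contrast concrete, here is a minimal sketch that stores two weights both ways. The names `wt_list` and `wt_dict` are hypothetical, and the values are typed in by hand for illustration; the contents of `example_dictionaries.py` may differ.

```python
# With a list we have to remember which location holds which amino acid
wt_list = [89.09, 115.13]     # location 0 is Alanine, location 1 is Proline
print(wt_list[1])             # we must know that Proline is at location 1

# With a dictionary the abbreviation itself is the index
wt_dict = {'A': 89.09, 'P': 115.13}
print(wt_dict['P'])           # no need to remember a location
```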

### Example: `rgb` 

A dictionary named `rgb` has a set of RGB color values

In [None]:
from example_dictionaries import rgb

In [None]:
rgb['red']

In [None]:
rgb['green']

### Example: `cipher` 

A simple technique for encrypting a string is called a **substitution cipher**
* define a **key**, a set of rules for substituting letters

Example key:  `H` ➔ `N`, `E` ➔ `T`, `L` ➔ `Q`, and `O` ➔ `J`

Using this key we would encrypt `HELLO` as `NTQQJ`

In [None]:
from example_dictionaries import cipher

In [None]:
cipher['H']

In [None]:
cipher['E']

In [None]:
message = ''
for ch in 'HELLO':
    message += cipher[ch]
print(message)
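
If you don't have `example_dictionaries.py` handy, the same loop can be tried with a small hand-made key. This sketch assumes a dictionary named `small_cipher` (a made-up name) that holds only the four rules from the example key above, so it can encrypt only strings made of `H`, `E`, `L` and `O`.

```python
# A tiny key with only the four substitutions from the example
small_cipher = {'H': 'N', 'E': 'T', 'L': 'Q', 'O': 'J'}

message = ''
for ch in 'HELLO':
    message += small_cipher[ch]    # look up the substitute for each letter
print(message)                     # NTQQJ
```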

## Dictionaries are Tables 

One way to think of a dictionary object is that it's like a table with two columns

| color | hex | 
| -- | -- |
| `red` | `#FF0000` |
| `green` | `#00FF00` |
| `blue` | `#0000FF` |

When we write an expression like `rgb[x]` we're asking Python to find the row that has `x` in the left column and tell us the corresponding value in the right column.
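
The table above can be written directly as a dictionary. This is a sketch only: the name `rgb_table` is made up, and the real `rgb` dictionary in `example_dictionaries.py` may hold more colors or a different representation of the values.

```python
# Each row of the table becomes one item: left column : right column
rgb_table = {
    'red':   '#FF0000',
    'green': '#00FF00',
    'blue':  '#0000FF',
}

x = 'green'
print(rgb_table[x])    # find the row for 'green', report the right column
```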

### Name vs Location 

When Python evaluates
```
a[x]
```
it first asks "what is `a`?"

* if `a` is a **list** or a **string**, `x` is expected to be an integer that corresponds to a **location** in `a`, and the value is the item at that location

* if `a` is a **dictionary**, `x` is expected to be the **name** of an item in the dictionary, and the value is the item associated with that name
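
The same bracket syntax behaves differently depending on the type of the object, as this short sketch shows. The variable names are illustrative, and the last part uses `try`/`except` only to show the error without stopping the program.

```python
a_list = ['red', 'green', 'blue']
a_string = 'aloha'
a_dict = {'red': '#FF0000', 'green': '#00FF00'}

print(a_list[1])        # location 1 in a list
print(a_string[1])      # location 1 in a string
print(a_dict['red'])    # the value associated with the name 'red'

# Using a name where a location is expected is an error
try:
    a_list['red']
except TypeError as err:
    print('TypeError:', err)
```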

### Terminology 

In some languages dictionaries are called **maps** or **association lists**

Phrases involving "map" or "associate" are often used to describe dictionaries
* "`rgb` maps a color name to its color code"
* "`rgb` is a mapping from names to color codes"
* "`rgb` associates a color with its code"

The left sides of the name-value pairs are called **keys**
* the items in a dictionary are also called **key-value** pairs

## How to Make a Dictionary 

A dictionary looks like a list, except
* it's defined with braces ("curly brackets")
* each item is a **name-value pair**, with a colon between the name and the value

This example shows how to create a dictionary that associates a distance name with its value, in feet:

In [None]:
dist = { 'yard' : 3, 'fathom' : 6, 'furlong' : 660, 'mile': 5280 }

In [None]:
dist['yard']

### Use Multiple Lines if Necessary 

We can write a dictionary (or list) definition across several different lines if we want

This is a common format:

In [None]:
dist = {
    'furlong': 660, 
    'mile':    5280, 
    'yard':    3, 
    'fathom':  6,
}

Note how there is a comma after the last item
* Python doesn't care about the extra comma
* programmers often put commas at the end of every line because it makes it easy to extend the list by adding a new line

## Accessing Items in the Dictionary 

Here are some expressions that use our new dictionary:

In [None]:
dist['yard']

In [None]:
dist['mile']

In [None]:
dist['mile'] / dist['furlong']

### Missing Names 

This is what you'll see if you try to look up a name that is not in the dictionary:

In [None]:
dist['chain']
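
Python raises a `KeyError` when the name is missing. The sketch below catches that error so the rest of the cell keeps running, then shows two common ways to avoid it: testing with `in` first, or calling the dictionary's `get` method, which returns `None` for a missing name.

```python
dist = {'yard': 3, 'fathom': 6, 'furlong': 660, 'mile': 5280}

# Looking up a missing name raises KeyError
try:
    dist['chain']
except KeyError as err:
    print('KeyError:', err)

# Option 1: check before looking up
if 'chain' in dist:
    print(dist['chain'])
else:
    print('chain is not in the dictionary')

# Option 2: get returns None instead of raising an error
print(dist.get('chain'))
```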

## Adding Items to a Dictionary 

Simply use an assignment statement to add a new item

In [None]:
print(dist)

In [None]:
dist['league'] = 15840

In [None]:
dist['inch'] = 1 / 12   # one inch is 1/12 of a foot, about 0.083

In [None]:
print(dist)
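
One detail worth knowing: assigning to a name that is already in the dictionary replaces the old value instead of adding a second item. A small sketch, using a shortened version of `dist`:

```python
dist = {'yard': 3, 'mile': 5280}

dist['league'] = 15840     # new name: an item is added
print(len(dist))           # 3

dist['yard'] = 3.0         # existing name: the value is replaced
print(len(dist))           # still 3
print(dist['yard'])
```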

### Another Way to Make a Dictionary

A very common "pattern" for programs that use dictionaries
* define an empty dictionary
* use a loop to create data, _e.g._ by reading records from a TSV file
* use assignment statements in the body of the loop to add items one at a time
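
The pattern can be sketched without a file by using a list of tab-separated strings as a stand-in for the records. The `records` list and the names `name` and `feet` are assumptions made for this illustration.

```python
# Stand-in for lines read from a TSV file: name, a tab, then a value
records = ['yard\t3', 'fathom\t6', 'furlong\t660', 'mile\t5280']

dist = {}                              # start with an empty dictionary
for record in records:
    name, feet = record.split('\t')    # break the record into two parts
    dist[name] = int(feet)             # add one item per record

print(dist)
```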

### Example: Genetic Code

The genetic code is a mapping
* we want to associate each 3-letter DNA codon with the 1-letter symbol of the amino acid it encodes

Examples:
```
TTT ➔ F   (Phenylalanine, or Phe)
TTC ➔ F
TTA ➔ L   (Leucine, or Leu)
TTG ➔ L
TCT ➔ S   (Serine, or Ser)
...
```

We want to create a dictionary named `gc` that associates each codon with its amino acid letter:
```
{ 'TTT': 'F', 'TTC': 'F', 'TTA': 'L', ... }
```

There are 4 possible letters at each position, and 3 positions, so the total number of codons is $4^3 = 64$
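
We can confirm the count by generating every codon with three nested loops. The letter order `TCAG` is chosen here only so that the first few codons match the examples listed above.

```python
codons = []
for first in 'TCAG':
    for second in 'TCAG':
        for third in 'TCAG':
            codons.append(first + second + third)

print(len(codons))    # 64
print(codons[:5])     # ['TTT', 'TTC', 'TTA', 'TTG', 'TCT']
```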

### Building the Genetic Code Dictionary 

The complete code is in a file named `genetic_code.csv` (you can download it from our server):

In [None]:
! head -5 genetic_code.csv

#### Plan 

* initialize an empty dictionary
* use a loop to read each record from the file
* use split to break the line into parts
* add a new item to the dictionary, with the first part as the key and the second as the value

#### Sandbox 

In [None]:
line = open('genetic_code.csv').readline()

In [None]:
line

In [None]:
line.strip().split(',')

In [None]:
codon, aa = line.strip().split(',')
print(codon, '->', aa)

#### Code 

In [None]:
gc = { }
with open('genetic_code.csv') as f:
    for line in f:
        codon, aa = line.strip().split(',')
        gc[codon] = aa

In [None]:
gc['ATG']

In [None]:
gc['TTA']

In [None]:
gc['TAA']

In [None]:
print(gc)

## Operations with Dictionaries 

Use `len` to find out how many items are in a dictionary

In [None]:
len(gc)

Methods named `keys` and `values` return the names and values in a dictionary; wrap the result in `list` to get a list

In [None]:
list(dist.keys())

In [None]:
list(dist.values())

Use `in` to see if a name is a key in the dictionary:

In [None]:
'yard' in dist

In [None]:
'chain' in dist
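
Here are the same operations in one self-contained cell, using the four-item `dist` dictionary defined earlier. The last line is a reminder that `in` looks only at the keys, not at the values.

```python
dist = {'yard': 3, 'fathom': 6, 'furlong': 660, 'mile': 5280}

print(len(dist))              # number of key-value pairs
print(list(dist.keys()))      # the names
print(list(dist.values()))    # the values
print('yard' in dist)         # True
print('chain' in dist)        # False
print(3 in dist)              # False: 3 is a value, not a key
```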

## Iterating Over a Dictionary 

We've seen several examples of how to use a `for` statement

Iterate over a list:

In [None]:
for x in ['alpha', 'beta', 'gamma']:
    ...

Iterate over a range:

In [None]:
for x in range(10):
    ...

Iterate over a string:

In [None]:
for x in 'aloha':
    ...

Iterate over a file:

In [None]:
for x in open('genetic_code.csv'):
    ...

### Dictionaries in `for` Statements 

A dictionary is a collection, so it's not surprising we can iterate over a dictionary, too:

In [None]:
for x in dist:
    print('There are', dist[x], 'feet in a', x)

**Notes**
* `x` is assigned one of the keys
* the loop body is executed, then Python gets another key and repeats
* the loop ends after all keys have been used
* in Python 3.7 and later the keys are visited in the order they were added; in older versions the order is undefined, and all we know is that the loop will be executed once for each key-value pair
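
If a program needs the keys in a particular order no matter which version of Python is running, one option is to loop over `sorted(dist)`, which visits the keys in alphabetical order. A short sketch (the `visited` list is introduced here only to record what the loop saw):

```python
dist = {'yard': 3, 'fathom': 6, 'furlong': 660, 'mile': 5280}

# A plain loop visits each key exactly once
visited = []
for x in dist:
    visited.append(x)
print(len(visited) == len(dist))    # True

# sorted() gives the keys in alphabetical order
for x in sorted(dist):
    print('There are', dist[x], 'feet in a', x)
```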