In [None]:
%reload_ext postcell
%postcell register

# Dictionaries

Dictionaries are one of Python's built-in containers (data structures), alongside lists, tuples, and sets. A dictionary maps keys to values.

### Creating dictionaries

If you already know the contents of a dictionary, such as a mapping between state abbreviation and state name, you can create a populated dictionary using the dictionary literal syntax:

In [None]:
states_lookup = {"IL": "Illinois", "MI": "Michigan", "NJ":"New Jersey"} # pre-populated dict
states_lookup["NY"] = "New York" # add entry

states_lookup

If your dictionary needs to start out empty, with values added while the program runs, you can create it in either of these common ways:

In [None]:
states_lookup = {} # empty dict
states_lookup["NY"] = "New York" # add entry

states_lookup

In [None]:
states_lookup = dict() # empty dict
states_lookup["NY"] = "New York" # add entry

states_lookup

**Exercise**
Creating an empty dictionary using `{}` or `dict()` produces the same result. Which do you prefer and why?

### Using dictionaries to remember values, which are updated in loops

A very common pattern of dictionary usage is to iterate through a file, a list or a stream and increment a counter for the variable being processed. Think of the Game of Thrones file, where we looped through all the records. Each time we found a character, we either added them to the dictionary or incremented their counter.

In [None]:
data_file_location = "../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv"
killers = dict() # dictionary data type

file = open(data_file_location, "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  if tokens[4] in killers: kill_count = killers[tokens[4]]
  else: kill_count = 0
  kill_count = kill_count + 1
  killers[tokens[4]] = kill_count

file.close()
killers

Notice that in line 8 we check whether a name already exists in the dictionary using the `in` keyword. Python provides a couple of ways to avoid this check:

1. Initialize the dictionary using `defaultdict(int)`
2. In line 8, instead of calling `killers[tokens[4]]`, which raises an exception if the key doesn't exist, call `killers.get(tokens[4], 0)`

Recall that a normal dictionary will raise a `KeyError` if you ask it for a key which doesn't exist:

In [None]:
test_dict = dict()
test_dict["i_dont_exist"]

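As a minimal sketch of how a program can react to that error instead of stopping, the lookup can be wrapped in `try`/`except`. The names `lookup_demo` and `message` are made up for this illustration.

```python
lookup_demo = {"IL": "Illinois"}  # hypothetical example dictionary

try:
    name = lookup_demo["WI"]  # "WI" is not a key, so this raises KeyError
    message = "found " + name
except KeyError as err:
    # err.args[0] holds the key that was missing
    message = "no entry for " + str(err.args[0])

print(message)
```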
### dict.get(key, defaultvalue)

You normally access items in a dictionary using the syntax `dict[key]`. However, you can also use the syntax `dict.get(key, defaultvalue)`. If the key doesn't exist, `get` returns the default value instead of raising an error. This way, you don't have to check whether a key exists, insert it first, etc. Here is how the above code changes with this method:

In [None]:
data_file_location = "../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv"
killers = dict() # dictionary data type

file = open(data_file_location, "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  kill_count = killers.get(tokens[4], 0)
  kill_count = kill_count + 1
  killers[tokens[4]] = kill_count

file.close()
killers

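If the CSV file is not at hand, the same counting pattern can be tried on a small in-memory list. This is only a sketch: `killer_names` and `kill_counts` are hypothetical stand-ins, not data from the file.

```python
# Hypothetical stand-in for the killer column of the CSV file
killer_names = ["Arya", "Jon", "Arya", "Cersei", "Arya", "Jon"]

kill_counts = {}
for name in killer_names:
    # get() returns 0 the first time a name is seen, so no `in` check is needed
    kill_counts[name] = kill_counts.get(name, 0) + 1

print(kill_counts)
```

Note that `get` only reads: asking for a missing key does not add it to the dictionary.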
### defaultdict

A dictionary created with `defaultdict(int)` will return a zero if a key doesn't exist. Similarly, if you create a dictionary using `defaultdict(list)`, the dictionary will return an empty list if a key doesn't exist. Here is how our code will change to take advantage of this feature:

In [None]:
from collections import defaultdict
data_file_location = "../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv"
killers = defaultdict(int)

file = open(data_file_location, "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  killers[tokens[4]] += 1

file.close()
killers

**Exercise** Paste the code above and the earlier version of the code (further above) into https://www.diffchecker.com/ to see exactly which lines changed. (Be very careful about posting professional code on such websites; you might get in trouble for posting company code on random websites.)

In [None]:
test_dict2 = defaultdict(int)
test_dict2["i_dont_existXXXXXXXXXXXXXXXXXXXXXX"]

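The `defaultdict(list)` variant mentioned earlier can be sketched the same way. The example below groups children under each parent; `family_pairs` and `kids_by_parent` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (parent, child) pairs, used only for illustration
family_pairs = [("homer", "bart"), ("ned", "rod"), ("homer", "lisa"), ("ned", "todd")]

kids_by_parent = defaultdict(list)
for parent, child in family_pairs:
    # A missing parent key starts out as an empty list, so append() always works
    kids_by_parent[parent].append(child)

print(dict(kids_by_parent))
```

One thing to keep in mind: unlike `dict.get`, simply reading a missing key from a `defaultdict` also stores that key with its default value.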
### Checking to see if a key exists using `in`

Dictionaries work with the `in` operator to check for membership. The check looks at the keys, not the values.

In [None]:
states_lookup = {"IL": "Illinois", "MI": "Michigan", "NJ":"New Jersey", "NY":"New York"}

In [None]:
"IL" in states_lookup

In [None]:
"WI" in states_lookup

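A short sketch of the difference between looking for a key and looking for a value, assuming a small hypothetical dictionary `states_demo`:

```python
states_demo = {"IL": "Illinois", "MI": "Michigan"}  # hypothetical copy for this sketch

key_found = "IL" in states_demo                   # True: "IL" is a key
value_as_key = "Illinois" in states_demo          # False: values are not searched
value_found = "Illinois" in states_demo.values()  # True: search the values explicitly

print(key_found, value_as_key, value_found)
```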
**Exercise** Does "HI" exist _as a key_ in `states_lookup`?

In [None]:
%%postcell exercise_025_170_a

#type your answer here

### Iterating through a dictionary

Given a dictionary, you have learned how to pick out specific values, corresponding to specific keys. There are times when you need to iterate (loop) through every entry. Python 3.6 makes this very easy:

In [None]:
states_lookup = {"IL": "Illinois", "MI": "Michigan", "NJ":"New Jersey", "NY":"New York"}

Get all keys (a `dict_keys` view; wrap it in `list()` if you need an actual list):

In [None]:
states_lookup.keys()

Get all values (a `dict_values` view; wrap it in `list()` if you need an actual list):

In [None]:
states_lookup.values()

Get both keys and values (a `dict_items` view of `(key, value)` tuples):

In [None]:
states_lookup.items()

In [None]:
for state_code, state_name in states_lookup.items():
    print(state_code, state_name)

Alternative (the approach most other languages use):

In [None]:
for state_code in states_lookup.keys():
    print(state_code, states_lookup[state_code])

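Two related details, shown as a hedged sketch on a hypothetical `states_demo` dictionary: looping over the dictionary itself yields its keys, and `list()` converts what `keys()` returns into an ordinary list.

```python
states_demo = {"IL": "Illinois", "MI": "Michigan"}  # hypothetical copy for this sketch

# Looping over the dictionary itself yields its keys
codes = []
for state_code in states_demo:
    codes.append(state_code)

# keys() returns a view object; list() turns it into a real list
key_list = list(states_demo.keys())

print(codes, key_list)
```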
#### Lists of tuples are a common pattern
Lists of tuples, like the one you see above, show up quite often. We will see them in the next lecture, when we combine two lists using the `zip` function. Pandas dataframes understand them natively and we can even create dictionaries out of such lists:

In [None]:
dict([('IL', 'Illinois'), ('MI', 'Michigan'), ('NJ', 'New Jersey'), ('NY', 'New York')])

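As a small preview of the `zip` pattern, here is a sketch that pairs up two parallel lists and feeds the result to `dict()`. The list names are hypothetical.

```python
state_codes = ["IL", "MI", "NJ"]  # hypothetical parallel lists
state_names = ["Illinois", "Michigan", "New Jersey"]

pairs = list(zip(state_codes, state_names))  # list of (code, name) tuples
lookup_from_pairs = dict(pairs)              # dict() accepts a list of 2-tuples

print(pairs)
print(lookup_from_pairs)
```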
### Datatypes which work with dictionaries

So far, we have only used strings and numbers as keys and values in a dictionary. Dictionaries can be used with most datatypes, including dictionaries themselves (nested dictionaries).

#### Nested dictionaries

In [None]:
demographics_lookup = {
    "homer": {"age":38, "lastname":"simpson"},
    "marge": {"age":36, "lastname":"simpson"},
    "bart": {"age":10, "lastname":"simpson"},
    "lisa": {"age":8, "lastname":"simpson"},
}

In [None]:
demographics_lookup['homer']

Notice how you can access nested dictionaries using a simple syntax:

In [None]:
demographics_lookup['homer']['age']

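The same double indexing works inside a loop. The sketch below uses a shortened, hypothetical copy of the data (`family_demo`) and collects each person's age into a new dictionary.

```python
family_demo = {
    "homer": {"age": 38, "lastname": "simpson"},
    "lisa": {"age": 8, "lastname": "simpson"},
}

ages = {}
for first_name, details in family_demo.items():
    # details is itself a dictionary, so index it a second time
    ages[first_name] = details["age"]

print(ages)
```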
**Exercise** If we can access an existing value using `demographics_lookup['homer']['age']`, can we create a new entry using `demographics_lookup['barney']['gumble'] = 32`? Why not? (Hint: it is the same reason we had to use `defaultdict`.)

#### Tuples in dictionaries
Tuples can be used as keys or values.

In [None]:
tuple_dict = dict()

tuple_dict[(1,2)] = (3,4)
tuple_dict[(5,6)] = (7,8)
tuple_dict[(9,10)] = (11,12)

tuple_dict

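One place tuple keys come in handy is storing values by coordinate. The following is only an illustrative sketch; `board` and its contents are made up.

```python
# Hypothetical board: (row, column) tuples as keys
board = {}
board[(0, 0)] = "X"
board[(1, 2)] = "O"

corner = board[(0, 0)]               # look up an existing square
center = board.get((1, 1), "empty")  # no entry is stored for this square

print(corner, center)
```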
#### Lists are not allowed to be keys in a dictionary
Since lists can be changed after they have been created (aka _mutable_), and only immutable (hashable) objects can be keys for a dictionary, lists cannot be keys (but can be values).

In [None]:
list_dict = dict()

list_dict[[1,2,3]] = 4

In [None]:
kids_lookup = dict()

kids_lookup['homer'] = ['bart', 'lisa', 'maggie']
kids_lookup['ned'] = ['rod', 'todd']

kids_lookup

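If the contents of a list are what you want to use as a key, one common workaround is to convert the list to a tuple first. A minimal sketch, with hypothetical names `point_list` and `point_dict`:

```python
point_list = [1, 2, 3]  # hypothetical list we would like to use as a key

point_dict = {}
point_dict[tuple(point_list)] = 4  # tuple(...) makes an immutable copy

try:
    point_dict[point_list] = 5  # a list is unhashable, so this raises TypeError
    list_key_error = None
except TypeError as err:
    list_key_error = str(err)

print(point_dict, list_key_error)
```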
**Exercise** Can dictionaries be used as keys to a dictionary?

In [None]:
d1 = {'name':'homer', 'weight':260}

d2 = dict()
d2[d1] = 45