# Introduction to Dictionaries

Dictionaries are sort of like lists, except that we look up values with a key rather than with an index. A key can be any of several kinds of object: a string, a number, or even a tuple (which we will talk about in a moment).

Dictionaries are written inside "curly braces" -- `{}` -- and each key is separated from its value by a colon.

The following creates a new dictionary, and then shows how to add or edit entries.

In [3]:
basketball_wins = {'Purdue': 5,
                   'IU': 2,
                   'Northwestern': 0}

# To add a new entry
basketball_wins['Michigan'] = 5

# The same syntax updates an existing entry
basketball_wins['Purdue'] = 6

print(basketball_wins)

{'Purdue': 6, 'IU': 2, 'Northwestern': 0, 'Michigan': 5}


Note that when we print the dictionary, the entries appear in the order we added them. Since Python 3.7, dictionaries preserve insertion order; in older versions the printed order could differ. Even so, dictionaries are not indexed by position the way lists are. This is why you can't access an item in a dictionary by an index number.

Rather, you access the data associated with a key by entering the name of the key.

In [5]:
basketball_wins['Purdue']

6
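As a small sketch of how ordering and key lookup interact, the example below uses a separate, made-up dictionary called `wins` (not the one above). It assumes Python 3.7 or later, where insertion order is guaranteed.

```python
# Hypothetical example data, separate from basketball_wins above
wins = {'Purdue': 6, 'IU': 2}
wins['Michigan'] = 5  # A new key goes at the end

# Iterating over a dictionary yields its keys in insertion order (Python 3.7+)
team_order = list(wins)
print(team_order)

# An integer in square brackets is treated as a key, not as a position
try:
    wins[0]
except KeyError as err:
    missing_key = err.args[0]
    print('No such key:', missing_key)
```

So even though the order is predictable, `wins[0]` looks for a key equal to `0` rather than returning the first entry.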

In [6]:
# But you get a KeyError if the key doesn't exist

basketball_wins['Wisconsin']

KeyError: 'Wisconsin'

One way to deal with this is the `get` method, which returns a default value if the key doesn't exist.

In [7]:
basketball_wins.get('Wisconsin', 0)

0
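Two details of `get` are easy to miss, so here is a short sketch using a made-up `wins` dictionary: when no default is given, `get` returns `None`, and in either case it does not add the missing key to the dictionary.

```python
# Hypothetical example data
wins = {'Purdue': 6, 'IU': 2}

# With no second argument, get returns None for a missing key
no_default = wins.get('Wisconsin')
print(no_default)

# With a second argument, that value is returned instead
with_default = wins.get('Wisconsin', 0)
print(with_default)

# Either way, the dictionary itself is unchanged
still_missing = 'Wisconsin' not in wins
print(still_missing)
```

If you only need to know whether a key is present, the `in` operator (as in the last line) is usually the clearest check.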

While each key must be unique, the value stored under a key can be updated as often as you like. The following code takes in a string and counts the letters in it.

In [8]:
string = """
I have been one acquainted with the night.
I have walked out in rain—and back in rain.
I have outwalked the furthest city light.

I have looked down the saddest city lane.
I have passed by the watchman on his beat
And dropped my eyes, unwilling to explain.

I have stood still and stopped the sound of feet
When far away an interrupted cry
Came over houses from another street,

But not to call me back or say good-bye;
And further still at an unearthly height,
One luminary clock against the sky

Proclaimed the time was neither wrong nor right. 
I have been one acquainted with the night.
"""
string = string.lower()
letter_dict = {}
for letter in string:
    # Don't count new lines or spaces
    if letter in ['\n',' ']:
        continue
    if letter in letter_dict:
        letter_dict[letter] = letter_dict[letter] + 1
    else:
        letter_dict[letter] = 1
        
print(letter_dict)


{'i': 34, 'h': 32, 'a': 44, 'v': 8, 'e': 57, 'b': 8, 'n': 37, 'o': 29, 'c': 13, 'q': 2, 'u': 13, 't': 46, 'd': 21, 'w': 11, 'g': 9, '.': 7, 'l': 18, 'k': 7, 'r': 22, '—': 1, 'f': 6, 's': 19, 'y': 12, 'p': 8, 'm': 8, ',': 3, 'x': 1, '-': 1, ';': 1}
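For comparison, the standard library's `collections.Counter` can do the same tallying. This is an alternative technique, not the approach used above, and the sketch runs on only the first line of the poem to keep it short.

```python
from collections import Counter

# Just the first line of the poem, lowercased as in the code above
text = "I have been one acquainted with the night.".lower()

# Counter tallies every item it is given; skip spaces and newlines as before
letter_counts = Counter(ch for ch in text if ch not in ('\n', ' '))

print(letter_counts['e'])   # How many times 'e' appears
print(letter_counts['z'])   # A Counter returns 0 for keys it has never seen
```

A `Counter` is a dictionary subclass, so everything shown above about dictionaries still applies to it.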


### Exercise 1

See if you can modify the code above to count how often each word appears instead.

In [13]:
## Your code here

string = """
I have been one acquainted with the night.
I have walked out in rain—and back in rain.
I have outwalked the furthest city light.

I have looked down the saddest city lane.
I have passed by the watchman on his beat
And dropped my eyes, unwilling to explain.

I have stood still and stopped the sound of feet
When far away an interrupted cry
Came over houses from another street,

But not to call me back or say good-bye;
And further still at an unearthly height,
One luminary clock against the sky

Proclaimed the time was neither wrong nor right. 
I have been one acquainted with the night.
"""
string = string.lower()
word_dict = {}
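for word in string.split():
    # split() already drops spaces and newlines, so no extra check is needed.
    # Strip periods and exclamation marks from the ends of each word; other
    # punctuation (commas, semicolons, dashes) is left attached.
    word = word.strip('.!')
    word_dict[word] = word_dict.get(word, 0) + 1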
        
print(word_dict)

{'i': 7, 'have': 7, 'been': 2, 'one': 3, 'acquainted': 2, 'with': 2, 'the': 8, 'night': 2, 'walked': 1, 'out': 1, 'in': 2, 'rain—and': 1, 'back': 2, 'rain': 1, 'outwalked': 1, 'furthest': 1, 'city': 2, 'light': 1, 'looked': 1, 'down': 1, 'saddest': 1, 'lane': 1, 'passed': 1, 'by': 1, 'watchman': 1, 'on': 1, 'his': 1, 'beat': 1, 'and': 3, 'dropped': 1, 'my': 1, 'eyes,': 1, 'unwilling': 1, 'to': 2, 'explain': 1, 'stood': 1, 'still': 2, 'stopped': 1, 'sound': 1, 'of': 1, 'feet': 1, 'when': 1, 'far': 1, 'away': 1, 'an': 2, 'interrupted': 1, 'cry': 1, 'came': 1, 'over': 1, 'houses': 1, 'from': 1, 'another': 1, 'street,': 1, 'but': 1, 'not': 1, 'call': 1, 'me': 1, 'or': 1, 'say': 1, 'good-bye;': 1, 'further': 1, 'at': 1, 'unearthly': 1, 'height,': 1, 'luminary': 1, 'clock': 1, 'against': 1, 'sky': 1, 'proclaimed': 1, 'time': 1, 'was': 1, 'neither': 1, 'wrong': 1, 'nor': 1, 'right': 1}
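The output above still has keys such as `'eyes,'` and `'rain—and'`, because only periods and exclamation marks are stripped. One possible refinement, sketched here on two lines of the poem, is to treat the em dash as a separator and strip everything in `string.punctuation` from the ends of each word. The name `sample` is just for this illustration.

```python
import string

# Two lines from the poem, enough to show the punctuation problem
sample = ("I have walked out in rain—and back in rain.\n"
          "And dropped my eyes, unwilling to explain.")

word_counts = {}
# Replace the em dash with a space so the words on either side are split apart
for word in sample.lower().replace('—', ' ').split():
    # Remove any leading or trailing ASCII punctuation
    word = word.strip(string.punctuation)
    if word:  # Skip anything that was punctuation only
        word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts['rain'])
print(word_counts['and'])
```

Because `strip` only touches the ends of a string, a hyphen inside a word such as `good-bye` would be kept.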


## Tuples

Tuples are very similar to lists. They are created with parentheses -- `()` -- rather than with square brackets. 

In [8]:
my_tuple = (4,13,'hello')

Like lists, items in a tuple can be accessed by indexing.

In [10]:
my_tuple[1]

13

However, tuples are "immutable", meaning that they can't be changed after they are created. So, things like "append" and "pop" won't work.
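A quick sketch of what "can't be changed" looks like in practice: assigning to an index raises a `TypeError`, and calling `append` raises an `AttributeError` because tuples have no such method.

```python
my_tuple = (4, 13, 'hello')

# Assigning to a position is not allowed
try:
    my_tuple[0] = 5
except TypeError as err:
    assignment_error = type(err).__name__
    print(assignment_error)

# Tuples have no append method at all
try:
    my_tuple.append(7)
except AttributeError as err:
    append_error = type(err).__name__
    print(append_error)

# To "add" an item, build a new tuple; the original is untouched
longer_tuple = my_tuple + (7,)
print(longer_tuple)
```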

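This immutability matters for dictionary keys: a key must be hashable, which in practice means its value can't change after it is created. Tuples of strings and numbers meet that requirement and lists do not, so tuples are often used as keys in dictionaries. For example, let's say you wanted to store the population of cities in the US. You might create a dictionary like this: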

In [1]:
population_dict = {('Georgia', 'Atlanta'): 498000,
                   ('Illinois', 'Atlanta'): 1692,
                   ('Illinois', 'Chicago'): 2750000}
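To illustrate how such a dictionary might be used, the sketch below looks up one entry by its full tuple key, picks out every state that has a city named Atlanta, and shows that a list is rejected as a key. The Indiana entry is made up purely to trigger the error.

```python
population_dict = {('Georgia', 'Atlanta'): 498000,
                   ('Illinois', 'Atlanta'): 1692,
                   ('Illinois', 'Chicago'): 2750000}

# Look up a value with the whole tuple as the key
atlanta_il = population_dict[('Illinois', 'Atlanta')]
print(atlanta_il)

# Each key can be unpacked into its two parts while looping
atlantas = [state for (state, city) in population_dict if city == 'Atlanta']
print(atlantas)

# A list is mutable, so Python refuses to use it as a key
try:
    population_dict[['Indiana', 'Lafayette']] = 0  # Hypothetical entry
except TypeError:
    list_key_allowed = False
else:
    list_key_allowed = True
print(list_key_allowed)
```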

## Example + Exercise

The following code takes a CSV table of city populations that I grabbed from the US Census Bureau API and saved [here](https://raw.githubusercontent.com/jdfoote/Intro-to-Programming-and-Data-Science/master/resources/data/uscities.csv). The first few lines below download the file. The next bit of code converts the file into a dictionary that looks like the one above.

In [14]:
import csv
import requests
import codecs

# This downloads the file and then opens it. You could also save it to your computer, and open it in the normal way
f = requests.get('https://raw.githubusercontent.com/jdfoote/Intro-to-Programming-and-Data-Science/master/resources/data/uscities.csv')
f_csv = csv.reader(codecs.iterdecode(f.iter_lines(), 'utf-8'))
next(f_csv) # This just skips the header row, so it isn't in our data
population_dict = {}
for row in f_csv:
    # To get these numbers, I just opened the CSV file and looked at which columns had this data
    city = row[1]
    state = row[2]
    population = int(row[0])
    if (state, city) in population_dict: # Check for the same city twice in the same state
        print(state, city)
    else:
        population_dict[(state, city)] = population
        
# This code prints the first few items in the dictionary, to make sure it looks like it's right
print(list(population_dict.items())[:5])

[(('Alabama', 'Abbeville city'), 2560), (('Alabama', 'Adamsville city'), 4281), (('Alabama', 'Addison town'), 718), (('Alabama', 'Akron town'), 328), (('Alabama', 'Alabaster city'), 33487)]


It looks right, so let's press on.
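If you would like to experiment with the parsing loop without downloading anything, one option is to feed `csv.reader` an in-memory file. The rows and header names below are made up for illustration; only the column order (population, city, state) mirrors the indices used in the code above.

```python
import csv
import io

# Made-up data: the header names and rows are assumptions, not the real file
fake_csv = io.StringIO(
    "population,city,state\n"
    "100,Sampleville city,Alabama\n"
    "250,Testburg town,Alabama\n"
    "100,Sampleville city,Alabama\n"  # Deliberate duplicate
)

reader = csv.reader(fake_csv)
next(reader)  # Skip the header row
sample_dict = {}
duplicates = []
for row in reader:
    population = int(row[0])
    city = row[1]
    state = row[2]
    if (state, city) in sample_dict:
        # Record repeated (state, city) pairs instead of overwriting them
        duplicates.append((state, city))
    else:
        sample_dict[(state, city)] = population

print(sample_dict)
print(duplicates)
```

Collecting the duplicates in a list, rather than printing them as the loop runs, makes it easier to inspect them afterwards.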

By using tuples as keys, you can do things like summarize by either entry in the tuple. Here we add up the city populations for each state.

In [16]:
state_populations = {}
for city in population_dict:
    state = city[0] # Extract the state from the key
    city_pop = population_dict[city] # Extract the population from the value
    try: # If the key exists, then add the population
        state_populations[state] += city_pop
    except KeyError: # Otherwise set the value to the population
        state_populations[state] = city_pop
    
print(state_populations)

{'Alabama': 2998987, 'Alaska': 497834, 'Arizona': 5791407, 'Arkansas': 2001152, 'California': 32965607, 'Colorado': 4279051, 'Connecticut': 1379443, 'Delaware': 276116, 'District of Columbia': 705749, 'Florida': 10766975, 'Village of Islands village; Florida': 6317, 'Georgia': 4685332, 'Hawaii': 345064, 'Idaho': 1257628, 'Illinois': 11040504, 'Indiana': 4505674, 'Iowa': 2524555, 'Kansas': 2418759, 'Kentucky': 2473075, 'Louisiana': 2179262, 'Maine': 376400, 'Maryland': 1528860, 'Massachusetts': 3636758, 'Michigan': 5051706, 'Minnesota': 4665488, 'Mississippi': 1505545, 'Missouri': 4067590, 'Montana': 580973, 'Nebraska': 1500339, 'Nevada': 1752205, 'New Hampshire': 428025, 'New Jersey': 4233773, 'New Mexico': 1402249, 'New York': 12403348, 'North Carolina': 5962201, 'North Dakota': 592706, 'Ohio': 7604347, 'Oklahoma': 3044153, 'Oregon': 2963745, 'Pennsylvania': 5586259, 'Rhode Island': 547731, 'South Carolina': 1865562, 'South Dakota': 626159, 'Tennessee': 4090552, 'Moore County metropol