# Dictionaries

By [Allison Parrish](http://www.decontextualize.com/)

## Dictionaries

The dictionary is a very useful data structure in Python. The easiest way to conceptualize a dictionary is that it's like a list, except you don't look up values in a dictionary by their index in a sequence---you look them up using a "key," or a unique identifier for that value.

We're going to focus here just on learning how to get data *out* of dictionaries, not how to build new dictionaries from existing data. We're also going to omit some of the nitty-gritty details about how dictionaries work internally. You'll learn a lot of those details in later courses, but for now it means that some of what I'm going to tell you will seem weird and magical. Be prepared!

### Why dictionaries?

For our purposes, the benefit of having data that can be parsed into dictionaries, as opposed to lists, is that dictionary keys tend to be *mnemonic*. That is, a dictionary key will usually tell you something about what its value is. (Contrast this with parsing, say, CSV data, where we have to count fields in the header row and translate that count into the index that we want.)

Lists and dictionaries work together and are used extensively to represent all different kinds of data. Often, when we get data from a remote source, or when we choose how to represent data internally, we'll use both in tandem. The most common form this will take is representing a table, or a database, as a *list* of records that are themselves represented as *dictionaries* (mapping the name of the column to the value for that column). We'll see an example of this when we access web APIs in a subsequent tutorial.
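
To make the "list of records" idea concrete, here's a small sketch. The table contents and the column names (`name`, `state`) are made up for illustration, and the syntax is explained in the sections below, so don't worry if it looks unfamiliar.

```python
# A tiny "table" of made-up records: a list of dictionaries.
# The column names ('name' and 'state') are invented for this example.
presidents_table = [
    {'name': 'Obama', 'state': 'Hawaii'},
    {'name': 'Bush', 'state': 'Texas'},
    {'name': 'Clinton', 'state': 'Arkansas'},
]

# Use a list index to pick the row, then a key to pick the column.
first_state = presidents_table[0]['state']
print(first_state)

# Pull one "column" out of every row with a list comprehension.
all_names = [row['name'] for row in presidents_table]
print(all_names)
```

Notice that `row['name']` tells you what you're getting in a way that `row[0]` wouldn't.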

Dictionaries are also good for storing *associations* or *mappings* for quick lookups. For example, if you wanted to write a program that was able to recall the capital city of every US state, you might use a dictionary whose keys are the names of the states, and whose values are the corresponding capitals. Dictionaries are also used for data analysis tasks, like keeping track of how many times a particular token occurs in an incoming data stream.
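
Here's a rough sketch of the token-counting idea, assuming the "incoming data stream" is just a short list of strings that I made up. It uses a few operations (adding keys, the `in` operator) that are covered later in this tutorial.

```python
# Count how many times each token occurs in a (made-up) list of tokens.
tokens = ['a', 'rose', 'is', 'a', 'rose', 'is', 'a', 'rose']

counts = {}  # start with an empty dictionary
for token in tokens:
    if token in counts:
        counts[token] = counts[token] + 1  # seen before: bump the count
    else:
        counts[token] = 1  # first time we've seen this token

print(counts)
```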

### What dictionaries look like

Dictionaries are written with curly brackets, surrounding a series of comma-separated pairs of *keys* and *values*. Here's a very simple dictionary, with one key, `Obama`, associated with a value, `Hawaii`:

In [None]:
{'Obama': 'Hawaii'}

Here's another dictionary, with more entries:

In [None]:
{'Obama': 'Hawaii', 'Bush': 'Texas', 'Clinton': 'Arkansas', 'Trump': 'New York'}

As you can see, we're building a simple dictionary that associates the names of presidents with the home states of those presidents. (This is my version of JOURNALISM.)

The association of a key with a value is sometimes called a *mapping*. (In fact, in other programming languages like Java, the dictionary data structure is called a "Map.") So, in the above dictionary for example, we might say that the key `Clinton` *maps to* the value `Arkansas`.

A dictionary is just like any other Python value. You can assign it to a variable:

In [None]:
president_states = {'Obama': 'Hawaii', 'Bush': 'Texas', 'Clinton': 'Arkansas', 'Trump': 'New York'}

And that value has a type:

In [None]:
type(president_states)

At its most basic level, a dictionary is sort of like a two-column spreadsheet, where the key is one column and the value is another column. If you were to represent the dictionary above as a spreadsheet, it might look like this:

| key   | value   |
| ----- | ------- |
| Obama | Hawaii |
| Bush | Texas |
| Clinton | Arkansas |
| Trump | New York |

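The main difference between a spreadsheet and a dictionary is that you don't refer to the entries in a dictionary by their position: there's no "row 3," only keys. The order in which dictionary keys show up also has some quirks of its own. For an explanation of this, see below.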

Keys and values in dictionaries can be of any data type, not just strings. Here's a dictionary, for example, that maps integers to lists of floating point numbers:

In [None]:
{17: [1.6, 2.45], 42: [11.6, 19.4], 101: [0.123, 4.89]}

> HEAD-SPINNING OPTIONAL ASIDE: Actually, "any type" above is a simplification: *values* can be of any type, but keys must be *hashable*---see [the Python glossary](https://docs.python.org/2/glossary.html#term-hashable) for more information. In practice, this limitation means you can't use lists (or dictionaries themselves) as keys in dictionaries. There are ways of getting around this, though!
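
If you're curious, here's a small sketch of that limitation and one possible way around it: using a tuple (which is hashable) where you might have wanted a list. The coordinates are rough values I picked for illustration, and the `try`/`except` is only there so the error doesn't stop the cell.

```python
# Lists are not hashable, so using one as a key raises a TypeError.
list_key_failed = False
try:
    bad_dict = {[40.7, -74.0]: 'New York'}
except TypeError as err:
    list_key_failed = True
    print("Can't use a list as a key:", err)

# A tuple with the same contents is hashable, so this works.
city_by_coords = {(40.7, -74.0): 'New York'}
print(city_by_coords[(40.7, -74.0)])
```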

A dictionary can also be empty, containing no key/value pairs at all:

In [None]:
{}

### Getting values out of dictionaries

The primary operation that we'll perform on dictionaries is writing an expression that evaluates to the value for a particular key. We do that with the same syntax we used to get a value at a particular index from a list. Except with dictionaries, instead of using a number, we use one of the keys that we had specified for the value when making the dictionary. For example, if we wanted to know what Bill Clinton's home state was, or, more precisely, what the value for the key `Clinton` is, we would write this expression:

In [None]:
president_states["Clinton"]

Going back to our spreadsheet analogy, this is like looking for the row whose first column is "Clinton" and getting the value from the corresponding second column.

If we put a key in those brackets that does not exist in the dictionary, we get an error (a `KeyError`) similar to the one we get when trying to access an index beyond the end of a list:

In [None]:
president_states['Franklin']
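
If you'd rather not have your program stop with an error, there are a couple of ways to cope with a missing key. This is just a sketch of two options: catching the `KeyError`, or using the dictionary's `.get()` method, which returns `None` (or a default value that you supply) when the key isn't there. The fallback string `'unknown'` is my own choice for the example.

```python
president_states = {'Obama': 'Hawaii', 'Bush': 'Texas',
                    'Clinton': 'Arkansas', 'Trump': 'New York'}

# Option 1: catch the KeyError that a missing key raises.
try:
    state = president_states['Franklin']
except KeyError:
    state = 'unknown'
print(state)

# Option 2: .get() returns None (or a default you supply) instead of
# raising an error when the key is missing.
print(president_states.get('Franklin'))
print(president_states.get('Franklin', 'unknown'))
print(president_states.get('Obama', 'unknown'))
```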

As you might suspect, the thing you put inside the brackets doesn't have to be a string literal; it can be any Python expression, as long as it evaluates to something that is a key in the dictionary:

In [None]:
president = 'Obama'
president_states[president]

You can get a list of all the keys in a dictionary using the dictionary's `.keys()` method:

In [None]:
president_states.keys()

That funny-looking `dict_keys(...)` thing isn't *exactly* a list (you can't index into it with square brackets, for one thing), but it's close enough for most purposes: you can use it anywhere you would loop over a list, like in a list comprehension:

In [None]:
[item.upper() for item in president_states.keys()]

... or a `for` loop:

In [None]:
for item in president_states.keys():
    print(item)

And a list of all the values with the `.values()` method:

In [None]:
president_states.values()

If you want a list of all key/value pairs, you can call the `.items()` method:

In [None]:
president_states.items()

(The weird list-like things here that use parentheses instead of brackets are called *tuples*---we'll discuss those at a later date.)
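
In practice, `.items()` is most useful in a `for` loop, where you can give the key and the value each their own variable. Here's a minimal sketch (the variable names are my own), along with how to turn these list-like things into honest-to-goodness lists with `list()` if you need one.

```python
president_states = {'Obama': 'Hawaii', 'Bush': 'Texas',
                    'Clinton': 'Arkansas', 'Trump': 'New York'}

# Each item is a (key, value) pair; the for loop can unpack it
# into two variables at once.
for president, state in president_states.items():
    print(president + " is from " + state)

# If you need an actual list (say, to index into it), wrap the
# result in list().
key_list = list(president_states.keys())
pair_list = list(president_states.items())
print(key_list[0])
print(pair_list[0])
```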

### Other operations on dictionaries

[Here's a list of all the methods that dictionaries support](https://docs.python.org/3.6/library/stdtypes.html#mapping-types-dict). I want to talk about a few of these in particular. First, the `in` operator (which we've used previously to check whether there's a substring in a string, or a particular item in a list) also works with dictionaries! It checks to see if a particular key exists in the dictionary:

In [None]:
'Obama' in president_states

In [None]:
'Franklin' in president_states
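
One thing to watch out for: `in` only looks at the *keys*, not the values. The sketch below shows the difference, plus a pattern you might use to check for a key before looking it up (the message text is just something I made up).

```python
president_states = {'Obama': 'Hawaii', 'Bush': 'Texas',
                    'Clinton': 'Arkansas', 'Trump': 'New York'}

# 'Hawaii' is a value, not a key, so this check comes up empty...
hawaii_is_key = 'Hawaii' in president_states
# ...but you can search the values explicitly with .values().
hawaii_is_value = 'Hawaii' in president_states.values()
print(hawaii_is_key, hawaii_is_value)

# Check for the key first to avoid an error on lookup.
name = 'Franklin'
if name in president_states:
    message = name + " is from " + president_states[name]
else:
    message = "No home state on file for " + name
print(message)
```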

A dictionary can also go in a `for` loop, in the spot between `in` and the colon (where you might normally put a list). If you write a for loop like this, the loop will iterate over each key in the dictionary:

In [None]:
for item in president_states:
    print(item)

### Dictionaries can contain lists and other dictionaries

Dictionaries are often used to represent *hierarchical* data structures, that is, data structures with a top-down organization. For example, consider a program intended to keep track of a shopping list. In such a program, you might want to group grocery items by category, so you might make a dictionary that has a key for each category:

In [None]:
shopping = {'produce': ['apples', 'oranges', 'spinach', 'carrots'],
            'meat': ['ground beef', 'chicken breast']}

The `shopping` dictionary above has two keys, whose values are both *lists*. Writing an expression that evaluates to one of these lists is easy, e.g.:

In [None]:
shopping['meat']

And you could write a `for` loop to print out the items of one of these lists fairly easily as well, e.g.:

In [None]:
print("Produce items on your list:")
for item in shopping['produce']:
    print("* " + item)

Slightly more challenging is this: how do you write an expression that evaluates to (let's say) the *first item* of the list of produce? The trick to this is to remember how indexing syntax works. When you have a pair of square brackets with a single value inside of them, Python looks immediately to the left of those square brackets for an expression that *evaluates to* either a list or a dictionary. For example, in the following expression:

In [None]:
[5, 10, 15, 20][3]

... you can think of Python as looking at this expression from right to left. It sees the `[3]` first and then thinks, "okay, I need to find something that is a list or dictionary directly to the left of this, and grab the item at index 3 (that is, the fourth item)." In fact, it *does* find a list (namely, `[5, 10, 15, 20]`) and evaluates the entire expression to `20` accordingly.
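
To convince yourself that the square brackets really do apply to "whatever is on the left," here's a quick sketch comparing a few equivalent ways of writing the same lookup. The variable names and the little `low`/`high` dictionary are invented for the example.

```python
# Indexing a list literal directly...
direct = [5, 10, 15, 20][3]

# ...is the same as storing the list in a variable first.
numbers = [5, 10, 15, 20]
via_variable = numbers[3]

# The same goes for dictionaries: brackets work on any expression
# that evaluates to a dictionary, including one written out in place.
lookup = {'low': 5, 'high': 20}['high']

print(direct, via_variable, lookup)
```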

With that in mind, let's rephrase the task. I want to get:

* the first item
* from the list that is the value for the key `produce`
* in the dictionary `shopping`

We can work through this problem by following these instructions and then writing the expression *in reverse*. To get the first item from a list, we write:

    ????[0] # the first item
    
`????` is just a placeholder for the part of the code that we haven't written yet, but we know that it has to be a list. Then, to get the value for the key `produce`:

    ????["produce"][0] # from the list that is the value for the key `produce`
    
Again, `????` is a placeholder, but now we know it has to be a dictionary. The dictionary, of course, is `shopping`, so we can fill that in as the last step:

    shopping["produce"][0]
    
Let's see what that expression evaluates to:

In [None]:
shopping["produce"][0]

Exactly right! But let's say we want to take the organization in our dictionary up a notch and create separate categories for fruits and vegetables. One way to do this would be to make the value for the key `produce` be... another dictionary, like so:

In [None]:
shopping = {'produce': {'fruits': ['apples', 'oranges'], 'vegetables': ['spinach', 'carrots']},
            'meat': ['ground beef', 'chicken breast']}

This is now a pretty complicated data structure! (Well, not *that* complicated compared to what you'll see, e.g., in responses from web APIs. But it's the most complicated data structure we've made so far.) If we were to draw a schematic of this data structure, it might look something like this:

    shopping (dictionary)
        -> produce (dictionary)
            -> fruits (list)
            -> vegetables (list)
        -> meat (list)
        
In prose: `shopping` is a variable that contains a dictionary. That dictionary has two keys: `produce`, whose value is itself a dictionary, and `meat`, whose value is a list. (Whew!)

Given this data structure, let's work through how to do the following tasks:

* Get a list of all fruits
* Get a list of all categories of produce
* Get the first fruit
* Get the second vegetable

Getting a list of the fruits requires getting the value for the `fruits` key in the dictionary that is the value for the `produce` key. So we start out with:

                       ['fruits'] -> Step one
            ['produce']['fruits'] -> Step two
    shopping['produce']['fruits'] -> Step three
    
The final expression:

In [None]:
# A list of all fruits
shopping['produce']['fruits']

Continuing with our tasks:

In [None]:
# a list of all categories of produce
shopping['produce'].keys()

In [None]:
# the first fruit
shopping['produce']['fruits'][0]

In [None]:
# the second vegetable
shopping['produce']['vegetables'][1]
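
If chaining all those square brackets together feels like too much at once, you can always break the expression apart with a variable for each level. Here's a sketch of that (the intermediate variable names are my own), followed by a nested `for` loop that prints every category of produce and the items in it.

```python
shopping = {'produce': {'fruits': ['apples', 'oranges'],
                        'vegetables': ['spinach', 'carrots']},
            'meat': ['ground beef', 'chicken breast']}

# Step by step, with a variable for each level of the hierarchy.
produce = shopping['produce']        # a dictionary
vegetables = produce['vegetables']   # a list
second_vegetable = vegetables[1]     # a string
print(second_vegetable)

# Loop over every category of produce and the items in it.
for category in shopping['produce']:
    print(category + ":")
    for item in shopping['produce'][category]:
        print("* " + item)
```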

### Adding key/value pairs to a dictionary

Once you've assigned a dictionary to a variable, you can add another key/value pair to the dictionary by assigning a value to a new key, like so:

In [None]:
president_states['Reagan'] = 'California'

Take a look at the dictionary to see that there's a new key/value pair in there:

In [None]:
president_states
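
The same thing works when the values are lists. In this sketch I start from a cut-down version of the shopping list and add a made-up `dairy` category; `len()` reports the number of key/value pairs, so you can watch it go up.

```python
shopping = {'produce': ['apples', 'oranges'],
            'meat': ['ground beef']}
pairs_before = len(shopping)   # number of key/value pairs so far

# Assigning to a key that doesn't exist yet creates a new pair.
shopping['dairy'] = ['milk', 'yogurt']
pairs_after = len(shopping)
print(pairs_before, pairs_after)

# The value is an ordinary list, so list methods work on it.
shopping['dairy'].append('butter')
print(shopping['dairy'])
```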

### On the order of dictionary keys

So something strange is happening here, and you may have already noticed it. If we write some code that iterates over the keys of a dictionary, the keys show up in one order:

In [None]:
for item in president_states:
    print(item)

Whereas if we simply evaluate the dictionary, the keys show up in a different order:

In [None]:
president_states

What gives? Here's what's up. Underneath the hood, Python stores the key/value pairs in a dictionary *in the order you added them to the dictionary*. This means that when you add a new item to the dictionary, it will show up *last* when you iterate over the dictionary (or get a list of its keys or values, etc.). However, when you simply evaluate a dictionary, Jupyter Notebook (at least in some versions) takes it upon itself to display the keys in *alphabetical order* instead. So the order that Jupyter Notebook shows the key/value pairs in is *not* necessarily the same as the order for the key/value pairs you would get if you iterated over the dictionary in a for loop.

To add to the confusion, in versions of Python before 3.6, the order of key/value pairs in a dictionary was *arbitrary* (i.e., not something you could predict or rely on; adding the same items to a dictionary might produce different orderings across Python sessions). In Python 3.6, insertion order was a side effect of a new dictionary implementation, and the developers warned that it might change ([technical discussion here](https://mail.python.org/pipermail/python-dev/2016-September/146327.html)). As of Python 3.7, though, preserving insertion order is an official part of the language, so you can rely on it as long as your code doesn't need to run on older versions.
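
Whichever order your notebook happens to display, you can always ask for the order you want explicitly. A small sketch: calling `list()` on a dictionary gives you its keys in iteration order (which is insertion order on Python 3.7 and later, the version this example assumes), while `sorted()` gives you the keys in alphabetical order.

```python
president_states = {'Obama': 'Hawaii', 'Bush': 'Texas',
                    'Clinton': 'Arkansas', 'Trump': 'New York'}
president_states['Reagan'] = 'California'

# Keys in iteration order (insertion order on Python 3.7 and later).
insertion_order = list(president_states)
print(insertion_order)

# If you want alphabetical order, ask for it explicitly with sorted().
alphabetical = sorted(president_states)
print(alphabetical)
```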

### Dictionary keys are unique

Another important fact about dictionaries is that you can't put the same key into one dictionary twice. If you try to write out a dictionary that has the same key used more than once, Python will silently keep only the last of those key/value pairs and discard the rest. For example:

In [None]:
{'a': 1, 'a': 2, 'a': 3}

Similarly, if we attempt to set the value for a key that already exists in the dictionary (using `=`), we won't add a second key/value pair for that key---we'll just overwrite the existing value:

In [None]:
test_dict = {'a': 1, 'b': 2}
test_dict['a']

In [None]:
test_dict['a'] = 100
test_dict['a']

In the case where a key needs to map to multiple values, we might instead see a data structure in which the key maps to another kind of data structure that itself can contain multiple values, like a list:

In [None]:
{'a': [1, 2, 3]}
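
Here's one way you might build up a dictionary like that from scratch. It's only a sketch: the list of (president, state) pairs is a small sample I put together, and the variable names are my own. The idea is to create an empty list the first time you see a key, and then append to that list every time the key comes up.

```python
# A small sample of (president, state) pairs; one state appears twice.
pairs = [('Obama', 'Hawaii'), ('Bush', 'Texas'),
         ('Johnson', 'Texas'), ('Clinton', 'Arkansas')]

presidents_by_state = {}
for president, state in pairs:
    if state not in presidents_by_state:
        presidents_by_state[state] = []   # first president for this state
    presidents_by_state[state].append(president)

print(presidents_by_state['Texas'])
print(presidents_by_state)
```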