# Dictionaries

## Introduction

Python offers several types of data structures. Some of them might come across as very similar, but they all have different pros and cons. Previously you have seen two such data structures: lists and tuples. In this chapter you will learn about a third one: dictionaries. A dictionary is a type of "associative memory" or key:value storage: it allows you to store two pieces of data together with the relationship between them. This might seem very abstract, but let's break it down.

We would sometimes like to describe several pieces of information in a way that makes it clear that they belong together. For example, if I want to keep track of the grade of a student, I can store their u-number and their grade in a tuple like this:

In [44]:
student = ('u123456', 8.5)

However, if there are multiple students, a single tuple no longer suffices. Previously we have worked with lists, and a straightforward first solution seems to be to store this information in two lists:

In [45]:
unumbers = ['u123456', 'u223416', 'u383213', 'u234178']
grades = [8.5, 7, 8, 6.5]
print(sum(grades) / len(grades)) # Calculate the average grade

7.5


This solution allows us to store the information of multiple students, and makes it very easy to analyse the grades. However, the relationship between the grade and the u-number is not modeled in an explicit way, which means we have to remember that the order of the unumbers list and the order of the grades list are the same. If the order of one of them ever changes, we might be in big trouble.
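To make that risk concrete, here is a small sketch of how the pairing between two lists can break without any error. It reuses the data from the example; the names `position`, `grade_before` and `grade_after` are introduced here purely for illustration.

```python
unumbers = ['u123456', 'u223416', 'u383213', 'u234178']
grades = [8.5, 7, 8, 6.5]

# Look up a grade by first finding the position of the u-number
position = unumbers.index('u123456')
grade_before = grades[position]  # the grade that really belongs to u123456

# Sorting only one of the two lists silently breaks the pairing
grades.sort()
grade_after = grades[position]   # now the grade of a different student

print(grade_before)
print(grade_after)
```

Nothing warns us that the second lookup returns the wrong grade, which is exactly why an explicit link between u-number and grade is preferable.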

Another solution is to store a list of tuples, like so:

In [46]:
students = [('u123456', 8.5), ('u223416', 7), ('u383213', 8), ('u234178', 6.5)]
# Calculating the average grade:
sum_of_grades = 0
for (unumber, grade) in students: 
    sum_of_grades = sum_of_grades + grade
print(sum_of_grades / len(students)) # The average grade

7.5


Using a list of tuples we can store multiple students while clearly modeling the relationship between the student and their grade. Calculating the average grade is also not that complicated as we just have to loop over the list and extract the grade from the tuple in order to calculate the sum of all the grades. However, what if we would like to look up the grade of a particular student? 

In [5]:
students = [('u123456', 8.5), ('u223416', 7), ('u383213', 8), ('u234178', 6.5)]
query = 'u234178'
for student in students:
    if student[0] == query:
        print(student[1])

6.5


We had to go through the first three students before we found the grade of the student in question. Now imagine that this list is much longer and contains all the archived data as well: with 5,000 students per year, ten years of data would give a list of 50,000 elements.

Now let's imagine that we can't reuse u-numbers, so every time we have a new student we have to make sure that the newly generated number is not among the old ones. So what do we do?

In [9]:
students = [('u123456', 8.5), ('u223416', 7), ('u383213', 8), ('u234178', 6.5)]
candidate = 'u234179'
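print(candidate in students) # checking membership: is the string itself an element of the list?
# This is always False here, because the elements of the list are tuples, not strings.
# To check the u-numbers we have to compare against the first item of every tuple:
for student in students:
    if student[0] == candidate:
        print("Yes")
    else:
        print("No")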

False
No
No
No
No


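We went through the whole list just to check whether an element is in it or not! That's pretty inefficient. The take-home message here is that finding something in a list means checking its elements one by one, which gets slower as the list grows.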

## Using dictionaries

At first glance the Python code for using a dictionary is very similar to that for lists. The major difference is that for lists we use [] and for dictionaries we use {}, like so:

In [47]:
empty_list = [] # Create an empty list
empty_dict = {} # Create an empty dictionary

If we want to create a dictionary containing the u-numbers and the grades, we can do that in two ways: we can create the dictionary from scratch, or we can convert our list of tuples to a dictionary. When creating a dictionary from scratch we use a colon ':' to separate the key and the value of each key : value pair.

In [48]:
# From scratch:
dict_students = {'u123456': 8.5, 'u223416' : 7, 'u383213' : 8, 'u234178' : 6.5}
alternative_dict = dict(students) # Convert the list of tuples to a dictionary

print(dict_students)
print(alternative_dict)
# From the print statements we can see that the two dictionaries are identical.
# Note: the order of the dictionary as printed below might not be identical to how we created it.
# The output below comes from a Python version in which a dictionary is unordered, so the order
# of the elements is arbitrary (since Python 3.7 a dictionary remembers the insertion order).

{'u383213': 8, 'u223416': 7, 'u234178': 6.5, 'u123456': 8.5}
{'u383213': 8, 'u223416': 7, 'u234178': 6.5, 'u123456': 8.5}


One of the nicest features of dictionaries is the way in which we can access their elements. We can use the key to get the corresponding value, so if we want to retrieve the grade of the student with u-number u223416 we can do:

In [49]:
print(dict_students['u223416'])

7


With a list we can only do something similar if we know the position of the student in the list. If the position is unknown, we have to check all elements of the list (which can be slow for big lists):

In [50]:
# If you don't know the position in the lists of tuples:
for (unumber, grade) in students:
    if unumber == 'u223416':
        print(grade)

# If we do know the position:
print(students[1]) # When we created our list this student was added second, so it is at position 1

7
('u223416', 7)


Back to dictionaries. If we want to modify the content of a dictionary we can use the key to access the element we want to change:

In [51]:
dict_students['u223416'] = 8 # Change the grade of student u223416 to an 8
dict_students['u223416'] = 7 # and back to a 7

dict_students['mistake'] = 0 # Accidentally add a mistake to our dict, notice that this adds a new element to the dict!
print(dict_students)

del dict_students['mistake'] # Remove the mistake
print(dict_students)

{'u383213': 8, 'mistake': 0, 'u223416': 7, 'u234178': 6.5, 'u123456': 8.5}
{'u383213': 8, 'u223416': 7, 'u234178': 6.5, 'u123456': 8.5}
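The examples above all use keys that are known to be in the dictionary. The sketch below shows three built-in ways to deal with a key that might be missing: the `in` operator, the `get` method and the `pop` method with a default. The u-number in `candidate` is the one from the introduction, and the variable names are only illustrative.

```python
dict_students = {'u123456': 8.5, 'u223416': 7, 'u383213': 8, 'u234178': 6.5}
candidate = 'u234179'  # a u-number that is not in the dictionary

# The in operator checks the keys of the dictionary, without a loop
is_known = candidate in dict_students
print(is_known)

# get() returns None (or a default of your choice) instead of raising a KeyError
missing_grade = dict_students.get(candidate)
known_grade = dict_students.get('u223416')
print(missing_grade)
print(known_grade)

# pop() with a default removes the key if it is present and does nothing otherwise
removed = dict_students.pop(candidate, None)
print(removed)
print(len(dict_students))
```

This is the dictionary counterpart of the membership check we did by looping over the list of tuples, but here Python does not have to look at every element.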


## Iterating over dictionaries

Another nice feature of dictionaries is that we can iterate over their content. The order in which the elements come out is essentially arbitrary, but iterating is still useful in cases where the order is not important, for example when we want to calculate the average grade:

In [52]:
dict_sum = 0
for unumber in dict_students:
    grade = dict_students[unumber] # Retrieve the grade from the dictionary
    dict_sum = dict_sum + grade
print(dict_sum / len(dict_students))

7.5


However, this is even more complicated than our list of tuples example, so you might now be wondering how dictionaries are better for this example. Dictionaries are very useful because we can iterate over them in different ways, and there are several functions that help you with this:

* dict.keys(), to return a list of all the keys in our dict
* dict.items(), to return a list of tuples of all the (key, value) pairs in our dict
* dict.values(), to return a list of all the values in our dict

As you saw in the for loop above, looping over a dictionary directly means looping over its keys, just as if we had called dict.keys(). So the same example, written more explicitly, would be:

In [53]:
dict_sum = 0
for unumber in dict_students.keys():
    grade = dict_students[unumber] # Retrieve the grade from the dictionary
    dict_sum = dict_sum + grade
print(dict_sum / len(dict_students))

7.5


We can also use the dict.items() function to simplify this example:

In [54]:
dict_sum = 0
for (unumber, grade) in dict_students.items():
    # Now we no longer need to retrieve the grade from the dictionary
    dict_sum = dict_sum + grade
print(dict_sum / len(dict_students))

7.5


However, because we can also return a list of all the values we can simplify the code to calculate the average grade significantly, by taking the sum of the dict.values() and dividing this by the number of elements in our dictionary:

In [55]:
print(sum(dict_students.values()) / len(dict_students)) # Calculate the average grade

7.5


Because dictionaries are very clever, it is possible to use them both as a list of tuples and as multiple lists. This is something we can verify using some print statements:

In [56]:
# The unumbers list from our multiple lists example
print('unumbers:', unumbers)
# is the same unumbers as dict_students.keys() (but in a different order, because dictionaries are unordered)
print('dict_students.keys():', dict_students.keys())

print()

# And the same for the grades
print('grades:', grades)
# and dict_students.values()
print('dict_students.values():', dict_students.values())

print()

# Similarly, the list of tuples
print(students)
# Contains the same tuples as dict_students.items()
print(dict_students.items())

('unumbers:', ['u123456', 'u223416', 'u383213', 'u234178'])
('dict_students.keys():', ['u383213', 'u223416', 'u234178', 'u123456'])
()
('grades:', [8.5, 7, 8, 6.5])
('dict_students.values():', [8, 7, 6.5, 8.5])
()
[('u123456', 8.5), ('u223416', 7), ('u383213', 8), ('u234178', 6.5)]
[('u383213', 8), ('u223416', 7), ('u234178', 6.5), ('u123456', 8.5)]


Thanks to these possibilities, dictionaries give us all the benefits of both lists of tuples and multiple lists, without the downsides we ran into above, which makes them a great solution for this kind of use case.
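When the order does matter, for instance when printing an overview, one option is to sort while iterating. The following is a minimal sketch using the built-in `sorted` function; the names `ordered_unumbers` and `ranking` are introduced here for illustration only.

```python
dict_students = {'u123456': 8.5, 'u223416': 7, 'u383213': 8, 'u234178': 6.5}

# sorted() on a dictionary returns a sorted list of its keys
ordered_unumbers = sorted(dict_students)
for unumber in ordered_unumbers:
    print(unumber + ': ' + str(dict_students[unumber]))

# Sorting the (key, value) pairs on the value gives a ranking by grade,
# with the highest grade first
ranking = sorted(dict_students.items(), key=lambda pair: pair[1], reverse=True)
print(ranking[0])
```

Sorting costs some extra time, so it is only worth doing when the order of the output is actually important.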

## Storing more complex values

Until now we have only considered the case in which the value of each key:value pair is a single value. However, it is possible to store much more complex values in dictionaries. We could for example store an entire list:

In [57]:
courses = {'Data processing': ['u123456', 'u383213', 'u234178'], 
           'Statistics' : ['u123456', 'u223416', 'u234178'], 
           'Introduction HAIT' : ['u123456', 'u223416', 'u383213']}

print(courses)

{'Statistics': ['u123456', 'u223416', 'u234178'], 'Data processing': ['u123456', 'u383213', 'u234178'], 'Introduction HAIT': ['u123456', 'u223416', 'u383213']}


This allows us to store which students are taking which courses. But because the previous examples showed that we would also like to store the students' grades, we can go one step further and store another dictionary as the value:

In [58]:
courses_grades = {'Data processing': {'u123456': 8, 'u383213' : 8, 'u234178' : 6.5}, 
           'Statistics' : {'u123456': 5.2, 'u223416' : 7.2, 'u234178' : 8}, 
           'Introduction HAIT' : {'u123456': 8, 'u223416' : 6.6, 'u383213' : 7}}

print(courses_grades)

print()

# Calculating the average grade per course:
for (course, results) in courses_grades.items():
    avg = sum(results.values()) / len(results)
    print(course, avg)

{'Statistics': {'u223416': 7.2, 'u234178': 8, 'u123456': 5.2}, 'Data processing': {'u383213': 8, 'u234178': 6.5, 'u123456': 8}, 'Introduction HAIT': {'u383213': 7, 'u223416': 6.6, 'u123456': 8}}
()
('Statistics', 6.8)
('Data processing', 7.5)
('Introduction HAIT', 7.2)


Additionally, we could repeat this pattern indefinitely, or we can choose to store a list of multiple grades rather than a single grade per course. However, it depends on your use case and how complex your data structure needs to be. Often simpler is better.

Repeatedly placing data structures inside other, similar data structures is commonly referred to as nesting, which allows us to create a hierarchy of (nested) data structures.
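As a sketch of one more level of nesting, the example below stores a list of grades per student per course, as suggested above. The data in `attempts` is made up for illustration (for example, one grade per attempt), and `averages` is a hypothetical name for the result.

```python
# Made-up data: course -> u-number -> list of grades
attempts = {'Statistics': {'u123456': [5.2, 6.8], 'u223416': [7.2]},
            'Data processing': {'u123456': [8], 'u234178': [6.5, 7.5]}}

# Build a dictionary with the same shape, holding the average per student per course
averages = {}
for course, results in attempts.items():
    averages[course] = {}
    for unumber, grades in results.items():
        # float() makes sure the division is not rounded down in older Python versions
        averages[course][unumber] = sum(grades) / float(len(grades))

print(averages['Data processing']['u234178'])
```

Note that every extra level of nesting requires an extra loop (or an extra pair of square brackets) to reach the data, which is a good reason to nest no deeper than you need.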

## More complex keys

So far all the keys we've used were strings, but they don't have to be. In fact, Python is happy with a great variety of key types, as long as the key is hashable, which roughly means that it cannot change after it is created (so lists are not allowed). In this section we'll consider two additional key types: integers and tuples.
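The restriction on lists can be checked directly. The sketch below tries the same content once as a tuple and once as a list; the names `grades_by_key` and `list_key_allowed` are hypothetical and only serve this illustration.

```python
tuple_key = ('Statistics', '2013/2014')
list_key = ['Statistics', '2013/2014']

grades_by_key = {}
grades_by_key[tuple_key] = 7.2  # works: a tuple of strings can be used as a key

try:
    grades_by_key[list_key] = 7.2
    list_key_allowed = True
except TypeError as error:
    # Python refuses the list, because a list can be changed after it is created
    list_key_allowed = False
    print(error)

print(list_key_allowed)
```

If you ever need a list as a key, converting it with `tuple(list_key)` is a common way around this.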

When using an integer as the key a dictionary can look a lot like a list, as we can see from the following example:

In [59]:
# Create a dictionary and a list of our 8 digit number.
integer_dict = {0: 4, 1: 0, 2: 6, 3: 3, 4: 1, 5: 7, 6: 8, 7: 5} 
integer_list = [4, 0, 6, 3, 1, 7, 8, 5]

# Go over the numbers one by one and compare them
for i in range(0, 8):
    print("Entry", i, "has the value", integer_dict[i], "in the dictionary, and", integer_list[i], "in the list.")
    

('Entry', 0, 'has the value', 4, 'in the dictionary, and', 4, 'in the list.')
('Entry', 1, 'has the value', 0, 'in the dictionary, and', 0, 'in the list.')
('Entry', 2, 'has the value', 6, 'in the dictionary, and', 6, 'in the list.')
('Entry', 3, 'has the value', 3, 'in the dictionary, and', 3, 'in the list.')
('Entry', 4, 'has the value', 1, 'in the dictionary, and', 1, 'in the list.')
('Entry', 5, 'has the value', 7, 'in the dictionary, and', 7, 'in the list.')
('Entry', 6, 'has the value', 8, 'in the dictionary, and', 8, 'in the list.')
('Entry', 7, 'has the value', 5, 'in the dictionary, and', 5, 'in the list.')


As you can see we can use the same syntax to access the values in both the dictionary and the list, and because we assigned them similarly all the values are the same. Yet, on closer inspection you will notice that a dictionary and a list behave very differently:

In [60]:
# Create another dict and list
another_dict = {0: 4, 1: 0, 2: 6, 3: 3, 4: 1, 5: 7, 6: 8, 7: 5} 
another_list = [4, 0, 6, 3, 1, 7, 8, 5]
# Print their values
print(another_dict.values())
print(another_list)
print()
# Remove the entry with key 4 from the dict and the element at index 4 from the list
del another_dict[4]
del another_list[4]
# Print their values again
print(another_dict.values())
print(another_list)
print()
# They look similar enough right?
# But what if we print the keys of the dictionary? The 4 is gone, but the keys still go up to 7
print(another_dict.keys())
# The positions in the list, however, now run from 0 to 6
print(range(0, len(another_list)))
print()
# So if we ask the list for the element at index 4
print(another_list[4])
# it works fine, but asking the dict for key 4 gives an error
print(another_dict[4])

[4, 0, 6, 3, 1, 7, 8, 5]
[4, 0, 6, 3, 1, 7, 8, 5]
()
[4, 0, 6, 3, 7, 8, 5]
[4, 0, 6, 3, 7, 8, 5]
()
[0, 1, 2, 3, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6]
()
7


KeyError: 4

Hopefully this example makes it clear that it can be tempting to think of a dictionary as a list, but that their behaviour is very different and it can lead to problems if we treat a dictionary as a list.

A more useful application for dictionaries with integers as keys is found when the keys themselves carry meaning and we don't want to start at 0, for example when we want to keep track of the year in which a group of students were born:

In [None]:
birthyears = {1989: ['u123456'], 1991: ['u223416', 'u234178'], 1992: ['u383213']}
print(birthyears)

In the case of birthyears it is useful that we can use a dictionary instead of a list: a list would have to start at position 0, and given the severe lack of students who were born before the year 1900, almost all of its positions would stay empty.
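Such a dictionary does not have to be typed in by hand. As a sketch, assuming the birth years are available as a list of (u-number, year) tuples (the list `student_birthyears` below is hypothetical, chosen to match the example), we can group the students per year like this:

```python
# Hypothetical input: one (u-number, birth year) tuple per student
student_birthyears = [('u123456', 1989), ('u223416', 1991),
                      ('u383213', 1992), ('u234178', 1991)]

birthyears = {}
for unumber, year in student_birthyears:
    # The first time we see a year, start with an empty list for it
    if year not in birthyears:
        birthyears[year] = []
    birthyears[year].append(unumber)

print(birthyears[1991])
```

The dictionary only gets keys for years that actually occur in the data, no matter how far apart those years are.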

It is also possible to create a dictionary with a tuple as the key. This can be useful when we want to combine two distinct pieces of information into one key. An example of this is when we have courses and grades for multiple years, as the courses are given every year.

In [None]:
yearly_courses_grades = {('Data processing', '2013/2014'): {'u123456': 8, 'u383213' : 8, 'u234178' : 6.5}, 
                         ('Data processing', '2014/2015'): {'u423486': 7, 'u213242' : 9, 'u265421' : 7.5},
                         ('Statistics', '2013/2014') : {'u123456': 5.2, 'u223416' : 7.2, 'u234178' : 8},
                         ('Statistics', '2014/2015') : {'u423486': 6.5, 'u213242' : 8, 'u265421' : 7}}

# And we can retrieve the information using the same tuples
print('The grade for u123456 for Data Processing 2013/2014 was:', 
      yearly_courses_grades[('Data processing', '2013/2014')]['u123456'])

print('The grade for u123456 for Statistics 2013/2014 was:', 
      yearly_courses_grades[('Statistics', '2013/2014')]['u123456'])
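Tuple keys can also be unpacked while iterating, which makes it possible to use the two parts of the key separately. The following sketch reuses the data from the example above; `averages` and `data_processing_years` are names introduced here for illustration.

```python
yearly_courses_grades = {
    ('Data processing', '2013/2014'): {'u123456': 8, 'u383213': 8, 'u234178': 6.5},
    ('Data processing', '2014/2015'): {'u423486': 7, 'u213242': 9, 'u265421': 7.5},
    ('Statistics', '2013/2014'): {'u123456': 5.2, 'u223416': 7.2, 'u234178': 8},
    ('Statistics', '2014/2015'): {'u423486': 6.5, 'u213242': 8, 'u265421': 7}}

averages = {}
data_processing_years = []
# Each key is a (course, year) tuple, which we unpack directly in the for loop
for (course, year), results in yearly_courses_grades.items():
    averages[(course, year)] = sum(results.values()) / float(len(results))
    if course == 'Data processing':
        data_processing_years.append(year)

print(averages[('Data processing', '2013/2014')])
print(sorted(data_processing_years))
```

The years are sorted before printing because the order in which the keys come out of the dictionary is not something to rely on.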

Similarly to the values, the keys can be made very complex, but often you don't need such complexity. Try to keep it simple and appropriate for your purpose.

## Exercises

* Write a function that, given two lists, returns a dictionary whose keys come from the first list and whose values come from the second list.
* Copy the 'courses_grades' dictionary from the example above and calculate the average grade for the student with u-number 'u123456' across all three courses.