# Containers in Python

By now, you are familiar with a few fundamental coding concepts. But it might not be immediately obvious how these techniques apply to humanities research. In what follows, we cover Python's container types: lists and dictionaries.

## 1. Lists

[VU] **At the end of this chapter, you will be able to:**
* create a list
* add items to a list
* extract/inspect items in a list
* perform basic list operations
* use built-in functions on lists 

Lists resemble strings: both are a **sequence** of values. But whereas a string is a sequence of characters, a list can contain values of any type. We call these values **elements** or **items**.

## 1.1 Introduction

Consider the first sentence (represented as a string) from Franz Kafka's *The Trial*.

In [2]:
sentence = "Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested. "

[MK] Words are made up of characters, and so are strings in Python, like the string stored in the variable sentence in the block above. For the sentence above, it might seem more natural for humans to describe it as a series of words, rather than as a series of characters. Say we want to access the first word in our sentence. If we type in:

In [3]:
first_word = sentence[0]
print(first_word)

S


[MK] Python only prints the first character of our sentence. (Think about this if you do not understand why.) We can transform our sentence into a list of words (represented by strings) using the `split()` function as follows:

In [4]:
words = sentence.split()
print(words)

['Someone', 'must', 'have', 'slandered', 'Josef', 'K.,', 'for', 'one', 'morning,', 'without', 'having', 'done', 'anything', 'truly', 'wrong,', 'he', 'was', 'arrested.']


The variable `sentence` still holds the whole first line of Kafka's *The Trial* as a single string, while `words` holds a list. Each element in the list is now (approximately) a word. Run the code below to see the difference.

In [5]:
first_word = words[0]
print(first_word)

Someone


We apply the `split()` function to the variable `sentence` and we assign the result of the function (we call this the 'return value' of the function) to the new variable `words`. 

By default, the `split()` function in Python will split strings on the spaces between consecutive words and it will return a list of words. However, we can pass an argument to `split()` that specifies explicitly the string we would like to split on. In the code block below, we will split a string on commas, instead of spaces. Do you get the syntax?

This is often useful for parsing information from a CSV file. For example, the string below has the structure of the [Google Ngram](https://books.google.com/ngrams) data. The Ngram Viewer allows researchers to explore long-term cultural [trends](https://books.google.com/ngrams/graph?content=king%2C+queen&case_insensitive=on&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t4%3B%2Cking%3B%2Cc0%3B%2Cs0%3B%3Bking%3B%2Cc0%3B%3BKing%3B%2Cc0%3B%3BKING%3B%2Cc0%3B.t4%3B%2Cqueen%3B%2Cc0%3B%2Cs0%3B%3BQueen%3B%2Cc0%3B%3Bqueen%3B%2Cc0%3B%3BQUEEN%3B%2Cc0). The source data of this corpus comprises the yearly word and document frequencies for ~5 million books between 1500 and 2008. The lines are separated by hard returns ("\n"). Each line holds four elements: word, year, word frequency, document frequency.

Using the `split()` function, we can easily **parse** this file, i.e. recognize and read its content. First we split the string on "\n" and then each line on ",".

In [9]:
google_ngram = "queen,1900,20394,3435\nqueen,1901,23340,2935\nqueen,1902,23120,3035"
google_ngram = google_ngram.split("\n")
print(google_ngram)
first_line = google_ngram[0]
print(first_line.split(','))

['queen,1900,20394,3435', 'queen,1901,23340,2935', 'queen,1902,23120,3035']
['queen', '1900', '20394', '3435']
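The cell above only splits the first line. As a minimal sketch (it uses a `for` loop, which we only cover properly later), the same two steps can be applied to every line. The name `parsed_rows` is our own choice for illustration, and the conversion with `int()` is an extra step that the cell above does not perform.

```python
# Sample data copied from the cell above
google_ngram = "queen,1900,20394,3435\nqueen,1901,23340,2935\nqueen,1902,23120,3035"

parsed_rows = []  # hypothetical name: will hold one list per line
for line in google_ngram.split("\n"):
    cells = line.split(",")
    # split() always returns strings, so we convert the numeric fields
    parsed_rows.append([cells[0], int(cells[1]), int(cells[2]), int(cells[3])])

print(parsed_rows[0])
print(len(parsed_rows))
```

Converting the numbers early makes it possible to do arithmetic on the frequencies later on.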


The reverse of the `split()` function can be accomplished with `join()`: it turns a list into a string, using a specific 'delimiter', i.e. the string you want to place between the items.

In [1]:
observation = ['queen', '1900', '20394', '3435']
delimiter = ', '
csv_string = delimiter.join(observation)
print(csv_string)

queen, 1900, 20394, 3435
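To see that `join()` and `split()` really mirror each other, here is a small sketch of a round trip. It also illustrates one thing to keep in mind: `join()` only works on a list of strings. The variable names `round_trip`, `numbers` and `number_strings` are invented for this example.

```python
observation = ['queen', '1900', '20394', '3435']

# Joining and splitting on the same delimiter gives back the original list
csv_string = ','.join(observation)
round_trip = csv_string.split(',')
print(round_trip == observation)

# join() only accepts strings, so numbers must be converted with str() first
numbers = [1900, 20394, 3435]
number_strings = []
for number in numbers:
    number_strings.append(str(number))
print(','.join(number_strings))
```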


In the previous chapter we argued that variables operate as "boxes": you put a value in there to save it for later. Until now the box could only contain one item, a string or a number. Lists expand the possibilities: they serve as "containers". Now you can stuff your box with as many elements as you'd like. Let's have a look at how this works.

**Exercise**: [To do]

##  1.2 Creating a list--the basic rules 

To store an empty list in a variable, simply assign ``[]`` (square brackets) to a variable name.

In [None]:
x = []

Admittedly, empty lists are not immediately useful. To create a list with content, enclose the individual items within square brackets, separated by commas.

In [10]:
my_grades = [8,9,6,7]
print(my_grades)
my_favorite_songs = ['','']
print(my_favorite_songs)
my_garbage = ['Potatoe',[1,2,3],9.03434,'frogs']
print(my_garbage)

[8, 9, 6, 7]
['', '']
['Potatoe', [1, 2, 3], 9.03434, 'frogs']


[VU] 
### General rules:
* Lists are surrounded by square brackets and the elements in the list are separated by commas
* A list element can be **any Python object**, even another list (e.g. a list can be a collection of integers, strings, floats, or a combination thereof)
* A list can store values with different types
* A list can be empty
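The rules above can be checked with a few lines of code. This is only an illustrative sketch; the names `empty` and `mixed` are made up for the occasion.

```python
empty = []
mixed = ['text', 42, 3.14, [1, 2, 3]]   # the last element is itself a list

print(len(empty))   # an empty list has length 0
print(len(mixed))   # the inner list counts as a single element

# Inspect the type of every element
for element in mixed:
    print(type(element))
```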

## 1.3 List operations
Python allows at least the following very useful list operations:

Arithmetic operators:
* **concatenation**
* **repetition**

but also includes comparison and membership operators!

### Arithmetic operators

Similar to strings, Python comes with specific operations (``*`` and ``+``) that you can apply to a list.

The ``+`` operator concatenates lists:

In [12]:
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
print(c)

[1, 2, 3, 4, 5, 6]


Similarly, the ``*`` operator repeats a list a given number of times:

In [14]:
# First example of the * operator
print([0]*4)

[0, 0, 0, 0]


In [13]:
# Second example of the * operator
a = ['spam','Spam','SPAMMM']
b = a * 5
print(b)

['spam', 'Spam', 'SPAMMM', 'spam', 'Spam', 'SPAMMM', 'spam', 'Spam', 'SPAMMM', 'spam', 'Spam', 'SPAMMM', 'spam', 'Spam', 'SPAMMM']


The first example repeats the single-item list four times. The second repeats the list with typographic variations on the word 'spam' five times.

[VU] Of course, you can use lists in membership boolean expressions. The `in` operator checks whether the item 'meaning' appears in the variable `life`.

In [47]:
life = ['a', 'lot', 'of', 'stuff']
print('meaning' in life)

False


And you can use lists in comparison boolean expressions. Two lists are equal only if they contain the same elements in the same order:

In [46]:
print([3, 2] == [2, 3])

False
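A few more comparisons may help to make the role of order explicit. The sketch below also uses the built-in function `sorted()` and the `not in` operator, neither of which has been introduced so far; take it as a preview rather than required knowledge.

```python
print([2, 3] == [2, 3])                    # same elements, same order
print([3, 2] == [2, 3])                    # same elements, different order
print(sorted([3, 2]) == sorted([2, 3]))    # compare while ignoring the order

life = ['a', 'lot', 'of', 'stuff']
print('meaning' not in life)               # `not in` negates the membership test
```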


## 1.4 Indexing, slicing and replacing
**[To do: similar to strings, point out the mutability]**
[VU] **Indexing** and **slicing** work the same way as with strings. Every item in the list hence has its own index number. We start counting at 0! The indices for our list `['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']` are as follows:


J.S. Bach|W.A. Mozart|F. Mendelssohn
---|---|---
0|1|2
-3|-2|-1


[VU] We can hence use this index number to extract items from a list (just as with strings)

In [11]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list[0])
print(composer_list[1])
print(composer_list[2])

J.S. Bach
W.A. Mozart
F. Mendelssohn


Obviously, we can also use **negative indices**:

In [13]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list[-1])
print(composer_list[-2])
print(composer_list[-3])

F. Mendelssohn
W.A. Mozart
J.S. Bach


If you wondered, `-0` just returns the first element.

In [14]:
print(composer_list[-0])

J.S. Bach


And we can extract one part of a list using **slicing**:

In [27]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
list_with_less_composers = composer_list[:2]
print(list_with_less_composers)

['J.S. Bach', 'W.A. Mozart']


A common error is to retrieve elements using an index greater than the length of the list minus 1.

In [18]:
print(composer_list[5])

IndexError: list index out of range

The `IndexError` tells you that Python could not find an item at index 5, as the valid positions only range from 0 to 2.

Index notation can be used to replace elements in the list. Let's say we want to get rid of W.A. Mozart (at position 1) and replace him with Elvis P. As lists are **mutable**, you can replace the item directly.

In [28]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
composer_list[1] = 'Elvis P.'
print(composer_list)

['J.S. Bach', 'Elvis P.', 'F. Mendelssohn']


Similarly, a slice operator on the left side of an assignment can update multiple elements:

In [22]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
composer_list[1:] = ['L. van Beethoven','A. Webern']
print(composer_list)

['J.S. Bach', 'L. van Beethoven', 'A. Webern']
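Slice assignment does not have to preserve the length of the list. The following sketch, which reuses the composers from above, replaces a slice of one element with two new elements, so the list grows by one.

```python
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']

# Replace one element (the slice [1:2]) with two new elements
composer_list[1:2] = ['L. van Beethoven', 'A. Webern']
print(composer_list)
print(len(composer_list))
```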


## Intermezzo: Mutability

The fact that you were able to replace an element by index (as in the above cell) relates to the mutability of lists. Performing a similar manipulation on a string object, for example, will cause a `TypeError`.

In [30]:
misspelled = 'Pythvn'
misspelled[4] = 'o'

TypeError: 'str' object does not support item assignment

If we convert the string to a list we can get rid of this naughty typo.

In [33]:
misspelled = list('Pythvn')
print(misspelled)
misspelled[4] = 'o'
print(misspelled)

['P', 'y', 't', 'h', 'v', 'n']
['P', 'y', 't', 'h', 'o', 'n']


In short: lists are **mutable**--you can manipulate the content of list variables--whereas strings are not. Question: can you predict whether the following code raises an error?

In [36]:
word = 'kitten'
word+='s'
print(word)

kittens

No error is raised. Remember that `word += 's'` is equivalent to `word = word + 's'`, i.e. it does not change the original string, but creates a new one and assigns it to a variable with the same name, effectively replacing the content of the box.

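The index operator also accepts an optional third value, the step: `list[start:stop:step]`. The example below takes every second element, starting at index 2 and stopping before the last element.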

In [7]:
till_twenty = list(range(0,21))
print(till_twenty)

evens = till_twenty[2:-1:2]
print(evens)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[2, 4, 6, 8, 10, 12, 14, 16, 18]
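The three parts of `[start:stop:step]` are all optional. As a brief sketch (with a shorter list, `till_ten`, invented for this example), leaving out `start` and `stop` applies the step to the whole list, and a negative step walks through the list backwards.

```python
till_ten = list(range(0, 11))

print(till_ten[::2])     # every second element, from the start to the end
print(till_ten[1::2])    # the same, but starting at index 1
print(till_ten[::-1])    # a negative step walks through the list backwards
```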


## 1.5 List methods

As lists are mutable, they provide a much more flexible data type. Lists come with specific **methods**, a set of powerful tools that Python already pre-cooked for you. These tools help you with building and manipulating lists.

### Adding items to a list
Most of the crucial list functionalities are provided by the in-built list **methods**: functions attached to the list object. For an overview of the available methods run the code below (scroll down, for this course you can ignore the methods starting and ending with double underscores.)

In [30]:
writers_list = []
print(type(writers_list))

<class 'list'>


We learn, unsurprisingly, that the variable `writers_list` is of type `list`. Let's inspect the functionalities Python provides for working with lists.

In [26]:
help(list)

### 1.5.1 append() and extend()

The first method we encounter is ``append``. To see what this method does, use the same `help` function as before:

In [27]:
help(list.append)

Help on method_descriptor:

append(...)
    L.append(object) -> None -- append object to end



`append` is a method that adds a new item to the end of a list. It has one positional argument and returns `None` (we come back to this a few blocks below).

The Python help functionality helps you explore the methods attached to an object.

**Exercise**: find out what the method ``extend`` does, and how to apply it to the writers_list.

In [23]:
composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
print(composer_list)
composer_list.append('L. van Beethoven')
print(composer_list)

['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']
['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn', 'L. van Beethoven']


In [None]:
# add some other writers to the list here

Do you get the syntax that goes with the `append()` function? The list we wish to append the item to goes first and we join the `append()` function to this list using a dot (`.`). In between the round brackets that go with the function name, we place the actual string that we wish to add to the list. We call such an input **value** an **'argument'** or a **'parameter'** that we **'pass'** to a function. Next, the function will return a 'return value'.

Make sure that you are familiar with this terminology because you will often come across such terms when you look for help online!

Functions in Python are generally divided into **fruitful** and **void** functions. `append` is a void function: similar to `print`, it performs an operation (adds one element to the list) but returns nothing. Understanding this distinction may help you trace bugs in your code.

In [24]:
a = composer_list.append('J. des Prez')
print(composer_list)
print(a)

['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn', 'L. van Beethoven', 'J. des Prez']
None


[VU] It might be a bit confusing at first that a list method returns None. Please carefully look at the difference between the two following examples. Please predict what will be printed in each code snippet below:

In [25]:
a_list = [1, 3, 4]
a_list.append(5)
print(a_list)

[1, 3, 4, 5]


In [26]:
a_list = [1, 3, 4]
a_list = a_list.append(5)
print(a_list)

None


[CS] It is important to distinguish between operations that modify lists and operations that create new lists. For example, the append method modifies a list, but the + operator creates a new list.
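The difference between modifying a list and creating a new one becomes visible when two variable names refer to the same list. The sketch below is our own illustration; the names `original` and `alias` are hypothetical.

```python
original = [1, 2, 3]
alias = original            # both names refer to the same list object

original.append(4)          # modifies the list in place
print(alias)                # the change is visible through both names

original = original + [5]   # builds a new list and rebinds the name `original`
print(original)
print(alias)                # the old list is left untouched
```

After the `+` operation the two names no longer point to the same list, which is why `alias` does not contain the 5.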

In [None]:
# find out what extend does
# apply the method to the writers_list

The `append()` method is especially powerful in a `for` loop. We have a closer look at loops later, but the code below shows a context in which the `append()` method is often applied. For example, we have a .tsv table which lists composers with their country of origin. Imagine we want to study composers by nationality. The code below shows how to extract the relevant information from this table.

In [45]:
data = 'Justus Johann Friedrich Dotzauer\tGermany\nSaid Rustamov\tAzerbaijan\nFlor Alpaerts\tBelgium\nPetko Staynov\tBulgaria\nTheodor Ludwig Wiesengrund Adorno\tGermany\nAnna Amalia, Duchess of Brunswick-Wolfenbüttel\tGermany'

In [46]:
print(data)

Justus Johann Friedrich Dotzauer	Germany
Said Rustamov	Azerbaijan
Flor Alpaerts	Belgium
Petko Staynov	Bulgaria
Theodor Ludwig Wiesengrund Adorno	Germany
Anna Amalia, Duchess of Brunswick-Wolfenbüttel	Germany


As previously shown, we parse this table with the `split()` function. First we identify the rows (separated by hard returns or "\n") and later the cells within each row (separated by tabs or "\t").

In [40]:
rows = data.split('\n')
print(rows)

['Justus Johann Friedrich Dotzauer\tGermany', 'Said Rustamov\tAzerbaijan', 'Flor Alpaerts\tBelgium', 'Petko Staynov\tBulgaria', 'Theodor Ludwig Wiesengrund Adorno\tGermany', 'Anna Amalia, Duchess of Brunswick-Wolfenbüttel\tGermany']


To process the separate cells, we iterate over each row created by `split('\n')` and split each row on the tab symbol. This last step converts each row (which is still a string) into a list.

In [49]:
rows = data.split('\n')
# The rows are of type list
print(type(rows))
# But the first element in this list is still a string
print(type(rows[0]))

<class 'list'>
<class 'str'>


The code below shows what happens during every iteration of the for loop.

In [50]:
# get the rows from the string
for row in rows:
    print(row)
    tsv_splitted = row.split('\t')
    print(tsv_splitted)

Justus Johann Friedrich Dotzauer	Germany
['Justus Johann Friedrich Dotzauer', 'Germany']
Said Rustamov	Azerbaijan
['Said Rustamov', 'Azerbaijan']
Flor Alpaerts	Belgium
['Flor Alpaerts', 'Belgium']
Petko Staynov	Bulgaria
['Petko Staynov', 'Bulgaria']
Theodor Ludwig Wiesengrund Adorno	Germany
['Theodor Ludwig Wiesengrund Adorno', 'Germany']
Anna Amalia, Duchess of Brunswick-Wolfenbüttel	Germany
['Anna Amalia, Duchess of Brunswick-Wolfenbüttel', 'Germany']


Remember, we wanted to study the nationality of these composers. Let's therefore store the countries in a separate list named `nationalities` using the `append()` method. First we have to create an empty list, in which we can store our information later. Then we iterate over the rows and get the second cell (index=1), where the country of birth is stored.

In [51]:
# Create an empty list
nationalities = []

# Iterate over all the rows
for row in rows:
    # Create a new variable tsv_splitted which stores the list
    # returned by the split() function
    tsv_splitted = row.split('\t')
    # Append the second/last item in tsv_splitted to the nationalities list
    nationalities.append(tsv_splitted[1])

print(nationalities)

['Germany', 'Azerbaijan', 'Belgium', 'Bulgaria', 'Germany', 'Germany']


Creating the empty list might seem like a superfluous step, but it is crucial nonetheless: if the variable is not explicitly defined, Python does not know where to store the selected items.

Note also that the list keeps the order in which the rows are processed.

### 1.5.2 count()

Once we have collected the information, we can start counting: how many of these composers come from Germany?

In [55]:
help(list.count)

Help on method_descriptor:

count(...)
    L.count(value) -> integer -- return number of occurrences of value



The *count()* method has one positional argument **value** and returns an integer. As the name already indicates, this integer represents how often the value occurs in the list.

In [87]:
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
print(countries.count('Germany'))
print(countries.count('Belgium'))

3
1
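Calling `count()` by hand for every country quickly becomes tedious. One possible approach, sketched below, first collects each distinct country once and then counts each of them. The name `unique_countries` is our own; dictionaries, introduced later in this chapter, offer a more convenient way to do the same.

```python
countries = ['Germany', 'Azerbaijan', 'Belgium', 'Germany', 'Bulgaria', 'Germany']

# Collect every country once, keeping the order of first appearance
unique_countries = []
for country in countries:
    if country not in unique_countries:
        unique_countries.append(country)

# Report how often each distinct country occurs
for country in unique_countries:
    print(country, countries.count(country))
```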


### 1.5.3 sort()

The `sort()` method is a void function that sorts strings in alphabetical order and numbers in ascending order.

In [61]:
help(list.sort)

Help on method_descriptor:

sort(...)
    L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*



In [59]:
countries.sort()
print(countries)

['Azerbaijan', 'Belgium', 'Bulgaria', 'Germany', 'Germany', 'Germany']


In [64]:
my_grades = [9,8,10,7,9,9]
my_grades.sort()
print(my_grades)

[7, 8, 9, 9, 9, 10]


The `reverse` argument allows you to sort in ascending (reverse=False) or descending (reverse=True) order.

In [None]:
my_grades.sort(reverse=True)
print(my_grades)
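Because `sort()` works in place, the original order of the list is lost. If you need to keep it, Python's built-in function `sorted()` is an alternative: it returns a new list and leaves the original alone. The sketch below contrasts the two; `sorted_grades` and `result` are names chosen for this example.

```python
my_grades = [9, 8, 10, 7, 9, 9]

sorted_grades = sorted(my_grades)   # returns a new, sorted list
print(sorted_grades)
print(my_grades)                    # the original order is preserved

result = my_grades.sort()           # sorts in place and returns None
print(result)
print(my_grades)
```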

Before evaluating the cell block below, can you guess what the resulting order will look like?

In [67]:
my_grades = [9,8,10,7,9,9]
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']

grades_and_countries = my_grades + countries
print(grades_and_countries)

grades_and_countries.sort(reverse=False)
print(grades_and_countries)

[9, 8, 10, 7, 9, 9, 'Germany', 'Azerbaijan', 'Belgium', 'Germany', 'Bulgaria', 'Germany']


TypeError: '<' not supported between instances of 'str' and 'int'

Unfortunately the standard Python `sort()` method cannot compare values of different types, so it fails on a mixed list. As an aside: a convenient solution would be to cast the integers as strings.

In [69]:
my_grades = ['9','8','10','7','9','9']
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']

grades_and_countries = my_grades + countries
grades_and_countries.sort(reverse=False)

print(grades_and_countries)

['10', '7', '8', '9', '9', '9', 'Azerbaijan', 'Belgium', 'Bulgaria', 'Germany', 'Germany', 'Germany']


The `key` argument allows you to further refine your sorting. As we have not yet covered enough Python concepts for you to properly understand how this works, we leave it for the moment at two examples.

Basically, you pass a function for the argument `key`, for example `len` which counts how many items a value contains. If you pass the function `len` as an argument, this will count how many characters each string contains, and order the list by the length of each item.

In [96]:
# Sorting by the length of string
countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
countries.sort(key=len,reverse=True)
print(countries)

['Azerbaijan', 'Bulgaria', 'Germany', 'Belgium', 'Germany', 'Germany']


Or sort the list by the frequency in which the items occur.

In [98]:

# Sorting by frequency of occurrence
countries_ref = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']
# Actually this is more elegant:
# from collections import Counter
# countries.sort(key=Counter(countries).get,reverse=True)
countries.sort(key=countries_ref.count,reverse=True)
print(countries)

['Germany', 'Germany', 'Germany', 'Azerbaijan', 'Bulgaria', 'Belgium']
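The `key` argument also helps with a pitfall we saw earlier: numbers stored as strings are sorted character by character, so '10' ends up before '7'. As a small sketch (the list name `grades_as_strings` is invented here), passing the built-in function `int` as the key makes Python compare the numeric values instead.

```python
grades_as_strings = ['9', '8', '10', '7', '9', '9']

grades_as_strings.sort()            # string comparison: '10' comes before '7'
print(grades_as_strings)

grades_as_strings.sort(key=int)     # compare by numeric value instead
print(grades_as_strings)
```

Note that the elements themselves remain strings; only the comparison uses their integer value.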


### 1.5.4 remove()

Let's assume our collection of good reads has grown a lot and we would like to remove some of the books from the list. Python provides the method `remove()`, which you can call on a list and which takes as its argument the item we would like to remove.

In [None]:
good_reads = ["The Hunger games", "A Clockwork Orange", 
             "Pride and Prejudice", "Water for Elephants", "Illias"]
print(good_reads)
good_reads.remove("Water for Elephants")
print(good_reads)

If we try to remove a book that is not in our collection, Python raises a `ValueError` to signal that something is wrong.

In [None]:
good_reads.remove("White Oleander")
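One way to avoid this error, sketched below under the assumption that a missing title should simply be reported, is to test for membership with `in` before calling `remove()`. The variable `title` is introduced only for this example.

```python
good_reads = ["The Hunger games", "A Clockwork Orange", "Pride and Prejudice"]
title = "White Oleander"

# Only call remove() when the title is actually present
if title in good_reads:
    good_reads.remove(title)
else:
    print(title, "is not in the list")

print(good_reads)
```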

Note, however, that `remove()` will only delete the *first* item in the list that is identical to the argument which you passed to the function. Execute the code in the block below and you will see that only the first instance of "Pride and Prejudice" gets deleted.

In [None]:
good_reads = ["The Hunger games", "A Clockwork Orange", 
             "Pride and Prejudice", "Water for Elephants", "Pride and Prejudice"]
good_reads.remove("Pride and Prejudice")
print(good_reads)

## An overview of list methods

In [None]:
#define some lists and variables
a = [1,2,3]
b = 4
c = [5,6,7]
x = 1
i = 2

#do some operations 
a.append(b)     # Add item b to the end of a
a.extend(c)     # Add the elements of list c at the end of a
a.insert(i,b)   # Insert item b at position i
a.pop(i)        # Remove from a the i'th element and return it. If i is not specified, remove the last element
a.index(x)      # Return the index of the first element of a with value x. Error if it does not exist
a.count(x)      # Return how often value x is found in a
a.remove(x)     # Remove from a the first element with value x. Error if it does not exist
a.sort()        # Sort the elements of list a
a.reverse()     # Reverses list a (no return value!)

print(a)

## 1.6 Nested Lists

### Tables

Lists can even contain lists. Why is this useful, you may wonder? Nested lists are a convenient way to represent tables: each inner list holds one row of the table.
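As an illustration of a table stored as a list of lists, the sketch below reuses part of the composer data from section 1.5.1. The name `table` and the shortened `data` string are choices made for this example.

```python
data = 'Said Rustamov\tAzerbaijan\nFlor Alpaerts\tBelgium\nPetko Staynov\tBulgaria'

table = []                          # will hold one list per row
for row in data.split('\n'):
    table.append(row.split('\t'))   # each row becomes a list of cells

print(table)
print(table[1])      # the second row
print(table[1][0])   # the first cell of the second row
```

The first index selects a row and the second index selects a cell within that row.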



### Matrices
A nested list can also represent a **matrix**, a notion we will often encounter further in this course.

[From Wikipedia] In mathematics, a matrix (plural: matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. For example, the dimensions of the matrix below are 2 × 3 (read "two by three"), because there are two rows and three columns:
![An example of a matrix](https://wikimedia.org/api/rest_v1/media/math/render/svg/d16330f5f99566fa754114ff04cd176d6185c796)

We can represent this matrix as a nested list and assign it to the variable `nested_list`.

In [99]:
nested_list = [[1,9,-13],[20,5,-6]]

To retrieve elements of the matrix, we use the indexing and slicing techniques from the previous sections.

In [35]:
print(nested_list[0])
print(nested_list[0][0])
print(nested_list[1][2])
print(nested_list[0][:-2])

[1, 9, -13]
1
-6
[1]
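Beyond retrieving single elements, we can also ask for the dimensions of the matrix or walk through all its cells. The following sketch assumes that every row has the same length; the names `n_rows`, `n_columns` and `second_column` are our own.

```python
nested_list = [[1, 9, -13], [20, 5, -6]]

n_rows = len(nested_list)          # number of inner lists
n_columns = len(nested_list[0])    # length of the first row
print(n_rows, n_columns)

# Visit every cell, row by row
for row in nested_list:
    for value in row:
        print(value)

# Collect the second column (index 1) of every row
second_column = []
for row in nested_list:
    second_column.append(row[1])
print(second_column)
```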


To finish this section, here is an overview of the new concepts and functions you have learnt. Go through them and make sure you understand them all.

-  list

-  `.split()`
-  `.append()`
-  `.count()`
-  `.remove()`
-  `.sort()`
-  nested lists
-  *mutable* versus *immutable*

## 2. Dictionaries

## 2.1 Introduction

[CS] A dictionary is like a list, but more general. In a list, the indices have to be integers; in a dictionary they can be (almost) any type.

For example, using the index operator, we can retrieve the element at a certain position:

In [49]:
# Example 
all_my_friends = ['John','Mary','Benny']
# retrieve element by index
my_first_friend = all_my_friends[0]
print(my_first_friend)

John


You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a **key-value** pair or sometimes an item.

Imagine having to look up someone's number. Here the index (or key) would be the names of all citizens with a telephone, and the values the numbers. Here a numerical index does not make sense, because we want to retrieve the number by name, not by position in the book.

For example, what is Susan's phone number?

Dictionaries provide you with a data structure that makes such tasks (looking up values by keys) exceptionally easy.

For example, let's have a look at the `telephone_numbers` variable below:

In [51]:
telephone_numbers = {'Frank': 4334030, 'Susan': 400230, 'Guido': 487239}

... and now print Susan's telephone number

In [52]:
print(telephone_numbers['Susan'])

400230

Note how similar this looks to retrieving the nth element in a list, e.g. `my_list[n]`.

Of course, you could do something similar with a list (the look-up by key), but that would be very impractical.

In [53]:
telephone_numbers = ['Frank', 4334030, 'Susan', 400230, 'Guido', 487239]
print(telephone_numbers[telephone_numbers.index('Susan')+1])

400230


[VU] That's pretty inefficient. The take-home message here is **that lists are not really good if we want to keep two pieces of information together**. Dictionaries to the rescue!

## 2.2 Creating a dictionary


[VU]
* A dictionary is surrounded by curly brackets and the key/value pairs are separated by commas.
* A dictionary consists of **key:value pairs**; the key is the 'identifier' or 'name' that is used to describe the value.
* The **keys** in a dictionary are unique.
* The syntax for a key/value pair is: KEY : VALUE
* The keys (e.g. 'Frank') in a dictionary have to be **immutable**.
* The values (e.g. 8) in a dictionary can be **any Python object**.
* A dictionary can be empty.


In [None]:
english2deutsch = {'ambulance':'Krankenwagen',
                  'clever':'klug',
                  'concrete':'Beton'}


* Please note that **keys** in a dictionary have to be **immutable**. Lists cannot appear as keys.
* Anything can be a value.


Because keys have to be immutable, a list cannot appear in this location. This should raise an error:

In [56]:
a_dict = {['a', 'list']: 8}
print(a_dict)

TypeError: unhashable type: 'list'

This should work:

In [55]:
a_dict = { 8:['a', 'list']}
print(a_dict)

{8: ['a', 'list']}
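To illustrate which objects can serve as keys, here is a sketch with three immutable key types in one dictionary. One of them is a tuple, an immutable sequence written with round brackets that this chapter only mentions in passing. The dictionary `mixed_keys` and its contents are invented for the example.

```python
# Strings, numbers and tuples are immutable, so all of them can serve as keys
mixed_keys = {'Frank': 8, 42: 'an integer key', (1900, 'queen'): 20394}

print(mixed_keys['Frank'])
print(mixed_keys[42])
print(mixed_keys[(1900, 'queen')])
```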


### 2.2.1 Adding items to a dictionary

[VU] There is one very simple way to add a **key:value** pair to a dictionary. Please look at the following code snippet:

In [101]:
english2deutsch = dict()
#or try english2deutsch = {}
print(english2deutsch)

{}


In [102]:
english2deutsch['one'] = 'eins'
english2deutsch['two'] = 'zwei'
english2deutsch['three'] = 'drei'
print(english2deutsch)

{'one': 'eins', 'two': 'zwei', 'three': 'drei'}


[VU]Please note that key:value pairs get overwritten if you assign a different value to an existing key.

In [103]:
english2deutsch = dict()
print(english2deutsch)
english2deutsch['one'] = 'eins?'
print(english2deutsch)
english2deutsch['one'] = 'zwei?'
print(english2deutsch)
english2deutsch['one'] = 'drei?'
print(english2deutsch)

{}
{'one': 'eins?'}
{'one': 'zwei?'}
{'one': 'drei?'}


## 2.3 Inspecting the dictionary
[VU] The most basic operation on a dictionary is a **look-up**. Simply enter the key and the dictionary returns the value. In the example below we mapped movies to their box-office performance. Keys are the Movie Titles, and values represent the ticket sales.

In [104]:
bo = {'Avatar': 27879650875, 'Titanic': 2187463944, 'Star Wars: The Force Awakens': 2068223624}

In [107]:
print(bo['Avatar'])

27879650875


[VU] If the key is not in the dictionary, Python raises a ``KeyError``.

In [108]:
bo['The Lion King']

KeyError: 'The Lion King'

## 2.3 Dictionary Methods
### .get()

In order to avoid getting an error, you can use the ``get`` method. The first argument is the key to look up, the second one defines the value to be returned if the key is not found.

In [109]:
print(bo.get('The Lion King','Not in Dictionary'))
# a good alternative could be 
print(bo.get('The Lion King',False))

Not in Dictionary
False
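The second argument of `get` is optional. As the sketch below shows, leaving it out makes `get` return `None` for a missing key. An alternative is to test for the key with the `in` operator before the look-up. The dictionary is a shortened copy of `bo` from above.

```python
bo = {'Titanic': 2187463944, 'Star Wars: The Force Awakens': 2068223624}

print(bo.get('Titanic'))          # key exists: the value is returned
print(bo.get('The Lion King'))    # key missing, no default given: None

# Use the membership operator to check for a key before a look-up
if 'The Lion King' in bo:
    print(bo['The Lion King'])
else:
    print('Not in Dictionary')
```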


### .keys()

The **keys** method returns the keys in a dictionary.

In [65]:
student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}
the_keys = student_grades.keys()
print(the_keys)

dict_keys(['Frank', 'Susan', 'Guido'])


### .values()

The **values** method returns the values in a dictionary.

In [None]:
the_values = student_grades.values()
print(the_values)

We can use the built-in functions to inspect the keys and values. For example:

In [66]:
the_values = student_grades.values()
print(len(the_values)) # number of values in a dict
print(max(the_values)) # highest value of values in a dict
print(min(the_values)) # lowest value of values in a dict
print(sum(the_values)) # sum of all values of values in a dict

3
10
7
25
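The objects returned by `keys()` and `values()` are not lists, so they do not support indexing. A possible workaround, sketched here, is to convert them with `list()` or `sorted()`. The names `names`, `grades` and `average` are ours, and the order of `names` relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}

names = list(student_grades.keys())        # convert the view into a real list
print(names[0])                            # a list supports indexing

grades = sorted(student_grades.values())   # sorted() returns a new list
print(grades)

average = sum(student_grades.values()) / len(student_grades)
print(average)
```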


### .items()

The **items** method returns the key-value pairs as tuples, which allows us to easily loop through a dictionary.

In [67]:
student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}
print(student_grades.items())

dict_items([('Frank', 8), ('Susan', 7), ('Guido', 10)])
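A minimal sketch of such a loop: in every iteration, the key and the value of one pair are assigned to two variables at once. The list `passed` and the threshold of 8 are assumptions made purely for illustration.

```python
student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}

passed = []   # hypothetical list of students with a grade of 8 or higher
for name, grade in student_grades.items():
    print(name, grade)
    if grade >= 8:
        passed.append(name)

print(passed)
```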


## 2.4 Example: counting with dictionaries

Dictionaries are very useful for deriving statistics, for example counting words:

In [69]:
sentence = 'Obama was the president of the USA'
words = sentence.split()
word2freq = dict()

for word in words: 
    
    if word in word2freq: # add 1 to the count if the key exists
        word2freq[word] += 1 
    else:
        word2freq[word] = 1 # set the initial count to 1 if the key does not exist

    print(word, word2freq)

print()
print(word2freq)

Obama {'Obama': 1}
was {'Obama': 1, 'was': 1}
the {'Obama': 1, 'was': 1, 'the': 1}
president {'Obama': 1, 'was': 1, 'the': 1, 'president': 1}
of {'Obama': 1, 'was': 1, 'the': 1, 'president': 1, 'of': 1}
the {'Obama': 1, 'was': 1, 'the': 2, 'president': 1, 'of': 1}
USA {'Obama': 1, 'was': 1, 'the': 2, 'president': 1, 'of': 1, 'USA': 1}

{'Obama': 1, 'was': 1, 'the': 2, 'president': 1, 'of': 1, 'USA': 1}
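The same counts can be obtained without the `if`/`else` branch by using the `get` method from section 2.3. This is only an alternative sketch, not a replacement for the version above, which spells out the logic more explicitly.

```python
sentence = 'Obama was the president of the USA'
word2freq = dict()

for word in sentence.split():
    # get() returns 0 for unseen words, so no if/else is needed
    word2freq[word] = word2freq.get(word, 0) + 1

print(word2freq)
```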


## 2.5 Recap

To finish this section, here is an overview of the new concepts and functions you have learnt. Make sure you understand them all.

-  dictionary
-  indexing or accessing keys of dictionaries
-  adding items to a dictionary
-  `.keys()`
-  `.values()`

## Exercises - DIY Lists and dictionaries

Inspired by *Think Python* by Allen B. Downey (http://thinkpython.com), *Introduction to Programming Using Python* by Y. Liang (Pearson, 2013). Some exercises below have been taken from: http://www.ling.gu.se/~lager/python_exercises.html.

- Ex. 1: Consider the following strings `sentence1 = "Mike and Lars kick the bucket"` and `sentence2 = "Bonny and Clyde are really famous"`. Split these strings into words and create the following strings via list manipulation: `sentence3 = "Mike and Lars are really famous"` and `sentence4="Bonny+and+Clyde+kick+the+bucket"` (mind the plus signs!). Can you print the middle letter of the fourth sentence?

- Ex. 2: [VU]Create an empty list and add three names (strings) to it using the *append* method

Please use a built-in function to determine the number of strings in the list below:

In [None]:
friend_list = ['John', 'Bob', 'John', 'Marry', 'Bob']
#  your code here

Please remove both *John* names from the list below using a list method:

In [None]:
friend_list = ['John', 'Bob', 'John', 'Marry', 'Bob']
# your code here

-  Ex. 3: Consider the `lookup` dictionary below. The following letters are still missing from it: 'k':'kilo', 'l':'lima', 'm':'mike'. Add them to `lookup`! Could you spell the word "marvellous" in code language now? Collect these codes into the list object `msg`. Next, join the items in this list together with a comma and print the spelled out version!

> lookup = {'a':'alfa', 'b':'bravo', 'c':'charlie', 'd':'delta', 'e':'echo', 'f':'foxtrot', 'g':'golf', 'h':'hotel', 'i':'india', 'j':'juliett', 'n':'november', 'o':'oscar', 'p':'papa', 'q':'quebec', 'r':'romeo', 's':'sierra', 't':'tango', 'u':'uniform', 'v':'victor', 'w':'whiskey', 'x':'x-ray', 'y':'yankee', 'z':'zulu'}


-  Ex. 4: Collect the code terms in the lookup dict (`alfa`, `bravo`, ...) from the previous exercise into a list called `code_words`. Is this list alphabetically sorted? No? Then make sure that this list is sorted alphabetically. Now remove the items `victor`, `india` and `papa`. Append the words `pigeon` and `potato` at the end of this list. Combine this new list of items into a single string, using a semicolon as a delimiter and print this string. 

- Ex. 5: Write a program that, given a long string containing multiple words, prints the same string, except with the words in backwards order. For example, say I type the string:

`My name is Kaspar von Beelen`
Then I would see the string:

`Beelen von Kaspar is name My`

**Tip**: Try using a negative `step`.

Extra: Try to do this in just one line of code!