# Collections: Sets, dictionaries, and more

_Practical Python for Linguistics and the Humanities -- Alexis Dimitriadis_

Today's interactive tutorial shows how to work with several new "container" data types, in addition to lists: Dictionaries, sets, and tuples. To gain easy access to lots of text, we first learn how to read text from files.

Some exercises in this unit are from *Think Python* by Allen B. Downey (http://thinkpython.com), *Introduction to Programming Using Python* by Y. Liang (Pearson, 2013), or _A Python Course for the Humanities_ by Folgert Karsdorp and Maarten van Gompel. 

## Contents


**[1. Review: Reading and searching](#1.-Review:-Reading-and-searching)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.1 Reading text from files](#1.1-Reading-text-from-files)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.2 Appending to a list](#1.2-Appending-to-a-list)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.3 List comprehensions](#1.3-List-comprehensions)  

**[2. Sets: Removing repetitions (and order)](#2.-Sets:-Removing-repetitions-%28and-order%29)**  

**[3. Dictionaries: Indexing by name, not position](#3.-Dictionaries:-Indexing-by-name,-not-position)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.1 A dictionary of counters](#3.1-A-dictionary-of-counters)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.2 Important dictionary methods](#3.2-Important-dictionary-methods)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.3 Lists and "views"](#3.3-Lists-and-"views")  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.4 Iterating with lists and dictionaries](#3.4-Iterating-with-lists-and-dictionaries)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.5 Specialized dictionaries](#3.5-Specialized-dictionaries)  

**[4. Tuples and multiple assignment](#4.-Tuples-and-multiple-assignment)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [4.1 Using multiple assignment](#4.1-Using-multiple-assignment)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [4.2 Working with tuples](#4.2-Working-with-tuples)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [4.3 Handy tuple iterators](#4.3-Handy-tuple-iterators)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [4.4 Tuple or list?](#4.4-Tuple-or-list?)  

**[5. Sorting](#5.-Sorting)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [5.1 sort() and sorted()](#5.1-sort%28%29-and-sorted%28%29)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [5.2 Sorting dictionaries](#5.2-Sorting-dictionaries)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [5.3 Advanced sorting: "Sorting keys"](#5.3-Advanced-sorting:-"Sorting-keys")  

**[6. Deleting data](#6.-Deleting-data)**  

**[7. What we have learned](#7.-What-we-have-learned)**  

**[8. Additional Exercises (harder)](#8.-Additional-Exercises-%28harder%29)**  


## 1. Review: Reading and searching

_**Preparation:** Download the file "austen-emma.txt", and ensure that it is present in the same folder as this notebook._ 

### 1.1 Reading text from files

"Opening" a file gives us a special "file object": a kind of connection
to the disk file, from which we can "read" text. When we give a bare file name like the one below, python looks for it in the current working directory, which for a notebook is normally the folder that contains the notebook.

In [None]:
connection = open("austen-emma.txt")
print(connection)

There are numerous ways to read text from a file, but we will only need one today: We'll construct a string containing the file's entire contents, then we will convert this string into a list of words.

In [None]:
conn = open("austen-emma.txt")
alltext = conn.read()     # A single string
conn.close()

print("We read", len(alltext), "characters.")
emmawords = alltext.split() # A list of (short) strings
print(len(emmawords), "words.")
print(emmawords[0:20])

Reading a file advances through it, like playing a music file or video. Once we have read to the end, there is nothing more to read: if we try, we get the empty string.
To read from the start again, the simplest way is to
re-open the file (i.e., make a new connection).
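A small sketch of this behaviour. So as not to depend on any file on your disk, it first writes a hypothetical file `sample.txt` into a temporary folder (using the standard module `tempfile`), then reads it twice over one connection, and finally re-opens it. Writing uses a `with` block, which closes the file automatically when the block ends.

```python
import os
import tempfile

# Hypothetical stand-in for austen-emma.txt: a tiny file in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("Emma Woodhouse, handsome, clever, and rich")

    conn = open(path, encoding="utf-8")
    first = conn.read()    # reads everything, up to the end of the file
    second = conn.read()   # already at the end: nothing left to read
    conn.close()

    conn = open(path, encoding="utf-8")   # a new connection starts at the beginning
    again = conn.read()
    conn.close()

print(repr(first))
print(repr(second))    # ''
print(first == again)  # True
```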

The list-of-words format is useful for counting or searching for
individual words. (Recall that the keyword argument `end=" "` makes `print()` end with a space instead of a newline, so consecutive words stay on the same line.)

In [None]:
for word in emmawords:
    if word.endswith('ings'):
        print(word, end=" ")
print()

Look at the output and you'll see that this list includes a pair of words stuck together with punctuation: Splitting text into "words" at whitespace is too simplistic, but we'll tolerate it for the time being.

### 1.2 Appending to a list

We already know how to build a list with words that have a certain property: Create an empty list, and write a loop that appends every qualifying word to this list.

### Your turn:

Adapt the earlier `for`-loop so that instead of printing words that end in `-ings`, it collects them in a list named `ingswords`. Print out the number of words on the list, with an informative message. Finally, print out the words of the list, separated by spaces.

In [None]:
# YOUR CODE:



### 1.3 List comprehensions

A "list comprehension" is a very powerful alternative to a list-building loop:
Essentially, a loop inside the new list generates all its elements.

Comprehensions can include a test. Only elements that satisfy the `if` clause
are added to the list.

In [None]:
she = [x for x in emmawords if x.lower() == "she"]

The above one-liner is equivalent to building a list with `append()`:

In [None]:
she = []
for x in emmawords:
    if x.lower() == "she":
        she.append(x)
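The expression before `for` does not have to be the loop variable itself: it can be any expression, and it is computed for each element that passes the test (if there is one). A short sketch on a made-up sentence; the names `sample_words`, `she_demo`, `she_lower` and `lengths` are just for illustration.

```python
sample_words = "She said that she would go and She went".split()

# Filter only: keep the words that equal "she" when lowercased
she_demo = [w for w in sample_words if w.lower() == "she"]

# Transform and filter: store the lowercased form instead of the original word
she_lower = [w.lower() for w in sample_words if w.lower() == "she"]

# Transform only: no if-clause, so every word contributes an element
lengths = [len(w) for w in sample_words]

print(she_demo)   # ['She', 'she', 'She']
print(she_lower)  # ['she', 'she', 'she']
print(lengths)    # [3, 4, 4, 3, 5, 2, 3, 3, 4]
```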

### Your turn:

Use a list comprehension to make a list of all words that end in "-ings". Then print the list.

In [None]:
# YOUR CODE:



Note that our list of words contains many repetitions. In python, the simplest way to get rid of repetitions is to form a set.

## 2. Sets: Removing repetitions (and order)

Like a list, a set is a "collection" of objects. But there are two important differences: In Python, as in mathematics, the elements of a set come in no particular order. And a set cannot contain the same element twice: something either is an element of a set or it is not. A common use of sets in Python is precisely to discard repetitions:

In [None]:
text = "the quick brown fox jumps over the lazy dog"
words = text.split()
singles = set(words)
print(singles)
print(set([1,1,1,200,100,300,100]))

The elements of a set can be enumerated with a for-loop, but they come out in arbitrary order:

In [None]:
for word in singles:
    print(word, end=" ")

Python supports the standard set operations from mathematics:

In [None]:
A = set(["x", "y", "z"])
B = set(["a", "b", "x"])
print("Union:",        A | B)  # or A.union(B)
print("Intersection:", A & B)  # or A.intersection(B)
print("Difference:",   A - B)  # or A.difference(B)
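Sets can also be built up one element at a time. The set method `add()` plays the role that `append()` plays for lists, and there is also a "set comprehension" written with curly braces. A minimal sketch on a made-up sentence (`sorted()`, which turns the set into an alphabetized list, is covered in section 5):

```python
seen = set()   # an empty set (careful: {} would create an empty dictionary)
for word in "the cat saw the other cat".split():
    seen.add(word)   # adding an element that is already present changes nothing
print(len(seen))     # 4
print(sorted(seen))  # ['cat', 'other', 'saw', 'the']

# A set comprehension: like a list comprehension, but with curly braces
lowered = {w.lower() for w in ["The", "the", "THE", "cat"]}
print(lowered == {"the", "cat"})  # True
```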

A major reason to use python sets is efficiency: Python can
_very_ quickly determine whether something is an element of a set. We can check for set membership with the keyword `in`. While this works with lists as well, **looking up an element in a long list is much, much slower than checking a set.**

In [None]:
if "fox" in singles:
    print('The word "fox" is in the set')
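To see the difference for yourself, here is a rough sketch using the standard module `timeit`. The collection of 100,000 "words" (just numbers turned into strings) is hypothetical, and the exact timings will depend on your computer; only the size of the gap matters.

```python
import timeit

# A hypothetical collection of 100,000 distinct "words"
word_list = [str(n) for n in range(100_000)]
word_set = set(word_list)

# Look up a word that is NOT present: the list must check every single element
t_list = timeit.timeit(lambda: "missing" in word_list, number=100)
t_set = timeit.timeit(lambda: "missing" in word_set, number=100)

print(f"100 lookups in the list: {t_list:.4f} seconds")
print(f"100 lookups in the set:  {t_set:.6f} seconds")
```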

### Your turn:

Convert the list of words ending in -ings into a set. Print them one at a time, as before.

In [None]:
# YOUR CODE:



## 3. Dictionaries: Indexing by name, not position

Recall that we access the elements of a list by their numerical index: `words[10]`, etc.
A _dictionary_ (aka _hash_ or _associative array_)
is like a list, but the locations of its elements
have names, called _keys_, instead of numbers. 

The values can be anything: strings, numbers, lists,
other dictionaries, etc. We can add values to a dictionary, and print out specific values, like this:

In [None]:
part_of_speech = dict()
part_of_speech["go"] = "Verb"
part_of_speech["illness"] = "Noun"
print(part_of_speech["go"])

A dictionary is in essence a collection of key-value pairs, and we can write them directly:

In [None]:
fr_ned = {"table":"tafel", "chaise":"stoel" }

Keys are not necessarily strings, but they must be unique
and "immutable". In particular: A list cannot be a key.
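A quick illustration of these rules, with made-up dictionaries: numbers work as keys, a repeated key keeps only its last value, and a list key is rejected with a `TypeError`.

```python
# Numbers (and strings) can serve as keys
mixed_keys = {1: "one", 2.5: "two and a half", "x": "a string"}
print(mixed_keys[2.5])   # two and a half

# Keys are unique: if a key is repeated, the last value wins
dup = {"le": "de", "le": "het"}
print(dup)               # {'le': 'het'}

# A list is mutable, so Python refuses to use it as a key
caught = False
try:
    bad = {["a", "list"]: "value"}
except TypeError as err:
    caught = True
    print("TypeError:", err)
```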

Dictionary values can be anything: Here's a translation dictionary that uses a list when there is more than one translation. 

In [None]:
fr_ned = {"table":"tafel", 
          "le": ["de", "het"], 
          "chaise": "stoel" }

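Unlike sets, dictionaries remember the order in which their keys were added (this is guaranteed since Python 3.7), so printing shows the elements in insertion order. But there are no numbered positions: we can only look up values by key.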

In [None]:
print(fr_ned)

We can check for a key in the dictionary with `in` or `not in`:

In [None]:
w = "chaise"
if w in fr_ned:
    print(w, "means", fr_ned[w])
else:
    print("I don't know what", w, "means!")

If we try to look up a key that is not in our dictionary, we'll get a `KeyError`. This is similar to the `IndexError` we get if we try to index past the end of a list.

In [None]:
print(fr_ned["velo"])   # will raise an error
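If a missing key is not really an error in your program, the dictionary method `get()` offers a gentler lookup: it returns `None` (or a default value that you supply) instead of raising a `KeyError`. A small sketch with a copy of our translation dictionary:

```python
translations = {"table": "tafel", "le": ["de", "het"], "chaise": "stoel"}

print(translations.get("velo"))                 # None: no KeyError
print(translations.get("velo", "(unknown)"))    # (unknown)
print(translations.get("chaise", "(unknown)"))  # stoel
```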

Like sets, python's dictionaries are designed to make looking up keys extremely fast. In contrast, looking up a value in a list requires examining each of its elements, which gets very slow with long lists. Avoid having to scan a list for a value.

### Exercise 1

Consider the dictionary `lookup` below. It represents a ["radio alphabet"][1] traditionally used to spell out words under noisy conditions. The following letters are still missing from it: `'k':'kilo', 'l':'lima', 'm':'mike'`. Add them to `lookup`. Could you spell the word "marvellous" in the radio alphabet now? Collect the corresponding codes into a list named `msg`. Next, join the items in this list into a single string, with a comma between words, and print the spelled-out version.

[1]: https://en.wikipedia.org/wiki/Spelling_alphabet

In [None]:
lookup = {'a':'alfa', 'b':'bravo', 'c':'charlie', 'd':'delta', 'e':'echo', 
          'f':'foxtrot', 'g':'golf', 'h':'hotel', 'i':'india', 'j':'juliett', 
          'n':'november', 'o':'oscar', 'p':'papa', 'q':'quebec', 'r':'romeo', 
          's':'sierra', 't':'tango', 'u':'uniform', 'v':'victor', 'w':'whiskey', 
          'x':'x-ray', 'y':'yankee', 'z':'zulu'}

In [None]:
# YOUR CODE:



### 3.1 A dictionary of counters

A dictionary can keep track of a large number of values at once. Let's use one to count how often each word occurs in a file:

In [None]:
wordcounts = dict()
for word in emmawords:
    if word in wordcounts:
        wordcounts[word] += 1
    else:
        wordcounts[word] = 1

print("kindness", wordcounts["kindness"])

Note the `if-else` structure carefully. The loop examines each word in the text being counted. The statement `if word in wordcounts` tests whether a word is found as a key in the dictionary (i.e., if the word has been seen before). If a word is in the dictionary, its count is incremented by one. But if the word is not yet in the dictionary, there is no count to increment: trying would raise a `KeyError`. Hence the `else` clause simply adds the new word to the dictionary, with count 1.
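A common, more compact variant of the same counting loop uses `dict.get(key, 0)`, which returns the current count or 0 for a word not seen yet, so one assignment covers both branches. This is a sketch on a tiny made-up word list (`sample_words`, `sample_counts`), so that it does not overwrite `emmawords` or `wordcounts`:

```python
sample_words = "the cat and the dog and the bird".split()  # stand-in for emmawords

sample_counts = dict()
for word in sample_words:
    # get(word, 0) returns the current count, or 0 if the word is new,
    # so a single assignment handles both cases of the if-else version
    sample_counts[word] = sample_counts.get(word, 0) + 1

print(sample_counts["the"], sample_counts["and"], sample_counts["bird"])  # 3 2 1
```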

### Your turn:

Build a list of all words in Jane Austen's _Emma_ that end with "ness". Then construct a dictionary `nesscounts` that counts how often each word ending in "ness" occurs. 

In [None]:
# YOUR CODE:



In [None]:
# Check the result:
print("kindness", nesscounts["kindness"])

**Variation:** Solve the previous problem again, but without the intermediate list: Construct a dictionary `nesscounts2` that counts how often each word ending in "ness" occurs, by looping over all words in Emma but only counting words that end in "ness".

In [None]:
# YOUR CODE:



The two methods should have given us the same result. We can use the relation `==` to check if two dictionaries are exactly the same. (If yours are not, go back and try to figure out your mistake.)

In [None]:
if nesscounts == nesscounts2:
    print("The two dictionaries are equal")
else:
    print("Error somewhere: The dictionaries are not equal!")

### 3.2 Important dictionary methods

What can we do with a dictionary? We can look up individual keys in it, but we can also extract all its elements. There are several ways to do this: We can get the keys, the values, or keys and values together, in pairs. Look carefully at the output of the following methods:

In [None]:
print(fr_ned.keys())

In [None]:
print(fr_ned.values())

In [None]:
print(fr_ned.items())

The python function `len()`, which gives us the size of a string or list, will also give us the number of elements in a set or dictionary.

In [None]:
print("Number of elements:", len(fr_ned))

### 3.3 Lists and "views"

In Python 3, the methods `keys()`, `values()` and `items()` return a list-like
object called a _view_. We can mostly use views just like lists, but
they can be explicitly converted to lists when necessary:

In [None]:
print("View:", fr_ned.values())

In [None]:
print("List:", list(fr_ned.values()))

Views are _iterable_: like lists, they can be looped over, as often as we like.
A major advantage over lists is that they don't actually
build a list of the keys or values (which could be very long).
Do not convert a view to a list unless it is necessary.
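One consequence of not building a list: a view stays connected to its dictionary, so it reflects later changes, while a list made with `list()` is a snapshot. A sketch with a made-up dictionary `prices`:

```python
prices = {"bread": 2, "milk": 1}
key_view = prices.keys()          # a view: a window onto the dictionary
key_list = list(prices.keys())    # a real list: a copy made right now

prices["cheese"] = 5              # change the dictionary afterwards

print(list(key_view))  # ['bread', 'milk', 'cheese']: the view sees the change
print(key_list)        # ['bread', 'milk']: the list does not
```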

### 3.4 Iterating with lists and dictionaries

To work with the contents of a dictionary, there are several ways to loop (iterate) over them. Use whichever method gives you the data needed for a particular task.

We know that iterating over a list (with a for-loop) gives its elements.

In [None]:
fr_nouns = [ "table", "chaise", "velo" ]
for word in fr_nouns:
    print(word)

Iterating over a dictionary gives its ***keys***! 

In [None]:
for word in fr_ned:
    print(word)

We can also iterate over the keys by explicitly writing `fr_ned.keys()`:

In [None]:
for word in fr_ned.keys():
    print(word)

Of course, once we have the keys we can use them to get the values too:

In [None]:
for word in fr_ned:
    print(word, fr_ned[word])

But we can also use `dict.values()` to get the values, or `dict.items()` to get keys and values together:

In [None]:
for val in fr_ned.values():
    print(val)

In [None]:
for key, val in fr_ned.items():
    print(key, "-->", val)

Note the two-variable form of the last `for`-loop. It will be explained in the next section.

### Your turn: 

The dictionary `nesscounts` contains words ending in "-ness" and how often they occurred in our text. (Ensure that it is still defined, by re-running the relevant cells if necessary.) Print, _nicely,_ each word in `nesscounts` and the number of times it was seen. ("Nicely" means you should not just dump the dictionary; each word and its count should be on a separate line, with no stray quotes etc. It is not necessary to align the numbers to the same column.)

In [None]:
# YOUR CODE:



### 3.5 Specialized dictionaries

The Python library `collections` offers specialized dictionary subclasses. 

- A [`defaultdict`][3] does not raise an error when we look for a key that does not exist, but adds the key with a default value, produced by a function we choose when creating it (e.g., `int` gives 0, `list` gives an empty list). It is handy for avoiding the existence test (and two versions of the insertion code) when we are collecting values in a dictionary.

- A [`Counter`][1] is specially designed for counting. Passing a sequence of objects to its constructor will count the different elements, and store their frequency in the dictionary. A key that does not exist will be treated as having frequency 0, so that it is possible to increment keys without checking if they were already present.  `Counter` also adds useful methods such as `most_common()`, which returns all or some of the known keys sorted in order of frequency.

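- An [`OrderedDict`][2] returns its contents in the same order that they were inserted. (Since Python 3.7, ordinary dictionaries do this too; `OrderedDict` adds a few order-related extras, such as the method `move_to_end()`.)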

[1]: https://docs.python.org/3/library/collections.html#collections.Counter
[2]: https://docs.python.org/3/library/collections.html#collections.OrderedDict
[3]: https://docs.python.org/3/library/collections.html#collections.defaultdict
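Here is a short sketch of `defaultdict` on a made-up sentence: once with `int` (so counting needs no `if`-test), and once with `list` (to group words by their first letter). Note that merely looking up a missing key in a `defaultdict` adds it.

```python
from collections import defaultdict

sample_words = "The cat saw the dog and the cat ran".split()

# defaultdict(int): a missing key starts out as int(), i.e. 0
counts = defaultdict(int)
for word in sample_words:
    counts[word.lower()] += 1     # no "if word in counts" test needed

# defaultdict(list): a missing key starts out as an empty list
by_initial = defaultdict(list)
for word in sample_words:
    by_initial[word[0].lower()].append(word)

print(counts["the"])      # 3
print(by_initial["c"])    # ['cat', 'cat']
```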

In [None]:
from collections import Counter
letters = Counter("Abracadabra")  # A string is a sequence of letters
letters["B"] += 1  # Not an error
print(letters)
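Printing a whole `Counter` dumps everything; `most_common(n)` returns just the `n` most frequent keys with their counts, as a list of pairs. (Keys with equal counts come out in the order they were first encountered.) A sketch with all-lowercase letters:

```python
from collections import Counter

abra_counts = Counter("abracadabra")
print(abra_counts.most_common(2))  # [('a', 5), ('b', 2)]
print(abra_counts["z"])            # 0: a missing key counts as zero
```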

## 4. Tuples and multiple assignment

A `tuple` is a special kind of short list. We create an "implicit" tuple when we list several values separated by commas: 

In [None]:
fruit = "banana", "apple", "pear"
print(fruit)

Conversely, a tuple can be "unpacked" into an equal number of variables:

In [None]:
a, b, c = fruit
print(b)

We can also create a tuple and unpack it in a single step. This is called _multiple assignment_:

In [None]:
odd, even = 5, 10
print(odd, even)

Tuples, like lists, support indexing and slicing.

In [None]:
fruit = "banana", "apple", "pear"
print(fruit[0])
print(fruit[1:])   # a slice of a tuple is again a tuple

But tuples are _immutable_: We cannot add elements or modify existing ones. If we try, we get an error.

In [None]:
fruit[0] = "strawberry"   # raises a TypeError

Tuples, like lists, support concatenation (which creates a new tuple):

In [None]:
print((1, 2) + (3, 4))

### 4.1 Using multiple assignment

Sometimes (especially with more mathematical tasks), we want to exchange the value of two variables. This task is famously tricky for beginning programmers, because the obvious approach does not work:

In [None]:
a = "left"
b = "right"

# Let's try to exchange `a` and `b`:

a = b    # Does not work!
b = a

Can you see why the above goes wrong? Print the values of `a` and `b` and see if you understand.

In contrast to the above, multiple assignment happens conceptually "in one step", so the following works as intended: 

In [None]:
a = "left"
b = "right"

a, b = b, a  # Simultaneous assignment: works
print(a, b)
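For comparison, the traditional fix without multiple assignment is to save one of the values in a temporary variable (here called `temp`) before overwriting it. Multiple assignment does the same job in one line:

```python
a = "left"
b = "right"

# The traditional way: save one value in a temporary variable first
temp = a
a = b
b = temp
print(a, b)   # right left

# The same swap with multiple assignment (swapping back)
a, b = b, a
print(a, b)   # left right
```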

We can use the same technique to progress along sequences of words or numbers. Do you recognize the following algorithm?

In [None]:
x = y = 1
for n in range(10):
    print(x, end=' ')
    x, y = y, x+y
print()

### 4.2 Working with tuples

The dictionary method `items()` generates a sequence (a view) of tuples, each containing a key and a value from the dictionary.

In [None]:
for item in fr_ned.items():
    if item[0].startswith("c"):
        print(item[0], item[1])

Instead of indexing into the tuple, it's much nicer to unpack it in place (as we did earlier):

In [None]:
for frword, nlword in fr_ned.items():
    if frword.startswith("c"):
        print(frword, nlword)

The number of variables must match the length of the tuple,
or we get an error:

In [None]:
a, b = fruit   # Error: fruit is a tuple of length 3

In [None]:
for a, b, c in fr_ned.items():  # Error: items() gives tuples of length 2
    print(b)

### 4.3 Handy tuple iterators

Recall that we can use a for-loop to iterate over a list, which loops over the elements of the list without using a visible index. To get the index along with the values, use `enumerate()`. It gives us a sequence of tuples, each with an index and the corresponding element:

In [None]:
colors = ['green', 'red', 'purple']
for n, col in enumerate(colors):
    print(n, col)
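By default `enumerate()` counts from 0, like list indices. For human-readable numbering, its optional `start` argument lets the count begin elsewhere:

```python
colors = ['green', 'red', 'purple']
# start=1 makes the numbering begin at 1 instead of 0
for n, col in enumerate(colors, start=1):
    print(n, col)
```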

If you have two separate lists and you want to iterate over them in parallel, use `zip()`. It creates a sequence of tuples containing one element from each list. Like `enumerate()`, it is not a real list but an "iterator": a special object that we can loop over, or convert into a list.

In [None]:
nl = "Dit is een zin".split()
en = "This is a sentence".split()
parallel = zip(nl, en)
print(parallel)
print(list(parallel))

for pair in zip(nl, en):
    print(pair)

Iterators are a little tricky: If you try to list `parallel` a second time, you'll get nothing:

In [None]:
print(list(parallel))

This is because iterators are similar to reading a file: when we read from one, we "advance" it until there's nothing more to read. If you need to use one repeatedly, save it to a list, or just make a fresh iterator.

In [None]:
pairs = list(zip(nl, en))  # make a real, permanent list

Python's iterators are designed to be used with for-loops. In fact, python makes an iterator behind the scenes _every_ time we use a for-loop. When things go well, it all "just works".
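To peek behind the scenes, the built-in functions `iter()` and `next()` let us step through an iterator by hand. This is a simplified sketch of what a for-loop does: when the iterator is used up, `next()` raises `StopIteration`, which the for-loop takes as its signal to stop.

```python
colors = ['green', 'red', 'purple']

it = iter(colors)   # roughly the first thing a for-loop does
print(next(it))     # green
print(next(it))     # red
print(next(it))     # purple

# The iterator is now used up: one more next() raises StopIteration
exhausted = False
try:
    next(it)
except StopIteration:
    exhausted = True
print(exhausted)    # True
```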

Once we have a saved list of pairs, we can use tuple unpacking to print or save just one element.

In [None]:
# Unpack each pair and print only its first (Dutch) part
for n, e in pairs:
    print(n, end=" ")
print()

### Your turn:

Loop over the variable `pairs` and print just the English words. Then use another loop to collect the English words into a new list, `engwords`.

In [None]:
# YOUR CODE:



### 4.4 Tuple or list?

Lists can be modified by adding, deleting or modifying elements.

Because tuples are "immutable", they are legal as `dict` keys.
Lists are not.

**Python "best practice":**

- **Use lists** for homogeneous collections, e.g., a list of students.

- **Use tuples** when each element has a different function, e.g.
`(first_name, last_name, age, address)`
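Tuple keys are handy in text processing. A minimal sketch: counting "bigrams" (pairs of adjacent words) in a made-up sentence, with each `(word, next_word)` tuple as a dictionary key. Here `zip()` pairs each word with the word that follows it.

```python
sample_words = "the cat saw the cat".split()

bigrams = dict()
for pair in zip(sample_words, sample_words[1:]):
    # pair is a tuple such as ("the", "cat"), which is allowed as a key
    if pair in bigrams:
        bigrams[pair] += 1
    else:
        bigrams[pair] = 1

print(bigrams[("the", "cat")])  # 2
```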

## 5. Sorting

### 5.1 sort() and sorted()

Let's go back to our set of words ending in -ings. It would make more sense to print them out alphabetically. Since sets have no order, we convert the set of words to a list (which is still in a random-like order). Then we _sort_ the list:

In [None]:
ingsset = set(ingswords)
ingslist = list(ingsset)

print("Before sorting:")
for word in ingslist:
    print(word, end=" ")
print("\n")
      
ingslist.sort()
print("After sorting:")
for word in ingslist:
    print(word, end=" ")
print()

The list has been alphabetized (but note that capitalized words are sorted before all lowercase words). The list method `sort()` modifies the list in place, rearranging its elements.
To sort a list "non-destructively", i.e., without modifying it, use the function `sorted()`. It returns a sorted copy of the list, without modifying the original.

In [None]:
ingslist = list(ingsset)  # Re-create our unsorted list
print("Sorted:", sorted(ingslist))
print()
print("Original:", ingslist)
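A common beginner's bug is to mix the two up: `sort()` rearranges the list but returns `None`, so assigning its result to a variable gives you nothing useful. A small sketch:

```python
nums = [3, 1, 2]
result = nums.sort()   # sorts nums in place...
print(result)          # None: ...but sort() returns nothing useful
print(nums)            # [1, 2, 3]

nums2 = [3, 1, 2]
print(sorted(nums2), nums2)  # [1, 2, 3] [3, 1, 2]
```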

### 5.2 Sorting dictionaries

Dictionaries, like sets, have no in-place method `sort()`. (Since Python 3.7 a dictionary remembers the order in which its keys were inserted, but there is no way to rearrange that order in place.) But we can process or print a dictionary in sorted order, by using `sorted()` on
the sequence produced by one of the dictionary methods `dict.keys()`, `dict.values()` or `dict.items()`. 

The simplest case is to display a dictionary alphabetically, by key.
For example, recall our dictionary `nesscounts` which contains the frequencies of words ending in "-ness". To print the words and their counts alphabetically, we sort the words (the dictionary keys) and print each key and its corresponding value.

In [None]:
for word in sorted(nesscounts.keys()):
    print(word, nesscounts[word])

In the above we wrote `nesscounts.keys()` for the sake of explicitness; recall that iterating over a dictionary object (or passing it to a function that consumes a sequence, such as `list()` or `sorted()`) gives the same result: the keys.

### 5.3 Advanced sorting: "Sorting keys"

Because upper and lower case letters are treated as different characters, capitalized words were sorted before lowercase words in the last output. To alphabetize "A" and "a" together ("case-insensitive sorting"), we could simply convert all the words to lower case before sorting them.
But suppose we still want to see the words in their original form? In this case, we must sort according to a "key" that is different from the word itself. 

Python's `sort` routines accept an optional argument `key` (no relation to dictionary keys), for just this purpose. The value of `key` must be a **function** that takes one argument (it will be one of the elements in the list to be sorted), and returns a "sorting key" that will determine its place in the sort. For case-insensitive sorting, the key for each word can be its lowercase version. This ensures that "A" and "a" are treated as equal:

In [None]:
def lowercased(x):
    return x.lower()

ingsbetter = sorted(ingslist, key=lowercased)
print(" ".join(ingsbetter))

The same `key` argument works when sorting a dictionary: passing the dictionary object to `sorted()` gives back a sorted list _of the keys._ Once we've sorted the keys, of course, we can print the full dictionary (keys and values) in the desired sorted order.

To avoid littering our code with super-simple functions that are only used once, we could have used python's `lambda` construction instead of defining `lowercased()`. It defines an unnamed function that returns the expression after the `:`.

In [None]:
ingsbetter = sorted(ingslist, key=lambda x: x.lower())
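Since `str.lower` is itself a function that takes one string and returns its lowercase version, it can also be passed directly as the key, with no `def` or `lambda` at all. A small sketch comparing the default order with case-insensitive sorting:

```python
fruits = ["banana", "Cherry", "apple"]

print(sorted(fruits))                           # ['Cherry', 'apple', 'banana']
print(sorted(fruits, key=str.lower))            # ['apple', 'banana', 'Cherry']
print(sorted(fruits, key=lambda x: x.lower()))  # same as the line above
```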

### Your turn:

Sort and print out the dictionary `nesscounts` (including values) in case-insensitive alphabetic order.

In [None]:
# YOUR CODE:



### Your turn: Sorting by value

Wouldn't it be nice if we could see the contents of `nesscounts` by descending order of frequency? For this, the sort key function must have access to the value corresponding to each key. The best way to achieve this is by sorting `nesscounts.items()`, which produces the contents of the dictionary as a sequence of tuples, `(key, value)` (i.e., `(word, frequency)` in our case). Construct a sorting key function that uses the frequency, not the word, as the sorting key. (The function must still take one argument, but in this case it will be a tuple.)

Your function will probably produce the dictionary in ascending order of frequency (small frequencies first). To convert it to descending order, you can add the option `sorted(..., reverse=True)`. (Another option: The key function can negate the frequencies, so that high frequencies produce smaller keys.)

Print out the dictionary `nesscounts` in order of descending frequency (greatest first).

In [None]:
# YOUR CODE:



## 6. Deleting data

The "mutability" of most python data types wouldn't be worth much if we could only add data to them. As with lists, it is possible to selectively remove elements from a set or dictionary
using python's general-purpose deletion operator `del`. 

In [None]:
print("List:")
numbers = list(range(10, 20))
print(numbers)
del numbers[3:8]
print(numbers)

In [None]:
print("Dictionary:")
lettercounts = {'s':4, 'p':2, 'm':1, 'i':4}

print(lettercounts)
del lettercounts["p"]
print(lettercounts)

In the above, we deleted elements by specifying the index or key of the element(s) to be removed. We can also remove elements from a list by value: The list method `remove()` will delete the first element with matching value:

In [None]:
numbers.remove(18)
print(numbers)

Sets also have a `remove()` method:

In [None]:
print("Set:")
letters = set("mississippi")
print(letters)
letters.remove("m")
print(letters)

Take a good look at the above sets. Fun fact: If we use a string to initialize a set (or a list), the _letters_ of the string become the elements of the new object. This is most often done by mistake.
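Two related methods are worth knowing. A set's `discard()` works like `remove()` but does not complain if the element is missing, and a dictionary's `pop()` deletes a key and also returns its value. A sketch with fresh variables, so the ones above are left alone:

```python
sample_letters = set("mississippi")   # {'m', 'i', 's', 'p'}, in some order
sample_letters.discard("z")           # no error, even though "z" is not in the set
# sample_letters.remove("z") would raise a KeyError here

letter_totals = {'s': 4, 'p': 2, 'm': 1, 'i': 4}
removed = letter_totals.pop("p")      # deletes the key AND returns its value
print(removed)         # 2
print(letter_totals)   # {'s': 4, 'm': 1, 'i': 4}
```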

### Exercise 2

Collect the code terms in the lookup dict (`alfa`, `bravo`, ...) from the previous exercise into a **list** called `code_words`. Is this list alphabetically sorted? No? Then arrange for the list to be sorted alphabetically. Now remove the items `victor`, `india` and `papa`. Append the words `pigeon` and `potato` to the end of this list. Combine this new list of items into a single string, using a semicolon (`;`) as a delimiter and print this string. 

In [None]:
# YOUR CODE:



## 7. What we have learned

### What you need to know by heart

- How to store and retrieve values in a dictionary.
- How to find the unique values in a list, by converting it into a set to discard repetitions.
- How to loop over the contents (keys, values or both) of a dictionary.
- How to check if something is in a set. You should also know NOT to use a list (especially a large one) for this purpose.

### What you should remember you saw

- "Tuple unpacking": Your for-loops can use multiple loop variables to loop over a list of tuples.
- The `items()` method of a dictionary allows you to loop over keys and values together.
- You can use a dictionary to count multiple things at once (e.g., the frequencies of words in a document).
- The `collections` module provides specialized dictionary classes for various purposes, including counting.
- To print out a list, set or dictionary in alphabetical order, sort it (which produces a list).
- You can specify various ordering criteria for sorting.

## 8. Additional Exercises (harder)

### Exercise 3

Translate the following sentence to French using the two dictionaries. Do it in two steps: First get the literal translation of the words and glue those together in one string. Then use the `rewrite_rules` dictionary to get the spelling right. Print the result in one string.

Recall that the string method `replace()` can be used to replace all instances of a substring with something else. (Since python strings are "immutable", the result is returned as a new string.)

In [None]:
garfield="I love lasagna"

EN_FR_dict= {"I":"je","love":"aime","lasagna":"lasagne"}
rewrite_rules={"e a":"'a","e e":"'e"}

In [None]:
# YOUR CODE:



### Exercise 4

Many words are derived from other words, and _-ness_ is a well-known suffix that can be added to English adjectives. In this exercise we study its use in Jane Austen's _Emma_, using a mix of data structures.

<!-- This version does not use Regexps -->

1. Read in the text of the file `austen-emma.txt`, and separate it into words.

2. Create a set `emmaset` with all the words in the text. To avoid storing separate capitalized and lowercase versions (`the` and `The`), convert each word to lower case before you add it to the collection.

3. Examine each word in `emmaset`. If it ends in `ness`, check if the base word is also in `emmaset`. E.g., if the word is `idleness`, look for the base word `idle`.  
    If the base word is found, create a tuple containing the two words and add it to a list `wordpairs`. If the base word is not found, add the ness-word to a list `orphans`. For example, _Emma_ contains `sullenness` but not `sullen`.

4. A word like `laziness` is derived from `lazy`, not `lazi`. Enhance your code to take this spelling change into account, i.e., be prepared to recognize pairs like `(laziness, lazy)`.

5. Sort and print out the two lists you created. Print a short statement reporting their length. I found 74 word pairs and 17 orphans (some due to attached punctuation).

In [None]:
# YOUR CODE:



<!-- -----------------------------------------------------------------

You've reached the end of Chapter 2! Ignore the code block below -- it's only there to make the page prettier.

from IPython.core.display import HTML
def css_styling():
    styles = open("styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

-->