# Exercise 9: Dictionaries and sets
Let's get some practice working with dictionaries and sets. Dictionaries are useful because they offer a flexible way to structure information and to access objects with a very explicit syntax, while sets are primarily useful for their efficiency, for example when testing whether an element is present.
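
As a quick warm-up, here is a minimal sketch of both ideas. The names `player` and `visited` and their contents are made up for illustration and are not used in the exercises.

```python
# Hypothetical example data, not part of the exercises below.
player = {"name": "Martina", "titles": 5}

# Values are accessed by key, which makes the intent explicit.
print(player["name"])

# A set keeps only unique elements and offers fast membership tests.
visited = set(["bern", "basel", "bern", "geneva"])
print(len(visited))        # 3 unique cities
print("basel" in visited)  # True
```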

## 9.1 Dictionaries

### Ex 9.1.1
Imagine you have some data on employees. The data for one employee could be stored in a dictionary and look something like:

In [None]:
employee = {"first": "Roger",
            "last": "Federer",
            "age": 41}

- Add a new key `salary` with a value of `800000`
- Then print the first and last name of the employee as well as his salary.

In [None]:
employee['salary'] = 800000
print("{} {} earns {}".format(employee["first"], employee["last"], employee["salary"]))

### Ex 9.1.2
Of course, we could store the data for all the employees in a single dictionary `employees`:

In [None]:
employees = {'roger.federer': {"first": "Roger", "last": "Federer", "age": 41, "salary": 800000},
             'john.doe': {"first": "John", "last": "Doe", "age": 62, "salary": 40000},
             'mr.smith': {"first": "Mr", "last": "Smith", "age": 21, "salary": 60000},
             'hugo.boss': {"first": "Hugo", "last": "Boss", "age": 89, "salary": 120000}}

#### Ex 9.1.2 a
It is John Doe's birthday: increase his age by 1.

In [None]:
employees['john.doe']['age'] += 1

In [None]:
employees['john.doe']['age']

#### Ex 9.1.2 b
John Doe now retires: remove him from the `employees` dictionary.

In [None]:
employees.pop('john.doe')
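
For comparison, here is a small sketch of the ways to remove a key from a dictionary. The dictionary `staff` and its keys are hypothetical and independent of `employees`.

```python
# Hypothetical dictionary, independent of `employees`.
staff = {'anna.muster': 52, 'max.muster': 47}

removed = staff.pop('anna.muster')     # returns the removed value
print(removed)                         # 52

# With a default, a missing key does not raise a KeyError.
print(staff.pop('nobody.here', None))  # None

del staff['max.muster']                # removes without returning anything
print(staff)                           # {}
```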

#### Ex 9.1.2 c
It's now the end of the year and all employees get a raise of 5%. Use a for loop to modify the salaries accordingly, then print the salaries.

In [None]:
for employee_id in employees:
    employees[employee_id]['salary'] *= 1.05

In [None]:
for employee in employees.values():
    print("{} {} earns {}".format(employee["first"], employee["last"], employee["salary"]))

### Ex 9.1.3
Dictionaries can be easily used as mappings. As an example imagine we have a categorical variable that can take the values `high`, `medium` and `low`, and we would like to transform that to integer values 3, 2 and 1.
- Define a dictionary `mapping` with the keys `high`, `medium` and `low` and the corresponding integers as values
- Use that dictionary to create a new list `mapped` containing the values of `categorical` mapped to integers

In [None]:
categorical = ['high', 'low', 'medium', 'medium', 'low', 'high']

In [None]:
mapping = {'low': 1, 'medium': 2, 'high': 3}
mapped = []
for value in categorical:
    mapped.append(mapping[value])

print("mapped:", mapped)
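
The same mapping can also be written more compactly. This is only an alternative sketch: it uses a list comprehension instead of the explicit loop, and the key `'unknown'` with the default `0` is a made-up example of how `dict.get` handles missing keys.

```python
categorical = ['high', 'low', 'medium', 'medium', 'low', 'high']
mapping = {'low': 1, 'medium': 2, 'high': 3}

# List comprehension: look up every value in the mapping.
mapped = [mapping[value] for value in categorical]
print("mapped:", mapped)

# dict.get returns a default instead of raising a KeyError for unknown keys.
# The key 'unknown' and the default 0 are hypothetical choices.
print(mapping.get('unknown', 0))
```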

## 9.2 Sets

### Ex. 9.2.1
From the two lists below, extract the sets of unique elements. Then calculate their intersection.

In [None]:
l1 = [1, 2, 3, 2, 6, 2]
l2 = [1, 5, 1, 4, 3]

In [None]:
s1 = set(l1)
s2 = set(l2)
intersection = s1.intersection(s2)
print("s1", s1)
print("s2", s2)
print("intersection", intersection)

# Supplementary exercises

## Ex. 9.3
We start again with the dictionary of employees. 

Below I'll import `pprint` which you can use instead of `print` when printing a dictionary. Its output is much prettier.

In [None]:
from pprint import pprint
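
To see the difference, here is a short sketch comparing `print` and `pprint` on a nested dictionary. The `inventory` data is invented for illustration.

```python
from pprint import pprint

# Hypothetical nested dictionary, too wide to fit on one line.
inventory = {
    'apples': {'price': 1.2, 'stock': 140, 'origin': 'Thurgau'},
    'pears': {'price': 1.5, 'stock': 80, 'origin': 'Valais'},
    'plums': {'price': 2.1, 'stock': 35, 'origin': 'Basel-Landschaft'},
}

print(inventory)   # everything on a single long line
pprint(inventory)  # wrapped over several lines
```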

### Ex. 9.3.1
We would now like to specify in what department each employee works. You will find that data in the `departments` dictionary. Use it to add a `department` key to each employee in the `employees` dictionary.

In [None]:
employees = {'roger.federer': {"first": "Roger", "last": "Federer", "age": 41, "salary": 800000},
             'john.doe': {"first": "John", "last": "Doe", "age": 62, "salary": 40000},
             'mr.smith': {"first": "Mr", "last": "Smith", "age": 21, "salary": 60000},
             'hugo.boss': {"first": "Hugo", "last": "Boss", "age": 89, "salary": 120000}}
departments = {'tech': ['mr.smith', 'john.doe'],
               'sales': ['roger.federer', 'hugo.boss']}

In [None]:
for dep_name in departments:
    for emp_id in departments[dep_name]:
        employees[emp_id]['department'] = dep_name

pprint(employees)

### Ex. 9.3.2
It's again that time of the year when the employees get a raise, but the raises depend on the department in which the employee works and are stored in `raises`. Modify the employees' salaries accordingly, then print the new salaries.

In [None]:
raises = {'tech': 1.02, 'sales': 1.1}

In [None]:
for emp_id, employee in employees.items():
    emp_raise = raises[employee['department']]
    print("{} gets a {:.0f}% raise".format(emp_id, 100*(emp_raise - 1)))
    employee['salary'] *= emp_raise

print()
for employee in employees.values():
    print("{} {} earns {}".format(employee["first"], employee["last"], employee["salary"]))


### Ex. 9.3.3
A new employee is starting; his salary is the average salary of his department.
- calculate his salary
- add him to the `employees` dictionary

In [None]:
new_employee = {"first": "Richard", "last": "Hendriks", "age": 34, "department": "tech"}

In [None]:
salaries = []
for employee in employees.values():
    if employee["department"] == new_employee["department"]:
        salaries.append(employee["salary"])

average_salary = sum(salaries) / len(salaries)
new_employee["salary"] = average_salary

In [None]:
employee_id = "{}.{}".format(new_employee["first"].lower(), new_employee["last"].lower())
employees[employee_id] = new_employee
pprint(employees)

### Ex. 9.3.4
We would like to compute some statistics about word frequencies. For every word in the text below, count how often it appears.

*HINTS:*
* start by replacing dots and commas by spaces to keep only the words
* strings have a method called `lower`, have a look at it, it might help
* remember the `str.split()` method to cut a string into a list of words.
* you can use `key in dict` to test whether a certain `key` is in a dictionary `dict`.

In [None]:
text = """The European languages are members of the same family. Their separate existence is a myth. 
    For science, music, sport, etc, Europe uses the same vocabulary. 
    The languages only differ in their grammar, their pronunciation and their most common words. 
    Everyone realizes why a new common language would be desirable: one could refuse to pay expensive translators. 
    To achieve this, it would be necessary to have uniform grammar, pronunciation and more common words. If several 
    languages coalesce, the grammar of the resulting language is more simple and regular than that of the individual 
    languages. The new common language will be more"""

In [None]:
word_count = {}
text = text.replace("."," ").replace(","," ").lower()
words = text.split()
for word in words:
    if word not in word_count:
        word_count[word] = 1
    else:
        word_count[word] += 1
print(word_count)
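
As a side note, the standard library offers `collections.Counter`, which does this kind of counting for you. The sketch below swaps it in for the manual loop and uses a short made-up sentence rather than the text above.

```python
from collections import Counter

# Short hypothetical sentence standing in for the longer text above.
sample = "The cat and the dog. The end, finally."
sample_words = sample.replace(".", " ").replace(",", " ").lower().split()

sample_count = Counter(sample_words)  # counts every word in one pass
print(sample_count["the"])            # 3
print(sample_count.most_common(1))    # [('the', 3)]
```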

### Ex. 9.3.5
We would now like to compare 2 different texts.
* For each text, determine the set of words that appear in it
* Calculate the set of words that appear in both texts
* Calculate the set of words that appear in only one of the texts (*HINT: look at the `symmetric_difference` method*)
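* Calculate the fraction of words that are common (appear in both texts), relative to all distinct words in the two texts
* Calculate the average word length for each of these two sets (the common words and the words that appear in only one text)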

*HINTS:*
* start by replacing dots and commas by spaces to keep only the words
* strings have a method called `lower`, have a look at it, it might help
* remember the `str.split()` method to cut a string into a list of words.
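
The set methods needed here can be tried out on a toy example first. The sets `a` and `b` below are hypothetical and unrelated to the texts.

```python
# Two small hypothetical sets of words.
a = {"red", "green", "blue"}
b = {"green", "blue", "yellow"}

both = a.intersection(b)              # in a and in b
only_one = a.symmetric_difference(b)  # in exactly one of the two
either = a.union(b)                   # in at least one of the two

print(sorted(both))             # ['blue', 'green']
print(sorted(only_one))         # ['red', 'yellow']
print(len(both) / len(either))  # 0.5
```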

In [None]:
text1 = """The European languages are members of the same family. Their separate existence is a myth. 
    For science, music, sport, etc, Europe uses the same vocabulary. 
    The languages only differ in their grammar, their pronunciation and their most common words. 
    Everyone realizes why a new common language would be desirable: one could refuse to pay expensive translators. 
    To achieve this, it would be necessary to have uniform grammar, pronunciation and more common words. If several 
    languages coalesce, the grammar of the resulting language is more simple and regular than that of the individual 
    languages. The new common language will be more"""
text2 = """Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live 
    the blind texts. Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean.
    A small river named Duden flows by their place and supplies it with the necessary regelialia. It is a paradisematic 
    country, in which roasted parts of sentences fly into your mouth. Even the all-powerful Pointing has no control about
    the blind texts it is an almost unorthographic life One day however a small line of blind text by the name of Lorem Ipsum decided to"""

In [None]:
text1 = text1.replace("."," ").replace(","," ").lower()
text2 = text2.replace("."," ").replace(","," ").lower()
words1 = set(text1.split())
words2 = set(text2.split())

common = words1.intersection(words2)
difference = words1.symmetric_difference(words2)
common_frac = len(common)/len(words1.union(words2))
print("{:.2f}% of the words appear in both texts".format(100*common_frac))

lengths = []
for word in common:
    lengths.append(len(word))
average_length_common = sum(lengths)/len(lengths)

lengths = []
for word in difference:
    lengths.append(len(word))
average_length_difference = sum(lengths)/len(lengths)

print("There are {} common words with average length {}".format(len(common),average_length_common))
print("There are {} words present in only one of the texts with average length {}".format(len(difference),average_length_difference))