This section is a set of short exercises, so for simplicity we keep it here rather than inside the projects folder.

#### Exercise 1

Let's revisit an exercise we did right after the section on dictionaries.

You have text data spread across multiple servers. Each server is able to analyze this data and return a dictionary that contains words and their frequency.

Your job is to combine this data to create a single dictionary that contains all the words and their combined frequencies from all these data sources. Bonus points if you can make your dictionary sorted by frequency (highest to lowest).

Solve it twice: once using `defaultdict` and once using `Counter`.

For example, you may have three servers that each return these dictionaries:

In [15]:
d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

#### Solution

In [19]:
from collections import Counter, defaultdict

In [18]:
def merge(*dicts):
    unsorted = defaultdict(int)
    for d in dicts:
        for k, v in d.items():
            unsorted[k] += v

    return dict(sorted(unsorted.items(), key=lambda e: e[1], reverse=True))

merge(d1, d2, d3)

{'python': 17,
 'javascript': 15,
 'java': 13,
 'c#': 12,
 'c++': 10,
 'go': 9,
 'erlang': 5,
 'haskell': 2,
 'pascal': 1}

In [21]:
def merge(*dicts):
    unsorted = Counter()
    for d in dicts:
        unsorted.update(d)
    return dict(unsorted.most_common())

merge(d1, d2, d3)

{'python': 17,
 'javascript': 15,
 'java': 13,
 'c#': 12,
 'c++': 10,
 'go': 9,
 'erlang': 5,
 'haskell': 2,
 'pascal': 1}
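
Both solutions depend on counts being added together rather than overwritten. As a small illustration (the dictionaries `a` and `b` are made up for this sketch and are not part of the exercise data), compare what `dict.update` and `Counter.update` do with a key that already exists:

```python
from collections import Counter

# Made-up sample data, only for this illustration
a = {'python': 10, 'java': 3}
b = {'python': 6, 'go': 9}

# dict.update replaces the value of a key that already exists
plain = dict(a)
plain.update(b)

# Counter.update adds the incoming counts to the existing ones
counted = Counter(a)
counted.update(b)

print(plain)                  # {'python': 6, 'java': 3, 'go': 9}
print(counted.most_common())  # [('python', 16), ('go', 9), ('java', 3)]
```

This is why a plain `dict.update` loop would not work here, while `Counter.update` gives the combined frequencies directly.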

#### Exercise 2

Suppose you have a tuple of all possible eye colours:

In [42]:
eye_colours = ("amber", "blue", "brown", "gray", "green", "hazel", "red", "violet")

Some other collection (say recovered from a database, or an external API) contains a list of `Person` objects that have an eye colour property.

Your goal is to create a dictionary that maps each colour in `eye_colours` to the number of people with that eye colour. The wrinkle here is that even if no one matches some eye colour, say `amber`, your dictionary should still contain an entry `"amber": 0`.

Here is some sample data:

In [43]:
class Person:
    def __init__(self, eye_colour):
        self.eye_colour = eye_colour

In [44]:
from random import seed, choices
seed(0)
persons = [Person(colour) for colour in choices(eye_colours[2:], k = 50)]

As you can see, we built up a list of `Person` objects, none of which should have `amber` or `blue` eye colours.

Write a function that returns a dictionary with the correct counts for each eye colour listed in `eye_colours`.

#### Solution

In [45]:
def count_eye_colours(persons: list, possible_eye_colours: tuple):
    counts = Counter({colour: 0 for colour in possible_eye_colours})
    counts.update(person.eye_colour for person in persons)
    return counts

count_eye_colours(persons, eye_colours)

Counter({'violet': 12,
         'gray': 10,
         'red': 10,
         'green': 8,
         'hazel': 7,
         'brown': 3,
         'amber': 0,
         'blue': 0})
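
The solution above returns a `Counter`, which displays its entries ordered by count. If a plain dictionary in the same order as `eye_colours` is preferred, one possible variant is sketched below. The function name `count_eye_colours_dict` and the small hand-built sample are assumptions made for this illustration; they are not part of the exercise.

```python
from collections import Counter


class Person:
    def __init__(self, eye_colour):
        self.eye_colour = eye_colour


def count_eye_colours_dict(persons, possible_eye_colours):
    # Hypothetical variant: start every known colour at zero, in the given order
    counts = dict.fromkeys(possible_eye_colours, 0)
    observed = Counter(person.eye_colour for person in persons)
    for colour in counts:
        # A Counter returns 0 for a key it has never seen
        counts[colour] = observed[colour]
    return counts


# Small deterministic sample instead of the random data used above
sample = [Person('brown'), Person('green'), Person('brown')]
result = count_eye_colours_dict(sample, ('amber', 'blue', 'brown', 'green'))
print(result)  # {'amber': 0, 'blue': 0, 'brown': 2, 'green': 1}
```

Note that this variant silently ignores any colour that is not listed in `possible_eye_colours`, whereas the `Counter`-based solution would add an entry for it.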

#### Exercise 3

You are given three JSON files: one holds a default set of settings, and the other two hold environment-specific settings. The files are included in the downloads, and are named:

- common.json
- dev.json
- prod.json

Your goal is to write a function that has a single argument, the environment name, and returns a "combined" dictionary that merges the common settings with the settings for that environment, with the environment-specific settings overriding any common settings already defined.

For simplicity, assume that the argument values are going to be the same as the file names, without the .json extension. So for example, `dev` or `prod`.

The wrinkle: we don't want to duplicate data for the "merged" dictionary, so use `ChainMap` to implement this instead.
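
As a reminder of why `ChainMap` avoids duplicating data, here is a minimal sketch. The dictionaries `defaults` and `overrides` are made up for this illustration and are much simpler than the JSON files. A `ChainMap` keeps references to the underlying dictionaries and searches them in order, so nothing is copied:

```python
from collections import ChainMap

# Made-up settings, only for this illustration
defaults = {'level': 'info', 'port': 5432}
overrides = {'level': 'trace'}

# Lookups search `overrides` first, then fall back to `defaults`
merged = ChainMap(overrides, defaults)

print(merged['level'])  # trace
print(merged['port'])   # 5432

# No copy was made: a change to an underlying dict is visible through the chain
defaults['port'] = 6543
print(merged['port'])               # 6543
print(merged.maps[1] is defaults)   # True
```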

#### Solution

The solution follows these steps:

1. Write a function `load_settings` that takes in an environment name (str) and returns a dictionary of its settings found in its JSON file.
2. Write a function `settings` that takes an environment name (str) and returns a `ChainMap` dictionary which merges the specified environment with the `common.json` settings.
3. Recognise that `dev` and `common` both have a key called `database` whose value is a dictionary. We don't want the dev dictionary to override the common dictionary; instead, we want to merge them.
4. Write a function called `chain_recursive` which takes two dictionaries (child and parent). Iterate through the values of the child and if it's a dictionary *and* that key is in the parent dictionary (with an associated dictionary value), merge the two dictionaries using the `chain_recursive` function.
5. Modify your `settings` function to return a `chain_recursive` of the two environment dictionaries instead of a regular `ChainMap`.

In [54]:
import json
from pprint import pprint

# STEP 1
def load_settings(env: str):
    with open(f"{env}.json") as f:
        settings = json.load(f)
    return settings

pprint(load_settings('common'))

{'data': {'input_root': '/default/path/inputs',
          'numerics': {'precision': 6, 'type': 'Decimal'},
          'output_root': '/default/path/outputs'},
 'database': {'db_name': 'deepdive', 'port': 5432, 'schema': 'public'},
 'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s '
                    '%(message)s',
          'level': 'info'}}


In [55]:
from collections import ChainMap

# STEP 2
def settings(env: str):
    env_settings = load_settings(env)
    common_settings = load_settings('common')
    return ChainMap(env_settings, common_settings)

pprint(settings('dev'))  # 1st dict in output is child; 2nd is parent

ChainMap({'data': {'input_root': '/dev/path/inputs',
                   'numerics': {'type': 'float'},
                   'operators': {'add': '__add__'},
                   'output_root': '/dev/path/outputs'},
          'database': {'pwd': 'test', 'user': 'test'},
          'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s '
                             '%(user)s %(filename)s %(funcName)s %(message)s',
                   'level': 'trace'}},
         {'data': {'input_root': '/default/path/inputs',
                   'numerics': {'precision': 6, 'type': 'Decimal'},
                   'output_root': '/default/path/outputs'},
          'database': {'db_name': 'deepdive', 'port': 5432, 'schema': 'public'},
          'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s '
                             '%(user)s %(message)s',
                   'level': 'info'}})


In [57]:
# STEP 3, 4

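def chain_recursive(child, parent):
    chain = ChainMap(child, parent)
    for k, v in child.items():
        # merge only when both sides hold a dictionary under the same key
        if isinstance(v, dict) and isinstance(parent.get(k), dict):
            # assigning through a ChainMap writes to its first map (child)
            chain[k] = chain_recursive(v, parent[k])

    return chain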

common_d = load_settings('common')
dev_d = load_settings('dev')

pprint(chain_recursive(dev_d, common_d))

ChainMap({'data': ChainMap({'input_root': '/dev/path/inputs',
                            'numerics': ChainMap({'type': 'float'},
                                                 {'precision': 6,
                                                  'type': 'Decimal'}),
                            'operators': {'add': '__add__'},
                            'output_root': '/dev/path/outputs'},
                           {'input_root': '/default/path/inputs',
                            'numerics': {'precision': 6, 'type': 'Decimal'},
                            'output_root': '/default/path/outputs'}),
          'database': ChainMap({'pwd': 'test', 'user': 'test'},
                               {'db_name': 'deepdive',
                                'port': 5432,
                                'schema': 'public'}),
          'logs': ChainMap({'format': '%(asctime)s: %(levelname)s: '
                                      '%(clientip)s %(user)s %(filename)s '
                              

Notice that `'database'` has the `user` and `pwd` from `dev`, but also the `db_name`, `schema` and `port` from `common`.

In [58]:
# STEP 5

def settings(env: str):
    env_settings = load_settings(env)
    common_settings = load_settings('common')
    return chain_recursive(env_settings, common_settings)

In [59]:
pprint(settings('dev'))

ChainMap({'data': ChainMap({'input_root': '/dev/path/inputs',
                            'numerics': ChainMap({'type': 'float'},
                                                 {'precision': 6,
                                                  'type': 'Decimal'}),
                            'operators': {'add': '__add__'},
                            'output_root': '/dev/path/outputs'},
                           {'input_root': '/default/path/inputs',
                            'numerics': {'precision': 6, 'type': 'Decimal'},
                            'output_root': '/default/path/outputs'}),
          'database': ChainMap({'pwd': 'test', 'user': 'test'},
                               {'db_name': 'deepdive',
                                'port': 5432,
                                'schema': 'public'}),
          'logs': ChainMap({'format': '%(asctime)s: %(levelname)s: '
                                      '%(clientip)s %(user)s %(filename)s '
                              

In [60]:
pprint(settings('prod'))

ChainMap({'data': ChainMap({'input_root': '$DATA_INPUT_PATH',
                            'output_root': '$DATA_OUTPUT_PATH'},
                           {'input_root': '/default/path/inputs',
                            'numerics': {'precision': 6, 'type': 'Decimal'},
                            'output_root': '/default/path/outputs'}),
          'database': ChainMap({'pwd': '$PG_PWD', 'user': '$PG_USER'},
                               {'db_name': 'deepdive',
                                'port': 5432,
                                'schema': 'public'})},
         {'data': {'input_root': '/default/path/inputs',
                   'numerics': {'precision': 6, 'type': 'Decimal'},
                   'output_root': '/default/path/outputs'},
          'database': {'db_name': 'deepdive', 'port': 5432, 'schema': 'public'},
          'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s '
                             '%(user)s %(message)s',
                   'level': 'info'}})
