### A Practical Application of DefaultDict

This is going to be a really short one.

The Python standard library's `collections` module contains the `defaultdict` class.

In [1]:
from collections import defaultdict

With this class we can create dictionaries that automatically insert a default value for a missing key when that key is accessed.

This default value is calculated by calling a **callable** (aka factory function) we define when we create the `defaultdict` instance.

For example, the `int` callable, when called with no arguments, returns the integer zero (`0`):

In [2]:
int()

0

So, we can use `int` as the "factory" function for a defaultdict, and the default value for a non-existent key will be `0`:

In [3]:
dd = defaultdict(int)

Now, this behaves like a regular dictionary, where we can insert new keys, or retrieve them as usual:

In [4]:
dd['a'] = "test"

In [5]:
dd['a']

'test'

In [6]:
dd

defaultdict(int, {'a': 'test'})

But watch what happens if we request a non-existent key:

In [7]:
dd['b']

0

As you can see, we got that default value back, **and moreover**, the dictionary was **mutated**:

In [8]:
dd

defaultdict(int, {'a': 'test', 'b': 0})

This is very different from using the `get` method on a regular dictionary:

In [9]:
d = {'a': 'test'}

In [10]:
d.get('b', 0)

0

In this case we still get that default value, but the dictionary was not mutated:

In [11]:
d

{'a': 'test'}
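It may be worth noting that a `defaultdict` only inserts the default on subscript access (`dd[key]`). The short sketch below illustrates that membership tests and the inherited `get` method leave it untouched; the variable names are just for illustration.

```python
from collections import defaultdict

dd = defaultdict(int)

# Membership tests and .get() do not call the factory, so nothing is inserted.
has_x = 'x' in dd
got_x = dd.get('x')
keys_after_lookup = list(dd)

# Subscript access on a missing key calls int() and stores the result.
value = dd['x']
keys_after_subscript = list(dd)

print(has_x, got_x, keys_after_lookup)  # False None []
print(value, keys_after_subscript)      # 0 ['x']
```

So if you only want to peek at a key without growing the dictionary, `get` or `in` still work the way they do on a plain `dict`.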

`defaultdict` is a subclass of the regular `dict`, and in fact we can easily convert an instance to a plain `dict`:

In [12]:
d = dict(dd)

In [13]:
d

{'a': 'test', 'b': 0}
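As a quick check of the subclass relationship, the sketch below confirms that a `defaultdict` instance passes `isinstance(..., dict)`, and that the converted copy goes back to raising `KeyError` for missing keys. The names `plain` and `raised` are only illustrative.

```python
from collections import defaultdict

dd = defaultdict(int)
dd['b']  # subscript access inserts 'b' with the default value 0

is_dict = isinstance(dd, dict)

# dict(dd) makes a shallow copy that no longer has a default factory.
plain = dict(dd)
try:
    plain['missing']
    raised = False
except KeyError:
    raised = True

print(is_dict, plain, raised)  # True {'b': 0} True
```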

One particularly useful application of the default dict is when we are building up a dictionary, and we would like to avoid writing logic such as the following example.

Suppose we want to use a dictionary to count things, like maybe a frequency distribution of numbers in a list:

In [14]:
import random

In [15]:
d = {}
random.seed(0)
data = [random.randint(1, 5) for _ in range(10000)]

In [16]:
for number in data:
    if number in d:
        d[number] += 1
    else:
        d[number] = 1

In [17]:
d

{4: 2001, 1: 2006, 3: 2023, 5: 1956, 2: 2014}

Using a `defaultdict`, we can write the same logic more compactly.

We could use `int` as our factory function, but I'll use a lambda instead, to show an approach that can be extended to a default other than `0`:

In [18]:
dd = defaultdict(lambda: 0)

In [19]:
for number in data:
    dd[number] += 1

In [20]:
dd

defaultdict(<function __main__.<lambda>()>,
            {4: 2001, 1: 2006, 3: 2023, 5: 1956, 2: 2014})
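To illustrate the point about defaults other than `0`, here is a minimal sketch. The scenario (a hypothetical `stock` dictionary where every unseen item starts at 100 units) is invented purely for the example.

```python
from collections import defaultdict

# Hypothetical scenario: any item we have not seen yet starts with 100 units.
stock = defaultdict(lambda: 100)

stock['widget'] -= 5    # missing key: starts at 100, becomes 95
stock['gadget'] -= 20   # missing key: starts at 100, becomes 80

print(dict(stock))  # {'widget': 95, 'gadget': 80}

# The factory is stored on the instance and can be called directly.
print(stock.default_factory())  # 100
```

Any callable that takes no arguments will do here, so a named function works just as well as a lambda.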

As a side note, there is a much easier way to do this, using the `Counter` class in the same `collections` module:

In [21]:
from collections import Counter

In [22]:
Counter(data)

Counter({4: 2001, 1: 2006, 3: 2023, 5: 1956, 2: 2014})

But, I wanted to show a simple example of how `defaultdict` works.
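For comparison, here is a small sketch of two `Counter` behaviors that differ from `defaultdict(int)`: it offers `most_common`, and looking up a missing key returns `0` without inserting it. The sample list is made up for the example.

```python
from collections import Counter

counts = Counter([1, 2, 2, 3, 3, 3])

# The two most frequent values, as (value, count) pairs.
top_two = counts.most_common(2)

# A missing key yields 0, but unlike defaultdict the key is not added.
missing = counts[99]
inserted = 99 in counts

print(top_two, missing, inserted)  # [(3, 3), (2, 2)] 0 False
```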

Let's look at a less trivial example.

Suppose you have a data set that consists of orders that need to be processed for a variety of clients.

We want to generate a new data model that is basically a dictionary whose keys are the clients, and where the associated value is a list of the orders for that client.

Let's simulate this dataset:

In [23]:
from uuid import uuid4

random.seed(0)
suppliers = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']
clients = ['ZZZ', 'YYY', 'XXX', 'WWW']
orders = [
    {
        "supplier": random.choice(suppliers),
        "client": random.choice(clients),
        "order_number": str(uuid4()),
        "item_id": str(uuid4()),
        "quantity": random.randint(1, 100),
    }
    for _ in range(100)
]

In [24]:
orders[0]

{'supplier': 'DDD',
 'client': 'WWW',
 'order_number': 'b09f3324-5f76-411d-803c-0615d6039917',
 'item_id': '70efc863-cb37-44d0-9497-4c5baafcd5dd',
 'quantity': 6}

Now let's reshape this list into a dictionary whose keys are the clients and whose values are lists of the orders for each client.

Let's use a standard dictionary first:

In [25]:
client_orders = {}
for order in orders:
    client = order['client']
    if client not in client_orders:
        client_orders[client] = [order]
    else:
        client_orders[client].append(order)

In [26]:
client_orders

{'WWW': [{'supplier': 'DDD',
   'client': 'WWW',
   'order_number': 'b09f3324-5f76-411d-803c-0615d6039917',
   'item_id': '70efc863-cb37-44d0-9497-4c5baafcd5dd',
   'quantity': 6},
  {'supplier': 'CCC',
   'client': 'WWW',
   'order_number': 'd0909dbe-c4ed-4c7b-b2ec-80e7bcf60b50',
   'item_id': '05e9cf21-76be-4f7f-ba7c-507ae6f7afbc',
   'quantity': 52},
  {'supplier': 'CCC',
   'client': 'WWW',
   'order_number': 'd9ea3705-3903-43dc-80b4-8533783c77d1',
   'item_id': 'cea5ee6d-51fd-4d39-bf07-d353efa38b37',
   'quantity': 46},
  {'supplier': 'CCC',
   'client': 'WWW',
   'order_number': 'dfe4a1cf-c8c1-4359-9913-8e2dcc9c6730',
   'item_id': '07ee5f7c-f8e3-47f5-ac01-a328df1e1adf',
   'quantity': 72},
  {'supplier': 'DDD',
   'client': 'WWW',
   'order_number': 'cfcbcaf2-9625-42de-bbba-ad73a0eaee51',
   'item_id': '2e32f529-5c2d-4ea6-9f8d-eccd0de1f1c3',
   'quantity': 67},
  {'supplier': 'BBB',
   'client': 'WWW',
   'order_number': '35240eff-b8f9-4d57-a65b-8936aa0b959b',
   'item_id': 'b41

Now let's do the same thing, but using a `defaultdict` instead, using `list` for our factory function - this means the default will be an empty list.

In [27]:
client_orders_dd = defaultdict(list)
for order in orders: 
    client_orders_dd[order['client']].append(order)

In [28]:
dict(client_orders_dd)

{'WWW': [{'supplier': 'DDD',
   'client': 'WWW',
   'order_number': 'b09f3324-5f76-411d-803c-0615d6039917',
   'item_id': '70efc863-cb37-44d0-9497-4c5baafcd5dd',
   'quantity': 6},
  {'supplier': 'CCC',
   'client': 'WWW',
   'order_number': 'd0909dbe-c4ed-4c7b-b2ec-80e7bcf60b50',
   'item_id': '05e9cf21-76be-4f7f-ba7c-507ae6f7afbc',
   'quantity': 52},
  {'supplier': 'CCC',
   'client': 'WWW',
   'order_number': 'd9ea3705-3903-43dc-80b4-8533783c77d1',
   'item_id': 'cea5ee6d-51fd-4d39-bf07-d353efa38b37',
   'quantity': 46},
  {'supplier': 'CCC',
   'client': 'WWW',
   'order_number': 'dfe4a1cf-c8c1-4359-9913-8e2dcc9c6730',
   'item_id': '07ee5f7c-f8e3-47f5-ac01-a328df1e1adf',
   'quantity': 72},
  {'supplier': 'DDD',
   'client': 'WWW',
   'order_number': 'cfcbcaf2-9625-42de-bbba-ad73a0eaee51',
   'item_id': '2e32f529-5c2d-4ea6-9f8d-eccd0de1f1c3',
   'quantity': 67},
  {'supplier': 'BBB',
   'client': 'WWW',
   'order_number': '35240eff-b8f9-4d57-a65b-8936aa0b959b',
   'item_id': 'b41

In [29]:
client_orders == client_orders_dd

True
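For completeness, a plain `dict` can get close to the same brevity with its `setdefault` method. The sketch below uses a tiny, made-up `orders` list (not the simulated data above) to show that both approaches produce the same grouping.

```python
from collections import defaultdict

# Small hypothetical data set, just for illustration.
orders = [
    {"client": "ZZZ", "quantity": 3},
    {"client": "YYY", "quantity": 7},
    {"client": "ZZZ", "quantity": 1},
]

# Plain dict: setdefault returns the existing list, or inserts and returns a new one.
by_client = {}
for order in orders:
    by_client.setdefault(order["client"], []).append(order)

# defaultdict: the empty list is created on first access to a missing key.
by_client_dd = defaultdict(list)
for order in orders:
    by_client_dd[order["client"]].append(order)

print(by_client == by_client_dd)  # True
print({client: len(items) for client, items in by_client.items()})  # {'ZZZ': 2, 'YYY': 1}
```

One difference to keep in mind is that `setdefault` builds its default argument (here a new empty list) on every call, whereas the `defaultdict` factory only runs when the key is actually missing.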

One thing worth mentioning is that `defaultdict` can be slower than a plain `dict`, although the difference depends on the factory function and the access pattern. Depending on your use case, the reduced code complexity may outweigh any performance hit.
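Whether the performance difference matters depends on the workload, so it is worth measuring on your own data. The sketch below is one way to do that with `timeit`, reusing the counting example from earlier; the function names are hypothetical, and the timings will vary by machine and Python version, so no expected numbers are shown.

```python
import random
import timeit
from collections import defaultdict

random.seed(0)
data = [random.randint(1, 5) for _ in range(10_000)]

def count_with_dict():
    # Explicit membership check on a plain dict.
    d = {}
    for number in data:
        if number in d:
            d[number] += 1
        else:
            d[number] = 1
    return d

def count_with_defaultdict():
    # Missing keys are initialized to 0 by the int factory.
    dd = defaultdict(int)
    for number in data:
        dd[number] += 1
    return dd

# Sanity check: both approaches produce the same counts.
assert count_with_dict() == count_with_defaultdict()

t_dict = timeit.timeit(count_with_dict, number=100)
t_dd = timeit.timeit(count_with_defaultdict, number=100)
print(f"dict: {t_dict:.3f}s  defaultdict: {t_dd:.3f}s")
```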