# A Python puzzle: Ways of using a sequence of values with defaultdict

This page came out of a simple little coding puzzle that Aaron Maxwell, author of [_Powerful Python_](https://powerfulpython.com/), sent to his mailing list: how to use a default of 1 with a Python defaultdict.  My answer was `defaultdict(lambda: 1)`, and then I thought of making the puzzle a bit harder by finding a way to make the default value non-constant, for example a sequence.  This turned into an interesting exploration of callables and of persisting state between calls (without using global variables, of course) in Python.

## Dicts and defaultdict

The Python `dict` type (_associative array_ or _hash_ in some other languages) raises a _KeyError_ exception if you try to look up a key that doesn't exist.  This is fine for many use cases, but it makes some others unnecessarily complicated.  An example is using a dict to count the number of occurrences of individual words in some source text; the following would be a concise and simple solution, but it doesn't work...

In [18]:
source_text = "ma ma hu hu"
d = dict()
for word in source_text.split():
    d[word] += 1

KeyError: 'ma'

To make this work we'd have to test whether the word is already in `d` and, if it isn't, initialize its associated value to 0.
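
For comparison, the manual approach might look like the sketch below.  The names `counts` and `counts_get` are mine, chosen only for illustration; the second loop shows the common `dict.get` shortcut for the same test.

```python
source_text = "ma ma hu hu"

# Explicit membership test before incrementing
counts = {}
for word in source_text.split():
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

# Same logic, using dict.get with a fallback value of 0
counts_get = {}
for word in source_text.split():
    counts_get[word] = counts_get.get(word, 0) + 1

print(counts)      # {'ma': 2, 'hu': 2}
print(counts_get)  # {'ma': 2, 'hu': 2}
```
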
Fortunately there is a type in the standard library that does exactly this automatically: the [`collections.defaultdict`](https://docs.python.org/3/library/collections.html#collections.defaultdict) class is a `dict` subclass which, on looking up a non-existing key, calls a user-provided _initializer_ _callable_ (which the manual entry calls a "default factory") and stores the result under that key instead of raising an exception...

In [14]:
from collections import defaultdict

source = "ma ma hu hu"
dd = defaultdict(int)
for word in source.split():
    dd[word] += 1
for (key, count) in dd.items():
    print(f'{key} occurred {count} times')


ma occurred 2 times
hu occurred 2 times


Here we're using `int` as the default factory because calling `int()` without arguments returns zero; we might also have used `defaultdict(lambda: 0)`.  The default factory has to be a callable object that can be called without arguments, i.e. a function, method, type constructor or anything else that provides a `__call__` method.
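
To see when the default factory actually runs, here is a small sketch with a factory that records its own calls.  The helper `noisy_zero` and the list `calls` are hypothetical names used only for this demonstration.

```python
from collections import defaultdict

calls = []  # records every invocation of the factory (illustration only)

def noisy_zero():
    calls.append("called")
    return 0

dd = defaultdict(noisy_zero)
dd["x"]        # missing key: the factory runs and 0 is stored under "x"
dd["x"]        # key exists now: the factory is not called again
dd.get("y")    # .get() does not go through the factory
"z" in dd      # neither does a membership test

print(len(calls))  # 1
print(dict(dd))    # {'x': 0}
```

So the factory is only invoked by a failed `d[key]` lookup, which matters for the puzzle below: each failed lookup consumes exactly one value from whatever sequence we supply.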

## The Puzzle

Now back to our puzzle.  What can we pass to defaultdict as a default factory so that our default values will be serialized (numbered in sequence) rather than constant, i.e. 1 for the first failed key lookup, 2 for the second, etc.?  We'll do this without using global variables.  We don't care whether the sequence continues counting across defaultdicts or gets restarted with each defaultdict instantiation.

So far we've discovered 5 distinct solutions to this in Python.  If you find any others, by all means send them to me.

Scroll down to see our solutions...

```
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  .
  ```

## 1) Using a counter object

The argument we need to pass to defaultdict has to be a callable, and we need to keep state between calls (tracking the count), so that suggests an object (state) and a method (callable).  A simple counter object would seem to be the most obvious (and probably most "Pythonic") solution.  The object might define a `nextcount()` method, but since it does nothing else we might as well just put the counting code in the `__call__` [dunder](https://wiki.python.org/moin/DunderAlias) method, making the object itself callable.

In [19]:
class CounterObj:
    def __init__(self):
        self.count = 0
    def __call__(self):
        self.count += 1
        return self.count

d = defaultdict(CounterObj())
print(d['a'], d['b'], d['a'], d['z'])

1 2 1 3
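
Because the state lives in an ordinary object, we can choose whether several defaultdicts share one sequence or each get their own.  A short sketch of both choices (the variable names `shared`, `d1`, `d2` and `d3` are illustrative; the class is repeated so the block runs on its own):

```python
from collections import defaultdict

class CounterObj:
    def __init__(self):
        self.count = 0
    def __call__(self):
        self.count += 1
        return self.count

shared = CounterObj()
d1 = defaultdict(shared)
d2 = defaultdict(shared)        # same counter object: numbering continues
d3 = defaultdict(CounterObj())  # fresh counter object: numbering restarts

print(d1['a'], d2['b'], d1['c'], d3['a'])  # 1 2 3 1
print(shared.count)                        # 3
```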


## 2) Subclassing 'int' to instantiate a serialized default integer

This was actually the first solution I envisioned when I started thinking about this problem... The manual entry for defaultdict gives the example of `defaultdict(int)` for default elements of 0.  This works because 0 is the value you get when you call the `int` class (the int type constructor) without arguments.  So we should be able to create a subclass of `int` that makes a serialized default instance instead.  This turns out to be pretty easy, by overriding the `__new__` dunder method and keeping the count in a class attribute.  A `SerialInt` is a normal integer except that its default is a serial number rather than 0.

In [6]:
class SerialInt(int):
    count = 0  # class attribute, shared by every SerialInt
    def __new__(cls, x=None, **kwargs):
        # __new__ receives the class, conventionally named cls
        cls.count += 1
        return int.__new__(cls, cls.count if x is None else x, **kwargs)

d = defaultdict(SerialInt)
print(d['a'], d['b'], d['a'], d['z'])

1 2 1 3
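
Two properties of this approach are worth checking.  The counter lives on the class, so every `defaultdict(SerialInt)` draws from the same sequence, and arithmetic on a `SerialInt` gives back a plain `int`.  A small sketch, repeating the class so the block runs on its own (the first parameter of `__new__` is written `cls` here, following the usual convention):

```python
from collections import defaultdict

class SerialInt(int):
    count = 0
    def __new__(cls, x=None, **kwargs):
        cls.count += 1
        return int.__new__(cls, cls.count if x is None else x, **kwargs)

d1 = defaultdict(SerialInt)
d2 = defaultdict(SerialInt)

print(d1['a'], d2['a'], d1['b'])        # 1 2 3
print(type(d1['a']).__name__)           # SerialInt
d1['a'] += 10                           # int arithmetic returns a plain int
print(d1['a'], type(d1['a']).__name__)  # 11 int
```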


## 3) Using an iterator or generator

Although considered the "Pythonic" way of producing a sequence, iterators or generators aren't a great match for this problem because they aren't in themselves callable.  But we can "cheat" and pass the `__next__` dunder method of the generator as the default factory!  (This is cheating because dunder methods are normally not supposed to be called or passed directly.)

In [23]:
def seq_generator_func():
    count = 1
    while True:
        yield count
        count += 1

d = defaultdict(seq_generator_func().__next__)
print(d['a'], d['b'], d['a'], d['z'])

1 2 1 3


What's more, there is a counting iterator already in the standard library, `itertools.count`, so we don't even have to code it; the following would seem to be the shortest possible solution to our puzzle:

In [21]:
from itertools import count

d = defaultdict(count(1).__next__)
print(d['a'], d['b'], d['a'], d['z'])

1 2 1 3
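
If passing a dunder method directly feels too much like cheating, the built-in `next()` function can be wrapped instead.  The sketch below shows two ways of doing that, with `functools.partial` and with a lambda; the variable names `e` and `it` are illustrative.

```python
from collections import defaultdict
from functools import partial
from itertools import count

# partial(next, iterator) is a zero-argument callable that calls next(iterator)
d = defaultdict(partial(next, count(1)))
print(d['a'], d['b'], d['a'], d['z'])  # 1 2 1 3

# The same idea with a lambda that refers to an iterator created beforehand
it = count(1)
e = defaultdict(lambda: next(it))
print(e['a'], e['b'], e['a'], e['z'])  # 1 2 1 3
```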


## 4) Using a mutable default argument to maintain state

I had started on this notebook and put it down for a while when I came across this [page about Python gotchas](https://docs.python-guide.org/writing/gotchas) and learned about what they call the mutable default arguments gotcha.  This is a consequence of two facts: in Python a function receives a reference to the argument object itself rather than a copy, and Python evaluates default arguments once, at function definition time.  This means that if a default argument is mutable, for example a list, as in `f(l=[])`, then any changes made to it will persist between calls (rather than, in this case, `l` being re-initialized to an empty list on each call, which is what the programmer might have intended).  This is documented, intentional behavior, but it can be surprising.
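
A minimal sketch of the gotcha itself, using a hypothetical function `append_item` with a list as its default argument:

```python
def append_item(item, bucket=[]):  # the list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  (the same list as before, not a fresh one)

# The default object is stored on the function object itself
print(append_item.__defaults__)  # ([1, 2],)
```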

I immediately thought of my puzzle... here is another way of keeping state between calls of a callable!  It turns out to be the most compact of our solutions to the puzzle, apart from using an iterator from the standard library.

In [22]:
def counter(count=[0]):
    count[0] += 1
    return count[0]

d = defaultdict(counter)
print(d['a'], d['b'], d['a'], d['z'])

1 2 1 3


## 5) Using a lexical closure

Lexical closures are a way for a function object to maintain state between invocations.  They are heavily used in Lisp, not so much in Python, where objects are usually preferred.  But in Python 3 it is possible, although there are some gotchas... specifically, we need to use a [`nonlocal`](https://docs.python.org/3/reference/simple_stmts.html#grammar-token-nonlocal-stmt) declaration for the count variable, so that `count += 1` rebinds the variable in the enclosing scope instead of creating a new local one.  Also, in Lisp the inner function would usually be a lambda since it really doesn't need a name, but in Python a lambda is deliberately limited to a single expression and cannot contain statements such as `nonlocal` or the augmented assignment we need for updating `count`.  I spent some time researching the reasons for this limitation, and without going into the details, what it comes down to is that Python should be Python and not try to be Lisp.  It does make sense, but it's good to know that we do have the power of lexical closures available and that functions are still first-class objects even if they do have to have a name.

In [3]:
def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

d = defaultdict(make_counter())
print(d['a'], d['b'], d['a'], d['z'])

1 2 1 3
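
To make the closure a little less magical, the sketch below shows that each call to `make_counter()` produces an independent counter, and that the captured variable can be inspected through the function's `__closure__` attribute.  The names `c1` and `c2` are illustrative; the function is repeated so the block runs on its own.

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

c1 = make_counter()
c2 = make_counter()  # separate closure, separate count

print(c1(), c1(), c2())  # 1 2 1

# The captured variable lives in a closure cell attached to the function
print(c1.__code__.co_freevars)          # ('count',)
print(c1.__closure__[0].cell_contents)  # 2
```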
