### Idiomatic Python - Counting Things with Dictionaries

Often we have to count things up, and a dictionary is the natural place to keep the tally: the thing being counted is the key, and the number of times we've seen it is the value.

For example, suppose we have some text, and we want to count the number of times each character occurs in that text.

I'll be using the `faker` library to generate some random text (I cover `faker` in another video on this channel).

In [1]:
from faker import Faker

Faker.seed(0)
faker = Faker(['la'])

text = faker.paragraph(nb_sentences=20)
text

'Aliquam vitae laborum ullam rerum voluptas. Nesciunt tenetur magnam eligendi quidem nulla. Voluptates minus provident nobis corporis. Quas tempore placeat iusto. Explicabo et odit dignissimos. Labore sint ea rem molestias accusamus quaerat. Quae quos numquam nostrum repudiandae ex. Provident aperiam totam dolore rem consequuntur. Ipsum aspernatur eum magni ut autem mollitia. Molestias repellendus molestiae vitae. Molestias enim at reiciendis et doloribus delectus reprehenderit. Nostrum omnis labore. Perspiciatis consectetur corrupti aliquam. Tempore unde molestiae hic. Eius repellat sed tempora nihil veniam neque. Laudantium odit praesentium voluptatem facere eveniet beatae. Occaecati sapiente doloribus quasi blanditiis dolore recusandae ex. Aut corporis possimus aliquid. Odio vel nobis asperiores commodi deleniti rerum. Hic fuga voluptatem alias. Deserunt voluptas quaerat.'

So how can we approach this?

Here's one way I sometimes see it done:

In [2]:
count_1 = {}
for ch in text:
    if ch in count_1:
        count_1[ch] += 1
    else:
        count_1[ch] = 1
count_1

{'A': 2,
 'l': 38,
 'i': 69,
 'q': 11,
 'u': 56,
 'a': 68,
 'm': 41,
 ' ': 113,
 'v': 11,
 't': 58,
 'e': 92,
 'b': 10,
 'o': 51,
 'r': 48,
 'p': 26,
 's': 54,
 '.': 21,
 'N': 2,
 'c': 22,
 'n': 37,
 'g': 5,
 'd': 26,
 'V': 1,
 'Q': 2,
 'E': 2,
 'x': 3,
 'L': 2,
 'P': 2,
 'I': 1,
 'M': 2,
 'h': 3,
 'T': 1,
 'f': 2,
 'O': 2,
 'H': 1,
 'D': 1}

Now let's work on improving this and making the code more Pythonic.

First let's get rid of that `if` statement.

In [3]:
count_2 = {}
for ch in text:
    count_2[ch] = count_2.setdefault(ch, 0) + 1

And let's make sure our results match:

In [4]:
count_1 == count_2

True

Ok, so this is maybe a bit better, but when was the last time you used `setdefault`? Have you even heard of it? And in this case there is no need for it at all: `setdefault` first inserts the key with a value of `0`, and the assignment then immediately overwrites that entry.
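To make the difference concrete, here is a small sketch contrasting the two methods on an empty dict (the `counts` name is just for illustration): `get()` only reads, while `setdefault()` writes the default into the dict before returning it.

```python
counts = {}

# get() leaves the dict untouched when the key is missing
print(counts.get("a", 0))         # 0
print(counts)                     # {}

# setdefault() inserts the default first, then returns it
print(counts.setdefault("a", 0))  # 0
print(counts)                     # {'a': 0}
```

In the counting loop, that inserted `0` is overwritten on the very same line, so the extra write buys nothing.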

So, maybe a slight improvement would be to use the `get()` method with a default value - that's at least a bit more common:

In [5]:
count_3 = {}
for ch in text:
    count_3[ch] = count_3.get(ch, 0) + 1

In [6]:
count_3 == count_1

True

We can turn to Python's `defaultdict` for an even better solution:

In [7]:
from collections import defaultdict

In [8]:
count_4 = defaultdict(int)
for ch in text:
    count_4[ch] += 1

In [9]:
count_4 == count_1

True
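Why does `defaultdict(int)` work here? When a key is missing, `defaultdict` calls its factory with no arguments, and `int()` returns `0`. A minimal sketch of that behavior, including one gotcha worth knowing (the keys are arbitrary examples):

```python
from collections import defaultdict

# int() with no arguments returns 0, which makes it a handy counting default
print(int())              # 0

counts = defaultdict(int)
counts["a"] += 1          # missing key: the factory supplies 0, then += 1 makes it 1
print(counts["a"])        # 1

# Gotcha: even *reading* a missing key inserts it with the default value
print(counts["z"])        # 0
print(dict(counts))       # {'a': 1, 'z': 0}

# Membership tests and .get() do not call the factory, so nothing is inserted
print("y" in counts)      # False
print(counts.get("y"))    # None
print(dict(counts))       # {'a': 1, 'z': 0}
```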

But the **best** solution is to use Python's `Counter`:

In [10]:
from collections import Counter

In [11]:
count_5 = Counter(text)

In [12]:
count_5 == count_1

True
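`Counter` also differs from `defaultdict` in how it treats missing keys, and it comes with a few counting-specific helpers. A short sketch using an arbitrary word:

```python
from collections import Counter

counts = Counter("banana")
print(dict(counts))             # {'b': 1, 'a': 3, 'n': 2}

# A missing key reports 0 but, unlike defaultdict, is NOT inserted
print(counts["z"])              # 0
print("z" in counts)            # False

# most_common(n) returns the n highest counts, largest first
print(counts.most_common(2))    # [('a', 3), ('n', 2)]
```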

Technically, `Counter` is a subclass of `dict`:

In [13]:
issubclass(Counter, dict)

True

So, we can easily recover a "regular" `dict` from a `Counter` object:

In [14]:
count_5

Counter({'A': 2,
         'l': 38,
         'i': 69,
         'q': 11,
         'u': 56,
         'a': 68,
         'm': 41,
         ' ': 113,
         'v': 11,
         't': 58,
         'e': 92,
         'b': 10,
         'o': 51,
         'r': 48,
         'p': 26,
         's': 54,
         '.': 21,
         'N': 2,
         'c': 22,
         'n': 37,
         'g': 5,
         'd': 26,
         'V': 1,
         'Q': 2,
         'E': 2,
         'x': 3,
         'L': 2,
         'P': 2,
         'I': 1,
         'M': 2,
         'h': 3,
         'T': 1,
         'f': 2,
         'O': 2,
         'H': 1,
         'D': 1})

In [15]:
dict(count_5)

{'A': 2,
 'l': 38,
 'i': 69,
 'q': 11,
 'u': 56,
 'a': 68,
 'm': 41,
 ' ': 113,
 'v': 11,
 't': 58,
 'e': 92,
 'b': 10,
 'o': 51,
 'r': 48,
 'p': 26,
 's': 54,
 '.': 21,
 'N': 2,
 'c': 22,
 'n': 37,
 'g': 5,
 'd': 26,
 'V': 1,
 'Q': 2,
 'E': 2,
 'x': 3,
 'L': 2,
 'P': 2,
 'I': 1,
 'M': 2,
 'h': 3,
 'T': 1,
 'f': 2,
 'O': 2,
 'H': 1,
 'D': 1}

Let's take a quick look at the various timings for these 5 approaches.

In [16]:
def count_1(text: str) -> dict[str, int]:
    count = {}
    for ch in text:
        if ch in count:
            count[ch] += 1
        else:
            count[ch] = 1
    return count

def count_2(text: str) -> dict[str, int]:
    count = {}
    for ch in text:
        count[ch] = count.setdefault(ch, 0) + 1
    return count

def count_3(text: str) -> dict[str, int]:
    count = {}
    for ch in text:
        count[ch] = count.get(ch, 0) + 1
    return count

def count_4(text: str) -> dict[str, int]:
    count = defaultdict(int)
    for ch in text:
        count[ch] += 1
    return dict(count)

def count_5(text: str) -> dict[str, int]:
    return dict(Counter(text))

In [17]:
from timeit import timeit

In [18]:
time_1 = timeit("count_1(text)", globals=globals(), number=10_000)
time_2 = timeit("count_2(text)", globals=globals(), number=10_000)
time_3 = timeit("count_3(text)", globals=globals(), number=10_000)
time_4 = timeit("count_4(text)", globals=globals(), number=10_000)
time_5 = timeit("count_5(text)", globals=globals(), number=10_000)

print(f"{time_1=:.3f}")
print(f"{time_2=:.3f}")
print(f"{time_3=:.3f}")
print(f"{time_4=:.3f}")
print(f"{time_5=:.3f}")

time_1=0.410
time_2=0.351
time_3=0.343
time_4=0.399
time_5=0.198


As you can see, using `Counter` is the fastest - it is also the simplest code.

Note that in these timings the `defaultdict` version was slower than the plain-dict `get()` version, so this:

```python
count = {}
for ch in text:
    count[ch] = count.get(ch, 0) + 1
```

is more efficient than this:

```python
count = defaultdict(int)
for ch in text:
    count[ch] += 1
```

But that slight performance increase comes at the cost of code clarity - your choice.
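One caveat: `count_4` and `count_5` above also pay for converting their results back to a plain `dict`, so those timings don't isolate the counting loops themselves. If you want to check the `get()` vs `defaultdict` comparison on your own machine, here is a sketch that times the loops without that conversion and uses `timeit.repeat`, keeping the best of several runs as the least noisy estimate. The sample string and function names are made up for illustration, and results will vary with Python version and hardware.

```python
from collections import Counter, defaultdict
from timeit import repeat

# Illustrative stand-in text; any reasonably long string will do
sample = "the quick brown fox jumps over the lazy dog " * 20

def count_get(text):
    count = {}
    for ch in text:
        count[ch] = count.get(ch, 0) + 1
    return count

def count_defaultdict(text):
    count = defaultdict(int)
    for ch in text:
        count[ch] += 1
    return count  # no dict() conversion, so only the loop is timed

def count_counter(text):
    return Counter(text)

for func in (count_get, count_defaultdict, count_counter):
    # Run the timing 5 times and keep the fastest run
    best = min(repeat(lambda: func(sample), number=500, repeat=5))
    print(f"{func.__name__}: {best:.3f}s")
```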

At the end of the day, use `Counter` to count elements of an iterable - that is the most Pythonic.
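`Counter` is not limited to characters: it accepts any iterable of hashable items. As a small illustration (the word list is made up), here it counts words, then uses `update()` to add more counts to the existing tally rather than replace them:

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

# Any iterable of hashable items works, not just strings
word_counts = Counter(words)
print(word_counts.most_common(1))   # [('the', 3)]

# update() adds to the existing counts instead of overwriting them
word_counts.update(["cat", "cat"])
print(word_counts["cat"])           # 3
```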

Counters are sometimes referred to as [multisets](https://en.wikipedia.org/wiki/Multiset) (sets whose elements need not be unique).
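The multiset view shows up in `Counter`'s arithmetic operators, which combine counts element by element. A small sketch with two arbitrary counters:

```python
from collections import Counter

a = Counter("aab")   # a: 2, b: 1
b = Counter("abc")   # a: 1, b: 1, c: 1

print(a + b)   # add counts:                      a: 3, b: 2, c: 1
print(a - b)   # subtract, dropping counts <= 0:  a: 1
print(a & b)   # intersection (minimum count):    a: 1, b: 1
print(a | b)   # union (maximum count):           a: 2, b: 1, c: 1
```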