# Some notes on Python dictionaries
### A few pointers to better utilise Python dictionaries

## Introduction
Dictionaries are flexible containers that map keys to values and allow fast lookups by key. Knowing how to use dictionaries is an essential skill in the repertoire of any Pythonista.

This blog covers a few notes on Python dictionaries which have helped me improve my `dict` game. Much of it might already be familiar to you if you are an intermediate-level (or more advanced) Python programmer. But I still hope that you will enjoy this article and perhaps take away a thing or two.

I have tried to make this blog beginner-friendly, but if you are not very familiar with `dict`, you might need a [refresher](https://realpython.com/python-dicts/). Let's begin.

## Note 1: A basic example of using a dictionary
For a basic demonstration of how `dict` works, let's calculate the frequency of items in a list. 
This is a pretty obvious use-case for a `dict`. 
We will start by implementing this in the most basic way possible and improve our code as we proceed.

1. Initialise an empty dict. Its keys will be the items and its values will be their frequencies.
2. Iterate through the list:
    1. If an item is not in the dict, create a new key and set its (default) value to 1.
    2. Else, increment the count of the item by 1.


In [1]:
#a list containing some items
my_list = ['a','a','b','b','b','c']

#initialise an empty dict
freq_dict = {}

#iterate through the list
for item in my_list:
    #if item is not present, create new key and set its (default) value to 1,
    #since we have seen only 1 occurrence till now
    if item not in freq_dict:
        freq_dict[item]=1
    else:
        #increment the count of the item by 1
        freq_dict[item]+=1

#print results
print(freq_dict)

{'a': 2, 'b': 3, 'c': 1}


## Note 2: `setdefault` elegantly handles missing keys

We can avoid the use of the `if-else` clause in the above code and handle missing keys using the `setdefault` method.
If the item/key is in the dictionary, `setdefault` returns its value. Otherwise, it inserts the item with the given value and returns it.
The explanation in the [documentation](https://docs.python.org/3.8/library/stdtypes.html#dict.setdefault) of `setdefault` is pretty straightforward and goes as follows:
> `setdefault(key[, default])` <br>
> If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to `None`.

This not only decreases the number of lines of code but also makes the code more readable and pythonic. 
Let's use `setdefault` to improve the example in Note 1.


In [2]:
#a list containing some items
my_list = ['a','a','b','b','b','c']

#initialise an empty dict
freq_dict = {}

#iterate through the list
for item in my_list:
    freq_dict.setdefault(item,0)
    freq_dict[item]+=1

#print results
print(freq_dict)

{'a': 2, 'b': 3, 'c': 1}
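
To make the return value of `setdefault` explicit, here is a minimal sketch. The `stock` dict and its keys are made up purely for illustration; the behaviour shown is the one described in the documentation quoted above.

```python
# Hypothetical inventory dict, used only for illustration
stock = {'apples': 3}

# Key exists: the stored value is returned and the default is ignored
existing = stock.setdefault('apples', 0)

# Key is missing: the default is inserted and then returned
missing = stock.setdefault('pears', 0)

# No default given: None is inserted and returned
unset = stock.setdefault('plums')

print(existing, missing, unset)
print(stock)
```

Note that `setdefault` never overwrites an existing value, which is why it is safe to call on every iteration of a loop.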


#### Here is another example of `setdefault` to make it clearer, since this is an important method for all Pythonistas to learn.
The code below separates the odd and even numbers in a list and puts each group into its own list/bucket.
One bucket contains all the odd numbers while the other contains all the even numbers. We will use `num%2` ( [a hash function](https://en.wikipedia.org/wiki/Hash_function) ) to create the key, since `odd_num%2==1` and `even_num%2==0`.


In [3]:
#initialise a list of the first n=10 non-negative integers
num_list = [num for num in range(10)]

#initialise an empty dict
buckets = {}

for num in num_list:
    bucket = buckets.setdefault(num%2,[])
    bucket.append(num)

print(f'Odd Numbers:  {buckets[1]}')
print(f'Even Numbers: {buckets[0]}')

Odd Numbers:  [1, 3, 5, 7, 9]
Even Numbers: [0, 2, 4, 6, 8]


In [4]:
#initialise a list of the first n=10 non-negative integers
num_list = [num for num in range(10)]

#initialise an empty dict
buckets = {}

for num in num_list:
    buckets.setdefault(num%2,[]).append(num)

print(f'Odd Numbers:  {buckets[1]}')
print(f'Even Numbers: {buckets[0]}')

Odd Numbers:  [1, 3, 5, 7, 9]
Even Numbers: [0, 2, 4, 6, 8]


Of course, you can argue that in the above code, we can simply initialize the `buckets` `dict` as `buckets = {'odd':[], 'even':[]}`. But think of non-trivial use-cases where you won't know the keys beforehand. For example, you might read a `.csv` file of countries and their cities, where each row is given as `<country_name>,<city_name>` and you need to group the cities by country. Or you might use a more complicated hash function with an arbitrary number of buckets.
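
The country/city case can be sketched as follows. This is only an illustration under stated assumptions: the rows are made-up sample data, and an in-memory `io.StringIO` stands in for a real `.csv` file so that the snippet is self-contained.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be an opened .csv file
raw = io.StringIO(
    "India,Mumbai\n"
    "Japan,Tokyo\n"
    "India,Chennai\n"
    "Japan,Osaka\n"
    "Iceland,Reykjavik\n"
)

cities_by_country = {}
for country, city in csv.reader(raw):
    # The keys are not known in advance, so create each bucket on first sight
    cities_by_country.setdefault(country, []).append(city)

print(cities_by_country)
```

Each country gets its list the first time it appears, and every later row for that country appends to the same list.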

For the sake of comparison, here are two (intentionally ugly) alternatives for the same example without using `setdefault`.


In [5]:
num_list = [num for num in range(10)]

#initialise an empty dict
buckets = {}

for num in num_list:
    #this is rather ugly
    #it also computes num%2 twice and looks the key up twice
    bucket = buckets.get(num%2,[])
    bucket.append(num)
    buckets[num%2]=bucket

print(f'Odd Numbers:  {buckets[1]}')
print(f'Even Numbers: {buckets[0]}')

Odd Numbers:  [1, 3, 5, 7, 9]
Even Numbers: [0, 2, 4, 6, 8]


In [6]:
num_list = [num for num in range(10)]

buckets = {}

for num in num_list:
    if num%2 not in buckets:
        buckets[num%2] = []    
    buckets[num%2].append(num)

print(f'Odd Numbers:  {buckets[1]}')
print(f'Even Numbers: {buckets[0]}')

Odd Numbers:  [1, 3, 5, 7, 9]
Even Numbers: [0, 2, 4, 6, 8]


## Note 3: `collections.Counter()` 
Going back to our previous example of calculating frequency, it's such a recurring task that Python's standard library has a dedicated class for it, called `Counter`, in the `collections` module.
`Counter` is a `dict` subclass, so it behaves much like a `dict`, and it comes with useful methods such as `most_common(n)` to quickly find the n most frequent items ([docs](https://docs.python.org/3/library/collections.html#collections.Counter)).

In [7]:
from collections import Counter

#a list containing some items
my_list = ['a','a','b','b','b','c']

#build the frequency table directly from the list
freq_dict = Counter(my_list)

#print results
print('Frequency: ', freq_dict)
print('Most Common:', freq_dict.most_common(2))

Frequency:  Counter({'b': 3, 'a': 2, 'c': 1})
Most Common: [('b', 3), ('a', 2)]
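
A few more `Counter` behaviours are worth knowing. This is a short sketch with made-up data, not an exhaustive tour: missing keys count as zero, `update` adds to existing counts, and two counters can be added together.

```python
from collections import Counter

freq = Counter(['a', 'a', 'b'])

# A missing key returns 0 instead of raising KeyError (and is not inserted)
absent = freq['z']

# update() adds to the existing counts rather than replacing them
freq.update(['b', 'b', 'c'])

# Counters can be added; counts for shared keys are summed
combined = Counter('ab') + Counter('bc')

print(absent)
print(freq)
print(combined)
```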


## Note 4: Dictionary comprehensions
[Comprehensions](https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions) are one of the most useful tools in Python and, of course, dictionaries support them as well. The syntax is mostly the same as that of list comprehensions, the differences being the use of `{..}` instead of `[..]` and the need to define a `key: value` pair.
It should be pretty apparent from the code examples below.

Let's first look at a simple example, similar to the one in the docs. It creates a `dict` that maps the numbers from 0 to 4 to their squares.


In [8]:
{ x: x ** 2 for x in range(5)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Here is another example of a `dict` comprehension. It maps countries to their capitals.

In [9]:
DATA = [
    ('India','New Delhi'),
    ('Iceland', 'Reykjavik'),
    ('China','Beijing'),
    ('Japan', 'Tokyo'),
    ('UK','London'),
]
country_capitals = { country:capital for country,capital in DATA }
print(f'Capital of India: {country_capitals["India"]}')

Capital of India: New Delhi


In [10]:
#restricting the dict to only countries that start with 'I'
country_startswith_i = { country:capital for country,capital in DATA if country.startswith('I') }
print(country_startswith_i)

{'India': 'New Delhi', 'Iceland': 'Reykjavik'}


## Note 5: Insertion order in python dictionaries and `OrderedDict`

As of Python 3.7, dictionaries are guaranteed to maintain insertion order (in CPython 3.6 this was only an implementation detail). Even so, I would not lean on it when the order is essential to your code. Many popular libraries (and programmers) assume that the ordering in `dict` doesn't matter, as it most often doesn't. If you want to preserve insertion order, consider using `OrderedDict` ([docs](https://docs.python.org/3/library/collections.html#collections.OrderedDict)) instead, which has always remembered the order in which items were inserted and also treats order as significant when two instances are compared. In addition to clearly conveying your intentions, it has the added benefit of sparing you from worrying too much about backward compatibility. If you wish to learn more about this, I highly recommend [this blog post](http://gandenberger.org/2018/03/10/ordered-dicts-vs-ordereddict/) by Greg Gandenberger.
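
To illustrate how `OrderedDict` makes order part of the contract, here is a small sketch with made-up keys. It shows two documented differences from a plain `dict`: equality between two `OrderedDict` instances is order-sensitive, and `move_to_end` lets you reorder entries explicitly.

```python
from collections import OrderedDict

plain_a = {'x': 1, 'y': 2}
plain_b = {'y': 2, 'x': 1}

ordered_a = OrderedDict([('x', 1), ('y', 2)])
ordered_b = OrderedDict([('y', 2), ('x', 1)])

# Plain dicts ignore order when compared; OrderedDicts do not
plain_equal = plain_a == plain_b
ordered_equal = ordered_a == ordered_b
print(plain_equal, ordered_equal)

# Explicit reordering: move the key 'x' to the end
ordered_a.move_to_end('x')
print(list(ordered_a))
```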


##  Note 6: Dict keys need to be hashable, and what counts as hashable

For an object to work as a key in a dictionary, it needs to be hashable. Examples of hashable objects are `int`, `str`, `float`, etc. Specifically, it needs to meet the following three requirements.

1. It should support the `hash()` function via a `__hash__()` method whose value never changes over the lifetime of the object.
2. It supports equality comparison via `__eq__()` method.
3. If `a == b` is `True` then `hash(a) == hash(b)` must also be `True`.

For these same reasons, a `tuple` can be a key in a `dict` (provided all of its elements are hashable)
while a `list` can't. ([read more](https://stackoverflow.com/questions/7257588/why-cant-i-use-a-list-as-a-dict-key-in-python))
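
The difference between `tuple` and `list` keys can be sketched as follows. The coordinates and city name are made-up sample data; the exact wording of the error message varies between Python versions, so it is captured rather than hard-coded.

```python
coords = {}

# A tuple of hashable elements works as a key
coords[(12.97, 77.59)] = 'Bengaluru'

# A list is mutable and unhashable, so this raises TypeError
try:
    coords[[12.97, 77.59]] = 'Bengaluru'
except TypeError as err:
    list_error = str(err)

# A tuple that contains a list is not hashable either
try:
    coords[(1, [2, 3])] = 'nested'
except TypeError as err:
    nested_error = str(err)

print(list_error)
print(nested_error)
print(coords)
```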

On this note, user-defined types are hashable by default. This is because their hash value is derived from their `id()` and, by default, an instance compares equal only to itself.
One note of caution: custom implementations of `__hash__()` and
`__eq__()` should only take into account those object attributes that never change during the lifetime of the object.


In [11]:
class Node:
    """Implements a LinkedList node"""
    def __init__(self, val=0, next_node=None):
        self.val = val
        self.next_node = next_node

a = Node(1)
b = Node(2)

#dict with Node object as keys
node_dict = {a:'node_a', b:'node_b'}
print(node_dict)


{<__main__.Node object at 0x11006be80>: 'node_a', <__main__.Node object at 0x11006bac0>: 'node_b'}
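
In contrast to the default identity-based behaviour, here is a minimal sketch of a class with custom `__eq__()` and `__hash__()`. `Point` and `OnlyEq` are hypothetical classes invented for this illustration. The sketch also shows a related rule from the Python data model: a class that defines `__eq__()` without defining `__hash__()` becomes unhashable.

```python
class Point:
    """Hypothetical value type: points are equal when their coordinates are."""

    def __init__(self, x, y):
        # Treated as read-only; changing them after the object is used as
        # a key would break dictionary lookups
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash exactly the attributes that __eq__ compares
        return hash((self._x, self._y))


class OnlyEq:
    """Defines __eq__ but not __hash__, so instances are unhashable."""

    def __eq__(self, other):
        return isinstance(other, OnlyEq)


labels = {Point(1, 2): 'first'}
# An equal but distinct object finds the same entry
found = labels[Point(1, 2)]
print(found)

try:
    hash(OnlyEq())
    only_eq_hashable = True
except TypeError:
    only_eq_hashable = False
print(only_eq_hashable)
```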


## Note 7: Dictionaries are fast but trade space for time
Internally, `dict` uses a hash table. By design, these hash tables are sparse, which means they are not very space-efficient. For a large number of records, it might be more space-efficient to store them in compact objects such as tuples.

Even though `dict` has a significant memory overhead, it allows fast access as long as it fits in memory.
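
As a rough illustration of the overhead, the sketch below compares one made-up record stored as a `dict` and as a `tuple`. It assumes CPython, where `sys.getsizeof` reports the shallow size of the container only (not of the objects it refers to). The exact numbers depend on the Python version and platform, so none are quoted here.

```python
import sys

# The same hypothetical record stored in two ways
record_as_dict = {'name': 'Ada', 'year': 1815, 'field': 'math'}
record_as_tuple = ('Ada', 1815, 'math')

# Shallow container sizes in bytes (implementation-dependent)
dict_size = sys.getsizeof(record_as_dict)
tuple_size = sys.getsizeof(record_as_tuple)

print(f'dict:  {dict_size} bytes')
print(f'tuple: {tuple_size} bytes')
```

The trade-off is that the tuple loses the field names, so you have to remember which position holds which value.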


## Note 8: Use `sets` instead when uniqueness is all you need
It is often the case that you need to find all the unique items in a collection. It might be tempting to use a `dict` with a dummy value, since all keys in a dict are unique by default.

In such a case, it is better to use a `set` instead. Python sets guarantee uniqueness. They also have nice properties similar to sets in mathematics, like union and intersection. Similar to `dict` keys, the elements of a set must be hashable.

But if the case requires a hashable `set`, you will have to use `frozenset` instead, since a `set` is not hashable. `frozenset` is an immutable, hashable version of `set`, so it can be put inside a `set`.
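
The set operations and the `frozenset` point can be sketched as follows, using made-up numbers. The sketch shows union and intersection, then shows that a `set` cannot hold other sets while it can hold `frozenset` objects.

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

# Mathematical set operations
union = evens | small
common = evens & small

# A set of sets is not allowed because set is unhashable
try:
    nested = {evens, small}
    set_in_set_ok = True
except TypeError:
    set_in_set_ok = False

# frozenset is hashable; equal frozensets collapse into one element
groups = {frozenset(evens), frozenset(small), frozenset({3, 2, 1, 0})}

print(sorted(union), sorted(common))
print(set_in_set_ok, len(groups))
```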

Since this post is about dictionaries, I will leave you with the [link](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset) to the docs to learn more about sets.

In [12]:
#do
items = [1,1,1,2,2,3,3]
uniq_items = set(items)
print(uniq_items)

{1, 2, 3}


In [13]:
#don't
{item:0 for item in items }.keys()

dict_keys([1, 2, 3])

## Summary
1. Basic demonstration of how `dict` works by calculating the frequency of items in a list.
2. Missing keys can be elegantly handled by `setdefault`.
3. `collections.Counter` is a specialized container for counting hashable objects.
4. Comprehensions are supported for creating dictionaries.
5. Even though `dict` preserves insertion order, prefer `OrderedDict` when the order matters to your code.
6. Dictionary keys need to be hashable. User-defined objects are hashable by default.
7. Dictionaries are fast but space-inefficient, since they use sparse hash tables.
8. Use sets instead of dictionaries when you only need to find unique values.


## Conclusion
This blog covered a few notes on the use of Python `dict`. I hope this has been an informative and enjoyable read for you. If you spot any mistakes, please leave a comment so that I can correct them at the earliest. Any feedback and suggestions are also welcome.
You can find a Jupyter notebook version of this blog on my GitHub.
Thank you for reading. 


## Further reading

I highly recommend the book "Fluent Python" by Luciano Ramalho, which covers all the topics in this blog in great depth.
This blog is also largely inspired by his wonderful book.