# 01 - Creating Python Dictionaries

#### Literals

For example:

In [1]:
d = {'john': ['John Cleese'],
     (0, 0): 'origin',
    }

#### Constructor

It has the form `dict(key1=value1, key2=value2)`.

This approach is less flexible than using literals because each key must be a valid identifier (the same rules as for variable, function and class names), and it always ends up as a string in the dictionary. This means we cannot create a dictionary with a tuple as a key using this approach.

In [8]:
d = dict(john=['John Cleese'], my_func='this is a function')
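As a quick check of the restriction described above, here is a minimal sketch (the variable names `d_kw`, `d_literal` and `d_pairs` are illustrative, not from the original notes). It shows that keyword arguments always become string keys, and that a tuple key needs the literal form or the iterable-of-pairs form instead.

```python
# Keyword arguments always produce string keys.
d_kw = dict(john=['John Cleese'], my_func='this is a function')
key_types = {type(k) for k in d_kw}
print(key_types)

# A tuple key cannot be written as a keyword argument,
# but the literal and iterable-of-pairs forms accept it.
d_literal = {(0, 0): 'origin'}
d_pairs = dict([((0, 0), 'origin')])
print(d_literal == d_pairs)   # True
```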

We can also use another form with the `dict()` constructor: `dict([(key1, value1), [key2, value2]])`. 

As you can see, each key-value pair can be any iterable of exactly two elements, e.g. a tuple or a list. The pairs themselves can also be contained in any iterable. In the above, they are contained in a list.

In [1]:
d = dict([('a', 100), ['b', 200]])
d

{'a': 100, 'b': 200}

We can also pass dictionaries to `dict()`. This will produce a **shallow copy**:

In [2]:
d = {'a': 1, 'b': 2, 'c': [3, 4, 5]}

copy = dict(d)
d

{'a': 1, 'b': 2, 'c': [3, 4, 5]}

In [3]:
d['c'].append(100)

print(d)
print(copy)

{'a': 1, 'b': 2, 'c': [3, 4, 5, 100]}
{'a': 1, 'b': 2, 'c': [3, 4, 5, 100]}


#### Dictionary Comprehensions

For example:

In [11]:
d = {str(i): i ** 2 for i in range(5)}
d

{'0': 0, '1': 1, '2': 4, '3': 9, '4': 16}

Here's another example:

In [4]:
keys = ['a', 'b', 'c']
values = (1, 2, 3)

d = {k: v for k, v in zip(keys, values)}
d

{'a': 1, 'b': 2, 'c': 3}

#### `dict.fromkeys()`

This creates a dictionary with the specified keys all having the **same value**. It has the form `dict.fromkeys(iterable, value=None)` where the iterable must have **hashable elements**. These elements will become the keys.

In [13]:
d = dict.fromkeys(['a', (0,0), 250], 'N/A')
d

{'a': 'N/A', (0, 0): 'N/A', 250: 'N/A'}

Any iterable will do, so we can pass a generator expression if we like:

In [15]:
d = dict.fromkeys((i**2 for i in range(5)), False)
d

{0: False, 1: False, 4: False, 9: False, 16: False}
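One consequence of "the same value" is worth a small sketch: the value is a single shared object, which matters when it is mutable. The names `shared` and `separate` below are illustrative only.

```python
# All keys share the *same* list object when a mutable value is passed.
shared = dict.fromkeys('abc', [])
shared['a'].append(1)
print(shared)     # {'a': [1], 'b': [1], 'c': [1]}

# A comprehension evaluates the value expression once per key,
# so every key gets its own list.
separate = {key: [] for key in 'abc'}
separate['a'].append(1)
print(separate)   # {'a': [1], 'b': [], 'c': []}
```

With immutable values such as `False` or `'N/A'` the sharing is harmless, which is why the examples above behave as expected.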

# 02 - Common Operations

Most common operations relate to the keys, not the values. For example, `len(d)` returns the number of keys in `d`.

#### Membership Tests

A membership test checks whether a key is present in a dictionary. It is very efficient: all we need to do is hash the key and traverse the probe sequence.

We can use the `in` and `not in` operators to test the presence of a **key** in a dictionary:

In [27]:
d = dict(a=1, b=2, c=3)

In [28]:
'a' in d

True
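To make the "keys, not values" point concrete, here is a short sketch using the same dictionary. Testing against the values is possible through the `.values()` view, but that is a linear scan rather than a hash lookup.

```python
d = dict(a=1, b=2, c=3)

print('a' in d)           # True: 'a' is a key
print(1 in d)             # False: 1 is a value, not a key
print(1 in d.values())    # True, found by scanning the values
print('z' not in d)       # True
```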

#### Removing elements from a dictionary

We can use the `del` statement, the `.pop(key)` method or the `.popitem()` method to remove a key from a dictionary:

In [29]:
d = dict.fromkeys('abcd', 0)

In [30]:
d

{'a': 0, 'b': 0, 'c': 0, 'd': 0}

We can remove a key this way. If it doesn't exist, we get a `KeyError` exception.

In [31]:
del d['a']
d

{'b': 0, 'c': 0, 'd': 0}

When the key is popped, the **value** is returned:

In [32]:
print(d.pop('b'))
d

0


{'c': 0, 'd': 0}

We can specify a default value to `.pop()` so that a `KeyError` exception isn't raised when the key can't be found:

In [34]:
print(d.pop('idontexist', None))

None


The `.popitem()` method removes the **last** item that was inserted into the dictionary and returns it as a `(key, value)` tuple (this ordering is guaranteed from Python 3.7 onwards). In other words, **last inserted - popped first -> LIFO**.

In [35]:
print(d.popitem())

('d', 0)
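The LIFO behaviour can be sketched by draining a small dictionary. The keys and the `popped` list are illustrative, and the ordering assumes Python 3.7 or later.

```python
d = {}
d['first'] = 1
d['second'] = 2
d['third'] = 3

popped = []
while d:
    popped.append(d.popitem())   # removes the most recently inserted item

print(popped)   # [('third', 3), ('second', 2), ('first', 1)]

# Once the dictionary is empty, popitem() raises KeyError.
try:
    d.popitem()
except KeyError:
    empty_raises = True
else:
    empty_raises = False
print(empty_raises)   # True
```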


#### Inserting keys with a default

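Sometimes we may want to insert an element in a dictionary with a default value, but only if the element is not already present. The `.setdefault(key, value)` method does this: it inserts `key` with `value` if the key is missing, and in either case returns the value now stored under `key`.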

In [37]:
d = {'a':1, 'b':2, 'c':3}

print(d.setdefault('a', 100))
print(d.setdefault('d', 100))
print(d)

1
100
{'a': 1, 'b': 2, 'c': 3, 'd': 100}


#### Examples

##### Example 1

Here we have a string, and we want to count how many times each character appears in it.
Since we know the alphabet is a-z, we could create a dictionary with these initial keys - but maybe the string contains characters outside of that, maybe punctuation marks, emojis, etc. So it's not really feasible to take that approach.

In [39]:
text = 'Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non-numquam [do] eius modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?'

counts = dict()
for char in text:
    counts[char] = counts.get(char, 0) + 1

print(counts)

{'S': 1, 'e': 77, 'd': 22, ' ': 128, 'u': 69, 't': 65, 'p': 22, 'r': 38, 's': 43, 'i': 76, 'c': 19, 'a': 70, ',': 20, 'n': 37, 'o': 51, 'm': 43, 'v': 15, 'l': 33, 'q': 26, 'b': 5, 'h': 3, 'x': 3, '.': 2, 'N': 1, 'f': 2, 'g': 5, '[': 3, ']': 3, '-': 1, 'U': 1, '?': 2, 'Q': 1}
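As an alternative to the manual loop, the standard library's `collections.Counter` builds the same kind of mapping. This is a swapped-in technique rather than the approach used in these notes, and `sample` is a shortened stand-in for `text`.

```python
from collections import Counter

sample = 'Sed ut perspiciatis, unde omnis'   # shortened stand-in for `text`

# Manual counting, as in the cell above.
counts = {}
for char in sample:
    counts[char] = counts.get(char, 0) + 1

# Counter does the same counting in one call.
counter = Counter(sample)

print(dict(counter) == counts)   # True
```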


##### Example 2

This is a continuation of the first example. What we want to do is create a dictionary with three keys: upper, lower and other. The values of these keys should be any iterable that contains all the upper, lower and other values, respectively. 

Since we don't want repeat characters, the values are going to be sets.

The string module will come in handy:

In [41]:
import string

print(string.ascii_lowercase)
print(string.ascii_uppercase)

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ


In [43]:
import string

text = 'Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non-numquam [do] eius modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?'

categories = {}

for char in text:
    if char in string.ascii_lowercase:
        key = 'lower'
    elif char in string.ascii_uppercase:
        key = 'upper'
    else:
        key = 'other'

    if key not in categories:   
        categories[key] = set()

    categories[key].add(char)

print(categories)

{'upper': {'N', 'Q', 'S', 'U'}, 'lower': {'f', 's', 'e', 'h', 'c', 'p', 'a', 'd', 'n', 'g', 'i', 't', 'v', 'r', 'u', 'o', 'b', 'x', 'q', 'l', 'm'}, 'other': {' ', ']', '.', ',', '?', '[', '-'}}


To make the output more readable:

In [47]:
for key, value in categories.items():
    print(f"{key}: {''.join(value)}")

upper: NQSU
lower: fsehcpadngitvruobxqlm
other:  ].,?[-


We can improve this by using `setdefault()` so that, if the key doesn't exist, we create it with a default value of `set()` and return that set. If the key does exist, we just get the set back.

In [50]:
import string

text = 'Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non-numquam [do] eius modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?'

categories = {}

for char in text:
    if char in string.ascii_lowercase:
        key = 'lower'
    elif char in string.ascii_uppercase:
        key = 'upper'
    else:
        key = 'other'

    value = categories.setdefault(key, set())
    value.add(char)

for key, value in categories.items():
    print(f"{key}: {''.join(value)}")

upper: NQSU
lower: fsehcpadngitvruobxqlm
other:  ].,?[-


To reduce the number of lines in the loop, we can wrap the if-elif-else into a function. We can also make the lookup itself more efficient.

What's more efficient than `if char in string.ascii_lowercase`? This has to scan through the characters of `string.ascii_lowercase` -> O(n) time complexity, where n is the length of the string being searched.

It would be faster if each upper or lower character was a key whose value was 'upper' or 'lower', with any character missing from the dictionary treated as 'other' -> O(1) time complexity per lookup.

In [52]:
import string

# Build the lookup table once, so each call is a single O(1) dictionary lookup
lower = dict.fromkeys(string.ascii_lowercase, 'lower')
upper = dict.fromkeys(string.ascii_uppercase, 'upper')
char_to_category = {**lower, **upper}

def key_category_from_char(char):
    return char_to_category.get(char, 'other')

text = 'Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non-numquam [do] eius modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?'

categories = {}

for char in text:
    key = key_category_from_char(char)

    char_set = categories.setdefault(key, set())
    char_set.add(char)

for key, value in categories.items():
    print(f"{key}: {''.join(value)}")

upper: NQSU
lower: fsehcpadngitvruobxqlm
other:  ].,?[-
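Another option for the grouping step, sketched here as an alternative rather than as the approach of these notes, is `collections.defaultdict`, which creates the empty set automatically on first access. `sample` is a shortened stand-in for `text`, and the output is sorted only to make it reproducible.

```python
import string
from collections import defaultdict

sample = 'Sed ut, Nemo?'   # shortened stand-in for `text`

char_to_category = {
    **dict.fromkeys(string.ascii_lowercase, 'lower'),
    **dict.fromkeys(string.ascii_uppercase, 'upper'),
}

categories = defaultdict(set)   # missing keys start out as an empty set
for char in sample:
    categories[char_to_category.get(char, 'other')].add(char)

for key, value in categories.items():
    print(f"{key}: {''.join(sorted(value))}")
```

The trade-off is that merely reading a missing key from a `defaultdict` inserts it, whereas `setdefault()` makes the insertion explicit at the call site.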


# 03 - Dictionary Views

#### Basics

We already know `d.keys()`, `d.values()` and `d.items()`, each of which produces an iterable. Since order is maintained, zipping up `.keys()` and `.values()` will produce the same output as `.items()`.

All of these views are **read-only**. We cannot modify the dictionary by modifying the views.
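Both claims above can be checked with a small sketch (the flag names `same` and `subscriptable` are illustrative): zipping keys with values matches the items, and a view offers no indexing or assignment.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Zipping keys and values reproduces the items, in the same order.
same = list(zip(d.keys(), d.values())) == list(d.items())
print(same)   # True

# Views cannot be indexed or assigned to; changes go through the dict itself.
try:
    d.keys()[0]
except TypeError:
    subscriptable = False
else:
    subscriptable = True
print(subscriptable)   # False
```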

**Dictionary Views are Dynamic**

**Views are more than just iterables**. This is somewhat unintuitive. If we store *any* of these views in a variable and then modify the dictionary, the variable will reflect this modification. That is to say, a view is not a snapshot: it keeps a reference to the dictionary and reads from it each time it is used.

In [54]:
d = {'a': 1, 'b': 2}

my_items = d.items()
print(my_items)

d['a'] = 100
d['b'] = 200
d['c'] = 300

print(my_items)

dict_items([('a', 1), ('b', 2)])
dict_items([('a', 100), ('b', 200), ('c', 300)])
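Because a view is live, changing the size of the dictionary while iterating over one of its views is an error. The sketch below shows this and one common workaround, iterating over a list snapshot. The upper-cased keys are just an illustrative way to insert new entries.

```python
d = {'a': 1, 'b': 2}

error = None
try:
    for key in d.keys():
        d[key.upper()] = 0        # inserts a new key mid-iteration
except RuntimeError as exc:
    error = str(exc)
print(error)   # dictionary changed size during iteration

# Iterating over a snapshot of the keys is safe.
d = {'a': 1, 'b': 2}
for key in list(d.keys()):
    d[key.upper()] = 0
print(d)       # {'a': 1, 'b': 2, 'A': 0, 'B': 0}
```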


The `keys()` view behaves like a **set**. 

This makes sense since sets are essentially dictionaries with no values: the keys of a dictionary are guaranteed to be unique and hashable, just like the elements of a set.

Therefore, the `.keys()` view has set-like functionality. We can perform unions, intersections and differences.

**The `values()` view does *not* behave like a set**. 

This makes sense since the values are guaranteed to be neither unique nor hashable.
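A brief sketch of what this means in practice: the set operators are simply not defined on a `values()` view, although we can convert to a set ourselves when every value happens to be hashable. The `supported` flag is illustrative.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'c': 2, 'd': 3}

# The values() view does not implement the set operators.
try:
    d1.values() & d2.values()
except TypeError:
    supported = False
else:
    supported = True
print(supported)   # False

# Converting to sets works as long as every value is hashable.
common = set(d1.values()) & set(d2.values())
print(common)      # {2}
```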

**The `items()` view *may* behave like a set**.

We know that each key-value tuple will be unique from one another because each key is guaranteed to be unique. 

The only thing left is whether **all** the values are hashable. Python finds this out when the set operation is carried out: if they are, the `items()` view has set-like behaviour; if not, the operation raises a `TypeError`.

#### Set Operations

We already know the basics of this:

In [1]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}

Unions:

In [2]:
s1 | s2

{1, 2, 3, 4}

Intersections:

In [3]:
s1 & s2

{2, 3}

Differences: 

What is in `s1` that isn't in `s2`:

In [4]:
s1 - s2

{1}

What is in `s2` that isn't in `s1`:

In [5]:
s2 - s1

{4}

To demonstrate the set-like behaviour of dictionary keys:

In [56]:
d1 = {1: None, 2: None, 3: None}
d2 = {2: None, 3: None, 4: None}

In [57]:
d1.keys() | d2.keys()

{1, 2, 3, 4}

In [58]:
d1.keys() & d2.keys()

{2, 3}

In [59]:
d1.keys() - d2.keys()

{1}

In [60]:
d2.keys() - d1.keys()

{4}

We can demonstrate the set-like behaviour working on the `.items()` view given that the values are hashable:

In [4]:
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'c': 30, 'd': 4}

d1.items() | d2.items() 

{('a', 1), ('b', 2), ('c', 3), ('c', 30), ('d', 4)}

As you can see, `('c', 3)` and `('c', 30)` are both present because, while they do have the same key, their values are different, so the key-value pairs are different.

We can show that it doesn't work if the values are unhashable (e.g. lists).

In [5]:
d3 = {'a': [1, 2], 'b': [3, 4]}
d4 = {'b': [30, 40], 'c': [5, 6]}

d3.items() | d4.items() 

TypeError: unhashable type: 'list'

#### Examples

##### Example 1

Let's say we have two dictionaries, and we want to create a new dictionary that contains all the items whose keys are in both dictionaries.
We want the value in the new dictionary to be a tuple containing all the values from both dictionaries:

In [33]:
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'c': 30, 'd': 4}

We can do this with a dictionary comprehension:

In [6]:
new_dict = {key:(d1[key], d2[key]) for key in d1.keys() & d2.keys()}
print(new_dict)

{'b': (2, 2), 'c': (3, 30)}


##### Example 2

###### Part I

For this example, suppose we have two dictionaries, and we want to identify the keys that appear in exactly one of them (not in both), i.e., the **symmetric difference**. We can do this three ways:

Firstly:

In [22]:
d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d2 = {'a': 10, 'b': 20, 'c': 30, 'e': 5}

unique_keys = (d1.keys() - d2.keys()) | (d2.keys() - d1.keys())
print(unique_keys)

{'d', 'e'}


Secondly:

In [23]:
d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d2 = {'a': 10, 'b': 20, 'c': 30, 'e': 5}

unique_keys = (d1.keys() | d2.keys()) - (d2.keys() & d1.keys())
print(unique_keys)

{'d', 'e'}


Thirdly:

In [26]:
d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d2 = {'a': 10, 'b': 20, 'c': 30, 'e': 5}

unique_keys = d1.keys() ^ d2.keys()
print(unique_keys)

{'d', 'e'}


###### Part II

Let's say we also want the value associated with each of these keys. We could do this two ways:

Firstly:

In [28]:
unique_dict = {}

for key in unique_keys:
    if key in d1:
        unique_dict[key] = d1[key]

    else:
        unique_dict[key] = d2[key]

print(unique_dict)

{'d': 4, 'e': 5}


Secondly (more concise):

In [29]:
unique_dict = {key: d1.get(key) or d2.get(key) for key in unique_keys}
print(unique_dict)

{'d': 4, 'e': 5}


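If `d1.get(key)` returns `None` (falsy), the `or` falls through and we get whatever `d2.get(key)` returns. Note that this also happens when the value stored in `d1` is itself falsy (e.g. `0` or `''`), in which case the result is `None`. The explicit loop above does not have this problem.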
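To illustrate the behaviour of `or` with a falsy value, here is a sketch with a modified `d1` that stores `0` under `'d'` (this data is an assumption for illustration, not the dictionaries used above). Testing membership instead of truthiness keeps the comprehension short while preserving the `0`.

```python
d1 = {'a': 1, 'b': 2, 'd': 0}      # note the falsy value 0
d2 = {'a': 10, 'b': 20, 'e': 5}

unique_keys = d1.keys() ^ d2.keys()

# `or` treats the stored 0 as "missing" and falls through to d2.get('d').
with_or = {key: d1.get(key) or d2.get(key) for key in unique_keys}

# Testing membership instead of truthiness keeps the 0.
with_in = {key: d1[key] if key in d1 else d2[key] for key in unique_keys}

print(with_or['d'])   # None
print(with_in['d'])   # 0
```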

# 04 - Updating, Merging and Copying

#### Updating/Merging

There are three forms of the `.update()` method, similar to the different ways we can create dictionaries.

**Updating** `d1` with `d2` means that, for every key-value pair in `d2`: if the key is not in `d1`, the pair is inserted into `d1`; otherwise the value for that key in `d1` is replaced.

##### `d1.update(d2)`

We pass one dictionary `d2` to another dictionary `d1` to update `d1`'s items (key order preserved):

In [39]:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}
d1.update(d2)

print(d1)

{'a': 1, 'b': 20, 'c': 30}


##### `d1.update(iterable)`

The iterable must contain subiterables of the form `(key, value)`, e.g. `iterable = ((key1, value1), (key2, value2))`. The iterable and subiterables do not need to be homogeneous. They can be lists, tuples, list comprehensions, generator expressions or a mix of them - they can be **any iterable**. (key order preserved.)

In [40]:
d1 = {'a': 1, 'b': 2}
d2 = [('b', 20), ['c', 30]]

d1.update(d2)
print(d1)

{'a': 1, 'b': 20, 'c': 30}


##### `d1.update(key1=value1, key2=value2, key3=value3)`

(key order still preserved)

In [41]:
d1 = {'a': 1, 'b': 2}
d1.update(b=20, c=30)

print(d1)

{'a': 1, 'b': 20, 'c': 30}


##### Unpacking Dictionaries

We can also update via unpacking.

The key thing to note is that **last "update" wins**. If you unpack three dictionaries with common keys, the last to be unpacked will determine the value of the common key. For example:

In [42]:
d1 = {'a': 1, 'b': 2}
d2 = {'a': 10, (0,0): 'origin'}
d3 = {'b': 20, 'c': 30, 'a': 100}

d = {**d1, **d2, **d3}
print(d)

{'a': 100, 'b': 20, (0, 0): 'origin', 'c': 30}


As you can see, `d3` was unpacked last, so the value of `a` was determined by `d3`, not `d1` or `d2`.
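On Python 3.9 or later (an assumption about the interpreter version, since these notes do not state one), the same "last wins" merge can also be written with the `|` operator, and `|=` updates in place. A minimal sketch with the same three dictionaries:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'a': 10, (0, 0): 'origin'}
d3 = {'b': 20, 'c': 30, 'a': 100}

# `|` builds a new dictionary; the operands are left unchanged.
merged = d1 | d2 | d3
print(merged == {**d1, **d2, **d3})   # True

# `|=` updates in place, like d1.update(d2).
d1 |= d2
print(d1)   # {'a': 10, 'b': 2, (0, 0): 'origin'}
```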

An example use case of this is if we have multiple configurations such as a default config, a global config, a dev config and a production config. If we unpack a number of them into a new dictionary in the aforementioned order, then we'll have a layered config in which later settings override earlier ones:

In [44]:
conf_defaults = dict.fromkeys(('host', 'port', 'user', 'pwd', 'database'), None)
conf_global = {'port': 5432, 'database': 'deepdive'}
conf_dev = {
    'host': 'localhost',
    'user': 'test',
    'pwd': 'test'
}

conf_prod = {
    'host': 'prodpg.deepdive.com',
    'user': '$prod_user',
    'pwd': '$prod_pwd',
    'database': 'deepdive_prod'
}

So if we wanted the production config, all we'll need to do is:

In [46]:
conf = {**conf_defaults, **conf_global, **conf_prod}
print(conf)

{'host': 'prodpg.deepdive.com', 'port': 5432, 'user': '$prod_user', 'pwd': '$prod_pwd', 'database': 'deepdive_prod'}


#### Copying

##### Shallow copies

Shallow copying creates a new container object whose keys and values are shared references with the original object. Mutating a shared value in place (e.g. appending to a list) through the original is therefore visible through the copy as well. But we *can* **insert, delete and reassign** keys in one without modifying the other because the containers are different.

The three ways to shallow copy are:

- `d_copy = d.copy()`
- `d_copy = {**d}`
- `d_copy = dict(d)`

Therefore, we have to be careful if we have mutable objects in the **keys** or the values.

Notice that this includes **keys**. You would think that you can't have mutable objects for keys. It turns out you can. See later on.
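The distinction between the container and the objects it refers to can be sketched as follows. The dictionary contents are illustrative.

```python
d = {'a': 1, 'c': [3, 4, 5]}
d_copy = d.copy()

# Inserting or rebinding a key affects only one container.
d['z'] = 26
d['a'] = 100
print(d_copy)                  # {'a': 1, 'c': [3, 4, 5]}

# Mutating a shared value in place is visible through both.
d['c'].append(100)
print(d_copy['c'])             # [3, 4, 5, 100]
print(d['c'] is d_copy['c'])   # True
```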

##### Deep copies

A deep copy has no shared references with the original, even in multi-nested dictionaries. The approach is `d_copy = deepcopy(d)` (after `from copy import deepcopy`); dictionaries have no `.deepcopy()` method. The function is general: it works for custom objects, iterables, dictionaries etc.

In [53]:
from copy import deepcopy

d1 = {
    'id': 12345,
    'person': {'name': 'John', 'age': 78},
    'posts': [100, 105, 200]
    }

d2 = deepcopy(d1)

In [54]:
d2['person']['name'] = 'John Cleese'
d2['posts'].append(300)

In [55]:
print(d1)
print(d2)

{'id': 12345, 'person': {'name': 'John', 'age': 78}, 'posts': [100, 105, 200]}
{'id': 12345, 'person': {'name': 'John Cleese', 'age': 78}, 'posts': [100, 105, 200, 300]}


# 05 - Custom Classes and Hashing

Consider the following class:

In [65]:
class Person:
    def __init__(self, name):
        self.name = name

p1 = Person('john')
p2 = Person('john')

By default, instances of custom classes compare equal only if they are the same object, i.e. have the same ID. In other words, `p1 == p2` **iff** `p1 is p2`.

In [66]:
p1 == p2

False

By default, Python makes class instances hashable by deriving the hash from the ID of the instance. Therefore, `p1 == p2` **iff** `p1 is p2`, in which case `id(p1) == id(p2)` and so `hash(p1) == hash(p2)`.

In [67]:
hash(p1)

127613996201

As a result:

In [59]:
d = {p1: 78}
print(d[p1])

78


In [60]:
print(d[p2])

KeyError: <__main__.Person object at 0x000001DB66F33850>

That may not be what we want...

It's somewhat unintuitive as we've seen that `t1 = (1, 2)` and `t2 = (1, 2)` will have the same hash value despite being different objects.

Getting this behaviour is simple. All we need to do is override `__eq__`, making sure the comparison is based on immutable data such as the name, which is a string.

In [69]:
class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.name == other.name
        else:
            return False

p1 = Person('john')
p2 = Person('john')

In [70]:
print(p1 == p2)

True


But once we implement `__eq__`, the class instance is no longer hashable:

In [71]:
hash(p1)

TypeError: unhashable type: 'Person'

This behaviour is understandable because the default hash is determined by the ID. With our new `__eq__`, `p1 == p2` is `True` even though `id(p1) != id(p2)`, so the default hashes would differ. That would violate the requirement that objects which compare equal must have equal hashes.

If we want to define hashing for custom classes, all we need to do is implement `__hash__` and **ensure that it returns an integer** and that if `a == b` then `hash(a) == hash(b)`.

How do we indicate that the class is **not** hashable? We set the `__hash__` *attribute* to `None`; that is, `__hash__ = None`. This is in fact what Python does when we implement the `__eq__` method without also implementing `__hash__`. Note that this assignment is **not** the same as defining the function and returning `None`.
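The implicit `__hash__ = None` can be observed directly. In this sketch, `WithEq` and `Plain` are hypothetical classes introduced only for the comparison.

```python
class WithEq:
    def __eq__(self, other):
        return isinstance(other, WithEq)

class Plain:
    pass

# Defining __eq__ without __hash__ sets __hash__ to None on the class.
print(WithEq.__hash__ is None)   # True
print(Plain.__hash__ is None)    # False: the default hash is inherited

try:
    hash(WithEq())
except TypeError:
    hashable = False
else:
    hashable = True
print(hashable)   # False
```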

Very often we'll hash some particular defining attribute of the class instance, such as the name.

In [72]:
class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.name == other.name
        else:
            return False

    def __hash__(self):
        return hash(self.name)

p1 = Person('john')
p2 = Person('john')

In [73]:
d = {p1: 78}
print(d[p1])

78


We can now access the value stored under `p1` using `p2` because `p1 == p2` and `hash(p1) == hash(p2)`:

In [74]:
print(d[p2])

78
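This also shows why hashing a mutable attribute calls for care. The sketch below uses a compact version of the same `Person` class and assumes, as in CPython, that the dictionary keeps the hash computed at insertion time. Changing `name` after the instance has been used as a key leaves an entry that can no longer be found by an equal key.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Person) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

p1 = Person('john')
d = {p1: 78}

p1.name = 'eric'   # the key now hashes differently from when it was stored

print(Person('john') in d)   # False: hash matches, but p1 no longer equals it
print(Person('eric') in d)   # False: equals p1, but the stored hash is the old one
print(len(d))                # 1: the entry is still in the dictionary
```

A common way to avoid this is to treat the attributes used in `__hash__` as read-only once the object has been created.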
