# Dictionaries and Sets
- Using isinstance with an ABC is often better than checking whether a function argument is of the concrete dict type, because then alternative mapping types can be used.
- To implement a custom mapping, it’s easier to extend collections.UserDict, or to wrap a dict by composition, instead of subclassing these ABCs.
- The collections.UserDict class and all concrete mapping classes in the standard library encapsulate the basic dict in their implementation, which in turn is built on a hash table.

In [6]:
from collections import abc
my_dict = {}
print(isinstance(my_dict, abc.Mapping))
print(isinstance(my_dict, abc.MutableMapping))

True
True
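
As a small sketch of why the ABC check is more flexible than a concrete dict check, the helper below (`describe` is a hypothetical name, not from the original notes) accepts any object that satisfies the Mapping ABC:

```python
from collections import UserDict, abc
from types import MappingProxyType

def describe(mapping):
    # Hypothetical helper: accept anything that satisfies the Mapping ABC,
    # instead of insisting on the concrete dict type.
    if not isinstance(mapping, abc.Mapping):
        raise TypeError(f"expected a mapping, got {type(mapping).__name__}")
    return sorted(mapping)

print(describe({'b': 2, 'a': 1}))                    # plain dict
print(describe(UserDict(b=2, a=1)))                  # UserDict is not a dict subclass
print(describe(MappingProxyType({'b': 2, 'a': 1})))  # read-only proxy
print(isinstance(UserDict(), dict))                  # False
```

A check written as `isinstance(mapping, dict)` would have rejected the last two arguments.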


## Hashable
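- An object is hashable if it has a hash value that never changes during its lifetime (it needs a \_\_hash\_\_() method) and can be compared to other objects (it needs an \_\_eq\_\_() method).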
- Container types are hashable if they are immutable and all contained objects are also hashable.
- A frozenset is always hashable, because every element it contains must be hashable by definition.
- A tuple is hashable only if all its items are hashable.
- The hash value of a correctly implemented object is guaranteed to be constant only within one Python process.
- User-defined types are hashable by default because their hash code is derived from their id() and the \_\_eq\_\_() method inherited from the object class simply compares the object ids.
- If an object implements a custom \_\_eq\_\_() which takes into account its internal state, it will be hashable only if its \_\_hash\_\_() always returns the same hash code.
- In practice, this requires that \_\_eq\_\_() and \_\_hash\_\_() only take into account instance attributes that never change during the life of the object.
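
The rules above can be illustrated with a short sketch. The `Point` class is a hypothetical example; it assumes its attributes are never reassigned after construction, which is what keeps the hash stable.

```python
# A tuple is hashable only if every item it holds is hashable.
tt = (1, 2, (30, 40))
tl = (1, 2, [30, 40])             # contains a list, so it is unhashable
tf = (1, 2, frozenset([30, 40]))
print(hash(tt) == hash((1, 2, (30, 40))))  # True: equal objects, equal hashes
print(isinstance(hash(tf), int))           # True
try:
    hash(tl)
except TypeError as exc:
    print('TypeError:', exc)

class Point:
    """Hypothetical value object; _x and _y are assumed to stay fixed."""
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash only the attributes that __eq__ compares.
        return hash((self._x, self._y))

print(len({Point(1, 2), Point(1, 2)}))  # 1: equal points collapse in a set
```

Note that a class defining \_\_eq\_\_() without \_\_hash\_\_() has its \_\_hash\_\_ set to None, so its instances become unhashable.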

## Dict
- CPython 3.6 started preserving the insertion order of the keys as an implementation detail; Python 3.7 made it an official language guarantee.
- Before Python 3.6, d.popitem() would remove and return an arbitrary key-value pair. Now it always removes and returns the last key-value pair added to the dict.
- A dictcomp builds a dict instance by taking key:value pairs from any iterable.
- The way d.update(m) handles its first argument m is a prime example of duck typing: it first checks whether m has a keys method and, if it does, assumes it is a mapping.
- Otherwise, update() falls back to iterating over m, assuming its items are (key, value) pairs.
- The constructor for most Python mappings uses the logic of update() internally, which means they can be initialized from other mappings or from any iterable object producing (key, value) pairs.
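
A minimal sketch of the behaviors listed above. The `KeysOnly` class is hypothetical and exists only to show that update() needs nothing more than a keys method and item access:

```python
d = {'one': 1}
d.update({'two': 2})                    # a mapping
d.update([('three', 3), ('four', 4)])   # an iterable of (key, value) pairs
d.update(five=5)                        # keyword arguments

class KeysOnly:
    """Hypothetical class: not a dict, but it has keys() and __getitem__."""
    def keys(self):
        return ['six']

    def __getitem__(self, key):
        return 6

d.update(KeysOnly())    # duck typing: treated as a mapping because of keys()
print(d)

# A dictcomp builds a dict from key: value pairs produced by any iterable.
squares = {n: n * n for n in range(4)}
print(squares)
print(squares.popitem())  # (3, 9): the last pair inserted
```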

In [9]:
# Creating a dict in several equivalent ways
a = {'one':1,'two':2, 'three':3}
print(list(a.keys()))
b = dict(one=1, two=2, three=3)
print(list(b.keys()))
c = dict([('one', 1),('two', 2),('three', 3)])
print(list(c.keys()))
d = dict(zip(['one','two','three'],[1,2,3]))
print(list(d.keys()))

['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']


### Handling Missing Keys with setdefault
- In line with Python’s fail-fast philosophy, dict access with d\[k\] raises an error when k is not an existing key.
- Every Pythonista knows that d.get(k, default) is an alternative to d\[k\] whenever a default value is more convenient than handling KeyError.
- However, when the goal is to update a mutable value stored under k, using either d\[k\] or get is awkward and inefficient.
- **Fetching the value with get or an in check and then assigning it back searches for the key at least twice (three times for the "if k not in d" pattern when the key is missing), while setdefault does it all with a single lookup.**

In [32]:
import random
random.seed(10)
power_scores = {}
for _ in range(30):
    num = random.randint(1,10)
    increasing_power = power_scores.get(num, [])
    increasing_power.append(num)
    power_scores[num] = increasing_power
print(power_scores)

{10: [10, 10, 10], 1: [1, 1, 1, 1], 7: [7, 7, 7, 7], 8: [8, 8, 8, 8, 8], 4: [4, 4], 5: [5, 5, 5, 5], 3: [3, 3, 3], 9: [9], 6: [6, 6, 6], 2: [2]}


In [34]:
import random
random.seed(10)
power_scores = {}
for _ in range(30):
    num = random.randint(1,10)
    power_scores.setdefault(num, []).append(num)
print(power_scores)

{10: [10, 10, 10], 1: [1, 1, 1, 1], 7: [7, 7, 7, 7], 8: [8, 8, 8, 8, 8], 4: [4, 4], 5: [5, 5, 5, 5], 3: [3, 3, 3], 9: [9], 6: [6, 6, 6], 2: [2]}
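
One way to observe the number of lookups is to count how often the key is hashed. This is a rough sketch: `CountingKey` is a hypothetical helper, and the counts shown are what CPython produces, not something the language guarantees.

```python
class CountingKey:
    """Hypothetical key type that counts every call to __hash__."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

key = CountingKey('k')

# Check-then-insert: `in`, the assignment, and the final d[key] each hash the key.
d = {}
CountingKey.hash_calls = 0
if key not in d:
    d[key] = []
d[key].append(1)
check_then_insert_calls = CountingKey.hash_calls

# setdefault: the key is hashed once (CPython behavior).
d = {}
CountingKey.hash_calls = 0
d.setdefault(key, []).append(1)
setdefault_calls = CountingKey.hash_calls

print(check_then_insert_calls, setdefault_calls)
```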


### defaultdict: Another Take on Missing Keys
- When instantiating a defaultdict, you provide a callable that is used to produce a default value whenever \_\_getitem\_\_ is passed a nonexistent key argument.
- Given an empty defaultdict created as dd = defaultdict(list), if 'new-key' is not in dd, the expression dd\['new-key'\] does the following steps:
    1. Calls list() to create a new list.
    2. Inserts the list into dd using 'new-key' as key.
    3. Returns a reference to that list.

In [2]:
import random
from collections import defaultdict
random.seed(10)
power_scores = defaultdict(list)
for _ in range(30):
    num = random.randint(1,10)
    power_scores[num].append(num)
print(power_scores)

defaultdict(<class 'list'>, {10: [10, 10, 10], 1: [1, 1, 1, 1], 7: [7, 7, 7, 7], 8: [8, 8, 8, 8, 8], 4: [4, 4], 5: [5, 5, 5, 5], 3: [3, 3, 3], 9: [9], 6: [6, 6, 6], 2: [2]})
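
The three steps listed earlier can be observed directly. This sketch also shows that only d\[k\] triggers the default factory; get and the in operator leave the mapping untouched.

```python
from collections import defaultdict

dd = defaultdict(list)
print('new-key' in dd)         # False: membership tests do not create the key
print(dd.get('new-key'))       # None: get() does not call the default factory
print(len(dd))                 # 0

value = dd['new-key']          # calls list(), inserts the result, returns it
print(value)                   # []
print('new-key' in dd)         # True: the key now exists
print(value is dd['new-key'])  # True: the returned list is the stored one
```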


### \_\_missing\_\_
- Underlying the way mappings deal with missing keys is the aptly named \_\_missing\_\_ method.
- This method is not defined in the base dict class, but dict is aware of it: if you subclass dict and provide a \_\_missing\_\_ method, the standard dict.\_\_getitem\_\_ will call it whenever a key is not found, instead of raising KeyError.
- **The \_\_missing\_\_ method is only called by \_\_getitem\_\_ (i.e., for the d\[k\] operator). The presence of a \_\_missing\_\_ method has no effect on the behavior of other methods that look up keys, such as get or \_\_contains\_\_ (which implements the in operator).**
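
A minimal sketch of that limitation, using a hypothetical `ZeroDict` subclass that defines nothing but \_\_missing\_\_:

```python
class ZeroDict(dict):
    """Hypothetical subclass: missing keys read as 0 and are not stored."""
    def __missing__(self, key):
        return 0

z = ZeroDict(a=1)
print(z['a'], z['b'])  # 1 0: d[k] falls back to __missing__
print(z.get('b'))      # None: get() ignores __missing__
print('b' in z)        # False: __contains__ ignores __missing__
print(len(z))          # 1: this __missing__ did not insert the key
```

This is why a class that wants consistent behavior usually has to override get and \_\_contains\_\_ as well.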

In [38]:
"""A mapping that lets string keys be looked up as ints. A typical use is a library that maps the GPIO hardware pins of a Raspberry Pi,
so gpio['1'] and gpio[1] should return the same information."""

class TestDict(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ when key is not found: either a str key is absent, or the key is not a str (e.g. an int).
        # Without the isinstance check, a missing str key would call self[str(key)] again and recurse infinitely.
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]
    
    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default
    
    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()

In [39]:
a = TestDict([('1','one'),('2','two')])
print(a)
print(a['1'],a[1])
a['4'] = 'four'  
# a[5] = 'five'  # would also work, but it would store an int key, so a[5] and a['5'] would no longer agree
print(a)
print(a[4], a['4'])
# print(a[5], a['5'])

print(a[3])  # the stack trace shows how infinite recursion was avoided

{'1': 'one', '2': 'two'}
one one
{'1': 'one', '2': 'two', '4': 'four'}
four four


KeyError: '3'

- A search like k in my_dict.keys() is efficient in Python 3 even for very large mappings because dict.keys() returns a view, which is similar to a set.
- However, remember that k in my_dict does the same job, and is faster because it avoids the attribute lookup to find the .keys method.

### Variations of dict
- collections.OrderedDict:
    - Maintains keys in insertion order, allowing iteration over items in a predictable order.
    - The popitem method of an OrderedDict pops the last item by default, but if called as my_odict.popitem(last=False), it pops the first item added.
    - **Now that the built-in dict also preserves insertion order, the main reason to use OrderedDict is writing code that is backward-compatible with earlier Python versions.**
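
A short sketch of the two popitem modes, using made-up keys:

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(od.popitem())            # ('c', 3): last item added (LIFO)
print(od.popitem(last=False))  # ('a', 1): first item added (FIFO)
print(list(od.items()))        # [('b', 2)]
```

The built-in dict.popitem() takes no arguments and always behaves like the default case.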

### Building a Custom Mapping: Subclassing UserDict
- The main reason why it’s better to subclass UserDict rather than dict is that the **built-in has some implementation shortcuts that end up forcing us to override methods that we can just inherit from UserDict with no problems.**
- **Note that UserDict does not inherit from dict, but uses composition: it has an internal dict instance, called data.**
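
The sketch below shows one such shortcut as it behaves in CPython: the constructor and update() of the built-in dict bypass an overridden \_\_setitem\_\_, while UserDict routes them through it. `UpperDict` and `UpperUserDict` are hypothetical classes written only for this comparison.

```python
from collections import UserDict

class UpperDict(dict):
    """Hypothetical dict subclass that tries to upper-case every key."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

class UpperUserDict(UserDict):
    """Same idea, built on UserDict and its internal data dict."""
    def __setitem__(self, key, value):
        self.data[key.upper()] = value

d = UpperDict(a=1)   # dict.__init__ ignores the overridden __setitem__
d['b'] = 2           # the [] operator does call it
d.update(c=3)        # dict.update ignores it as well
print(d)             # {'a': 1, 'B': 2, 'c': 3}

u = UpperUserDict(a=1)
u['b'] = 2
u.update(c=3)
print(u)             # {'A': 1, 'B': 2, 'C': 3}
print(isinstance(u, dict), type(u.data).__name__)  # False dict
```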

In [40]:
from collections import UserDict
class TestDict(UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]
    
    def __contains__(self, key):
        return str(key) in self.data
    
    def __setitem__(self, key, value):
        self.data[str(key)] = value


In [41]:
a = TestDict([('1','one'),('2','two')])
print(a)
print(a['1'],a[1])
a['4'] = 'four' 
a[5] = 'five'
print(a)
print(a[4], a['4'])
print(a[5], a['5'])
print(a[3])  # the stack trace shows how infinite recursion was avoided

{'1': 'one', '2': 'two'}
one one
{'1': 'one', '2': 'two', '4': 'four', '5': 'five'}
four four
five five


KeyError: '3'

### Immutable Mappings
- The mapping types provided by the standard library are all mutable, but you may need to guarantee that a user cannot change a mapping by mistake.
- For example, if a board.pins mapping describes the physical GPIO pins of a device, it’s nice to prevent inadvertent updates to it: the hardware can’t be changed via software, so any change in the mapping would make it inconsistent with the physical reality of the device.
- Since Python 3.3, the types module provides a wrapper class called MappingProxyType, which, given a mapping, returns **a mappingproxy instance that is a read-only but dynamic proxy for the original mapping.**

In [46]:
from types import MappingProxyType
d = {1:'A', 2:'B'}
d_proxy = MappingProxyType(d)
print(d_proxy[1])
print(d_proxy.items())
d_proxy[1] = 'X'

A
dict_items([(1, 'A'), (2, 'B')])


TypeError: 'mappingproxy' object does not support item assignment

In [47]:
d[3] = 'C'
print(d_proxy)

{1: 'A', 2: 'B', 3: 'C'}
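
To connect this with the board.pins scenario, here is a minimal sketch. The `Board` class and the pin labels are hypothetical and do not come from any real hardware library.

```python
from types import MappingProxyType

class Board:
    """Hypothetical board class; the pin labels are made up for illustration."""
    def __init__(self, pins):
        self._pins = dict(pins)                   # private, mutable copy
        self.pins = MappingProxyType(self._pins)  # public, read-only view

board = Board({1: '3V3', 2: '5V'})
print(board.pins[1])           # 3V3
try:
    board.pins[1] = 'GND'      # clients cannot write through the proxy
except TypeError as exc:
    print('TypeError:', exc)

board._pins[3] = 'GND'         # internal code updates the real dict
print(dict(board.pins))        # the proxy reflects the change
```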


## Sets
### Similarities between set and dict views
- The view objects dict_keys and dict_items implement the special methods to support the powerful set operators & (intersection), | (union), - (difference) and ^ (symmetric difference).

In [52]:
a = {1:'A',2: 'B'}
a_lower = {1:'a',2:'b',3:'c'}
a_key_set = {1,2,3,4,5,6}
print(a.keys() & a_lower.keys())
print(a.keys()& a_key_set)

{1, 2}
{1, 2}
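
The remaining operators work the same way. A brief sketch with made-up dicts, noting that the result of each operation is a plain set and that items() views need hashable values for this to work:

```python
a = {1: 'A', 2: 'B'}
b = {2: 'B', 3: 'c'}

print(a.keys() | b.keys())          # union of the keys
print(a.keys() - b.keys())          # keys only in a
print(a.keys() ^ b.keys())          # keys in exactly one of the dicts
print(a.items() & b.items())        # (key, value) pairs present in both
print(type(a.keys() & b.keys()))    # <class 'set'>
```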
