
# Dictionaries and Sets

<p> Class and instance attributes, module namespaces, and function keyword arguments are some of the core Python constructs represented by dictionaries in memory. </p>
<p> Hash tables are the engines behind Python's high-performance dicts. </p>


In [None]:
dial_codes = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
    (852, 'Hong Kong'),
    (82, 'South Korea'),
    (234, 'Nigeria'),
    (92, 'Pakistan'),
    (7, 'Russia'),
    (1, 'United States'),
]

In [None]:
country_dial = { country:code for code, country in dial_codes}

In [None]:
country_dial

{'Bangladesh': 880,
 'Brazil': 55,
 'China': 86,
 'India': 91,
 'Hong Kong': 852,
 'South Korea': 82,
 'Nigeria': 234,
 'Pakistan': 92,
 'Russia': 7,
 'United States': 1}

In [None]:
{code: country.upper()
  for country, code in sorted(country_dial.items())
  if code < 70}

{55: 'BRAZIL', 7: 'RUSSIA', 1: 'UNITED STATES'}

In [None]:
def dump(**kwargs): # we can apply ** to more than one argument in a function call
  return kwargs

dump(**{'x': 1}, y=2, **{'z': 3})

{'x': 1, 'y': 2, 'z': 3}

In [None]:
{'a': 0, **{'x': 1}, 'y': 2, **{'z': 3, 'x': 4}} # duplicate keys are allowed; later occurrences overwrite earlier ones

{'a': 0, 'x': 4, 'y': 2, 'z': 3}

## Merging Mappings with `|`

In [None]:
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}
d1 | d2

{'a': 2, 'b': 4, 'c': 6}

In [None]:
d2 | d1

{'a': 1, 'b': 3, 'c': 6}

In [None]:
d1 |= d2

In [None]:
d1

{'a': 2, 'b': 4, 'c': 6}

## Pattern Matching with Mappings

In [None]:
def get_creators(record: dict) -> list:
  match record:
    case {'type': 'book', 'api': 2, 'authors': [*names]}:
      return names

    case {'type': 'book', 'api': 1, 'author': name}:
      return [name]

    case {'type': 'book'}:
      raise ValueError(f"Invalid 'book' record: {record!r}")

    case {'type': 'movie', 'director': name}:
      return [name]

    case _:
      raise ValueError(f"Invalid record: {record!r}")


In [None]:
b1 = dict(api=1, author='Douglas Hofstadter', type='book', title='Godel, Escher, Bach')

In [None]:
get_creators(b1)

['Douglas Hofstadter']

In [None]:
from collections import OrderedDict
b2 = OrderedDict(api=2, type='book', title='Python in a Nutshell', authors='Martelli Ravenscroft Holden'.split())
get_creators(b2)

['Martelli', 'Ravenscroft', 'Holden']

In [None]:
get_creators({'type': 'book', 'pages': 770})

ValueError: Invalid 'book' record: {'type': 'book', 'pages': 770}

In [None]:
food = dict(category='ice cream', flavor='vanilla', cost=199)
match food:
  case {'category': 'ice cream', **details}:
    print(f"Ice cream details: {details}")

Ice cream details: {'flavor': 'vanilla', 'cost': 199}


<p>Note that the automatic handling of missing keys is not triggered, because pattern matching always looks keys up with the <code>d.get(key, sentinel)</code> method.</p>

### Standard API of mapping type

In [None]:
from collections import abc

In [None]:
my_dict = {}
isinstance(my_dict, abc.Mapping)

True

In [None]:
isinstance(my_dict, abc.MutableMapping)

True

In [None]:
my_dict.get('hi', 0)

0

In [None]:
my_dict

{}

### What is Hashable
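<p>An object is hashable if it has a hash code which never changes during its lifetime (it requires <code>__hash__</code>).</p>
<p>It must also be comparable to other objects (it requires <code>__eq__</code>). Hashable objects that compare equal must have the same hash code.</p>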

In [None]:
tt = (1, 2, (30, 40))
hash(tt)

-3907003130834322577

In [None]:
tl = (1, 2, [30, 40])
hash(tl)

TypeError: unhashable type: 'list'

In [None]:
tf = (1, 2, frozenset([30, 40]))
hash(tf)

5149391500123939311
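The same rules apply to user-defined classes. The sketch below uses a hypothetical `Point` class (not part of the original notebook) whose `__hash__` is computed from the same data that `__eq__` compares, so equal instances can be used as the same dictionary key:

```python
class Point:
    """Hypothetical value object that is safe to use as a dict key."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash the same data that __eq__ compares.
        return hash((self._x, self._y))

p1 = Point(1, 2)
p2 = Point(1, 2)
labels = {p1: 'first'}
found = labels[p2]   # p2 is equal to p1 and has the same hash
print(found)
```

The attributes are treated as read-only by convention here. If they changed after the object was used as a key, the hash code would change and lookups would break.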

In [None]:
d = {'hi': 1, 3: 'there'}
d.clear()
d

In [None]:
d = {'hi': 1, 3: 'there'}
'hi' in d

True

In [None]:
d.copy() == d

True

In [None]:
import copy
copy.copy(d) == d

True

In [None]:
from collections import defaultdict

In [None]:
def def_value():
  return "hi there"
a = defaultdict(def_value)
a["3"] = "hi"
a[2] = "3"
a[(2, 3)] = "hello"
a.default_factory()

del a["3"]

In [None]:
a

defaultdict(<function __main__.def_value()>, {2: '3', (2, 3): 'hello'})

In [None]:
a.get((3, 4))

In [None]:
for k, v in a.items():
  del a[k]

RuntimeError: dictionary changed size during iteration

In [None]:
a

defaultdict(<function __main__.def_value()>,
            {(2, 3): 'hello', '3': 'hi there'})

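## Inserting or Updating Mutable Values
<p>If you want to retrieve a mutable value and update it, <code>dict.setdefault</code> is a better way than calling <code>dict.get</code> and then assigning the result back: it does the job with a single key lookup.</p>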
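A small sketch of the two approaches, using a made-up `index` dict of word locations (the keys and tuples are illustrative only):

```python
index = {}

# With get: fetch (or create) the list, update it, then store it back.
# This needs at least two lookups for the same key.
occurrences = index.get('better', [])
occurrences.append((1, 14))
index['better'] = occurrences

# With setdefault: return the existing list, or insert the default and
# return it when the key is missing, so the update happens in one step.
index.setdefault('better', []).append((2, 13))
index.setdefault('is', []).append((1, 11))

print(index)
```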

In [None]:
!python index0.py zen.txt

a [(17, 48), (18, 53)]
Although [(9, 1), (14, 1), (16, 1)]
ambiguity [(12, 16)]
and [(13, 23)]
are [(19, 12)]
aren [(8, 15)]
at [(14, 38)]
bad [(17, 50)]
be [(13, 14), (14, 27), (18, 50)]
beats [(9, 23)]
Beautiful [(1, 1)]
better [(1, 14), (2, 13), (3, 11), (4, 12), (5, 9), (6, 11), (15, 8), (16, 25)]
break [(8, 40)]
cases [(8, 9)]
complex [(3, 23)]
Complex [(4, 1)]
complicated [(4, 24)]
counts [(7, 13)]
dense [(6, 23)]
do [(13, 64), (19, 48)]
Dutch [(14, 61)]
easy [(18, 26)]
enough [(8, 30)]
Errors [(10, 1)]
explain [(17, 34), (18, 34)]
Explicit [(2, 1)]
explicitly [(11, 8)]
face [(12, 8)]
first [(14, 41)]
Flat [(5, 1)]
good [(18, 55)]
great [(19, 28)]
guess [(12, 52)]
hard [(17, 26)]
honking [(19, 20)]
idea [(17, 54), (18, 60), (19, 34)]
If [(17, 1), (18, 1)]
implementation [(17, 8), (18, 8)]
implicit [(2, 25)]
In [(12, 1)]
is [(1, 11), (2, 10), (3, 8), (4, 9), (5, 6), (6, 8), (15, 5), (16, 16), (17, 23), (18, 23)]
it [(13, 67), (17, 43), (18, 43)]
let [(19, 42)]
may [(14, 19), (18, 

## Automatic Handling of Missing Keys

<p> There are two ways to do this: </p>

 1. Use `defaultdict`.
 2. Subclass `dict` or any other mapping type and add a `__missing__` method.

In [None]:
!python index_default.py zen.txt

a [(17, 48), (18, 53)]
Although [(9, 1), (14, 1), (16, 1)]
ambiguity [(12, 16)]
and [(13, 23)]
are [(19, 12)]
aren [(8, 15)]
at [(14, 38)]
bad [(17, 50)]
be [(13, 14), (14, 27), (18, 50)]
beats [(9, 23)]
Beautiful [(1, 1)]
better [(1, 14), (2, 13), (3, 11), (4, 12), (5, 9), (6, 11), (15, 8), (16, 25)]
break [(8, 40)]
cases [(8, 9)]
complex [(3, 23)]
Complex [(4, 1)]
complicated [(4, 24)]
counts [(7, 13)]
dense [(6, 23)]
do [(13, 64), (19, 48)]
Dutch [(14, 61)]
easy [(18, 26)]
enough [(8, 30)]
Errors [(10, 1)]
explain [(17, 34), (18, 34)]
Explicit [(2, 1)]
explicitly [(11, 8)]
face [(12, 8)]
first [(14, 41)]
Flat [(5, 1)]
good [(18, 55)]
great [(19, 28)]
guess [(12, 52)]
hard [(17, 26)]
honking [(19, 20)]
idea [(17, 54), (18, 60), (19, 34)]
If [(17, 1), (18, 1)]
implementation [(17, 8), (18, 8)]
implicit [(2, 25)]
In [(12, 1)]
is [(1, 11), (2, 10), (3, 8), (4, 9), (5, 6), (6, 8), (15, 5), (16, 16), (17, 23), (18, 23)]
it [(13, 67), (17, 43), (18, 43)]
let [(19, 42)]
may [(14, 19), (18, 


In [None]:
import collections
import re
import sys

WORD_RE = re.compile(r'\w+')

# list function is default_factory
# which returns an empty list for a missing key
index = collections.defaultdict(list)
with open(sys.argv[1], encoding='utf-8') as fp:
  for line_no, line in enumerate(fp, 1):
    for match in WORD_RE.finditer(line):
      word = match.group()
      column_no = match.start() + 1
      location = (line_no, column_no)
      # if word is not initially in the index,
      # the default_factory is called to produce the missing value
      # which is an empty list in this case
      # that is assigned to index[word]
      index[word].append(location)

for word in sorted(index, key=str.upper):
  print(word, index[word])

In [None]:
!python index_default.py zen.txt

In [None]:
class StrKeyDict0(dict):
  def __missing__(self, key):
    if isinstance(key, str):
      raise KeyError(key)
    # give the lookup a second chance only when key is not a str (this prevents infinite recursion)
    return self[str(key)]

  def get(self, key, default=None):
    # print(f"Hi, you are trying to get the value associated with {key}")
    try:
      return self[key]
    except KeyError:
      return default

  def __contains__(self, key):
    return key in self.keys() or str(key) in self.keys()

In [None]:
d = StrKeyDict0([('2', 'two'), ('4', 'four')])
d[4]

'four'

In [None]:
d_test = dict([('2', 'two'), ('4', 'four')])
d_test.keys()

dict_keys(['2', '4'])

## `collections.ChainMap`

- A `ChainMap` instance holds a list of mappings that can be searched as one.

In [None]:
d1 = dict(a=1, b=3)
d2 = dict(a=2, b=4, c=6)
from collections import ChainMap

chain = ChainMap(d1, d2)
print(chain['a'])
print(chain['b'])

1
3


In [None]:
chain['c'] = -1

In [None]:
d1

{'a': 1, 'b': 3, 'c': -1}

In [None]:
d2

{'a': 2, 'b': 4, 'c': 6}

In [None]:
# Updates or insertions only affect the first input mapping.
# This is useful for implementing interpreters for languages with nested scopes.
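As a rough sketch of the nested-scope idea, the example below models a global scope and one function-local scope. The variable names are made up. `ChainMap.new_child` and `ChainMap.parents` are part of the standard API.

```python
from collections import ChainMap

global_scope = {'x': 1, 'y': 2}
scopes = ChainMap(global_scope)

# Entering a function: push a new, empty mapping at the front of the chain.
local_scope = scopes.new_child()
local_scope['x'] = 10          # shadows the global x, does not overwrite it

inner_x = local_scope['x']     # found in the first mapping
inner_y = local_scope['y']     # falls through to global_scope

# Leaving the function: .parents is the chain without the first mapping.
outer = local_scope.parents
outer_x = outer['x']

print(inner_x, inner_y, outer_x)
```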

## `collections.Counter`
- a mapping that holds an integer count for each key

In [None]:
import collections

In [None]:
ct = collections.Counter('abracadabra')
ct

Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

In [None]:
ct.update('aaaaazzz')

In [None]:
ct

Counter({'a': 10, 'b': 2, 'r': 2, 'c': 1, 'd': 1, 'z': 3})

In [None]:
ct.most_common(3)

[('a', 10), ('z', 3), ('b', 2)]

## `shelve.Shelf`

The module-level function `shelve.open` returns a `shelve.Shelf` instance, a simple key-value DBM database backed by the `dbm` module.
 - `shelve.Shelf` subclasses `abc.MutableMapping`.
 - It provides a few other I/O management methods, like `sync` and `close`.
 - A `Shelf` instance is a context manager, so you can use a `with` block to make sure it is closed after use.
 - Keys and values are saved whenever a new value is assigned to a key.
 - Keys must be strings.
 - Values must be objects that the `pickle` module can serialize.
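The points above can be sketched as follows. The file name `demo_shelf` and the stored data are placeholders, and a temporary directory is used so the example leaves nothing behind:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_shelf')  # hypothetical file name

    # Keys are strings; values can be any picklable object.
    with shelve.open(path) as db:
        db['dial_codes'] = {'Brazil': 55, 'India': 91}
        db['tags'] = ['a', 'b']

    # Reopen the shelf to confirm the data was persisted.
    with shelve.open(path) as db:
        stored_keys = sorted(db.keys())
        dial_codes = db['dial_codes']

print(stored_keys, dial_codes)
```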


## Subclassing `UserDict`
 - Because `UserDict` extends `abc.MutableMapping`, the remaining methods that make `StrKeyDict` a full-fledged mapping are inherited from `UserDict`, `MutableMapping`, or `Mapping`

In [2]:
import collections

class StrKeyDict(collections.UserDict):

  def __missing__(self, key):
    if isinstance(key, str):
      raise KeyError(key)
    return self[str(key)]

  def __contains__(self, key):
    return str(key) in self.data

  def __setitem__(self, key, item):
    print("Huh, you try to set a item?!")
    # setitem converts any key to str
    self.data[str(key)] = item

In [5]:
strkeydict = StrKeyDict({'a': 2, '3': 2})

Huh, you try to set a item?!
Huh, you try to set a item?!


In [6]:
strkeydict[4] = "hi there"

Huh, you try to set a item?!


In [16]:
# MutableMapping.update ends up calling our implementation of __setitem__
strkeydict.update({'4': 'hey there'})
strkeydict[4]

Huh, you try to set a item?!


'hey there'

In [None]:
# StrKeyDict inherits Mapping.get, which is implemented like StrKeyDict0.get

## Immutable Mapping

- Concrete use case: in a hardware programming library, the `board.pins` mapping represents the physical GPIO pins on the device. It's useful to prevent inadvertent updates to `board.pins` because the hardware can't be changed via software.

In [17]:
from types import MappingProxyType
d = {1: "A"}
d_proxy = MappingProxyType(d)
d_proxy

mappingproxy({1: 'A'})

In [18]:
d_proxy[1]

'A'

In [19]:
d_proxy[2]

KeyError: 2

In [20]:
d[2] = 'B'

In [21]:
d_proxy[2]

'B'

In [22]:
d_proxy[2] = 'B'

TypeError: 'mappingproxy' object does not support item assignment
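The `board.pins` use case mentioned at the start of this section might look like the sketch below. The `Board` class and the pin table are hypothetical, not taken from any real hardware library:

```python
from types import MappingProxyType

class Board:
    """Hypothetical board whose pin table is fixed by the hardware."""

    def __init__(self, pins):
        self._pins = dict(pins)                    # private, mutable copy
        self.pins = MappingProxyType(self._pins)   # public, read-only view

board = Board({'GPIO17': 11, 'GPIO27': 13})   # illustrative pin numbers
pin = board.pins['GPIO17']

try:
    board.pins['GPIO17'] = 99    # clients cannot change the mapping
except TypeError as exc:
    error = str(exc)

print(pin, error)
```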

## Dictionary Views
 - Views allow high-performance operations on a `dict` without unnecessary copying of data.
 - `.keys()`, `.values()`, and `.items()` return instances of classes called `dict_keys`, `dict_values`, and `dict_items`.
  * `dict_keys`, `dict_values`, and `dict_items` are internal classes (not available via `__builtins__`).
 - A view object is a dynamic proxy: when the source `dict` is updated, the change is immediately visible through an existing view.

In [23]:
d = dict(a=10, b=20, c=30)
values = d.values()

In [30]:
values # __repr__ of view object shows its content

dict_values([10, 20, 30])

In [26]:
len(values)

3

In [31]:
list(values) # values are iterable -> can create a list

[10, 20, 30]

In [28]:
reversed(values)

<dict_reversevalueiterator at 0x7aedb33ea390>

In [29]:
values[0]

TypeError: 'dict_values' object is not subscriptable

In [32]:
d['z'] = 99

In [33]:
d

{'a': 10, 'b': 20, 'c': 30, 'z': 99}

In [34]:
values

dict_values([10, 20, 30, 99])

## Practical Consequences of How `dict` Works
 - Keys must be hashable objects. They must implement proper `__hash__` and `__eq__` methods.

In [35]:
a = {}
b = [1, 2, 3]
a[b] = 3

TypeError: unhashable type: 'list'

In [36]:
class Test:
  def __init__(self):
    print("Hi this class is initialized")

  def test(self, input):
    self.inp = input
    print('hi')

In [37]:
a = Test()

Hi this class is initialized


In [38]:
a.test('ohohoh')

hi


### Tips
 - To save memory, avoid creating instance attributes outside of the `__init__` method.
  * By default, Python stores instance attributes in a `__dict__` attribute (unless the class has a `__slots__` attribute).
  * Instances of a class can share a common hash table, stored with the class, as long as they have the same attribute names when `__init__` returns.
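To make the `__dict__` versus `__slots__` distinction concrete, here is a small sketch with two made-up classes. It only shows where attributes are stored, not how much memory is saved:

```python
class Pixel:
    """Regular class: each instance gets its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPixel:
    """Attributes live in fixed slots, so instances have no __dict__."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Pixel(1, 2)
attrs = p.__dict__                  # instance attributes stored in a dict

s = SlottedPixel(1, 2)
has_dict = hasattr(s, '__dict__')   # no per-instance dict

try:
    s.color = 'red'                 # not listed in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False

print(attrs, has_dict, rejected)
```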

## Set
The `set` and `frozenset` types are implemented with a hash table.

In [1]:
l = ['spam', 'spam', 'eggs', 'spam', 'bacon', 'eggs']
set(l)

In [2]:
list(set(l))

['spam', 'eggs', 'bacon']

In [4]:
# If you want to (1) remove duplicates and (2) preserve the order of the first occurrences
dict.fromkeys(l).keys()

dict_keys(['spam', 'eggs', 'bacon'])

In [5]:
list(dict.fromkeys(l).keys())

['spam', 'eggs', 'bacon']

Set elements must be hashable. The `set` type is not hashable, so you can't build a set with nested `set` instances. But `frozenset` is hashable, so you can have `frozenset` elements inside a `set`.

In [6]:
l = [1, 2]
set(['a', l, 'b'])

TypeError: unhashable type: 'list'

In [12]:
frozen_l = frozenset(l)
frozen_m = frozenset(l)
k = set(['a', frozen_l, frozen_m, 'b'])

In [13]:
k

{'a', 'b', frozenset({1, 2})}

In [17]:
a = {1, 2, 3} # set literal
type(a)
b = {2, 3, 4}
type(b)
a | b
a ^ b

{1, 4}

Note that there is no literal for the empty set; you must write `set()`.
If you write `{}`, you create an empty `dict`.
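A quick check of the difference (the variable names are arbitrary):

```python
empty_dict = {}        # this is a dict, not a set
empty_set = set()      # the only way to spell an empty set

kinds = (type(empty_dict).__name__, type(empty_set).__name__)
shown = repr(empty_set)   # the repr of an empty set is also 'set()'

print(kinds, shown)
```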

In [18]:
s = {1}
type(s)

set

In [19]:
s.pop()

1

In [20]:
s

set()

In [21]:
s = {1, 2}
s.pop()

1

In [22]:
s

{2}

In [None]:
# Python runs a specialized BUILD_SET bytecode
s = {1, 2, 3}
# slower because Python has to look up the set name to fetch the constructor
# then build a list, and finally pass it to the constructor
s = set([1, 2, 3])

In [25]:
%%timeit
s = {1, 2, 3}

83.2 ns ± 1.22 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [24]:
%%timeit
s = set([1, 2, 3])

360 ns ± 204 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [26]:
frozenset(range(10)) # there is no literal syntax for frozenset; it must be created by calling the constructor

frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

In [46]:
from unicodedata import name
{chr(i) for i in range(32, 256) if 'SIGN' in name(chr(i), '')}

{'#',
 '$',
 '%',
 '+',
 '<',
 '=',
 '>',
 '¢',
 '£',
 '¤',
 '¥',
 '§',
 '©',
 '¬',
 '®',
 '°',
 '±',
 'µ',
 '¶',
 '×',
 '÷'}

Adding elements to a set may change the order of existing elements. That's because the hash table becomes less efficient when it is more than 2/3 full, so Python may need to move and resize the table as it grows.

In [50]:
s = {'a', 'b', 'c'}

In [60]:
t = {'a', 'b', 'c', 'd'}

## Set Operations on dict Views

 - View objects returned by the `dict` methods `.keys()` and `.items()` are remarkably similar to `frozenset`.

In [62]:
d1 = dict(a=1, b=2, c=3, d=4)
d2 = dict(b=20, d=40, e=50)

d1.keys() & d2.keys()

{'b', 'd'}

In [63]:
s = {'a', 'e', 'i'}
d1.keys() & s

{'a'}

In [66]:
d1.keys() | s

{'a', 'b', 'c', 'd', 'e', 'i'}

### Warning
A `dict_items` view only works as a set if all values in the `dict` are hashable.

In [68]:
warning_d = dict(a=[1, 2, 3], b=2)

warning_d.items() | s

TypeError: unhashable type: 'list'
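By contrast, when every value is hashable, `dict_items` views support the usual set operators. A minimal sketch with made-up data:

```python
d1 = dict(a=1, b=2, c=3)
d2 = dict(a=1, b=20, d=4)

# Each item is a (key, value) tuple, so both key and value must match.
common_items = d1.items() & d2.items()
only_in_d1 = d1.items() - d2.items()

print(common_items, only_in_d1)
```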