# Dictionaries and Sets

The dict type is not only widely used in our programs but also a fundamental part of the Python implementation. Module namespaces, class and instance attributes, and function keyword arguments are some of the core constructs built on dictionaries. Even the built-in functions live in a dictionary: __builtins__.__dict__.

In [2]:
__builtins__.__dict__

{'__name__': 'builtins',
 '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.",
 '__package__': '',
 '__loader__': _frozen_importlib.BuiltinImporter,
 '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>),
 '__build_class__': <function __build_class__>,
 '__import__': <function __import__>,
 'abs': <function abs(x, /)>,
 'all': <function all(iterable, /)>,
 'any': <function any(iterable, /)>,
 'ascii': <function ascii(obj, /)>,
 'bin': <function bin(number, /)>,
 'callable': <function callable(obj, /)>,
 'chr': <function chr(i, /)>,
 'compile': <function compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)>,
 'delattr': <function delattr(obj, name, /)>,
 'dir': <function dir>,
 'divmod': <function divmod(x, y, /)>,
 'eval': <function eval(source, globals=None, locals=None, /)>,
 'exec': <function exec(source, globals=None, locals=None, /)>,

Because of their crucial role, Python dicts are highly optimized. Hash tables are the engines behind Python’s high-performance dicts.
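
Because a hash table locates each entry by the hash of its key, dict keys have to be hashable. A minimal sketch of that constraint follows; the prices mapping and its keys are made up for illustration.

```python
# A dict finds each entry through hash(key), so every key must be hashable.
prices = {}
prices[("apple", "kg")] = 3.5        # a tuple of hashable items is a valid key
prices[frozenset({"a", "b"})] = 1.0  # so is a frozenset

try:
    prices[["apple", "kg"]] = 3.5    # a list is mutable, hence unhashable
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the error message varies between Python versions, but it always names the unhashable type.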

In [4]:
from collections import abc

my_dict = {}
isinstance(my_dict, abc.Mapping)

True

Ways to build dictionaries...

In [6]:
dict(one=1, two=2, three=3)

{'one': 1, 'two': 2, 'three': 3}

In [7]:
{'one': 1, 'two': 2, 'three': 3}

{'one': 1, 'two': 2, 'three': 3}

In [8]:
dict(zip(['one', 'two', 'three'], [1, 2, 3]))

{'one': 1, 'two': 2, 'three': 3}

In [9]:
dict([('two', 2), ('one', 1), ('three', 3)])

{'two': 2, 'one': 1, 'three': 3}

In [10]:
dict({'three': 3, 'one': 1, 'two': 2})

{'three': 3, 'one': 1, 'two': 2}
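
All five forms above build equal dictionaries. Equality between dicts ignores insertion order, even though each dict remembers the order in which its keys were added (Python 3.7+). A quick check, with the variable names a to e chosen just for this sketch:

```python
a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2})

# Dict equality compares key/value pairs, not insertion order.
all_equal = a == b == c == d == e
print(all_equal)   # True

# Iteration still follows each dict's own insertion order.
print(list(d))     # ['two', 'one', 'three']
```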

## Dict Comprehensions

In [11]:
DIAL_CODES = [
    (86, 'China'),
    (91, 'India'),
    (1, 'United States'),
    (62, 'Indonesia'),
    (55, 'Brazil'),
    (92, 'Pakistan'),
    (880, 'Bangladesh'),
    (234, 'Nigeria'),
    (7, 'Russia'),
    (81, 'Japan'),
]

In [13]:
country_code = {country: code for code, country in DIAL_CODES}
country_code

{'China': 86,
 'India': 91,
 'United States': 1,
 'Indonesia': 62,
 'Brazil': 55,
 'Pakistan': 92,
 'Bangladesh': 880,
 'Nigeria': 234,
 'Russia': 7,
 'Japan': 81}

In [14]:
{code: country.upper() for country, code in country_code.items() if code < 66}

{1: 'UNITED STATES', 62: 'INDONESIA', 55: 'BRAZIL', 7: 'RUSSIA'}
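
To make the filtering dict comprehension less magical, here is a sketch that spells out the same logic as a plain loop. It uses a shortened DIAL_CODES list so the block runs on its own.

```python
# Shortened sample data so the example is self-contained.
DIAL_CODES = [(86, 'China'), (1, 'United States'), (55, 'Brazil')]
country_code = {country: code for code, country in DIAL_CODES}

# The comprehension: swap keys and values, uppercase the name, keep codes < 66.
short = {code: country.upper()
         for country, code in country_code.items()
         if code < 66}

# The equivalent explicit loop.
short_loop = {}
for country, code in country_code.items():
    if code < 66:
        short_loop[code] = country.upper()

print(short == short_loop)   # True
```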

## Handling Missing Keys with setdefault

In [15]:
d = {'one': 1, 'two': 2, 'three': 3}

In [16]:
d['one']

1

In [18]:
d.get("ten") is None

True

In [21]:
d.get("ten", "XXX") # there is an optional default

'XXX'

In [22]:
d

{'one': 1, 'two': 2, 'three': 3}

The method setdefault() is similar to get(), but will set dict[key]=default if key is not already in dict.

In [23]:
d.setdefault("ten", "XXX") # returns the default value and alters d

'XXX'

In [24]:
d

{'one': 1, 'two': 2, 'three': 3, 'ten': 'XXX'}
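
Where setdefault() tends to pay off is when the default is a mutable container that you update right away. The sketch below groups words by first letter; the words list and the index name are only illustrative.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# One lookup per word: fetch the list for this letter,
# creating and storing an empty list first if the key is new.
index = {}
for word in words:
    index.setdefault(word[0], []).append(word)

# The same logic without setdefault needs an explicit membership test.
index_verbose = {}
for word in words:
    if word[0] not in index_verbose:
        index_verbose[word[0]] = []
    index_verbose[word[0]].append(word)

print(index)
```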

## Mappings with Flexible Key Lookup
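A defaultdict with a default factory does not raise KeyError on d[key] lookups. When d[key] is evaluated for a key that does not exist, the factory is called, and its return value is stored under that key and returned. Other lookups, such as d.get(key) and key in d, do not call the factory.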

In [1]:
from collections import defaultdict

d = defaultdict(lambda: "XXX")
d

defaultdict(<function __main__.<lambda>()>, {})

In [2]:
d['a'] = 1
d['b'] = 2

In [3]:
d['a'], d['b']

(1, 2)

In [4]:
d

defaultdict(<function __main__.<lambda>()>, {'a': 1, 'b': 2})

In [5]:
d['c']

'XXX'

In [6]:
d

defaultdict(<function __main__.<lambda>()>, {'a': 1, 'b': 2, 'c': 'XXX'})

So the lookup d['c'] did more than return the default: it also stored the default value under 'c' in the dictionary.
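
A more typical use passes a container type such as list as the factory. The sketch below also checks two edge cases: get() and the in operator never call the factory, and a defaultdict created without a factory raises KeyError like a plain dict. The names groups and no_factory are just for illustration.

```python
from collections import defaultdict

groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)   # missing letters start as an empty list

# get() and `in` do not trigger the factory, so nothing is inserted.
via_get = groups.get("z")
has_z = "z" in groups
keys_before = sorted(groups)

# Only item access through [] calls the factory and stores the result.
created = groups["z"]
keys_after = sorted(groups)

# Without a factory, a defaultdict raises KeyError like a regular dict.
no_factory = defaultdict()
try:
    no_factory["x"]
    raised = False
except KeyError:
    raised = True
```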

## The __missing__ Method
This method is not defined in the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.

In [7]:
class StrKeyDict(dict):
    def __missing__(self, key):
        print(key, "is missing...")


In [8]:
d = StrKeyDict()

In [9]:
d['a']

a is missing...


In [10]:
d

{}
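
Note that only item access with square brackets goes through __missing__. The inherited get() method and the in operator do not, which is why the book's version further down overrides both. A small sketch with a hypothetical LoudDict class:

```python
class LoudDict(dict):
    """Hypothetical subclass: returns a placeholder instead of raising."""
    def __missing__(self, key):
        return f"<no {key!r}>"

d = LoudDict(a=1)

via_getitem = d["b"]      # dict.__getitem__ falls back to __missing__
via_get = d.get("b")      # dict.get ignores __missing__ and returns None
via_in = "b" in d         # membership tests ignore it as well

print(via_getitem, via_get, via_in)
print(d)                  # __missing__ did not store anything
```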

Or how they define it in the book...

In [2]:
class StrKeyDict0(dict):
    ''' converts nonstring keys to str on lookup
        so the keys can be looked up for a second time as a string
        will still raise an error if the key does not exist
    '''
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()


In [3]:
d = StrKeyDict0([('2', 'two'), ('4', 'four')])
d

{'2': 'two', '4': 'four'}

In [4]:
d['2']

'two'

In [5]:
d[2]

'two'

In [6]:
d[4]

'four'

## Variations of dict

-  collections.OrderedDict: A mapping that maintains keys in insertion order.
-  collections.ChainMap: A mapping that holds a list of mappings that can be searched as one.
-  collections.Counter: A mapping that holds an integer count for each key.
-  collections.UserDict: A pure Python implementation of a mapping that works like a standard dict.

While OrderedDict, ChainMap, and Counter come ready to use, UserDict is designed to be subclassed...
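
A brief sketch of the three ready-to-use types. The sample data (settings names, the word being counted) is invented for the example.

```python
from collections import ChainMap, Counter, OrderedDict

# OrderedDict adds order-manipulating methods such as move_to_end().
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")
order = list(od)                      # ['b', 'c', 'a']

# ChainMap searches its mappings from left to right.
defaults = {"color": "blue", "user": "guest"}
overrides = {"user": "admin"}
settings = ChainMap(overrides, defaults)
user = settings["user"]               # 'admin', found in overrides first
color = settings["color"]             # 'blue', falls through to defaults

# Counter tallies hashable items.
counts = Counter("abracadabra")
top_two = counts.most_common(2)       # [('a', 5), ('b', 2)]
```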

## Subclassing UserDict
It’s almost always easier to create a new mapping type by extending UserDict rather than dict. The main reason is that the built-in dict has some implementation shortcuts that end up forcing us to override methods that we can simply inherit from UserDict with no problems. The exact problem with subclassing dict and other built-ins is covered in 'Subclassing Built-In Types Is Tricky' on page 348. Note that UserDict does not inherit from dict; it wraps a regular dict stored in its data attribute. Here is a reworking of our previous example...

In [8]:
import collections

class StrKeyDict(collections.UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data

    def __setitem__(self, key, item):
        self.data[str(key)] = item


For clarity, __missing__ is called if you try to retrieve data for a missing key:

In [9]:
class TestDict(collections.UserDict):
    def __missing__(self, key):
        print(key, "is missing...")
        
d = TestDict()

In [10]:
d["Some Key"]

Some Key is missing...


__contains__ is called when the 'in' operator is used. The default just checks for the key as given, but here we explicitly look for str(key). Likewise, by overriding __setitem__ we make sure that every insert or update stores the key as a string.
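
To see the three overrides working together, here is a short usage sketch. The class is repeated from the cell above so the block runs on its own; the sample keys and values are made up.

```python
import collections

class StrKeyDict(collections.UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data

    def __setitem__(self, key, item):
        self.data[str(key)] = item

# UserDict.__init__ goes through update(), which uses our __setitem__,
# so even keys passed to the constructor are stored as strings.
d = StrKeyDict({2: "two"})
d[4] = "four"

stored_keys = list(d.data)        # ['2', '4']
by_int = d[2]                     # not found as 2, retried as '2'
by_str = d["4"]
has_int = 2 in d                  # __contains__ checks str(2)
fallback = d.get(5, "N/A")        # a missing key still yields the default
```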

## Immutable Mappings
The mapping types provided by the standard library are all mutable, but you may need to guarantee that a user cannot change a mapping by mistake.

In [15]:
# A mappingproxy instance is a read-only but dynamic view of the original mapping.
from types import MappingProxyType

d = {1: 'A'}
d_proxy = MappingProxyType(d)
d_proxy

mappingproxy({1: 'A'})

In [16]:
d_proxy[1]

'A'

In [18]:
try:
    d_proxy[2] = "x"
except Exception as e:
    print(e)

'mappingproxy' object does not support item assignment


In [19]:
d[2] = 'B'

In [20]:
d_proxy

mappingproxy({1: 'A', 2: 'B'})
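
One way this might be used in practice is to keep the real dict private and hand out only the proxy. The Registry class below is hypothetical, a sketch of the pattern rather than an established API.

```python
from types import MappingProxyType

class Registry:
    """Hypothetical class exposing its contents through a read-only view."""
    def __init__(self):
        self._items = {}                            # the mutable original
        self.items = MappingProxyType(self._items)  # what callers get to see

    def register(self, name, value):
        self._items[name] = value

reg = Registry()
reg.register("pi", 3.14)

try:
    reg.items["e"] = 2.71          # writing through the proxy fails
    blocked = False
except TypeError:
    blocked = True

reg.register("e", 2.71)            # changes via the owner show up in the view
visible = dict(reg.items)
```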

## Set Theory

In [27]:
l = ['spam', 'spam', 'eggs', 'spam']
set(l), type(set(l))

({'eggs', 'spam'}, set)

In [29]:
list(set(l)), type(list(set(l)))

(['spam', 'eggs'], list)

In [41]:
needles = [1, 2]
haystack = [j for i in range(3) for j in range(10)]

In [47]:
# the loopy way to count needles
found = 0
for n in needles:
    if n in haystack:
        found += 1

found # found 2 of the needles in our haystack

2

In [48]:
# alternatively
found = len(set(needles) & set(haystack))
found

2

In [50]:
found = len(set(needles).intersection(haystack))
found

2
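
Intersection is only one of the set operations available as infix operators. A quick sketch of the others, using two small sets made up for the example:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                  # {1, 2, 3, 4, 5}
intersection = a & b           # {3, 4}
difference = a - b             # {1, 2}
symmetric_diff = a ^ b         # {1, 2, 5}
is_subset = {3, 4} <= a        # True

# The operators need a set on both sides, while the named methods
# accept any iterable, which is why intersection(haystack) worked above.
from_method = a.intersection([3, 4, 5])
try:
    a & [3, 4, 5]
    operator_ok = True
except TypeError:
    operator_ok = False
```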

In [52]:
s = {1}
type(s)

set

In [53]:
s

{1}

In [54]:
s.pop()

1

In [55]:
s

set()

Literal set syntax like {1, 2, 3} is both faster and more readable than calling the constructor (e.g., set([1, 2, 3])).

In [56]:
s = {1, 2, 3}

In [58]:
type(s)

set
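
The speed difference comes from the bytecode: on CPython, the literal is compiled to a dedicated BUILD_SET instruction, while the constructor form has to look up the name set, build a list, and make a call. The sketch below inspects this with the dis module. Exact opcode names other than BUILD_SET differ between CPython versions, so none are listed here. It also checks the one case with no literal form: {} is an empty dict, and an empty set must be written set().

```python
import dis

# Collect the opcode names generated for each expression.
literal_ops = [ins.opname for ins in dis.Bytecode("{1, 2, 3}")]
constructor_ops = [ins.opname for ins in dis.Bytecode("set([1, 2, 3])")]

print(literal_ops)
print(constructor_ops)

# There is no literal for the empty set.
empty_braces_type = type({})     # dict
empty_set_type = type(set())     # set
```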

## Set Comprehensions

In [63]:
import random
random.seed(0)

choices = ['a', 'b', 'c', 'd', 'e']

In [65]:
{random.choice(choices) for _ in range(10)} # a set comprehension (setcomp)

{'a', 'b', 'c', 'e'}
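
For a deterministic illustration that does not depend on the random module, the sketch below builds sets from an invented list of words. As with any set, duplicates collapse and the display order is arbitrary.

```python
words = ["spam", "eggs", "bacon", "spam", "ham"]

# Distinct word lengths: duplicates disappear automatically.
lengths = {len(word) for word in words}

# Distinct uppercase initials, skipping the shortest words.
initials = {word[0].upper() for word in words if len(word) > 3}

print(lengths)
print(initials)
```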

***