# Ch3 Dictionaries and Sets
Both are built on hash tables.

## 3.1 Generic Mapping Types

> The `collections.abc` module provides the `Mapping` and `MutableMapping` abstract base classes (ABCs), which define formal interfaces for `dict` and similar types.

Inheritance tree:

`Container` class
- `__contains__`
  
`Iterable` class
- `__iter__`

`Sized` class
- `__len__`

`Mapping` class extends `Container`, `Iterable`, `Sized`
- `__getitem__`
- `__contains__`
- `__eq__`
- `__ne__`
- `get`
- `keys`
- `items`
- `values`

`MutableMapping` class extends `Mapping`
- `__setitem__`
- `__delitem__`
- `pop`
- `popitem`
- `clear`
- `update`
- `setdefault`

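> However, concrete mapping types usually do not inherit from these ABCs directly; they extend `dict` or `collections.UserDict` instead. The ABCs mainly serve as formal documentation: they define the minimal interface required to build a mapping type.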
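
To make the interface above concrete, here is a minimal sketch of a mapping built directly on `MutableMapping`. The class `LowerKeyDict` and its lowercase-key behavior are hypothetical, chosen only for illustration. Implementing the five abstract methods is enough; `get`, `__contains__`, `update`, `pop` and the rest come from the ABC as mixin methods.

```python
from collections.abc import Mapping, MutableMapping


class LowerKeyDict(MutableMapping):
    """Hypothetical mapping that stores string keys in lowercase."""

    def __init__(self):
        self._data = {}

    # The five abstract methods required by MutableMapping:
    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = LowerKeyDict()
d['One'] = 1
d.update(TWO=2)                          # mixin method from MutableMapping
print(d.get('ONE'), 'two' in d, len(d))  # get and __contains__ are mixins too
print(isinstance(d, Mapping), issubclass(dict, MutableMapping))
```

In practice, extending `dict` or `collections.UserDict` is usually less work, as the quote above notes; the sketch only shows what the ABC contract asks for.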


In [1]:
import collections.abc as abc

my_dict = {}
isinstance(my_dict, abc.Mapping)
# Use isinstance rather than type() to check whether an argument is a mapping: the argument may not be a dict at all, but some less common mapping type.

True

What are hashable objects?
- An object is hashable if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which **compare equal must have the same hash value**.

str, bytes, numeric types are hashable. Tuple is hashable **if all its elements are hashable**.

By default, all user-defined objects are hashable: their hash value is derived from their `id()`, and they all compare unequal. If an object implements a custom `__eq__()` that takes its internal state into account, it may be hashable only if all its attributes are immutable.
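
A short sketch of these rules follows. The classes `Plain`, `Point`, and `FrozenPoint` are hypothetical names made up for the example. Note that in Python 3, a class that defines `__eq__` without defining `__hash__` gets `__hash__` set to `None`, which makes its instances unhashable.

```python
# Tuples are hashable only if every element is hashable.
tt = (1, 2, (30, 40))
tl = (1, 2, [30, 40])
hash(tt)                      # works: all elements are hashable
try:
    hash(tl)                  # fails: the list inside is unhashable
except TypeError as exc:
    print('unhashable tuple:', exc)


class Plain:
    """Hypothetical class: inherits the default __eq__ and __hash__ from object."""


class Point:
    """Hypothetical class: defines __eq__ but no __hash__, so it is unhashable."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class FrozenPoint(Point):
    """Adds a hash consistent with __eq__ (assumes x and y are never reassigned)."""

    def __hash__(self):
        return hash((self.x, self.y))


print(isinstance(hash(Plain()), int))   # default hashing works
print(Point.__hash__ is None)           # __eq__ alone disables hashing
print(hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2)))  # equal objects, equal hashes
```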

Here are different ways to construct a dictionary:

In [2]:
a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2}) 
a == b == c == d == e

True

## 3.2 Dict Comprehensions


In [4]:
# dialcodes.py
# BEGIN DIALCODES
# dial codes of the top 10 most populous countries
DIAL_CODES = [
        (86, 'China'),
        (91, 'India'),
        (1, 'United States'),
        (62, 'Indonesia'),
        (55, 'Brazil'),
        (92, 'Pakistan'),
        (880, 'Bangladesh'),
        (234, 'Nigeria'),
        (7, 'Russia'),
        (81, 'Japan'),
    ]

d1 = dict(DIAL_CODES)  # <1>
print('d1:', d1.keys())
d2 = dict(sorted(DIAL_CODES))  # <2>
print('d2:', d2.keys())
d3 = dict(sorted(DIAL_CODES, key=lambda x:x[1]))  # <3>
print('d3:', d3.keys())
assert d1 == d2 and d2 == d3  # <4>
# END DIALCODES


d1: dict_keys([86, 91, 1, 62, 55, 92, 880, 234, 7, 81])
d2: dict_keys([1, 7, 55, 62, 81, 86, 91, 92, 234, 880])
d3: dict_keys([880, 55, 86, 91, 62, 81, 234, 92, 7, 1])
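
The cell above builds its dicts with the `dict()` constructor. As a sketch of the comprehension syntax this section is named after, the same kind of `(code, country)` pairs can be turned into dicts with a dict comprehension. The shortened `DIAL_CODES` list and the names `country_code` and `upper_small` are chosen only for this example.

```python
# A shortened copy of the (code, country) pairs, so the example is self-contained.
DIAL_CODES = [
    (86, 'China'),
    (91, 'India'),
    (1, 'United States'),
    (62, 'Indonesia'),
    (55, 'Brazil'),
]

# Swap each pair: country becomes the key, dial code the value.
country_code = {country: code for code, country in DIAL_CODES}
print(country_code)

# Swap back, uppercase the names, and keep only codes below 66.
upper_small = {code: country.upper()
               for country, code in country_code.items()
               if code < 66}
print(upper_small)
```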


## 3.3 Common Mapping Methods
Examples of common methods of `dict`, `defaultdict`, and `OrderedDict`.

> The latter two are variants of `dict` that live in the `collections` module.

`update(m, [**kwargs])` relies on duck typing: `m` can be a mapping or an iterable of key-value pairs. The method first checks whether `m` has a `keys()` method. If it does, `m` is treated as a mapping; otherwise `update` iterates over `m`, assuming its items are `(key, value)` pairs.
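
A small sketch of the three argument forms `update` accepts. The `KeysOnly` class is hypothetical; it exists only to show that any object with `keys()` and `__getitem__` is accepted as a mapping.

```python
d = {'one': 1}

d.update({'two': 2})                    # a mapping: it has keys()
d.update([('three', 3), ('four', 4)])   # an iterable of (key, value) pairs
d.update(five=5)                        # keyword arguments
d.update({'one': 100}, six=6)           # both forms at once; existing keys are overwritten


class KeysOnly:
    """Hypothetical non-dict object exposing just keys() and __getitem__."""

    def keys(self):
        return ['seven']

    def __getitem__(self, key):
        return 7


d.update(KeysOnly())                    # treated as a mapping because it has keys()
print(d)
```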

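The difference between `d[k]` and `d.get(k)`: if the key `k` is not in the dictionary, `d[k]` raises `KeyError`, whereas `d.get(k, default)` returns `default` (or `None` when `default` is omitted).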
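
A minimal illustration of that difference, using a throwaway dict:

```python
d = {'one': 1}

try:
    d['two']                  # subscript access on a missing key raises
except KeyError as exc:
    print('KeyError for key:', exc)

print(d.get('two'))           # no default given: returns None
print(d.get('two', 0))        # returns the supplied default
print(d)                      # get() never inserts the missing key
```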

In [None]:
# index0.py with slight modification
"""Build an index mapping word -> list of occurrences"""

import sys
import re

WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start()+1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point
            occurrences = index.get(word, [])  # <1>
            occurrences.append(location)       # <2>
            index[word] = occurrences          # <3>

# print in alphabetical order
for word in sorted(index, key=str.upper):  # <4> 
    print(word, index[word])
    
# <4> str.upper is not called here; a reference to the method is passed to sorted
#     so that words are normalized to a uniform case for sorting

In [None]:
import sys
import re

WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start()+1
            location = (line_no, column_no)
            index.setdefault(word, []).append(location)  # <1> only one line, one query on key

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

## 3.4 Mappings with Flexible Key Lookup (Handling Missing Keys)
- Use a `defaultdict`
- Write a custom subclass of `dict` that implements the `__missing__` method

### 3.4.1 `defaultdict`: Handling Missing Keys

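> Specifically, when instantiating a `defaultdict`, you pass the constructor a callable. That callable is invoked whenever `__getitem__` encounters a missing key, so that `__getitem__` can return a default value.

For example, given `dd = defaultdict(list)`, if the key `'new-key'` is not yet in `dd`, the expression `dd['new-key']` proceeds as follows:

1. Call `list()` to create a new list.
2. Insert the new list into the `defaultdict` as the value, with `'new-key'` as the key.
3. Return a reference to that list.

> The callable that produces the default values is stored in an instance attribute named `default_factory`.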
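
The three steps can be observed directly in a short session. This is only a sketch, with `dd` and `value` as throwaway names:

```python
from collections import defaultdict

dd = defaultdict(list)
print(dd.default_factory is list)   # the callable is kept on the instance

value = dd['new-key']               # steps 1-3: create, insert, return
print(value)                        # a new empty list
print(value is dd['new-key'])       # the stored list itself, not a copy

value.append(1)                     # mutating the reference changes the stored value
print(dict(dd))
```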

In [None]:
import sys
import re
import collections

WORD_RE = re.compile(r'\w+')

index = collections.defaultdict(list)     # <1> the list constructor as default_factory
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start()+1
            location = (line_no, column_no)
            index[word].append(location)  # <2> always succeeds, even for a new word

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

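> If no `default_factory` is given when the `defaultdict` is created, looking up a missing key raises `KeyError`.

`default_factory` is called only by `__getitem__`, not by other methods. For example, when the key `k` is missing, `dd.get(k)` returns `None` without calling `default_factory`.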
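
A quick check of both points, as a sketch with throwaway names (`dd`, `no_factory`):

```python
from collections import defaultdict

dd = defaultdict(list)
print(dd.get('k'))        # None: get() does not call default_factory
print('k' in dd)          # False: neither does a membership test
print(len(dd))            # 0: nothing has been inserted so far

dd['k']                   # only subscript access triggers default_factory
print(len(dd))            # 1

no_factory = defaultdict()            # default_factory is None
try:
    no_factory['k']
except KeyError as exc:
    print('KeyError for key:', exc)   # behaves like a plain dict
```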

The mechanism behind all of this is the `__missing__` method.

### 3.4.2 The `__missing__` Method
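
As a minimal sketch of the hook (not the book's own example), the two hypothetical `dict` subclasses below define `__missing__`. `dict.__getitem__` calls it when a key is not found; `get` and `in` do not.

```python
class ZeroDict(dict):
    """Hypothetical subclass: missing keys read as 0 and are not stored."""

    def __missing__(self, key):
        return 0


class ListDict(dict):
    """Hypothetical subclass that roughly mimics defaultdict(list)."""

    def __missing__(self, key):
        value = self[key] = []   # store the new list, then return it
        return value


z = ZeroDict(a=1)
print(z['a'], z['b'])            # found normally; supplied by __missing__
print('b' in z, z.get('b'))      # neither call goes through __missing__

ld = ListDict()
ld['x'].append(1)                # __missing__ inserted the list before append ran
print(ld)
```

Whether the default is stored, as in `ListDict`, or only returned, as in `ZeroDict`, is a design choice left to the subclass.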
