### <center>2018 Winter CS101.08</center>

# <center>Hash Tables and Skip Lists</center>

##### <center>by tanzhuxiaqiu@huawei.com</center>

## Today's agenda

1. Mappings and dictionaries
2. Hash tables

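## Mappings and dictionaries

- A mapping can be understood as a function-like rule that associates two (kinds of) objects, pairing each key with exactly one value.
- A dictionary usually refers to a data structure that implements a mapping. In Python it is written with the built-in dict type or the '{}' literal; other languages often call it a Map.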



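### General-purpose mapping types in Python

- collections.abc provides the Mapping and MutableMapping abstract base classes, which describe the interface implemented by dict and its variants: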
    - collections.defaultdict
    - collections.OrderedDict
    - collections.ChainMap
    - collections.Counter
    - collections.UserDict

![](img/8-1.png)

In [59]:
from collections.abc import MutableMapping
a = dict(a=1, b=2, c=3)
b = {'a': 1, 'b': 2, 'c': 3}
c = dict(zip(['a', 'b', 'c'], [1, 2, 3]))
d = dict([('a', 1), ('b', 2), ('c', 3)])
e = {k:v for v, k in enumerate(list('abc'), 1)}
print(a == b == c == d == e)
isinstance(d, MutableMapping)

True


True

In [11]:
issubclass(dict, MutableMapping)

True
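
MutableMapping can also be used as a base class for a custom mapping. The following is a minimal sketch, assuming a hypothetical class named LowerCaseDict (not part of these notes) whose keys are strings: implementing the five abstract methods is enough to inherit get, update, items and the other mixin methods.

```python
from collections.abc import MutableMapping


class LowerCaseDict(MutableMapping):
    """Hypothetical example: a mapping that lower-cases its string keys."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        # update() is inherited from MutableMapping and calls __setitem__
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


lcd = LowerCaseDict(A=1)
lcd['B'] = 2
print(dict(lcd))                 # {'a': 1, 'b': 2}
print(lcd.get('a'), 'B' in lcd)  # 1 True
```

Storing the data in a private dict and delegating to it keeps the sketch short; a real implementation would also need to decide how to treat non-string keys.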

In [9]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [15]:
import re
import collections

WORD_RE = re.compile(r'\w+')

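# With a plain dict (index = {}), index[word].append(...) raises KeyError for a new word;
# defaultdict(list) creates the empty list on first access instead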
index = collections.defaultdict(list)
with open('src/ch08/zen_of_python.txt', encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)

            index[word].append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])

a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)]
beats [(11, 23)]
Beautiful [(3, 1)]
better [(3, 14), (4, 13), (5, 11), (6, 12), (7, 9), (8, 11), (17, 8), (18, 25)]
break [(10, 40)]
by [(1, 20)]
cases [(10, 9)]
complex [(5, 23)]
Complex [(6, 1)]
complicated [(6, 24)]
counts [(9, 13)]
dense [(8, 23)]
do [(15, 64), (21, 48)]
Dutch [(16, 61)]
easy [(20, 26)]
enough [(10, 30)]
Errors [(12, 1)]
explain [(19, 34), (20, 34)]
Explicit [(4, 1)]
explicitly [(13, 8)]
face [(14, 8)]
first [(16, 41)]
Flat [(7, 1)]
good [(20, 55)]
great [(21, 28)]
guess [(14, 52)]
hard [(19, 26)]
honking [(21, 20)]
idea [(19, 54), (20, 60), (21, 34)]
If [(19, 1), (20, 1)]
implementation [(19, 8), (20, 8)]
implicit [(4, 25)]
In [(14, 1)]
is [(3, 11), (4, 10), (5, 8), (6, 9), (7, 6), (8, 8), (17, 5), (18, 16), (19, 23), (20, 23)]
it [(15, 67), (19, 43), (20, 43)]
let [(21, 42)]
...

In [20]:
words = re.findall(r'\w+', open('src/ch08/zen_of_python.txt').read().lower())
collections.Counter(words).most_common(10)

[('is', 10),
 ('better', 8),
 ('than', 8),
 ('the', 6),
 ('to', 5),
 ('of', 3),
 ('although', 3),
 ('never', 3),
 ('be', 3),
 ('one', 3)]

In [55]:
class LRU(collections.OrderedDict):
    'Limit size, evicting the least recently looked-up key when full'

    def __init__(self, maxsize=4, *args, **kwds):
        self.maxsize = maxsize
        super().__init__(*args, **kwds)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if len(self) > self.maxsize:
            oldest = next(iter(self))
            del self[oldest]

lru = LRU()
for v, k in enumerate(list('abcd')):
    lru[k] = v
print(lru)

LRU([('a', 0), ('b', 1), ('c', 2), ('d', 3)])


In [56]:
print(lru['a'])
print(lru)

0
LRU([('b', 1), ('c', 2), ('d', 3), ('a', 0)])


In [57]:
lru['e'] = 4
print(lru)

LRU([('c', 2), ('d', 3), ('a', 0), ('e', 4)])


In [78]:
da = {k:v for v, k in enumerate(list('abcd'))}
db = {k:v for v, k in enumerate(list('xyz'))}
print(da)
print(db)
cm = collections.ChainMap(da, db)
cm

{'a': 0, 'b': 1, 'c': 2, 'd': 3}
{'x': 0, 'y': 1, 'z': 2}


ChainMap({'a': 0, 'b': 1, 'c': 2, 'd': 3}, {'x': 0, 'y': 1, 'z': 2})

In [77]:
# Lookups search da first, then db
print(cm['a'], cm['x'])
# A ChainMap holds references, so changes to db show through it
db['x'] = 10
cm

0 0


ChainMap({'a': 0, 'b': 1, 'c': 2, 'd': 3}, {'x': 10, 'y': 1, 'z': 2})

In [76]:
# Writes and deletions through a ChainMap only touch the first mapping (da)
cm['a'] = 100
cm

ChainMap({'a': 100, 'b': 1, 'c': 2, 'd': 3}, {'x': 10, 'y': 1, 'z': 2})

In [63]:
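class WeakDict(dict):
    # dict.__init__ and dict.update do not call an overridden __setitem__,
    # so only the wd[key] = value form below doubles the value
    def __setitem__(self, key, value):
        super().__setitem__(key, value + value)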
        
wd = WeakDict(a=1)
wd

{'a': 1}

In [64]:
wd['b'] = 2
wd

{'a': 1, 'b': 4}

In [65]:
wd.update({'c': 3})
wd

{'a': 1, 'b': 4, 'c': 3}

In [67]:
class MyDict(collections.UserDict):
    # UserDict routes __init__ and update() through __setitem__,
    # so every stored value is doubled
    def __setitem__(self, key, value):
        super().__setitem__(key, value + value)

md = MyDict(a=1)
md['b'] = 2
md.update({'c': 3})
md

{'a': 2, 'b': 4, 'c': 6}

### Mapping methods of dict in Python

![](img/8-2.png)

### Immutable Mapping

Since Python 3.3, the types module provides the MappingProxyType type, which wraps a dict in a read-only view.

In [79]:
from types import MappingProxyType
d = {'A': 1}
dp = MappingProxyType(d)
dp

mappingproxy({'A': 1})

In [80]:
dp['A']

1

In [81]:
dp['B'] = 2

TypeError: 'mappingproxy' object does not support item assignment

In [82]:
d['B'] = 2
dp

mappingproxy({'A': 1, 'B': 2})

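## Hash tables

A hash table can be seen as an extension of the array: it supports random access to data by key.

- Lookup takes O(1) time on average
- The function that converts a key into an array index is called a hash function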

![](img/8-3.jpg)
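
The idea of turning a key into an array index can be sketched as follows. This is an illustrative example only, assuming separate chaining for collisions and a hypothetical class name ChainedHashTable; it is not how Python's dict is implemented, and it never resizes its bucket array.

```python
class ChainedHashTable:
    """Illustrative hash table using separate chaining (fixed capacity)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key):
        # The hash-function step: map the key to an array index
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1

    def get(self, key):
        bucket = self._buckets[self._index(key)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self._size


table = ChainedHashTable()
table.put('a', 1)
table.put('b', 2)
table.put('a', 10)
print(table.get('a'), table.get('b'), len(table))  # 10 2 2
```

Lookups stay close to O(1) only while the chains are short, which is why practical hash tables grow the bucket array as they fill up.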

### The hash function in Python

> An object is hashable if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which compare equal must have the same hash value.

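- Atomic immutable types such as str, bytes and the numeric types are all hashable
- A frozenset is hashable, because every element it contains must itself be hashable
- A tuple is hashable only if all of its elements are hashable
- Instances of user-defined classes are hashable by default, with a hash value derived from id()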

In [72]:
tt = (1, 2, (30, 40))
hash(tt)

8027212646858338501

In [73]:
tl = (1, 2, [30, 40])
hash(tl)

TypeError: unhashable type: 'list'

In [74]:
tf = (1, 2, frozenset([30, 40]))
hash(tf)

985328935373711578
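
The rule that equal objects must have equal hashes matters as soon as a user-defined class overrides `__eq__`. The following is a minimal sketch using two hypothetical classes, Point and EqOnly, which are not part of these notes: Point hashes the same fields it compares, while EqOnly defines `__eq__` alone, in which case Python sets its `__hash__` to None.

```python
class Point:
    """Hypothetical value type: equal points must hash equally."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.x, self.y))


class EqOnly:
    """Defines __eq__ but not __hash__, so instances are unhashable."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


p, q = Point(1, 2), Point(1, 2)
print(p == q, hash(p) == hash(q))  # True True
print(len({p, q}))                 # 1

try:
    hash(EqOnly())
except TypeError as exc:
    print(exc)                     # unhashable type: 'EqOnly'
```

Because the hash depends on x and y, a Point should not be mutated while it is used as a dict key or set member; otherwise its hash would change during its lifetime.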

# Any Questions?