## Containers, iterables and iterators

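* Container: a collection of objects
  * The built-in containers (list, tuple, dict, set, str) are all iterable
* Iterable: calling iter() on an iterable returns an iterator
  * Any object that can be used directly in a for loop
  * Collection types such as list, tuple, dict, set and str
* Iterator: each call to next() returns the next element, which is what makes traversal possible
  * next() keeps returning the next item until there is no more data, then raises StopIteration
  * An iterator represents a stream of data: an ordered sequence whose length is not known in advance. Each next() call computes the next item on demand, so iterators are lazy: they do work only when the next value is requested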
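To make the iterable/iterator distinction concrete, here is a minimal sketch of a hand-written iterator. The class name `Countdown` is hypothetical; the only real protocol pieces are `__iter__` (an iterator returns itself) and `__next__` (returns the next value or raises `StopIteration`).

```python
class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is also iterable: iter() on it returns the object itself
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that the data stream is exhausted
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
print(next(it))   # 3
print(list(it))   # [2, 1]  (continues from where next() left off)
print(list(it))   # []      (an iterator can only be consumed once)
```

Because `__iter__` returns `self`, a second pass over the same iterator yields nothing; containers such as `list` avoid this by returning a fresh iterator from each `iter()` call.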

In [None]:

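# Iterable vs. Iterator
# list, dict, str, etc. are Iterable but not Iterator; iter() turns them into an Iterator

from collections.abc import Iterable, Iterator  # `from collections import ...` fails on Python 3.10+

print(isinstance('abc', Iterable))  # str is iterable: True
print(isinstance([1, 2, 3], Iterable))  # list is iterable: True
print(isinstance(123, Iterable))  # int is not iterable: False

print(isinstance((x for x in range(10)), Iterator))  # True
print(isinstance([], Iterator))  # False
print(isinstance({}, Iterator))  # False
print(isinstance('abc', Iterator))  # False
print(isinstance(iter([]), Iterator))  # True
print(isinstance(iter('abc'), Iterator))  # True

# A for loop calls iter() on the object, then calls next() repeatedly until StopIteration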
for i, value in enumerate(['A', 'B', 'C']):
    print(i, value)

for x, y in [(1, 1), (2, 4), (3, 9)]:
    print(x, y)
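The comment in this cell says a `for` loop is driven by `next()`. The sketch below writes that out by hand; the helper name `manual_for` is hypothetical, and CPython actually does this in bytecode rather than through a Python function.

```python
def manual_for(iterable, body):
    """Hypothetical helper: roughly what `for x in iterable: body(x)` does."""
    iterator = iter(iterable)       # 1. get an iterator from the iterable
    while True:
        try:
            value = next(iterator)  # 2. ask for the next item
        except StopIteration:       # 3. stop once the iterator is exhausted
            break
        body(value)


seen = []
manual_for(['A', 'B', 'C'], seen.append)
print(seen)  # ['A', 'B', 'C']
```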


In [1]:
def is_iterable(param):
    try:
        iter(param)
        return True
    except TypeError:
        return False


params = [
    1234,
    '1234',
    [1, 2, 3, 4],
    set([1, 2, 3, 4]),
    {1: 1, 2: 2, 3: 3, 4: 4},
    (1, 2, 3, 4)
]

for param in params:
    print('{} is iterable? {}'.format(param, is_iterable(param)))

1234 is iterable? False
1234 is iterable? True
[1, 2, 3, 4] is iterable? True
{1, 2, 3, 4} is iterable? True
{1: 1, 2: 2, 3: 3, 4: 4} is iterable? True
(1, 2, 3, 4) is iterable? True
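Why test with `iter()` rather than `isinstance(param, Iterable)`? The Python documentation for `collections.abc` notes that `isinstance(obj, Iterable)` only detects classes that define `__iter__` (or are registered), while `iter()` also accepts objects that follow the older sequence protocol through `__getitem__`. A small sketch with a hypothetical `Squares` class:

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical class: iterable only through the legacy __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError  # tells the iterator built by iter() that the sequence has ended
        return index * index


def is_iterable(param):
    try:
        iter(param)
        return True
    except TypeError:
        return False


sq = Squares()
print(isinstance(sq, Iterable))  # False
print(is_iterable(sq))           # True
print(list(sq))                  # [0, 1, 4, 9]
```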


## Generators

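* A "lazy" iterator: every generator is an iterator, but its values are computed only when requested
* Syntax:
  * A generator expression: replace the [] of a list comprehension with (), e.g. (i for i in range(100000000))
  * A generator function, i.e. a function that contains yield
* The next value is produced only when next() is called
  * A generator stores the algorithm rather than the values: each call to next(g) computes the next element of g, until the last element has been produced
  * Inside a loop, execution pauses at every yield. The loop needs an exit condition, otherwise the generator produces an infinite sequence
  * When there are no more elements, StopIteration is raised
* Applications
    - Historically an important way to implement coroutines
    - Since Python 3.5 introduced the async/await syntax, generator-based coroutines have been superseded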
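The bullets above mention that a generator without an exit condition produces an infinite sequence, and that an exhausted generator raises `StopIteration`. A short sketch of both, using `itertools.islice` from the standard library to take a finite prefix (the generator name `naturals` is hypothetical):

```python
from itertools import islice


def naturals():
    """Hypothetical infinite generator: 1, 2, 3, ... (no exit condition)."""
    n = 1
    while True:
        yield n  # execution pauses here until the next next() call
        n += 1


# islice stops after 5 items, so consuming the infinite generator is safe
first_five = list(islice(naturals(), 5))
print(first_five)  # [1, 2, 3, 4, 5]

# A finite generator expression: once exhausted, next() raises StopIteration
g = (x * x for x in range(2))
print(next(g), next(g))  # 0 1
try:
    next(g)
except StopIteration:
    print('exhausted')
```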


In [None]:

g = (x * x for x in range(10))

# next(g)  # would return 0 and advance g, so the loop below would start at 1

for n in g:
    print(n)


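def fib(n_max):
    # Yield the first n_max Fibonacci numbers
    n, a, b = 0, 0, 1
    while n < n_max:
        yield b
        a, b = b, a + b
        n += 1
    # A generator's return value is delivered as StopIteration.value
    return 'done'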


g = fib(6)
while True:
    try:
        x = next(g)
        print('g:', x)
    except StopIteration as e:
        print('Generator return value:', e.value)
        break


def odd():
    print('step 1')
    yield 1
    print('step 2')
    yield 3
    print('step 3')
    yield 5


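def triangles(n):
    # Yield the first n rows of Pascal's triangle
    row = [1]
    for _ in range(n):
        yield row
        # Each inner entry is the sum of the two entries above it;
        # a new list is built, so rows already yielded are never modified
        row = [1] + [row[j - 1] + row[j] for j in range(1, len(row))] + [1]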


n = 0
results = []
for t in triangles(10):
    print(t)
    results.append(t)
    n = n + 1
    if n == 10:
        break
if results == [
    [1],
    [1, 1],
    [1, 2, 1],
    [1, 3, 3, 1],
    [1, 4, 6, 4, 1],
    [1, 5, 10, 10, 5, 1],
    [1, 6, 15, 20, 15, 6, 1],
    [1, 7, 21, 35, 35, 21, 7, 1],
    [1, 8, 28, 56, 70, 56, 28, 8, 1],
    [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
]:
    print('测试通过!')
else:
    print('测试失败!')
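The `odd()` function in this cell is defined but never called. The sketch below shows what it would demonstrate: each `next()` runs the body only up to the next `yield`. To make the order checkable, this variant (hypothetical name `odd_logged`) appends to a list instead of printing.

```python
def odd_logged(log):
    """Variant of odd() that records each step in `log` instead of printing."""
    log.append('step 1')
    yield 1
    log.append('step 2')
    yield 3
    log.append('step 3')
    yield 5


log = []
o = odd_logged(log)
print(log)      # [] (creating the generator runs none of its body)
print(next(o))  # 1
print(log)      # ['step 1']
print(next(o))  # 3
print(log)      # ['step 1', 'step 2']
print(next(o))  # 5
print(log)      # ['step 1', 'step 2', 'step 3']
```

A fourth `next(o)` would raise `StopIteration`, because the function body has run to completion.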

In [2]:
import os
import psutil


def show_memory_info(hint):
    # Print the memory (USS) currently used by this Python process
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used: {} MB'.format(hint, memory))

In [5]:
def test_iterator():
    show_memory_info('initing iterator')
    # List comprehension: all 100 million ints are built in memory up front
    list_1 = [i for i in range(100000000)]
    show_memory_info('after iterator initiated')
    print(sum(list_1))
    show_memory_info('after sum called')

def test_generator():
    show_memory_info('initing generator')
    # Generator expression: values are produced one at a time as sum() asks for them
    list_2 = (i for i in range(100000000))
    show_memory_info('after generator initiated')
    print(sum(list_2))
    show_memory_info('after sum called')

%time test_iterator()
print('---------')
%time test_generator()

initing iterator memory used: 63.265625 MB
after iterator initiated memory used: 3707.609375 MB
4999999950000000
after sum called memory used: 3707.609375 MB
CPU times: user 4.99 s, sys: 1.44 s, total: 6.43 s
Wall time: 6.69 s
---------
initing generator memory used: 63.453125 MB
after generator initiated memory used: 63.453125 MB
4999999950000000
after sum called memory used: 63.453125 MB
CPU times: user 5.29 s, sys: 29.6 ms, total: 5.32 s
Wall time: 5.5 s
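Measuring with `psutil` as above needs a third-party package and several GB of RAM. A lighter sketch of the same idea uses the standard library's `sys.getsizeof`. Note that `sys.getsizeof` reports only the object's own footprint (for a list, the array of references, not the int objects themselves), and the exact numbers vary by Python version and platform, so no specific figures are claimed here.

```python
import sys

N = 1_000_000

as_list = [i for i in range(N)]  # all N references are stored up front
as_gen = (i for i in range(N))   # values are produced on demand; nothing is precomputed

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print('list:', list_bytes, 'bytes')
print('generator:', gen_bytes, 'bytes')

# Both produce the same total; only the memory profile differs
print(sum(as_list) == sum(as_gen))  # True
```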


In [6]:
# Verify that (1 + 2 + 3 + ... + n)^2 = 1^3 + 2^3 + 3^3 + ... + n^3
def generator(k):
    i = 1
    while True:
        yield i ** k
        i += 1


gen_1 = generator(1)
gen_3 = generator(3)


def get_sum(n):
    sum_1, sum_3 = 0, 0
    for i in range(n):
        next_1 = next(gen_1)
        next_3 = next(gen_3)
        print('next_1 = {}, next_3 = {}'.format(next_1, next_3))
        sum_1 += next_1
        sum_3 += next_3
    print(sum_1 * sum_1, sum_3)


get_sum(8)

next_1 = 1, next_3 = 1
next_1 = 2, next_3 = 8
next_1 = 3, next_3 = 27
next_1 = 4, next_3 = 64
next_1 = 5, next_3 = 125
next_1 = 6, next_3 = 216
next_1 = 7, next_3 = 343
next_1 = 8, next_3 = 512
1296 1296
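The cell above uses the module-level generators `gen_1` and `gen_3`, so a second call to `get_sum` would continue from 9 instead of restarting at 1. A sketch that avoids that shared state by creating fresh generators on every call and pairing them with `zip`; it also compares against the closed form (n(n+1)/2)^2 for the sum of cubes. The helper names `powers` and `check_identity` are hypothetical.

```python
from itertools import count, islice


def powers(k):
    """Infinite generator of 1**k, 2**k, 3**k, ..."""
    for i in count(1):
        yield i ** k


def check_identity(n):
    """Hypothetical helper: return (1+...+n)^2, 1^3+...+n^3 and (n(n+1)/2)^2."""
    pairs = islice(zip(powers(1), powers(3)), n)  # fresh generators on every call
    sum_1 = sum_3 = 0
    for a, c in pairs:
        sum_1 += a
        sum_3 += c
    closed_form = (n * (n + 1) // 2) ** 2
    return sum_1 ** 2, sum_3, closed_form


print(check_identity(8))  # (1296, 1296, 1296)
print(check_identity(8))  # same result: no state leaks between calls
```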


In [7]:
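# Given a list and a target number, find every position of that number in the list
# index_generator returns a generator object; convert it with list() before printing,
# otherwise print only shows the generator object itself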
def index_generator(L, target):
    for i, num in enumerate(L):
        if num == target:
            yield i

print(list(index_generator([1, 6, 2, 4, 5, 2, 8, 6, 3, 2], 2)))

[2, 5, 9]
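Because `index_generator` is lazy, you don't have to build the whole list when only the first match is needed: `next()` stops the scan at the first hit, and its optional second argument supplies a default when there is no match. A short sketch:

```python
def index_generator(L, target):
    for i, num in enumerate(L):
        if num == target:
            yield i


data = [1, 6, 2, 4, 5, 2, 8, 6, 3, 2]
print(next(index_generator(data, 2)))        # 2 (scanning stops at index 2)
print(next(index_generator(data, 7), None))  # None (no match, so the default is returned)
```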


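### Given two sequences, check whether the first is a subsequence of the second

- Every element of the first list appears in the second list in the same order, though not necessarily next to each other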


In [9]:
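# The usual approach is greedy: keep one pointer at the start of each list, then scan the
# second sequence; whenever the current number equals the one the first pointer points to,
# advance the first pointer. Return True once the first pointer has moved past the last
# element of the first sequence, otherwise False.
# The one-liner below does the same thing implicitly: `i in b` consumes the iterator b up to
# and including the first match, so each search resumes where the previous one stopped.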
def is_subsequence(a, b):
    b = iter(b)
    return all(i in b for i in a)


print(is_subsequence([1, 3, 5], [1, 2, 3, 4, 5]))
print(is_subsequence([1, 4, 3], [1, 2, 3, 4, 5]))

True
False
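For comparison with the one-liner, here is a sketch of the explicit two-pointer greedy algorithm described in the comment above (the function name `is_subsequence_two_pointer` is hypothetical):

```python
def is_subsequence_two_pointer(a, b):
    """Greedy check: is `a` a (not necessarily contiguous) subsequence of `b`?"""
    i = 0  # pointer into a
    for value in b:  # the pointer into b advances on every step
        if i < len(a) and value == a[i]:
            i += 1  # matched a[i]; move on to the next element of a
    return i == len(a)  # True once every element of a has been matched


print(is_subsequence_two_pointer([1, 3, 5], [1, 2, 3, 4, 5]))  # True
print(is_subsequence_two_pointer([1, 4, 3], [1, 2, 3, 4, 5]))  # False
```

Both versions make a single pass over `b`; the iterator version simply hides the second pointer inside the iterator's position.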


In [None]:
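def is_subsequence(a, b):
    it = iter(b)
    print(it)  # <list_iterator object at 0x...>

    gen = (i for i in a)
    print(gen)  # <generator object ...>
    for i in gen:
        print(i)

    # `i in it` is roughly equivalent to:
    #     for val in it:        # resumes from the iterator's current position
    #         if val == i:
    #             return True   # stops right after the match
    #     return False          # iterator exhausted
    gen = ((i in it) for i in a)
    print(gen)
    for result in gen:
        print(result)

    # The loop above has consumed `it`, so build a fresh iterator before the
    # real check; otherwise every `in` test would run on an exhausted iterator
    # and return False
    it = iter(b)
    return all((i in it) for i in a)


print(is_subsequence([1, 3, 5], [1, 2, 3, 4, 5]))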

In [None]:
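# A generator remembers where it stopped: each `in` test resumes from the current position
b = (i for i in range(5))
# print(list(b))  # would consume every item, so all three tests below would print False
print(2 in b)  # True: consumes 0, 1, 2
print(4 in b)  # True: consumes 3, 4
print(3 in b)  # False: 3 was already consumed and b is now exhausted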
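To see exactly what each `in` test consumed, look at what is left in the iterator afterwards. A small sketch using a range iterator, which behaves the same way as the generator above:

```python
it = iter(range(5))
print(2 in it)   # True: consumes 0, 1, 2
print(list(it))  # [3, 4]: only the unconsumed items remain

it = iter(range(5))
print(9 in it)   # False: the search consumes everything
print(list(it))  # []
```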