# Concurrent Programming with Futures

Concurrency and parallelism:

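- Concurrency is achieved by switching between threads or tasks; at any given moment, only one thread or task is running.
- Parallelism means that multiple processes run at the same time.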

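In Python, concurrency does not mean that multiple operations (threads or tasks) run at the same instant. Instead, only one operation runs at any particular moment, and the threads or tasks take turns until they all finish.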

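This is because memory management in the CPython interpreter is not thread-safe. To avoid the race conditions that would follow, CPython uses a global interpreter lock (GIL): at any moment, only one thread may execute Python bytecode. When a thread blocks on an I/O operation, it releases the GIL so that another thread can continue running.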

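Parallelism is what actually means "happening at the same instant". This is what multiprocessing provides in Python. For example, on a machine with a 6-core processor, you can have Python start 6 processes that run simultaneously to speed up the program.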

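- Concurrency is typically used for I/O-heavy workloads, such as downloading many files from a website, where the time spent on I/O can far exceed the time spent on CPU processing.
- Parallelism is used more for CPU-heavy workloads, such as the parallel computation in MapReduce, where multiple machines and processors are usually used to speed things up.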
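
The I/O-bound case can be sketched without touching the network. The example below is a minimal illustration, not a benchmark: `fake_io`, `run_sequential`, and `run_threaded` are hypothetical names, and `time.sleep` stands in for a blocking I/O call (like real blocking I/O, it releases the GIL while waiting).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    """Stand-in for a blocking I/O call; the GIL is released while sleeping."""
    time.sleep(delay)
    return task_id


def run_sequential(n):
    """Run n simulated I/O calls one after another."""
    start = time.perf_counter()
    results = [fake_io(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    """Run n simulated I/O calls on n threads so the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as executor:
        results = list(executor.map(fake_io, range(n)))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    _, seq_time = run_sequential(5)
    _, thr_time = run_threaded(5)
    print(f'sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s')
```

Because every thread spends its time waiting rather than computing, the threaded run should take roughly one delay instead of the sum of all delays. A CPU-bound function would not see this benefit from threads under the GIL.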

## Single-Threaded vs. Multi-Threaded Performance

- Single-threaded version


In [1]:
# predefine URL
SITES = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]

In [2]:
import requests
import time
import copy

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    
def download_all(sites):
    for site in sites:
        download_one(site)

def main():
    sites = copy.copy(SITES)
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()


Read 185172 from https://en.wikipedia.org/wiki/Portal:Arts
Read 411041 from https://en.wikipedia.org/wiki/Portal:History
Read 252518 from https://en.wikipedia.org/wiki/Portal:Society
Read 330059 from https://en.wikipedia.org/wiki/Portal:Biography
Read 150866 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 260181 from https://en.wikipedia.org/wiki/Portal:Technology
Read 267523 from https://en.wikipedia.org/wiki/Portal:Geography
Read 164535 from https://en.wikipedia.org/wiki/Portal:Science
Read 368013 from https://en.wikipedia.org/wiki/Computer_science
Read 439878 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 328913 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 543304 from https://en.wikipedia.org/wiki/PHP
Read 185583 from https://en.wikipedia.org/wiki/Node.js
Read 60852 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 323100 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 0.90872

- Multi-threaded version

In [3]:
import concurrent.futures
import requests
import time
import copy

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    # Alternative: use processes and let Python choose the number of
    # workers from the CPU core count:
    # with concurrent.futures.ProcessPoolExecutor() as executor:
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

def main():
    sites = copy.copy(SITES)

    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()


Read 252518 from https://en.wikipedia.org/wiki/Portal:Society
Read 150866 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 185172 from https://en.wikipedia.org/wiki/Portal:Arts
Read 330059 from https://en.wikipedia.org/wiki/Portal:Biography
Read 411041 from https://en.wikipedia.org/wiki/Portal:History
Read 260181 from https://en.wikipedia.org/wiki/Portal:Technology
Read 164535 from https://en.wikipedia.org/wiki/Portal:Science
Read 267523 from https://en.wikipedia.org/wiki/Portal:Geography
Read 368013 from https://en.wikipedia.org/wiki/Computer_science
Read 439878 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 60852 from https://en.wikipedia.org/wiki/The_C_Programming_LanguageRead 185583 from https://en.wikipedia.org/wiki/Node.js

Read 328913 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 543304 from https://en.wikipedia.org/wiki/PHP
Read 323100 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 0.27155
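
The multi-threaded version above discards the iterator that `executor.map()` returns, so an exception raised inside a worker would go unnoticed. The sketch below shows one way to collect the results instead. `measure` and `measure_all` are hypothetical names; `measure` stands in for a `download_one` that returns a value rather than printing it.

```python
from concurrent.futures import ThreadPoolExecutor


def measure(text):
    """Hypothetical worker: returns a value instead of printing it."""
    if not text:
        raise ValueError('empty input')
    return len(text)


def measure_all(items, max_workers=5):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, regardless of which worker
        # finishes first. Consuming the iterator is what re-raises an
        # exception that occurred inside a worker.
        return list(executor.map(measure, items))


if __name__ == '__main__':
    print(measure_all(['a', 'bb', 'ccc']))
    try:
        measure_all(['a', ''])
    except ValueError as exc:
        print(f'worker failed: {exc}')
```

Note that iteration stops at the first failing item, so `map()` suits cases where one failure should abort the batch; handling failures item by item is easier with `submit()` and individual futures.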

requests.get() is thread-safe, since each call creates its own session. It can therefore be used safely from multiple threads without causing a race condition.

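What is a race condition? It is what happens when multiple threads compete for a shared resource, so that the outcome depends on the order in which the threads happen to run.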
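
A race condition can be sketched with a shared counter. The example below is illustrative only: the `Counter` class and the `run` helper are hypothetical names, and whether the unsafe version actually loses updates on a given run depends on the interpreter version and on thread scheduling.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read and write are separate steps; another thread may run in
        # between them, in which case one of the two updates is lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write sequence atomic.
        with self._lock:
            self.value += 1


def run(method_name, n_threads=4, n_iterations=50_000):
    """Increment one shared counter from several threads; return the total."""
    counter = Counter()
    increment = getattr(counter, method_name)

    def worker():
        for _ in range(n_iterations):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == '__main__':
    print('unsafe:', run('unsafe_increment'))  # may be less than 200000
    print('safe:  ', run('safe_increment'))
```

The locked version always reaches `n_threads * n_iterations`; the unlocked version can only match or fall short of it, never exceed it.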

## What Are Futures?

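Future classes are provided by both concurrent.futures and asyncio, and in both cases a Future represents a deferred operation. A Future wraps a pending operation that has been placed in a queue. Its state can be queried at any time, and its result or exception can be retrieved once the operation completes.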

As users, we don't need to think about how to create Futures; the library handles that for us. What we actually do is schedule them for execution.

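- The Executor class

   Calling executor.submit(func) schedules func() for execution and returns the newly created future instance, which can be queried later.

- Future methods and related functions:

   - done() reports whether the corresponding operation has finished: True if it has, False if it has not.

      Note that done() is non-blocking and returns immediately.
      
   - add_done_callback(fn) registers fn to be called, with the future as its only argument, once the future completes.
   - result() returns the operation's result, or raises its exception, once the future completes. If the future is not done yet, the call blocks until it is.
   - as_completed(fs) is a module-level function rather than a Future method: given an iterable of futures fs, it returns an iterator that yields each future as it completes.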

- [Reference documentation](https://docs.python.org/3/library/concurrent.futures.html)
- Example code:
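
Before the full download example, here is a smaller sketch of the calls listed above, run on a thread pool with no network access. `slow_square` and `demo` are hypothetical names, and the delays are arbitrary values chosen so that the completion order is easy to predict.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x, delay):
    """Hypothetical task: wait for `delay` seconds, then return x squared."""
    time.sleep(delay)
    return x * x


def demo():
    finished = []  # filled in by the done-callback

    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() schedules the call and returns a Future immediately.
        future = executor.submit(slow_square, 3, 0.2)
        print('done right after submit?', future.done())  # non-blocking

        # The callback receives the completed future as its only argument.
        future.add_done_callback(lambda f: finished.append(f.result()))

        value = future.result()  # blocks until the task has finished

        # as_completed() yields futures in completion order, not in
        # submission order.
        futures = {
            executor.submit(slow_square, x, delay): x
            for x, delay in [(1, 0.3), (2, 0.1)]
        }
        completion_order = [futures[f] for f in as_completed(futures)]

    return value, future.done(), finished, completion_order


if __name__ == '__main__':
    print(demo())
```

Keeping the futures as dictionary keys is a common way to map each completed future back to the input that produced it.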


In [4]:
import concurrent.futures
import copy
import time
import requests

def download_one(url):
    print(f'--> To download {url}')
    resp = requests.get(url)
    content = resp.content
    print('Download finished')
    return (url, len(content))

def download_all_as_completed(sites):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)
            to_do.append(future)

        for future in concurrent.futures.as_completed(to_do):
            try:
                result = future.result()
            except Exception as exc:
                print('<xx> Threw an exception: %s' % (exc))
            else:
                print('<-- Read {} from {}'.format(result[1], result[0]))

def download_all_with_callback(sites):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for site in sites:
            future = executor.submit(download_one, site)
            future.add_done_callback(done_callback)

def done_callback(future):
    try:
        result = future.result()
    except Exception as exc:
        print('<xx> Threw an exception: %s' % (exc))
    else:
        if result is not None:
            print('<-- Read {} from {}'.format(result[1], result[0]))

def main():
    sites = copy.copy(SITES)
    start_time = time.perf_counter()
    download_all_with_callback(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()


--> To download https://en.wikipedia.org/wiki/Portal:Arts--> To download https://en.wikipedia.org/wiki/Portal:Society--> To download https://en.wikipedia.org/wiki/Portal:History--> To download https://en.wikipedia.org/wiki/Portal:Biography--> To download https://en.wikipedia.org/wiki/Portal:Geography--> To download https://en.wikipedia.org/wiki/Portal:Science--> To download https://en.wikipedia.org/wiki/Portal:Mathematics--> To download https://en.wikipedia.org/wiki/Computer_science--> To download https://en.wikipedia.org/wiki/Portal:Technology--> To download https://en.wikipedia.org/wiki/Python_(programming_language)--> To download https://en.wikipedia.org/wiki/Java_(programming_language)
--> To download https://en.wikipedia.org/wiki/PHP










Download finished
--> To download https://en.wikipedia.org/wiki/Node.jsDownload finished

<-- Read 185172 from https://en.wikipedia.org/wiki/Portal:Arts
<-- Read 252518 from https://en.wikipedia.org/wiki/Portal:Society--> To download https:/