# Python Concurrent Programming with Futures

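## Distinguishing Concurrency from Parallelism
Concurrency:
- Multithreading: the concurrent tasks are managed inside a single program by the programmer. Threads share the same memory, so sharing data between them is convenient, provided that access to shared mutable state is synchronized (for example with a lock) to avoid race conditions.<br/>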

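- Asynchronous I/O (asyncio): asynchronous code is essentially single-threaded. I/O operations often block, and while one task waits on I/O, control is handed to another task, which is how several tasks make progress. In other words, **asynchrony is essentially the handing over of control at run time**. The classic example is the producer-consumer model.<br/>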

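  - Asynchrony goes by different names in different places. In Python the unit is called a coroutine; coroutines were originally built on generators, and since Python 3.5 they are written with `async`/`await`. Whatever the name, this style of concurrency always comes down to handing over control.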
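
The producer-consumer model mentioned above can be sketched with `asyncio.Queue`. This is a minimal illustration, not part of the original notes: the function names `producer`, `consumer`, and `run_demo`, the queue size of 2, and the `None` sentinel are all assumptions chosen for the example. Every `await` marks a point where one coroutine may hand control to the other.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_items: int) -> None:
    """Put n_items integers on the queue, then a None sentinel."""
    for i in range(n_items):
        await queue.put(i)       # suspends here while the queue is full
        await asyncio.sleep(0)   # explicitly yield control to other tasks
    await queue.put(None)        # sentinel (an assumption of this sketch)


async def consumer(queue: asyncio.Queue) -> list:
    """Collect items until the sentinel arrives."""
    received = []
    while True:
        item = await queue.get()  # suspends here while the queue is empty
        if item is None:
            break
        received.append(item)
    return received


async def run_demo(n_items: int = 5) -> list:
    queue = asyncio.Queue(maxsize=2)
    # gather runs both coroutines concurrently on a single thread
    _, received = await asyncio.gather(producer(queue, n_items), consumer(queue))
    return received


if __name__ == "__main__":
    print(asyncio.run(run_demo()))  # prints [0, 1, 2, 3, 4]
```

Inside a notebook an event loop is usually already running, so `await run_demo()` would be used there instead of `asyncio.run(run_demo())`.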

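Parallelism: multiprocessing runs several independent processes. The advantage is that the operating system manages them, so they can run on multiple CPU cores at the same time; the drawback is that communication and data sharing between the program and its processes are inconvenient.<br/>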

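Difference: for CPU-bound programs, multiprocessing outperforms multithreading. In standard CPython builds the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads cannot spread pure-Python computation across cores, whereas separate processes can. A CPU-bound program spends most of its running time on CPU computation and very little on I/O; an I/O-bound program, by contrast, spends most of its running time waiting on disk or network reads and writes, with very little CPU time.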
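
A rough way to see this difference is to run the same CPU-bound function under both executor types. The sketch below is an illustration under stated assumptions: `count_primes` is a hypothetical stand-in for CPU-bound work, and the pool size of 4 and the workload size are arbitrary. Timings depend on the machine and interpreter build, so none are shown here.

```python
import concurrent.futures
import time


def count_primes(limit: int) -> int:
    """CPU-bound stand-in: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(executor_cls, limits):
    """Run count_primes over limits with the given executor class.

    Returns (results, elapsed_seconds).
    """
    start = time.perf_counter()
    with executor_cls(max_workers=4) as executor:
        results = list(executor.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [50_000] * 4
    for cls in (concurrent.futures.ThreadPoolExecutor,
                concurrent.futures.ProcessPoolExecutor):
        results, seconds = timed_map(cls, limits)
        print(f"{cls.__name__}: {results} in {seconds:.2f}s")
```

Both executors return the same results; only the elapsed time is expected to differ, with the process pool typically ahead when several cores are available.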

## Single-Threaded vs. Multithreaded Performance

In [4]:
# Single-threaded version
import time

import requests


def download_one(url):
    resp = requests.get(url)
    print("Read {} from {}".format(len(resp.content), url))


def download_all(sites):
    for site in sites:
        download_one(site)


def main():
    sites = [
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662",
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print("Download {} sites in {} seconds".format(len(sites), end_time - start_time))


if __name__ == "__main__":
    main()

Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662
Download 8 sites in 6.045021199999994 seconds


In [3]:
# Multithreaded version
import concurrent.futures
import time

import requests


def download_one(url):
    resp = requests.get(url)
    print("Read {} from {}".format(len(resp.content), url))

# Download with a thread pool
def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Executor.map is used much like Python's built-in map
        executor.map(download_one, sites)

def main():
    sites = [
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662",
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print("Download {} sites in {} seconds".format(len(sites), end_time - start_time))


if __name__ == "__main__":
    main()

Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662
Read 62788 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661
Download 8 sites in 1.0352042999999966 seconds
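
The `print` calls above appear in completion order, but the iterator returned by `executor.map` yields results in input order, and an exception raised in a worker only surfaces when its result is read from that iterator. The sketch below illustrates both points without touching the network; `slow_square`, `fail_on_two`, and the sleep durations are hypothetical stand-ins, not part of the original code.

```python
import concurrent.futures
import time


def slow_square(x: int) -> int:
    """Stand-in for a download: larger inputs finish sooner."""
    time.sleep(max(0.0, 0.05 * (3 - x)))
    return x * x


def squares_in_order(values):
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Results come back in input order, whichever task finished first
        return list(executor.map(slow_square, values))


def fail_on_two(x: int) -> int:
    if x == 2:
        raise ValueError("bad input: 2")
    return x


def collect_until_error(values):
    """Return (results read before the first failure, error message or None)."""
    collected, error = [], None
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        results = executor.map(fail_on_two, values)
        try:
            for value in results:  # the worker's exception is re-raised here
                collected.append(value)
        except ValueError as exc:
            error = str(exc)
    return collected, error


if __name__ == "__main__":
    print(squares_in_order([0, 1, 2]))     # prints [0, 1, 4]
    print(collect_until_error([1, 2, 3]))  # prints ([1], 'bad input: 2')
```

If the iterator is never consumed, as in `download_all` above, a failure inside a worker goes unnoticed, so reading the results is worth doing whenever errors matter.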


In [6]:
# Multiprocess version

import concurrent.futures
import time

import requests


def download_one(url):
    resp = requests.get(url)
    print("Read {} from {}".format(len(resp.content), url))

# Download with a process pool
def download_all(sites):
    # If max_workers is omitted, it defaults to the number of processors on the machine.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Executor.map is used much like Python's built-in map
        executor.map(download_one, sites)

def main():
    sites = [
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662",
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print("Download {} sites in {} seconds".format(len(sites), end_time - start_time))


if __name__ == "__main__":
    main()

Download 8 sites in 0.9625254999999697 seconds


In [1]:
# Another multithreaded version
import concurrent.futures
import time

import requests


def download_one(url):
    resp = requests.get(url)
    print("Read {} from {}".format(len(resp.content), url))

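# Download with a thread pool
def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        to_do = []
        for site in sites:
            # submit schedules one call and returns a Future immediately
            future = executor.submit(download_one, site)
            to_do.append(future)
        # as_completed yields each Future as soon as it finishes
        for future in concurrent.futures.as_completed(to_do):
            # result() re-raises any exception raised inside download_one
            future.result()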

def main():
    sites = [
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661",
        "http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662",
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print("Download {} sites in {} seconds".format(len(sites), end_time - start_time))


if __name__ == "__main__":
    main()

Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143661
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143658
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143662
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143657
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143656
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143660
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143655
Read 62970 from http://www.dapenti.com/blog/readforwx.asp?name=xilei&id=143659
Download 8 sites in 0.7090143000000353 seconds
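
With `submit` and `as_completed`, a common pattern is to keep a dictionary from each Future back to its input, so that results and failures can be attributed to the right URL. The following is a minimal sketch of that pattern, assuming a hypothetical `fake_download` that sleeps instead of using the network and fails for any URL containing "bad"; the `.example` URLs and the function names are illustrative only.

```python
import concurrent.futures
import time


def fake_download(url: str, delay: float) -> int:
    """Hypothetical stand-in for download_one: sleep, then return a size."""
    time.sleep(delay)
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return len(url)


def download_all(jobs):
    """jobs maps url -> simulated delay. Returns (results, errors) keyed by url."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # Remember which URL each Future belongs to
        future_to_url = {
            executor.submit(fake_download, url, delay): url
            for url, delay in jobs.items()
        }
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except ConnectionError as exc:
                # One failed task does not stop the others
                errors[url] = str(exc)
    return results, errors


if __name__ == "__main__":
    jobs = {
        "http://a.example": 0.02,
        "http://bad.example": 0.01,
        "http://bb.example": 0.0,
    }
    print(download_all(jobs))
```

Handling each Future separately means a single unreachable site is recorded as an error while the remaining downloads still complete.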
