## Futures in Python
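* In Python, concurrency (with threads or asyncio tasks) does not mean that multiple operations run at the same instant. Only one operation runs at any given moment; threads and tasks take turns, switching among themselves until all of them finish.
* Python (CPython) introduced the global interpreter lock (GIL), so only one thread executes Python code at a time. When a thread blocks on an I/O operation, however, the GIL is released so that another thread can continue running.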
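
The effect of releasing the GIL during blocking calls can be seen in a small sketch. It assumes `time.sleep` as a stand-in for a blocking network read, and the names `fake_io` and `run_threads` are made up for illustration. Four threads that each wait 0.2 s should finish in roughly 0.2 s overall rather than 0.8 s, because a waiting thread does not hold the GIL.

```python
import threading
import time

def fake_io(seconds):
    # time.sleep blocks only the calling thread and releases the GIL while
    # waiting, much like a blocking socket read would.
    time.sleep(seconds)

def run_threads(n, seconds):
    # Start n threads that all block at once, then wait for every one of them.
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threads(4, 0.2)
print('4 blocking waits of 0.2s took {:.2f}s'.format(elapsed))
```

A CPU-bound function in place of `fake_io` would not overlap this way, since each thread would need the GIL to make progress.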

In [2]:
# Sequential download
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    
def download_all(sites):
    for site in sites:
        download_one(site)

def main():
    sites = [
        'https://www.cnblogs.com/String-song/p/12794901.html',
        'https://www.cnblogs.com/String-song/p/16571636.html',
        'https://wenku.csdn.net/answer/8ajppvmpa3',
        'https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()

# Output


Read 21004 from https://www.cnblogs.com/String-song/p/12794901.html
Read 19856 from https://www.cnblogs.com/String-song/p/16571636.html
Read 130225 from https://wenku.csdn.net/answer/8ajppvmpa3
Read 81583 from https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml
Download 4 sites in 1.282705800000258 seconds


In [6]:
# Download with an executor pool (thread pool or process pool)
import concurrent.futures
import requests
import threading
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


def download_all(sites):
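    # Thread-pool variant:
    # with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Process-pool variant: tasks run in parallel in separate processes. By default
    # the pool size is derived from the number of CPU cores on the machine.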
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(download_one, sites)

def main():
    sites = [
        'https://www.cnblogs.com/String-song/p/12794901.html',
        'https://www.cnblogs.com/String-song/p/16571636.html',
        'https://wenku.csdn.net/answer/8ajppvmpa3',
        'https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

## Output


Download 4 sites in 0.22935940000024857 seconds
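
The cell above calls `executor.map` without consuming its results. `map` returns a lazy iterator, and an exception raised inside a worker only surfaces when the corresponding result is retrieved from that iterator. The sketch below illustrates this with a thread pool and a made-up task, `flaky`, that fails for one input; it is an illustration of the behavior, not a diagnosis of the run recorded above.

```python
import concurrent.futures

def flaky(n):
    # Hypothetical task: fails for one input to stand in for a failed download.
    if n == 2:
        raise ValueError('task {} failed'.format(n))
    return n * 10

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # All tasks are submitted here; no exception is raised at this point.
    results = executor.map(flaky, [1, 2, 3])

collected = []
error = None
try:
    # Results come back in input order; the failure surfaces on the second item.
    for value in results:
        collected.append(value)
except ValueError as exc:
    error = str(exc)

print(collected, error)  # [10] task 2 failed
```

Wrapping the call as `list(executor.map(...))` inside the `with` block is one simple way to make sure failures are not silently dropped.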


In [20]:
# Version that uses Future objects explicitly
import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # with concurrent.futures.ProcessPoolExecutor() as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)
            to_do.append(future)
            
        for future in concurrent.futures.as_completed(to_do):
            future.result()

def main():
    sites = [
        'https://www.cnblogs.com/String-song/p/12794901.html',
        'https://www.cnblogs.com/String-song/p/16571636.html',
        'https://wenku.csdn.net/answer/8ajppvmpa3',
        'https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

# Output

Read 19856 from https://www.cnblogs.com/String-song/p/16571636.html
Read 21005 from https://www.cnblogs.com/String-song/p/12794901.html
Read 120404 from https://wenku.csdn.net/answer/8ajppvmpa3
Read 0 from https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml
Download 4 sites in 0.5521103000000949 seconds
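
To make the role of the `Future` object more concrete, here is a minimal sketch that avoids the network. `fake_download`, the job names, and the delays are all assumptions chosen for illustration. It shows the common pattern of mapping each future back to its input, and that `future.result()` either returns the task's value or re-raises the exception the task raised.

```python
import concurrent.futures
import time

def fake_download(name, delay):
    # Stand-in for a network request; `name` and `delay` are made up.
    time.sleep(delay)
    if name == 'bad':
        raise RuntimeError('could not fetch {}'.format(name))
    return len(name)

jobs = {'slow': 0.3, 'fast': 0.05, 'bad': 0.15}
finished = []

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately; keep a map from Future to input.
    future_to_name = {executor.submit(fake_download, n, d): n for n, d in jobs.items()}
    # as_completed() yields futures in the order they finish, not submission order.
    for future in concurrent.futures.as_completed(future_to_name):
        name = future_to_name[future]
        try:
            size = future.result()  # returns the value or re-raises the task's exception
        except RuntimeError as exc:
            finished.append((name, 'error: {}'.format(exc)))
        else:
            finished.append((name, size))

print(finished)
```

Because the shortest delay finishes first, `'fast'` is normally reported before `'bad'` and `'slow'`, which matches the out-of-order lines in the recorded output above.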


In [19]:
# Version that uses asyncio
import nest_asyncio
import asyncio
import aiohttp
import time

# Patch asyncio so asyncio.run() works inside Jupyter's already-running event loop
nest_asyncio.apply()

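async def download_one(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            # content_length mirrors the Content-Length header and is None when the
            # server omits it; len(await resp.read()) would count the body bytes instead.
            print('Read {} from {}'.format(resp.content_length, url))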

async def download_all(sites):
    tasks = [asyncio.create_task(download_one(site)) for site in sites]
    await asyncio.gather(*tasks)

def main():
    sites = [
        'https://www.cnblogs.com/String-song/p/12794901.html',
        'https://www.cnblogs.com/String-song/p/16571636.html',
        'https://wenku.csdn.net/answer/8ajppvmpa3',
        'https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml'
    ]
    start_time = time.perf_counter()
    asyncio.run(download_all(sites))
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()

## Output

Read None from https://www.cnblogs.com/String-song/p/12794901.html
Read None from https://www.cnblogs.com/String-song/p/16571636.html
Read None from https://wenku.csdn.net/answer/8ajppvmpa3
Read 0 from https://bobbyhadz.com/blog/couldnt-find-tree-builder-with-the-features-you-requested-lxml
Download 4 sites in 0.5369809000003443 seconds
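
The `create_task` plus `gather` pattern used above can also be tried without any network access. The following sketch assumes `asyncio.sleep` as a stand-in for a slow request; `fake_fetch`, `fetch_all`, and the job list are hypothetical names. It is written as a plain script; inside a notebook, `await fetch_all(jobs)` or the `nest_asyncio` patch would be needed instead of a bare `asyncio.run`.

```python
import asyncio
import time

async def fake_fetch(name, delay):
    # Awaiting hands control back to the event loop so other tasks can run.
    await asyncio.sleep(delay)
    return '{} done'.format(name)

async def fetch_all(jobs):
    # Schedule every coroutine as a task, then wait for all of them.
    tasks = [asyncio.create_task(fake_fetch(n, d)) for n, d in jobs]
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)

jobs = [('a', 0.2), ('b', 0.1), ('c', 0.15)]
start = time.perf_counter()
results = asyncio.run(fetch_all(jobs))
elapsed = time.perf_counter() - start

print(results)  # ['a done', 'b done', 'c done']
print('took {:.2f}s'.format(elapsed))
```

The total time is close to the longest single delay rather than the sum, since all three waits overlap on one thread.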


### Choosing between asyncio, multithreading, and multiprocessing
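* If the work is I/O bound and the I/O operations are slow, so that many tasks or threads need to cooperate, asyncio is the better fit.
* If the work is I/O bound but the I/O operations are fast, so that only a limited number of tasks or threads is needed, multithreading is sufficient.
* If the work is CPU bound, use multiprocessing to improve performance.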
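
The three rules above can be condensed into a small decision helper. This is only a sketch of the rule of thumb; the function name `pick_concurrency_model` and its parameters are made up for illustration and are not part of any library.

```python
def pick_concurrency_model(io_bound, slow_io=False):
    """Return a suggested model following the rule of thumb above."""
    if io_bound:
        # Slow I/O with many cooperating tasks favors asyncio;
        # fast I/O with a limited number of tasks is fine with threads.
        return 'asyncio' if slow_io else 'threads'
    # CPU-bound work needs separate processes to run in parallel.
    return 'processes'

print(pick_concurrency_model(io_bound=True, slow_io=True))   # asyncio
print(pick_concurrency_model(io_bound=True, slow_io=False))  # threads
print(pick_concurrency_model(io_bound=False))                # processes
```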