
# Why Asyncio
# What Asyncio

## Sync VS Async

# How Asyncio 

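- Keywords
    - single thread
    - task and future objects
    - event loop object
    - ready state
    - waiting state
    - lists

Asyncio, like an ordinary Python program, runs on a single thread. It has only one main thread, yet it can make progress on many different tasks; a task here is a special kind of future object. These tasks are all controlled by an object called the event loop.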
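
A minimal sketch of the two claims above, assuming nothing beyond the standard library: every task runs on the same thread, and a task is a kind of future. The names `worker` and `demo` are made up for illustration.

```python
import asyncio
import threading

async def worker(name, delay):
    # Yield control to the event loop while "waiting".
    await asyncio.sleep(delay)
    # Report which thread this task is running on.
    return name, threading.get_ident()

async def demo():
    tasks = [asyncio.create_task(worker(f"task-{i}", 0.01 * i)) for i in range(3)]
    # asyncio.Task is a subclass of asyncio.Future.
    is_future = all(isinstance(t, asyncio.Future) for t in tasks)
    results = await asyncio.gather(*tasks)
    return is_future, results

is_future, results = asyncio.run(demo())
thread_ids = {tid for _, tid in results}
print("tasks are futures:", is_future)
print("distinct threads used:", len(thread_ids))
```

All three tasks overlap their waits, yet only one thread identifier shows up, because the event loop switches between tasks on the main thread.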


# How-to Asyncio


In [None]:

import asyncio
import aiohttp
import time

async def download_one(url):
    # Open a session, issue the GET request, and report the response size.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            print('Read {} from {}'.format(resp.content_length, url))

async def download_all(sites):
    # Schedule one task per site, then wait until all of them have finished.
    tasks = [asyncio.create_task(download_one(site)) for site in sites]
    await asyncio.gather(*tasks)

def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    asyncio.run(download_all(sites))
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()


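- `asyncio.run(coro)` is the root call of an Asyncio program: it obtains an event loop, runs the given coro until it finishes, and finally closes the event loop.
- `asyncio.run()` was only introduced in Python 3.7; in older versions it is roughly equivalent to the following statements: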

In [None]:

# `coro` stands for the coroutine object you want to run.
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(coro)
finally:
    loop.close()

[event loop reference](https://docs.python.org/3/library/asyncio-eventloop.html)
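
To make the comparison concrete, here is a small runnable sketch that runs the same coroutine function both ways. The coroutine `answer` is a made-up placeholder, and the manual variant uses `asyncio.new_event_loop()` rather than `asyncio.get_event_loop()` so that it also works on recent Python versions; treat it as an approximation of what `asyncio.run()` does, not its exact implementation.

```python
import asyncio

async def answer():
    # Placeholder coroutine: yield to the event loop once, then return a value.
    await asyncio.sleep(0)
    return 42

# Python 3.7+: one call creates the loop, runs the coroutine, and closes the loop.
result_run = asyncio.run(answer())

# Manual management of the event loop, in the spirit of the older idiom.
loop = asyncio.new_event_loop()
try:
    result_manual = loop.run_until_complete(answer())
finally:
    loop.close()

print(result_run, result_manual, loop.is_closed())
```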

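# Does Asyncio have drawbacks?
- Library support: blocking libraries do not cooperate with the event loop, so Asyncio-compatible libraries (such as aiohttp in the example above) are needed.
# Multithreading or Asyncio
- If the work is I/O bound and the I/O operations are slow, so that many tasks / threads have to cooperate, Asyncio is the better fit.
- If the work is I/O bound but the I/O operations are fast, so that only a limited number of tasks / threads is needed, multithreading is good enough.
- If the work is CPU bound, use multiple processes to speed the program up.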
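
As a rough illustration of the first case, the sketch below simulates slow I/O with `asyncio.sleep` (an assumption standing in for real network calls) and runs 200 such waits as Asyncio tasks. The names `slow_io` and `run_many` are hypothetical.

```python
import asyncio
import time

async def slow_io(i):
    # Stand-in for a slow I/O operation such as a network request.
    await asyncio.sleep(0.1)
    return i

async def run_many(n):
    # gather() wraps each coroutine in a task and waits for all of them.
    return await asyncio.gather(*(slow_io(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(run_many(200))
elapsed = time.perf_counter() - start
print(f"{len(results)} simulated requests in {elapsed:.2f} seconds")
```

Run one after another, the 200 waits would take about 20 seconds. Because the waits overlap on a single thread, the total stays close to the duration of one wait, without creating 200 threads.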


# Exercise

In [5]:

import time
def cpu_bound(number):
    print(sum(i * i for i in range(number)))

def calculate_sums(numbers):
    for number in numbers:
        cpu_bound(number)

def main():
    start_time = time.perf_counter()  
    numbers = [10000000 + x for x in range(20)]
    calculate_sums(numbers)
    end_time = time.perf_counter()
    print('Calculation takes {} seconds'.format(end_time - start_time))
    
if __name__ == '__main__':
    main()

333333283333335000000
333333383333335000000
333333483333355000001
333333583333395000005
333333683333455000014
333333783333535000030
333333883333635000055
333333983333755000091
333334083333895000140
333334183334055000204
333334283334235000285
333334383334435000385
333334483334655000506
333334583334895000650
333334683335155000819
333334783335435001015
333334883335735001240
333334983336055001496
333335083336395001785
333335183336755002109
Calculation takes 9.770003832999976 seconds


In [16]:
%%writefile mp_pool_test_2.py
import time
import concurrent.futures

def cpu_bound(number):
    print(sum(i * i for i in range(number)))

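# Alternative using submit(): pass the function and its argument separately.
# Writing executor.submit(cpu_bound(number)) would run cpu_bound in the parent
# process and submit its return value (None) instead.
# def calculate_sums(numbers):
#     with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
#         for number in numbers:
#             job_worker = executor.submit(cpu_bound, number)

def calculate_sums(numbers):
    # Distribute the CPU-bound calls across up to 10 worker processes.
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        executor.map(cpu_bound, numbers)
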
def main():
    start_time = time.perf_counter()  
    numbers = [10000000 + x for x in range(20)]
    calculate_sums(numbers)
    end_time = time.perf_counter()
    print('Calculation takes {} seconds'.format(end_time - start_time))
    
if __name__ == '__main__':
    main()

Overwriting mp_pool_test_2.py


In [17]:
!python mp_pool_test_2.py

333333283333335000000
333333383333335000000
333333483333355000001
333333783333535000030
333334183334055000204
333334083333895000140
333333683333455000014
333333883333635000055
333333583333395000005
333333983333755000091
333334383334435000385
333334683335155000819
333334983336055001496
333334483334655000506
333335083336395001785
333334283334235000285
333334583334895000650
333334883335735001240
333334783335435001015
333335183336755002109
Calculation takes 1.2973175829999999 seconds


In [11]:
%%writefile mp_pool_test.py
import multiprocessing
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)


if __name__ == "__main__":
    numbers = [10000000 + x for x in range(20)]
    start_time = time.time()
    find_sums(numbers)
    duration = time.time() - start_time
    print(f"Duration {duration} seconds")

Writing mp_pool_test.py


In [12]:
!python mp_pool_test.py

Duration 1.3228919506072998 seconds
