# Coroutines are one way to implement concurrent programming

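NGINX event loop
An event loop starts a single scheduler that decides which task runs at any given moment. This avoids the overhead that comes with multithreading: starting threads, managing them, and synchronizing with locks. NGINX, from the same period, keeps resource usage low and performance high under heavy concurrency, and it supports more concurrent connections than Apache.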
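
To make the single-scheduler idea concrete, here is a minimal sketch using Python's asyncio. The names `worker` and `run_demo` are illustrative and not part of the original notebook. It records which thread each task runs on, under the assumption that the default event loop is used.

```python
import asyncio
import threading

async def worker(name, delay, log):
    # Record the thread this task runs on, then hand control back to the loop.
    log.append((name, 'start', threading.get_ident()))
    await asyncio.sleep(delay)
    log.append((name, 'done', threading.get_ident()))

async def run_demo():
    log = []
    # Both workers are scheduled by the same event loop.
    await asyncio.gather(worker('a', 0.02, log), worker('b', 0.01, log))
    return log

log = asyncio.run(run_demo())
thread_ids = {entry[2] for entry in log}
events = [entry[:2] for entry in log]
print(len(thread_ids))   # number of distinct threads used
print(events)
```

Both workers make progress while the other is sleeping, yet only one thread is involved, which is why no locks are needed here.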

# Starting with a crawler


In [None]:

import time

def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('OK {}'.format(url))

def main(urls):
    for url in urls:
        crawl_page(url)

%time main(['url_1', 'url_2', 'url_3', 'url_4'])


In [None]:
%%writefile coroutine_1.py

import asyncio
import time
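# Note: this synchronous wrapper cannot time an async function. Calling
# func(url) on a coroutine function only creates the coroutine object
# without running it, so the decorator is left unused below.
def my_profile(func):
    def wrapper(url):
        starttime = time.time()
        func(url)
        endtime = time.time()
        du = endtime - starttime
        return du
    return wrapper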

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))
    
# @my_profile
async def main(urls):
    for url in urls:
        await crawl_page(url)
        
starttime = time.time()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
endtime = time.time()
du = endtime -starttime
print('du is {}'.format(du))


In [None]:
!python coroutine_1.py

- The async keyword declares an asynchronous function, so both crawl_page and main become asynchronous functions here. Calling an asynchronous function returns a coroutine object.
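
A small sketch of that last point, assuming only the standard library; `greet` is a hypothetical function used for illustration. Calling it does not run its body, it only produces a coroutine object that the event loop can later drive.

```python
import asyncio

async def greet(name):
    # The body runs only once the coroutine is awaited or scheduled.
    return 'hello {}'.format(name)

coro = greet('world')                 # nothing inside greet has run yet
is_coro = asyncio.iscoroutine(coro)   # check that we hold a coroutine object
result = asyncio.run(coro)            # the event loop runs it to completion
print(is_coro, result)
```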
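## Several ways to run a coroutine
- Three are introduced here
    - await
    await behaves like ordinary sequential Python execution: the calling coroutine pauses here, enters the awaited coroutine function, and continues only after that function has finished and returned
    - asyncio.create_task() creates a task
    - asyncio.run starts the run. It was added in Python 3.7 and makes the coroutine interface very simple, because you do not have to think about how the event loop is defined or used
        A good programming convention is to use asyncio.run(main()) as the entry point of the program and to call asyncio.run only once during the program's lifetime.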

## Tasks

In [None]:
%%writefile coroutine_task_1.py

import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        await task

starttime = time.time()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
endtime = time.time()
du = endtime -starttime
print('du is {}'.format(du))




In [None]:
!python coroutine_task_1.py

In [None]:
%%writefile coroutine_task_2.py

import asyncio
import time
async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    await asyncio.gather(*tasks)

starttime = time.time()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
endtime = time.time()
du = endtime -starttime
print('du is {}'.format(du))


- *tasks unpacks the list into positional arguments of the function; correspondingly, **dict unpacks a dictionary into keyword arguments
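
The unpacking syntax is not specific to asyncio. A minimal sketch with a hypothetical function `describe` shows both forms:

```python
def describe(a, b, c=0):
    return 'a={} b={} c={}'.format(a, b, c)

args = [1, 2]
kwargs = {'c': 3}

positional = describe(*args)            # same as describe(1, 2)
combined = describe(*args, **kwargs)    # same as describe(1, 2, c=3)
print(positional)
print(combined)
```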

In [None]:
!python coroutine_task_2.py

In [None]:
%%writefile coroutine_task_3.py

import asyncio
import time

async def worker_1():
    print('worker_1 start')
    await asyncio.sleep(3)
    print('worker_1 done')

async def worker_2():
    print('worker_2 start')
    await asyncio.sleep(2)
    print('worker_2 done')

async def main():
    task1 = asyncio.create_task(worker_1())
    task2 = asyncio.create_task(worker_2())
    print('before await')
    await task1
    print('awaited worker_1')
    await task2
    print('awaited worker_2')
starttime = time.time()
asyncio.run(main())
endtime = time.time()
du = endtime -starttime
print('du is {}'.format(du))

In [None]:
!python coroutine_task_3.py

### Limiting the running time of a coroutine task
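
One common way to bound a coroutine's running time is asyncio.wait_for. The sketch below is an assumption about what this heading intends, not code from the original notebook; `slow_job` and the 0.05 second limit are made up for illustration.

```python
import asyncio

async def slow_job():
    await asyncio.sleep(1)   # stands in for a slow request
    return 'finished'

async def main():
    try:
        # wait_for cancels slow_job and raises if the timeout is exceeded.
        return await asyncio.wait_for(slow_job(), timeout=0.05)
    except asyncio.TimeoutError:
        return 'timed out'

outcome = asyncio.run(main())
print(outcome)
```

The notebook's own example further down takes a different route: it sleeps for a fixed time and then calls cancel() on the task directly.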
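### Handling errors raised while coroutines run
- With return_exceptions=True, exceptions are not thrown up to the calling layer; they are returned as items in the result list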
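
Because failures come back as ordinary list items, the caller has to separate them from real results. A minimal sketch, with hypothetical coroutines `ok` and `boom`:

```python
import asyncio

async def ok():
    await asyncio.sleep(0.01)
    return 1

async def boom():
    await asyncio.sleep(0.01)
    return 2 / 0   # raises ZeroDivisionError

async def main():
    results = await asyncio.gather(ok(), boom(), return_exceptions=True)
    # Exceptions (including cancellation) are returned as values, not raised.
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures

successes, failures = asyncio.run(main())
print(successes, [type(f).__name__ for f in failures])
```

The check uses BaseException rather than Exception because asyncio.CancelledError derives from BaseException on Python 3.8 and later.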

In [None]:
%%writefile coroutine_task_4.py

import asyncio

async def worker_1():
    await asyncio.sleep(1)
    return 1

async def worker_2():
    await asyncio.sleep(2)
    return 2 / 0

async def worker_3():
    await asyncio.sleep(3)
    return 3

async def main():
    task_1 = asyncio.create_task(worker_1())
    task_2 = asyncio.create_task(worker_2())
    task_3 = asyncio.create_task(worker_3())

    await asyncio.sleep(2)
    task_3.cancel()

    res = await asyncio.gather(task_1, task_2, task_3, return_exceptions=True)
    print(res)

asyncio.run(main())



In [None]:
!python coroutine_task_4.py

# The producer-consumer model

In [None]:
%%writefile coroutine_producer_customer_model.py

import asyncio
import random
import time


async def consumer(queue, id):
    while True:
        val = await queue.get()
        print('{} get a val: {}'.format(id, val))
        await asyncio.sleep(1)

async def producer(queue, id):
    for i in range(5):
        val = random.randint(1, 10)
        await queue.put(val)
        print('{} put a val: {}'.format(id, val))
        await asyncio.sleep(1)

async def main():
    queue = asyncio.Queue()

    consumer_1 = asyncio.create_task(consumer(queue, 'consumer_1'))
    consumer_2 = asyncio.create_task(consumer(queue, 'consumer_2'))

    producer_1 = asyncio.create_task(producer(queue, 'producer_1'))
    producer_2 = asyncio.create_task(producer(queue, 'producer_2'))

    await asyncio.sleep(10)
    consumer_1.cancel()
    consumer_2.cancel()
    
    await asyncio.gather(consumer_1, consumer_2, producer_1, producer_2, return_exceptions=True)


s = time.time()
asyncio.run(main())
e = time.time()
print(e-s)

In [None]:
!python coroutine_producer_customer_model.py

# Hands-on: a crawler for Douban's upcoming movie recommendations

In [10]:

import requests
from bs4 import BeautifulSoup

def main():
    url = "https://movie.douban.com/cinema/later/beijing/"
    head={
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36',
    'Referer':'https://time.geekbang.org/column/article/101855',
    'Connection':'keep-alive'}
    init_page = requests.get(url,headers=head).content
    # init_page = requests.get(url).content
    # init_soup = BeautifulSoup(init_page, 'lxml')
    init_soup = BeautifulSoup(init_page, 'html.parser')

    all_movies = init_soup.find('div', id="showing-soon")
    for each_movie in all_movies.find_all('div', class_="item"):
        all_a_tag = each_movie.find_all('a')
        all_li_tag = each_movie.find_all('li')

        movie_name = all_a_tag[1].text
        url_to_fetch = all_a_tag[1]['href']
        movie_date = all_li_tag[0].text

        response_item = requests.get(url_to_fetch,headers=head).content
        # soup_item = BeautifulSoup(response_item, 'lxml')
        soup_item = BeautifulSoup(response_item, 'html.parser')
        img_tag = soup_item.find('img')

        print('{} {} {}'.format(movie_name, movie_date, img_tag['src']))

%time main()

阿凡达：水之道 12月16日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2884182275.jpg
谁偷了我的粉兔子 12月16日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2884307337.jpg
穿靴子的猫2 12月23日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2885032978.jpg
绑架游戏 12月23日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2869898539.jpg
唬胆特工 12月23日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2885052777.jpg
繁华将至 12月23日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2628561183.jpg
龙马！新生网球王子 12月23日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2884562826.jpg
龙马精神 12月31日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2883750366.jpg
保你平安 12月31日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2885144191.jpg
透明侠侣 12月31日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2876027489.jpg
绝望主夫 12月31日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2884933443.jpg
女生规则 01月06日 https://img9.doub

In [18]:
%%writefile coroutine_crawl.py
import asyncio
import aiohttp
import time
from bs4 import BeautifulSoup
header={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36',
'Referer':'https://time.geekbang.org/column/article/101855',
'Connection':'keep-alive'}
async def fetch_content(url):
    async with aiohttp.ClientSession(
        headers=header, connector=aiohttp.TCPConnector(ssl=False)
    ) as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    url = "https://movie.douban.com/cinema/later/beijing/"

    init_page = await fetch_content(url)
    init_soup = BeautifulSoup(init_page, 'html.parser')

    movie_names, urls_to_fetch, movie_dates = [], [], []

    all_movies = init_soup.find('div', id="showing-soon")
    for each_movie in all_movies.find_all('div', class_="item"):
        all_a_tag = each_movie.find_all('a')
        all_li_tag = each_movie.find_all('li')

        movie_names.append(all_a_tag[1].text)
        urls_to_fetch.append(all_a_tag[1]['href'])
        movie_dates.append(all_li_tag[0].text)

    tasks = [fetch_content(url) for url in urls_to_fetch]
    pages = await asyncio.gather(*tasks)

    for movie_name, movie_date, page in zip(movie_names, movie_dates, pages):
        soup_item = BeautifulSoup(page, 'html.parser')
        img_tag = soup_item.find('img')

        print('{} {} {}'.format(movie_name, movie_date, img_tag['src']))

s = time.time()
asyncio.run(main())
e= time.time()
print(e-s)


Overwriting coroutine_crawl.py


In [19]:
!python coroutine_crawl.py


阿凡达：水之道 12月16日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2884182275.jpg
谁偷了我的粉兔子 12月16日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2884307337.jpg
穿靴子的猫2 12月23日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2885032978.jpg
绑架游戏 12月23日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2869898539.jpg
唬胆特工 12月23日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2885052777.jpg
繁华将至 12月23日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2628561183.jpg
龙马！新生网球王子 12月23日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2884562826.jpg
龙马精神 12月31日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2883750366.jpg
保你平安 12月31日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2885144191.jpg
透明侠侣 12月31日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2876027489.jpg
绝望主夫 12月31日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2884933443.jpg
女生规则 01月06日 https://img9.doub

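# How to implement callbacks with coroutines
- Call add_done_callback() on a task object to bind a callback function. The callback receives a future object, and future.result() returns the coroutine function's return value.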

In [21]:
%%writefile coroutine_callback.py
import asyncio
import time
async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    return 'OK {}'.format(url)

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        task.add_done_callback(lambda future: print('result: ', future.result()))
    await asyncio.gather(*tasks)

s = time.time()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
e = time.time()
print(e-s)

Overwriting coroutine_callback.py


In [22]:
!python coroutine_callback.py


crawling url_1
crawling url_2
crawling url_3
crawling url_4
result:  OK url_1
result:  OK url_2
result:  OK url_3
result:  OK url_4
4.00203800201416
