### Asynchronous I/O
References:<br>
https://ithelp.ithome.com.tw/articles/10199385  <br>
https://www.jianshu.com/p/b5e347b3a17c <br>
https://www.maxlist.xyz/2020/03/29/python-coroutine/

In [9]:
import asyncio

# Create an event loop
loop = asyncio.get_event_loop()

# Define coroutines (just put async in front of def)
async def example1():
    print('start example1')
    await asyncio.sleep(1)
    print('finish example1')

async def example2():
    print('start example2')
    print('finish example2')

# With two or more coroutines, wrap each one in a task and collect them in a task list
tasks = [loop.create_task(example1()), loop.create_task(example2())]

# Run the loop
# With a single coroutine you can call loop.run_until_complete(example1()) directly, but then there is no point in going asynchronous
loop.run_until_complete(asyncio.wait(tasks))

RuntimeError: This event loop is already running

start example1
start example2
finish example2
finish example1


In [2]:
import asyncio
import time

# Create an event loop
loop = asyncio.get_event_loop()

# Define coroutines (just put async in front of def)
async def example1():
    print('start example1')
    # Compare time.sleep with await asyncio.sleep
    time.sleep(1)
    print('finish example1')

async def example2():
    print('start example2')
    print('finish example2')

# With two or more coroutines, wrap each one in a task and collect them in a task list
tasks = [asyncio.ensure_future(example1()), asyncio.ensure_future(example2())]

# Run the loop
# With a single coroutine you can call loop.run_until_complete(example1()) directly
loop.run_until_complete(asyncio.wait(tasks))

RuntimeError: This event loop is already running

start example1
finish example1
start example2
finish example2


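With time.sleep(1): <br>
the sleep is an ordinary, blocking part of example1 (a forced 1-second pause), so control never switches to example2 until example1 finishes.<br>
With await asyncio.sleep(1): <br>
during that second the event loop switches to example2 and runs it to completion, then returns to example1 and continues where it left off.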
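The difference can be measured directly. The sketch below is an illustration, not code from the references above: the names blocking_worker, cooperative_worker and run_pair are hypothetical, and the 0.2-second delay is arbitrary. It runs two coroutines concurrently, once with time.sleep and once with await asyncio.sleep, and times each run. It is meant to run as a standalone script, because asyncio.run() cannot be called inside Jupyter's already-running event loop.

```python
import asyncio
import time

async def blocking_worker(name, delay):
    # time.sleep blocks the whole thread, so the event loop cannot switch to another task
    time.sleep(delay)
    return name

async def cooperative_worker(name, delay):
    # await asyncio.sleep hands control back to the event loop while waiting
    await asyncio.sleep(delay)
    return name

async def run_pair(worker, delay=0.2):
    # Run two copies of the worker concurrently and measure the wall-clock time
    start = time.perf_counter()
    await asyncio.gather(worker("a", delay), worker("b", delay))
    return time.perf_counter() - start

blocking_elapsed = asyncio.run(run_pair(blocking_worker))
cooperative_elapsed = asyncio.run(run_pair(cooperative_worker))
print(f"time.sleep:    {blocking_elapsed:.2f} s")     # expected about 0.4 s: the sleeps run one after another
print(f"asyncio.sleep: {cooperative_elapsed:.2f} s")  # expected about 0.2 s: the sleeps overlap
```

The blocking version takes roughly the sum of both delays, while the cooperative version takes roughly one delay, which is exactly the switching behaviour described above.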

### asyncio.create_task, asyncio.ensure_future, and loop.create_task
Reference: http://blog.sina.com.cn/s/blog_6262a50e0102wngq.html
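A minimal sketch of the three ways to schedule a coroutine, based on the standard asyncio documentation rather than the linked article; the coroutine job is a hypothetical stand-in for real work. asyncio.create_task (Python 3.7+) only accepts a coroutine and needs a running loop; asyncio.ensure_future also accepts an existing Future or Task and returns it unchanged; loop.create_task schedules the coroutine on an explicit loop object. All three produce asyncio.Task objects.

```python
import asyncio

async def job(n):
    # Hypothetical coroutine standing in for real work
    await asyncio.sleep(0.01)
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    # asyncio.create_task: takes a coroutine and schedules it on the running loop
    t1 = asyncio.create_task(job(1))
    # asyncio.ensure_future: wraps a coroutine in a Task on the current loop...
    t2 = asyncio.ensure_future(job(2))
    # ...but passes an existing Future/Task through unchanged
    same = asyncio.ensure_future(t1) is t1
    # loop.create_task: schedules a coroutine on this specific loop object
    t3 = loop.create_task(job(3))
    results = await asyncio.gather(t1, t2, t3)
    all_tasks = all(isinstance(t, asyncio.Task) for t in (t1, t2, t3))
    return all_tasks, same, results

all_tasks, same, results = asyncio.run(main())
print(all_tasks, same, results)  # True True [2, 4, 6]
```

In new code, asyncio.create_task is usually the most readable choice inside a coroutine; ensure_future is mainly useful when the input might already be a Future.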


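### Assignment goal
Compare asynchronous crawlers with multithreaded crawlers. What are the differences, and what are the pros and cons of each?

Ans: Asynchronous crawling is similar to multithreading, except that it runs on a single thread: one event loop switches between coroutines whenever one of them is waiting on I/O.
Multithreading saves waiting time by switching among threads while each one waits on its own I/O.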

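The instructor's explanation:<br>
Programs use multiple threads to solve the problem of the CPU sitting idle while network I/O blocks --> can be used almost anywhere<br>
Asynchronous --> makes efficient use of waiting time on a single core; used a lot in web technology<br>
A multithreaded crawler is one way to implement a concurrent (asynchronous) crawler, but asynchronous crawling can also be achieved in other ways, such as a single-threaded asyncio event loop.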
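To make the comparison concrete without hitting real servers, the sketch below simulates four requests with a fixed delay. This is an assumption for illustration: fetch_blocking stands in for requests.get, fetch_async stands in for aiohttp's session.get, and the URLs and 0.2-second latency are placeholders. The threaded version uses concurrent.futures.ThreadPoolExecutor (one thread per request), and the asynchronous version uses asyncio.gather on a single thread.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2                              # hypothetical per-request network latency
URLS = [f"page-{i}" for i in range(4)]   # placeholder names, not real URLs

def fetch_blocking(url):
    # Stand-in for requests.get(url): blocks this thread while "waiting"
    time.sleep(DELAY)
    return url

async def fetch_async(url):
    # Stand-in for session.get(url): yields to the event loop while "waiting"
    await asyncio.sleep(DELAY)
    return url

def crawl_with_threads():
    # One OS thread per request; threads take turns while each waits on I/O
    with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
        return list(pool.map(fetch_blocking, URLS))

async def crawl_with_asyncio():
    # One thread; the event loop switches coroutines at each await
    return await asyncio.gather(*(fetch_async(u) for u in URLS))

def timed(fn):
    # Return the function's result together with its wall-clock duration
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

thread_result, thread_time = timed(crawl_with_threads)
async_result, async_time = timed(lambda: asyncio.run(crawl_with_asyncio()))
print(f"threads: {thread_time:.2f} s, asyncio: {async_time:.2f} s")
```

Both versions should finish in roughly one delay instead of four. The practical trade-off is that the threaded version works with ordinary blocking libraries such as requests, while the asynchronous version needs async-aware libraries such as aiohttp but avoids creating one thread per request.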

### loop.close()
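In Jupyter, calling loop.close() raises an error because the notebook kernel already runs its own event loop (the same reason the cells in these notes report "RuntimeError: This event loop is already running"). In a normal script you should close the loop with loop.close() once you are done with it.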
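A minimal sketch of the two usual script-level patterns, with a hypothetical main coroutine: asyncio.run (Python 3.7+) creates a fresh loop, runs the coroutine and closes the loop automatically, while managing the loop by hand needs a try/finally so that loop.close() always runs. asyncio.new_event_loop is used here instead of get_event_loop, since newer Python versions discourage calling get_event_loop when no loop is running. Neither pattern works inside Jupyter; in recent IPython/Jupyter versions you can write "await main()" directly in a cell instead.

```python
import asyncio

async def main():
    # Hypothetical coroutine standing in for the real crawler
    await asyncio.sleep(0.01)
    return "done"

# Option A: asyncio.run creates a new loop, runs main(), then closes the loop for you
result_a = asyncio.run(main())

# Option B: manage the loop explicitly and make sure it is always closed
loop = asyncio.new_event_loop()
try:
    result_b = loop.run_until_complete(main())
finally:
    loop.close()

print(result_a, result_b, loop.is_closed())  # done done True
```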

Asynchronous crawler references:<br>
https://www.maxlist.xyz/2020/04/05/async-python-crawler-snippets/ <br>
https://www.learncodewithmike.com/2020/09/python-asynchronous-scraper-using-asyncio-and-aiohttp.html <br>

### Timing comparison: multithreaded vs. asynchronous crawlers

In [2]:
# Using multithreading
import threading
import time
import requests

url01 = 'https://www.python.org/'
url02 = 'https://www.cupoy.com/home'
url03 = 'https://www.postcrossing.com/'
url04 = 'https://www.codewars.com/'

url_list = [url01, url02, url03, url04]

def get_html(url):
    # Blocking HTTP request; other threads can run while this one waits on the network
    res = requests.get(url)
    print('%s done.' % (url))

# One thread per URL
threading_list = [threading.Thread(target=get_html, args=(url,)) for url in url_list]

starttime02 = time.time()
# Use a loop variable other than `threading` so the threading module is not shadowed
for t in threading_list:
    t.start()

# Wait for every thread to finish before stopping the timer
for t in threading_list:
    t.join()

finishtime02 = time.time()
print('totally use %f seconds' % (finishtime02 - starttime02))

https://www.python.org/ done.
https://www.cupoy.com/home done.
https://www.codewars.com/ done.
https://www.postcrossing.com/ done.
totally use 1.560089 seconds


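Inside "async with aiohttp.ClientSession() as session", calling session.get(url) feels much like requests.get(url); the difference is that the response is opened with "async with" and its body is read with "await response.text()".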

In [10]:
import asyncio
import aiohttp
from bs4 import BeautifulSoup

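# Fetch a page's content, parse it, and return the parsed result
async def fetch(session, url):
    async with session.get(url) as response:
        html_body = await response.text()
        soup = BeautifulSoup(html_body, 'lxml')
        return soup


async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'https://www.cupoy.com/home')
        print(html)

# In Jupyter the kernel already runs an event loop, so run_until_complete raises
# RuntimeError (see output below); in a notebook, `await main()` can be used instead
loop = asyncio.get_event_loop()
loop.run_until_complete(main())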


RuntimeError: This event loop is already running

<!DOCTYPE html>
<html><head><meta charset="utf-8"/><meta content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0,shrink-to-fit=no" name="viewport"/><meta content="IE=edge" http-equiv="X-UA-Compatible"/><meta content="zh-tw" http-equiv="Content-Language"/><meta content="Cupoy - 為你探索世界的新知" property="og:title"/><meta content="Cupoy致力推廣專業與深度兼具的新知，建立業界專家與學習者間的溝通橋樑，打造翻轉教育的社群共學模式，量身規劃各種新知學習地圖，豐富個人實力與履歷亮點。" property="og:description"/><meta content="https://www.cupoy.com/images/landing1200-630.jpg" property="og:image"/><meta content="https://www.cupoy.com/" property="og:url"/><meta content="website" property="og:type"/><meta content="Cupoy致力推廣專業與深度兼具的新知，建立業界專家與學習者間的溝通橋樑，打造翻轉教育的社群共學模式，量身規劃各種新知學習地圖，豐富個人實力與履歷亮點。" name="description"/><meta content="" property="keywords"/><meta content="app-id=597799429" name="apple-itunes-app"/><meta content="" property="cp:newsitemid"/><meta content="" property="cp:cumatrixid"/><meta content="" property="cp:cumatrixitemid"/><meta content="#E624

In [30]:
# Using an asynchronous crawler
import aiohttp
import asyncio
import time

url01 = 'https://www.python.org/'
url02 = 'https://www.cupoy.com/home'
url03 = 'https://www.postcrossing.com/'
url04 = 'https://www.codewars.com/'

url_list = [url01,url02,url03,url04]

async def fetch(url,session):
    async with session.get(url) as response:
        html_body = await response.text()
        print(url,' done')
        await asyncio.sleep(2)
        print(url,' OK')

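async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch(url, session)) for url in url_list]  # build the task list
        await asyncio.gather(*tasks)  # run all tasks concurrently and wait for them

start_time = time.time()
# In Jupyter, run_until_complete raises RuntimeError (see output below), so the
# total-time line is never printed; run this as a script to get the timing
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
print('total use %f seconds' % (time.time() - start_time))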

RuntimeError: This event loop is already running

https://www.python.org/  done
https://www.cupoy.com/home  done
https://www.codewars.com/  done
https://www.postcrossing.com/  done
https://www.python.org/  OK
https://www.cupoy.com/home  OK
https://www.codewars.com/  OK
https://www.postcrossing.com/  OK
