# Concurrency, parallelism, and asynchronous I/O

## Concurrency

> the appearance of doing more than one thing at a time (can be time-sliced)

- Easy to do with the `threading` or `multiprocessing` libraries in Python
- `libevent`, `gevent`, etc. provide concurrency as well
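
As a minimal sketch of thread-based concurrency, the snippet below runs three simulated I/O waits on separate threads. The names `fake_io` and `run_concurrently` are hypothetical, and `time.sleep` stands in for a blocking call such as a socket read.

```python
import threading
import time


def fake_io(name, delay, results):
    # time.sleep releases the GIL, much like a blocking socket read would.
    time.sleep(delay)
    results.append(name)


def run_concurrently(delay=0.2, n=3):
    """Start n threads that each wait `delay` seconds; return (results, elapsed)."""
    results = []
    threads = [
        threading.Thread(target=fake_io, args=(f'task-{i}', delay, results))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == '__main__':
    results, elapsed = run_concurrently()
    print(results, f'{elapsed:.2f}s')
```

The waits overlap, so the elapsed time should be close to one delay rather than the sum of all three.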

## Parallelism

> *actually* doing more than one thing at a time (multi-core/hyperthreading/distributed)

- The GIL prevents this in many *threaded* environments (**including** `libevent`, `gevent`, etc.)
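
To make the GIL point concrete, here is a sketch that runs CPU-bound work on a pool. `count_primes` and `run_with` are hypothetical names chosen for illustration. On a standard (GIL) CPython build, the thread pool interleaves the work but does not run it on several cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, max_workers=4):
    """Map count_primes over `limits` using the given executor class."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == '__main__':
    # Threads: concurrent, but serialized by the GIL for pure-Python CPU work.
    print(run_with(ThreadPoolExecutor, [10, 100, 1000]))
```

Passing `concurrent.futures.ProcessPoolExecutor` to `run_with` instead (inside the `__main__` guard) is one way to get actual parallelism, because each worker process has its own interpreter and its own GIL.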

## Asynchronous programming

> programming style where rather than blocking on I/O, we find something useful to do, and "come back" to the I/O later

- `twisted` did this with reactors and callbacks
- `libevent`, `gevent`, et al. are *implicitly asynchronous* (things that would block in a thread instead yield to
an **event loop**, which finds something useful to do)
- In Py3 (3.4+), we have an *explicitly asynchronous* style we can use: 3.4 added `asyncio`, and 3.5 added `async`/`await` syntax to support it
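
The difference between the callback style and the explicit `await` style can be sketched with `asyncio` alone (no `twisted` required). The helper names below are hypothetical, and the 0.05 second delays stand in for real I/O.

```python
import asyncio


def fetch_with_callback(loop, on_data):
    # Callback style: schedule the work and hand over a function
    # that the event loop calls once the "data" is ready.
    loop.call_later(0.05, on_data, 'data via callback')


async def fetch_with_await():
    # Explicit style: the await marks exactly where we yield to the loop.
    await asyncio.sleep(0.05)
    return 'data via await'


async def demo():
    loop = asyncio.get_running_loop()
    received = []
    done = loop.create_future()

    def on_data(value):
        received.append(value)
        done.set_result(None)

    fetch_with_callback(loop, on_data)
    received.append(await fetch_with_await())
    await done  # make sure the callback has fired before returning
    return received


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

This runs as a script; inside a notebook cell, where an event loop is already running, `await demo()` would be used instead of `asyncio.run(demo())`.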

# Raise StopIteration(value)

In [None]:
def mygen():
    if False:
        yield 'wait on data'   # event to wait on
    return 'Something'         # actual return value

In [None]:
gen = mygen()
gen

In [None]:
next(gen)

In [None]:
def myprint():
    value = yield from mygen()  # data = yield from socket.recv(...)
    print('Value was', value)

In [None]:
for event in myprint():
    print('got event', event)

In [None]:
import sys
sys.version_info
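
The cells above rely on a generator's `return value` being delivered as `StopIteration(value)`. A small sketch of what an event loop does with that: `drive`, `fetch`, and `wrapper` are hypothetical names, and the yielded strings stand in for events to wait on.

```python
def fetch():
    yield 'wait on socket'      # event the caller should wait on
    return 'payload'            # becomes StopIteration('payload')


def wrapper():
    data = yield from fetch()   # receives fetch()'s return value
    return data.upper()


def drive(gen):
    """Run a generator to completion; return (events yielded, return value)."""
    events = []
    while True:
        try:
            events.append(next(gen))
        except StopIteration as stop:
            return events, stop.value


if __name__ == '__main__':
    print(drive(wrapper()))
```

`yield from` passes the inner generator's events up to whoever is driving, and evaluates to the inner generator's return value once it finishes.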

Asyncio in Py3.4-3.5

In [None]:
%%file data/asyncio-examples/asyncio-old.py
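# Generator-based coroutines (@asyncio.coroutine with yield from) were
# deprecated in Python 3.8 and removed in 3.11; run this on an older interpreter.
import asyncio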
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s:%(name)s:%(message)s')

log = logging.getLogger()

def main():
    loop = asyncio.get_event_loop()    
    loop.run_until_complete(asyncio.gather(coroutine_1(), coroutine_2()))
    #loop.run_until_complete(asyncio.gather(coroutine_2(), coroutine_1()))
    
@asyncio.coroutine
def coroutine_1():
    log.info('coroutine_1 is active on the event loop')

    log.info('coroutine_1 yielding control. Going to be blocked for 4 seconds')
    yield from asyncio.sleep(4)
    # data = yield from async_aware_socket.recv(100)

    log.info('coroutine_1 resumed. coroutine_1 exiting')
    

@asyncio.coroutine
def coroutine_2():
    log.info('coroutine_2 is active on the event loop')

    log.info('coroutine_2 yielding control. Going to be blocked for 5 seconds')
    yield from asyncio.sleep(5)

    log.info('coroutine_2 resumed. coroutine_2 exiting')
    

if __name__ == '__main__':
    main()

In [None]:
!python data/asyncio-examples/asyncio-old.py

Asyncio with `async/await`

In [None]:
%%file data/asyncio-examples/asyncio-new.py
import asyncio
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s:%(name)s:%(message)s')

log = logging.getLogger()

def main():
    loop = asyncio.get_event_loop()    
    loop.run_until_complete(asyncio.gather(coroutine_1(), coroutine_2()))
    
    
async def coroutine_1():
    log.info('coroutine_1 is active on the event loop')

    log.info('coroutine_1 yielding control. Going to be blocked for 4 seconds')
    value = await asyncio.sleep(4)

    log.info('coroutine_1 resumed. coroutine_1 exiting with %s', value)
    

async def coroutine_2():
    log.info('coroutine_2 is active on the event loop')

    log.info('coroutine_2 yielding control. Going to be blocked for 5 seconds')
    await asyncio.sleep(5)

    log.info('coroutine_2 resumed. coroutine_2 exiting')
    

if __name__ == '__main__':
    main()

In [None]:
!python data/asyncio-examples/asyncio-new.py
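
On Python 3.7+, the same pattern can be sketched with `asyncio.run`, which creates and closes the event loop for you. The `worker` helper and the shortened delays are assumptions made to keep the example quick.

```python
import asyncio
import time


async def worker(name, delay):
    await asyncio.sleep(delay)      # yields to the event loop while "blocked"
    return f'{name} done after {delay}s'


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        worker('coroutine_1', 0.2),
        worker('coroutine_2', 0.3),
    )
    return results, time.perf_counter() - start


if __name__ == '__main__':
    results, elapsed = asyncio.run(main())
    print(results, f'{elapsed:.2f}s')
```

Because both sleeps overlap, the elapsed time should track the longer delay rather than the sum of the two.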

Slightly more complex: simple web crawler

In [None]:
!pip install aiohttp-requests beautifulsoup4

In [None]:
from urllib.parse import urljoin
urljoin('https://www.python.org', 'email:edward.fine@afinepoint.net')

In [None]:
urljoin('https://www.python.org/jobs/', 'Atlanta')
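
The `urljoin` behaviour explored above is what the crawler's link clean-up depends on. A sketch using only the standard library follows; `normalize` is a hypothetical helper and the example addresses are made up.

```python
from urllib.parse import urldefrag, urljoin, urlparse

BASE = 'https://www.python.org/jobs/'


def normalize(base, href):
    """Resolve href against base; return None unless the result is http(s)."""
    absolute = urljoin(base, href)
    if urlparse(absolute).scheme not in ('http', 'https'):
        return None                  # e.g. mailto: links are not crawlable
    return urldefrag(absolute).url   # drop any #fragment


if __name__ == '__main__':
    for href in ('Atlanta', '/about/#history', 'mailto:someone@example.com'):
        print(href, '->', normalize(BASE, href))
```

Note that the trailing slash on the base matters: joining `'Atlanta'` onto `'https://www.python.org/jobs'` replaces the last path segment instead of appending to it.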

In [None]:
from urllib.parse import urljoin, urlparse

import bs4
from aiohttp_requests import requests

async def get_links(url):
    """Return the absolute http(s) links found on the page at `url`."""
    response = await requests.get(url)
    if 'text/html' not in response.headers.get('content-type', 'text/html'):
        return []  # not HTML: nothing to parse, but keep the result iterable
    text = await response.text()
    soup = bs4.BeautifulSoup(text, 'html.parser')
    hrefs = (a.attrs.get('href') for a in soup.find_all('a'))
    hrefs = (href for href in hrefs if href)           # drop <a> tags without an href
    hrefs = (urljoin(url, href) for href in hrefs)     # resolve relative links
    # keep only web links (urlparse replaces the deprecated splittype)
    hrefs = (href for href in hrefs if urlparse(href).scheme in ('http', 'https'))
    hrefs = (href.split('#')[0] for href in hrefs)     # strip fragments
    return hrefs

In [None]:
import re
import asyncio
from urllib.parse import urlparse

hrefs_seen = set()
is_python = re.compile(r'www\.python\.org')
queue = asyncio.Queue()

def valid_host(href):
    pr = urlparse(href)
    return is_python.search(pr.netloc)

async def enqueue_url(url):
    if url not in hrefs_seen and valid_host(url):
        hrefs_seen.add(url)
        await queue.put(url)
        
async def handle_page(url):
    print(f'Handling {url}')
    for link in await get_links(url):
        await enqueue_url(link)    
        
async def crawl():
    while len(hrefs_seen) < 2000:
        url = await queue.get()
        await handle_page(url)
    print('I have done enough!')

In [None]:
ROOT = 'https://www.python.org'

In [None]:
#await handle_page(ROOT)
await queue.put(ROOT)

In [None]:
queue.qsize()

In [None]:
await asyncio.gather(crawl(), crawl(), crawl())

In [None]:
len(hrefs_seen)

In [None]:
queue.qsize()

In [None]:
while queue.qsize():
    print(await queue.get())

In [None]:
hrefs_seen
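
The crawler above stops once enough URLs have been seen. An alternative shutdown strategy, sketched here against a hypothetical in-memory site so it runs offline, is to let `queue.join()` signal that every queued URL has been processed and then cancel the workers. `FAKE_SITE`, `fake_get_links`, `worker`, and `crawl_all` are all illustrative names, not part of the notebook.

```python
import asyncio

# Hypothetical in-memory "site": page -> links found on that page.
FAKE_SITE = {
    '/': ['/about', '/jobs'],
    '/about': ['/', '/jobs'],
    '/jobs': ['/jobs/atlanta', '/about'],
    '/jobs/atlanta': [],
}


async def fake_get_links(url):
    await asyncio.sleep(0.01)            # stands in for the network round trip
    return FAKE_SITE.get(url, [])


async def worker(queue, seen):
    while True:
        url = await queue.get()
        try:
            for link in await fake_get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)
        finally:
            queue.task_done()            # lets queue.join() track progress


async def crawl_all(root, n_workers=3):
    queue = asyncio.Queue()
    seen = {root}
    queue.put_nowait(root)
    workers = [asyncio.create_task(worker(queue, seen)) for _ in range(n_workers)]
    await queue.join()                   # every queued URL has been handled
    for w in workers:
        w.cancel()                       # workers are idle in queue.get()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == '__main__':
    print(sorted(asyncio.run(crawl_all('/'))))
```

New links are queued before `task_done()` is called for the page that produced them, so the queue's unfinished count cannot reach zero while work remains.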

# Lab

Open [AsyncIO Lab](asyncio-lab.ipynb)