# Concurrency, parallelism, and asynchronous I/O

## Concurrency

> the appearance of doing more than one thing at a time (can be time-sliced)

- Easy to do with the `threading` or `multiprocessing` libraries in Python
- `libevent`, `gevent`, etc. provide concurrency as well
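
A minimal sketch of time-sliced concurrency with `threading`. The `fetch` worker and its delays are invented for illustration, with `time.sleep` standing in for blocking I/O:

```python
import threading
import time

def fetch(name, delay, results):
    """Pretend to do blocking I/O, then record a result (hypothetical worker)."""
    time.sleep(delay)          # sleeping releases the GIL, so other threads can run
    results[name] = f'{name} done'

def run_concurrently():
    results = {}
    threads = [
        threading.Thread(target=fetch, args=('a', 0.2, results)),
        threading.Thread(target=fetch, args=('b', 0.2, results)),
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for both workers to finish
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == '__main__':
    results, elapsed = run_concurrently()
    # the two waits overlap, so the total is typically close to one delay, not two
    print(results, f'{elapsed:.2f}s')
```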

## Parallelism

> *actually* doing more than one thing at a time (multi-core/hyperthreading/distributed)

- The GIL prevents this in many *threaded* environments (**including** `libevent`, `gevent`, etc.)

## Asynchronous programming

> programming style where rather than blocking on I/O, we find something useful to do, and "come back" to the I/O later

- `twisted` did this with reactors and callbacks
- `libevent`, `gevent`, et al. are *implicitly asynchronous* (things that would block in a thread instead yield to
an **event loop**, which finds something useful to do)
- In Py3 (particularly 3.5+), we have an *explicitly asynchronous* style we can use (`asyncio` arrived in 3.4, and 3.5 added `async`/`await` syntax to support it)
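
The idea of "yielding to an event loop" can be sketched with plain generators. This is a toy round-robin scheduler, not how `gevent` or `asyncio` are actually implemented; `task` and `run_all` are names invented here:

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: each bare `yield` hands control back to the loop."""
    for i in range(steps):
        log.append(f'{name} step {i}')
        yield  # in a real program this is where we would wait on I/O
    return f'{name} finished'

def run_all(tasks):
    """Toy event loop: run each task until it yields, then move to the next."""
    ready = deque(tasks)
    results = []
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration as stop:
            results.append(stop.value)  # the generator's return value
        else:
            ready.append(current)       # not done yet: reschedule it
    return results

log = []
results = run_all([task('a', 2, log), task('b', 1, log)])
print(log)      # ['a step 0', 'b step 0', 'a step 1']
print(results)  # ['b finished', 'a finished']
```

The two tasks interleave on a single thread: neither runs in parallel, but neither has to finish before the other starts.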

# Raise StopIteration(value)

In [1]:
def mygen():
    if False:
        yield 'wait on data'
    return 'Something'

In [2]:
gen = mygen()
gen

<generator object mygen at 0x7f9a986ad580>

In [3]:
next(gen)

StopIteration: Something

In [4]:
def myprint():
    value = yield from mygen()  # data = yield from socket.recv(...)
    print('Value was', value)

In [5]:
for event in myprint():
    print('got event', event)

Value was Something
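
To make the mechanism concrete: a generator's `return` value travels on the `StopIteration` exception, and `yield from` unpacks it. The sketch below reuses the `mygen` shape from above; `delegate_by_hand` is a name invented here, and it only approximates `yield from` (it ignores `send` and `throw`):

```python
def mygen():
    if False:
        yield 'wait on data'
    return 'Something'

# Reading the return value directly off the exception
gen = mygen()
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value

def delegate_by_hand():
    """Rough, simplified equivalent of `value = yield from mygen()`."""
    inner = mygen()
    while True:
        try:
            item = next(inner)
        except StopIteration as stop:
            value = stop.value
            break
        else:
            yield item  # pass the inner generator's events outward
    return value

outer = delegate_by_hand()
try:
    next(outer)
except StopIteration as stop:
    delegated = stop.value

print(returned, delegated)  # Something Something
```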


In [6]:
import sys
sys.version_info

sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)

# Asyncio in Py3.4-3.5

In [7]:
%%file data/asyncio-examples/asyncio-old.py
import asyncio
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s:%(name)s:%(message)s')

log = logging.getLogger()

def main():
    loop = asyncio.get_event_loop()    
    loop.run_until_complete(asyncio.gather(coroutine_1(), coroutine_2()))
    #loop.run_until_complete(asyncio.gather(coroutine_2(), coroutine_1()))
    
@asyncio.coroutine
def coroutine_1():
    log.info('coroutine_1 is active on the event loop')

    log.info('coroutine_1 yielding control. Going to be blocked for 4 seconds')
    yield from asyncio.sleep(4)
    # data = yield from async_aware_socket.recv(100)

    log.info('coroutine_1 resumed. coroutine_1 exiting')
    

@asyncio.coroutine
def coroutine_2():
    log.info('coroutine_2 is active on the event loop')

    log.info('coroutine_2 yielding control. Going to be blocked for 5 seconds')
    yield from asyncio.sleep(5)

    log.info('coroutine_2 resumed. coroutine_2 exiting')
    

if __name__ == '__main__':
    main()

Overwriting data/asyncio-examples/asyncio-old.py


In [8]:
!python data/asyncio-examples/asyncio-old.py

2021-11-01 12:56:54,004 INFO:root:coroutine_1 is active on the event loop
2021-11-01 12:56:54,004 INFO:root:coroutine_1 yielding control. Going to be blocked for 4 seconds
2021-11-01 12:56:54,004 INFO:root:coroutine_2 is active on the event loop
2021-11-01 12:56:54,004 INFO:root:coroutine_2 yielding control. Going to be blocked for 5 seconds
2021-11-01 12:56:58,008 INFO:root:coroutine_1 resumed. coroutine_1 exiting
2021-11-01 12:56:59,005 INFO:root:coroutine_2 resumed. coroutine_2 exiting


# Asyncio with `async`/`await`

In [9]:
%%file data/asyncio-examples/asyncio-new.py
import asyncio
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s:%(name)s:%(message)s')

log = logging.getLogger()

def main():
    loop = asyncio.get_event_loop()    
    loop.run_until_complete(asyncio.gather(coroutine_1(), coroutine_2()))
    
    
async def coroutine_1():
    log.info('coroutine_1 is active on the event loop')

    log.info('coroutine_1 yielding control. Going to be blocked for 4 seconds')
    value = await asyncio.sleep(4)

    log.info('coroutine_1 resumed. coroutine_1 exiting with %s', value)
    

async def coroutine_2():
    log.info('coroutine_2 is active on the event loop')

    log.info('coroutine_2 yielding control. Going to be blocked for 5 seconds')
    await asyncio.sleep(5)

    log.info('coroutine_2 resumed. coroutine_2 exiting')
    

if __name__ == '__main__':
    main()

Overwriting data/asyncio-examples/asyncio-new.py


In [10]:
!python data/asyncio-examples/asyncio-new.py

2021-11-01 12:59:24,503 INFO:root:coroutine_1 is active on the event loop
2021-11-01 12:59:24,503 INFO:root:coroutine_1 yielding control. Going to be blocked for 4 seconds
2021-11-01 12:59:24,503 INFO:root:coroutine_2 is active on the event loop
2021-11-01 12:59:24,503 INFO:root:coroutine_2 yielding control. Going to be blocked for 5 seconds
2021-11-01 12:59:28,507 INFO:root:coroutine_1 resumed. coroutine_1 exiting with None
2021-11-01 12:59:29,505 INFO:root:coroutine_2 resumed. coroutine_2 exiting
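
On Python 3.7 and later, `asyncio.run` can replace the explicit `get_event_loop` / `run_until_complete` pair. A minimal sketch, with shortened sleeps and a hypothetical `worker` coroutine that returns a value so `gather` has something to collect:

```python
import asyncio

async def worker(name, delay, order):
    """Hypothetical coroutine: wait, record when we resumed, return a label."""
    await asyncio.sleep(delay)   # yields to the event loop while waiting
    order.append(name)
    return f'{name} slept {delay}'

async def main():
    order = []
    # gather returns results in argument order, regardless of finish order
    results = await asyncio.gather(
        worker('slow', 0.2, order),
        worker('fast', 0.05, order),
    )
    return results, order

if __name__ == '__main__':
    results, order = asyncio.run(main())
    print(results)  # ['slow slept 0.2', 'fast slept 0.05']
    print(order)    # ['fast', 'slow']
```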


# Slightly more complex: simple web crawler

In [11]:
!pip install aiohttp-requests beautifulsoup4



In [25]:
from urllib.parse import urljoin, splittype

import bs4
from aiohttp_requests import requests

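async def get_links(url):
    # NOTE: ssl=False disables certificate verification; acceptable for a demo only
    response = await requests.get(url, ssl=False)
    text = await response.text()
    soup = bs4.BeautifulSoup(text, 'html.parser')
    hrefs = (a.attrs.get('href') for a in soup.find_all('a'))
    hrefs = (href for href in hrefs if href)          # drop anchors without an href
    hrefs = (urljoin(url, href) for href in hrefs)    # resolve relative links
    # splittype is deprecated since Python 3.8; urlsplit(href).scheme is the supported equivalent
    hrefs = (href for href in hrefs if splittype(href)[0] in ('http', 'https'))
    hrefs = (href.split('#')[0] for href in hrefs)    # strip fragments
    return hrefs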
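
The filtering steps in `get_links` can be tried without touching the network. The sketch below is a stand-in under stated assumptions: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` and uses `urlsplit(...).scheme` in place of `splittype`. `LinkCollector`, `extract_links`, and the sample HTML are invented for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urldefrag

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)

def extract_links(base_url, html):
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)         # resolve relative links
        if urlsplit(absolute).scheme not in ('http', 'https'):
            continue                               # drop mailto:, javascript:, ...
        links.append(urldefrag(absolute).url)      # strip any #fragment
    return links

sample = '''
<a href="/about/">About</a>
<a href="docs/#intro">Docs</a>
<a href="mailto:someone@example.com">Mail</a>
<a>No href</a>
'''
links = extract_links('https://www.python.org/', sample)
print(links)  # ['https://www.python.org/about/', 'https://www.python.org/docs/']
```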

In [26]:
import re
import asyncio
from urllib.parse import urlparse

hrefs_seen = set()
is_python = re.compile(r'www\.python\.org')
queue = asyncio.Queue()

def valid_host(href):
    pr = urlparse(href)
    return is_python.search(pr.netloc)

async def enqueue_url(url):
    if url not in hrefs_seen and valid_host(url):
        hrefs_seen.add(url)
        await queue.put(url)
        
async def handle_page(url):
    print(f'Handling {url}')
    for link in await get_links(url):
        await enqueue_url(link)    
        
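async def crawl():
    try:
        # the limit is checked only between pages, so hrefs_seen can end up above 200
        while len(hrefs_seen) < 200:
            # get_nowait raises QueueEmpty if the queue is empty right now,
            # even when another crawler is about to enqueue more URLs
            url = queue.get_nowait()
            await handle_page(url)
    except asyncio.QueueEmpty:
        print('No more urls!')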

In [14]:
ROOT = 'https://www.python.org'

In [15]:
#await handle_page(ROOT)
await queue.put(ROOT)

In [16]:
queue.qsize()

1

In [17]:
await asyncio.gather(crawl(), crawl(), crawl())

Handling https://www.python.org
No more urls!
No more urls!
Handling https://www.python.org
Handling https://www.python.org/
Handling https://www.python.org/psf-landing/


Handling https://www.python.org/jobs/


[None, None, None]
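
In the run above, two of the three crawlers print `No more urls!` almost immediately: the queue holds a single URL when they start, so `get_nowait` raises `QueueEmpty` and they exit while the first crawler is still fetching. One common alternative is the worker pattern built on `queue.get`, `task_done`, and `join`. The sketch below assumes an in-memory link graph (`FAKE_SITE`) and a hypothetical `fake_get_links` in place of real HTTP requests:

```python
import asyncio

# Hypothetical site map standing in for real pages
FAKE_SITE = {
    '/': ['/a', '/b'],
    '/a': ['/b', '/c'],
    '/b': ['/'],
    '/c': [],
}

async def fake_get_links(url):
    await asyncio.sleep(0.01)   # simulate network latency
    return FAKE_SITE.get(url, [])

async def worker(queue, seen, handled_by, name):
    while True:
        url = await queue.get()          # waits instead of raising QueueEmpty
        try:
            for link in await fake_get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)
            handled_by[url] = name
        finally:
            queue.task_done()            # lets queue.join() track progress

async def crawl_all(root, n_workers=3):
    queue = asyncio.Queue()
    seen = {root}
    handled_by = {}
    queue.put_nowait(root)
    workers = [asyncio.create_task(worker(queue, seen, handled_by, f'w{i}'))
               for i in range(n_workers)]
    await queue.join()                   # returns once every queued URL is processed
    for w in workers:
        w.cancel()                       # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return seen, handled_by

if __name__ == '__main__':
    seen, handled_by = asyncio.run(crawl_all('/'))
    print(sorted(seen))
    print(handled_by)
```

Because idle workers wait on `queue.get()` rather than exiting, all of them stay available as new URLs are discovered, and `queue.join()` gives a clear "everything has been processed" signal.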

In [18]:
len(hrefs_seen)

273

In [19]:
queue.qsize()

269

In [20]:
while queue.qsize():
    print(await queue.get())

https://www.python.org/community-landing/
https://www.python.org/community/irc/
https://www.python.org/about/
https://www.python.org/about/apps/
https://www.python.org/about/quotes/
https://www.python.org/about/gettingstarted/
https://www.python.org/about/help/
https://www.python.org/downloads/
https://www.python.org/downloads/source/
https://www.python.org/downloads/windows/
https://www.python.org/downloads/macos/
https://www.python.org/download/other/
https://www.python.org/download/alternatives
https://www.python.org/doc/
https://www.python.org/doc/av
https://www.python.org/doc/essays/
https://www.python.org/community/
https://www.python.org/community/survey
https://www.python.org/community/diversity/
https://www.python.org/community/lists/
https://www.python.org/community/forums/
https://www.python.org/psf/annual-report/2021/
https://www.python.org/community/workshops/
https://www.python.org/community/sigs/
https://www.python.org/community/logos/
https://www.python.org/community/me

In [21]:
hrefs_seen

{'https://www.python.org',
 'https://www.python.org/',
 'https://www.python.org/about/',
 'https://www.python.org/about/apps',
 'https://www.python.org/about/apps/',
 'https://www.python.org/about/gettingstarted/',
 'https://www.python.org/about/help/',
 'https://www.python.org/about/legal/',
 'https://www.python.org/about/quotes/',
 'https://www.python.org/accounts/login/?next=/jobs/create/',
 'https://www.python.org/accounts/login/?next=/jobs/mine/',
 'https://www.python.org/blogs/',
 'https://www.python.org/community-landing/',
 'https://www.python.org/community/',
 'https://www.python.org/community/awards',
 'https://www.python.org/community/awards/',
 'https://www.python.org/community/diversity/',
 'https://www.python.org/community/forums/',
 'https://www.python.org/community/irc/',
 'https://www.python.org/community/jobs/howto/',
 'https://www.python.org/community/lists/',
 'https://www.python.org/community/logos/',
 'https://www.python.org/community/merchandise/',
 'https://www.

In [22]:
len(hrefs_seen)

273

# Lab

Open [AsyncIO Lab](asyncio-lab.ipynb)