# Async IO in Python: A Complete Walkthrough

Async IO is a concurrent programming design that has received dedicated support in Python, evolving rapidly from Python 3.4 through 3.7, and probably beyond.

<https://realpython.com/async-io-python/>

## 001. Synchronous Version

**Parallelism** consists of performing multiple operations at the same time. **Multiprocessing** is a means to effect parallelism, and it entails spreading tasks over a computer’s central processing units (CPUs, or cores). Multiprocessing is well-suited for CPU-bound tasks: tightly bound for loops and mathematical computations usually fall into this category.

**Concurrency** is a slightly broader term than parallelism. It suggests that multiple tasks have the ability to run in an overlapping manner. (There’s a saying that concurrency does not imply parallelism.)

**Threading** is a concurrent execution model whereby multiple threads take turns executing tasks. One process can contain multiple threads. Python has a complicated relationship with threading thanks to its GIL, but that’s beyond the scope of this article.

Async IO is a single-threaded, single-process design: it uses **cooperative multitasking**. In other words, async IO gives a feeling of concurrency despite using a single thread in a single process. Coroutines (a central feature of async IO) can be scheduled concurrently, but they are not inherently concurrent.

In [1]:
import sys
from pathlib import Path


In [2]:
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

In [3]:
from faker import Faker
fake = Faker()

### 002.001 The async/await Syntax and Native Coroutines

Let’s take the immersive approach and write some async IO code. This short program is the Hello World of async IO but goes a long way towards illustrating its core functionality:

1. Create a function `count_sync`
    1. after printing "One", it waits synchronously for 1 second, then prints "Two"
1. Create a function `main_sync`
    1. it runs `count_sync` in sequence 3 times
1. Create a function `count_async`
    1. after printing "One", it waits asynchronously for 1 second, then prints "Two"
1. Create a function `main_async`
    1. it runs `count_async` asynchronously 3 times
1. The output should be One Two One Two ... for the sync version, taking about 3 seconds, and One One One Two ... for the async one, taking about 1 second
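
One possible way to lay out the steps above is sketched here. It is a minimal illustration rather than the reference solution: the `DELAY` constant and the `timed` helper are assumptions added for this sketch, and the delay is shortened from 1 second so the comparison runs quickly. It is written as a plain script; inside a notebook the `nest_asyncio` workaround is still needed.

```python
import asyncio
import time

DELAY = 0.1  # assumption: shortened from the 1 s in the exercise


def count_sync() -> None:
    print("One Sync")
    time.sleep(DELAY)  # blocks the whole thread; nothing else can run
    print("Two Sync")


def main_sync() -> None:
    for _ in range(3):
        count_sync()


async def count_async() -> None:
    print("One Async")
    await asyncio.sleep(DELAY)  # hands control back to the event loop
    print("Two Async")


async def main_async() -> None:
    # gather() schedules all three coroutines before waiting on any of them
    await asyncio.gather(count_async(), count_async(), count_async())


def timed(fn) -> float:
    """Hypothetical helper: run fn() and return the elapsed wall time."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


sync_elapsed = timed(main_sync)
async_elapsed = timed(lambda: asyncio.run(main_async()))
print(f"sync version executed in {sync_elapsed:0.2f} seconds.")
print(f"async version executed in {async_elapsed:0.2f} seconds.")
```

The sync version pays for the three sleeps one after another, while the async version overlaps them, so its elapsed time should be close to a single `DELAY`.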


In [4]:
import asyncio
import time

# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

1
# def count_sync():
#     print("One Sync")
#     ...

2
# def main_sync():
#     ...

3
# async def count_async():
#     print("One Async")
#     ...

4
# async def main_async():
#     ...

5
# s = time.perf_counter()
# main_sync()
# elapsed = time.perf_counter() - s
# print(f"sync version executed in {elapsed:0.2f} seconds.")

# s = time.perf_counter()
# asyncio.run(main_async())
# elapsed = time.perf_counter() - s
# print(f"async version executed in {elapsed:0.2f} seconds.")

# solution


1

2

3

4

5

### 002.002 The Rules of Async IO

Here’s one example of how async IO cuts down on wait time: given a coroutine `makerandom()` that keeps producing random integers in the range [0, 10] until one of them exceeds a threshold, you want multiple calls of this coroutine to run without waiting for each other to complete in succession. You can largely follow the patterns from the previous exercise, with slight changes:

1. `makerandom`
    1. should keep generating `i` until it is greater than the threshold.
    1. while it isn't, it should print f"{i} too low; retrying.",
    1. then sleep asynchronously for `i + 1` seconds,
    1. then generate a new one
1. `main`
    1. should run 3 `makerandom` calls with idx in range 0 to 2 asynchronously
1. should gather the results
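
The retry loop and the gathering step might look like the sketch below. It is an illustration under stated assumptions, not the official solution: `PAUSE` is a made-up scaling factor that shrinks the `i + 1` second sleeps so the example finishes quickly, the ANSI colors are left out, and the threshold is passed in explicitly as `10 - idx - 1`.

```python
import asyncio
import random

PAUSE = 0.005  # assumption: scaled-down stand-in for one second of sleep


async def makerandom(idx: int, threshold: int = 6) -> int:
    print(f"Initiated makerandom({idx=}, {threshold=}).")
    i = random.randint(0, 10)
    while i <= threshold:
        print(f"makerandom({idx}) == {i} too low; retrying.")
        # while this coroutine sleeps, the other two are free to run
        await asyncio.sleep(PAUSE * (i + 1))
        i = random.randint(0, 10)
    print(f"---> Finished: makerandom({idx}) == {i}")
    return i


async def main() -> list:
    # gather() returns results in the order the coroutines were passed in,
    # regardless of which one finished first
    return await asyncio.gather(*(makerandom(i, 10 - i - 1) for i in range(3)))


random.seed(444)
r1, r2, r3 = asyncio.run(main())
print(f"r1: {r1}, r2: {r2}, r3: {r3}")
```

Each result is guaranteed to exceed its own threshold, but the number of retries and the interleaving of the printed lines depend on how the sleeps overlap.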


In [5]:
import asyncio
import random

# ANSI colors
c = (
    "\033[0m",   # End of color
    "\033[36m",  # Cyan
    "\033[91m",  # Red
    "\033[35m",  # Magenta
)

# f"{i} too low; retrying."

# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

1
# async def makerandom(idx: int, threshold: int = 6) -> int:
#     threshold = 10 - idx - 1
#     print(c[idx + 1] + f"Initiated makerandom({idx=}, {threshold=})." + c[0])
#     i = random.randint(0, 10)
#     ...
#     print(c[idx + 1] + f"---> Finished: makerandom({idx}) == {i}" + c[0])
#     return i

2
# async def main():
#     ...
#     return res

3
# random.seed(444)
# r1, r2, r3 = asyncio.run(main())
# print(f"r1: {r1}, r2: {r2}, r3: {r3}")

# solution


1

2

3

### 002.003 Chaining Coroutines

A key feature of coroutines is that they can be chained together. (Remember, a coroutine object is awaitable, so another coroutine can await it.) This allows you to break programs into smaller, manageable, recyclable coroutines:

1. `load` 
    1. waits i secs, asynchronously
    1. returns i
1. `fire`
    1. waits i secs, asynchronously
    1. returns the i from the previous step plus its own i as a tuple
1. `chain`
    1. runs load
    1. then feeds result to fire
1. `main`
    1. initiates an instance of `chain` for each arg, concurrently
1. finally
    1. runs all the jobs in `main`
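
A sketch of how the chain described above could be wired together follows. Two things are assumptions made for this illustration: `SCALE` shrinks every "second" to 10 ms, and `chain` returns the tuple from `fire` (the exercise skeleton returns `None`) so that the results can be inspected afterwards.

```python
import asyncio
import random
import time

SCALE = 0.01  # assumption: one exercise "second" lasts 10 ms here


async def load(n: str) -> int:
    i = random.randint(0, 10)
    print(f"loading {n}: (will take {i} units)")
    await asyncio.sleep(i * SCALE)
    print(f"{n} loaded..")
    return i


async def fire(n: str, last_i: int) -> tuple:
    i = random.randint(0, 10)
    print(f"firing {n}: (will take {i} units)")
    await asyncio.sleep(i * SCALE)
    print(f"{n} fired!")
    return (last_i, i)


async def chain(n: str) -> tuple:
    start = time.perf_counter()
    p1 = await load(n)      # within one chain, fire() waits for load()
    p2 = await fire(n, p1)  # the result of load() is fed into fire()
    elapsed = time.perf_counter() - start
    print(f"--> Timing for {n}: total {elapsed:0.2f} seconds, partial {p2}.")
    return p2


async def main(*args: str) -> list:
    # the chains themselves run concurrently with each other
    return await asyncio.gather(*(chain(n) for n in args))


random.seed(1672)
start = time.perf_counter()
results = asyncio.run(main("cannon", "laser", "railgun"))
end = time.perf_counter() - start
print(f"Program finished in {end:0.2f} seconds.")
```

The steps inside a single chain are sequential, but the three chains overlap, so the total run time should track the slowest chain rather than the sum of all of them.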


In [6]:

import asyncio
import random
import time
import sys

# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

1
# async def load(n: str) -> int:
#     i = random.randint(0, 10)
#     print(f"loading {n}: (will take {i} seconds)")
#     ...
#     print(f"{n} loaded..")
#     ...

2
# async def fire(n: str, last_i: int) -> (int, int):
#     i = random.randint(0, 10)
#     print(f"firing {n}: (will take {i} seconds)")
#     ...
#     print(f"{n} fired!")
#     return (last_i, i)

3
# async def chain(n: str) -> None:
#     start = time.perf_counter()
#     ...
#     ...
#     end = time.perf_counter() - start
#     print(f"--> Timing for {n}: total {end:0.2f} seconds, partial {p2}.")

4
# async def main(*args):
#     ...

5
# random.seed(1672)
# args = ["cannon", "laser", "railgun"]
# start = time.perf_counter()
# ...
# end = time.perf_counter() - start
# print(f"Program finished in {end:0.2f} seconds.")

# solution


1

2

3

4

5

### 002.004 Using a Queue

There is an alternative structure that can also work with async IO: a number of producers, which are not associated with each other, add items to a queue. Each producer may add multiple items to the queue at staggered, random, unannounced times. A group of consumers pull items from the queue as they show up, greedily and without waiting for any other signal. One use-case for queues (as is the case here) is for the queue to act as a transmitter for producers and consumers that aren’t otherwise directly chained or associated with each other.


1. `produce` pushes a random number of jobs to the queue
    1. Use the itertools function that yields the same object n times (in this case, None)
    1. Add a tuple (an actual tuple) to the queue: one item is a random job number, the other is a timestamp
1. `consume` gets the next available job and processes it
    1. Q: we are using `while True`, isn't that going to cause problems?
    1. Get the tuple off the queue
    1. Tell the queue it can tick the job as complete
1. `main` orchestrates the whole thing
    1. create a queue
    1. create and schedule a set with a task for each producer
    1. make sure each of these tasks gets eventually garbage collected
    1. do the same for consumer (use male names)
    1. run all the producer tasks
    1. Q: why only the producers?
    1. wait until all jobs are processed
    1. Q: how do we know we are done?
    1. turn off the consumers
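
The producer/consumer flow outlined above can be sketched as follows. This is a simplified illustration, not the intended solution: random integers stand in for `fake.job()`, the names `P0`, `C0`, and so on replace the Faker names, sleeps are a few milliseconds long, the `processed` list exists only so the outcome can be checked, and the tasks are held in plain lists rather than the self-discarding sets the exercise asks for.

```python
import asyncio
import itertools as it
import random
import time

processed = []  # hypothetical: records consumed jobs for inspection


async def rand_sleep() -> None:
    await asyncio.sleep(random.uniform(0, 0.02))  # assumption: short sleeps


async def produce(name: str, q: asyncio.Queue) -> int:
    n = random.randint(1, 5)
    for _ in it.repeat(None, n):
        await rand_sleep()
        job = random.randint(0, 999)  # stand-in for fake.job()
        await q.put((job, time.perf_counter()))
        print(f"Producer {name} added <{job}> to queue.")
    return n


async def consume(name: str, q: asyncio.Queue) -> None:
    while True:  # never returns on its own; main() cancels it
        await rand_sleep()
        job, t = await q.get()
        now = time.perf_counter()
        print(f"    > Consumer {name} got element <{job}> in {now - t:0.5f} seconds.")
        processed.append(job)
        q.task_done()  # lets q.join() know this item is finished


async def main(nprod: int, ncon: int) -> int:
    q = asyncio.Queue()
    producers = [asyncio.create_task(produce(f"P{i}", q)) for i in range(nprod)]
    consumers = [asyncio.create_task(consume(f"C{i}", q)) for i in range(ncon)]
    counts = await asyncio.gather(*producers)  # only producers ever finish
    await q.join()  # returns once every put() has a matching task_done()
    for c in consumers:
        c.cancel()
    # wait for the cancellations to settle before the loop closes
    await asyncio.gather(*consumers, return_exceptions=True)
    return sum(counts)


random.seed(444)
total = asyncio.run(main(nprod=5, ncon=2))
print(f"{total} jobs produced, {len(processed)} consumed.")
```

Because the consumers loop forever, only the producers are awaited directly; `q.join()` is what signals that all work is done, after which the consumers are cancelled explicitly.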


In [7]:
import asyncio
import itertools as it
import random
import time

# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

# async def create_job(size: int = 5) -> str:
#     return fake.job()

# async def rand_sleep() -> None:
#     i = random.randint(0, 10)
#     await asyncio.sleep(i)

1
# async def produce(name: str, q: asyncio.Queue) -> None:
#     n = random.randint(0, 10)
#     for _ in ...
#         await rand_sleep()
#         i = await create_job()
#         t = time.perf_counter()
#         ...
#         print(f"Producer {name} added <{i}> to queue.")

2
# async def consume(name: str, q: asyncio.Queue) -> None:
#     while True:
#         await rand_sleep()
#         ...
#         now = time.perf_counter()
#         print(f"    > Consumer {name} got element <{i}>"
#               f" in {now-t:0.5f} seconds.")
#         ...

3
# async def main(nprod: int, ncon: int):
#     q = ...
#     producers = set()
#     for _ in range(nprod):
#         task = ...produce(fake.first_name_female(), q)
#         producers.add(task)
#         ...
#     consumers = set()
#     for _ in range(ncon):
#         task = ...
#         consumers.add(task)
#         ...
#     ...
#     ...
#     for c in consumers:
#         ...


# random.seed(444)
# start = time.perf_counter()
# asyncio.run(main(nprod=5, ncon=2))
# elapsed = time.perf_counter() - start
# print(f"Program completed in {elapsed:0.5f} seconds.")


# solution


1

2

3

### 002.005 Async IO’s Roots in Generators

The `await` keyword behaves similarly to `yield`, marking a break point at which the coroutine suspends itself and lets other coroutines work. “Suspended,” in this case, means a coroutine that has temporarily ceded control but not totally exited or finished. Keep in mind that `yield`, and by extension `yield from` and `await`, mark a break point in a generator’s execution.
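
To make the idea of a break point concrete, here is a small sketch with a plain generator. The `countdown` function is a hypothetical example invented for this illustration; it shows that execution pauses at `yield` and later resumes on the next line with its local state intact.

```python
def countdown(start: int):
    """Hypothetical generator used only to show where execution pauses."""
    n = start
    while n > 0:
        yield n  # execution suspends here until next() is called again
        n -= 1   # resumes here, with the local variable n preserved


gen = countdown(3)
first = next(gen)   # runs up to the first yield
second = next(gen)  # resumes after the yield, loops, pauses again
rest = list(gen)    # drains whatever is left
print(first, second, rest)
```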

1. `endless` is a generator that, each time `next` is called on it, returns the next value of the repeating sequence 9, 8, 7, 6, 9, ...
    1. Import the relevant package
2. use `for ... in` to print a few iterations of `endless`
3. print 3 more iterations of `e`, hard-coded
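
One way to build such a generator is sketched below, assuming `itertools.cycle` is the "relevant package" the exercise has in mind. The `seen` and `next_three` names are additions made for this illustration so the values can be inspected.

```python
from itertools import cycle


def endless():
    """Yield 9, 8, 7, 6 over and over without ever finishing."""
    yield from cycle((9, 8, 7, 6))


e = endless()

total = 0
seen = []  # hypothetical: keeps the values that were added to total
for i in e:
    if total < 40:
        total += i
        seen.append(i)
        print(f"{i:02d}/{total:02d}", end=" ")
    else:
        print()
        print(f"discarding {i}")
        break

# the generator remembers where it stopped, so these continue the cycle
next_three = [next(e), next(e), next(e)]
print(next_three)
```

Breaking out of the loop does not reset the generator: the three later `next` calls pick up right after the discarded value.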

In [8]:
# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

1
# from ...

# def endless():
#     ...

# e = endless()

2
# total = 0
# for i in e:
#     if total < 40:
#         total += i
#         print(f"{i:02d}/{total:02d}", end=" ")
#     else:
#         print()
#         print(f"discarding {i}")
#         break

3
# ...

# solution


1

2

3

### 002.006 Asynchronous Generators

Asynchronous iterators and asynchronous generators are not designed to concurrently map some function over a sequence or iterator. They’re merely designed to let the enclosing coroutine allow other tasks to take their turn. The `async for` and `async with` statements are only needed to the extent that using plain `for` or `with` would “break” the nature of `await` in the coroutine. This distinction between asynchronicity and concurrency is a key one to grasp.

1. `mygen` is an async generator
    1. It yields a power of 2 for each `i` in `range(up_to)` (10 by default)
    1. It prints a separator (`.` for example) at each iteration
    1. In between each yield there is a 0.2 sec async pause
1. `main` is a wrapper for async iteration and comprehension
    1. `g` uses a list comprehension to put all the items generated by `mygen` in a list
    1. `f` uses a for loop to put all the items generated by `mygen` in a list, but only if i is not 3 or 5
    1. Both are returned as a tuple
1. Get the tuples and print them out
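
The following sketch shows one way the async generator, the async comprehension, and the `async for` loop could fit together. It makes two assumptions: the pause is shortened from 0.2 seconds, and the filter is the `j // 3 % 5` test from the skeleton cell, which skips a different set of values than the prose description suggests.

```python
import asyncio


async def mygen(sep: str, up_to: int = 10):
    print()
    for i in range(up_to):
        print(sep, end=" ")
        yield 2 ** i
        await asyncio.sleep(0.01)  # assumption: shortened from 0.2 s


async def main():
    # async comprehension: collects every value the generator yields
    g = [i async for i in mygen(".")]

    # async for loop with the filter taken from the skeleton cell
    f = []
    async for j in mygen("+"):
        if j // 3 % 5:
            continue
        f.append(j)
    return g, f


g, f = asyncio.run(main())
print()
print(g)
print(f)
```

Note that the two loops run one after the other: `async for` lets other tasks take a turn during each pause, but it does not make the iteration itself concurrent.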

In [9]:
# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

import concurrent.futures
import time


# 1
# async def mygen(sep:str, up_to: int = 10):
#     print()
#     i = 0
#     for i in range(up_to):
#         print(sep, end=" ")
#         ...

# 2
# async def main():
#     g = ...

#     f = []
#     async for j in mygen("+"):
#         if j // 3 % 5:
#             continue
#         f.append(j)
#     return g, f

3
# g, f =

# g
# f

# solution


3

### 002.007 Async Web Scraper

In this section, you’ll build a web-scraping URL collector, areq.py, using aiohttp, a blazingly fast async HTTP client/server framework. (We just need the client part.) Such a tool could be used to map connections between a cluster of sites, with the links forming a directed graph.

1. `fetch_html` sends a GET request to a URL and returns the HTML content
    1. Use the appropriate method to send an async GET request (passing kwargs)
    1. make the response raise for any status >= 400
    1. get the source code (HTML, though it works for .txt files too)
1. `parse` returns the links found in the source
    1. catch all the aiohttp errors and print them
    1. For each link, try to create an absolute link by joining it to the page's URL if it is relative, or leave it unchanged if it is already absolute
    1. Q: Why does it work?
1. `write_one` uses aiofiles to open the file asynchronously and write one line at a time
1. `bulk_crawl_and_write` creates a task per URL
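
The link-resolution step in `parse` can be tried in isolation with the standard library alone. The sketch below assumes `urllib.parse.urljoin` is the joining function the exercise intends; the page URL and HTML snippet are made up for illustration, and the regex is deliberately broader than the `HREF_RE` used in the cell below so that relative links are captured too.

```python
import re
from urllib.parse import urljoin

# assumption: broader than the notebook's pattern, to include relative links
HREF_RE = re.compile(r'href="(.*?)"')

page_url = "https://example.com/docs/index.html"  # hypothetical page
html = (
    '<a href="/about">About</a> '
    '<a href="guide.html">Guide</a> '
    '<a href="https://other.example/x">External</a>'
)

# urljoin resolves relative links against page_url and returns
# already-absolute links unchanged
found = {urljoin(page_url, link) for link in HREF_RE.findall(html)}
for link in sorted(found):
    print(link)
```

This hints at an answer to the question in the list: the same call is safe for both kinds of link, because joining an absolute URL to a base simply yields the absolute URL.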

In [10]:
# nest_asyncio is needed to run asyncio inside notebooks without hitting the
# "This event loop is already running" RuntimeError
import nest_asyncio
nest_asyncio.apply()

import aiofiles
import aiohttp
import asyncio
import re
import urllib.error
import urllib.parse
from aiohttp import ClientSession
from pathlib import Path
from typing import IO

outpath = Path("_002_python_real_python_urls.txt")
urls = [
    "https://regex101.com/",
    "https://docs.python.org/3/this-url-will-404.html",
    "https://www.nytimes.com/guides/",
    "https://www.mediamatters.org/",
    "https://1.1.1.1/",
    "https://www.politico.com/tipsheets/morning-money",
    "https://www.bloomberg.com/markets/economics",
    "https://www.ietf.org/rfc/rfc2616.txt"
]

HREF_RE = re.compile(r'href="(http.*?)"')

# 1
# async def fetch_html(url: str, session: ClientSession, **kwargs) -> str:
#     """GET request wrapper to fetch page HTML."""

#     resp = ... 1
#     ... 2
#     print("Got response [%s] for URL: %s" % (resp.status, url))
#     html = ...
#     return html

# 2
# async def parse(url: str, session: ClientSession, **kwargs) -> set:
#     """Find HREFs in the HTML of `url`."""

#     found = set()
#     try:
#         html = await fetch_html(url=url, session=session, **kwargs)
#     except (
#         ...,
#         ...,
#     ) as e:
#         print("aioHTTP error: %s" % e)
#         return found
#     except Exception as e:
#         print("non aioHTTP error: %s" % e)
#         return found
#     else:
#         for link in HREF_RE.findall(html):
#             try:
#                 # Q: Why does it work?
#                 abslink = ...
#             except (urllib.error.URLError, ValueError):
#                 print("Error parsing URL: %s" % link)
#                 pass
#             else:
#                 found.add(abslink)
#         print("Found %d links for %s" % ( len(found), url))
#         return found

# 3
# async def write_one(file: IO, url: str, **kwargs) -> None:
#     """Write the found HREFs from `url` to `file`."""
#     res = await parse(url=url, **kwargs)
#     if not res:
#         return None
#     ...:
#         for p in res:
#             ...write(f"{url}\t{p}\n")
#         print("Wrote results for source URL: %s" % url)

# 4
# async def bulk_crawl_and_write(file: IO, urls: set, **kwargs) -> None:
#     async with ClientSession() as session:
#         tasks = []
#         for url in urls:
#             ...
#         await asyncio.gather(*tasks)

# 5
# with open(outpath, "w") as outfile:
#     outfile.write("source_url\tparsed_url\n")
# asyncio.run(bulk_crawl_and_write(file=outpath, urls=urls))


# solution


In [11]:
%%bash
cat _002_python_real_python_urls.txt

source_url	parsed_url
https://1.1.1.1/	https://itunes.apple.com/us/app/1-1-1-1-faster-internet/id1423538627
https://1.1.1.1/	https://www.cloudflare.com/careers/departments/
https://1.1.1.1/	https://twitter.com/intent/tweet?text=ISPs%20spy%20on%20your%20Internet%20traffic%20and%20sell%20the%20data.%20I%27m%20using%201.1.1.1%20with%20WARP%2C%20a%20free%20app%20which%20makes%20the%20Internet%20on%20my%20phone%20faster%20and%20more%20private.%20You%20should%20get%20the%20app%20too%3A%20https%3A//one.one.one.one
https://1.1.1.1/	https://cloudflare.com
https://1.1.1.1/	https://pkg.cloudflareclient.com/
https://1.1.1.1/	https://developers.cloudflare.com/warpclient/setting-up/macOS/
https://1.1.1.1/	https://developers.cloudflare.com/warpclient/setting-up/windows/
https://1.1.1.1/	https://1111-releases.cloudflareclient.com/windows/Cloudflare_WARP_Release-x64.msi
https://1.1.1.1/	https://developers.cloudflare.com/warpclient/setting-up/linux/
https://1.1.1.1/	https://blog.cloudflare.com/warp-for-