# Chapter 21. Asynchronous Programming

This chapter covers three closely related topics:
 - Python's `async def`, `await`, `async with`, and `async for`
 - Objects supporting those constructs: native coroutines and asynchronous variants of context managers, iterables, generators, and comprehensions
 - `asyncio` and other async libraries


## A Few Definitions

Native coroutine
 - A coroutine function defined with `async def`. You can delegate from a native coroutine to another native coroutine using the `await` keyword, similar to how classic coroutines use `yield from`. The `await` keyword cannot be used outside of a native coroutine.

Classic coroutine
 - A generator function that consumes data sent to it via `my_coro.send(data)` calls, and reads that data by using `yield` in an expression. Classic coroutines can delegate to other classic coroutines using `yield from`.

Generator-based coroutine
 - A generator function decorated with `@types.coroutine`. That decorator makes the generator compatible with the new `await` keyword.

Asynchronous generator
 - A generator function defined with `async def` and using `yield` in its body. It returns an asynchronous generator object that provides `__anext__`, a coroutine method to retrieve the next item.
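
The four definitions above can be put side by side in one small script. This is a minimal sketch using only the standard library; the names `native`, `classic_averager`, `generator_based`, `countdown`, and `demo` are made up for illustration and do not come from the chapter's examples.

```python
import asyncio
import types


async def native(x: int) -> int:
    # native coroutine: defined with async def, delegates with await
    await asyncio.sleep(0)
    return x * 2


def classic_averager():
    # classic coroutine: receives data through .send() and reads it via yield
    total, count, average = 0.0, 0, None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count


@types.coroutine
def generator_based():
    # generator-based coroutine: a decorated generator that await accepts
    yield  # a bare yield hands control back to the event loop once
    return 'done'


async def countdown(n: int):
    # asynchronous generator: async def with yield in the body
    while n > 0:
        yield n
        n -= 1
        await asyncio.sleep(0)


async def demo() -> tuple:
    doubled = await native(21)
    status = await generator_based()
    numbers = [i async for i in countdown(3)]  # async comprehension
    return doubled, status, numbers


avg = classic_averager()
next(avg)                       # prime the classic coroutine
avg.send(10)
classic_result = avg.send(20)   # running average of 10 and 20

async_results = asyncio.run(demo())
print(classic_result, async_results)
```

Note that only the classic coroutine is driven by explicit `.send()` calls in user code; the other three are driven by `await`, `async for`, and the event loop.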


In [None]:
# blogdom.py

#!/usr/bin/env python3
import asyncio
import socket
from keyword import kwlist

MAX_KEYWORD_LEN = 4

async def probe(domain: str) -> tuple[str, bool]:
  # get a reference to the asyncio event loop
  # so we can use it next
  loop = asyncio.get_running_loop()
  try:
    # the loop.getaddrinfo coroutine method returns a list of
    # five-part tuples of parameters for connecting to the address
    await loop.getaddrinfo(domain, None)
  except socket.gaierror:
    return (domain, False)
  return (domain, True)

# main must be a coroutine so that we can use await in it
async def main() -> None:
  names = (kw for kw in kwlist if len(kw) <= MAX_KEYWORD_LEN)
  domains = (f'{name}.dev'.lower() for name in names)
  # build a list of coroutine objs by invoking
  # the probe coroutine with each domain argument
  coros = [probe(domain) for domain in domains]
  # asyncio.as_completed is a generator that yields
  # coroutines that return the results of the coroutines
  # passed to it **in the order they are completed**
  for coro in asyncio.as_completed(coros):
    # await expression will not block
    # but this is required for us to get the res from coro
    domain, found = await coro
    mark = '+' if found else ' '
    print(f'{mark} {domain}')

if __name__ == '__main__':
  # asyncio.run starts the event loop
  # and returns only when the event loop exits
  asyncio.run(main())

In [None]:
!python blogdom.py

+ not.dev
+ as.dev
+ from.dev
  is.dev
  with.dev
  for.dev
+ def.dev
  or.dev
  none.dev
  pass.dev
  if.dev
+ in.dev
  elif.dev
+ and.dev
+ try.dev
+ true.dev
  else.dev
+ del.dev


## Awaitable

The `for` keyword works with iterables. The `await` keyword works with awaitables.

As the end user of `asyncio`, these are awaitables we'll see frequently:
 - A native coroutine object, which you get by calling a native coroutine function
 - An `asyncio.Task` which we usually get by passing a coroutine obj to `asyncio.create_task()`

Lower-level awaitable:
 - An object with an `__await__` method that returns an iterator; for example, an `asyncio.Future` instance

 - Objects written in other languages using the Python/C API with a `tp_as_async.am_await` function returning an iterator
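
The three Python-level kinds of awaitable listed above can be tried in one place. The sketch below is an illustration only: `Ready` is a hypothetical class that implements `__await__` by delegating to a coroutine object, and `double` is a made-up coroutine.

```python
import asyncio


class Ready:
    """Hypothetical low-level awaitable: __await__ must return an iterator."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # reuse the iterator provided by a native coroutine object
        return self._wait().__await__()

    async def _wait(self):
        await asyncio.sleep(0)
        return self.value


async def double(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2


async def main() -> list:
    coro = double(1)                        # native coroutine object
    task = asyncio.create_task(double(2))   # asyncio.Task, scheduled right away
    ready = Ready('custom')                 # object with an __await__ method
    return [await coro, await task, await ready]


results = asyncio.run(main())
print(results)
```

A coroutine object does nothing until it is awaited, while a `Task` starts running as soon as the event loop gets control, which is why `create_task` is the usual choice for concurrent work.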

## Downloading with asyncio and HTTPX

As of Python 3.10, `asyncio` only supports TCP and UDP directly, and there are no async HTTP client or server packages in the standard library. So, we use HTTPX.

In [None]:
# flags_asyncio.py

import asyncio

from httpx import AsyncClient

from flags import BASE_URL, save_flag, main

# must be a native coroutine
# so it can await get_flag, which does the HTTP request
# Then it saves the image and displays
# the code of the downloaded flag
async def download_one(client: AsyncClient, cc: str):
  image = await get_flag(client, cc)
  save_flag(image, f'{cc}.gif')
  print(cc, end=' ', flush=True)
  return cc

async def get_flag(client: AsyncClient, cc: str) -> bytes:
  url = f"{BASE_URL}/{cc}/{cc}.gif".lower()
  # the get method of httpx.AsyncClient is a coroutine method,
  # so it must be awaited; it returns an httpx.Response object
  resp = await client.get(url, timeout=6.1, follow_redirects=True)
  return resp.read()

# This fcn needs to be a plain fcn--not a coroutine
def download_many(cc_list: list[str]) -> int:
  # execute the event loop driving the supervisor(cc_list)
  # coroutine object until it returns
  return asyncio.run(supervisor(cc_list))

async def supervisor(cc_list: list[str]) -> int:
  # Async HTTP client operations in httpx are methods of AsyncClient
  # which is also an async context manager
  # a context manager with async setup and teardown methods
  async with AsyncClient() as client:
    # build a list of coroutine objects by calling the download_one
    # coroutine once for each flag to be retrieved
    to_do = [download_one(client, cc) for cc in sorted(cc_list)]
    # wait for the asyncio.gather coroutine,
    # which accepts one or more awaitable arguments
    # and waits for all of them to complete
    res = await asyncio.gather(*to_do)

  return len(res)

if __name__ == '__main__':
  main(download_many)

### The Secret of Native Coroutines: Humble Generators

A key difference between classic coroutines and native coroutines is that there are no visible `.send()` calls or `yield` expressions in the latter. Our code sits between the `asyncio` library and the async libraries we are using (e.g. HTTPX).

Under the hood, the `asyncio` event loop makes the `.send` calls that drive your coroutines, and your coroutines `await` on other coroutines, including library coroutines. (`await` borrows most of its implementation from `yield from`, which also makes `.send` calls to drive coroutines)
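
To make the hidden `.send` calls visible, a chain of native coroutines can be driven by hand, without any event loop. This is a toy sketch, not how `asyncio` is implemented: `pause` is a hypothetical generator-based coroutine standing in for the library code at the bottom of the `await` chain.

```python
import types


@types.coroutine
def pause(label):
    # stands in for a library coroutine that hands control to the event loop
    reply = yield label
    return reply


async def inner():
    return await pause('inner waiting')


async def outer():
    value = await inner()
    return f'outer got {value!r}'


coro = outer()
first = coro.send(None)     # runs the chain until the yield inside pause()
final = None
try:
    coro.send('data')       # resumes pause(); the whole chain then finishes
except StopIteration as exc:
    final = exc.value       # the return value of outer()

print(first)
print(final)
```

The value passed to `.send` travels down through every `await` to the innermost `yield`, and the return value travels back up, which is the same plumbing that `yield from` provides for classic coroutines.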



### All-or-Nothing Problem

## Asynchronous Context Managers

In an async driver like asyncpg, the setup and wrap-up need to be coroutines so that other operations can happen concurrently. However, the implementation of the classic `with` statement doesn't support coroutines doing the work of `__enter__` or `__exit__`.

In the `asyncpg.Transaction` class, the `__aenter__` coroutine method does `await self.start()`, and the `__aexit__` coroutine method awaits the private `__rollback` or `__commit` coroutine methods, depending on whether an exception occurred.

Back to *flags_asyncio.py*, the `AsyncClient` class of `httpx` is an async context manager, so it can use awaitables in its `__aenter__` and `__aexit__` special coroutine methods.
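
The protocol can be sketched with a toy class. `FakeTransaction` below is a hypothetical stand-in that only records what it would do; it is not the asyncpg API, and the `asyncio.sleep(0)` calls merely mark where real network I/O would be awaited.

```python
import asyncio


class FakeTransaction:
    """Hypothetical stand-in for a database transaction."""

    def __init__(self, log: list[str]):
        self.log = log

    async def __aenter__(self):
        await asyncio.sleep(0)      # pretend to send BEGIN over the network
        self.log.append('start')
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)      # pretend to send COMMIT or ROLLBACK
        self.log.append('rollback' if exc_type else 'commit')
        return False                # do not swallow the exception


async def main() -> list[str]:
    log: list[str] = []
    async with FakeTransaction(log):
        log.append('work')
    try:
        async with FakeTransaction(log):
            raise ValueError('boom')
    except ValueError:
        pass
    return log


log = asyncio.run(main())
print(log)
```

Because `__aenter__` and `__aexit__` are coroutines, the event loop is free to run other tasks while the setup and wrap-up are waiting on I/O.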

## Enhancing the asyncio Downloader


In [None]:
# slow_server.py

""" Slow HTTP server class.

This module implements a ThreadingHTTPServer using a custom
SimpleHTTPRequestHandler subclass that introduces delays to all
GET responses, and optionally returns errors to a fraction of
the requests if given the --error-rate command-line argument
"""

import contextlib
import os
import socket
import time
from functools import partial
from http import server, HTTPStatus
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler
from random import random, uniform

MIN_DELAY = 0.5 # minimum delay for do_GET (secs)
MAX_DELAY = 5.0 # maximum delay for do_GET (secs)

class SlowHTTPRequestHandler(SimpleHTTPRequestHandler):
  """
  The optional error_rate arg determines how often GET requests
  receive a 418 status code
  """

  def __init__(self, *args, error_rate=0.0, **kwargs):
    self.error_rate = error_rate
    super().__init__(*args, **kwargs)

  def do_GET(self):
    """Serve a GET request."""
    delay = uniform(MIN_DELAY, MAX_DELAY)
    cc = self.path[-6:-4].upper()
    print(f'{cc} delay: {delay:0.2}s')
    time.sleep(delay)
    if random() < self.error_rate:
      try:
        self.send_error(HTTPStatus.IM_A_TEAPOT, "I'm a Teapot")
      except BrokenPipeError as exc:
        print(f"{cc} *** BrokenPipeError: client closed")
    else:
      f = self.send_head()
      if f:
        try:
          self.copyfile(f, self.wfile)
        except BrokenPipeError as exc:
          print(f"{cc} *** BrokenPipeError: client closed")
        finally:
          f.close()

if __name__ == '__main__':
  import argparse

  parser = argparse.ArgumentParser()
  parser.add_argument('--bind', '-b', metavar='ADDRESS',
                      help='Specify alternate bind address '
                           '[default: all interfaces]')
  parser.add_argument('--directory', '-d', default=os.getcwd(),
                      help='Specify alternative directory '
                           '[default:current directory]')
  parser.add_argument('--error-rate', '-e', metavar='PROBABILITY',
                      default=0.0, type=float,
                      help='Error rate; e.g. use .25 for 25% probability '
                           '[default:0.0]')
  parser.add_argument('port', action='store',
                      default=8001, type=int,
                      nargs='?',
                      help='Specify alternate port [default: 8001]')
  args = parser.parse_args()
  handler_class = partial(SlowHTTPRequestHandler,
                          directory=args.directory,
                          error_rate=args.error_rate)

  class DualStackServer(ThreadingHTTPServer):
    def server_bind(self):
      # suppress the exception raised when the protocol is IPv4
      with contextlib.suppress(Exception):
        self.socket.setsockopt(
            socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0
        )
      return super().server_bind()

  server.test(
      HandlerClass=handler_class,
      ServerClass=DualStackServer,
      port=args.port,
      bind=args.bind,
  )



### Using asyncio.as_completed and a Thread

In the previous example, we passed several coroutines to `asyncio.gather` which returns a list with results of the coroutines in the order they were submitted.

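This means that `asyncio.gather` can only return when all the awaitables are done. To handle each result as soon as it is ready, for example to update a progress bar, use `asyncio.as_completed` instead.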
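
The difference in ordering is easy to see with simulated jobs. In this sketch, `job` is a made-up coroutine that sleeps for a given delay; the delays are arbitrary and chosen only so that completion order differs from submission order.

```python
import asyncio


async def job(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name


async def main():
    specs = [('slow', 0.2), ('fast', 0.05), ('medium', 0.1)]

    # gather: one list, in submission order, available only when all are done
    gathered = await asyncio.gather(*(job(n, d) for n, d in specs))

    # as_completed: results arrive one at a time, in completion order
    completed = []
    for coro in asyncio.as_completed([job(n, d) for n, d in specs]):
        completed.append(await coro)

    return gathered, completed


gathered, completed = asyncio.run(main())
print(gathered)
print(completed)
```

With `as_completed`, the body of the `for` loop runs once per finished job, so it is the natural place to report progress.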

In [None]:
# flags2_asyncio.py

import asyncio
from collections import Counter
from http import HTTPStatus
from pathlib import Path

import httpx
import tqdm

from flags2_common import main, DownloadStatus, save_flag

# low concurrency default to avoid errors from remote site
# such as 503 - Service Temporarily Unavailable
DEFAULT_CONCUR_REQ = 5
MAX_CONCUR_REQ = 1000

# get_flag is very similar to the sequential version
# 1st difference: it requires the client parameter
async def get_flag(client: httpx.AsyncClient,
                   base_url: str,
                   cc: str) -> bytes:
  url = f'{base_url}/{cc}/{cc}.gif'.lower()
  # 2nd difference: `.get` is an `AsyncClient` method
  #                 and it's a coroutine, so we need to `await` it
  resp = await client.get(url, timeout=3.1, follow_redirects=True)
  resp.raise_for_status()
  return resp.content

async def download_one(client: httpx.AsyncClient,
                       cc: str,
                       base_url: str,
                       semaphore: asyncio.Semaphore,
                       verbose: bool) -> DownloadStatus:
  try:
    # use semaphore as an async context manager so that
    # the program as a whole is not blocked
    async with semaphore:
      image = await get_flag(client, base_url, cc)
  except httpx.HTTPStatusError as exc:
    res = exc.response
    if res.status_code == HTTPStatus.NOT_FOUND:
      status = DownloadStatus.NOT_FOUND
      msg = f"not found: {res.url}"
    else:
      raise
  else:
    # Saving the image is an I/O operation
    # To avoid blocking the event loop, run save_flag in a thread
    await asyncio.to_thread(save_flag, image, f"{cc}.gif")
    status = DownloadStatus.OK
    msg = 'OK'
  if verbose and msg:
    print(cc, msg)
  return status

# Supervisor takes the same arguments as the `download_many` function
# but it cannot be invoked directly from main because it's a
# coroutine and not a plain function like download_many
async def supervisor(cc_list: list[str],
                     base_url: str,
                     verbose: bool,
                     concur_req: int) -> Counter[DownloadStatus]:
  counter: Counter[DownloadStatus] = Counter()
  semaphore = asyncio.Semaphore(concur_req)
  async with httpx.AsyncClient() as client:
    # create a list of coroutine objects,
    # one per call to the download_one coroutine

    to_do = [download_one(client, cc, base_url, semaphore, verbose)
            for cc in sorted(cc_list)]
    to_do_iter = asyncio.as_completed(to_do)
    if not verbose:
      to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list))
    for coro in to_do_iter:
      # reset for each download so that one failure is not
      # carried over to the results that follow
      error: httpx.HTTPError | None = None
      try:
        status = await coro
      except httpx.HTTPStatusError as exc:
        error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
        error_msg = error_msg.format(resp=exc.response)
        error = exc
      except httpx.RequestError as exc:
        error_msg = f'{exc} {type(exc)}'.strip()
        error = exc
      except KeyboardInterrupt:
        break

      if error:
        status = DownloadStatus.ERROR
        if verbose:
          url = str(error.request.url)
          cc = Path(url).stem.upper()
          print(f'{cc} error: {error_msg}')
      counter[status] += 1

  return counter

def download_many(cc_list: list[str],
                  base_url: str,
                  verbose: bool,
                  concur_req: int) -> Counter[DownloadStatus]:
  coro = supervisor(cc_list, base_url, verbose, concur_req)
  # instantiates the supervisor coroutine object
  # and passes it to the event loop with asyncio.run
  counts = asyncio.run(coro)

  return counts

if __name__ == "__main__":
  main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)

In `asyncio`, all network I/O is done with coroutines, but file I/O is not. File I/O is nonetheless "blocking", in the sense that reading and writing files takes thousands of times longer than reading and writing RAM.

Since Python 3.9, the `asyncio.to_thread` coroutine makes it easy to delegate file I/O to a thread pool provided by `asyncio`.
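
A minimal sketch of the pattern follows. The `save` function is a hypothetical stand-in for `save_flag`, with a `time.sleep` call simulating a slow disk, and `ticker` is a made-up coroutine that can only make progress while the event loop is free.

```python
import asyncio
import tempfile
import time
from pathlib import Path


def save(path: Path, data: bytes) -> int:
    # plain blocking function: a stand-in for save_flag
    time.sleep(0.1)                  # simulate a slow disk
    return path.write_bytes(data)    # number of bytes written


async def ticker(ticks: list[int]) -> None:
    # makes progress only while the event loop is not blocked
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.02)


async def main() -> tuple[int, list[int]]:
    ticks: list[int] = []
    with tempfile.TemporaryDirectory() as tmp:
        written, _ = await asyncio.gather(
            asyncio.to_thread(save, Path(tmp) / 'flag.gif', b'GIF89a'),
            ticker(ticks),
        )
    return written, ticks


written, ticks = asyncio.run(main())
print(written, ticks)
```

Calling `save(...)` directly inside a coroutine would also work, but the event loop would be stuck for the whole duration of the write; `asyncio.to_thread` moves that wait to a worker thread.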

### Making Multiple Requests for Each Download

Suppose we want to save each country flag with the name of the country and the country code. Now we need to make 2 HTTP requests per flag: one to get the flag image itself, the other to get the *metadata.json* file in the same directory as the image.

The `await` keyword allows you to drive the async requests one after the other, sharing the local scope of the driving coroutine.

The third variation of the `asyncio` flag downloading script has a few changes:

- `get_country`
 * This new coroutine fetches the `metadata.json` file for the country code, and gets the name of the country from it.

- `download_one`
 * This coroutine now uses `await` to delegate to `get_flag` and the new `get_country` coroutine, using the result of the latter to build the name of the file to save
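
Before the full script, the core idea can be sketched with simulated requests. Everything here is hypothetical: `fetch_image` and `fetch_country` replace the real HTTP calls with short sleeps, the country table is hard-coded, and `download_one_sketch` returns the file name instead of saving anything.

```python
import asyncio


async def fetch_image(cc: str) -> bytes:
    await asyncio.sleep(0.01)        # simulated HTTP request
    return f'<image of {cc}>'.encode()


async def fetch_country(cc: str) -> str:
    await asyncio.sleep(0.01)        # simulated HTTP request
    return {'BR': 'Brazil', 'IN': 'India'}[cc]


async def download_one_sketch(cc: str,
                              semaphore: asyncio.Semaphore) -> tuple[str, int]:
    # one request after the other; the semaphore is released in between
    async with semaphore:
        image = await fetch_image(cc)
    async with semaphore:
        country = await fetch_country(cc)
    # both results are ordinary local variables of this coroutine
    filename = country.replace(' ', '_') + '.gif'
    return filename, len(image)


async def main() -> list[tuple[str, int]]:
    semaphore = asyncio.Semaphore(2)
    return await asyncio.gather(
        *(download_one_sketch(cc, semaphore) for cc in ['BR', 'IN'])
    )


results = asyncio.run(main())
print(results)
```

No callbacks are needed to pass the image from the first request to the code that runs after the second one: the coroutine is suspended at each `await` and its local variables are preserved.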

In [None]:
# flags3_asyncio.py

import asyncio
from collections import Counter
from http import HTTPStatus
from pathlib import Path

import httpx
import tqdm

from flags2_common import main, DownloadStatus, save_flag

# low concurrency default to avoid errors from remote site
# such as 503 - Service Temporarily Unavailable
DEFAULT_CONCUR_REQ = 5
MAX_CONCUR_REQ = 1000


async def get_country(client: httpx.AsyncClient,
                      base_url: str,
                      cc: str) -> str:
  # lowercase the URL, as get_flag does, because cc is uppercase
  url = f'{base_url}/{cc}/metadata.json'.lower()
  resp = await client.get(url, timeout=3.1, follow_redirects=True)
  resp.raise_for_status()
  metadata = resp.json()
  return metadata['country']

# get_flag is very similar to the sequential version
# 1st difference: it requires the client parameter
async def get_flag(client: httpx.AsyncClient,
                   base_url: str,
                   cc: str) -> bytes:
  url = f'{base_url}/{cc}/{cc}.gif'.lower()
  # 2nd difference: `.get` is an `AsyncClient` method
  #                 and it's a coroutine, so we need to `await` it
  resp = await client.get(url, timeout=3.1, follow_redirects=True)
  resp.raise_for_status()
  return resp.content

async def download_one(client: httpx.AsyncClient,
                       cc: str,
                       base_url: str,
                       semaphore: asyncio.Semaphore,
                       verbose: bool) -> DownloadStatus:
  try:
    async with semaphore:
      image = await get_flag(client, base_url, cc)
    async with semaphore:
      country = await get_country(client, base_url, cc)
  except httpx.HTTPStatusError as exc:
    res = exc.response
    if res.status_code == HTTPStatus.NOT_FOUND:
      status = DownloadStatus.NOT_FOUND
      msg = f'not found: {res.url}'
    else:
      raise
  else:
    filename = country.replace(' ', '_')
    await asyncio.to_thread(save_flag, image, f'{filename}.gif')
    status = DownloadStatus.OK
    msg = 'OK'

  if verbose and msg:
    print(cc, msg)
  return status

# Supervisor takes the same arguments as the `download_many` function
# but it cannot be invoked directly from main because it's a
# coroutine and not a plain function like download_many
async def supervisor(cc_list: list[str],
                     base_url: str,
                     verbose: bool,
                     concur_req: int) -> Counter[DownloadStatus]:
  counter: Counter[DownloadStatus] = Counter()
  semaphore = asyncio.Semaphore(concur_req)
  async with httpx.AsyncClient() as client:
    # create a list of coroutine objects,
    # one per call to the download_one coroutine

    to_do = [download_one(client, cc, base_url, semaphore, verbose)
            for cc in sorted(cc_list)]
    to_do_iter = asyncio.as_completed(to_do)
    if not verbose:
      to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list))
    for coro in to_do_iter:
      # reset for each download so that one failure is not
      # carried over to the results that follow
      error: httpx.HTTPError | None = None
      try:
        status = await coro
      except httpx.HTTPStatusError as exc:
        error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
        error_msg = error_msg.format(resp=exc.response)
        error = exc
      except httpx.RequestError as exc:
        error_msg = f'{exc} {type(exc)}'.strip()
        error = exc
      except KeyboardInterrupt:
        break

      if error:
        status = DownloadStatus.ERROR
        if verbose:
          url = str(error.request.url)
          cc = Path(url).stem.upper()
          print(f'{cc} error: {error_msg}')
      counter[status] += 1

  return counter

def download_many(cc_list: list[str],
                  base_url: str,
                  verbose: bool,
                  concur_req: int) -> Counter[DownloadStatus]:
  coro = supervisor(cc_list, base_url, verbose, concur_req)
  # instantiates the supervisor coroutine object
  # and passes it to the event loop with asyncio.run
  counts = asyncio.run(coro)

  return counts

if __name__ == "__main__":
  main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)

## Writing asyncio Servers

In [None]:
# charindex.py

import sys
import unicodedata
from collections import defaultdict
from collections.abc import Iterator

STOP_CODE: int = sys.maxunicode + 1
Char = str
Index = defaultdict[str, set[Char]]

def tokenize(text: str) -> Iterator[str]:
  """return iterator of uppercased words"""
  for word in text.upper().replace('-', ' ').split():
    yield word

class InvertedIndex:
  entries: Index

  def __init__(self, start: int = 32, stop: int = STOP_CODE):
    entries: Index = defaultdict(set)
    for char in (chr(i) for i in range(start, stop)):
      name = unicodedata.name(char, '')
      if name:
        for word in tokenize(name):
          entries[word].add(char)

    self.entries = entries

  def search(self, query: str) -> set[str]:
    if words := list(tokenize(query)):
      found = self.entries[words[0]]
      return found.intersection(*(self.entries[w] for w in words[1:]))
    else:
      return set()

def format_results(chars: set[Char]) -> Iterator[str]:
  for char in sorted(chars):
    name = unicodedata.name(char)
    code = ord(char)
    yield f'U+{code:04x}\t{char}\t{name}'

def main(words: list[str]) -> None:
  if not words:
    print('Please give one or more words to search')
    sys.exit(2)
  index = InvertedIndex()
  chars = index.search(' '.join(words))
  for line in format_results(chars):
    print(line)
  print('-' * 66, f'{len(chars)} found')

if __name__ == '__main__':
  main(sys.argv[1:])

In [None]:
# web_mojifinder.py

from pathlib import Path
from unicodedata import name

from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel

from charindex import InvertedIndex

STATIC_PATH = Path(__file__).parent.absolute()

app = FastAPI(
    title='Mojifinder Web',
    description='Search for Unicode characters by name.'
)

class CharName(BaseModel): # Pydantic schema for a JSON response
  char: str
  name: str

def init(app):
  app.state.index = InvertedIndex()
  app.state.form = (STATIC_PATH / 'form.html').read_text()

# run init when this module is loaded by the ASGI server
init(app)

# FastAPI assumes that any parameters that appear in the fcn
# or coroutine signature that are not in the route path will be
# passed in the HTTP query string
@app.get('/search', response_model=list[CharName])
async def search(q: str):
  chars = sorted(app.state.index.search(q))
  # returning an iterable of dicts compatible with the response_model
  # schema allows FastAPI to build the JSON response
  return ({'char': c, 'name': name(c)} for c in chars)

@app.get('/', response_class=HTMLResponse, include_in_schema=False)
def form():
  return app.state.form

# no main function
# it is loaded and driven by the ASGI server (e.g., uvicorn)