# Introducing asyncio

## Table of Contents:

* [The problems of sequential code](#sequential-problems)
* [Concurrency](#concurrency)
* [Threads and Processes](#threads-and-processes)
* [The Global Interpreter Lock (GIL)](#gil)
* [Python standard library concurrency modules](#stdlib)
* [Hello, asyncio](#hello-asyncio)
* [References](#refs)

**Python 3.10** is used throughout this notebook.

In [1]:
import asyncio
import inspect
import random
import sys
import time
import threading
import typing

### The problems of sequential code <a class="anchor" id="sequential-problems"></a>

The following snippet illustrates the performance of a sequential execution.

In [2]:
def routine(name: str) -> int:
    """A short description of the routine.
    
    Parameters
    ----------
    name : str
        the name of routine

    Returns
    -------
    int
        sleep_time ** 10
    """
    sleep_time: int = random.randint(1, 5)
    print(f"{name} is sleeping for {sleep_time} seconds.")
    time.sleep(sleep_time)
    print(f"{name} has woken up.")
    return sleep_time**10


start = time.time()
results = [routine(f"R{i}") for i in range(3, 0, -1)]
print(f"Results = {results}")
print(f"Elapsed time: {round(time.time() - start, 2)}")

R3 is sleeping for 1 seconds.
R3 has woken up.
R2 is sleeping for 3 seconds.
R2 has woken up.
R1 is sleeping for 1 seconds.
R1 has woken up.
Results = [1, 59049, 1]
Elapsed time: 5.0


Iterations and function calls are executed sequentially: each one waits for the previous one to complete. This is **synchronous** programming, where instructions are executed strictly in the order in which they are written in the code. The elapsed time is therefore the sum of all the sleep times (1 + 3 + 1 = 5 seconds in the run above).

It would be great if each function could run separately and independently of the others. This would save precious time and resources, and such a goal can be achieved with **concurrency**.

### Concurrency <a class="anchor" id="concurrency"></a>

Before diving into the depths of concurrency, let's go through the shallow waters of multitasking.

[**Multitasking**](https://en.wikipedia.org/wiki/Computer_multitasking) is the ability to perform multiple tasks over a certain period of time, with the tasks possibly sharing the same resources. In computer science, tasks are running instances of programs (roughly speaking, processes) controlled by the operating system (OS).

There are two major types of multitasking:

* [preemptive](https://en.wikipedia.org/wiki/Preemption_(computing)) - the OS takes over switching between tasks/processes via the \[task\] scheduler (a module of the OS);

* [cooperative](https://en.wikipedia.org/wiki/Cooperative_multitasking) - switch points, where a running task/process "releases/acquires" ([yields](https://en.wikipedia.org/wiki/Yield_(multithreading))) control, are explicitly written in the code.
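
Cooperative switching can be sketched with plain Python generators, where each `yield` is an explicit switch point. This is only a toy illustration: the `task` and `run_round_robin` names are hypothetical, and a real scheduler is far more involved.

```python
from collections import deque
from typing import Generator


def task(name: str, steps: int, log: list[str]) -> Generator[None, None, None]:
    """A hypothetical task that gives up control after every step."""
    for step in range(1, steps + 1):
        log.append(f"{name}: step {step}")
        yield  # explicit switch point: hand control back to the scheduler


def run_round_robin(tasks: list[Generator[None, None, None]]) -> None:
    """A toy scheduler: resume each task in turn until all are finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)  # run the task up to its next `yield`
        except StopIteration:
            continue  # the task has finished, so it is not rescheduled
        queue.append(current)  # not finished yet, put it back in the queue


log: list[str] = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```

The tasks interleave only because they cooperate: a task that never yields would keep the scheduler waiting forever, which is exactly the risk that preemptive multitasking removes.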

[**Concurrency**](https://en.wikipedia.org/wiki/Concurrency_(computer_science)) is the ability to work on multiple tasks by switching between them, so concurrency is a form of multitasking. The switching is done so fast that the tasks appear to be executed simultaneously. A good everyday example is given in Matthew Fowler's book "[Python concurrency with asyncio](https://www.amazon.com/Python-Concurrency-asyncio-Matthew-Fowler/dp/1617298662)".

![Concurrency/Parallelism illustration](img/concurrency_and_parallelism_illustration.png "Concurrency/Parallelism illustration")

Concurrency should be distinguished from [parallelism](https://en.wikipedia.org/wiki/Parallel_computing), in which tasks start, run, and complete independently and literally at the same time. Parallelism is achievable only on a multicore machine: a core per task is a sort of isolated swim lane. In terms of parallelism, a task can be likened to a swimmer who covers the distance at their own speed, (usually) without interfering with other competitors. On a single-core machine, only concurrency is possible.

![Concurrency/Parallelism switching](img/concurrency_and_parallelism_switching.png "Concurrency/Parallelism switching")

Parallelism implies (is a subset of) concurrency, i.e. multiple tasks are done at the same time, but concurrency does not imply (is a superset of) parallelism (a single-core case). The [image](https://realpython.com/async-io-python/) below depicts this relation:

![Concurrency/Parallelism relationship](img/concurrency_and_parallelism_venn_diagram.webp "Concurrency/Parallelism relationship")

The following [picture](https://dsin.wordpress.com/2017/08/28/different-between-concurrency-and-parallelism/) visualises the sequential, concurrent and parallel ways of execution flows:

* sequential - a deterministic order of execution;
* concurrent - out of the order (switching, not necessarily equally done);
* parallel - a separate and independent execution of each task.

![Sequential, Concurrent and Parallel ways depicted](img/execution_flows.jpeg "Sequential, Concurrent and Parallel ways")

Parallel and concurrent models are examples of **asynchronous** programming.

### Threads and Processes <a class="anchor" id="threads-and-processes"></a>

In the [concurrency](#concurrency) section, tasks were deliberately equated with processes. Now let's consider the concept of a process in more detail.

A [process](https://en.wikipedia.org/wiki/Process_(computing)) refers to a running instance of a computer program isolated from other processes on the same machine. Every process has the main thread which executes program instructions. A running Python program is an instance of the Python interpreter carrying out Python instructions (Python byte-code).

A [thread](https://en.wikipedia.org/wiki/Thread_(computing)) is an OS object - the smallest sequence of programmed instructions that the OS scheduler can manage independently - that executes the instructions of a process. By default, a process has only the main thread, but it can spawn multiple threads inside itself, each executing instructions independently.

In a [nutshell](https://www.baeldung.com/cs/process-vs-thread):

* a thread is like a virtual processor (core), a process is like a virtual computer;

* each process has at least one thread and no thread lives outside any process;

* creating a thread is cheaper than a process;

* threads can share the resources assigned to the process that spawned them;

* spawned ([forked](https://en.wikipedia.org/wiki/Fork_(system_call))) processes are isolated copies that do not share resources;

* a process and a thread are objects to the task scheduler, so they can both be called tasks.
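
The point about threads sharing the resources of their process can be checked with a small sketch. The `worker` function and the `shared` list are hypothetical names used only for illustration.

```python
import os
import threading

# The list lives in the memory of the process, so every thread can see it.
shared: list[tuple[str, int, int]] = []


def worker(name: str) -> None:
    # Record the thread's name, the process id and the thread id.
    shared.append((name, os.getpid(), threading.get_ident()))


threads = [threading.Thread(target=worker, args=(f"T{i}",)) for i in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

names = sorted(name for name, _, _ in shared)
pids = {pid for _, pid, _ in shared}
print(names)
print(pids == {os.getpid()})  # all threads report the id of the same process
```

All three threads write into the same list and report the same process id, whereas forked processes would each work on their own copy of `shared`.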

### The Global Interpreter Lock (GIL) <a class="anchor" id="gil"></a>

The Python Global Interpreter Lock, or GIL, is a sort of mutex (in Python, a [lock](https://en.wikipedia.org/wiki/Lock_(computer_science))) that allows only one thread to run at any time. In short, a mutex is a synchronisation primitive (~tool) that manages the access of concurrent tasks (threads or processes) to shared resources. The GIL protects access to Python objects, preventing multiple threads from executing Python bytecode at once.

For single-threaded programs the GIL is not a problem, but it is a significant hindrance for concurrent (multi-threaded) programs. A thread must acquire the GIL before it runs. Other threads cannot acquire the GIL until it is released, so instead of doing their job they wait for the lock. Therefore, the benefits of concurrency are nullified.

If so, why was the GIL incorporated into Python? The key reason is that the memory management of CPython (the implementation of Python in the C programming language) is not thread-safe. Python uses reference counting for memory management. When a Python object is created, it gets a reference count variable that keeps track of the number of references pointing to the object. When the refcount of an object reaches zero, the object is freed from memory.

In [3]:
obj = object()
lst = []
print("Empty structures")
print(f"obj: refcount = {sys.getrefcount(obj)}")
print(f"lst: refcount = {sys.getrefcount(lst)}\n")

lst.append(obj)
print("lst.append(obj)")
print(f"lst: refcount = {sys.getrefcount(lst)}")
print(f"obj: refcount = {sys.getrefcount(obj)}\n")

del lst[0]
print(f"del lst[0]: refcount = {sys.getrefcount(lst)}")

# See https://docs.python.org/3/library/sys.html#sys.getrefcount
# for an explanation why `sys.getrefcount` returns 2 for an empty list

Empty structures
obj: refcount = 2
lst: refcount = 2

lst.append(obj)
lst: refcount = 2
obj: refcount = 3

del lst[0]: refcount = 2


Suppose that:

* PyObj is a Python object with a refcount variable equal to 1;
* T1 is a thread that increments (+1) the PyObj refcount value;
* T2 is a thread that decrements (-1) the PyObj refcount value.

A golden rule of concurrency is that you ***can never be sure what order of execution will be carried out since it is an out-of-order technique done in overlapping and interleaving fashion***. So we can have possible situations:

Situation 1:

1. T1 &rarr; PyObj.refcount = 2
2. T2 &rarr; PyObj.refcount = 1

In situation 1, PyObj stays alive although it perhaps should have been deleted, which may be a problem.

Situation 2:

<ol>
    <li>T2 &rarr; PyObj.refcount = 0</li>
    <li>and now there are troubles on the horizon...</li>
    <ol>
        <li>T1 got the job done &rarr; PyObj.refcount = 1 - but maybe this result is not desirable.</li>
        <li>The <a href="https://rushter.com/blog/python-garbage-collector/">garbage collector</a> - memory "janitor" module - managed to free the PyObj &rarr; this memory area is no longer valid to be addressed &rarr; error-prone undefined behaviour leading to possible system corruption.</li>
    </ol>
</ol>
    
Such problems of concurrency are called [race conditions](https://www.baeldung.com/cs/race-conditions). A **race condition** is a state of a program in which its behaviour depends on the relative timing or interleaving of multiple threads or processes. The non-deterministic interleaving of tasks makes the outcome of the program unpredictable, which is undesirable.

![Race conditions](img/race_conditions_with_refcount.png "Race conditions with refcount")
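
The usual remedy for a race condition is a mutex around the read-modify-write step. The sketch below is only an analogy: the hypothetical `RefCounter` class stands in for the refcount field and uses its own `threading.Lock`, whereas CPython relies on one global lock (the GIL) rather than a lock per object.

```python
import threading


class RefCounter:
    """A hypothetical stand-in for an object's refcount field."""

    def __init__(self, value: int = 1) -> None:
        self.value = value
        self._lock = threading.Lock()

    def change(self, delta: int) -> None:
        # `self.value += delta` is a read-modify-write sequence; holding the
        # lock keeps another thread from interleaving between read and write.
        with self._lock:
            self.value += delta


def repeat(counter: RefCounter, delta: int, times: int) -> None:
    for _ in range(times):
        counter.change(delta)


pyobj = RefCounter(1)
t1 = threading.Thread(target=repeat, args=(pyobj, +1, 50_000))  # the role of T1
t2 = threading.Thread(target=repeat, args=(pyobj, -1, 50_000))  # the role of T2
for thread in (t1, t2):
    thread.start()
for thread in (t1, t2):
    thread.join()

print(pyobj.value)
```

Because every update happens under the lock, the increments and decrements cancel out and the counter ends at its initial value regardless of how the threads interleave.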

It was the need to manage memory safely and to avoid such problems in concurrent code that led to the creation of the GIL. Adding the GIL to this implementation was also a pragmatic solution for the following reasons:

* it is easy to integrate C libraries/extensions, including thread-unsafe ones, since the GIL takes care of consistency;

* one lock is a cheap and simple enough solution that does not complicate the maintenance of the Python language;

* multi-core computers have become widespread only relatively recently ([article](https://medium.com/pyslackers/lets-talk-about-python-s-gil-ade59022bc83)).

### Python standard library concurrency modules <a class="anchor" id="stdlib"></a>

The Python Standard Library provides modules to enjoy concurrency powers:

* `threading`


* `multiprocessing`


* `asyncio`

A nice and simple answer on which module to choose was given on [Stack Overflow](https://stackoverflow.com/questions/27435284/multiprocessing-vs-multithreading-vs-asyncio):

```python
if io_bound:
    if slow_io:  # many connections
        print("Use asyncio")
    else:  # fast I/O, a limited number of connections
        print("Use threading")
else:  # cpu_bound
    print("Use multiprocessing")
```

**Note**: The `concurrent.futures` module provides high-level abstractions over the `threading` and `multiprocessing` modules.
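
As a minimal sketch of what `concurrent.futures` looks like in practice, a thread pool can run several I/O-bound calls at once. The `io_task` function and the sleep durations are hypothetical and chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task(sleep_time: float) -> float:
    """Pretend to wait for an I/O device, then return the waiting time."""
    time.sleep(sleep_time)
    return sleep_time


start = time.perf_counter()
# The pool starts the threads, runs the calls and joins the threads on exit.
with ThreadPoolExecutor(max_workers=3) as pool:
    # `map` returns the results in the order of the inputs.
    results = list(pool.map(io_task, [0.1, 0.2, 0.3]))
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed time: {elapsed:.2f}")
```

Since the three sleeps overlap, the elapsed time should be close to the longest sleep rather than to the sum of all of them.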

In Python, it is important to distinguish between CPU-bound (*limited by the speed of the CPU*) and I/O-bound (*limited by the speed of I/O devices*) operations.

Because the Python interpreter uses the GIL, a single-process Python program can run Python code in only one native thread at a time, so it cannot load the CPU by more than 100% of one core, i.e. it gets no multi-core utilisation. Is there a way to bypass such a huge limitation? In short, yes: create multiple processes. Each of them is a separate instance of the Python interpreter holding its own GIL, and each can use the CPU core assigned to it.

![Spawning Python interpreters](img/python_multiprocessing.png "Spawning Python interpreters")

Creating processes is not cheap, but for CPU-bound tasks it is the way to go. Creating threads is less expensive than spawning processes; however, using threads in Python for CPU-bound tasks is (usually) a bad idea, while for I/O-bound operations it is (often) a good one. The point is that the GIL is [released](https://stackoverflow.com/questions/36949042/when-the-gil-is-released) during I/O operations.

All right, let's check this out in practice.

In [4]:
# a decorator factory: named_timer(name) returns the actual decorator
def named_timer(name: str = "func"):
    def timer(func: typing.Callable):
        def wrapper(*args, **kwargs):
            start = time.time()
            res = func(*args, **kwargs)
            end = round(time.time() - start, 2)
            print(f"{name} elapsed time: {end}")
            return res

        return wrapper

    return timer


# @named_timer("squares")
def cpu_bound(start: int, end: int, step: int):
    return sum([num**2 for num in range(start, end, step)])


# @named_timer("sleeping")
def io_bound(sleep_time):
    time.sleep(sleep_time)

Try uncommenting the decorators and see what happens - this notebook is a playground.

In [5]:
# CPU-bound work performed sequentially

named_timer("Squares")(cpu_bound)(1, 10000000, 1)

Squares elapsed time: 3.01


333333283333335000000

In [6]:
# threads for the CPU-bound work

t1 = threading.Thread(
    target=named_timer("T1")(cpu_bound), args=(1, 100000, 1)
)
t2 = threading.Thread(
    target=named_timer("T2")(cpu_bound),
    args=(100000, 10000000, 1),
)

start = time.time()

# launching the threads
t1.start()
t2.start()

# waiting for the threads until they are done
t1.join()
t2.join()

print(f"CPU-bound work time: {time.time() - start}")

T1 elapsed time: 0.06
T2 elapsed time: 3.23
CPU-bound work time: 3.239287853240967


If you run the cells with the sequential and the "concurrent" code several times, a strange behaviour shows up: ***threads do not give a stable gain in performance!*** Sometimes they work faster and sometimes slower than the sequential code above.

What will happen in case of I/O operations?

In [7]:
# I/O-bound work performed sequentially

start = time.time()

named_timer("Sleepy_1")(io_bound)(1)
named_timer("Sleepy_2")(io_bound)(2)

print(f"I/O-bound work time: {time.time() - start}")

Sleepy_1 elapsed time: 1.0
Sleepy_2 elapsed time: 2.0
I/O-bound work time: 3.0023934841156006


In [8]:
# threads for the I/O-bound work

t1 = threading.Thread(
    target=named_timer("T1")(io_bound), args=(1,)
)
t2 = threading.Thread(
    target=named_timer("T2")(io_bound), args=(2,)
)

start = time.time()

t1.start()
t2.start()

t1.join()
t2.join()

print(f"I/O-bound work time: {time.time() - start}")

T1 elapsed time: 1.0
T2 elapsed time: 2.0
I/O-bound work time: 2.003573179244995


Lovely! For I/O-bound work we do see an increase in performance: the total time drops from about 3 seconds to about 2, which is the duration of the longest sleep. It is an illustrative demonstration of concurrency in Python and of the impact of the GIL. So we do need to choose the tool wisely.

**Note**. Attentive readers may notice that the starting and joining phases can be combined into one loop, so why not refactor the code...

In [9]:
# refactored threading

threads = [
    threading.Thread(
        target=named_timer("T1_imp")(io_bound), args=(1,)
    ),
    threading.Thread(
        target=named_timer("T2_imp")(io_bound), args=(2,)
    ),
]

start = time.time()

for thread in threads:
    thread.start()
    thread.join()

print(f"I/O-bound improved (?) work time: {time.time() - start}")

T1_imp elapsed time: 1.0
T2_imp elapsed time: 2.0
I/O-bound improved (?) work time: 3.0038859844207764


Well, the result may be frustrating: the threads no longer execute concurrently. The reason is that `join()` blocks until its thread finishes, so if you start and join a thread in the same iteration, the thread of the next iteration cannot start until the previous one is done. That is why starting and joining threads should be done in separate loops.

In [10]:
# re-refactored threading

threads = [
    threading.Thread(
        target=named_timer("T1_imp")(io_bound), args=(1,)
    ),
    threading.Thread(
        target=named_timer("T2_imp")(io_bound), args=(2,)
    ),
]

start = time.time()

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print(f"I/O-bound improved (!) work time: {time.time() - start}")

T1_imp elapsed time: 1.0
T2_imp elapsed time: 2.0
I/O-bound improved (!) work time: 2.0030434131622314


### Hello, asyncio <a class="anchor" id="hello-asyncio"></a>

The `asyncio` module was first introduced in Python 3.4, and its name stands for "Asynchronous I/O". It uses a single-threaded event loop model for executing awaitable objects, e.g. coroutines.

The term "coroutine" may refer to two related things:

* A **coroutine function** is an `async def` function. Calling it returns a coroutine object, whereas calling a routine (an ordinary `def` function) runs its body and returns a value (by default, `None`);

* A **[coroutine object](https://docs.python.org/3/glossary.html#term-coroutine)** is what a coroutine function returns: a computation with the superpower of being entered, exited, and resumed at many different points. A routine, in contrast, has only one entry point (the function call) and one exit point (the first return statement hit), and between them its work blocks the execution flow.

See [PEP492](https://peps.python.org/pep-0492/) that introduced coroutines with `async`/`await` syntax.
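
The "entered, exited, and resumed" behaviour can be observed by driving a coroutine object by hand, without any event loop. This is only a sketch of the mechanism: the `pause` and `steps` names are hypothetical, and in real code the event loop performs the `send` calls.

```python
import types


@types.coroutine
def pause():
    """A bare-bones awaitable: its `yield` suspends the awaiting coroutine."""
    yield "paused"


async def steps(log: list[str]) -> str:
    log.append("entered")
    await pause()  # first suspension point
    log.append("resumed once")
    await pause()  # second suspension point
    log.append("resumed twice")
    return "done"


log: list[str] = []
coro = steps(log)  # nothing runs yet: this only creates the coroutine object

# Each `send(None)` runs the coroutine up to its next suspension point.
signals = [coro.send(None), coro.send(None)]

try:
    coro.send(None)  # run to the end of the coroutine
except StopIteration as stop:
    result = stop.value  # the return value travels inside StopIteration

print(signals)
print(result)
print(log)
```

Between two `send` calls the coroutine is suspended and the caller is free to do other work, which is exactly the gap an event loop uses to run other coroutines.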

The [event loop](https://www.pythontutorial.net/python-concurrency/python-event-loop/) manages coroutines (a special case of awaitable objects): it registers them, polls their status, and removes them when they are done. For now, we will not go into the details of how it works: first we will learn how to use it, and then we can crawl under the hood.

![The illustration of event loop pattern](img/python-event-loop.svg "The Event Loop")


Looking ahead, `asyncio` can be used not only for I/O-bound work but also for CPU-bound work; this will be covered later.

Let's start with a simple observation: since there are coroutines, there must also be routines. This is true, and routines are just regular Python functions.

In [11]:
def routine(sleep_time: int = 1) -> str:
    print(f"Coroutine is sleeping for {sleep_time} second[s].")
    time.sleep(sleep_time)
    return "hello, routine."


print(f"Is routine? {inspect.isroutine(routine)}")
ro_res = routine()
print(f"Routine called = {ro_res}")

Is routine? True
Coroutine is sleeping for 1 second[s].
Routine called = hello, routine.


As expected, the routine is called, runs until the return statement, and the result is passed back to the caller.

Let's try to do something similar with coroutines.

In [12]:
async def coroutine(sleep_time: int = 1) -> str:
    print(f"Coroutine is sleeping for {sleep_time} second[s].")
    await asyncio.sleep(sleep_time)
    return "Hello, coroutine."


print(f"Is coroutine? {inspect.iscoroutine(coroutine)}")
coro_called = coroutine()
print(f"Coroutine called = {coro_called}")
coro_awaited = await coro_called
print(f"Coroutine awaited = {coro_awaited}")

Is coroutine? False
Coroutine called = <coroutine object coroutine at 0x7f8d0920e340>
Coroutine is sleeping for 1 second[s].
Coroutine awaited = Hello, coroutine.


When a *coroutine function* is called, it returns a *coroutine object*; that is why `inspect.iscoroutine(coroutine)` is `False` for the function itself. To get the result of a coroutine, it must be `await`ed. If a coroutine object is created but never awaited, Python emits a `RuntimeWarning`.

In [13]:
async def f():
    await asyncio.sleep(1)


f()
await f()

RuntimeWarning: coroutine 'f' was never awaited
  f()
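
The difference between a coroutine function and a coroutine object can also be checked with the `inspect` module. The following is a minimal sketch with a hypothetical `greet` coroutine; the object is closed explicitly so that it is not left un-awaited.

```python
import asyncio
import inspect


async def greet() -> str:
    await asyncio.sleep(0)
    return "hi"


coro = greet()  # calling the coroutine function creates a coroutine object

is_function = inspect.iscoroutinefunction(greet)  # checks the `async def` function
function_is_object = inspect.iscoroutine(greet)  # the function is not an object
is_object = inspect.iscoroutine(coro)  # checks the object returned by the call

print(is_function, function_is_object, is_object)

coro.close()  # discard the coroutine object explicitly instead of awaiting it
```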


Inside routines (ordinary `def` functions), `await` expressions are not allowed.

In [14]:
def error_routine():
    """await is allowed only in async functions (coroutines)."""
    await asyncio.sleep(1)

SyntaxError: 'await' outside async function (1940803628.py, line 3)

Our "Hello, asyncio" example.

In [15]:
async def main():
    print(routine())
    print(await coroutine())

In [16]:
if __name__ == "__main__":
    asyncio.run(main())

RuntimeError: asyncio.run() cannot be called from a running event loop

Since Jupyter Notebook already has a running event loop, `asyncio.run()` fails here; consider running the `00_hello_asyncio.py` script instead.

To reproduce the example in the notebook, awaiting the `main` coroutine **object** (!) is enough.

In [17]:
await main()

Coroutine is sleeping for 1 second[s].
hello, routine.
Coroutine is sleeping for 1 second[s].
Hello, coroutine.
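
The example above still runs its two steps one after another. As a hedged sketch of where the gain comes from, several coroutines can be awaited together with `asyncio.gather`, similar to the threads in the I/O-bound example. The `nap` coroutine and its durations are hypothetical; the code is written as a standalone script, so in a notebook `asyncio.run(main())` would have to be replaced with `await main()`.

```python
import asyncio
import time


async def nap(name: str, sleep_time: float) -> str:
    await asyncio.sleep(sleep_time)  # suspends here and lets other coroutines run
    return f"{name} done"


async def main() -> list[str]:
    # Schedule both coroutines on the event loop and wait for both results.
    return await asyncio.gather(nap("C1", 0.1), nap("C2", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed time: {elapsed:.2f}")
```

Both sleeps overlap inside a single thread, so the elapsed time should be close to the longest sleep rather than to the sum of the two.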


### References <a class="anchor" id="refs"></a>

* Multitasking, concurrency, parallelism:
    * [Web MIT Concurrency](https://web.mit.edu/6.005/www/fa14/classes/17-concurrency/)
    * [Python threading, multiprocessing, asyncio](https://medium.com/@jersobh/python-optmized-parallelism-multiprocessing-e-asyncio-5b62f67e3ea3)
    * [Concurrency and Parallelism - difference](https://stackoverflow.com/questions/1050222/what-is-the-difference-between-concurrency-and-parallelism)
    * [Concurrency vs Parallelism](https://www.baeldung.com/cs/concurrency-vs-parallelism)
    * [Concurrency is not Parallelism](https://www.youtube.com/watch?v=oV9rvDllKEg) by Rob Pike 

* Python concurrency modules:
    * [Superfast Python - threading](https://superfastpython.com/threading-in-python/)
    * [Superfast Python - multiprocessing](https://superfastpython.com/multiprocessing-in-python/)
    * [Superfast Python - asyncio](https://superfastpython.com/python-asyncio/)

* The Global Interpreter Lock (GIL):
    * [Real Python - GIL](https://realpython.com/python-gil/)
    * [Python Wiki - GIL](https://wiki.python.org/moin/GlobalInterpreterLock)

* David Beazley Talks:
    * [Understanding the Python GIL](https://www.youtube.com/watch?v=Obt-vMVdM8s)
    * [Embracing the GIL](https://www.youtube.com/watch?v=fwzPF2JLoeU)
    * [Inside the Python GIL](https://www.youtube.com/watch?v=ph374fJqFPE)
    
* Asyncio related:
    * ["Python concurrency with asyncio", Matthew Fowler](https://www.amazon.com/Python-Concurrency-asyncio-Matthew-Fowler/dp/1617298662)
    * [PEP-492](https://peps.python.org/pep-0492/) - coroutines with async/await syntax