# `asyncio`

1. What is (and isn't) `asyncio`?
2. How it works, using non-`asyncio` stuff
3. Coroutines and tasks and `async def`
4. Running coroutines and `await`
5. Task groups
6. Getting results (and exceptions)
7. Task pools
8. Retrieving URLs
9. Chat server

# The background

Two related terms in programming:

- Concurrency -- a more general term meaning that we can benefit from pieces of our program executing semi-independently, even if they're not truly independent of one another
- Parallelism -- running multiple parts of our program in parallel

For example:
- If I want to read from 10 different files, then I might want to use parallelism -- each core on my system can read a different file
- If I want to download data from 10 different URLs, this also might be possible with parallelism
- If I have only one core, but I'm reading from 10 different files, I can still use concurrency -- because I can ask for data from one file, and while that data is coming to me, I can then turn to another file and ask for its data. This works because I know that it'll take a while for the data to come from any one of those files.

Note that all of these examples are for problems that are I/O-bound, meaning that the bottleneck is basically that we're reading from disks/networks/etc.

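What if I wanted to perform a very big, difficult calculation like MD5 or SHA1, or bitcoin mining? Can I speed any of those up by switching among pieces on a single core, the way I did with files? No. Those are CPU-bound problems: the CPU is never waiting for anything, so there's no idle time to fill. They only benefit from true parallelism, and only when the work can actually be split across cores.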

Over the years, we've had two main ways to get concurrency/parallelism in Python:
- Multiprocessing -- allows us to start new processes, split our work across them, and even join the results together. The good news is that each process is indeed separate and runs in parallel. There are two problems -- first of all, each process has a lot of overhead. The other problem is that they are indeed totally separate processes, so we have to get the data to and from each of them.
- Threading -- we can, in Python, start lots of new threads (many more than we can start processes). Threads have far lower overhead than processes, because we're inside of a single process. Because we're in a single process, we can also share data. But in the standard build of Python, because of the GIL (global interpreter lock), only one thread can execute Python code at a time, no matter how many cores we have. We have concurrency, but we don't have parallelism.
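
To make the threading trade-off concrete, here is a minimal sketch, assuming a hypothetical `fake_download` function that stands in for a slow network or disk read by sleeping. A thread that is sleeping (or waiting on I/O) releases the GIL, so the waits overlap even though only one thread runs Python code at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(n):
    # Hypothetical stand-in for I/O: sleeping releases the GIL, like a blocking read
    time.sleep(0.2)
    return n * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() returns the results in the same order as the inputs
    results = list(pool.map(fake_download, range(5)))
elapsed = time.perf_counter() - start

print(results)
print(f'{elapsed:.2f} seconds')   # the five waits overlap rather than add up
```

If `fake_download` did a long calculation instead of sleeping, the threads would take turns holding the GIL and the total time would not shrink in the same way.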

For many years, despite the grumbling, we managed in the Python world to use both of these.

As systems scaled up, this wasn't good enough. We wanted to be able to service a lot of requests from a server. We wanted to be able to work with lots of URLs on the Web. Neither processes nor threads were conducive to running thousands or tens of thousands of tasks at the same time.

JavaScript runs on servers via a system known as Node.js. It's *very* fast. It only has *one* process, and *one* thread. It does this using the "reactor pattern":
- You have a list of functions
- You iterate over that list, giving each function a chance to run
- Every so often, the function says, "I'm done for now," and gives up control of the CPU (and then the next function runs)
- When a function completes, it removes itself from this list

It turns out that this is *VERY* fast and efficient. 

That was the beginning of `asyncio`, which runs in a similar way.

`asyncio` is cementing itself as a new form of concurrency in Python. It's called `asyncio` because it assumes that most of what it does will be with I/O, working with the network and/or filesystem, where you make a request and you know it'll take some time to get a result. During that time, we can let other functions run.

# Generator functions



In [1]:
# the dumbest function in the world

def myfunc():
    return 1
    return 2
    return 3

myfunc()

1

In [2]:
import dis   # Python disassembler

dis.dis(myfunc)

  3           0 RESUME                   0

  4           2 RETURN_CONST             1 (1)


In [3]:
# let's change one word in our function, from return to yield:

def myfunc():
    yield 1
    yield 2
    yield 3

In [5]:
myfunc()    # calling the function doesn't execute it! Rather, it returns an object that expects to be put in a for loop

<generator object myfunc at 0x10d2585c0>

In [8]:
g = myfunc()   # this is a generator object, which implements the iterator protocol

In [9]:
next(g)  # this runs the function through the next yield, and then the function goes to sleep

1

In [10]:
g.gi_frame.f_lineno

4

In [11]:
next(g) 

2

In [12]:
g.gi_frame.f_lineno

5
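
Continuing the same idea, this short sketch (using a fresh generator, so that it can run on its own) shows what happens at the end. Once the last `yield` has been passed, the next call to `next` raises `StopIteration`, which is how a `for` loop knows when to stop.

```python
def myfunc():
    yield 1
    yield 2
    yield 3


g = myfunc()
values = [next(g), next(g), next(g)]   # one value per yield
print(values)

try:
    next(g)   # nothing left to yield
except StopIteration:
    exhausted = True
    print('The generator is exhausted')
```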

In [13]:
# we can create, using this, our own baby version of asyncio

# we'll create a list 
# we'll put those generators on the list
# we'll iterate through each generator, giving it a chance to run
# when a generator is done (it raises StopIteration), we'll remove it from our list

def mygen(id_number, maxnum):   # we'll take 2 arguments, the ID number and the number we'll go up to 
    for i in range(maxnum):
        yield f'{id_number}: {i}'

g1 = mygen(1, 5)   # id 1, up to 5
g2 = mygen(2, 7)   # id 2, up to 7
g3 = mygen(3, 3)   # id 3, up to 3

generators = [g1, g2, g3]

while generators:   # so long as this list is non-empty
    for one_g in list(generators):   # iterate over a copy, because we remove items from the original list
        try:
            print(next(one_g))   # ask the current generator for its next value, and print it
        except StopIteration:
            generators.remove(one_g)  # if we got to the end of a generator, remove it from our list

print('Done!')
    

1: 0
2: 0
3: 0
1: 1
2: 1
3: 1
1: 2
2: 2
3: 2
1: 3
2: 3
1: 4
2: 4
2: 5
2: 6
Done!


# Mapping this to `asyncio`

- In `asyncio`, we don't define regular functions. Rather, we define `async def` functions. Just as calling a generator function gives us a generator object, calling an `async def` function gives us a *coroutine*. You don't run a coroutine directly. Rather, you put it on the event loop, where it gets multiple chances to run... until it ends.
- The event loop is a combination of a list and a `while` loop. So long as there are tasks on the loop, `asyncio` will run them, one at a time, giving each a chance to run for a bit.
- Instead of `yield` in generators, in `asyncio` we use the `await` keyword. This means two things at once: First, that we're waiting for a value from something that might take a while. Second: While we're waiting, we're willing to go to sleep, a la `yield`.

It used to be, in older `asyncio` usage, that you would explicitly create or ask for the event loop, and you would put things directly on it. That's now considered low-level `asyncio` usage, and most code no longer needs it.

If a coroutine doesn't have any `await` statements, then it runs all at once, never pausing, and never giving anyone else a chance to run. 
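
A small sketch of that difference, assuming two hypothetical coroutine functions, `greedy` and `polite`, whose names and shared `log` list are invented for illustration. It uses `asyncio.gather` to schedule two coroutines together and wait for both. It is written as a standalone script; inside a notebook, which already has a running event loop, you would write `await main()` rather than calling `asyncio.run`.

```python
import asyncio

log = []


async def greedy(name):
    # No await anywhere: once started, this runs to the end without pausing
    for i in range(3):
        log.append(f'{name}: {i}')


async def polite(name):
    for i in range(3):
        log.append(f'{name}: {i}')
        await asyncio.sleep(0)   # pause here, giving other tasks a chance to run


async def main():
    await asyncio.gather(greedy('g1'), greedy('g2'))   # g1 finishes before g2 starts
    await asyncio.gather(polite('p1'), polite('p2'))   # p1 and p2 take turns


asyncio.run(main())
print(log)
```

The `greedy` entries come out in two solid blocks, while the `polite` entries alternate, because only `polite` ever hands control back to the event loop.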

We've basically moved from the world of preemptive multitasking (with threads and multiprocessing) back 40 years to cooperative multitasking.

# How we'll write code in `asyncio`

1. We structure our code as generator-like functions, defined with `async def`. Calling one of them gives us a "coroutine."
2. We'll add our coroutines to the event loop.
3. 

In [14]:
async def hello():
    print('Hello!')

In [15]:
type(hello)

function

In [16]:
type(myfunc)

function

In [17]:
dis.show_code(myfunc)

Name:              myfunc
Filename:          /var/folders/rr/0mnyyv811fs5vyp22gf4fxk00000gn/T/ipykernel_76060/1889638504.py
Argument count:    0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  0
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, GENERATOR
Constants:
   0: None
   1: 1
   2: 2
   3: 3


In [18]:
dis.show_code(hello)

Name:              hello
Filename:          /var/folders/rr/0mnyyv811fs5vyp22gf4fxk00000gn/T/ipykernel_76060/1347416487.py
Argument count:    0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  0
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, COROUTINE
Constants:
   0: None
   1: 'Hello!'
Names:
   0: print


In [19]:
hello()

<coroutine object hello at 0x10dd52740>

We don't run coroutines directly!

- We add them to the event loop
- `asyncio`, when it gets to our coroutine, will give it a chance to run through the next `await`
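
As a minimal sketch of what that looks like in practice, assuming a standalone script rather than a notebook: `asyncio.run` starts an event loop, runs one coroutine on it, and closes the loop when that coroutine finishes. Inside a coroutine, `await` runs another coroutine and waits for it, while `asyncio.create_task` schedules one on the loop without waiting right away. The `main` function and the return value of `hello` are additions for illustration. (In a notebook, where a loop is already running, `await main()` takes the place of `asyncio.run(main())`.)

```python
import asyncio


async def hello():
    print('Hello!')
    return 'done'   # return value added here for illustration


async def main():
    result = await hello()                 # run hello() and wait for it to finish
    task = asyncio.create_task(hello())    # schedule another run on the event loop
    task_result = await task               # wait for that task to complete
    return result, task_result


results = asyncio.run(main())
print(results)
```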


# Exercise: `async` tasks

1. Define three coroutine functions:
    - `up` takes a single integer, `maximum`, and returns a string saying `up` and the current number. With each iteration, it'll go from 0 up to `maximum`.
    - `down` is the same thing, but starts with `maximum`, and goes down to 0.
    - `powers` takes a number, `n`, and returns (with each iteration) `n` to a new power up to 7 (i.e., `n ** 7`).
2. Write a `main` routine that schedules them all on the event loop as tasks
3. `await` all of them, and we should see their results intertwined.

# Exercise: Task groups

1. Write an `async def` that takes a filename as an argument, and returns a tuple whose first element is the filename, and whose second element is a dictionary whose keys are the characters in the file, and whose values are integers describing how many times each character appears. This `async def` should iterate over the lines of the file, pausing (with `asyncio.sleep`) between each line.
2. From `main`, iterate over the names of several files and run our function on each of those files, each in a separate task.
3. Iterate over each returned value -- filename and dict, and print them out.