# Parallelism and concurrency in Python

# Table of Contents
  - [Parallelism and concurrency in Python](#Parallelism-and-concurrency-in-Python)
    - [References](#References)
    - [Introduction](#Introduction)
    - [Parallelism vs. concurrency](#Parallelism-vs.-concurrency)
      - [Parallelism](#Parallelism)
      - [Concurrency](#Concurrency)
      - [Quiz: parallel or not](#Quiz:-parallel-or-not)
    - [Parallelism in python: pre-emptive multitasking](#Parallelism-in-python:-pre-emptive-multitasking)
      - [High-level interface: Process pools](#High-level-interface:-Process-pools)
      - [Low-level interface: Process, run, join and deadlocks](#Low-level-interface:-Process,-run,-join-and-deadlocks)
      - [Higher level interface: concurrent.futures](#Higher-level-interface:-concurrent.futures)
      - [Inter-process communication and data dependencies](#Inter-process-communication-and-data-dependencies)
      - [Synchronization using `lock`](#Synchronization-using-lock)
      - [Threads, GIL and the illusion of concurrency](#Threads,-GIL-and-the-illusion-of-concurrency)
        - [Threads vs processes](#Threads-vs-processes)
        - [When to use threads](#When-to-use-threads)
    - [Asynchronous programming and coroutines: cooperative multitasking](#Asynchronous-programming-and-coroutines:-cooperative-multitasking)
    - [Exercises](#Exercises)
      - [Exercise 1: Counting words in a file🌶️🌶️](#Exercise-1:-Counting-words-in-a-file🌶️🌶️)
      - [Exercise 2: Find super secret server key🌶️🌶️🌶️](#Exercise-2:-Find-super-secret-server-key🌶️🌶️🌶️)



## References
- [How async/await is implemented in Python](https://tenthousandmeters.com/blog/python-behind-the-scenes-12-how-asyncawait-works-in-python/)
- [Coroutines from the Python standard library](https://docs.python.org/3/library/asyncio-task.html#coroutine)
- [What color is your function?](https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/)

## Introduction
There are many cases where we could execute multiple tasks in parallel or switch between tasks while we wait for some time-consuming task to be completed.
You can just think of examples from daily life to see why: 

- While your soup cooks on the stove, you start washing your dishes
- You check the news while drinking your morning coffee
- You work on your python course exercises while attending a video call

Indeed, the language of computing is so ingrained in many of us that today we refer to this sort of behavior as *multitasking*, an expression borrowed from computer science.

Because in computer science we like to be precise, let us define these terms better.

## Parallelism vs. concurrency

### Parallelism
When we have two or more tasks *running and progressing simultaneously*, we can talk about **parallelism**. 
Think, for example, of paying at a supermarket with multiple checkout lines: more than one customer can pay for their purchases at the same time.

### Concurrency
When two or more tasks run in overlapping time periods (but **not necessarily simultaneously**) instead of sequentially, we say that their execution is **concurrent**.
This is the typical human multitasking, where we work on multiple tasks in a time period, but we must switch between them to be able to perform them correctly.
For example, we sit in a meeting, listen passively while working on our python program and stop working on our code to answer a question directed to us.



The image below can help you understand the difference between concurrent and parallel work.
<figure>
  <img
  src="../../images/concurrency_vs_parallelism.jpg"
  height="400px"
  alt="A schema of parallelism and concurrency">
  <figcaption>A simple time diagram illustrating the difference between parallelism and concurrency (<a href="https://openclassrooms.com/en/courses/5684021-scale-up-your-code-with-java-concurrency/5684028-identify-the-advantages-of-concurrency-and-parallelism">Source</a>)</figcaption>
</figure>




### Quiz: parallel or not

<div class="alert alert-block alert-danger">
    <h4><b>Heads-up</b></h4>
    Please, don't forget to evaluate the cell below ⬇️
</div>


For each of the examples below, decide whether the situation represents **parallel** actions or not.

In [None]:
from tutorial.threads import Threads

Threads()

## Parallelism in python: pre-emptive multitasking

By default, in Python tasks do not run in parallel.
Consider this example:

In [None]:
from datetime import datetime as dt
from time import sleep

def task(name: str):
    """
    This function defines a fictional task that takes one second
    to complete and prints when it started and finished.
    """
    print(f"{name} started at {dt.now()}")
    sleep(1)
    print(f"{name} finished at {dt.now()}")



def two_tasks():
    task("First task")
    task("Second task")


two_tasks()


<div class="alert alert-block alert-warning">
    <h4><b>Warning</b></h4> In a standard Python distribution we would use the <a href="https://docs.python.org/3/library/multiprocessing.html">multiprocessing</a> module, which is part of the standard library. 
    To be able to run the examples in IPython notebooks, we use the <b>multiprocess</b> module, which is a drop-in replacement for <b>multiprocessing</b>.
</div>

We see that the first task to be started (`First task`) finished before the second one could start. 
This is the sequential computational model we are used to when we first learn programming. 
However, in Python we can introduce **parallelism** by using the [multiprocess](https://github.com/uqfoundation/multiprocess) module. 

Using this module, we can execute code in different operating system **processes**. 
A process is a representation of a task, with all the code, memory and resources (files, network connections, etc.) needed to run it. 
In most cases, processes are managed by the operating system (OS).
The OS schedules what processes to run when. 
Moreover, it ensures  that no process runs forever and that they regularly yield computing resources to other processes.
This approach is called **pre-emptive multitasking** and is the standard way of running multiple processes in modern desktop and server operating systems. 

When working on a multi-core or multi-CPU system, it is possible to leverage multitasking to run your computations in parallel. 
We will see how in the following sections.

### High-level interface: Process pools

Let's rewrite our example from before using `multiprocess.Pool`, which executes jobs on a pool of shared processes.  


In [None]:
from datetime import datetime as dt
from time import sleep
from multiprocess import Pool

def task(name: str):
    """
    This function defines a fictional task that takes one second
    to complete and prints when it started and finished.
    """
    print(f"{name} started at {dt.now()}")
    sleep(1)
    print(f"{name} finished at {dt.now()}")



def two_tasks():
    with Pool(3) as p:
        p.map(task, ["First task", "Second task"])


for i in range(10):
    two_tasks()

We use `Pool` as a [context manager](https://book.pythontips.com/en/latest/context_managers.html) and use the `map` method of the pool object to call the function `task` once for each argument in the list. 
Internally, the pool distributes these calls among its worker processes (three in this case, because we created it with `Pool(3)`).

As you can see from the console output, the two tasks not only run **concurrently** (in overlapping time periods) but also in **parallel** (simultaneously). 
This output highlights quite well one problem with concurrent computations: the order of completion is **non-deterministic**. 
We cannot know a priori which process will be started first and which process will complete first. 
If the order of the results is important, you need to make sure to send and return some sort of identifier with each job, so that you can reconstruct the right order.
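
The identifier approach described above can be sketched as follows. This is only an illustration under a few assumptions: `slow_square` and the job identifiers are hypothetical, and to keep the sketch self-contained it uses `multiprocessing.pool.ThreadPool` from the standard library, which offers the same interface as `Pool` (including `imap_unordered`) but runs the jobs in threads instead of processes.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(job: tuple[int, int]) -> tuple[int, int]:
    """Receive (job_id, value) and return (job_id, value squared)."""
    job_id, value = job
    # Later jobs sleep less, so they tend to finish earlier
    time.sleep(0.05 * (3 - job_id))
    return job_id, value * value


values = [10, 20, 30]
jobs = list(enumerate(values))  # [(0, 10), (1, 20), (2, 30)]

with ThreadPool(3) as pool:
    # imap_unordered yields results as soon as they are ready,
    # so the arrival order is not guaranteed
    arrived = list(pool.imap_unordered(slow_square, jobs))

# Sort by the identifier to recover the submission order
ordered = [result for _, result in sorted(arrived)]
print(ordered)
```

Because every result carries the identifier of the job that produced it, sorting by that identifier restores the original order regardless of which job finished first.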

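However, if we use `map`, this bookkeeping is done for us: the list it returns contains the results in the same order as the inputs, even if the individual tasks complete in a different order: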

In [None]:
from datetime import datetime as dt
from time import sleep
from multiprocess import Pool

def increment(number: int) -> int:
    """
    This function increments the number by 1.
    """
    name = "Process " + str(number)
    print(f"{name} started at {dt.now()}")
    result = number + 1
    print(f"{name} finished at {dt.now()}")
    return result



def two_tasks():
    with Pool(3) as p:
        res = p.map(increment, range(10))
    print(res)



two_tasks()

### Low-level interface: Process, run, join and deadlocks

In some situations, we want more control over the execution of multiple processes. 
In that case, you can directly create processes using the [`Process`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process) object. 
This offers several methods, primarily:
- `run()`: by default, it runs the callable object with the argument passed at the process creation time. 
    <div class="alert alert-block alert-warning">
        <h4><b>Warning</b></h4> The <code>run</code> method is <b>blocking</b> and will just execute the function in the current python process, blocking it until the execution finishes.
    </div>
- `start()`: it will start the computation defined by `run()` in a separate process.
- `join()`: this method blocks the calling python process until the task defined by the owning `Process` finishes. 
    A process cannot join itself because this would cause a **deadlock**. This is a situation where there is a cyclic dependency between waiting tasks, so that none of them can ever proceed. 
    In real life, imagine the situation of two friends waiting for each other to call before going out. `join` is useful when we want to make sure a given process finishes its job before continuing.

Let's see an example of how to use this:

In [None]:
from multiprocess import Process
from time import sleep
from datetime import datetime as dt

def log(message: str):
    """
    This function just prints a message.
    """
    print(f"{dt.now()}: {message}")

def friend(n: int, sleep_time: int = 1):
    """
    This function prints a friendly message.
    """
    log(f"Hello from process {n}")
    sleep(sleep_time)
    log(f"After sleeping, process {n} is done")

def waiting_friend(n: int, wait_for_friend: bool = True, sleep_time: int = 1):
    """
    This function waits for n friend processes to finish.
    """
    f = [Process(target=friend, args=(i, sleep_time)) for i in range(n)]
    for p in f:
        p.start()
        if wait_for_friend:
            p.join()
    log("Finished")



#Start without waiting for friend (set the second argument to True to join each process)
waiting_friend(3, False, 10)


You can see the effect of `join` in the output of `waiting_friend`. 
If we do not `join` on the child processes, they can finish in any order, and we have no guarantee that they have finished by the time `waiting_friend` returns. 
Indeed, you can see that `waiting_friend` prints "Finished" before any of the child processes wakes up from sleep.  

On the other hand, if we join on them in the for loop by setting the second argument of `waiting_friend` to `True`, the function `waiting_friend` will wait for each friend (0, 1, 2) to finish before starting the next process.

### Higher level interface: concurrent.futures

Because `join` **blocks** the main process until the child process finishes, this is rarely the solution we need for scientific computing, where we want to split a large unit of work into smaller blocks and have multiple processes handle the blocks independently.
For this reason, in most cases we advise starting your parallel processing adventure using higher-level solutions that take care of this low-level synchronization for you. 
A good starting point is the [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html) module of the python standard library. 
This module wraps the functionality of the `multiprocessing` (and `threading`) modules and offers more convenient ways to launch parallel computations.

The way it works is roughly as follows:

- It creates a *pool* of `n` worker processes
- It sends the functions (and the data) to execute to the processes in batches of size `n`
- It waits for (joins) the processes until they finish.
- It repeats the steps above until it exhausts the data to process

Let's see how this works in practice with an artificial example: we call a function `work` that receives a number `i`, waits 0.1 seconds and then returns the number `i`. 
We want to call this function `n` times and compute the sum of the results.


To see the definition of `parallel_work` and `sequential_work` functions, please have a look [here](tutorial/threads.py).
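
For readers who do not have the file at hand, here is a minimal sketch of what such functions could look like. This is an assumption based on the description above, not the actual content of `tutorial/threads.py`, which may differ. The sketch accepts any `concurrent.futures.Executor` and uses a thread pool so that it stays self-contained.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from time import sleep


def work(i: int) -> int:
    """Wait 0.1 seconds, then return the input unchanged."""
    sleep(0.1)
    return i


def sequential_work(n: int) -> int:
    """Call work() n times, one call after the other, and sum the results."""
    return sum(work(i) for i in range(n))


def parallel_work(executor: Executor, n: int) -> int:
    """Distribute n calls of work() over the executor's workers and sum the results."""
    return sum(executor.map(work, range(n)))


# Any Executor works here; a thread pool keeps the sketch self-contained
with ThreadPoolExecutor(max_workers=4) as executor:
    print(parallel_work(executor, 10), sequential_work(10))
```

Both functions compute the same sum; they differ only in whether the calls to `work` overlap in time.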


In [None]:
from concurrent.futures import ProcessPoolExecutor
from time import sleep
from multiprocess import cpu_count
from functools import partial
from tutorial.threads import parallel_work, sequential_work

#Here we run the parallel and sequential functions
n = 10
with ProcessPoolExecutor(max_workers=cpu_count()) as executor:
    res = parallel_work(executor, n)
    res1 = sequential_work(n)
print(f"The parallel sum is {res}, the regular is {res1}")

Success! We can see that the sum computed in parallel using multiple processes equals the sum computed *locally* in one python process. Now it is interesting to try and see how much speedup we gain from this trick. To do so, we use the `timeit` module part of the python standard library:

<div class="alert alert-block alert-warning">
    <h4><b>Warning</b></h4> The cell below might take a very long time to run. Do not run it unless you are fine with waiting a couple of minutes.
</div>

In [None]:
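import timeit
from typing import Any, Callable, Iterable
from multiprocess import cpu_count
from concurrent.futures import ProcessPoolExecutor

def time_for(fun: Callable[[Any], Any], args: Iterable[Any]) -> list[dict[Any, float]]:
    #For each argument, run `fun` three times and keep the fastest run
    return [{arg: min(timeit.repeat(lambda: fun(arg), number=1, repeat=3))} for arg in args]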


sizes = range(1, 20, 1)

with ProcessPoolExecutor(max_workers=cpu_count()) as executor:
    res = time_for(lambda i : parallel_work(executor, i, ), sizes)
    res1 = time_for(sequential_work, sizes)

In [None]:
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

def plot_times(ax: Axes, times: list[dict[int, float]], title: str):
    #Each entry of `times` maps one input size to its measured runtime
    return ax.scatter(sizes, [list(t.values())[0] for t in times], label=title)

f, ax = plt.subplots(1, 1)
plot_times(ax, res, "Parallel")
plot_times(ax, res1, "Sequential")
h1 = ax.set_xlabel("Size")
h2 = ax.set_ylabel("Time (s)")
f.legend()
plt.show()



<figure>
  <img
  src="../../images/process_performance.png"
  height="400px"
  alt="Graph of performance.">
  <figcaption>Comparison of the runtime (in seconds) of the <i>parallel_work</i> and <i>sequential_work</i> functions as a function of the input size. Here we show results up to n=100, but the code above only runs up to n=19 to save time.</figcaption>
</figure>


As you can see from the graph, there is a clear difference in execution time between `sequential_work` and `parallel_work` as the number `n` increases. 
This is due to the fact that if we execute the function `work` sequentially for `n` inputs, we need to wait **at least** `n` times the duration of execution of `work`. 
On the other hand, if we process in parallel, we can cut down the time by at most a factor of `n_proc`, where `n_proc` is the number of CPUs our system offers. 
In reality, the speed up will be a bit smaller because of the overhead of starting different processes and the synchronization effort.
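
As a rough, idealized illustration of this argument, the sketch below estimates both runtimes while ignoring all overhead. The helper `ideal_times` is hypothetical, the value of 0.1 s is the duration of `work` from above, and the number of CPUs (8) is just an example.

```python
import math


def ideal_times(n: int, n_proc: int, t_work: float = 0.1) -> tuple[float, float]:
    """Return the idealized (sequential, parallel) runtimes in seconds."""
    sequential = n * t_work
    # With n_proc workers, the n calls are processed in ceil(n / n_proc) rounds
    parallel = math.ceil(n / n_proc) * t_work
    return sequential, parallel


seq, par = ideal_times(n=100, n_proc=8)
print(f"sequential: {seq:.1f} s, parallel: {par:.1f} s, speedup: {seq / par:.1f}x")
```

Note that the estimated speedup stays below `n_proc` whenever `n` is not a multiple of `n_proc`, because the last round leaves some workers idle.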


<div class="alert alert-block alert-warning">
    <h4><b>Warning</b></h4> Multiprocessing/parallelism should be the last resort to improve the performance of your Python code. 
    You should first try improving your code by using the appropriate algorithms or by adopting numerical libraries like NumPy, which offer vectorized operations. 
    If this fails, you could try just-in-time compilers like <a href="https://numba.pydata.org">Numba</a>, which only requires you to decorate existing functions. Only if all these steps yield no improvement, should you attempt to use multiprocessing explicitly. 
    If not used carefully, it can even harm the performance of your code.
</div>


### Inter-process communication and data dependencies

The pattern of using `ProcessPoolExecutor` works well for a variety of tasks where we can split the problem into **independent** units of work that can be performed at the same time; sometimes these problems are called *embarrassingly parallel*.  
However, many problems in scientific computing contain **data dependencies** where computations depend on the result of other computations. In this case, we cannot simply split our input data across processes as we did before. In these situations, we have a few possible solutions:

1. Split the problem across a dimension where there are no dependencies. 
1. Add communication between processes, so that individual processes can access the result computed by other processes.

In the rest of this section, we will address the second solution, as python offers methods to communicate between processes. The first solution cannot be easily addressed in this course because it requires domain knowledge and we cannot provide a simple recipe that will work in all cases.

We now consider an example of a problem that can be parallelized but where there are dependencies between the processes: the **producer-consumer** problem. 
Consider the following situation: we have two **producers**, each of which produces a sequence of numbers. 
Additionally, we have a **consumer**; this is a function that wants to get the sequence of numbers of each of the producers and compute the sum of **all the numbers** produced by all producers.
Because a sum can be computed in any order, we can have both producers work at the same time, get all the numbers and sum them in any order. 
To exchange data between the producers and the consumer we use a `Queue`. 
This is a list-like data structure where we put elements in from one side and extract them from the other side; we call this *first in, first out* (FIFO). 
The `multiprocess` module offers an implementation of a queue that works across multiple processes; it is called `Queue`.
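
The FIFO behaviour can be seen in isolation with a small sketch. It uses the single-process `queue.Queue` from the standard library, which offers the same `put`/`get` interface, rather than the multi-process queue.

```python
from queue import Queue

q = Queue()
for item in ["first", "second", "third"]:
    q.put(item)

# Items leave the queue in the same order in which they entered it
received = [q.get() for _ in range(3)]
print(received)
```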

Let's see how we can use a `Queue` with multiprocessing to solve the producer-consumer problem: 

In [None]:
from multiprocess import Process, Queue, Event
from dataclasses import dataclass
import time

@dataclass
class QueueItem:
    """
    This class represents an item in the queue.
    It contains the item and the producer.
    """
    item: int
    producer: str

@dataclass
class Stop:
    """
    This class represents the stop signal.
    """
    producer: str

# This is a type alias for the queue elements: either we put an item or a stop signal
QueueElement = QueueItem | Stop

def generate_items(start: int, max: int):
    """
    This function generates the items to be consumed.
    """
    for i in range(start, max):
        yield i

def producer(start: int, max: int, name: str, queue: Queue):
    """
    This function produces the items to be consumed and puts them in the queue.
    """
    for i in generate_items(start, max):
        print(f"Putting item in queue from {name}")
        queue.put(QueueItem(i, name))
        #We sleep a bit to simulate a long task, so that
        #other producers can put items in the queue
        time.sleep(0.1)
    queue.put(Stop(name))

def consumer(producers: list[str], queue: Queue):
    """
    This function consumes the items in the queue. 
    We pass it a list of producers so that it knows when to stop.
    """
    total = 0
    stopped = set()
    while True:
        if stopped == set(producers):
            print("All producers have stopped")
            print(f"The final sum is {total}")
            #If the producers have all stopped, we can stop the consumer
            break
        element = queue.get()
        match element:
            case Stop(producer):
                print(f"We got the stop signal from {producer}")
                stopped.add(producer)
            case QueueItem(item, producer):
                total += item
                print(f"We got {item} from {producer}")
                print(f"The current sum is {total}")



def producer_consumer():
    """
    This function starts the producers and the consumer.
    """
    queue = Queue(maxsize=1)
    producer1 = Process(target=producer, args=(1, 10, "First producer", queue))
    producer2 = Process(target=producer, args=(10, 20, "Second producer", queue,))
    #We pass the list of producers to the consumer so it knows when to stop
    consumer1 = Process(target=consumer, args=(["First producer", "Second producer"] ,queue,))
    producer1.start()
    producer2.start()
    consumer1.start()

    producer1.join()
    producer2.join()
    consumer1.join()
    

producer_consumer()

Let's now understand how this code works:

To begin, we define two different `dataclasses` to represent the objects we will put in the queue. 
The class `QueueItem` will represent a number produced by one producer, while `Stop` represents the case where a producer is finished with its work.
The function `producer` takes the initial number of the sequence and the maximum, as well as the `Queue` into which it should put its data.
The function iterates over the `generate_items` generator and puts each number it produces on the queue.
When no more items are available, it puts a `Stop` signal instead.
The message sent with `Stop` also contains the name of the producer that sent it; this is used later to know when to stop.
Now we can analyze `consumer`; this function takes a queue as well as the names of the producers. 
The function runs an infinite loop and keeps receiving messages from the queue; when the message matches with a `QueueItem`, it extracts the number, prints a message and increments the sum.
When the function receives  a `Stop`, it extracts the name of the sender of the stop message and adds it to the set of senders. 
When it receives the stop signal from all the senders, it exits the loop and prints the final sum to the screen.

If we now look at the function `producer_consumer`, we see that we initialize a queue as well as two `Process` for the producers, to which we pass the initial value of the sequence and the queue we just created.
We also create a new `consumer` and we pass the same queue to it, for it to be able to communicate with the producers.
Finally, we start all the processes and call their `join` method in order to wait for them to finish.

Notice that the producers alternate in sending messages; this is thanks to the line `queue = Queue(maxsize=1)`. 
This forces the queue to have a maximum size of one element; that means that we have the following sequence:
1. `Producer1` puts an element on the queue
2. `Producer1` moves to the next iteration 
3. Because the queue is already filled, `put` blocks
4. `consumer` gets the first element from `Producer1` from the queue
5. Now one of the two producers can again put an element on the queue. We can't guarantee which producer, but there's a good chance any of them at random gets to put an element.
6. The cycle repeats.
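
The effect of `maxsize=1` in the sequence above can be made visible with a small sketch. It again uses the standard library's single-process `queue.Queue`, and it calls `put_nowait`, which raises `queue.Full` instead of blocking, so that the sketch cannot hang.

```python
from queue import Full, Queue

q = Queue(maxsize=1)
q.put("from producer 1")  # succeeds: the queue was empty

try:
    # put_nowait raises instead of blocking, which makes the full state visible
    q.put_nowait("from producer 2")
    second_put_succeeded = True
except Full:
    second_put_succeeded = False

print(second_put_succeeded)

q.get()  # the consumer removes the element...
q.put_nowait("from producer 2")  # ...so there is room again
```

With a regular `put`, the second producer would simply wait at that line until the consumer has called `get`.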

### Synchronization using `lock`
Notice that because of the operating system scheduling of processes, there's a certain *non-determinism* regarding the order in which tasks run. 
This means that sometimes the `put-consume` loop will briefly favor one `Producer` over another. 
If we want to make sure that only one task at a time executes a given block of code, we can synchronize the tasks using a `Lock`. 
This is an object that offers two methods, `acquire()` and `release()`. 
When we `acquire` a lock, any other function that accesses the **same** lock and tries to call `acquire()` will be blocked until the function holding the lock calls `release()`. 
Note that a lock guarantees mutual exclusion, not strict alternation: which of the waiting tasks obtains the lock next is still decided by the scheduler.

Let's see an example of how to use this:



In [None]:
from multiprocess import Process, Queue, Event, Lock
from multiprocess.synchronize import Lock as LockBase
import time
def process(name: str, lock: LockBase):
    """
    This function prints a message and then waits for the lock.
    """
    print(f"{name} started")
    for i in range(4):
        #This is a context manager, so it will release the lock when the block ends
        with lock:
            #Here `lock.acquire()` is called implicitly
            print(f"{name} got the lock")
            time.sleep(1)
            #Here `lock.release()` is called implicitly
        print(f"{name} released the lock")

def lock_example(): 
    print("Starting")
    l = Lock()
    p1 = Process(target=process, args=("First process", l))
    p2 = Process(target=process, args=("Second process", l))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

lock_example()


As you see in the console output, the two processes never hold the lock at the same time: the second process can only work when the first one releases the lock, and the other way around.
It is important to think about your problem well: do you really need locking in order to obtain your result? 
In the case before, it was not necessary: the sum does not depend on the order in which the producer worked.
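
The `with lock:` block used in the example above is shorthand for an explicit `acquire()`/`release()` pair. The sketch below spells this out; it uses `threading.Lock` from the standard library, which offers the same two methods, so that it runs in a single process.

```python
from threading import Lock

lock = Lock()

# Explicit form of `with lock:`
lock.acquire()
try:
    # While we hold the lock, a non-blocking attempt to acquire it again fails
    could_acquire_again = lock.acquire(blocking=False)
finally:
    # Always release, even if the protected code raises an exception
    lock.release()

# After the release, the lock is free again
free_after_release = not lock.locked()
print(could_acquire_again, free_after_release)
```

The `try`/`finally` is what the context manager does for us: forgetting to release a lock would block every other task waiting for it.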

### Threads, GIL and the illusion of concurrency

Thus far, we only discussed one approach to concurrency (and parallelism) in python:  the use of operating system **processes** leveraging the ability of the OS to schedule and coordinate multiple processes across multiple CPUs. 

#### Threads vs processes
Other than multiprocessing, there's a second, very similar approach  called **threading**. 
Like a **process**, a **thread** is a representation of a task including all needed resources. 
In fact, a process usually consists of multiple threads all sharing a common block of memory. 
Because of shared memory, a thread is usually lighter in its resource usage than a process, meaning that we can start more threads.
Moreover the shared memory means that multiple threads can communicate using shared variables. 
Although this is possible, communicating through shared variables comes with severe performance and safety pitfalls and is discouraged in most cases. 
Considering their lower resource usage, in most programming languages we try to achieve concurrency by starting multiple threads in a process.  

However, due to the way the python interpreter is written (the infamous [global interpreter lock](https://wiki.python.org/moin/GlobalInterpreterLock)), we cannot get any performance benefit from running CPU-bound python code across multiple threads: only one thread can execute python code at a time. 

#### When to use threads
However, there are some cases where we could benefit from this style: when we have multiple threads sitting and waiting for input from the network or another process, but that don't perform any heavy computation.
This is a case commonly encountered in user interface programming, where we don't want the main user interface to block waiting for the program to fetch data from the network.
In that situation, we can run the GUI in a main thread and have a second thread being responsible for network interaction. 
Here we won't have any speedup but we will give the user the **illusion** of concurrency because the GUI won't freeze while the network thread fetches the data.


Let's see this with an example:

In [None]:
from IPython.display import display
import ipywidgets as widgets
from threading import Thread
from queue import Queue
import time
import sys
from logging import getLogger, StreamHandler, Formatter, DEBUG, INFO
import logging

def make_logger():
    logging.basicConfig()
    logger = logging.getLogger('Button')
    if not logger.handlers:
        handler = StreamHandler(sys.stdout)
        handler.setFormatter(Formatter('%(asctime)s - Thread: %(thread)d - %(message)s'))
        logger.addHandler(handler)  
    logger.setLevel(INFO)
    return logger


def make_button():
    logger = make_logger()
    button = widgets.Button(description="Click Me!")
    output = widgets.Output()

    task_status = Queue()
    task_status.put('not_started')

    def background_task():
        task_status.put('running')
        logger.info("Background task started.")
        time.sleep(60)
        logger.info("Background task finished.")
        #Remove the stale 'running' status before reporting that we are done
        task_status.get()
        task_status.put('finished')

    def on_button_clicked(b):
        status = task_status.get()
        if status == 'not_started' or status == 'finished':
            logger.info("Button clicked while task not started, a new task will be started.")
            Thread(target=background_task).start()
        elif status == 'running':
            task_status.put('running')
            logger.info("Button clicked while task runs.")

    button.on_click(on_button_clicked)
    return button, output

button, output = make_button()
display(button, output)

As you can see in the output, pressing the button starts a task. 
However, while the task runs, you can still interact with the button. 
This is possible because the task started by the button runs on a separate thread. 

To communicate between the main thread and the second thread, we use a *Queue*. 
This is a list-like data structure where one side can put data in with the `put` function and the other side can get the data from it using `get`.
`get` is a **blocking** operation: this function will block the execution until data is available in the queue. 
This is how we use it in this script:

1. When the function `make_button` is first started, we create a `Queue` and put the initial status `not_started` on it.
2. The function will return a button and output field. The other functions created in its **closure** can access the queue.
3. We add the output field and the button returned by `make_button` to the current notebook cell using `display`
4. The button is connected to the function `on_button_clicked`.
5. When the button is clicked, `on_button_clicked` tries to get the current status of the background task from the queue. Initially the queue contains the status `not_started`, therefore a new background task is started using `Thread`
6. The background task adds a `running` status to the queue and sleeps for 60 seconds. When it finishes sleeping, it adds `finished` to the queue.
7. The next time the button is clicked, we try again to get the status of the task from the queue. If we click while the background task is running, we get `running` and the display shows the message "Button clicked while task runs."

In this way, we can have a background task running while the user can interact with the button in the main thread.
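
The same idea applies outside of a GUI: whenever tasks spend their time waiting rather than computing, threads let the waits overlap. Here is a minimal sketch under the assumption that `fake_download` (a hypothetical stand-in for a network request) only sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str) -> str:
    """Stand-in for a network request: it only waits, it does not compute."""
    time.sleep(0.2)
    return f"{name} done"


names = ["a", "b", "c"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(names)) as executor:
    results = list(executor.map(fake_download, names))
elapsed = time.perf_counter() - start

# The three waits overlap, so the total should be close to 0.2 s rather than 0.6 s
print(results, f"{elapsed:.2f} s")
```

A sleeping thread does not hold the global interpreter lock, which is why the waits can overlap even though only one thread executes python code at a time.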

## Asynchronous programming and coroutines: cooperative multitasking

So far, the style of programming we saw relies on *threads* or *processes* to split and coordinate work between multiple tasks that might be executed simultaneously on multiple CPUs. 
Even with a single CPU, we obtain the **illusion** of parallel work thanks to the operating system switching work between multiple tasks and making sure they all get a more or less regular portion of the CPU time. 
This style is commonly known as [**preemptive multitasking**](https://en.wikipedia.org/wiki/Preemption_(computing)) and is the most common form of multitasking in modern server and desktop operating systems.

In contrast to this style, we can also employ **cooperative multitasking**. 
This is the style used in python **coroutines** and often known as **asynchronous programming** or **async/await** in other languages. 
In cooperative multitasking, the tasks are responsible themselves for *yielding* control and resources back to other tasks. 
This has a few benefits over operating system threads or processes:

- Coroutines require far fewer resources than threads and processes, which means potentially a very large number of tasks can run concurrently. 
- The programmer is responsible for passing control between the tasks. Because the points of switching are explicit, it is easier to reason about the behavior of the program. 
- Because the tasks are handled by the programming language, it is much easier to cancel their execution.


Because the operating system is not involved in the running of asynchronous tasks (or coroutines), the programming language is responsible for this task and needs to provide an **executor** or **event loop**. At the core, this is nothing but a function that handles the execution of the tasks in the right order, stopping them when they need to yield control, resuming them etc. 
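
To make the idea of an event loop more concrete, here is a deliberately simplified sketch built on plain generators. It is not how `asyncio` is implemented; `toy_task` and `toy_event_loop` are hypothetical names, and the loop only supports round-robin switching between tasks that `yield` voluntarily.

```python
from collections import deque
from typing import Generator


def toy_task(name: str, steps: int, log: list[str]) -> Generator[None, None, None]:
    """A task that does `steps` units of work, yielding control after each one."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # hand control back to the loop


def toy_event_loop(tasks: list[Generator[None, None, None]]) -> None:
    """Run the tasks round-robin until all of them are finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next `yield`
        except StopIteration:
            continue  # the task is finished: drop it
        ready.append(task)  # otherwise, queue it up again


log: list[str] = []
toy_event_loop([toy_task("A", 2, log), toy_task("B", 3, log)])
print(log)
```

The tasks interleave only at the `yield` statements, which is exactly what makes the switching points explicit in cooperative multitasking.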

After this abstract discussion, let's have a look at how cooperative multitasking is used in Python. To do so, we revisit our very first example and rewrite it in the async style:


In [None]:
import asyncio
from datetime import datetime as dt
async def task(name: str):
    """
    This function defines a fictional task that takes one second
    to complete and prints when it started and finished.
    """
    print(f"{name} started at {dt.now()}")
    await asyncio.sleep(1)
    print(f"{name} finished at {dt.now()}")

async def main():
    await asyncio.gather(task("one"), task("two"))    


await main()


Note that we put the `async` keyword before the function definition. 
This defines a **coroutine**, a function whose execution can be suspended and resumed at a later time. 
Be aware that calling a coroutine using the usual function invocation syntax merely creates a coroutine object and does not run its body; to run it and obtain its result you need to use the `await` keyword. 
Whenever we `await` another coroutine, we effectively pass (yield) control to that coroutine. 
Once it finishes, control returns to the coroutine that awaited it.
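The difference between calling and awaiting a coroutine can be seen in this small sketch. The coroutine `answer` is a made-up example. The last lines assume a plain Python script; inside a notebook, where an event loop is already running, you would write `result = await main()` instead of using `asyncio.run`.

```python
import asyncio


async def answer() -> int:
    return 42


async def main() -> int:
    coro = answer()  # nothing has run yet: this is just a coroutine object
    print(asyncio.iscoroutine(coro))  # True
    return await coro  # only now is the body of `answer` executed


result = asyncio.run(main())  # in a notebook: result = await main()
print(result)  # 42
```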

We see an example of this in the `task` function, where we `await` `asyncio.sleep(1)`. 
This means "create a task that waits for a second and wait for it to complete". While `task` is suspended in this way, the event loop is free to run other coroutines.


Finally, you can see that we used the `asyncio.gather` function. This is a utility function from the [asyncio](https://docs.python.org/3/library/asyncio.html) module of the standard library that schedules a number of tasks to run concurrently. 
Behind the scenes, it returns an awaitable object (technically a `Future`) that completes once all the tasks have completed. That's the reason why in `main` we need to `await` it. 
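One detail worth knowing is that `asyncio.gather` returns the results in the order in which the awaitables were passed, not in the order in which they finished. The following sketch illustrates this with a hypothetical helper `delayed`; as before, `asyncio.run` is meant for a plain script, while in a notebook you would `await main()` directly.

```python
import asyncio


async def delayed(value: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return value


async def main() -> list[str]:
    # "slow" is listed first but finishes last
    return await asyncio.gather(delayed("slow", 0.2), delayed("fast", 0.05))


results = asyncio.run(main())  # in a notebook: results = await main()
print(results)  # ['slow', 'fast']
```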

<div class="alert alert-block alert-warning">
    <h4><b>Warning</b></h4> In the example above, we simply awaited <code>main</code> at the top level of the script. 
    This is possible because IPython code already runs in an event loop, which is responsible for running asynchronous tasks. 
    In a normal Python application, the entry point to asynchronous programming should be the function <code>asyncio.run()</code>:<br>
    <code>asyncio.run(main())</code>
</div>


As the name of the module (`asyncio`) suggests, this style of programming is most beneficial in I/O-bound programs, for example in web or database programming, where most of the application time is spent waiting for pages to load or files to open.
Unlike processes, coroutines cannot run on more than one CPU at the same time, so there is no speedup for computation-intensive programs. 
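The last point can be illustrated with a small sketch. The coroutine `busy` below is a made-up example that only computes and never awaits anything, so it never gives the event loop a chance to switch: even when gathered, the two calls simply run one after the other. In a notebook, replace `asyncio.run(main())` with `await main()`.

```python
import asyncio


async def busy(name: str, log: list[str]) -> int:
    log.append(f"{name} start")
    total = sum(i * i for i in range(200_000))  # pure computation, no await
    log.append(f"{name} end")
    return total


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(busy("one", log), busy("two", log))
    return log


log = asyncio.run(main())  # in a notebook: log = await main()
print(log)  # ['one start', 'one end', 'two start', 'two end']
```

Compare this with the very first example, where both tasks started before either of them finished because each one awaited `asyncio.sleep`.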

Let's see an example where using asyncio can improve performance.

In [None]:
import random
from dataclasses import dataclass


def random_sentence():
    words = ["apple", "banana", "cherry", "date", "elderberry"]
    sentence_length = random.randint(5, 10)  # Choose a random sentence length
    sentence = " ".join(random.choice(words) for _ in range(sentence_length))
    return sentence.capitalize() + "."

@dataclass
class Response:
    sentence: str
    url: str
    success: bool

async def respond(url: str) -> Response:
    # Simulate a server that takes a random time (up to one second) to answer
    await asyncio.sleep(random.random())
    return Response(sentence=random_sentence(), url=url, success=True)


async def get_url(url: str) -> Response:
    print(f"Getting response from server: {url}")
    res = await respond(url)
    print(f"Response from server {url}: {res}")
    await process_url(res)
    return res

async def process_url(response: Response) -> Response:
    print(f"Processing {response}")
    return response

async def main():
    res = await asyncio.gather(*[get_url(f) for f in ["url1", "url2", "url3"]])
    print(res)


await main()


In this artificial example, we ask three fictional web servers with URLs "url1", "url2" and "url3" to respond. 
Their response is simulated by `respond`, which waits a random amount of time before answering. 
Because of this delay, we don't know in advance which server will be the fastest to process our request.
When we receive a response, we process it using `process_url`. 

As you can see from the console output, we launch the three coroutines in sequence, but the order of the responses is random and determined by the time each "server" takes to reply. 
The advantage of this approach is that we can process a server's response while we wait for the other servers to respond. 
If we did this sequentially, everything would be blocked until each server had responded. 
Instead, using coroutines we can already process the responses from the faster servers while we wait for the results from the slower ones.
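A related pattern, not used in the example above, is `asyncio.as_completed`, which hands back the results in the order in which the coroutines finish. The sketch below uses a hypothetical `fetch` coroutine with fixed delays so that the order is predictable; in a notebook, replace `asyncio.run(main())` with `await main()`.

```python
import asyncio


async def fetch(url: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for the network latency
    return url


async def main() -> list[str]:
    coros = [fetch("url1", 0.3), fetch("url2", 0.1), fetch("url3", 0.2)]
    finished = []
    # as_completed yields awaitables in completion order, fastest first
    for next_done in asyncio.as_completed(coros):
        finished.append(await next_done)
    return finished


order = asyncio.run(main())  # in a notebook: order = await main()
print(order)  # ['url2', 'url3', 'url1']
```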

<div class="alert alert-block alert-warning">
    <h4><b>Warning</b></h4>If you simply call a coroutine like a function without <code>await</code>, nothing will happen. The call returns a coroutine object that 
    can be awaited, but its body will not run until you <b>explicitly</b> <code>await</code> it or schedule it, for example with <code>asyncio.gather</code>.
</div>



<div class="alert alert-block alert-warning">
    <h4><b>Warning</b></h4>Coroutines are <b>infectious</b>. Any function that awaits a coroutine must be a coroutine itself (unless we synchronously wait on the coroutine using <code>asyncio.run()</code>). This is sometimes called the <b>function coloring problem</b>. For a good explanation of this issue, see <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">this article</a>.
</div>
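The following sketch shows both options side by side; the function names are invented for the illustration. Note that `asyncio.run` starts a new event loop, so `double_sync` is meant for a plain Python script and would raise an error if called inside an already running loop such as the one in a notebook.

```python
import asyncio


async def fetch_number() -> int:
    await asyncio.sleep(0.01)
    return 7


async def double_async() -> int:
    # Using await forces this function to be declared async as well
    return 2 * await fetch_number()


def double_sync() -> int:
    # A regular function cannot use await; it has to start an event loop instead
    return 2 * asyncio.run(fetch_number())


print(double_sync())  # 14
print(asyncio.run(double_async()))  # 14
```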


## Exercises

In [None]:
%reload_ext tutorial.tests.testsuite



### Exercise 1: Counting words in a file 🌶️🌶️

Write a **parallel** function `letter_statistics` that returns the letter-count statistics of the large file `input_file`.
This means that the function should return a `dict[str, int]` containing the count of each letter, with the keys in sorted order.

<div class="alert alert-block alert-info">
    <h4><b>Hints</b></h4>
    <ul>
        <li>
            You can open the file <b>read-only</b> multiple times. 
        </li>
        <li>
           To facilitate your work, we pass the size of the file (in number of characters) using the <code>size</code> argument.
        </li>
        <li>
            Using <code>seek</code> you can move to a given offset from the start of the file. Using <code>read(size)</code> you can read <code>size</code> characters only. 
        </li>
        <li>
            Write your function in the cell below inside of the <code>solution_exercise1</code> function. The function receives a <code>Path</code> object <code>input_file</code> as input and should return a single <code>dict[str, int]</code> dictionary.
        </li>
        <li>
        Consider using the <code>collections.Counter</code> class to count the number of letters in a string.
        </li>
    </ul>
</div>


In [None]:
%%ipytest
from pathlib import Path
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from multiprocess import Process

def solution_exercise1(input_file: Path, size: int) -> dict[str, int]:
   """Write your solution here"""
   return {"a": 1}

### Exercise 2: Find super secret server key 🌶️🌶️🌶️
We have a super secret server that was started with a super secret key. 
Our goal is to retrieve the key. 
The server behaves like this:
1. Every time we **await** the `get_value` coroutine method, it returns a character. If you run:
    ```python
    import asyncio
    server = SecretServer()
    async def my_fun():
        return await server.get_value()
    value = asyncio.run(my_fun())
    ```
    your `value` will contain the current character.

2. If we can get all the characters from the server, they form a sequence that is the secret password to our server:
   1. The beginning and end of the sequence are marked by the character `/`.
3. **But be careful!** The server has a special security mechanism: 
   1. After the first time you call it, you can only call `get_value` a limited number of times. 
   2. If you wait too long to get the next character, the sequence will reset. 
   3. In this case too, the server will return `/` and then restart.

Your task is to write a **coroutine** that gathers all characters from the server that form a sentence delimited by `/` and use them as a password to get the secret message.

In summary:

- Complete the function `solution_exercise2`. The function should return the string containing the secret message. Write all the asynchronous code inside of `get_secret`.
- The function receives an object of type `SecretServer`. 
- Using the async method `get_value` you can get the next value of the sequence.
- Whenever the sequence ends or the time expires, `get_value` returns the character `/`.
- You can check if your sequence is correct using the method `check_key`.

<div class="alert alert-block alert-info">
    <h4><b>Hints</b></h4>
    <ul>
        <li>
            Use <code>asyncio.gather</code> to await on multiple coroutines concurrently.
        </li>
        <li>
           To facilitate your work, assume that the key isn't longer than 50 characters.
        </li>
        <li>
            You can check if your sequence is correct using the method <code>check_key</code>.
        </li>
    </ul>
</div>



In [None]:
%%ipytest debug async
import asyncio
from tutorial.tests.test_threads import SecretServer

async def solution_exercise2(server: SecretServer) -> str:
    """Write your solution here"""
    return await server.get_value()
    
