# Parallelism and concurrency in Python

## Introduction
There are many cases where we can execute multiple tasks in parallel, or switch between tasks while we wait for a time-consuming task to complete.
Examples from daily life show why:

- While your soup cooks on the stove, you start washing your dishes
- You check the news while drinking your morning coffee
- You work on your Python course exercises while attending a video call

Indeed, the language of computing is so ingrained in many of us that today we refer to this sort of behavior as *multitasking*, an expression borrowed from computer science.

Because in computer science we like to be precise, let us define these terms better.

## Parallelism vs. concurrency

### Parallelism
When we have two or more tasks *running and progressing simultaneously*, we can talk about **parallelism**. Think, for example, of paying at a supermarket with multiple checkout lines: more than one customer can pay for their purchases at the same time.

### Concurrency
When two or more tasks run in overlapping time periods (but **not necessarily simultaneously**) instead of sequentially, we say that their execution is **concurrent**.
This is typical human multitasking, where we work on multiple tasks within a time period but must switch between them to perform them correctly.
For example, we sit in a meeting and listen passively while working on our Python program, then stop working on our code to answer a question directed at us.



The image below can help you understand the difference between concurrent and parallel work.
<figure>
  <img
  src="../../images/concurrency_vs_parallelism.jpg"
  height="400px"
  alt="Time diagram comparing concurrent and parallel execution of tasks.">
  <figcaption>A simple time diagram illustrating the difference between parallelism and concurrency (source: https://openclassrooms.com/en/courses/5684021-scale-up-your-code-with-java-concurrency/5684028-identify-the-advantages-of-concurrency-and-parallelism)</figcaption>
</figure>


### Quiz: parallel or not
For each of these real-life examples, determine whether the tasks are executed in parallel or not:

- One cashier serves two lines of people in a store
- A swimming pool offers multiple shower stalls 
- Multiple people take turns drinking from a cup


## Parallelism in Python
By default, tasks in Python do not run in parallel. Consider this example:



```python
from datetime import datetime as dt
from time import sleep

def task(name: str):
    """
    This function defines a fictional task that takes one second
    to complete and prints when it started and finished.
    """
    print(f"{name} started at {dt.now()}")
    sleep(1)
    print(f"{name} finished at {dt.now()}")


def two_tasks():
    # The second call only starts after the first one has returned.
    task("First task")
    task("Second task")


two_tasks()
```


```text
First task started at 2023-10-31 09:09:00.511459
First task finished at 2023-10-31 09:09:01.511561
Second task started at 2023-10-31 09:09:01.511673
Second task finished at 2023-10-31 09:09:02.511766
```


We see that the first task (`First task`) finished before the second one could start. This is the normal sequential computational model we are used to when we first learn programming. In Python, however, we can introduce **parallelism** with the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module, which lets us execute code in separate operating system processes. Because modern computers have multicore CPUs that can execute several processes at the same time, code distributed over several processes can run in parallel.
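As a quick check of how many cores your machine offers, the standard library provides `os.cpu_count()`. The sketch below only reports that number. Treat it as an upper bound: the operating system or a container may allow your program to use fewer cores than are reported.

```python
import os

# Number of logical CPU cores reported by the operating system.
# os.cpu_count() returns None if the number cannot be determined.
cores = os.cpu_count()

if cores is None:
    print("Could not determine the number of CPU cores")
else:
    print(f"This machine reports {cores} logical CPU cores")
```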

### High-level interface: Process pools

Let's rewrite our example from before using [`multiprocessing.Pool`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool), which executes jobs on a pool of worker processes.


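```python
from datetime import datetime as dt
from time import sleep
from multiprocessing import Pool

def task(name: str):
    """
    This function defines a fictional task that takes one second
    to complete and prints when it started and finished.
    """
    print(f"{name} started at {dt.now()}")
    sleep(1)
    print(f"{name} finished at {dt.now()}")


def two_tasks():
    # Run both tasks on a pool of three worker processes.
    with Pool(3) as p:
        p.map(task, ["First task", "Second task"])


# Repeat the experiment five times to see how the order varies between runs.
# The guard keeps worker processes from re-running this loop on platforms
# that start them by re-importing the main module.
if __name__ == "__main__":
    for _ in range(5):
        two_tasks()
```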

```text
Second task started at 2023-10-31 09:17:18.444387First task started at 2023-10-31 09:17:18.444386

First task finished at 2023-10-31 09:17:19.446255Second task finished at 2023-10-31 09:17:19.446431

First task started at 2023-10-31 09:17:19.469125Second task started at 2023-10-31 09:17:19.469224

Second task finished at 2023-10-31 09:17:20.473578First task finished at 2023-10-31 09:17:20.473757

First task started at 2023-10-31 09:17:20.499679Second task started at 2023-10-31 09:17:20.499744

First task finished at 2023-10-31 09:17:21.504205Second task finished at 2023-10-31 09:17:21.504349

Second task started at 2023-10-31 09:17:21.531218First task started at 2023-10-31 09:17:21.531187

Second task finished at 2023-10-31 09:17:22.535944First task finished at 2023-10-31 09:17:22.536560

Second task started at 2023-10-31 09:17:22.559639First task started at 2023-10-31 09:17:22.559555

Second task finished at 2023-10-31 09:17:23.564162First task finished at 2023-10-31 09:17:23.565073
```



We use `Pool` as a [context manager](https://book.pythontips.com/en/latest/context_managers.html) and call the pool's `map` method to apply the function `task` to each element of a list of arguments. Internally, `Pool(3)` starts three worker processes and distributes the elements of the list among them; it does not create a new process for every element.
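To see how many processes a pool actually uses, each job can report the ID of the process that executed it. This is a minimal sketch: the helper names `worker_pid` and `pids_used` are made up for this illustration, and the number of tasks and workers is arbitrary.

```python
import os
from multiprocessing import Pool


def worker_pid(_: int) -> int:
    """Return the ID of the process that executes this call."""
    return os.getpid()


def pids_used(n_tasks: int, n_workers: int) -> set:
    """Run n_tasks jobs on a pool of n_workers and collect the distinct PIDs."""
    with Pool(n_workers) as p:
        return set(p.map(worker_pid, range(n_tasks)))


if __name__ == "__main__":
    pids = pids_used(n_tasks=20, n_workers=3)
    # The 20 jobs are shared among at most 3 worker processes,
    # none of which is the parent process.
    print(f"Parent PID: {os.getpid()}")
    print(f"{len(pids)} worker process(es) handled 20 tasks: {sorted(pids)}")
```

The exact PIDs differ on every run, but their number never exceeds the size of the pool.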

As you can see from the console output, the two tasks do not merely run concurrently (in overlapping time periods): they run in parallel, starting within a fraction of a millisecond of each other. The jumbled line breaks are a side effect of this, because both processes write to the same console at the same time. The output also highlights a common problem with concurrent computations: the order of execution is **non-deterministic**. We cannot know a priori which process will start first and which will finish first. If the order of the results matters, you need to send and return some sort of identifier with each job, so that you can reconstruct the right order.
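The identifier approach can be sketched with `Pool.imap_unordered`, which yields results in whatever order the workers finish them. The function names `square_with_id` and `squares_in_order` are hypothetical and squaring merely stands in for real work; each job carries its index so that the input order can be restored afterwards.

```python
from multiprocessing import Pool


def square_with_id(job: tuple) -> tuple:
    """Take (job_id, value) and return (job_id, value squared)."""
    job_id, value = job
    return job_id, value * value


def squares_in_order(values: list) -> list:
    """Square all values in a pool and return the results in input order."""
    jobs = list(enumerate(values))
    with Pool(3) as p:
        # imap_unordered yields each result as soon as it is ready,
        # so the order of `finished` is not guaranteed.
        finished = list(p.imap_unordered(square_with_id, jobs))
    # Sorting by job_id restores the order of the inputs.
    return [result for _, result in sorted(finished)]


if __name__ == "__main__":
    print(squares_in_order([3, 1, 4, 1, 5]))  # prints [9, 1, 16, 1, 25]
```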

However, `Pool.map` takes care of this for us: it returns the results in the same order as the inputs, no matter which task finished first:

```python
from datetime import datetime as dt
from multiprocessing import Pool

def increment(number: int) -> int:
    """
    This function increments the number by 1.
    """
    name = "Process " + str(number)
    print(f"{name} started at {dt.now()}")
    result = number + 1
    print(f"{name} finished at {dt.now()}")
    return result


def ten_tasks():
    # Ten jobs are shared among three worker processes.
    with Pool(3) as p:
        res = p.map(increment, range(10))
    # The results come back in input order, whatever the execution order was.
    print(res)


if __name__ == "__main__":
    ten_tasks()
```

```text
Process 2 started at 2023-10-31 09:38:22.802498Process 1 started at 2023-10-31 09:38:22.802352Process 0 started at 2023-10-31 09:38:22.802282


Process 1 finished at 2023-10-31 09:38:22.804046Process 0 finished at 2023-10-31 09:38:22.804153Process 2 finished at 2023-10-31 09:38:22.804051


Process 3 started at 2023-10-31 09:38:22.804730Process 4 started at 2023-10-31 09:38:22.804826Process 5 started at 2023-10-31 09:38:22.805063


Process 4 finished at 2023-10-31 09:38:22.805455Process 5 finished at 2023-10-31 09:38:22.805627
Process 3 finished at 2023-10-31 09:38:22.805599Process 6 started at 2023-10-31 09:38:22.805979


Process 6 finished at 2023-10-31 09:38:22.806393Process 8 started at 2023-10-31 09:38:22.806500Process 7 started at 2023-10-31 09:38:22.806376
Process 8 finished at 2023-10-31 09:38:22.806961


Process 9 started at 2023-10-31 09:38:22.807409Process 7 finished at 2023-10-31 09:38:22.807244

Process 9 finished at 2023-10-31 09:38:22.807992
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```
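A rough way to check that a pool saves wall-clock time is to measure the same jobs sequentially and with a pool. The sketch below uses hypothetical helpers (`nap`, `time_sequential`, `time_with_pool`) and `sleep` as a stand-in for slow work, so it demonstrates overlapping execution rather than a speed-up of CPU-heavy code. The pool timing includes the cost of starting the worker processes.

```python
from multiprocessing import Pool
from time import perf_counter, sleep


def nap(seconds: float) -> float:
    """Stand-in for a slow task: sleep, then return the time slept."""
    sleep(seconds)
    return seconds


def time_sequential(durations: list) -> float:
    """Run the tasks one after the other and return the elapsed seconds."""
    start = perf_counter()
    for d in durations:
        nap(d)
    return perf_counter() - start


def time_with_pool(durations: list) -> float:
    """Run the tasks on one worker each and return the elapsed seconds."""
    start = perf_counter()
    with Pool(len(durations)) as p:
        p.map(nap, durations)
    return perf_counter() - start


if __name__ == "__main__":
    durations = [1.0, 1.0]
    print(f"Sequential:  {time_sequential(durations):.2f} s")
    print(f"With a pool: {time_with_pool(durations):.2f} s")
```

On a typical machine the sequential run takes about two seconds and the pool run a little over one, but the exact numbers depend on the system.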

### Low-level interface: Process, run, join and deadlocks

If you want more control over how processes are executed, you can create them directly with the [`Process`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process) class. It offers several methods, primarily:
- `run()`: by default, it calls the target callable with the arguments that were passed when the process object was created.
    <div class="alert alert-block alert-warning">
        <h4><b>Warning</b></h4> Calling <code>run</code> directly is <b>blocking</b>: it executes the function in the current Python process and blocks it until the execution finishes.
    </div>
- `start()`: it starts a separate process, which then executes `run()`.
- `join()`: this method blocks the calling process until the process whose `join()` was called has finished. A process cannot join itself because this would cause a **deadlock**: a situation in which tasks wait for each other in a cycle, so that none of them can ever proceed. In real life, imagine two friends who each wait for the other to call before going out.
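The difference between `start()` and `run()` can be sketched by letting the target function report the ID of the process it runs in. The helper names (`report_pid`, `pid_with_start`, `pid_with_run`) are made up for this illustration, and a `multiprocessing.Queue` is just one of several ways to send a value back to the parent.

```python
import os
from multiprocessing import Process, Queue


def report_pid(queue) -> None:
    """Put the ID of the process running this function on the queue."""
    queue.put(os.getpid())


def pid_with_start() -> tuple:
    """Run report_pid in a child process; return (child PID, exit code)."""
    queue = Queue()
    child = Process(target=report_pid, args=(queue,))
    child.start()      # returns immediately; the work happens in the child
    pid = queue.get()  # wait for the child's message
    child.join()       # block until the child process has exited
    return pid, child.exitcode


def pid_with_run() -> int:
    """Call run() directly: report_pid executes in the current process."""
    queue = Queue()
    Process(target=report_pid, args=(queue,)).run()
    return queue.get()


if __name__ == "__main__":
    print(f"Parent PID:          {os.getpid()}")
    print(f"PID seen by start(): {pid_with_start()[0]}")
    print(f"PID seen by run():   {pid_with_run()}")
```

With `start()` the reported PID differs from the parent's, whereas `run()` reports the parent's own PID because no new process is created.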




### Higher level interface
In reality, instead of 