# Python Multiprocessing

Multiprocessing in Python lets us run multiple processes in parallel, so that our code can use several CPU cores on a single machine. Distributed computation in general is a complicated and nuanced subject, particularly once we consider interprocess communication or even inter-machine communication on distributed systems, but that doesn't mean we can't use it and benefit from it.

### Processes vs. Threads
A `Process` is an independent execution of code in its own Python virtual machine (PVM), with its own memory space. A `Thread`, on the other hand, runs inside a process and shares memory with the other threads in that process. Each has its own benefits: generally, multiprocessing suits CPU-bound tasks, whilst multithreading suits I/O-bound tasks.

Specifically, this is because the global interpreter lock (GIL) of CPython restricts execution of Python code to one thread at a time within a given Python virtual machine, which limits what threading can do for CPU-bound tasks. Multiprocessing bypasses this issue because each process has its own interpreter and therefore its own GIL.
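
The memory distinction described above can be seen directly. The following is a minimal sketch: the `counter` variable and the `increment` and `run_demo` names are made up for illustration. A thread modifies the parent's global, whereas a child process only modifies its own copy.

```python
import multiprocessing as mp
import threading

counter = 0  # module-level state; every process gets its own copy


def increment():
    """Add one to the global counter of whichever process runs this."""
    global counter
    counter += 1


def run_demo():
    """Return the parent's counter after a thread, then after a process."""
    global counter
    counter = 0

    # A thread shares memory with its parent, so the change is visible here.
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    after_thread = counter

    # A child process has its own memory space, so the parent's counter
    # is left untouched by the increment performed in the child.
    p = mp.Process(target=increment)
    p.start()
    p.join()
    after_process = counter

    return after_thread, after_process


if __name__ == "__main__":
    print(run_demo())  # (1, 1)
```

The second value stays at 1 because the increment happened in the child's copy of `counter`, not in the parent's.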

## Sequential Execution

Let's take a look at an example of some code which might benefit from parallelization.

In [2]:
import time

def job_with_fixed_execution_time(x: int):
    """
    execution time is a fixed 1.0s
    """
    time.sleep(1)
    return x ** 2


inputs = [1, 2, 3, 4, 5]
outputs = []

for val in inputs:
    # does this computation depend on the other values though?
    output = job_with_fixed_execution_time(val)
    outputs.append(output)

print(f"{inputs} -> job -> {outputs}")

[1, 2, 3, 4, 5] -> job -> [1, 4, 9, 16, 25]


## Parallel Execution

Computing `job_with_fixed_execution_time` for one input doesn't depend on any of the other inputs, which makes it a good candidate for parallelization.

Other good candidates include processing independent regions in geospatial data or individual videos in computer vision data.

In [4]:
!python ../scripts/03-multiprocessing.py

Execution Time: 0.515 seconds
range(0, 12) -> job -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
`map` execution took 2.095 s



In [None]:
# discuss mapping via the scripts
import multiprocessing as mp

with mp.Pool(processes=mp.cpu_count()) as pool:
    # give jobs to pool
    outputs = pool.map(
        job_with_fixed_execution_time,
        [1, 2, 3]
    )
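
Run as a standalone script rather than in a notebook cell, the same pattern generally needs an `if __name__ == "__main__":` guard, because under the `spawn` start method each worker re-imports the main module. The following is a minimal sketch and not the contents of `03-multiprocessing.py`; the shorter sleep and the `run_parallel` helper are assumptions made for illustration.

```python
import multiprocessing as mp
import time


def job_with_fixed_execution_time(x: int) -> int:
    """Stand-in job: sleeps briefly (0.1 s here, an assumption), then squares x."""
    time.sleep(0.1)
    return x ** 2


def run_parallel(inputs, processes=None):
    """Hypothetical helper: map the job over the inputs using a pool of workers."""
    # processes=None lets Pool pick a default based on the available CPUs
    with mp.Pool(processes=processes) as pool:
        return pool.map(job_with_fixed_execution_time, inputs)


if __name__ == "__main__":
    inputs = [1, 2, 3, 4, 5]
    start = time.perf_counter()
    outputs = run_parallel(inputs, processes=len(inputs))
    elapsed = time.perf_counter() - start
    print(f"{inputs} -> job -> {outputs}")
    print(f"parallel execution took {elapsed:.3f} s")
```

Without the guard, a worker that re-imports the script would try to create its own pool, which Python reports as an error.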


### Mapping

Mapping refers to applying a function to multiple inputs in parallel across a pool of processes. There are several variants of mapping, such as `map_async`, `starmap`, `starmap_async`, and `imap`, each of which handles a slightly different use case.
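
As a rough illustration of how these variants differ, here is a minimal sketch; the `square`, `power`, and `run_variants` functions are hypothetical and only the `Pool` methods come from `multiprocessing`.

```python
import multiprocessing as mp


def square(x):
    return x ** 2


def power(base, exponent):
    return base ** exponent


def run_variants():
    """Call several Pool mapping variants and collect their results."""
    with mp.Pool(processes=2) as pool:
        # map: one argument per call, blocks until every result is ready
        mapped = pool.map(square, [1, 2, 3])

        # starmap: each input tuple is unpacked into positional arguments
        starmapped = pool.starmap(power, [(2, 3), (3, 2)])

        # imap: returns an iterator that yields results in input order
        lazy = list(pool.imap(square, [1, 2, 3]))

        # map_async: returns immediately; .get() waits for the results
        async_result = pool.map_async(square, [1, 2, 3])
        waited = async_result.get(timeout=10)

    return mapped, starmapped, lazy, waited


if __name__ == "__main__":
    print(run_variants())
    # ([1, 4, 9], [8, 9], [1, 4, 9], [1, 4, 9])
```

Note that the results are collected inside the `with` block, since leaving it shuts the pool down.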

In [3]:
# let's look at 01-multiprocessing.py

We can increase the number of processes to enlarge the pool of workers that the inputs are mapped to. As each process finishes executing its `job`, it is given the next input to work on.
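
One way to observe workers being reused is to record which process handled each input. This is a minimal sketch with a hypothetical `job` and `run_pool`. It passes `chunksize=1` so that inputs are handed out one at a time; by default `pool.map` may group inputs into larger chunks.

```python
import multiprocessing as mp
import os
import time


def job(x):
    """Hypothetical job: sleep briefly, then report which process ran it."""
    time.sleep(0.05)
    return x, os.getpid()


def run_pool(inputs, processes):
    """Map job over inputs, handing inputs to workers one at a time."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(job, inputs, chunksize=1)


if __name__ == "__main__":
    results = run_pool(range(6), processes=2)
    for x, pid in results:
        print(f"input {x} handled by process {pid}")
    print(f"distinct workers: {len({pid for _, pid in results})}")
```

With six inputs and two processes, at most two distinct process IDs appear, since each worker picks up a new input once it finishes the previous one.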

## Inter-process Communication



## Limitations

`multiprocessing` is only suitable for parallelization on a single machine, potentially with multiple CPUs, so you are limited to the resources of that machine. You can also implement communication between processes if your computation needs it, for example in fluid simulations where you run a set of solvers at once in individual cells, then gather the results and have them interact before the next timestep.
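
The solve, gather, and interact pattern can be sketched with a `multiprocessing.Queue`. This is a toy illustration under stated assumptions, not a real fluid solver: `solve_cell` simply halves its value, and the "interaction" averages each cell with its two neighbours on a periodic grid. Both rules, and the `step` function, are made up for the example.

```python
import multiprocessing as mp


def solve_cell(cell_id, value, results):
    """Hypothetical per-cell solver: halve the value and report it back."""
    results.put((cell_id, value * 0.5))


def step(values):
    """Run one timestep: solve every cell in parallel, gather, then interact."""
    results = mp.Queue()
    procs = [
        mp.Process(target=solve_cell, args=(i, v, results))
        for i, v in enumerate(values)
    ]
    for p in procs:
        p.start()

    # Gather one result per process *before* joining, so that no child
    # blocks while still holding unsent data in the queue.
    gathered = dict(results.get() for _ in procs)
    for p in procs:
        p.join()

    solved = [gathered[i] for i in range(len(values))]

    # Interaction: average each cell with its neighbours (periodic boundary).
    n = len(solved)
    return [
        (solved[i - 1] + solved[i] + solved[(i + 1) % n]) / 3
        for i in range(n)
    ]


if __name__ == "__main__":
    state = [6.0, 0.0, 0.0]
    state = step(state)
    print(state)  # [1.0, 1.0, 1.0]
```

Starting one process per cell at every timestep is wasteful for real workloads; a pool of long-lived workers would usually be a better fit, and this sketch only aims to show the communication pattern.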

In high-performance computing environments we often want to leverage multiple CPUs across multiple machines (or nodes). To do this, the machines need to be able to talk to each other and exchange data. This can be achieved with an implementation of MPI (Message Passing Interface) such as Open MPI or MPICH, alongside Python libraries like `mpi4py`.

In [13]:
# def func(x, y):
#     return (x, y)

# # func(y=2, x=1)

# x = 1
# y = 2
# cfg = {
#     "x":1,
#     "y": 2
# }

# func(x=cfg["x"], y=cfg["y"])
# func(**cfg)

# # func(**{"x": 1, "y": 2})

(1, 2)