# Multiprocessing

`multiprocessing` is a package in the Python standard library that lets a program spawn separate processes. See [the official documentation](https://docs.python.org/3/library/multiprocessing.html).

In [2]:
import time
import random
import numpy as np
from multiprocessing import Process

## Single process

To create a process from scratch, use the `multiprocessing.Process` class.

- The `start()` method starts the execution;
- The `join()` method synchronises the parent process with a child process: the parent waits for the child to finish before continuing.
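The two methods above can be sketched on a single child process. This is a minimal illustration, not part of the original notebook: `nap` and `run_lifecycle` are hypothetical helper names, and the `if __name__ == "__main__":` guard is only there so the snippet also works as a script on platforms that start children by re-importing the main module.

```python
import time
from multiprocessing import Process


def nap(seconds: float) -> None:
    """Hypothetical worker: it only sleeps for the given number of seconds."""
    time.sleep(seconds)


def run_lifecycle(seconds: float = 0.2) -> dict:
    """Start one child process and record its state around start() and join()."""
    child = Process(target=nap, args=(seconds,))
    states = {"before_start": child.is_alive()}  # False: nothing is running yet
    child.start()                                # the child begins executing nap()
    states["after_start"] = child.is_alive()     # normally True while the child sleeps
    child.join()                                 # the parent blocks until the child exits
    states["after_join"] = child.is_alive()      # False: the child has finished
    states["exitcode"] = child.exitcode          # 0 when the child exited cleanly
    return states


if __name__ == "__main__":
    print(run_lifecycle())
```

`is_alive()` and `exitcode` are not needed for the examples that follow; they are used here only to make the effect of `start()` and `join()` visible.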

---

The following example runs the same function in two processes. The first process performs twice as many iterations as the second.

In [4]:
def count(N: int, process_name: str):
    st_time = time.time()
    for i in range(N):
        ((i+10)/25)**(1/2)
    en_time = time.time()
    print(f"{process_name} is finished {en_time - st_time}")

n_iterations = 10**8
p1 = Process(target=count, args=(n_iterations, "first"))
p2 = Process(target=count, args=(n_iterations // 2, "second"))

p1.start()
p2.start()

print("Processes were started")

p1.join()
p2.join()

print("Processes were joined")

Processes were started
second is finished 2.0598533153533936
first is finished 4.119459390640259
Processes were joined


Although the `first` process was started earlier, it finished later because it had twice as many iterations to run. The `second` process reported its result while `first` was still running, which shows that the two were executing in parallel.

`print("Processes were started")` was executed immediately after the processes were started, but `print("Processes were joined")` was executed only after both processes had finished. This shows that the main process was blocked by the `join` calls until the child processes completed.
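To see the blocking effect of `join` without depending on CPU speed, here is a small sketch that replaces the counting loop with `time.sleep`. The helper name `timed_join` and the sleep durations are assumptions made for this illustration only.

```python
import time
from multiprocessing import Process


def timed_join(durations):
    """Start one sleeping child per duration, join them all, return elapsed seconds."""
    children = [Process(target=time.sleep, args=(d,)) for d in durations]
    started = time.perf_counter()
    for child in children:
        child.start()
    for child in children:
        child.join()  # the parent blocks here until this child exits
    return time.perf_counter() - started


if __name__ == "__main__":
    elapsed = timed_join([0.2, 0.4])
    # Typically close to the longest sleep (0.4 s) plus process start-up overhead,
    # rather than the 0.6 s that running the two sleeps one after another would take.
    print(f"parent waited {elapsed:.2f} s")
```

The parent cannot get past the `join` loop before the slowest child is done, so the elapsed time is never shorter than the longest sleep.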

## Pool

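In practice, `multiprocessing.Pool` is used more often than bare `Process` objects: it manages a fixed set of worker processes and distributes tasks among them.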
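As a first, minimal sketch of the idea, the snippet below maps a function over a few inputs with a pool. The `square` worker and the `parallel_squares` wrapper are hypothetical names used only here; the `with` block is one common way to make sure the pool is cleaned up.

```python
from multiprocessing import Pool


def square(x: int) -> int:
    """Hypothetical worker used only for this illustration."""
    return x * x


def parallel_squares(values, processes: int = 4):
    """Apply square() to every value using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # map() hands the inputs out to the workers and returns
        # the results in the same order as the inputs.
        return pool.map(square, values)


if __name__ == "__main__":
    print(parallel_squares(range(6)))  # [0, 1, 4, 9, 16, 25]
```

Compared with creating `Process` objects by hand, there is no explicit `start()` or `join()` per task: `map` blocks until all results are ready.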

The following cell defines a function that creates a list of 1,000,000 random floats and then calculates their minimum, maximum and mean. In the next cells we will run it several times, first sequentially in a loop and then in multiple processes.

In [8]:
from multiprocessing import Pool
def gen_random(_):
    my_array = [random.random() for _ in range(1_000_000)]
    return (min(my_array), max(my_array), np.mean(my_array))

First, the classic option: a plain loop that calls the function 10 times.

In [5]:
%%timeit
[gen_random(None) for i in range(10)]

1.08 s ± 54.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Now let's run it in 10 processes.

In [6]:
%%timeit
pool = Pool(processes=10)
results = pool.map(gen_random, [None] * 10)
pool.close()
pool.join()

511 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


The pool version is roughly twice as fast: 511 ms versus 1.08 s.
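The size of the gain can be put into numbers. The sketch below computes the speedup from the two timings reported above (1.08 s for the loop, 511 ms for the pool) and, as an extra assumption not made in the notebook, the parallel efficiency relative to an ideal tenfold gain from 10 workers. The function names are illustrative.

```python
def speedup(sequential_s: float, parallel_s: float) -> float:
    """How many times faster the parallel run is than the sequential one."""
    return sequential_s / parallel_s


def efficiency(sequential_s: float, parallel_s: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 would be ideal scaling)."""
    return speedup(sequential_s, parallel_s) / workers


# Timings taken from the two %%timeit cells above.
print(round(speedup(1.08, 0.511), 2))         # 2.11
print(round(efficiency(1.08, 0.511, 10), 2))  # 0.21
```

The speedup is well below 10. Plausible reasons include the cost of creating the pool inside the timed cell, sending results back to the parent, and a machine with fewer than 10 free cores, but the notebook does not measure any of these.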

Let's also make sure that `multiprocessing.Pool` returns the expected results: one `(min, max, mean)` tuple per call.

In [11]:
with Pool(processes=10) as pool:
    results = pool.map(gen_random, [None] * 10)
results

[(4.888834341798542e-07, 0.9999993695614751, 0.500253121167094),
 (1.5193695950266317e-06, 0.9999995713653611, 0.5004648527204844),
 (8.338447909927993e-08, 0.9999982350763111, 0.49987126646962265),
 (3.533316917048168e-06, 0.9999988394522739, 0.5003074522538739),
 (6.854293412850154e-07, 0.9999992469319836, 0.5001294510917905),
 (7.25034506876554e-08, 0.9999993853360523, 0.5000197205058422),
 (2.1432668373400077e-07, 0.9999985629946924, 0.500037866675235),
 (3.241340156279193e-07, 0.9999995020233481, 0.4998192310331728),
 (3.97997433898567e-08, 0.9999970648221497, 0.49967546355687814),
 (1.1574031938410556e-06, 0.999999502338194, 0.4999944256310727)]
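One way to check such output programmatically instead of by eye is sketched below. The helper `looks_valid` and its tolerance are assumptions for this illustration; the sample tuples are copied from the first two rows of the output above.

```python
def looks_valid(result, tolerance: float = 0.01) -> bool:
    """Check one (min, max, mean) tuple built from random.random() samples."""
    low, high, mean = result
    # random.random() returns values in [0.0, 1.0), so the extremes must stay in
    # that range and the mean of many samples should be close to 0.5.
    return 0.0 <= low <= mean <= high < 1.0 and abs(mean - 0.5) < tolerance


sample = [
    (4.888834341798542e-07, 0.9999993695614751, 0.500253121167094),
    (1.5193695950266317e-06, 0.9999995713653611, 0.5004648527204844),
]
print(all(looks_valid(r) for r in sample))  # True
```

The same check can be applied to the full `results` list returned by `pool.map`.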