# [Parallel Processing in Python](https://www.machinelearningplus.com/python/parallel-processing-python/)

__Parallel processing__ is a mode of operation in which a task is executed simultaneously on multiple processors of the same computer. It is meant to reduce the overall processing time.

However, communicating between processes usually adds some overhead, which for small tasks can increase the overall time taken instead of decreasing it.

In Python, the __multiprocessing__ module is used to run independent parallel processes by using subprocesses (instead of threads). It allows you to leverage multiple processors on a machine (both Windows and Unix), and because each subprocess is independent, the processes run in completely separate memory spaces.

### Reference
1. [Using Multiprocessing to Make Python Code Faster](https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba)
2. [Python multiprocessing](https://data-flair.training/blogs/python-multiprocessing/)

### Difference between multiprocessing and multithreading
1. [GeeksforGeeks: Difference between Multiprocessing and Multithreading](https://www.geeksforgeeks.org/difference-between-multiprocessing-and-multithreading/)
2. [Difference Between Multiprocessing and Multithreading](https://techdifferences.com/difference-between-multiprocessing-and-multithreading.html)

By the end of this tutorial you will know:

- How to structure the code and understand the syntax to enable parallel processing using multiprocessing.
- How to implement synchronous and asynchronous parallel processing.
- How to parallelize a Pandas DataFrame.
- How to solve 3 different use cases with the `multiprocessing.Pool()` interface.

***

### What is the maximum number of parallel processes you can run?
The number of processes that can truly run at the same time is limited by the number of processors in your computer. You can start more processes than that, but they will then take turns on the available processors. If you don’t know how many processors the machine has, the `cpu_count()` function in multiprocessing will show it.

In [1]:
import multiprocessing as mp

In [2]:
print("Number of processors: ", mp.cpu_count())

Number of processors:  4
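
`mp.cpu_count()` reports the logical CPUs in the machine, which can be more than the number the current process is allowed to use (for example when CPU affinity is restricted with `taskset` or a cpuset). The following is a minimal sketch of checking both numbers; the helper name `usable_cpus()` is hypothetical, and it assumes `os.sched_getaffinity` exists only on some Unix platforms such as Linux.

```python
import multiprocessing as mp
import os


def usable_cpus():
    """Hypothetical helper: number of CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux and some other Unix systems.
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs (os.cpu_count() may return None).
    return os.cpu_count() or 1


print("Logical CPUs:", mp.cpu_count())
print("Usable CPUs: ", usable_cpus())
```

If the two numbers differ, sizing the pool with the smaller one is usually the safer choice.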


### What are synchronous and asynchronous execution?
In parallel processing, there are two types of execution:

1. __Synchronous execution__ is one in which the processes are completed in the same order in which they were started. This is achieved by locking the main program until the respective processes are finished.

2. __Asynchronous execution__ doesn’t involve locking. As a result, the order of results can get mixed up, but the work usually gets done quicker.

There are 2 main objects in multiprocessing to implement parallel execution of a function: The `Pool` Class and the `Process` Class.

1. Pool Class
    1. Synchronous execution
        - `Pool.map()` and `Pool.starmap()`
        - `Pool.apply()`
    2. Asynchronous execution
        - `Pool.map_async()` and `Pool.starmap_async()`
        - `Pool.apply_async()`

2. Process Class

Here we stick to the __Pool class__, because it is most convenient to use and serves most common practical applications.
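
The two execution styles can be sketched with a toy task. This is a minimal illustration, not the tutorial's own example: the `square()` function, the 2-worker pool, and the helper names `run_sync()` and `run_async()` are all assumptions made for the sketch. `Pool.map()` blocks until every result is ready, while `Pool.apply_async()` returns an `AsyncResult` handle right away and the value is fetched later with `.get()`.

```python
import multiprocessing as mp


def square(x):
    """Hypothetical toy task used only for this illustration."""
    return x * x


def run_sync(numbers):
    # map() blocks the caller until all results are ready and returns
    # them in the same order as the inputs.
    with mp.Pool(2) as pool:
        return pool.map(square, numbers)


def run_async(numbers):
    # apply_async() returns an AsyncResult immediately; the task runs in
    # the background until .get() is called on the handle.
    with mp.Pool(2) as pool:
        handles = [pool.apply_async(square, args=(n,)) for n in numbers]
        # Fetching in submission order keeps the output ordered, even if
        # the tasks finish in a different order.
        return [h.get() for h in handles]


if __name__ == "__main__":
    print(run_sync([1, 2, 3, 4]))   # [1, 4, 9, 16]
    print(run_async([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module, such as Windows.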

***

### Problem Statement: Count how many numbers fall within a given range in each row
The first problem is: Given a 2D matrix (or list of lists), count how many numbers in each row fall within a given range.



In [3]:
import numpy as np
from time import time

In [4]:
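# Prepare data
# Note: np.random.RandomState(100) builds a new generator object and discards it.
# It does not seed the global generator used by np.random.randint() below,
# so the data changes from run to run. Use np.random.seed(100) for reproducible data.
np.random.RandomState(100)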
arr = np.random.randint(0, 10, size=[200000, 5])
data = arr.tolist()
data[:5]

[[9, 2, 7, 7, 6],
 [1, 2, 7, 5, 1],
 [2, 4, 8, 5, 3],
 [4, 8, 2, 6, 6],
 [8, 6, 5, 4, 3]]

### Solution without parallelization
Let’s see how long it takes to compute it without parallelization. For this, we call the function `howmany_within_range()` (written below) on every row; it checks how many numbers lie within the range and returns the count.

In [5]:
def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers in a given `row` lie between `minimum` and `maximum` (inclusive)"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

results = []
for row in data:
    results.append(howmany_within_range(row, minimum=4, maximum=8))

print(results[:10])

[3, 2, 3, 4, 4, 3, 3, 3, 2, 3]
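
The same count can be written more compactly with a generator expression, which is handy as a cross-check on the loop version. This is a sketch and not the tutorial's own code: the function name `count_within_range()` is hypothetical, and `rows` simply reuses the first five rows printed earlier.

```python
def count_within_range(row, minimum, maximum):
    """Hypothetical compact variant of howmany_within_range()."""
    # Each comparison yields True or False; sum() counts the True values.
    return sum(minimum <= n <= maximum for n in row)


rows = [[9, 2, 7, 7, 6],
        [1, 2, 7, 5, 1],
        [2, 4, 8, 5, 3],
        [4, 8, 2, 6, 6],
        [8, 6, 5, 4, 3]]

results = [count_within_range(row, 4, 8) for row in rows]
print(results)  # [3, 2, 3, 4, 4]
```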


## 1. How to parallelize any function?

The general way to parallelize any operation is to take a particular function that should be run multiple times and make it run in parallel on different processors.

To do this, you initialize a `Pool` with n processors and pass the function you want to parallelize to one of `Pool`’s parallelization methods.

`multiprocessing.Pool()` provides the `apply()`, `map()` and `starmap()` methods to make any function run in parallel.

Nice! So what’s the difference between `apply()` and `map()`?

Both `apply()` and `map()` take the function to be parallelized as the main argument. The difference is that `apply()` takes an `args` argument holding the parameters to pass to the function being parallelized, whereas `map()` takes only one iterable as an argument.

So, `map()` is better suited to simple operations over a single iterable, and it does the job faster because `apply()` blocks until each individual call returns.

We will get to `starmap()` once we see how to parallelize the `howmany_within_range()` function with `apply()` and `map()`.

### 1.1 Parallelizing using Pool.apply()

Let’s parallelize the `howmany_within_range()` function using `multiprocessing.Pool()`.

In [6]:
import multiprocessing as mp

In [7]:
import time

In [8]:
start = time.time()
# Step 1: Init multiprocessing.Pool()
pool = mp.Pool(mp.cpu_count())
end = time.time()
end - start

0.07705473899841309

In [None]:
start = time.time()
# Step 2: `pool.apply` the `howmany_within_range()`
results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]
end = time.time()
end - start

In [None]:
# Step 3: Don't forget to close the pool
pool.close()

results[:10]
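
The open, apply, and close steps can also be written with a `with` block so the pool is always cleaned up. This is a minimal sketch under assumptions: the wrapper name `count_rows_with_apply()`, the 2-worker pool, and the three sample rows are chosen only for illustration. Note that leaving the `with` block calls `terminate()` on the pool rather than `close()`, so the results are collected inside it.

```python
import multiprocessing as mp


def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers in `row` lie between `minimum` and `maximum`."""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count


def count_rows_with_apply(rows, minimum=4, maximum=8, workers=2):
    # apply() blocks until each call returns, so the rows are processed
    # one at a time even though a worker process does the work.
    with mp.Pool(workers) as pool:
        return [pool.apply(howmany_within_range, args=(row, minimum, maximum))
                for row in rows]


if __name__ == "__main__":
    sample = [[9, 2, 7, 7, 6], [1, 2, 7, 5, 1], [2, 4, 8, 5, 3]]
    print(count_rows_with_apply(sample))  # [3, 2, 3]
```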

### 1.2 Parallelizing using Pool.map()

`Pool.map()` accepts only one iterable as an argument. So as a workaround, I modify the `howmany_within_range()` function by giving the `minimum` and `maximum` parameters default values, creating a new `howmany_within_range_rowonly()` function that can be called with just a row, so `map()` only needs the iterable list of rows as input. I know this is not a nice use case for `map()`, but it clearly shows how it differs from `apply()`.

In [None]:
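# `cube` is not defined in the original cell; this definition is an assumption.
def cube(x):
    return x ** 3

p = mp.Pool(3)

t = p.map(cube, range(5000))
print(t)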
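
The `howmany_within_range_rowonly()` workaround described in this section could look like the sketch below. It rests on assumptions: the default bounds of 4 and 8 mirror the earlier `apply()` call, and the wrapper name `count_rows_with_map()`, the 2-worker pool, and the sample rows are chosen only for illustration.

```python
import multiprocessing as mp


def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    """Variant with default bounds, so it can be called with a row alone."""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count


def count_rows_with_map(rows, workers=2):
    # map() passes each element of `rows` as the single positional
    # argument and returns the results in input order.
    with mp.Pool(workers) as pool:
        return pool.map(howmany_within_range_rowonly, rows)


if __name__ == "__main__":
    sample = [[9, 2, 7, 7, 6], [1, 2, 7, 5, 1], [2, 4, 8, 5, 3]]
    print(count_rows_with_map(sample))  # [3, 2, 3]
```

Unlike the `apply()` loop, a single `map()` call hands the whole list to the pool, which splits it among the workers.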