# Parallel processing via the `multiprocessing` module

Multi-core CPUs have become the standard in modern computer architectures. We find them not only in supercomputer facilities but also in our desktop machines at home and in our laptops; even Apple's iPhone 5S shipped with a 1.3 GHz dual-core processor in 2013.

However, the default Python interpreter (CPython) was designed with simplicity in mind and relies on a thread-safety mechanism, the so-called "GIL" (Global Interpreter Lock). In order to prevent conflicts between threads, the GIL allows only one thread to execute Python bytecode at a time, so CPU-bound Python code effectively runs serially even if it is split across several threads.

In this introduction to Python's `multiprocessing` module, we will see how we can spawn multiple subprocesses to avoid some of the GIL's disadvantages.

<br>
<br>

## Sections

- [An introduction to parallel programming using Python's `multiprocessing` module](#An-introduction-to-parallel-programming-using-Python's-`multiprocessing`-module)
    - [Multi-Threading vs. Multi-Processing](#Multi-Threading-vs.-Multi-Processing)
- [Introduction to the `multiprocessing` module](#Introduction-to-the-multiprocessing-module)
    - [The `Process` class](#The-Process-class)
        - [How to retrieve results in a particular order](#How-to-retrieve-results-in-a-particular-order)
    - [The `Pool` class](#The-Pool-class)
- [Kernel density estimation as benchmarking function](#Kernel-density-estimation-as-benchmarking-function)
    - [The Parzen-window method in a nutshell](#The-Parzen-window-method-in-a-nutshell)
    - [Sample data and `timeit` benchmarks](#Sample-data-and-timeit-benchmarks)
    - [Benchmarking functions](#Benchmarking-functions)
    - [Preparing the plotting of the results](#Preparing-the-plotting-of-the-results)
- [Results](#Results)
- [Conclusion](#Conclusion)

<br>
<br>

###  Multi-Threading vs. Multi-Processing

Depending on the application, two common approaches in parallel programming are to run code via threads or via multiple processes. If we submit "jobs" to different threads, those jobs can be pictured as "sub-tasks" of a single process, and those threads will usually have access to the same memory areas (i.e., shared memory). This approach can easily lead to conflicts in case of improper synchronization, for example, if two threads write to the same memory location at the same time.  

A safer approach (although it comes with additional overhead for the communication between separate processes) is to run multiple processes in completely separate memory locations (i.e., distributed memory): every process runs completely independently of the others.

Here, we will take a look at Python's [`multiprocessing`](https://docs.python.org/dev/library/multiprocessing.html) module and how we can use it to submit multiple processes that can run independently from each other in order to make best use of our CPU cores.

![](https://raw.githubusercontent.com/rasbt/python_reference/master/Images/multiprocessing_scheme.png)
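
The difference between shared and separate memory can be made concrete with a small sketch. The names `shared`, `append_marker`, and `run_in_thread_and_process` are hypothetical and only serve this illustration: a thread appends to a module-level list and the change is visible in the main program, whereas a child process only modifies its own copy of the list.

```python
import multiprocessing as mp
import threading

# Module-level list used only for this illustration (hypothetical name).
shared = []


def append_marker(marker):
    """Append a marker to the list as seen by the current thread or process."""
    shared.append(marker)


def run_in_thread_and_process():
    """Run append_marker once in a thread and once in a separate process."""
    shared.clear()

    # A thread runs inside the same process and sees the same list object.
    t = threading.Thread(target=append_marker, args=("thread",))
    t.start()
    t.join()

    # A child process works on its own copy; the parent's list is untouched.
    p = mp.Process(target=append_marker, args=("process",))
    p.start()
    p.join()

    return list(shared)


if __name__ == "__main__":
    print(run_in_thread_and_process())  # ['thread']
```

Only the thread's marker shows up in the parent. To get data back from a child process, we need an explicit communication channel such as a `Queue`.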

<br>
<br>

# Introduction to the `multiprocessing` module

[[back to top](#Sections)]

The [multiprocessing](https://docs.python.org/dev/library/multiprocessing.html) module in Python's Standard Library has a lot of powerful features. If you want to read about all the nitty-gritty tips, tricks, and details, I recommend using the [official documentation](https://docs.python.org/dev/library/multiprocessing.html) as an entry point.  

In the following sections, I want to provide a brief overview of different approaches to show how the `multiprocessing` module can be used for parallel programming.

<br>
<br>

### The `Process` class

[[back to top](#Sections)]

The most basic approach is probably to use the `Process` class from the `multiprocessing` module.  
Here, we will generate four random strings in parallel, using a simple function that puts its result on a `Queue`.

In [1]:
%%file rand_string_.py

import random
import string
import os

def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    print('parent process:', os.getppid())
    print('process id:', os.getpid())
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase 
                        + string.ascii_uppercase 
                        + string.digits)
                   for i in range(length))
    output.put(rand_str)

Overwriting rand_string_.py


In [2]:
import rand_string_

In [3]:
import multiprocessing as mp
import random
import string

random.seed(123)

# Define an output queue
output = mp.Queue()


# Set up a list of processes that we want to run
processes = [mp.Process(target=rand_string_.rand_string, args=(5, output))
             for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Get process results from the output queue; draining the queue before
# joining avoids a deadlock if a child is still flushing data to it
results = [output.get() for p in processes]

# Wait for the processes to finish
for p in processes:
    p.join()

print(results)

['Y3S9B', '8IAV3', 'TpKIu', 'B5A7y']


<br>
<br>

### How to retrieve results in a particular order 

[[back to top](#Sections)]

The order of the obtained results does not necessarily match the order of the processes (in the `processes` list). Since we use the `.get()` method to retrieve the results from the `Queue` sequentially, the order in which the processes put their results on the queue determines the order of our results.  
E.g., if the second process finishes just before the first process, the order of the strings in the `results` list could be
`['PQpqM', 'yzQfA', 'SHZYV', 'PSNkD']` instead of `['yzQfA', 'PQpqM', 'SHZYV', 'PSNkD']`

If our application required us to retrieve results in a particular order, one possibility would be to refer to the processes' `._identity` attribute (a private attribute, so relying on it is fragile). In this case, we can also simply pass the values from our `range` object as a position argument. The modified code would be:

In [4]:
%%file rand_string_2.py

import random
import string

def rand_string(length, pos, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase 
                        + string.ascii_uppercase 
                        + string.digits)
                   for i in range(length))
    output.put((pos, rand_str))

Writing rand_string_2.py


In [5]:
import rand_string_2

In [6]:
# Define an output queue
output = mp.Queue()

# Set up a list of processes that we want to run
processes = [mp.Process(target=rand_string_2.rand_string, args=(5, x, output))
             for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Get process results from the output queue; draining the queue before
# joining avoids a deadlock if a child is still flushing data to it
results = [output.get() for p in processes]

# Wait for the processes to finish
for p in processes:
    p.join()

print(results)

[(0, 'uCIwk'), (2, 'ELkQg'), (1, 'wAM9T'), (3, 'uRmzN')]


The retrieved results are now `(position, string)` tuples, for example, `[(0, 'KAQo6'), (1, '5lUya'), (2, 'nj6Q0'), (3, 'QQvLr')]`   
or `[(1, '5lUya'), (3, 'QQvLr'), (0, 'KAQo6'), (2, 'nj6Q0')]`

To make sure that we retrieved the results in order, we could simply sort the results and optionally get rid of the position argument:

In [7]:
results.sort()
results = [r[1] for r in results]
print(results)

['uCIwk', 'wAM9T', 'ELkQg', 'uRmzN']


**A simpler way to maintain an ordered list of results is to use the `Pool.apply` and `Pool.map` methods, which we will discuss in the next section.**

<br>
<br>

### The `Pool` class

[[back to top](#Sections)]

Another and more convenient approach for simple parallel processing tasks is provided by the `Pool` class.  

There are four methods that are particularly interesting:

- `Pool.apply`
- `Pool.map`
- `Pool.apply_async`
- `Pool.map_async`
    
The `Pool.apply` and `Pool.map` methods are basically equivalents of Python's built-in [`apply`](https://docs.python.org/2/library/functions.html#apply) (Python 2 only; it was removed in Python 3) and [`map`](https://docs.python.org/2/library/functions.html#map) functions.

Before we come to the `async` variants of the `Pool` methods, let us take a look at a simple example using `Pool.apply` and `Pool.map`. Here, we will set the number of processes to 4, which means that the `Pool` will start 4 worker processes, so at most 4 tasks run at the same time.

In [8]:
%%file cube_.py

def cube(x):
    return x**3

Writing cube_.py


In [9]:
import cube_

In [11]:
# The context manager shuts down the worker processes when the block ends
with mp.Pool(processes=4) as pool:
    results = pool.map(cube_.cube, range(1,7))
print(results)

[1, 8, 27, 64, 125, 216]
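
The cell above uses `Pool.map`. For comparison, here is a minimal sketch of the same computation with `Pool.apply`, which submits one call at a time and blocks until that call returns. To keep the block self-contained, `cube` is redefined locally instead of being imported from `cube_.py`, and the helper name `cubes_with_apply` is hypothetical.

```python
import multiprocessing as mp


def cube(x):
    return x**3


def cubes_with_apply(values, processes=4):
    """Compute cubes with Pool.apply, one blocking call per value."""
    with mp.Pool(processes=processes) as pool:
        # Each pool.apply call runs in a worker process and blocks until
        # its result is available, so the results come back in input order.
        return [pool.apply(cube, args=(x,)) for x in values]


if __name__ == "__main__":
    print(cubes_with_apply(range(1, 7)))  # [1, 8, 27, 64, 125, 216]
```

Because every `Pool.apply` call waits for its own result, the calls in this list comprehension run one after another. `Pool.map`, by contrast, distributes the whole input across the workers.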


The `Pool.map` and `Pool.apply` methods block the main program until the results are ready, which is quite useful if we want to obtain results in a particular order for certain applications.   
In contrast, the `async` variants submit all tasks at once and return immediately; the results can be retrieved once the tasks have finished. 
One more difference is that we need to call the `get` method on the object returned by `apply_async()` in order to obtain the `return` values of the finished tasks.

In [12]:
with mp.Pool(processes=4) as pool:
    results = [pool.apply_async(cube_.cube, args=(x,)) for x in range(1,7)]
    # .get() blocks until the corresponding result is ready
    output = [p.get() for p in results]
print(output)

[1, 8, 27, 64, 125, 216]
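
The fourth method from the list, `Pool.map_async`, follows the same pattern. A minimal sketch, assuming a locally defined `cube` function (instead of the one from `cube_.py`), a hypothetical helper name `cubes_with_map_async`, and an arbitrarily chosen timeout of 10 seconds:

```python
import multiprocessing as mp


def cube(x):
    return x**3


def cubes_with_map_async(values, processes=4, timeout=10):
    """Compute cubes with Pool.map_async and collect the results via get()."""
    with mp.Pool(processes=processes) as pool:
        # map_async returns immediately with a result handle ...
        async_result = pool.map_async(cube, values)
        # ... so the main program could do other work here.
        # get() blocks until all results are ready (or the timeout expires).
        return async_result.get(timeout=timeout)


if __name__ == "__main__":
    print(cubes_with_map_async(range(1, 7)))  # [1, 8, 27, 64, 125, 216]
```

As with `Pool.map`, the list returned by `get()` follows the order of the inputs, regardless of which worker finished first.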


<br>
<br>