# Multiprocessing

Sources:
- [multiprocessing — Manage Processes Like Threads](https://pymotw.com/3/multiprocessing/index.html)
- [Multiprocessing in Python](https://www.educative.io/courses/python-201-interactively-learn-advanced-concepts-in-python-3/gx2qKN97D1j)

The multiprocessing module was added to Python in version 2.6. It was originally defined in PEP 371 by Jesse Noller and Richard Oudkerk. The multiprocessing module allows you to spawn processes in much the same manner as you can spawn threads with the threading module. The idea here is that because you are now spawning processes, you can avoid the Global Interpreter Lock (GIL) and take full advantage of multiple processors on a machine.

The multiprocessing package also includes some APIs that are not in the threading module at all. For example, there is a neat Pool class that you can use to parallelize executing a function across multiple inputs. We will be looking at Pool in a later section. We will start with the multiprocessing module’s Process class.

## Getting Started With Multiprocessing

The simplest way to spawn a second process is to instantiate a Process object with a target function and call start() to let it begin working.

In [4]:
# multiprocessing_simple.py
import multiprocessing

def worker():
    """worker function"""
    print('Worker')


if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

Worker
Worker
Worker
Worker
Worker


The output includes the word “Worker” printed five times, although it may not come out entirely clean, depending on the order of execution, because each process is competing for access to the output stream.

It is usually more useful to be able to spawn a process with arguments to tell it what work to do. Unlike with threading, arguments passed to a multiprocessing Process must be serializable with pickle. This example passes each worker a number to be printed.

In [5]:
# multiprocessing_simpleargs.py
import multiprocessing


def worker(num):
    """thread worker function"""
    print('Worker:', num)


if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

Worker: 0
Worker: 2
Worker: 1
Worker: 4
Worker: 3
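
Whether an argument can be handed to a child process can be checked up front by trying to pickle it. The sketch below is an illustration only: `is_picklable` is a hypothetical helper name, not part of multiprocessing. With the fork start method the arguments are inherited by the child rather than pickled, so the restriction mainly matters with spawn and forkserver.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    # Plain data such as numbers, strings, tuples and lists pickles fine.
    print(is_picklable((1, 'two', [3.0])))
    # Functions are pickled by name, so an anonymous lambda cannot be found again.
    print(is_picklable(lambda x: x * 2))
```

A check like this is cheap to run before starting workers and gives a clearer error location than a failure inside `Process.start()`.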


The Process class is very similar to the threading module’s Thread class. Let’s try creating a series of processes that call the same function and see how that works:

In [1]:
# example_get_pid.py
import os

from multiprocessing import Process


def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc = os.getpid()
    print(f'{number} doubled to {result} by process id: {proc}')

if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []

    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()

    for proc in procs:
        proc.join()

10 doubled to 20 by process id: 64
5 doubled to 10 by process id: 63
20 doubled to 40 by process id: 70
15 doubled to 30 by process id: 67
25 doubled to 50 by process id: 73


> One difference between the threading and multiprocessing examples is the extra protection for `__main__` used in the multiprocessing examples. Due to the way the new processes are started, **the child process needs to be able to import the script containing the target function**. Wrapping the main part of the application in a check for `__main__` **ensures that it is not run recursively in each child as the module is imported**. Another approach is to import the target function from a separate script.

Depending on the platform, multiprocessing supports three ways to start a process. These start methods are:
- `spawn`: The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver. *Available on Unix and Windows. The default on Windows and macOS.*
- `fork`: The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. *Available on Unix only. The default on Unix.*
- `forkserver`:  When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited. *Available on Unix platforms which support passing file descriptors over Unix pipes.*
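
A start method can also be chosen explicitly instead of relying on the platform default. The following is a minimal sketch: `pick_start_method` and `greet` are hypothetical names introduced for illustration, while `get_all_start_methods()` and `get_context()` are functions of the multiprocessing module.

```python
import multiprocessing


def greet(name):
    """Runs in the child process."""
    proc_name = multiprocessing.current_process().name
    print(f'Hello {name} from {proc_name}')


def pick_start_method(preferred=('forkserver', 'spawn')):
    """Return the first preferred start method this platform supports."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    # Fall back to whatever the platform lists first.
    return available[0]


if __name__ == '__main__':
    method = pick_start_method()
    # get_context() returns an object with the same API as the
    # multiprocessing module, bound to a single start method.
    ctx = multiprocessing.get_context(method)
    p = ctx.Process(target=greet, args=('world',))
    p.start()
    p.join()
    print('start method used:', method)
```

Using a context object keeps the choice local to one part of the program, whereas `multiprocessing.set_start_method()` changes it globally and may be called only once.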

For this example, we import Process and create a doubler function. Inside the function, we double the number that was passed in. We also use Python’s os module to get the current process’s ID (or pid). This will tell us which process is calling the function. Then in the block of code at the bottom, we create a series of Processes and start them. The very last loop just calls the `join()` method on each process, which tells Python to wait for the process to terminate. If you need to stop a process, you can call its `terminate()` method.

<hr>

Passing arguments to identify or name the process is cumbersome, and unnecessary. Each Process instance has a name with a default value that can be changed as the process is created. Naming processes is useful for keeping track of them, especially in applications with multiple types of processes running simultaneously.

Sometimes it’s nicer to have a more human readable name for your process though. Fortunately, the Process class does allow you to access the name of your process. Let’s take a look:

In [3]:
# multiprocessing_names.py
import os

from multiprocessing import Process, current_process


def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc_name = current_process().name
    print(f'{number} doubled to {result} by: {proc_name}')

if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []

    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()

    proc = Process(target=doubler, name='Test', args=(2,))
    proc.start()
    procs.append(proc)

    for proc in procs:
        proc.join()

10 doubled to 20 by: Process-15
5 doubled to 10 by: Process-14
15 doubled to 30 by: Process-16
20 doubled to 40 by: Process-17
25 doubled to 50 by: Process-18
2 doubled to 4 by: Test


This time around, we import something extra: current_process. The current_process function is basically the same thing as the threading module’s current_thread. We use it to grab the name of the process that is calling our function. You will note that for the first five processes, we don’t set a name. Then for the sixth, we set the process name to "Test".

The output demonstrates that the multiprocessing module assigns a number to each process as a part of its name by default. Of course, when we specify a name, a number isn’t going to get added to it.

## Daemon Processes

By default, the main program will not exit until all of the children have exited. There are times when starting a background process that runs without blocking the main program from exiting is useful, such as in services where there may not be an easy way to interrupt the worker, or where letting it die in the middle of its work does not lose or corrupt data (for example, a task that generates “heart beats” for a service monitoring tool).

To mark a process as a daemon, set its daemon attribute to True. The default is for processes to not be daemons.

In [None]:
# multiprocessing_daemon.py
import multiprocessing
import time
import sys


def daemon():
    p = multiprocessing.current_process()
    print('Starting:', p.name, p.pid)
    sys.stdout.flush()
    time.sleep(2)
    print('Exiting :', p.name, p.pid)
    sys.stdout.flush()


def non_daemon():
    p = multiprocessing.current_process()
    print('Starting:', p.name, p.pid)
    sys.stdout.flush()
    print('Exiting :', p.name, p.pid)
    sys.stdout.flush()


if __name__ == '__main__':
    d = multiprocessing.Process(
        name='daemon',
        target=daemon,
    )
    d.daemon = True

    n = multiprocessing.Process(
        name='non-daemon',
        target=non_daemon,
    )
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()

The output does not include the “Exiting” message from the daemon process, since all of the non-daemon processes (including the main program) exit before the daemon process wakes up from its two second sleep.

The daemon process is terminated automatically before the main program exits, which avoids leaving orphaned processes running. This can be verified by looking for the process id value printed when the program runs, and then checking for that process with a command like ps.

## Waiting for Processes

To wait until a process has completed its work and exited, use the join() method.

In [None]:
# multiprocessing_daemon_join.py
import multiprocessing
import time
import sys


def daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    time.sleep(2)
    print('Exiting :', name)


def non_daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    print('Exiting :', name)


if __name__ == '__main__':
    d = multiprocessing.Process(
        name='daemon',
        target=daemon,
    )
    d.daemon = True

    n = multiprocessing.Process(
        name='non-daemon',
        target=non_daemon,
    )
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()

    d.join()
    n.join()

Since the main process waits for the daemon to exit using join(), the “Exiting” message is printed this time.

By default, join() blocks indefinitely. It is also possible to pass a timeout argument (a float representing the number of seconds to wait for the process to become inactive). If the process does not complete within the timeout period, join() returns anyway.

In [None]:
# multiprocessing_daemon_join_timeout.py
import multiprocessing
import time
import sys


def daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    time.sleep(2)
    print('Exiting :', name)


def non_daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    print('Exiting :', name)


if __name__ == '__main__':
    d = multiprocessing.Process(
        name='daemon',
        target=daemon,
    )
    d.daemon = True

    n = multiprocessing.Process(
        name='non-daemon',
        target=non_daemon,
    )
    n.daemon = False

    d.start()
    n.start()

    d.join(1)
    print('d.is_alive()', d.is_alive())
    n.join()

Since the timeout passed is less than the amount of time the daemon sleeps, the process is still “alive” after join() returns.

## Terminating Processes

Although it is better to use the poison pill method of signaling to a process that it should exit (see Passing Messages to Processes, later in this chapter), if a process appears hung or deadlocked it can be useful to be able to kill it forcibly. Calling terminate() on a process object kills the child process.

In [None]:
# multiprocessing_terminate.py
import multiprocessing
import time


def slow_worker():
    print('Starting worker')
    time.sleep(0.1)
    print('Finished worker')


if __name__ == '__main__':
    p = multiprocessing.Process(target=slow_worker)
    print('BEFORE:', p, p.is_alive())

    p.start()
    print('DURING:', p, p.is_alive())

    p.terminate()
    print('TERMINATED:', p, p.is_alive())

    p.join()
    print('JOINED:', p, p.is_alive())

It is important to join() the process after terminating it in order to give the process management code time to update the status of the object to reflect the termination.

> Terminate the process. On Unix this is done using the SIGTERM signal; on Windows TerminateProcess() is used. Note that exit handlers and finally clauses, etc., will not be executed.
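
Waiting with a timeout and forcible termination can be combined: give the child a chance to finish, and only kill it if it is still alive afterwards. This is a sketch under stated assumptions; `stop_process` and `stubborn_worker` are hypothetical names, and the timeout values are arbitrary.

```python
import multiprocessing
import time


def stubborn_worker():
    """Pretend to be stuck for a long time."""
    time.sleep(60)


def stop_process(proc, timeout=1.0):
    """Wait up to `timeout` seconds for proc, then terminate it.

    Returns True if the process had to be terminated forcibly.
    """
    proc.join(timeout)
    if not proc.is_alive():
        return False
    proc.terminate()
    # join() again so the process object records the final exit status.
    proc.join()
    return True


if __name__ == '__main__':
    p = multiprocessing.Process(target=stubborn_worker)
    p.start()
    forced = stop_process(p, timeout=0.5)
    print('forced:', forced, 'exitcode:', p.exitcode)
```

Because terminate() skips exit handlers and finally clauses in the child, this pattern fits best for workers that hold no state that must be cleaned up.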

## Process Exit Status

The status code produced when the process exits can be accessed via the exitcode attribute. The ranges allowed are listed in the table below.

<table border="1" class="docutils" id="id10">
<caption><span class="caption-text">Multiprocessing Exit Codes</span><a class="headerlink" href="#id10" title="Permalink to this table"></a></caption>
<colgroup>
<col width="14%">
<col width="86%">
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Exit Code</th>
<th class="head">Meaning</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">==</span> <span class="pre">0</span></code></td>
<td>no error was produced</td>
</tr>
<tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">&gt;</span> <span class="pre">0</span></code></td>
<td>the process had an error, and exited with that code</td>
</tr>
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">&lt;</span> <span class="pre">0</span></code></td>
<td>the process was killed with a signal of <code class="docutils literal notranslate"><span class="pre">-1</span> <span class="pre">*</span> <span class="pre">exitcode</span></code></td>
</tr>
</tbody>
</table>

In [6]:
# multiprocessing_exitcode.py
import multiprocessing
import sys
import time


def exit_error():
    sys.exit(1)


def exit_ok():
    return


def return_value():
    return 1


def raises():
    raise RuntimeError('There was an error!')


def terminated():
    time.sleep(3)


if __name__ == '__main__':
    jobs = []
    funcs = [
        exit_error,
        exit_ok,
        return_value,
        raises,
        terminated,
    ]
    for f in funcs:
        print('Starting process for', f.__name__)
        j = multiprocessing.Process(target=f, name=f.__name__)
        jobs.append(j)
        j.start()

    jobs[-1].terminate()

    for j in jobs:
        j.join()
        print(f'{j.name}.exitcode = {j.exitcode}')

Starting process for exit_error
Starting process for exit_ok
Starting process for return_value
Starting process for raises
Starting process for terminated


Process raises:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-6-f36c71fbced5>", line 20, in raises
    raise RuntimeError('There was an error!')
RuntimeError: There was an error!


exit_error.exitcode = 1
exit_ok.exitcode = 0
return_value.exitcode = 0
raises.exitcode = 1
terminated.exitcode = -15


Processes that raise an exception automatically get an exitcode of 1.
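
The three ranges from the table can be turned into a readable message. The helper below is a small sketch: `describe_exitcode` is a hypothetical name, and the signal lookup uses the standard library's `signal.Signals` enum.

```python
import signal


def describe_exitcode(exitcode):
    """Translate a Process.exitcode value into a short description."""
    if exitcode is None:
        # exitcode stays None until the process has terminated.
        return 'still running'
    if exitcode == 0:
        return 'exited cleanly'
    if exitcode > 0:
        return f'exited with error code {exitcode}'
    # Negative values mean "killed by signal number -exitcode".
    try:
        name = signal.Signals(-exitcode).name
    except ValueError:
        name = f'signal {-exitcode}'
    return f'killed by {name}'


if __name__ == '__main__':
    for code in (None, 0, 1, -15):
        print(code, '->', describe_exitcode(code))
```

Applied to the output above, the `-15` reported for the terminated process corresponds to SIGTERM.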

## Logging

When debugging concurrency issues, it can be useful to have access to the internals of the objects provided by multiprocessing. There is a convenient module-level function to enable logging called `log_to_stderr()`. It sets up a logger object using logging and adds a handler so that log messages are sent to the standard error channel.

In [None]:
# multiprocessing_log_to_stderr.py
import multiprocessing
import logging
import sys


def worker():
    print('Doing some work')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr(logging.DEBUG)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

By default, the logging level is set to NOTSET so no messages are produced. Pass a different level to initialize the logger to the level of detail desired.

To manipulate the logger directly (change its level setting or add handlers), use `get_logger()`.

Logging processes is a little different than logging threads. The reason for this is that Python’s logging package doesn’t use process-shared locks, so it’s possible for you to end up with messages from different processes getting mixed up. Let’s try adding basic logging to the previous example. Here’s the code:

In [7]:
# multiprocessing_get_logger.py
import multiprocessing
import logging
import sys


def worker():
    print('Doing some work')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr()
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

[INFO/Process-22] child process calling self.run()


Doing some work


[INFO/Process-22] process shutting down
[INFO/Process-22] process exiting with exitcode 0


The logger can also be configured through the logging configuration file API, using the name “multiprocessing”.

## Process Communication

When it comes to communicating between processes, the multiprocessing module has two primary methods: Queues and Pipes. The Queue implementation is actually both thread and process safe. Let’s take a look at a fairly simple example that’s based on the Queue code from the previous chapter:

In [3]:
# multiprocessing_process_communication.py
from multiprocessing import Process, Queue

sentinel = -1

def creator(data, q):
    """
    Creates data to be consumed and waits for the consumer
    to finish processing
    """
    print('Creating data and putting it on the queue')
    for item in data:
        q.put(item)

def my_consumer(q):
    """
    Consumes some data and works on it

    In this case, all it does is double the input
    """
    while True:
        data = q.get()
        print('data found to be processed: {}'.format(data))
        processed = data * 2
        print(processed)

        if data == sentinel:
            break

if __name__ == '__main__':
    q = Queue()
    data = [5, 10, 13, -1]
    process_one = Process(target=creator, args=(data, q))
    process_two = Process(target=my_consumer, args=(q,))
    process_one.start()
    process_two.start()

    q.close()
    q.join_thread()

    process_one.join()
    process_two.join()

Creating data and putting it on the queue
data found to be processed: 5
10
data found to be processed: 10
20
data found to be processed: 13
26
data found to be processed: -1
-2


Here we just need to import Queue and Process. Then we define two functions, one to create data and add it to the queue and the second to consume the data and process it. Adding data to the Queue is done by using the Queue’s put() method whereas getting data from the Queue is done via the get() method. The last chunk of code just creates the Queue object and a couple of Processes and then runs them. You will note that we call join() on our process objects rather than the Queue itself.
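
Pipes are the other communication primitive mentioned at the start of this section. The following is a minimal sketch of a two-way exchange over `multiprocessing.Pipe()`; the function name `square_worker` and the use of `None` as a stop value are assumptions made for the illustration.

```python
from multiprocessing import Pipe, Process


def square_worker(conn):
    """Receive numbers until None arrives and send back each square."""
    while True:
        value = conn.recv()
        if value is None:
            break
        conn.send(value * value)
    conn.close()


if __name__ == '__main__':
    # Pipe() returns the two ends of a duplex connection.
    parent_conn, child_conn = Pipe()
    p = Process(target=square_worker, args=(child_conn,))
    p.start()
    # The parent no longer needs its copy of the child's end.
    child_conn.close()

    for n in (2, 3, 4):
        parent_conn.send(n)
        print(n, 'squared is', parent_conn.recv())

    parent_conn.send(None)  # ask the worker to stop
    p.join()
```

A Pipe connects exactly two endpoints, so it suits a single parent and child pair; with several producers or consumers a Queue is usually the simpler choice.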

## Passing Messages to Processes

As with threads, a common use pattern for multiple processes is to divide a job up among several workers to run in parallel. Effective use of multiple processes usually requires some communication between them, so that work can be divided and results can be aggregated. A simple way to communicate between processes with multiprocessing is to use a Queue to pass messages back and forth. Any object that can be serialized with pickle can pass through a Queue.

In [None]:
# multiprocessing_queue.py
import multiprocessing


class MyFancyClass:

    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print(f'Doing something fancy in {proc_name} for {self.name}!')

def worker(q):
    obj = q.get()
    obj.do_something()


if __name__ == '__main__':
    queue = multiprocessing.Queue()

    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()

    queue.put(MyFancyClass('Fancy Dan'))

    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()

This short example only passes a single message to a single worker, then the main process waits for the worker to finish.

A more complex example shows how to manage several workers consuming data from a [JoinableQueue](https://docs.python.org/3.8/library/multiprocessing.html#multiprocessing.JoinableQueue) and passing results back to the parent process. The poison pill technique is used to stop the workers. After setting up the real tasks, the main program adds one “stop” value per worker to the job queue. When a worker encounters the special value, it breaks out of its processing loop. The main process uses the task queue’s join() method to wait for all of the tasks to finish before processing the results.

In [None]:
# multiprocessing_producer_consumer.py
import multiprocessing
import time


class Consumer(multiprocessing.Process):
    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                # Poison pill means shutdown
                print(f'{proc_name}: Exiting')
                self.task_queue.task_done()
                break
            print(f'{proc_name}: {next_task}')
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)


class Task:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self):
        time.sleep(0.1)  # pretend to take time to do the work
        return '{self.a} * {self.b} = {product}'.format(
            self=self, product=self.a * self.b)

    def __str__(self):
        return '{self.a} * {self.b}'.format(self=self)


if __name__ == '__main__':
    # Establish communication queues
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    # Start consumers
    num_consumers = multiprocessing.cpu_count() * 2
    print(f'Creating {num_consumers} consumers')
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for w in consumers:
        w.start()

    # Enqueue jobs
    num_jobs = 10
    for i in range(num_jobs):
        tasks.put(Task(i, i))

    # Add a poison pill for each consumer
    for i in range(num_consumers):
        tasks.put(None)

    # Wait for all of the tasks to finish
    tasks.join()

    # Start printing results
    while num_jobs:
        result = results.get()
        print('Result:', result)
        num_jobs -= 1

Although the jobs enter the queue in order, their execution is parallelized so there is no guarantee about the order they will be completed.

## Locks - Controlling Access to Resources

In situations when a single resource needs to be shared between multiple processes, a Lock can be used to avoid conflicting accesses.

In [None]:
# multiprocessing_nolock.py
import multiprocessing
import sys

def worker_with(stream):
    stream.write('Lock acquired via with\n')


def worker_no_with(stream):
    stream.write('Lock acquired directly\n')
        
        
if __name__ == '__main__':

    w = multiprocessing.Process(
        target=worker_with,
        args=(sys.stdout,),
    )
    nw = multiprocessing.Process(
        target=worker_no_with,
        args=(sys.stdout,),
    )

    w.start()
    nw.start()

    w.join()
    nw.join()

In [None]:
# multiprocessing_lock.py
import multiprocessing
import sys

def worker_with(lock, stream):
    with lock:
        stream.write('Lock acquired via with\n')


def worker_no_with(lock, stream):
    lock.acquire()
    try:
        stream.write('Lock acquired directly\n')
    finally:
        lock.release()
        
        
if __name__ == '__main__':
    lock = multiprocessing.Lock()
    w = multiprocessing.Process(
        target=worker_with,
        args=(lock, sys.stdout),
    )
    nw = multiprocessing.Process(
        target=worker_no_with,
        args=(lock, sys.stdout),
    )

    w.start()
    nw.start()

    w.join()
    nw.join()

In this example, the messages printed to the console may be jumbled together if the two processes do not synchronize their access of the output stream with the lock.
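
Output streams are not the only shared resource a lock can protect. The sketch below guards a counter in shared memory; `add_many` is a hypothetical name, and the counter is created with `lock=False` so that the explicit `Lock` is the only synchronization in play.

```python
import multiprocessing


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked step each."""
    for _ in range(times):
        with lock:
            # Read-modify-write is not atomic, so it needs the lock.
            counter.value += 1


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    # 'i' is a C int; lock=False returns the raw shared value
    # without its own built-in lock.
    counter = multiprocessing.Value('i', 0, lock=False)

    workers = [
        multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    print('final count:', counter.value)
```

With the lock in place the four workers should add up to 4000; removing the `with lock:` line makes lost updates possible.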

## Process Pools

https://docs.python.org/3.8/library/multiprocessing.html#module-multiprocessing.pool

The Pool class can be used to manage a fixed number of workers for simple cases where the work to be done can be broken up and distributed between workers independently. The return values from the jobs are collected and returned as a list. The pool arguments include the number of processes and a function to run when starting the task process (invoked once per child).

In [None]:
# multiprocessing_pool.py
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('Starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    print('Input   :', inputs)

    builtin_outputs = list(map(do_calculation, inputs))
    print('Built-in:', builtin_outputs)

    pool_size = multiprocessing.cpu_count()
    
    with multiprocessing.Pool(processes=pool_size, initializer=start_process,) as pool:
        pool_outputs = pool.map(do_calculation, inputs)

    print('Pool    :', pool_outputs)

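The result of the map() method is functionally equivalent to the built-in map(), except that individual tasks run in parallel and the results come back as a list. Because map() blocks until every result is ready, it is safe for the with block to shut the pool down on exit (the context manager calls terminate()). When a pool is managed without a with block, close() and join() can be used to synchronize the main process with the task processes and ensure proper cleanup.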
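
For comparison, this is roughly what explicit pool management with close() and join() looks like. It is a sketch rather than a recommendation; `square` and `squares_in_pool` are hypothetical names.

```python
import multiprocessing


def square(x):
    return x * x


def squares_in_pool(values, processes=2):
    """Map square() over values using an explicitly managed Pool."""
    pool = multiprocessing.Pool(processes=processes)
    try:
        results = pool.map(square, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == '__main__':
    print(squares_in_pool(range(5)))
```

close() must be called before join(); it lets the workers finish outstanding tasks and exit, whereas terminate() stops them immediately.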

You can also **get the result of your process** in a pool by using the apply_async method:

In [8]:
# multiprocessing_pool_get_result.py
from multiprocessing import Pool


def doubler(number):
    return number * 2

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        result = pool.apply_async(doubler, (25,))
        print(result.get(timeout=1))

[INFO/ForkPoolWorker-23] child process calling self.run()
[INFO/ForkPoolWorker-24] child process calling self.run()


50


What this allows us to do is actually ask for the result of the process. That is what the get function is all about. It tries to get our result. You will note that we also have a timeout set just in case something happened to the function we were calling. We don’t want it to block indefinitely after all.

## concurrent.futures - ProcessPoolExecutor

https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

The `__main__` module must be importable by worker subprocesses. This means that ProcessPoolExecutor will not work in the interactive interpreter.

Calling Executor or Future methods from a callable submitted to a ProcessPoolExecutor will result in deadlock.

`class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=())`

An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine. If max_workers is lower or equal to 0, then a ValueError will be raised. On Windows, max_workers must be equal or lower than 61. If it is not then ValueError will be raised. If max_workers is None, then the default chosen will be at most 61, even if more processors are available. mp_context can be a multiprocessing context or None. It will be used to launch the workers. If mp_context is None or not given, the default multiprocessing context is used.

initializer is an optional callable that is called at the start of each worker process; initargs is a tuple of arguments passed to the initializer. Should initializer raise an exception, all currently pending jobs will raise a BrokenProcessPool, as well any attempt to submit more jobs to the pool.

Changed in version 3.3: When one of the worker processes terminates abruptly, a BrokenProcessPool error is now raised. Previously, behaviour was undefined but operations on the executor or its futures would often freeze or deadlock.

Changed in version 3.7: The mp_context argument was added to allow users to control the start_method for worker processes created by the pool. The initializer and initargs arguments were also added.
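
The constructor arguments described above can be combined as in the following sketch. The names `init_worker`, `tag` and `_prefix` are hypothetical, and choosing the spawn context here is only for illustration.

```python
import concurrent.futures
import multiprocessing

# Per-process state, filled in by the initializer in each worker.
_prefix = None


def init_worker(prefix):
    """Called once at the start of every worker process."""
    global _prefix
    _prefix = prefix


def tag(value):
    """Use the state that the initializer prepared."""
    return f'{_prefix}:{value}'


if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=2,
        mp_context=ctx,
        initializer=init_worker,
        initargs=('job',),
    ) as executor:
        print(list(executor.map(tag, range(3))))
```

Run as a script, this should print `['job:0', 'job:1', 'job:2']`, since `executor.map()` returns results in input order. An initializer is a convenient place for per-worker setup such as opening a connection or loading a model once.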

<hr>

For example, say you want to do something computationally intensive with Python and
utilize multiple CPU cores. I’ll use an implementation of finding the greatest common
divisor of two numbers as a proxy for a more computationally intense algorithm, like
simulating fluid dynamics with the Navier-Stokes equation.

In [1]:
def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

Running this function in serial takes a linearly increasing amount of time because there is
no parallelism.

In [6]:
from time import time

numbers = [(1963309, 2265973), (2030677, 3814172), (1551645, 2229620), (2039045, 2020802)]

start = time()
results = list(map(gcd, numbers))
end = time()

print(f'Took {(end - start):.3f} seconds.')

Took 1.510 seconds.


Running this code on multiple Python threads will yield no speed improvement because
the GIL prevents Python from using multiple CPU cores in parallel. Here, I do the same
computation as above using the concurrent.futures module with its
ThreadPoolExecutor class and two worker threads (to match the number of CPU
cores on my computer):

In [8]:
import concurrent.futures

start = time()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(gcd, numbers))

end = time()
print(f'Took {(end - start):.3f} seconds.')

Took 1.525 seconds.


It’s even slower this time because of the overhead of starting and communicating with the
pool of threads.

Now for the surprising part: By changing a single line of code, something magical
happens. If I replace the ThreadPoolExecutor with the ProcessPoolExecutor
from the concurrent.futures module, everything speeds up.

In [9]:
import concurrent.futures

start = time()
with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(gcd, numbers))

end = time()
print(f'Took {(end - start):.3f} seconds.')

Took 0.978 seconds.


Running on my dual-core machine, it’s significantly faster! How is this possible? Here’s
what the ProcessPoolExecutor class actually does (via the low-level constructs
provided by the multiprocessing module):

1. It takes each item from the numbers input data to map.
2. It serializes it into binary data using the pickle module (see Item 44: “Make
pickle Reliable with copyreg”).
3. It copies the serialized data from the main interpreter process to a child interpreter
process over a local socket.
4. Next, it deserializes the data back into Python objects using pickle in the child
process.
5. It then imports the Python module containing the gcd function.
6. It runs the function on the input data in parallel with other child processes.
7. It serializes the result back into bytes.
8. It copies those bytes back through the socket.
9. It deserializes the bytes back into Python objects in the parent process.
10. Finally, it merges the results from multiple children into a single list to return.
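
The serialization steps in the list above can be mimicked inside a single process to see where the overhead comes from. This is only a simulation under stated assumptions: `simulate_round_trip` is a hypothetical helper, no socket or child process is involved, and the real ProcessPoolExecutor internals are more involved.

```python
import pickle


def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i


def simulate_round_trip(func, item):
    """Walk one input item through the pickle steps of the list above."""
    payload = pickle.dumps(item)        # step 2: serialize the input
    # step 3 would copy `payload` to the child over a local socket
    received = pickle.loads(payload)    # step 4: deserialize in the child
    result = func(received)             # step 6: run the function
    reply = pickle.dumps(result)        # step 7: serialize the result
    # step 8 would copy `reply` back through the socket
    return pickle.loads(reply)          # step 9: deserialize in the parent


if __name__ == '__main__':
    pairs = [(12, 18), (7, 21)]
    # step 10: merge the per-item results into one list
    print([simulate_round_trip(gcd, pair) for pair in pairs])
```

Every item pays for two pickle round trips, which is why the approach works best when the computation per item is large compared to the data moved.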

Although it looks simple to the programmer, the multiprocessing module and
ProcessPoolExecutor class do a huge amount of work to make parallelism possible.
In most other languages, the only touch point you need to coordinate two threads is a
single lock or atomic operation. The overhead of using multiprocessing is high
because of all of the serialization and deserialization that must happen between the parent
and child processes.

This scheme is well suited to certain types of isolated, high-leverage tasks. By isolated, I
mean functions that don’t need to share state with other parts of the program. By high-leverage,
I mean **situations in which only a small amount of data must be transferred
between the parent and child processes to enable a large amount of computation**. The
greatest common divisor algorithm is one example of this, but many other
mathematical algorithms work similarly.

If your computation doesn’t have these characteristics, then the overhead of
multiprocessing may prevent it from speeding up your program through
parallelization. When that happens, multiprocessing provides more advanced
facilities for shared memory, cross-process locks, queues, and proxies. But all of these
features are very complex. It’s hard enough to reason about such tools in the memory
space of a single process shared between Python threads. Extending that complexity to
other processes and involving sockets makes this much more difficult to understand.

I suggest avoiding all parts of multiprocessing and using these features via the
simpler concurrent.futures module. You can start by using the ThreadPoolExecutor class to run isolated, high-leverage functions in threads. Later, you can move to the ProcessPoolExecutor to get a speedup. Finally, once you’ve
completely exhausted the other options, you can consider using the multiprocessing
module directly.
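
Because both executors share the same interface, moving from threads to processes can be reduced to swapping one class. The sketch below assumes a hypothetical helper named `run_with` and reuses the gcd function from earlier with small inputs.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i


def run_with(executor_cls, func, items, max_workers=2):
    """Map func over items using whichever Executor class is passed in."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


if __name__ == '__main__':
    pairs = [(12, 18), (7, 21), (100, 75)]
    # Start with threads, then switch the class to move to processes.
    print(run_with(ThreadPoolExecutor, gcd, pairs))
    print(run_with(ProcessPoolExecutor, gcd, pairs))
```

Keeping the executor class as a parameter makes it easy to time both variants on real inputs before committing to one.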

### ProcessPoolExecutor Example

In [9]:
# process_pool_executor_ex.py
import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print(f'{number} is prime: {prime}')

if __name__ == '__main__':
    main()

[INFO/ForkProcess-25] child process calling self.run()
[INFO/ForkProcess-26] child process calling self.run()


112272535095293 is prime: True
112582705942171 is prime: True
112272535095293 is prime: True
115280095190773 is prime: True


[INFO/ForkProcess-25] process shutting down
[INFO/ForkProcess-26] process shutting down
[INFO/ForkProcess-25] process exiting with exitcode 0
[INFO/ForkProcess-26] process exiting with exitcode 0


115797848077099 is prime: True
1099726899285419 is prime: False
