<img src="../../images/banners/python-advanced.png" width="600"/>

# <img src="../../images/logos/python.png" width="23"/> Multiprocessing


## <img src="../../images/logos/toc.png" width="20"/> Table of Contents 
* [`multiprocessing` in a Nutshell](#`multiprocessing`_in_a_nutshell)
    * [`multiprocessing` Code](#`multiprocessing`_code)
    * [Why the `multiprocessing` Version Rocks](#why_the_`multiprocessing`_version_rocks)
    * [The Problems With the `multiprocessing` Version](#the_problems_with_the_`multiprocessing`_version)
* [How to Speed Up a CPU-Bound Program](#how_to_speed_up_a_cpu-bound_program)
    * [CPU-Bound Synchronous Version](#cpu-bound_synchronous_version)
    * [Threading and asyncio Versions](#threading_and_asyncio_versions)
    * [CPU-Bound multiprocessing Version](#cpu-bound_multiprocessing_version)
    * [Why the `multiprocessing` Version Rocks](#why_the_`multiprocessing`_version_rocks)
    * [The Problems With the `multiprocessing` Version](#the_problems_with_the_`multiprocessing`_version)
* [When to Use Concurrency](#when_to_use_concurrency)
* [Conclusion](#conclusion)

---

Unlike the previous approaches to downloading sites in parallel, the `multiprocessing` version of the code takes full advantage of the multiple CPUs that your cool, new computer has. Or, in my case, that my clunky, old laptop has. Let’s start with the code:



```python
import multiprocessing
import time

import requests

# Each worker process ends up with its own copy of this global.
session = None


def set_global_session():
    """Create one requests.Session per worker process."""
    global session
    if not session:
        session = requests.Session()


def download_site(url):
    with session.get(url) as response:
        name = multiprocessing.current_process().name
        print(f"{name}:Read {len(response.content)} from {url}")


def download_all_sites(sites):
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        pool.map(download_site, sites)


# The guard keeps worker processes from re-running this block on
# platforms that start workers by importing the main module.
if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")
```

This is much shorter than the `asyncio` example and actually looks quite similar to the `threading` example, but before we dive into the code, let’s take a quick tour of what `multiprocessing` does for you.



<a class="anchor" id="`multiprocessing`_in_a_nutshell"></a>

## `multiprocessing` in a Nutshell

Up until this point, all of the examples of concurrency in this article run only on a single CPU or core in your computer. The reasons for this have to do with the current design of CPython and something called the Global Interpreter Lock, or GIL.



This article won’t dive into the hows and whys of the [GIL](https://realpython.com/python-gil/). It’s enough for now to know that the synchronous, `threading`, and `asyncio` versions of this example all run on a single CPU.



`multiprocessing` in the standard library was designed to break down that barrier and run your code across multiple CPUs. At a high level, it does this by creating a new instance of the Python interpreter to run on each CPU and then farming out part of your program to run on it.



As you can imagine, bringing up a separate Python interpreter is not as fast as starting a new thread in the current Python interpreter. It’s a heavyweight operation and comes with some restrictions and difficulties, but for the correct problem, it can make a huge difference.
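A quick way to see those separate interpreters is to ask each worker for its process ID. The following is only a minimal sketch: `report_pid()` and `collect_worker_pids()` are illustrative names and are not part of this article’s example code.

```python
import multiprocessing
import os


def report_pid(_):
    """Return the ID of the process that handled this task."""
    return os.getpid()


def collect_worker_pids(task_count=8, processes=2):
    """Run trivial tasks in a pool and return the distinct worker PIDs."""
    with multiprocessing.Pool(processes=processes) as pool:
        return set(pool.map(report_pid, range(task_count)))


if __name__ == "__main__":
    print(f"Parent PID: {os.getpid()}")
    print(f"Worker PIDs: {sorted(collect_worker_pids())}")
```

None of the worker PIDs should match the parent’s, because every task ran in a different operating system process. With tasks this small, a single worker may end up handling all of them, so the set can contain fewer PIDs than there are workers.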



<a class="anchor" id="`multiprocessing`_code"></a>

### `multiprocessing` Code

The code has a few small changes from our synchronous version. The first one is in `download_all_sites()`. Instead of simply calling `download_site()` repeatedly, it creates a `multiprocessing.Pool` object and has it map `download_site` to the iterable `sites`. This should look familiar from the `threading` example.



What happens here is that the `Pool` creates a number of separate Python interpreter processes and has each one run the specified function on some of the items in the iterable, which in our case is the list of sites. The communication between the main process and the other processes is handled by the `multiprocessing` module for you.



The line that creates `Pool` is worth your attention. First off, it does not specify how many processes to create in the `Pool`, although that is an optional parameter. By default, `multiprocessing.Pool()` will determine the number of CPUs in your computer and match that. This is frequently the best answer, and it is in our case.



For this problem, increasing the number of processes did not make things faster. It actually slowed things down because the cost for setting up and tearing down all those processes was larger than the benefit of doing the I/O requests in parallel.



Next we have the `initializer=set_global_session` part of that call. Remember that each process in our `Pool` has its own memory space. That means that they cannot share things like a `Session` object. You don’t want to create a new `Session` each time the function is called. Instead, you want to create one for each process.



The `initializer` function parameter is built for just this case. There is no way to pass a return value back from the `initializer` to `download_site()`, the function that each process calls, but you can initialize a global `session` variable to hold the single session for each process. Because each process has its own memory space, the global for each one will be different.
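The same pattern can be tried without any network access. In this sketch, a string tag stands in for the `Session` object; `worker_tag`, `set_worker_tag()`, `describe()`, and `run()` are hypothetical names chosen for illustration, and `initargs` is the `Pool` parameter that passes arguments to the initializer.

```python
import multiprocessing
import os

# Set once per worker process by the initializer, like `session` above.
worker_tag = None


def set_worker_tag(prefix):
    """Initializer: runs once in each worker before it takes any tasks."""
    global worker_tag
    worker_tag = f"{prefix}-{os.getpid()}"


def describe(item):
    """Task function: reads the per-process global set by the initializer."""
    return item, worker_tag


def run(items, processes=2):
    with multiprocessing.Pool(
        processes=processes,
        initializer=set_worker_tag,
        initargs=("worker",),
    ) as pool:
        return pool.map(describe, items)


if __name__ == "__main__":
    for item, tag in run(range(6)):
        print(f"item {item} handled by {tag}")
```

Every tag ends in the PID of the worker that produced it, which shows that each process filled in its own copy of the global.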



That’s really all there is to it. The rest of the code is quite similar to what you’ve seen before.



<a class="anchor" id="why_the_`multiprocessing`_version_rocks"></a>

### Why the `multiprocessing` Version Rocks

The `multiprocessing` version of this example is great because it’s relatively easy to set up and requires little extra code. It also takes full advantage of the CPU power in your computer. The execution timing diagram for this code looks like this:



<img src="images/speed-up-your-python-program-with-concurrency/MProc.7cf3be371bbc.png" width="600px">

<a class="anchor" id="the_problems_with_the_`multiprocessing`_version"></a>

### The Problems With the `multiprocessing` Version

This version of the example does require some extra setup, and the global `session` object is strange. You have to spend some time thinking about which variables will be accessed in each process.



Finally, it is clearly slower than the `asyncio` and `threading` versions in this example:



```sh
$ ./io_mp.py
 [most output skipped]
Downloaded 160 in 5.718175172805786 seconds
```

That’s not surprising, as I/O-bound problems are not really why `multiprocessing` exists. You’ll see more as you step into the next section and look at CPU-bound examples.



<a class="anchor" id="how_to_speed_up_a_cpu-bound_program"></a>

## How to Speed Up a CPU-Bound Program

Let’s shift gears here a little bit. The examples so far have all dealt with an I/O-bound problem. Now, you’ll look into a CPU-bound problem. As you saw, an I/O-bound problem spends most of its time waiting for external operations, like a network call, to complete. A CPU-bound problem, on the other hand, does few I/O operations, and its overall execution time depends on how fast it can process the required data.



For the purposes of our example, we’ll use a somewhat silly function to create something that takes a long time to run on the CPU. This function computes the sum of the squares of each number from 0 up to, but not including, the passed-in value:



```python
def cpu_bound(number):
    return sum(i * i for i in range(number))
```

You’ll be passing in large [numbers](https://realpython.com/python-numbers/), so this will take a while. Remember, this is just a placeholder for your code that actually does something useful and requires significant processing time, like computing the roots of equations or [sorting](https://realpython.com/sorting-algorithms-python/) a large data structure.
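Because the function is deliberately wasteful, its result can be checked against the well-known closed form for a sum of squares. The helper `sum_of_squares_closed_form()` below is a hypothetical name added only for this check. It is not a replacement for the placeholder, whose whole purpose is to keep the CPU busy.

```python
def cpu_bound(number):
    return sum(i * i for i in range(number))


def sum_of_squares_closed_form(number):
    """Closed form for 0**2 + 1**2 + ... + (number - 1)**2."""
    return (number - 1) * number * (2 * number - 1) // 6


if __name__ == "__main__":
    for n in (1, 10, 1_000):
        # The loop and the formula should always agree.
        assert cpu_bound(n) == sum_of_squares_closed_form(n)
    print("loop and closed form agree")
```

Note that `range(number)` stops at `number - 1`, which is why the formula uses `number - 1` as its upper limit.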



<a class="anchor" id="cpu-bound_synchronous_version"></a>

### CPU-Bound Synchronous Version

Now let’s look at the non-concurrent version of the example:



```python
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    for number in numbers:
        cpu_bound(number)


numbers = [5_000_000 + x for x in range(20)]

start_time = time.time()
find_sums(numbers)
duration = time.time() - start_time
print(f"Duration {duration} seconds")
```

This code calls `cpu_bound()` 20 times with a different large number each time. It does all of this on a single thread in a single process on a single CPU. The execution timing diagram looks like this:



<img src="images/speed-up-your-python-program-with-concurrency/CPUBound.d2d32cb2626c.png" width="600px">

Unlike the I/O-bound examples, the CPU-bound examples are usually fairly consistent in their run times. This one takes about 7.8 seconds on my machine:



```sh
$ ./cpu_non_concurrent.py
Duration 7.834432125091553 seconds
```
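If you want to check how consistent the run times are on your own machine, one option is to repeat the measurement a few times. This is a rough sketch rather than a proper benchmark: `time_runs()` is a hypothetical helper, it uses `time.perf_counter()` instead of `time.time()`, and the inputs are much smaller than the ones above so that it finishes quickly.

```python
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def time_runs(numbers, repeats=3):
    """Return the wall-clock duration of each repeated run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        for number in numbers:
            cpu_bound(number)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Smaller inputs than the article's so the check finishes quickly.
    numbers = [200_000 + x for x in range(20)]
    for run, duration in enumerate(time_runs(numbers), start=1):
        print(f"Run {run}: {duration:.3f} seconds")
```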

Clearly we can do better than this. This is all running on a single CPU with no concurrency. Let’s see what we can do to make it better.



<a class="anchor" id="threading_and_asyncio_versions"></a>

### Threading and asyncio Versions

How much do you think rewriting this code using `threading` or `asyncio` will speed this up?



If you answered “Not at all,” give yourself a cookie. If you answered, “It will slow it down,” give yourself two cookies.



Here’s why: In your I/O-bound example above, much of the overall time was spent waiting for slow operations to finish. `threading` and `asyncio` sped this up by allowing you to overlap the times you were waiting instead of doing them sequentially.



On a CPU-bound problem, however, there is no waiting. The CPU is cranking away as fast as it can to finish the problem. In CPython, the GIL lets only one thread execute Python code at a time, and all `asyncio` tasks share a single thread, so both threads and tasks effectively run on the same CPU in the same process. That means that the one CPU is doing all of the work of the non-concurrent code plus the extra work of setting up threads or tasks. It takes more than 10 seconds:



```sh
$ ./cpu_threading.py
Duration 10.407078266143799 seconds
```

I’ve written up a `threading` version of this code and placed it with the other example code in the [GitHub repo](https://github.com/realpython/materials/tree/master/concurrency-overview) so you can go test this yourself. Let’s not look at that just yet, however.



<a class="anchor" id="cpu-bound_multiprocessing_version"></a>

### CPU-Bound multiprocessing Version

Now you’ve finally reached where `multiprocessing` really shines. Unlike the other concurrency libraries, `multiprocessing` is explicitly designed to share heavy CPU workloads across multiple CPUs. Here’s what its execution timing diagram looks like:



<img src="images/speed-up-your-python-program-with-concurrency/CPUMP.69c1a7fad9c4.png" width="600px">

Here’s what the code looks like:



```python
import multiprocessing
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)


# The guard keeps worker processes from re-running this block on
# platforms that start workers by importing the main module.
if __name__ == "__main__":
    numbers = [5_000_000 + x for x in range(20)]

    start_time = time.time()
    find_sums(numbers)
    duration = time.time() - start_time
    print(f"Duration {duration} seconds")
```

Little of this code had to change from the non-concurrent version. You had to `import multiprocessing` and then just change from looping through the numbers to creating a `multiprocessing.Pool` object and using its `.map()` method to send individual numbers to worker processes as they become free.



This was just what you did for the I/O-bound `multiprocessing` code, but here you don’t need to worry about the `Session` object.
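The example throws away the computed sums because only the timing matters here. In real code you would usually want them, and `Pool.map()` returns a list of results in the same order as the inputs. The variant below is a sketch of that idea, with smaller numbers than the article’s so that it runs quickly:

```python
import multiprocessing


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    """Return one sum per input, in the same order as `numbers`."""
    with multiprocessing.Pool() as pool:
        return pool.map(cpu_bound, numbers)


# Guard the entry point so that worker processes which import this
# module do not start pools of their own.
if __name__ == "__main__":
    numbers = [200_000 + x for x in range(4)]
    for number, total in zip(numbers, find_sums(numbers)):
        print(f"{number}: {total}")
```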



As mentioned above, the `processes` optional parameter to the `multiprocessing.Pool()` constructor deserves some attention. You can specify how many `Process` objects you want created and managed in the `Pool`. By default, it will determine how many CPUs are in your machine and create a process for each one. While this works great for our simple example, you might want to have a little more control in a production environment.
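As one example of that extra control, you might cap the pool so that it leaves a CPU free for the rest of the system and never starts more workers than there are tasks. The policy and the helper name `choose_pool_size()` are assumptions made for this sketch, not recommendations from the `multiprocessing` documentation.

```python
import multiprocessing
import os


def cpu_bound(number):
    return sum(i * i for i in range(number))


def choose_pool_size(task_count, reserved_cpus=1, cpu_count=None):
    """Pick a worker count that leaves some CPUs free (hypothetical policy)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    usable = max(1, cpu_count - reserved_cpus)
    # Never start more workers than there are tasks, and always at least one.
    return max(1, min(usable, task_count))


if __name__ == "__main__":
    numbers = [200_000 + x for x in range(20)]
    size = choose_pool_size(len(numbers))
    with multiprocessing.Pool(processes=size) as pool:
        pool.map(cpu_bound, numbers)
    print(f"Used {size} worker processes")
```

Whether reserving a CPU actually helps depends on what else the machine is doing, so treat the number as something to measure rather than assume.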



Also, as we mentioned in the first section about `threading`, the `multiprocessing.Pool` code is built upon building blocks like `Queue` and `Semaphore` that will be familiar to those of you who have done multithreaded and multiprocessing code in other languages.
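To give a feel for what `Pool` saves you from writing, here is a hand-rolled version built directly on `Process` and `Queue`. It is a simplified sketch, not how `Pool` is actually implemented: `worker()` and `find_sums_manually()` are illustrative names, and a `None` on the task queue is used as a made-up signal that tells a worker to stop.

```python
import multiprocessing


def cpu_bound(number):
    return sum(i * i for i in range(number))


def worker(task_queue, result_queue):
    """Pull numbers off the task queue until the None sentinel arrives."""
    for number in iter(task_queue.get, None):
        result_queue.put((number, cpu_bound(number)))


def find_sums_manually(numbers, worker_count=2):
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(worker_count)
    ]
    for process in workers:
        process.start()
    for number in numbers:
        task_queue.put(number)
    for _ in workers:
        task_queue.put(None)  # one stop signal per worker
    # Drain the results before joining so no worker blocks on a full queue.
    results = dict(result_queue.get() for _ in numbers)
    for process in workers:
        process.join()
    return [results[number] for number in numbers]


if __name__ == "__main__":
    print(find_sums_manually([10, 100, 1_000]))
```

Even this small version has to handle startup, shutdown, and result ordering by hand, which is the bookkeeping that `Pool.map()` hides.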



<a class="anchor" id="why_the_`multiprocessing`_version_rocks"></a>

### Why the `multiprocessing` Version Rocks

The `multiprocessing` version of this example is great because it’s relatively easy to set up and requires little extra code. It also takes full advantage of the CPU power in your computer.



Hey, that’s exactly what I said the last time we looked at `multiprocessing`. The big difference is that this time it is clearly the best option. It takes 2.5 seconds on my machine:



```sh
$ ./cpu_mp.py
Duration 2.5175397396087646 seconds
```

That’s much better than we saw with the other options.



<a class="anchor" id="the_problems_with_the_`multiprocessing`_version"></a>

### The Problems With the `multiprocessing` Version

There are some drawbacks to using `multiprocessing`. They don’t really show up in this simple example, but splitting your problem up so each processor can work independently can sometimes be difficult.



Also, many solutions require more communication between the processes. This can add some complexity to your solution that a non-concurrent program would not need to deal with.
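A small taste of that complexity: as soon as processes update shared state, you need explicit shared objects and locking. The sketch below uses `multiprocessing.Value` for a shared counter; `add_many()` and `count_in_parallel()` are hypothetical names used only for illustration.

```python
import multiprocessing


def add_many(counter, amount):
    """Increment a shared counter `amount` times, one step at a time."""
    for _ in range(amount):
        # `+=` is a read followed by a write, so hold the lock for both.
        with counter.get_lock():
            counter.value += 1


def count_in_parallel(worker_count=4, amount=1_000):
    counter = multiprocessing.Value("i", 0)  # shared C int, starts at 0
    workers = [
        multiprocessing.Process(target=add_many, args=(counter, amount))
        for _ in range(worker_count)
    ]
    for process in workers:
        process.start()
    for process in workers:
        process.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_parallel())
```

With the lock in place, the total equals `worker_count * amount`. Without it, increments from different processes can overwrite each other, and the total may come up short.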



<a class="anchor" id="when_to_use_concurrency"></a>

## When to Use Concurrency

You’ve covered a lot of ground here, so let’s review some of the key ideas and then discuss some decision points that will help you determine which, if any, concurrency module you want to use in your project.



The first step of this process is deciding if you *should* use a concurrency module. While the examples here make each of the libraries look pretty simple, concurrency always comes with extra complexity and can often result in bugs that are difficult to find.



Hold out on adding concurrency until you have a known performance issue and *then* determine which type of concurrency you need. As [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) has said, “Premature optimization is the root of all evil (or at least most of it) in programming.”



Once you’ve decided that you should optimize your program, figuring out if your program is CPU-bound or I/O-bound is a great next step. Remember that I/O-bound programs are those that spend most of their time waiting for something to happen while CPU-bound programs spend their time processing data or crunching numbers as fast as they can.



As you saw, CPU-bound problems only really gain from using `multiprocessing`. `threading` and `asyncio` did not help this type of problem at all.
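If you are unsure which kind of problem you have, one rough heuristic is to compare the CPU time a piece of code uses with the wall-clock time it takes. This heuristic is an assumption of the sketch below and is not something measured in this article. `cpu_fraction()` is a hypothetical helper, and `time.process_time()` only counts CPU time used by the current process.

```python
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def cpu_fraction(func, *args):
    """Return CPU time divided by wall-clock time for one call to func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


if __name__ == "__main__":
    # time.sleep() stands in for waiting on I/O.
    print(f"sleep:     {cpu_fraction(time.sleep, 0.2):.2f}")
    print(f"cpu_bound: {cpu_fraction(cpu_bound, 1_000_000):.2f}")
```

A fraction near 0 suggests the code mostly waits, which points toward `asyncio` or `threading`. A fraction near 1 suggests it mostly computes, which points toward `multiprocessing`.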



For I/O-bound problems, there’s a general rule of thumb in the Python community: “Use `asyncio` when you can, `threading` when you must.” `asyncio` can provide the best speed up for this type of program, but sometimes you will require critical libraries that have not been ported to take advantage of `asyncio`. Remember that any task that doesn’t give up control to the event loop will block all of the other tasks.
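The last point is easy to demonstrate. In the sketch below, `polite_task()` and `blocking_task()` are made-up names: the first waits with `asyncio.sleep()`, which hands control back to the event loop, while the second calls `time.sleep()`, which does not.

```python
import asyncio
import time


async def polite_task(delay):
    await asyncio.sleep(delay)  # hands control back to the event loop


async def blocking_task(delay):
    time.sleep(delay)  # never yields, so every other task has to wait


async def run_three(task_factory, delay):
    """Run three copies of a task concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(task_factory(delay) for _ in range(3)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"polite:   {asyncio.run(run_three(polite_task, 0.2)):.2f} seconds")
    print(f"blocking: {asyncio.run(run_three(blocking_task, 0.2)):.2f} seconds")
```

The three polite tasks overlap their waits and finish in roughly one delay, while the three blocking tasks run back to back and take about three.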



<a class="anchor" id="conclusion"></a>

## Conclusion

You’ve now seen the basic types of concurrency available in Python:



- `threading`
- `asyncio`
- `multiprocessing`


You’ve got the understanding to decide which concurrency method you should use for a given problem, or if you should use any at all! In addition, you’ve achieved a better understanding of some of the problems that can arise when you’re using concurrency.



I hope you’ve learned a lot from this article and that you find a great use for concurrency in your own projects!

