## <b><font color='darkblue'>Preface</font></b>
([source](https://realpython.com/python-concurrency/)) <b><font size='3ptx'>Concurrency refers to the ability of a program to manage multiple tasks at once, improving performance and responsiveness.</font> It encompasses different models like threading, asynchronous tasks, and multiprocessing, each offering unique benefits and trade-offs. In Python, threads and asynchronous tasks facilitate concurrency on a single processor, while multiprocessing allows for true parallelism by utilizing multiple CPU cores.</b>

<b>Understanding concurrency is crucial for optimizing programs, especially those that are I/O-bound or CPU-bound</b>. Efficient concurrency management can significantly enhance a program’s performance by reducing wait times and better utilizing system resources.

### <b><font color='darkgreen'>Agenda</font></b>
In this tutorial, you’ll learn how to:
* **Understand** the different forms of **concurrency** in Python
* **Implement multi-threaded** and asynchronous solutions **for I/O-bound tasks**
* **Leverage multiprocessing for CPU-bound tasks** to achieve true parallelism
* **Choose the appropriate concurrency model** based on your program’s needs

## <b><font color='darkblue'>Exploring Concurrency in Python</font></b>
<b><font size='3ptx'>In this section, you’ll get familiar with the terminology surrounding concurrency.</font></b>

You’ll also learn that concurrency can take different forms depending on the problem it aims to solve. Finally, you’ll discover how the different concurrency models translate to Python.

### <b><font color='darkgreen'>What Is Concurrency?</font></b>
<b><font size='3ptx'>The dictionary definition of concurrency is simultaneous occurrence. </font></b>

In Python, the things that are occurring simultaneously are called by different names, including these:
* **Thread**
* **Task**
* **Process**

At a high level, they all refer to a sequence of instructions that run in order. You can think of them as **different trains of thought**. Each one can be stopped at certain points, and the CPU or brain that’s processing them can switch to a different one. The state of each train of thought is saved so it can be restored right where it was interrupted.

You might wonder why Python uses different words for the same concept. It turns out that threads, tasks, and processes are only the same if you view them from a high-level perspective. Once you start digging into the details, you’ll find that they all represent slightly different things. You’ll see more of how they’re different as you progress through the examples.

Now, you’ll consider the simultaneous part of that definition. You have to be a little careful because, **when you get down to the details, you’ll discover that only multiple system processes can enable Python to run these trains of thought at literally the same time**.

**In contrast, threads and asynchronous tasks always run on a single processor, which means they can only run one at a time.** They just cleverly find ways to take turns to speed up the overall process. Even though they don’t run different trains of thought simultaneously, **they still fall under the concept of concurrency**.

The way the threads, tasks, or processes take turns differs. **In a multi-threaded approach, the operating system actually knows about each thread and can interrupt it at any time to start running a different thread**. This mechanism is also true for processes. It’s called [**preemptive multitasking**](https://en.wikipedia.org/wiki/Preemption_%28computing%29#Preemptive_multitasking) since the operating system can preempt your thread or process to make the switch.

**On the other hand, asynchronous tasks use cooperative multitasking**. The tasks must cooperate with each other by announcing when they’re ready to be switched out without the operating system’s involvement. This means that the code in the task has to change slightly to make it happen.

The benefit of doing this extra work upfront is that you always know where your task will be swapped out, making it easier to reason about the flow of execution. A task won’t be swapped out in the middle of a Python statement unless that statement is appropriately marked. You’ll see later how this can simplify parts of your design.
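To make the difference concrete, here’s a minimal sketch of cooperative switching. It relies on `asyncio` and the `await` keyword, both of which are covered in more detail later in this tutorial. The `worker()` and `run_demo()` names are made up for illustration:

```python
import asyncio


def run_demo():
    events = []

    async def worker(name):
        events.append(f"{name}: start")
        # The only place where this task can be swapped out.
        await asyncio.sleep(0)
        events.append(f"{name}: end")

    async def main():
        # Schedule both workers on the same event loop and wait for them.
        await asyncio.gather(worker("A"), worker("B"))

    asyncio.run(main())
    return events


events = run_demo()
print(events)
# ['A: start', 'B: start', 'A: end', 'B: end']
```

Each worker runs uninterrupted until it reaches its `await`, so the recorded order is the same on every run. With preemptive threads, you’d have no such guarantee.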

### <b><font color='darkgreen'>What Is Parallelism?</font></b>
<b><font size='3ptx'>So far, you’ve looked at concurrency that happens on a single processor. What about all of those CPU cores your cool, new laptop has?</font> How can you make use of them in Python? The answer is to execute separate processes!</b>

<b>A process can be thought of as almost a completely different program, though technically, it’s usually defined as a collection of resources including memory, [file handles](https://en.wikipedia.org/wiki/File_descriptor), and things like that</b>. One way to think about it is that each process runs in its own Python interpreter.

<b>Because they’re different processes, each of your trains of thought in a program leveraging multiprocessing can run on a different CPU core</b>. Running on a different core means that they can actually run at the same time, which is fabulous. There are some complications that arise from doing this, but Python does a pretty good job of smoothing them over most of the time.
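As a rough illustration of the idea that each process is its own interpreter, the following sketch asks a small pool of worker processes for their process IDs. It uses `ProcessPoolExecutor` from `concurrent.futures`, which this tutorial comes back to later. The helper names `report_pid()` and `distinct_worker_pids()` are hypothetical:

```python
import os
from concurrent.futures import ProcessPoolExecutor


def report_pid(_):
    # Runs inside whichever process picks up the task.
    return os.getpid()


def distinct_worker_pids(num_tasks=8, max_workers=2):
    # Collect the IDs of the worker processes that handled the tasks.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return set(executor.map(report_pid, range(num_tasks)))


if __name__ == "__main__":
    worker_pids = distinct_worker_pids()
    print(f"Main process PID: {os.getpid()}")
    print(f"Worker PIDs: {sorted(worker_pids)}")
    print(f"Main PID among workers? {os.getpid() in worker_pids}")
```

The `if __name__ == "__main__":` guard matters here. On platforms that start worker processes by importing the main module again, unguarded code would try to create a new pool in every worker.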

Now that you have an idea of what <b>concurrency</b> and <b>parallelism</b> are, you can review their differences and then determine which Python modules support them:

|Python Module  |CPU |Multitasking|Switching Decision                                                   |
|---------------|----|------------|---------------------------------------------------------------------|
|<b><a href='https://docs.python.org/3/library/asyncio.html'>asyncio</a></b>        |One |Cooperative |The tasks decide when to give up control.                            |
|<b><a href='https://docs.python.org/3/library/threading.html'>threading</a></b>      |One |Preemptive  |The operating system decides when to switch tasks external to Python.|
|<b><a href='https://docs.python.org/3/library/multiprocessing.html'>multiprocessing</a></b>|Many|Preemptive  |The processes all run at the same time on different processors.      |



Each of the corresponding types of concurrency can be useful in its own way. You’ll now take a look at what types of programs they can help you speed up.

### <b><font color='darkgreen'>When Is Concurrency Useful?</font></b>
Concurrency can make a big difference for two types of problems:
1. <b><a href='https://en.wikipedia.org/wiki/I/O_bound'>I/O-Bound</a></b>
2. <b>[CPU-Bound](https://en.wikipedia.org/wiki/CPU-bound)</b>

**I/O-bound problems cause your program to slow down because it frequently must wait for input or output (I/O) from some external resource.** They arise when your program is working with things that are much slower than your CPU.

Examples of things that are slower than your CPU are legion, but your program thankfully doesn’t interact with most of them. The slow things your program will interact with the most are the **file system** and **network connections**.

Here’s a diagram illustrating an I/O-bound operation:
![ui](https://files.realpython.com/media/IOBound.4810a888b457.png)


On the flip side, there are classes of programs that do significant computation without talking to the network or accessing a file. These are CPU-bound programs because the resource limiting the speed of your program is the CPU, not the network or the file system.

Here’s a corresponding diagram for a **CPU-bound program**:
![cpu-bound](https://realpython.com/cdn-cgi/image/width=1737,format=auto/https://files.realpython.com/media/CPUBound.d2d32cb2626c.png)
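One way to tell the two kinds of workloads apart is to compare wall-clock time with CPU time. The sketch below is only an approximation: it stands in for a slow device with `time.sleep()` and for real computation with a simple sum, and the helper names are made up for this example:

```python
import time


def measure(func):
    """Return (wall-clock seconds, CPU seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def simulated_io():
    # Stand-in for waiting on a network or disk: the CPU sits idle.
    time.sleep(0.2)


def simulated_computation():
    # Stand-in for real number crunching: the CPU stays busy.
    sum(i * i for i in range(500_000))


io_wall, io_cpu = measure(simulated_io)
cpu_wall, cpu_cpu = measure(simulated_computation)
print(f"I/O-bound: wall={io_wall:.3f}s cpu={io_cpu:.3f}s")
print(f"CPU-bound: wall={cpu_wall:.3f}s cpu={cpu_cpu:.3f}s")
```

When the wall-clock time is much larger than the CPU time, the program mostly waits, which points to an I/O-bound problem. When the two are close, the CPU is the bottleneck.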

As you work through the examples in the following section, you’ll see that different forms of concurrency work better or worse with I/O-bound and CPU-bound programs. <b>Adding concurrency to your program introduces extra code and complications, so you’ll need to decide if the potential speedup is worth the additional effort. By the end of this tutorial, you should have enough information to start making that decision</b>.

Here’s a quick summary to clarify this concept:

|I/O-Bound Process                                                                                                 |CPU-Bound Process                                                                       |
|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
|Your program spends most of its time talking to a slow device, like a network adapter, a hard drive, or a printer.|Your program spends most of its time doing CPU operations.                              |
|Speeding it up involves overlapping the times spent waiting for these devices.                                    |Speeding it up involves finding ways to do more computations in the same amount of time.|

You’ll look at I/O-bound programs first. Then, you’ll get to see some code dealing with CPU-bound programs.

## <b><font color='darkblue'>Speeding Up an I/O-Bound Program</font></b>
<b><font size='3ptx'>In this section, you’ll focus on I/O-bound programs and a common problem: downloading content over the network.</font> For this example, you’ll be downloading web pages from a few sites, but it really could be any network traffic. It’s just more convenient to visualize and set up with web pages.</b>

### <b><font color='darkgreen'>Synchronous Version</font></b>
<b><font size='3ptx'>You’ll start with a non-concurrent version of this task. </font></b>

Note that this program requires the third-party Requests library, so you should first install it in an activated virtual environment by running `python -m pip install requests`. The examples in this tutorial were run with `requests==2.32.3`.


This version of your program doesn’t use concurrency at all:

- **`constants.py`**:
```python
import functools
import time


SITES = (
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
) * 50


def timer_decorator(func):
  @functools.wraps(func)
  def wrap(*args, **kwargs):
    start_time = time.perf_counter()
    # Forward any arguments so callers can override defaults such as `sites`.
    resp = func(*args, **kwargs)
    duration = time.perf_counter() - start_time
    print(f'Execution time: {duration:.02f}s')
    return resp

  return wrap
```


- <font color='olive'>**`io_non_concurrent.py`**</font>:
```python
import constants
import requests


SITES = constants.SITES


@constants.timer_decorator
def download_all_sites(sites=SITES):
  with requests.Session() as session:
    for url in sites:
      with session.get(url) as response:
        print(f"Read {len(response.content)} bytes from {url}")


if __name__ == '__main__':
  download_all_sites()
```

The big problem here is that it’s relatively slow compared to the other solutions that you’re about to see. Here’s an example of what the final output might look like:
```shell
$ python io_non_concurrent.py
...
Read 272 bytes from http://olympus.realpython.org/dice
Read 10966 bytes from https://www.jython.org
Read 272 bytes from http://olympus.realpython.org/dice
Execution time: 11.42s
```

### <b><font color='darkgreen'>Multi-Threaded Version</font></b>
<b><font size='3ptx'>As you probably guessed, writing a program leveraging multithreading takes more effort.</font></b>

However, you might be surprised at how little extra effort it takes for basic cases. Here’s what the same program looks like when you take advantage of the [**concurrent.futures**](https://docs.python.org/3/library/concurrent.futures.html) and [**threading**](https://docs.python.org/3/library/threading.html) modules mentioned earlier:
- **`io_threads.py`**:

```python
import constants
import requests
import threading
import time
from concurrent.futures import ThreadPoolExecutor


SITES = constants.SITES

@constants.timer_decorator
def download_all_sites(sites=SITES):
  thread_local = threading.local()  # line 13

  def get_session_for_thread():
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

  def download_site(url):  # line 20
    session = get_session_for_thread()
    with session.get(url) as response:
        print(f"Read {len(response.content)} bytes from {url}")

  print(f'sites: {sites}')
  with ThreadPoolExecutor(max_workers=5) as executor:  # line 26
    executor.map(download_site, sites)


if __name__ == "__main__":
  download_all_sites()
```

The overall structure of your program is the same, but the lines marked with `# line N` comments indicate the changes you needed to make.

**On `line 26`, you created an instance of the [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) to manage the threads for you.** In this case, you explicitly requested five workers or threads.

Creating a [**ThreadPoolExecutor**](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) seems like a complicated thing. But, when you break it down, you’ll end up with these three components:
- **Thread**
- **Pool**
- **Executor**

You already know about the **thread** part. That’s just the train of thought mentioned earlier. The **pool** portion is where it starts to get interesting. This object is going to create [a pool of threads](https://en.wikipedia.org/wiki/Thread_pool), each of which can run concurrently. Finally, the **executor** is the part that’s going to control how and when each of the threads in the pool will run. It’ll execute the request in the pool.

The standard library implements [**ThreadPoolExecutor**](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) as a [**context manager**](https://realpython.com/python-with-statement/), so you can use the `with` syntax to manage creating and freeing the pool of [**threading.Thread**](https://docs.python.org/3/library/threading.html#threading.Thread) instances.

In this multi-threaded version of the program, you let the executor call `download_site()` on your behalf instead of doing it manually in a loop. The <font color='blue'>executor.map()</font> method, on the line right after `line 26`, takes care of distributing the workload across the available threads, allowing each one to handle a different site concurrently. This method takes two arguments:
- A function to be executed on each data item, like a site address
- A collection of data items to be processed by that function
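Here’s a small, self-contained sketch of that calling pattern. The `describe()` function and the `words` list are placeholders invented for this example, standing in for `download_site()` and the site addresses:

```python
from concurrent.futures import ThreadPoolExecutor


def describe(word):
    # Takes exactly one data item, just like a function passed to .map().
    return f"{word} has {len(word)} characters"


words = ["thread", "pool", "executor"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # .map() yields results in the order of the inputs,
    # regardless of which thread finishes first.
    results = list(executor.map(describe, words))

for line in results:
    print(line)
# thread has 6 characters
# pool has 4 characters
# executor has 8 characters
```

Consuming the returned iterator, for example with `list()`, is also what surfaces any exception raised inside a worker. If you never iterate over the results, such errors can go unnoticed.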

Since the function that you passed to the executor’s `.map()` method must take exactly one argument, you modified download_site() on `line 20` to only accept a URL. But how do you obtain the `session` object now?

<b>This is one of the interesting and difficult issues with threading. Because the operating system controls when your task gets interrupted and another task starts, any data shared between the threads needs to be protected or [thread-safe](https://realpython.com/python-thread-lock/) to avoid unexpected behavior or potential data corruption</b>. Unfortunately, requests.Session() isn’t thread-safe, meaning that one thread may interfere with the session while another thread is still using it.

There are several strategies for making data access thread-safe. One of them is to **use a thread-safe data structure**, such as a [**queue.Queue**](https://realpython.com/queue-in-python/#using-thread-safe-queues) or a [**multiprocessing.Queue**](https://realpython.com/queue-in-python/#using-multiprocessingqueue-for-interprocess-communication-ipc). These objects use low-level primitives like [**lock objects**](https://docs.python.org/3/library/threading.html#lock-objects) **to ensure that only one thread can access a block of code or a bit of memory at the same time.** You’re using this strategy indirectly by way of the [**ThreadPoolExecutor**](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) object. Note that an [**asyncio.Queue**](https://realpython.com/queue-in-python/#asyncioqueue), despite its similar interface, is designed for coroutines and isn’t thread-safe.

**Another strategy to use here is something called [thread-local storage](https://en.wikipedia.org/wiki/Thread-local_storage)**. When you call <font color='blue'>threading.local()</font> on `line 13`, you create an object that resembles a **[global variable](https://realpython.com/python-use-global-variable-in-function/) but is specific to each individual thread.** It looks a little odd, but you only want to create one of these objects, not one for each thread. The object itself takes care of separating accesses from different threads to its attributes.

**When <font color='blue'>get_session_for_thread()</font> is called, the `session` it looks up is specific to the particular thread on which it’s running**. So each thread will create a single session the first time it calls <font color='blue'>get_session_for_thread()</font> and then will use that `session` on each subsequent call throughout its lifetime.
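The following sketch isolates that behavior without any network access. It swaps the `requests.Session` for a plain `object()` and uses hypothetical names (`get_resource_for_thread()`, `worker()`, `created`) so you can observe that each thread creates its resource once and then reuses it:

```python
import threading

thread_local = threading.local()  # One shared object, per-thread attributes
created = []  # Every resource that any thread has created
created_lock = threading.Lock()


def get_resource_for_thread():
    if not hasattr(thread_local, "resource"):
        # First call on this thread: create and remember the resource.
        thread_local.resource = object()
        with created_lock:
            created.append(thread_local.resource)
    return thread_local.resource


def worker(reused, index):
    first = get_resource_for_thread()
    second = get_resource_for_thread()
    # True if the second call returned the object created by the first.
    reused[index] = first is second


reused = [None] * 3
threads = [threading.Thread(target=worker, args=(reused, i)) for i in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Each thread reused its own resource: {all(reused)}")
print(f"Resources created in total: {len(created)}")
```

Three threads end up with three separate resources, even though they all go through the same `thread_local` object. The main thread never called the helper, so it has no `resource` attribute at all.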

Okay. It’s time to put your multi-threaded program to the ultimate test:
```shell
$ python io_threads.py 
...
Read 272 bytes from http://olympus.realpython.org/dice
Read 272 bytes from http://olympus.realpython.org/dice
Read 272 bytes from http://olympus.realpython.org/dice
Read 272 bytes from http://olympus.realpython.org/dice
Execution time: 2.75s
```

It’s fast! Remember that the non-concurrent version took more than ten seconds in the best case. Here’s what its execution timing diagram looks like:
![threading flow](https://files.realpython.com/media/Threading.3eef48da829e.png)

<b>The program uses multiple threads to have many open requests out to web sites at the same time</b>. This allows your program to overlap the waiting times and get the final result faster. Yippee! That was the goal.

Are there any problems with the multi-threaded version? Well, as you can see from the example, <b>it takes a little more code to make this happen, and you really have to give some thought to what data is shared between threads</b>.

<b>Threads can interact in ways that are subtle and hard to detect. These interactions can cause [race conditions](https://realpython.com/python-thread-lock/#race-conditions) that frequently result in random, intermittent bugs that can be quite difficult to find.</b> If you’re unfamiliar with this concept, then you might want to check out a section on [**race conditions**](https://realpython.com/python-thread-lock/#race-conditions) in another tutorial on thread safety.
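As a minimal sketch of one common remedy, the example below protects a shared counter with a `threading.Lock`. The `Counter` class and the `hammer()` function are invented for illustration and aren’t part of the download example:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Reading, adding, and writing back happen as one protected step.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    for _ in range(4):
        executor.submit(hammer, counter, 10_000)

print(counter.value)
# 40000
```

Without the lock, `self.value += 1` is a read followed by a write, so two threads can read the same old value and one update can get lost. Whether that actually happens on a given run depends on timing, which is exactly what makes such bugs intermittent.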

### <b><font color='darkgreen'>Asynchronous Version</font></b>
<b>Running threads concurrently allowed you to cut down the total execution time of your original synchronous code to roughly a quarter, from 11.42 to 2.75 seconds in the sample runs above. That’s already pretty remarkable, but <font size='3ptx'>you can do even better than that by taking advantage of Python’s [asyncio](https://docs.python.org/3/library/asyncio.html) module, which enables [asynchronous I/O](https://en.wikipedia.org/wiki/Asynchronous_I/O).</font></b>

Asynchronous processing is a concurrency model that’s well-suited for I/O-bound tasks—hence the name, [**asyncio**](https://docs.python.org/3/library/asyncio.html). It avoids the overhead of context switching between threads by employing the **event loop**, **non-blocking operations**, and **coroutines**, among other things. Perhaps somewhat surprisingly, the asynchronous code needs only one thread of execution to run concurrently.

**In a nutshell, the event loop controls how and when each asynchronous task gets to execute. As the name suggests, it continuously loops through your tasks while monitoring their state**. As soon as the current task starts waiting for an I/O operation to finish, the loop suspends it and immediately switches to another task. Conversely, once the expected event occurs, the loop will eventually resume the suspended task in the next iteration.

**A [coroutine](https://docs.python.org/3/glossary.html#term-coroutine) is similar to a thread but much more lightweight and cheaper to suspend or resume**. That’s what makes it possible to spawn many more coroutines than threads without a significant memory or performance overhead. This capability helps address the [**C10k problem**](https://en.wikipedia.org/wiki/C10k_problem), which involves handling ten thousand concurrent connections efficiently. But there’s a catch.

**You can’t have blocking function calls in your coroutines if you want to reap the full benefits of asynchronous programming**. A blocking call is a synchronous one, meaning that it prevents other code from running while it’s waiting for data to arrive. In contrast, **a non-blocking call can voluntarily give up control and wait to be notified when the data is ready**.
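To see the effect of a blocking call, consider this rough sketch. It runs three coroutines at once, first with the non-blocking `asyncio.sleep()` and then with the blocking `time.sleep()`. The names `polite()`, `rude()`, and `run_three()` are hypothetical, and the syntax is explained right after this example:

```python
import asyncio
import time


async def polite(delay):
    # Non-blocking: hands control back to the event loop while waiting.
    await asyncio.sleep(delay)


async def rude(delay):
    # Blocking: the whole event loop is stuck until this call returns.
    time.sleep(delay)


async def run_three(coroutine_function, delay=0.1):
    start = time.perf_counter()
    await asyncio.gather(*(coroutine_function(delay) for _ in range(3)))
    return time.perf_counter() - start


polite_duration = asyncio.run(run_three(polite))
blocking_duration = asyncio.run(run_three(rude))
print(f"Non-blocking: {polite_duration:.2f}s")
print(f"Blocking:     {blocking_duration:.2f}s")
```

The non-blocking waits overlap, so the first run should take about as long as a single delay. The blocking calls run back to back, so the second run takes about three times as long.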

In Python, you create a **coroutine object** by calling an **asynchronous function**, also known as a [coroutine function](https://docs.python.org/3/glossary.html#term-coroutine-function). Those are defined with the [`async def`](https://docs.python.org/3/reference/compound_stmts.html#async-def) statement instead of the usual `def`. Only within the body of an asynchronous function are you allowed to use the `await` keyword, which pauses the execution of the coroutine until the awaited task is completed:
```python
import asyncio

async def main():
    await asyncio.sleep(3.5)
```

In this case, you defined `main()` as an asynchronous function that implicitly returns a coroutine object when called. Thanks to the await keyword, your coroutine makes a non-blocking call to [`asyncio.sleep()`](https://docs.python.org/3/library/asyncio-task.html#asyncio.sleep), simulating a delay of three and a half seconds. While your `main()` function awaits the wake-up event, other tasks could potentially run concurrently.
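The snippet above only defines the function. As a small sketch of how you might actually run it, the variant below shortens the delay, returns a value, and hands the coroutine object to `asyncio.run()`. The return value and variable names are additions made for this illustration:

```python
import asyncio


async def main():
    await asyncio.sleep(0.01)
    return "done"


coroutine = main()  # Calling the function doesn't run its body yet
is_coroutine = asyncio.iscoroutine(coroutine)
result = asyncio.run(coroutine)  # Starts an event loop and runs to completion

print(is_coroutine)
# True
print(result)
# done
```

Calling `main()` merely creates the coroutine object. Nothing inside the function executes until an event loop drives it.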

Now that you’ve got a basic understanding of what asynchronous I/O is, you can walk through the asynchronous version of the example code and figure out how it works. However, because the Requests library that you’ve been using in this tutorial is blocking, you must now switch to a non-blocking counterpart, such as [**aiohttp**](https://aiohttp.readthedocs.io/en/stable/), which was designed for Python’s [**asyncio**](https://docs.python.org/3/library/asyncio.html).

You can install it by running `python -m pip install aiohttp`. The examples in this tutorial were run with `aiohttp==3.11.18`.


After installing this library in your virtual environment, you can use it in the asynchronous version of the code:

* **`constants.py`**:
```python
...
def async_timer_decorator(func):
  async def wrap(*args, **kwargs):
    start_time = time.perf_counter()
    # Forward any arguments so callers can override defaults such as `sites`.
    resp = await func(*args, **kwargs)
    duration = time.perf_counter() - start_time
    print(f'Execution time: {duration:.02f}s')

    return resp

  return wrap
```

* **`io_asyncio.py`**:
```python
import constants

import asyncio
import time

import aiohttp


SITES = constants.SITES


@constants.async_timer_decorator
async def download_all_sites(sites=SITES):
    async def download_site(url, session):
      async with session.get(url) as response:
          print(f"Read {len(await response.read())} bytes from {url}")

    async with aiohttp.ClientSession() as session:
        tasks = [download_site(url, session) for url in sites]
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == '__main__':
  asyncio.run(download_all_sites())
```

This version looks strikingly similar to the synchronous one, which is yet another advantage of [**asyncio**](https://docs.python.org/3/library/asyncio.html). It’s a double-edged sword, though. While it arguably makes your concurrent code easier to reason about than the multi-threaded version, asyncio is far from easy when you get into more complex scenarios.

And, it’s really fast. The asynchronous version is the fastest of them all by a good margin:
```shell
$ python io_asyncio.py
...
Read 272 bytes from http://olympus.realpython.org/dice
Read 272 bytes from http://olympus.realpython.org/dice
Read 272 bytes from http://olympus.realpython.org/dice
Execution time: 0.67s
```

The execution timing diagram looks quite similar to what’s happening in the multi-threaded example. It’s just that the I/O requests are all done by the same thread:
![async flow](https://files.realpython.com/media/Asyncio.31182d3731cf.png)


**There’s a common argument that having to add `async` and `await` in the proper locations is an extra complication**. To a small extent, that’s true. The flip side of this argument is that it forces you to think about when a given task will get swapped out, which can help you create a better design.

**The scaling issue also looms large here**. Running the multi-threaded example with a thread for each site is noticeably slower than running it with a handful of threads. Running the asyncio example with hundreds of tasks doesn’t slow it down at all.

There are a couple of issues with [**asyncio**](https://docs.python.org/3/library/asyncio.html) at this point. **You need special asynchronous versions of libraries to gain the full advantage of [asyncio](https://docs.python.org/3/library/asyncio.html).** Had you just used Requests for downloading the sites, it would’ve been much slower because Requests isn’t designed to notify the event loop that it’s blocked. This issue is becoming less significant as time goes on and more libraries embrace [**asyncio**](https://docs.python.org/3/library/asyncio.html).

**Another more subtle issue is that all the advantages of cooperative multitasking get thrown away if one of the tasks doesn’t cooperate.** A minor mistake in code can cause a task to run off and hold the processor for a long time, starving other tasks that need running. There’s no way for the event loop to break in if a task doesn’t hand control back to it.

With that in mind, you can step up to a radically different approach to concurrency using multiple processes.

### <b><font color='darkgreen'>Process-Based Version</font></b>
**Up to this point, all of the examples of concurrency in this tutorial ran only on a single CPU or core in your computer**. The reasons for this have to do with the current design of [**CPython**](https://realpython.com/cpython-source-code-guide/) and something called the [**Global Interpreter Lock**](https://realpython.com/python-gil/), or GIL.

This tutorial won’t dive into the hows and whys of the GIL. It’s enough for now to know that the **synchronous**, **multi-threaded**, and **asynchronous** versions of this example all run on a single CPU.

**The [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module, along with the corresponding wrappers in [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html), was designed to break down that barrier and run your code across multiple CPUs**. At a high level, it does this by creating a new instance of the Python interpreter to run on each CPU and then farming out part of your program to run on it.

**As you can imagine, bringing up a separate Python interpreter is not as fast as starting a new thread in the current Python interpreter**. It’s a heavyweight operation and comes with some restrictions and difficulties, but for the correct problem, it can make a huge difference.

Unlike the previous approaches, **using multiprocessing allows you to take full advantage of all the CPUs that your cool, new computer has**.

Here’s the sample code:
- **`io_processes.py`**:
```python
import constants

import atexit
import multiprocessing
import time
from concurrent.futures import ProcessPoolExecutor

import requests


SITES = constants.SITES
session: requests.Session | None = None


@constants.timer_decorator
def main():
  download_all_sites()


def download_all_sites(sites=SITES):
  with ProcessPoolExecutor(initializer=init_process) as executor:
    executor.map(download_site, sites)


def download_site(url):
  global session
  print(f'Session: {session}')
  with session.get(url) as response:
    name = multiprocessing.current_process().name
    print(f"{name}:Read {len(response.content)} bytes from {url}")


def init_process():
  global session
  session = requests.Session()
  atexit.register(session.close)
  print('Initialized process!')


if __name__ == "__main__":
  main()
```

What happens here is that the pool creates a number of separate **Python interpreter processes** and has each one run the specified function on some of the items in the [**iterable**](https://realpython.com/python-iterators-iterables/), which in your case is the list of sites. The communication between the main process and the other processes is handled for you.

The line that creates a pool instance is worth your attention. First off, it doesn’t specify how many processes to create in the pool, although that’s an optional parameter. By default, **it’ll determine the number of CPUs in your computer and match that**. This is frequently the best answer, and it is in your case.

For an I/O-bound problem, increasing the number of processes won’t make things faster. It’ll actually slow things down because the cost of setting up and tearing down all those processes is larger than the benefit of doing the I/O requests in parallel.

Next, you have the initializer part of that call. **Remember that each process in your pool has its own memory space. That means they can’t easily share things like a `session` object. You don’t want to create a new `Session` instance each time the function is called. Instead, you want to create one for each process.**

The `initializer` function parameter is built for just this case. There’s no way to pass a return value back from the `initializer` to `download_site()`, but you can initialize a global `session` variable to hold the single `session` for each process. **Because each process has its own memory space, the global for each one will be different**.

While this version takes full advantage of the CPU power in your computer, the resulting performance is surprisingly underwhelming. In the sample run below, it’s still almost three times slower than the asynchronous version:
```shell
$ python io_processes.py
...
ForkProcess-7:Read 272 bytes from http://olympus.realpython.org/dice
ForkProcess-3:Read 272 bytes from http://olympus.realpython.org/dice
ForkProcess-4:Read 272 bytes from http://olympus.realpython.org/dice
Execution time: 1.91s
```

The execution timing diagram for this code looks like this:
![multi-proc flow](https://realpython.com/cdn-cgi/image/width=1893,format=auto/https://files.realpython.com/media/MProc.7cf3be371bbc.png)

There are a few separate processes executing in parallel. The corresponding diagrams of each one of them resemble the non-concurrent version you saw at the beginning of this tutorial.

I/O-bound problems aren’t really why multiprocessing exists. You’ll see more as you step into the next section and look at CPU-bound examples.

## <b><font color='darkblue'>Speeding Up a CPU-Bound Program</font></b>
<b><font size='3ptx'>It’s time to shift gears here a little bit. The examples so far have all dealt with an I/O-bound problem. Now, you’ll look into a CPU-bound problem. </font> As you learned earlier, an I/O-bound problem spends most of its time waiting for external operations to complete, such as network calls. In contrast, a CPU-bound problem performs fewer I/O operations, and its total execution time depends on how quickly it can process the required data.</b>

For the purposes of this example, you’ll use a somewhat silly function to create a piece of code that takes a long time to run on the CPU. This function computes the [n-th Fibonacci number](https://realpython.com/fibonacci-sequence-python/) using the [recursive](https://realpython.com/python-recursion/) approach:
```python
>>> def fib(n):
...     return n if n < 2 else fib(n - 2) + fib(n - 1)
...
>>> for n in range(1, 11):
...     print(f"fib({n:>2}) = {fib(n):,}")
...
fib( 1) = 1
fib( 2) = 1
fib( 3) = 2
fib( 4) = 3
fib( 5) = 5
fib( 6) = 8
fib( 7) = 13
fib( 8) = 21
fib( 9) = 34
fib(10) = 55
```

Notice how quickly the resulting values grow as the function computes higher Fibonacci numbers. The recursive nature of this implementation leads to many repeated calculations of the same numbers, which requires substantial processing time. That’s what makes this such a convenient example of a CPU-bound task.
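To get a sense of how much repeated work is involved, you could instrument the function with a call counter. The sketch below does this with two hypothetical helpers, `fib_counting_calls()` and `count_calls()`, which aren’t part of the tutorial’s sample files:

```python
def fib_counting_calls(n, counter):
    counter[0] += 1  # Record every invocation, including repeated ones
    if n < 2:
        return n
    return fib_counting_calls(n - 2, counter) + fib_counting_calls(n - 1, counter)


def count_calls(n):
    counter = [0]
    value = fib_counting_calls(n, counter)
    return value, counter[0]


for n in (10, 20):
    value, calls = count_calls(n)
    print(f"fib({n}) = {value:,} took {calls:,} calls")
# fib(10) = 55 took 177 calls
# fib(20) = 6,765 took 21,891 calls
```

Going from `n = 10` to `n = 20` multiplies the number of calls by more than a hundred, which is why even moderately large inputs keep the CPU busy for a long time.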

Remember, this is just a placeholder for your code that actually does something useful and requires lengthy processing, like computing the roots of equations or [sorting](https://realpython.com/sorting-algorithms-python/) a large data structure.

### <b><font color='darkgreen'>Synchronous Version</font></b>
First off, you can look at the non-concurrent version of the example:

* **`cpu_no_concurrent.py`**:
```python
import constants
import time


@constants.timer_decorator
def main():
  for _ in range(20):
    fib(35)


def fib(n):
  return n if n < 2 else fib(n - 2) + fib(n - 1)


if __name__ == "__main__":
  main()
```

This code calls `fib(35)` twenty times in a loop. Due to the recursive nature of its implementation, **the function calls itself hundreds of millions of times! It does all of this on a single thread in a single process on a single CPU**.
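
The size of that workload can be estimated without running the slow recursion. Each call to `fib(k)` with `k >= 2` costs one call for itself plus the calls made by its two children. The sketch below applies that recurrence iteratively; `call_count` is a hypothetical helper introduced here for illustration:

```python
def call_count(n):
    """Return how many times the naive recursive fib() runs to compute fib(n)."""
    if n < 2:
        return 1  # base cases return immediately: a single call
    two_back, one_back = 1, 1  # call counts for fib(0) and fib(1)
    for _ in range(2, n + 1):
        # One call for fib(k) itself, plus everything its two children need.
        two_back, one_back = one_back, 1 + two_back + one_back
    return one_back


per_call = call_count(35)
print(f"fib(35) alone: {per_call:,} calls")
print(f"20 repetitions: {20 * per_call:,} calls")
```

This reports 29,860,703 calls for a single `fib(35)` and 597,214,060 calls for the whole loop.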

The execution timing diagram looks like this:
![proc](https://files.realpython.com/media/CPUBound.d2d32cb2626c.png)

Unlike the I/O-bound examples, the CPU-bound examples are usually fairly consistent in their run times:
```shell
$ ./cpu_no_concurrent.py 
Execution time: 23.67s
```

Clearly, you can do better than this. After all, it’s all running on a single CPU with no concurrency. Next, you’ll see what you can do to improve it.

### <b><font color='darkgreen'>Multi-Threaded Version</font></b>
<b><font size='3ptx'>How much do you think rewriting this code using threads—or asynchronous tasks—will speed this up?</font></b>

If you answered “Not at all,” then give yourself a cookie. If you answered, “It will slow it down,” then give yourself two cookies.

Here’s why: In your earlier I/O-bound example, much of the overall time was spent waiting for slow operations to finish. Threads and asynchronous tasks sped this up by allowing you to overlap the waiting times instead of performing them sequentially.

**With a CPU-bound problem, there’s no waiting. The CPU is cranking away as fast as it can to finish the problem.** In the standard CPython interpreter, threads and asynchronous tasks all live in the same process, and the global interpreter lock (GIL) lets only one thread run Python code at a time. This means that **one CPU is doing all of the work of the non-concurrent code plus the extra work of setting up threads or tasks**.
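
You can check the "same process" part of that statement directly. The following sketch asks every call which process and thread it ran on; `where_am_i` is an illustrative name, and the number of distinct threads you see may vary from run to run:

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def where_am_i(_):
    """Report the process ID and the thread name that ran this call."""
    return os.getpid(), threading.current_thread().name


with ThreadPoolExecutor(max_workers=5) as executor:
    reports = list(executor.map(where_am_i, range(20)))

process_ids = {pid for pid, _ in reports}
thread_names = {name for _, name in reports}
print(f"{len(process_ids)} process, {len(thread_names)} thread(s) used")
```

However many worker threads pick up the calls, they all report the process ID of the script itself.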

Here’s the code of the multi-threaded version of your CPU-bound problem:
- **`cpu_threads.py`**:
```python
import constants
import time
from concurrent.futures import ThreadPoolExecutor


@constants.timer_decorator
def main():
  def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)

  with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(fib, [35] * 20)


if __name__ == "__main__":
  main()
```

Below is the output you might see when running this code:
```shell
$ ./cpu_threads.py 
Execution time: 28.71s
```

Unsurprisingly, it takes a few seconds longer than the synchronous version.
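
Note that the script above discards the computed values. If you need them, one option is to consume the iterator that `executor.map()` returns. This is a minimal sketch with smaller inputs so it finishes quickly; `fib_many` is a hypothetical wrapper, not a function from the tutorial:

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


def fib_many(numbers, max_workers=5):
    """Compute fib() for every input on a thread pool and keep the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the inputs.
        return list(executor.map(fib, numbers))


print(fib_many([10, 15, 20]))  # [55, 610, 6765]
```

Consuming the iterator also matters for error handling: an exception raised inside a worker only surfaces when its result is retrieved.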

Okay. At this point, you should know what to expect from the asynchronous version of a CPU-bound problem. But for completeness, you’ll now test how it stacks up against the others.

### <b><font color='darkgreen'>Asynchronous Version</font></b>
Implementing the asynchronous version of this CPU-bound problem involves rewriting your functions into coroutine functions with `async def` and awaiting their return values:

- **`cpu_asyncio.py`**:
```python
import asyncio
import constants
import time


async def fib(n):
  return n if n < 2 else await fib(n - 2) + await fib(n - 1)

async def async_main():
  tasks = [fib(35) for _ in range(20)]
  await asyncio.gather(*tasks, return_exceptions=True)


@constants.timer_decorator
def main():
  asyncio.run(async_main())


if __name__ == "__main__":
  main()
```

When run, this code takes over twice as long to execute as your original synchronous version and also takes longer than the multi-threaded version:
```shell
$ ./cpu_asyncio.py 
Execution time: 58.56s
```

Ironically, the asynchronous approach is the slowest for a CPU-bound problem, yet it was the fastest for an I/O-bound one. **Because there are no I/O operations involved here, there’s nothing to wait for. Each `await` expression still has to create and drive a new coroutine object, and with tens of millions of recursive calls per task, that overhead slows down the total execution substantially**.
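
To get a feel for the per-call cost on your own machine, you can time the plain function against its coroutine twin on a small input. This is only a rough sketch: `async_fib` and `timed` are illustrative names, and the printed timings depend on your hardware and Python version:

```python
import asyncio
import time


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


async def async_fib(n):
    return n if n < 2 else await async_fib(n - 2) + await async_fib(n - 1)


def timed(label, func):
    """Run func() once, print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result


sync_result = timed("plain function", lambda: fib(22))
async_result = timed("coroutine", lambda: asyncio.run(async_fib(22)))
```

Both variants return the same value, so any difference in the printed times comes from the machinery around the calls rather than from the arithmetic.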

In Python, to improve the performance of a CPU-bound task like this one, you must use an alternative concurrency model. You’ll take a closer look at that now.

### <b><font color='darkgreen'>Process-Based Version</font></b>
<b><font size='3ptx'>You’ve finally reached the part where [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) really shines.</font></b>

Unlike the other concurrency models, process-based parallelism is explicitly designed to <b>share heavy CPU workloads across multiple CPUs</b>.

Here’s what the corresponding code looks like:
- **`cpu_processes.py`**:
```python
import constants
import time
from concurrent.futures import ProcessPoolExecutor


def fib(n):
  return n if n < 2 else fib(n - 2) + fib(n - 1)


@constants.timer_decorator
def main():
  with ProcessPoolExecutor() as executor:
    executor.map(fib, [35] * 20)


if __name__ == "__main__":
  main()
```

It’s almost identical to the multi-threaded version of the Fibonacci problem. Instead of using [**ThreadPoolExecutor**](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor), you replaced it with [**ProcessPoolExecutor**](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor) and dropped the explicit `max_workers=5`. The only other change is that `fib()` now lives at module level: a function handed to a process pool must be picklable, and a function nested inside `main()` isn’t.

**As mentioned before, the `max_workers` optional parameter to the pool’s constructor deserves some attention**. You can use it to specify how many processes you want to be created and managed in the pool. **By default, it’ll determine how many CPUs are in your machine and create a process for each one**. While this works great for your simple example, you might want to have a little more control in a production environment.
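
As one possible way to take that control, the sketch below derives `max_workers` from `os.cpu_count()` while leaving a core free for the rest of the system. The `choose_workers` helper and its `reserve` parameter are assumptions made for this illustration, not a recommendation from the standard library:

```python
import os
from concurrent.futures import ProcessPoolExecutor


def choose_workers(task_count, reserve=1):
    """Illustrative policy: keep `reserve` CPUs free, never exceed the task count."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(task_count, available - reserve))


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


if __name__ == "__main__":
    workers = choose_workers(task_count=20)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(fib, [25] * 20))
    print(f"{workers} worker process(es) computed {len(results)} results")
```

Capping the pool at the number of tasks avoids starting processes that would never receive any work.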

This version runs more than four times faster than the non-concurrent implementation you started with:
```shell
$ ./cpu_processes.py 
Execution time: 5.53s
```

This is much better than what you saw with the other options, making it by far the best choice for this kind of task.

Here’s what the execution timing diagram looks like:
![process](https://files.realpython.com/media/CPUMP.69c1a7fad9c4.png)

The individual tasks run alongside each other on separate CPU cores, making **parallel execution** possible.

There are some drawbacks to using multiprocessing that don’t really show up in a simple example like this one. For example, **dividing your problem into segments so each processor can operate independently can sometimes be difficult**.

**Also, many solutions require more communication between the processes**. This can add some complexity to your solution that a non-concurrent program just wouldn’t need to deal with.
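
The Fibonacci example sidesteps both issues because its twenty calls are already independent. For a problem that isn’t pre-divided, you have to do the splitting yourself. Here is a minimal sketch, assuming a task that can be cut into contiguous ranges; `split_range` and `sum_of_squares` are hypothetical helpers invented for this example:

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into at most `parts` contiguous, non-overlapping chunks."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append((start, end))
        start = end
    return chunks


def sum_of_squares(bounds):
    """Work done inside one process; only `bounds` and the result are pickled."""
    low, high = bounds
    return sum(i * i for i in range(low, high))


if __name__ == "__main__":
    chunks = split_range(0, 1_000_000, parts=4)
    with ProcessPoolExecutor(max_workers=4) as executor:
        total = sum(executor.map(sum_of_squares, chunks))
    print(total == sum_of_squares((0, 1_000_000)))
```

Each worker receives only a pair of integers and sends back a single number, which keeps the communication between processes cheap. Problems whose pieces need to exchange intermediate data are harder to structure this way.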

## <b><font color='darkblue'>Deciding When to Use Concurrency</font></b>
<b><font size='3ptx'>You’ve covered a lot of ground here, so it might be a good time to review some of the key ideas and then discuss some decision points that will help you determine which, if any, concurrency module you want to use in your project.</font></b>

<font size='3ptx'><b>The first step of this process is deciding if you should use a concurrency module</b></font>. While the examples here make each of the libraries look pretty simple, concurrency always comes with extra complexity and can often result in bugs that are difficult to find.

Hold off on adding concurrency until you have a known performance issue and then determine which type of concurrency you need. As [**Donald Knuth**](https://en.wikipedia.org/wiki/Donald_Knuth) has said, “<b><font size='3ptx'>Premature optimization is the root of all evil (or at least most of it) in programming.</font></b>”

<b><font size='3ptx'>Once you’ve decided that you should optimize your program, figuring out if your program is I/O-bound or CPU-bound is a great next step.</font></b> Remember that I/O-bound programs are those that spend most of their time waiting for something to happen, while CPU-bound programs spend their time processing data or crunching numbers as fast as they can.
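
One rough way to tell the two apart is to compare CPU time with wall-clock time for the code in question. The sketch below is a heuristic under the assumption that the code runs in a single thread; `cpu_fraction` is an illustrative helper, not a standard tool:

```python
import time


def cpu_fraction(func, *args):
    """Return the share of wall-clock time that this process spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used else 0.0


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


print(f"fib(25):    {cpu_fraction(fib, 25):.0%} of the time on the CPU")
print(f"sleep(0.5): {cpu_fraction(time.sleep, 0.5):.0%} of the time on the CPU")
```

A fraction close to 100% suggests a CPU-bound workload, while a low fraction means the code mostly waits, as I/O-bound code does. The `time.sleep()` call stands in for a slow network request here.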

As you saw, <font size='3ptx'><b>CPU-bound problems only really benefit from using process-based concurrency in Python</b></font>. Multithreading and asynchronous I/O don’t help this type of problem at all.

<b><font size='3ptx'>For I/O-bound problems, there’s a general rule of thumb in the Python community: “Use asyncio when you can, threading or concurrent.futures when you must.”</font></b> [**asyncio**](https://docs.python.org/3/library/asyncio.html) can provide the best speed-up for this type of program, but sometimes you’ll require critical libraries that haven’t been ported to take advantage of asyncio. Remember that any task that doesn’t give up control to the event loop will block all of the other tasks.
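
The last point is easy to see in a small experiment. In this sketch, two tasks record their steps in a shared list, once without ever yielding and once with an `await asyncio.sleep(0)` after each step. The names `worker`, `run_pair`, and `events` are made up for the illustration:

```python
import asyncio

events = []


async def worker(name, cooperative):
    for step in range(2):
        events.append(f"{name}{step}")
        if cooperative:
            await asyncio.sleep(0)  # hand control back to the event loop


async def run_pair(cooperative):
    events.clear()
    await asyncio.gather(worker("a", cooperative), worker("b", cooperative))
    return list(events)


print(asyncio.run(run_pair(cooperative=False)))  # ['a0', 'a1', 'b0', 'b1']
print(asyncio.run(run_pair(cooperative=True)))   # ['a0', 'b0', 'a1', 'b1']
```

Without a suspension point, task `a` runs to completion before task `b` gets a turn. Only when each task yields do the steps interleave.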

## <b><font color='darkblue'>Conclusions</font></b>
<b><font size='3ptx'>You’ve learned about concurrency in Python and how it can enhance the performance and responsiveness of your programs.</font></b>

In this tutorial, you’ve learned how to:
* **Understand** the different forms of concurrency in Python
* **Implement** multi-threaded and asynchronous solutions for I/O-bound tasks
* **Leverage** multiprocessing for CPU-bound tasks to achieve true parallelism
* **Choose** the appropriate concurrency model based on your program’s needs

<b><font size='3ptx'>With these skills, you’re now equipped to analyze your Python programs and apply concurrency effectively to tackle performance bottlenecks.</font></b> Whether optimizing a [**web scraper**](https://realpython.com/beautiful-soup-web-scraper-python/) or a data processing pipeline, you can confidently select the best concurrency model to enhance your application’s performance.