<img src="../../images/banners/python-advanced.png" width="600"/>

# <img src="../../images/logos/python.png" width="23"/> An Intro to Threading in Python 


## <img src="../../images/logos/toc.png" width="20"/> Table of Contents 
* [Download Sites: Synchronous Version](#download_sites:_synchronous_version)
* [What Is a Thread?](#what_is_a_thread?)
* [Starting a Thread](#starting_a_thread)
    * [Daemon Threads](#daemon_threads)
    * [`join()` a Thread](#`join()`_a_thread)
* [Working With Many Threads](#working_with_many_threads)
* [Using a ThreadPoolExecutor](#using_a_threadpoolexecutor)
* [Race Conditions](#race_conditions)
    * [One Thread](#one_thread)
    * [Two Threads](#two_threads)
    * [Why This Isn’t a Silly Example](#why_this_isn’t_a_silly_example)
        * [How Does This Really Work](#how_does_this_really_work)
* [Basic Synchronization Using Lock](#basic_synchronization_using_lock)
* [Deadlock](#deadlock)
* [Threading Objects](#threading_objects)
    * [Semaphore](#semaphore)
    * [Timer](#timer)
    * [Barrier](#barrier)
* [Download Sites: Threading Version](#download_sites:_threading_version)
* [Conclusion: Threading in Python](#conclusion:_threading_in_python)

---

Let’s start by focusing on I/O-bound programs and a common problem: downloading content over the network. For our example, you will be downloading web pages from a few sites, but it really could be any network traffic. It’s just easier to visualize and set up with web pages.



<a class="anchor" id="download_sites:_synchronous_version"></a>

## Download Sites: Synchronous Version

We’ll start with a non-concurrent version of this task. Note that this program requires the [`requests`](http://docs.python-requests.org/en/master/) module. You should run `pip install requests` before running it, probably using a [virtualenv](https://realpython.com/python-virtual-environments-a-primer/). This version does not use concurrency at all:



In [None]:
import requests
import time

In [None]:
def download_site(url, session):
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")

In [None]:
def download_all_sites(sites):
    with requests.Session() as session:
        for url in sites:
            download_site(url, session)

In [None]:
sites = [
    "https://www.jython.org",
    "http://olympus.realpython.org/dice",
] * 80
start_time = time.time()
download_all_sites(sites)
duration = time.time() - start_time
print(f"Downloaded {len(sites)} in {duration} seconds")

As you can see, this is a fairly short program. `download_site()` just downloads the contents from a URL and prints the size. One small thing to point out is that we’re using a [`Session`](https://2.python-requests.org/en/master/user/advanced/#id1) object from `requests`.



It is possible to simply use `get()` from `requests` directly, but creating a `Session` object allows `requests` to do some fancy networking tricks and really speed things up.



`download_all_sites()` creates the `Session` and then walks through the [list](https://realpython.com/python-lists-tuples/) of sites, downloading each one in turn. Finally, it prints out how long this process took so you can have the satisfaction of seeing how much concurrency has helped us in the following examples.



The processing diagram for this program will look much like the I/O-bound diagram in the last section.



**Why the Synchronous Version Rocks**



The great thing about this version of code is that, well, it’s easy. It was comparatively easy to write and debug. It’s also more straightforward to think about. There’s only one train of thought running through it, so you can predict what the next step is and how it will behave.



**The Problems With the Synchronous Version**



The big problem here is that it’s relatively slow compared to the other solutions we’ll provide. Here’s the final output from a run on my machine:



```sh
$ ./io_non_concurrent.py
 [most output skipped]
Downloaded 160 in 14.289619207382202 seconds
```

Being slower isn’t always a big issue, however. If the program you’re running takes only 2 seconds with a synchronous version and is only run rarely, it’s probably not worth adding concurrency. You can stop here.



What if your program is run frequently? What if it takes hours to run? Let’s move on to concurrency by rewriting this program using `threading`.



Python threading allows you to have different parts of your program run concurrently and can simplify your design. If you’ve got some experience in Python and want to speed up your program using threads, then this tutorial is for you!



**In this article, you’ll learn:**



- What threads are
- How to create threads and wait for them to finish
- How to use a `ThreadPoolExecutor`
- How to avoid race conditions
- How to use the common tools that Python `threading` provides


<a class="anchor" id="what_is_a_thread?"></a>
## What Is a Thread?

A thread is a separate flow of execution. This means that your program will have two things happening at once. But for most Python 3 implementations the different threads do not actually execute at the same time: they merely appear to.



It’s tempting to think of threading as having two (or more) different processors running on your program, each one doing an independent task at the same time. That’s almost right. The threads may be running on different processors, but they will only be running one at a time. 



Getting multiple tasks running simultaneously requires a non-standard implementation of Python, writing some of your code in a different language, or using `multiprocessing`, which comes with some extra overhead.

Because of the way the CPython implementation of Python works, threading may not speed up all tasks. This is due to interactions with the GIL that essentially limit one Python thread to run at a time.

The Python Global Interpreter Lock, or [GIL](https://wiki.python.org/moin/GlobalInterpreterLock), is a mutex (a lock) that allows only one thread at a time to hold control of the Python interpreter. This means that only one thread can be executing Python bytecode at any point in time. The impact of the GIL isn’t visible to developers who run single-threaded programs, but it can be a performance bottleneck in CPU-bound, multi-threaded code. You’ll read more about the GIL and how it works in the upcoming sections.




Tasks that spend much of their time waiting for external events are generally good candidates for threading. Problems that require heavy CPU computation and spend little time waiting for external events might not run faster at all. 
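
To make the point about waiting concrete, here is a minimal sketch that uses `time.sleep()` as a stand-in for a blocking network or disk call. The helper names (`wait_for_io`, `run_sequential`, `run_threaded`) are made up for this illustration, and the exact timings will vary from machine to machine:

```python
import threading
import time


def wait_for_io(delay):
    # time.sleep() stands in for a blocking network or disk call.
    time.sleep(delay)


def run_sequential(count, delay):
    # Perform the waits one after another and return the elapsed time.
    start = time.perf_counter()
    for _ in range(count):
        wait_for_io(delay)
    return time.perf_counter() - start


def run_threaded(count, delay):
    # Perform the same waits in separate threads so they overlap.
    start = time.perf_counter()
    threads = [
        threading.Thread(target=wait_for_io, args=(delay,))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


sequential = run_sequential(3, 0.2)
threaded = run_threaded(3, 0.2)
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Because a sleeping thread lets the other threads run, the threaded version should take roughly as long as one wait rather than the sum of all three.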



This is true for code written in Python and running on the standard CPython implementation. If your threads are written in C, they have the ability to release the GIL and run concurrently. If you are running on a different Python implementation, check the documentation to see how it handles threads.



If you are running a standard Python implementation, writing in only Python, and have a CPU-bound problem, you should check out the `multiprocessing` module instead.



Architecting your program to use threading can also provide gains in design clarity. Most of the examples you’ll learn about in this tutorial are not necessarily going to run faster because they use threads. Using threading in them helps to make the design cleaner and easier to reason about.



So, let’s stop talking about threading and start using it!



<a class="anchor" id="starting_a_thread"></a>
## Starting a Thread

Now that you’ve got an idea of what a thread is, let’s learn how to make one. The Python standard library provides [`threading`](https://docs.python.org/3/library/threading.html), which contains most of the primitives you’ll see in this article. `Thread`, in this module, nicely encapsulates threads, providing a clean interface to work with them.



To start a separate thread, you create a `Thread` instance and then tell it to `.start()`:



In [None]:
import logging
import threading
import time

def thread_function(name):
    logging.info(f"Thread {name}: starting")
    time.sleep(2)
    logging.info(f"Thread {name}: finishing")

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

    logging.info("Main    : before creating thread")
    x = threading.Thread(target=thread_function, args=(1,))
    logging.info("Main    : before running thread")
    x.start()
    logging.info("Main    : wait for the thread to finish")
    # x.join()
    logging.info("Main    : all done")

If you look around the [logging](https://realpython.com/python-logging/) statements, you can see that the `main` section is creating and starting the thread:



In [None]:
x = threading.Thread(target=thread_function, args=(1,))
x.start()

When you create a `Thread`, you pass it a function and a tuple containing the arguments to that function (the trailing comma in `(1,)` is what makes it a one-element tuple). In this case, you’re telling the `Thread` to run `thread_function()` and to pass it `1` as an argument.



For this article, you’ll use sequential integers as names for your threads. There is `threading.get_ident()`, which returns a unique integer identifier for each running thread, but these are usually neither short nor easily readable.
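
As a small illustration, the sketch below records what `threading.get_ident()` returns in the main thread and in a worker thread. The `record_ident()` helper and the `idents` dictionary are hypothetical names used only for this example. It also shows the `name` argument of `Thread`, which is another way to give a thread a readable label:

```python
import threading

idents = {}


def record_ident(label):
    # get_ident() returns an integer identifier for the calling thread.
    idents[label] = threading.get_ident()


worker = threading.Thread(target=record_ident, args=("worker",), name="worker-1")
worker.start()
worker.join()
record_ident("main")

print(idents)       # two different, usually large, integers
print(worker.name)  # worker-1
```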



`thread_function()` itself doesn’t do much. It simply logs some messages with a [`time.sleep()`](https://realpython.com/python-sleep/) in between them.



When you run this program as it is (with the `x.join()` call commented out), the output will look like this:



```sh
$ ./single_thread.py
Main : before creating thread
Main : before running thread
Thread 1: starting
Main : wait for the thread to finish
Main : all done
Thread 1: finishing
```

You’ll notice that the `Thread` finished after the `Main` section of your code did. You’ll come back to why that is and talk about the mysterious commented-out `x.join()` line in the next section.



<a class="anchor" id="daemon_threads"></a>
### Daemon Threads

In computer science, a [`daemon`](https://en.wikipedia.org/wiki/Daemon_(computing)) is a process that runs in the background.



Python `threading` has a more specific meaning for `daemon`. A `daemon` thread will shut down immediately when the program exits. One way to think about these definitions is to consider the `daemon` thread a thread that runs in the background without worrying about shutting it down.



If a program is running `Threads` that are not `daemons`, then the program will wait for those threads to complete before it terminates. `Threads` that *are* daemons, however, are just killed wherever they are when the program is exiting.



Let’s look a little more closely at the output of your program above. The last two lines are the interesting bit. When you run the program, you’ll notice that there is a pause (of about 2 seconds) after [`__main__`](https://realpython.com/python-main-function/) has printed its `all done` message and before the thread is finished.



This pause is Python waiting for the non-daemonic thread to complete. When your Python program ends, part of the shutdown process is to clean up the threading routine.



If you look at the [source for Python `threading`](https://github.com/python/cpython/blob/df5cdc11123a35065bbf1636251447d0bfe789a5/Lib/threading.py#L1263), you’ll see that `threading._shutdown()` walks through all of the running threads and calls `.join()` on every one that does not have the `daemon` flag set.



So your program waits to exit because the thread itself is waiting in a sleep. As soon as it has completed and printed the message, `.join()` will return and the program can exit.



Frequently, this behavior is what you want, but there are other options available to us. Let’s first repeat the program with a `daemon` thread. You do that by changing how you construct the `Thread`, adding the `daemon=True` flag:



In [None]:
x = threading.Thread(target=thread_function, args=(1,), daemon=True)

When you run the program now, you should see this output:



```sh
$ ./daemon_thread.py
Main : before creating thread
Main : before running thread
Thread 1: starting
Main : wait for the thread to finish
Main : all done
```

The difference here is that the final line of the output is missing. `thread_function()` did not get a chance to complete. It was a `daemon` thread, so when `__main__` reached the end of its code and the program wanted to finish, the daemon was killed.
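
If you’d like to observe this from a single script, one option is to launch a small child program and capture its output. This is only a sketch: the child script, its messages, and the shortened one-second sleep are made up for the illustration, and it assumes `sys.executable` points at a working Python interpreter:

```python
import subprocess
import sys
import textwrap

# Hypothetical child program: a daemon thread that needs 1 second to finish.
child_script = textwrap.dedent(
    """
    import threading
    import time

    def thread_function():
        print("Thread: starting", flush=True)
        time.sleep(1)
        print("Thread: finishing", flush=True)

    x = threading.Thread(target=thread_function, daemon=True)
    x.start()
    time.sleep(0.1)  # give the thread a moment to print its first message
    print("Main: all done", flush=True)
    """
)

result = subprocess.run(
    [sys.executable, "-c", child_script],
    capture_output=True,
    text=True,
    timeout=10,
)
print(result.stdout)
```

The captured output should contain the `all done` message but not the `finishing` message, because the child program exits while the daemon thread is still sleeping.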



<a class="anchor" id="`join()`_a_thread"></a>
### `join()` a Thread

Daemon threads are handy, but what about when you want to wait for a thread to stop? What about when you want to do that and not exit your program? Now let’s go back to your original program and look at that commented-out `x.join()` line:



In [None]:
# x.join()

To tell one thread to wait for another thread to finish, you call `.join()`. If you uncomment that line, the main thread will pause and wait for the thread `x` to complete running.



Did you test this on the code with the daemon thread or the regular thread? It turns out that it doesn’t matter. If you `.join()` a thread, that statement will wait until either kind of thread is finished.
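
The following sketch shows `.join()` on a daemon thread. It also uses the optional `timeout` argument of `.join()` together with `.is_alive()`, which the text above doesn’t cover, to check whether the thread has finished yet. The `slow_task()` function and the durations are arbitrary choices for the example:

```python
import threading
import time


def slow_task():
    time.sleep(0.5)


x = threading.Thread(target=slow_task, daemon=True)
x.start()

x.join(timeout=0.05)          # stop waiting after 50 ms
still_running = x.is_alive()  # the thread is still sleeping

x.join()                      # now wait for as long as it takes
finished = not x.is_alive()

print(still_running, finished)
```

Note that `.join()` with a timeout returns whether or not the thread has finished, so you need `.is_alive()` to tell the two cases apart.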



<a class="anchor" id="working_with_many_threads"></a>
## Working With Many Threads

The example code so far has only been working with two threads: the main thread and one you started with the `threading.Thread` object. 



Frequently, you’ll want to start a number of threads and have them do interesting work. Let’s start by looking at the harder way of doing that, and then you’ll move on to an easier method.



The harder way of starting multiple threads is the one you already know:



In [None]:
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    threads = list()
    for index in range(3):
        logging.info("Main    : create and start thread %d.", index)
        x = threading.Thread(target=thread_function, args=(index,))
        threads.append(x)
        x.start()

    for index, thread in enumerate(threads):
        logging.info("Main    : before joining thread %d.", index)
        thread.join()
        logging.info("Main    : thread %d done", index)

This code uses the same mechanism you saw above to start a thread: create a `Thread` object and then call `.start()`. The program keeps a list of `Thread` objects so that it can wait for them later using `.join()`.



Running this code multiple times will likely produce some interesting results. Here’s an example output from my machine:



```sh
$ ./multiple_threads.py
Main : create and start thread 0.
Thread 0: starting
Main : create and start thread 1.
Thread 1: starting
Main : create and start thread 2.
Thread 2: starting
Main : before joining thread 0.
Thread 2: finishing
Thread 1: finishing
Thread 0: finishing
Main : thread 0 done
Main : before joining thread 1.
Main : thread 1 done
Main : before joining thread 2.
Main : thread 2 done
```

If you walk through the output carefully, you’ll see all three threads getting started in the order you might expect, but in this case they finish in the opposite order! Multiple runs will produce different orderings. Look for the `Thread x: finishing` message to tell you when each thread is done.



The order in which threads are run is determined by the operating system and can be quite hard to predict. It may (and likely will) vary from run to run, so you need to be aware of that when you design algorithms that use threading.



Fortunately, Python gives you several primitives that you’ll look at later to help coordinate threads and get them running together. Before that, let’s look at how to make managing a group of threads a bit easier.



<a class="anchor" id="using_a_threadpoolexecutor"></a>
## Using a ThreadPoolExecutor

There’s an easier way to start up a group of threads than the one you saw above. It’s called a `ThreadPoolExecutor`, and it’s part of the standard library in [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html) (as of Python 3.2).



The easiest way to create it is as a context manager, using the [`with` statement](https://realpython.com/python-with-statement/) to manage the creation and destruction of the pool.



Here’s the `__main__` from the last example rewritten to use a `ThreadPoolExecutor`:



In [None]:
import concurrent.futures

# [rest of code]

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(
        format=format, level=logging.INFO, datefmt="%H:%M:%S"
    )

    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        executor.map(thread_function, range(3))

The code creates a `ThreadPoolExecutor` as a context manager, telling it how many worker threads it wants in the pool. It then uses `.map()` to step through an iterable of things, in your case `range(3)`, passing each one to a thread in the pool. You can think of `executor.map()` as a concurrent version of the built-in `map()` function.



The end of the `with` block causes the `ThreadPoolExecutor` to do a `.join()` on each of the threads in the pool. It is *strongly* recommended that you use `ThreadPoolExecutor` as a context manager when you can so that you never forget to `.join()` the threads.
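
One detail worth seeing in code is that `.map()` hands results back in the order of the inputs, even when the threads finish in a different order. Here is a minimal sketch; `slow_square()` is a hypothetical function invented for this example:

```python
import concurrent.futures
import time


def slow_square(n):
    # Smaller inputs sleep longer, so they finish last.
    time.sleep(0.1 * (3 - n))
    return n * n


with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(slow_square, range(3)))

print(results)  # [0, 1, 4]
```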



Running this example code will produce output that looks like this:



```sh
$ ./executor.py
Thread 0: starting
Thread 1: starting
Thread 2: starting
Thread 1: finishing
Thread 0: finishing
Thread 2: finishing
```

Again, notice how `Thread 1` finished before `Thread 0`. The scheduling of threads is done by the operating system and does not follow a plan that’s easy to figure out.



<a class="anchor" id="race_conditions"></a>
## Race Conditions

Before you move on to some of the other features tucked away in Python `threading`, let’s talk a bit about one of the more difficult issues you’ll run into when writing threaded programs: [race conditions](https://en.wikipedia.org/wiki/Race_condition).



Once you’ve seen what a race condition is and looked at one happening, you’ll move on to some of the primitives provided by the standard library to prevent race conditions from happening.



Race conditions can occur when two or more threads access a shared piece of data or resource. In this example, you’re going to create a large race condition that happens every time, but be aware that most race conditions are not this obvious. Frequently, they only occur rarely, and they can produce confusing results. As you can imagine, **this makes them quite difficult to debug**.



Fortunately, this race condition will happen every time, and you’ll walk through it in detail to explain what is happening.



For this example, you’re going to write a class that updates a database. Okay, you’re not really going to have a database: you’re just going to fake it, because that’s not the point of this article.



Your `FakeDatabase` will have `.__init__()` and `.update()` methods:



In [None]:
class FakeDatabase:
    def __init__(self):
        self.value = 0

    def update(self, name):
        logging.info(f"Thread {name}: starting update")
        local_copy = self.value
        local_copy += 1
        time.sleep(0.1)
        self.value = local_copy
        logging.info(f"Thread {name}: finishing update")

`FakeDatabase` is keeping track of a single number: `.value`. This is going to be the shared data on which you’ll see the race condition.



`.__init__()` simply initializes `.value` to zero. So far, so good.



`.update()` looks a little strange. It’s simulating reading a value from a database, doing some computation on it, and then writing a new value back to the database.



In this case, reading from the database just means copying `.value` to a local variable. The computation is just to add one to the value and then `.sleep()` for a little bit. Finally, it writes the value back by copying the local value back to `.value`.



Here’s how you’ll use this `FakeDatabase`:



In [None]:
format = "%(asctime)s: %(message)s"
logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

database = FakeDatabase()
logging.info(f"Testing update. Starting value is {database.value}.")
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    for index in range(2):
        executor.submit(database.update, index)
logging.info(f"Testing update. Ending value is {database.value}.")

The program creates a `ThreadPoolExecutor` with two threads and then calls `.submit()` on it twice, once per thread, telling each to run `database.update()`.



`.submit()` has a signature that allows both positional and named arguments to be passed to the function running in the thread:



In [None]:
.submit(function, *args, **kwargs)

In the usage above, `index` is passed as the first and only positional argument to `database.update()`. You’ll see later in this article where you can pass multiple arguments in a similar manner.
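
As a quick sketch of that signature in use, the example below submits a function with both positional and keyword arguments. The `describe()` function is hypothetical. `.submit()` returns a `Future`, and calling `.result()` on it waits for the call to finish and returns its value:

```python
import concurrent.futures


def describe(name, greeting="Hello", punctuation="!"):
    return f"{greeting}, {name}{punctuation}"


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    first = executor.submit(describe, "thread 0")
    second = executor.submit(
        describe, "thread 1", greeting="Goodbye", punctuation="."
    )

print(first.result())   # Hello, thread 0!
print(second.result())  # Goodbye, thread 1.
```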



Since each thread runs `.update()`, and `.update()` adds one to `.value`, you might expect `database.value` to be `2` when it’s printed out at the end. But you wouldn’t be looking at this example if that was the case. If you run the above code, the output looks like this:



```sh
$ ./racecond.py
Testing unlocked update. Starting value is 0.
Thread 0: starting update
Thread 1: starting update
Thread 0: finishing update
Thread 1: finishing update
Testing unlocked update. Ending value is 1.
```

You might have expected that to happen, but let’s look at the details of what’s really going on here, as that will make the solution to this problem easier to understand.



<a class="anchor" id="one_thread"></a>
### One Thread

Before you dive into this issue with two threads, let’s step back and talk a bit about some details of how threads work.



You won’t be diving into all of the details here, as that’s not important at this level. We’ll also be simplifying a few things in a way that won’t be technically accurate but will give you the right idea of what is happening.



When you tell your `ThreadPoolExecutor` to run each thread, you tell it which function to run and what parameters to pass to it: `executor.submit(database.update, index)`. 



The result of this is that each of the threads in the pool will call `database.update(index)`. Note that `database` is a reference to the one `FakeDatabase` object created in `__main__`. Calling `.update()` on that object calls an [instance method](https://realpython.com/instance-class-and-static-methods-demystified/) on that object.



Each thread is going to have a reference to the same `FakeDatabase` object, `database`. Each thread will also have a unique value, `index`, to make the logging statements a bit easier to read:



<img src="images/an-intro-to-threading-in-python/intro-threading-shared-database.267a5d8c6aa1.png" width="600px">

When the thread starts running `.update()`, it has its own version of all of the data **local** to the function. In the case of `.update()`, this is `local_copy`. This is definitely a good thing. Otherwise, two threads running the same function would always confuse each other. It means that all variables that are scoped (or local) to a function are **thread-safe**.
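
A small sketch can show this. Here `count_up()` is a made-up function whose only state is a local variable, so several threads can run it at once without interfering with each other:

```python
import concurrent.futures


def count_up(limit):
    local_total = 0  # each call gets its own local_total
    for _ in range(limit):
        local_total += 1
    return local_total


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    totals = list(executor.map(count_up, [1000, 2000, 3000, 4000]))

print(totals)  # [1000, 2000, 3000, 4000]
```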



Now you can start walking through what happens if you run the program above with a single thread and a single call to `.update()`.



The image below steps through the execution of `.update()` if only a single thread is run. The statement is shown on the left followed by a diagram showing the values in the thread’s `local_copy` and the shared `database.value`:



<img src="images/an-intro-to-threading-in-python/intro-threading-single-thread.6a11288bc199.png" width="600px">

The diagram is laid out so that time increases as you move from top to bottom. It begins when `Thread 1` is created and ends when it is terminated.



When `Thread 1` starts, `FakeDatabase.value` is zero. The first line of code in the method, `local_copy = self.value`, copies the value zero to the local variable. Next it increments the value of `local_copy` with the `local_copy += 1` statement. You can see `local_copy` in `Thread 1` getting set to one.



Next `time.sleep()` is called, which makes the current thread pause and allows other threads to run. Since there is only one thread in this example, this has no effect.



When `Thread 1` wakes up and continues, it copies the new value from `local_copy` to `FakeDatabase.value`, and then the thread is complete. You can see that `database.value` is set to one.



So far, so good. You ran `.update()` once and `FakeDatabase.value` was incremented to one.



<a class="anchor" id="two_threads"></a>
### Two Threads

Getting back to the race condition, the two threads will be running concurrently but not at the same time. They will each have their own version of `local_copy` and will each point to the same `database`. It is this shared `database` object that is going to cause the problems.



The program starts with `Thread 1` running `.update()`:



<img src="images/an-intro-to-threading-in-python/intro-threading-two-threads-part1.c1c0e65a8481.png" width="600px">

When `Thread 1` calls `time.sleep()`, it allows the other thread to start running. This is where things get interesting.



`Thread 2` starts up and does the same operations. It’s also copying `database.value` into its private `local_copy`, and this shared `database.value` has not yet been updated:



<img src="images/an-intro-to-threading-in-python/intro-threading-two-threads-part2.df42d4fbfe21.png" width="600px">

When `Thread 2` finally goes to sleep, the shared `database.value` is still unmodified at zero, and both private versions of `local_copy` have the value one.



`Thread 1` now wakes up and saves its version of `local_copy` and then terminates, giving `Thread 2` a final chance to run. `Thread 2` has no idea that `Thread 1` ran and updated `database.value` while it was sleeping. It stores *its* version of `local_copy` into `database.value`, also setting it to one:



<img src="images/an-intro-to-threading-in-python/intro-threading-two-threads-part3.18576920f88f.png" width="600px">

The two threads have interleaving access to a single shared object, overwriting each other’s results. Similar race conditions can arise when one thread frees memory or closes a file handle before the other thread is finished accessing it.
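
The interleaving described above can be replayed by hand, without any threads at all. This is only an illustration of the ordering, not how the threads are actually scheduled. The variables `thread1_local_copy` and `thread2_local_copy` stand in for each thread’s private `local_copy`:

```python
class FakeDatabase:
    def __init__(self):
        self.value = 0


database = FakeDatabase()

# Thread 1 reads the shared value and increments its private copy, then sleeps.
thread1_local_copy = database.value
thread1_local_copy += 1

# Thread 2 runs while Thread 1 sleeps and reads the same stale value.
thread2_local_copy = database.value
thread2_local_copy += 1

# Thread 1 wakes up and writes, then Thread 2 overwrites with the same number.
database.value = thread1_local_copy
database.value = thread2_local_copy

print(database.value)  # 1, not 2
```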



<a class="anchor" id="why_this_isn’t_a_silly_example"></a>
### Why This Isn’t a Silly Example

The example above is contrived to make sure that the race condition happens every time you run your program. Because the operating system can swap out a thread at any time, it is possible to interrupt a statement like `x = x + 1` after it has read the value of `x` but before it has written back the incremented value.



The details of how this happens are quite interesting, but not needed for the rest of this article, so feel free to skip over the section below.



<a class="anchor" id="how_does_this_really_work"></a>
#### How Does This Really Work

The code above isn’t quite as out there as you might originally have thought. It was designed to force a race condition every time you run it, but that makes it much easier to solve than most race conditions.

There are two things to keep in mind when thinking about race conditions:

1. Even an operation like `x += 1` takes the processor many steps. Each of these steps is a separate instruction to the processor.
2. The operating system can swap which thread is running at any time. A thread can be swapped out after any of these small instructions. This means that a thread can be put to sleep to let another thread run in the middle of a Python statement.

Let’s look at this in detail. The REPL below shows a function that takes a parameter and increments it:

In [2]:
def inc(x):
    x += 1

In [3]:
import dis

In [4]:
dis.dis(inc)

  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (1)
              4 INPLACE_ADD
              6 STORE_FAST               0 (x)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE


The REPL example uses [dis](https://docs.python.org/3/library/dis.html) from the Python standard library to show the smaller bytecode steps that the interpreter performs to implement your function. It does a `LOAD_FAST` of the data value `x`, it does a `LOAD_CONST 1`, and then it uses `INPLACE_ADD` to add those values together. The exact opcode names vary between Python versions.

We’re stopping here for a specific reason. This is the point in `.update()` above where `time.sleep()` forced the threads to switch. It is entirely possible that, every once in a while, the operating system would switch threads at that exact point even without `sleep()`, but the call to `sleep()` makes it happen every time.

As you learned above, the operating system can swap threads at any time. You’ve walked down this listing to the instruction at offset 4, `INPLACE_ADD`. If the operating system swaps out this thread and runs a different thread that also modifies `x`, then when this thread resumes, it will overwrite `x` with an incorrect value.

Technically, this example won’t have a race condition because `x` is local to `inc()`. It does illustrate how a thread can be interrupted during a single Python operation, however. The same `LOAD`, `MODIFY`, `STORE` set of operations also happens on global and shared values. You can explore with the `dis` module and prove that yourself.
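
Here is one way you might check that for a global variable. The sketch uses `dis.get_instructions()` to collect the opcode names for a hypothetical `inc_global()` function. The opcode that performs the addition differs between Python versions, so the example only looks for the load and the store:

```python
import dis

counter = 0


def inc_global():
    global counter
    counter += 1


# Collect the name of every bytecode instruction in the function.
opnames = [
    instruction.opname for instruction in dis.get_instructions(inc_global)
]
print(opnames)

load_position = opnames.index("LOAD_GLOBAL")
store_position = opnames.index("STORE_GLOBAL")
print(load_position < store_position)  # the load comes before the store
```

A thread switch between the load and the store is the window in which another thread’s update to `counter` could be lost.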

It’s rare to get a race condition like this to occur, but remember that an infrequent event taken over millions of iterations becomes likely to happen. **The rarity of these race conditions makes them much, much harder to debug than regular bugs.**

Now back to your regularly scheduled tutorial!

Now that you’ve seen a race condition in action, let’s find out how to solve them!



<a class="anchor" id="basic_synchronization_using_lock"></a>
## Basic Synchronization Using Lock

There are a number of ways to avoid or solve race conditions. You won’t look at all of them here, but there are a couple that are used frequently. Let’s start with `Lock`.



To solve your race condition above, you need to find a way to allow only one thread at a time into the read-modify-write section of your code. The most common way to do this is called `Lock` in Python. In some other languages this same idea is called a `mutex`. Mutex comes from MUTual EXclusion, which is exactly what a `Lock` does.



A `Lock` is an object that acts like a hall pass. Only one thread at a time can have the `Lock`. Any other thread that wants the `Lock` must wait until the owner of the `Lock` gives it up.



The basic functions to do this are `.acquire()` and `.release()`. A thread will call `my_lock.acquire()` to get the lock. If the lock is already held, the calling thread will wait until it is released. There’s an important point here. If one thread gets the lock but never gives it back, your program will be stuck. You’ll read more about this later.



Fortunately, Python’s `Lock` will also operate as a context manager, so you can use it in a `with` statement, and it gets released automatically when the `with` block exits for any reason.
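
The sketch below contrasts the two forms. It also uses `.locked()` and a non-blocking `.acquire(blocking=False)`, which aren’t covered in the text above, to show the state of the lock at each step. The variable names are made up for the example:

```python
import threading

my_lock = threading.Lock()

# Manual form: release in a finally block so an exception can't leave it held.
my_lock.acquire()
try:
    held_manually = my_lock.locked()
    # A non-blocking attempt fails while the lock is already held.
    second_attempt = my_lock.acquire(blocking=False)
finally:
    my_lock.release()

# Context-manager form: acquire on entry, release on exit.
with my_lock:
    held_in_with = my_lock.locked()

released_after_with = not my_lock.locked()
print(held_manually, second_attempt, held_in_with, released_after_with)
# True False True True
```

The `with` form is shorter and makes it much harder to forget the matching `.release()`.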



Let’s look at the `FakeDatabase` with a `Lock` added to it. The calling function stays the same:



In [None]:
class FakeDatabase:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def locked_update(self, name):
        logging.info("Thread %s: starting update", name)
        logging.debug("Thread %s about to lock", name)
        with self._lock:
            logging.debug("Thread %s has lock", name)
            local_copy = self.value
            local_copy += 1
            time.sleep(0.1)
            self.value = local_copy
            logging.debug("Thread %s about to release lock", name)
        logging.debug("Thread %s after release", name)
        logging.info("Thread %s: finishing update", name)

Other than adding a bunch of debug logging so you can see the locking more clearly, the big change here is to add a member called `._lock`, which is a `threading.Lock()` object. This `._lock` is initialized in the unlocked state and is then acquired and released by the `with` statement.



It’s worth noting here that the thread running this function will hold on to that `Lock` until it is completely finished updating the database. In this case, that means it will hold the `Lock` while it copies, updates, sleeps, and then writes the value back to the database.



If you run this version with logging set to the `INFO` level, you’ll see this:



```sh
$ ./fixrace.py
Testing locked update. Starting value is 0.
Thread 0: starting update
Thread 1: starting update
Thread 0: finishing update
Thread 1: finishing update
Testing locked update. Ending value is 2.
```

Look at that. Your program finally works!



You can turn on full logging by setting the level to `DEBUG` by adding this statement after you configure the logging output in `__main__`:



In [None]:
logging.getLogger().setLevel(logging.DEBUG)

Running this program with `DEBUG` logging turned on looks like this:



```sh
$ ./fixrace.py
Testing locked update. Starting value is 0.
Thread 0: starting update
Thread 0 about to lock
Thread 0 has lock
Thread 1: starting update
Thread 1 about to lock
Thread 0 about to release lock
Thread 0 after release
Thread 0: finishing update
Thread 1 has lock
Thread 1 about to release lock
Thread 1 after release
Thread 1: finishing update
Testing locked update. Ending value is 2.
```

In this output you can see `Thread 0` acquires the lock and is still holding it when it goes to sleep. `Thread 1` then starts and attempts to acquire the same lock. Because `Thread 0` is still holding it, `Thread 1` has to wait. This is the mutual exclusion that a `Lock` provides.



Many of the examples in the rest of this article will have `INFO` and `DEBUG` level logging. We’ll generally only show the `INFO` level output, as the `DEBUG` logs can be quite lengthy. Try out the programs with the logging turned up and see what they do.



<a class="anchor" id="deadlock"></a>
## Deadlock

Before you move on, you should look at a common problem when using `Locks`. As you saw, if the `Lock` has already been acquired, a second call to `.acquire()` will wait until the thread that is holding the `Lock` calls `.release()`. What do you think happens when you run this code:



In [None]:
import threading

l = threading.Lock()
print("before first acquire")
l.acquire()
print("before second acquire")
l.acquire()
print("acquired lock twice")

When the program calls `l.acquire()` the second time, it hangs waiting for the `Lock` to be released. In this example, you can fix the deadlock by removing the second call, but deadlocks usually happen from one of two subtle things:



1. An implementation bug where a `Lock` is not released properly
2. A design issue where a utility function needs to be called by functions that might or might not already have the `Lock`


The first situation happens sometimes, but using a `Lock` as a context manager greatly reduces how often it occurs. Write your code to use context managers whenever possible, as they help you avoid situations where an exception skips you over the `.release()` call.
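
As a small sketch of that point, the following code raises an exception while holding a `Lock` inside a `with` block and then checks that the lock was released anyway. The function name `risky_update()` and the simulated failure are invented for this illustration:

```python
import threading

lock = threading.Lock()


def risky_update():
    """Hypothetical update that fails while it is holding the lock."""
    with lock:
        raise ValueError("simulated failure while holding the lock")


try:
    risky_update()
except ValueError:
    pass  # the error is expected in this sketch

# The with block released the lock on the way out, despite the exception.
print(lock.locked())  # prints: False
```

If `risky_update()` had called `lock.acquire()` and `lock.release()` by hand without a `try`/`finally`, the exception would have skipped the release and the next `.acquire()` would have waited forever.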



The design issue can be a bit trickier in some languages. Thankfully, Python threading has a second object, called `RLock`, that is designed for just this situation. It allows a thread to `.acquire()` an `RLock` multiple times before it calls `.release()`. That thread is still required to call `.release()` the same number of times it called `.acquire()`, but it should be doing that anyway.
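
Here is a minimal sketch of that design issue, assuming a hypothetical `Account` class whose `deposit()` method is called both directly and from another method that already holds the lock:

```python
import threading


class Account:
    """Hypothetical class used only to illustrate RLock."""

    def __init__(self):
        self._lock = threading.RLock()
        self.balance = 0

    def deposit(self, amount):
        # Safe to call whether or not this thread already holds the lock.
        with self._lock:
            self.balance += amount

    def deposit_twice(self, amount):
        # Holds the lock, then calls a method that acquires it again.
        with self._lock:
            self.deposit(amount)
            self.deposit(amount)


account = Account()
account.deposit_twice(10)
print(account.balance)  # prints: 20
```

With a plain `threading.Lock()` in place of the `RLock`, the call to `deposit_twice()` would hang on the nested `with self._lock:` for the same reason the double `.acquire()` example above hangs.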



`Lock` and `RLock` are two of the basic tools used in threaded programming to prevent race conditions. There are a few others that work in different ways, and you’ll look at some of them next.



<a class="anchor" id="threading_objects"></a>
## Threading Objects

There are a few more primitives offered by the Python `threading` module. While you didn’t need these for the examples above, they can come in handy in different use cases, so it’s good to be familiar with them.



<a class="anchor" id="semaphore"></a>
### Semaphore

The first Python `threading` object to look at is `threading.Semaphore`. A `Semaphore` is a counter with a few special properties. The first one is that the counting is atomic. This means that there is a guarantee that the operating system will not swap out the thread in the middle of incrementing or decrementing the counter.



The internal counter is incremented when you call `.release()` and decremented when you call `.acquire()`.



The next special property is that if a thread calls `.acquire()` when the counter is zero, that thread will block until a different thread calls `.release()` and increments the counter to one.
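
You can watch the counter behave this way without blocking by passing `blocking=False` to `.acquire()`, which returns `False` right away instead of waiting. This is only a sketch, and the variable names are made up:

```python
import threading

slots = threading.Semaphore(2)  # the internal counter starts at 2

first = slots.acquire(blocking=False)   # counter goes from 2 to 1
second = slots.acquire(blocking=False)  # counter goes from 1 to 0
third = slots.acquire(blocking=False)   # counter is 0, so this returns False

slots.release()                          # counter goes from 0 to 1
fourth = slots.acquire(blocking=False)  # succeeds again

print(first, second, third, fourth)  # prints: True True False True
```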



Semaphores are frequently used to protect a resource that has a limited capacity. An example would be if you have a pool of connections and want to limit the size of that pool to a specific number.
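
The sketch below shows one way this could look. It does not open real connections: `use_connection()` just sleeps, and `MAX_CONNECTIONS`, `active`, and `peak_active` are hypothetical names used to measure how many threads are inside the protected section at once:

```python
import threading
import time

MAX_CONNECTIONS = 2  # hypothetical pool size
pool_slots = threading.Semaphore(MAX_CONNECTIONS)

stats_lock = threading.Lock()  # protects the two counters below
active = 0
peak_active = 0


def use_connection():
    """Pretend to use a pooled connection for a short time."""
    global active, peak_active
    with pool_slots:  # .acquire() on entry, .release() on exit
        with stats_lock:
            active += 1
            peak_active = max(peak_active, active)
        time.sleep(0.05)  # stand-in for real work on the connection
        with stats_lock:
            active -= 1


threads = [threading.Thread(target=use_connection) for _ in range(6)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Peak concurrent users: {peak_active} (limit {MAX_CONNECTIONS})")
```

Six threads compete for the resource, but the reported peak never exceeds the limit passed to the `Semaphore`.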



<a class="anchor" id="timer"></a>
### Timer

A `threading.Timer` is a way to schedule a function to be called after a certain amount of time has passed. You create a `Timer` by passing in a number of seconds to wait and a function to call:



In [None]:
t = threading.Timer(30.0, my_function)

You start the `Timer` by calling `.start()`. The function will be called on a new thread at some point after the specified time, but be aware that there is no promise that it will be called exactly at the time you want.



If you want to stop a `Timer` that you’ve already started, you can cancel it by calling `.cancel()`. Calling `.cancel()` after the `Timer` has triggered does nothing and does not produce an exception.



A `Timer` can be used to prompt a user for action after a specific amount of time. If the user does the action before the `Timer` expires, `.cancel()` can be called.
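
Here is a short sketch of both cases, one `Timer` that fires and one that is cancelled first. The function `prompt_user()` and the `calls` list are stand-ins invented for this example:

```python
import threading

calls = []  # records which timers actually fired


def prompt_user(label):
    """Hypothetical callback; a real program might show a reminder here."""
    calls.append(label)


# This timer is allowed to expire, so its function runs on a new thread.
first = threading.Timer(0.1, prompt_user, args=["first"])
first.start()
first.join()  # Timer is a Thread subclass, so you can wait for it

# This timer is cancelled before its five seconds are up.
second = threading.Timer(5.0, prompt_user, args=["second"])
second.start()
second.cancel()
second.join()  # returns promptly because the timer was cancelled

print(calls)  # prints: ['first']
```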



<a class="anchor" id="barrier"></a>
### Barrier

A `threading.Barrier` can be used to keep a fixed number of threads in sync. When creating a `Barrier`, the caller must specify how many threads will be synchronizing on it. Each thread calls `.wait()` on the `Barrier`. They all will remain blocked until the specified number of threads are waiting, and then they are all released at the same time.



Remember that threads are scheduled by the operating system so, even though all of the threads are released simultaneously, they will be scheduled to run one at a time.



One use for a `Barrier` is to allow a pool of threads to initialize themselves. Having the threads wait on a `Barrier` after they are initialized will ensure that none of the threads start running before all of the threads are finished with their initialization.
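
A minimal sketch of that initialization pattern follows. The `worker()` function and the `events` list are hypothetical and only exist to record the order in which things happen:

```python
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)

events = []  # (phase, worker name) tuples, in the order they happened
events_lock = threading.Lock()


def worker(name):
    with events_lock:
        events.append(("init", name))  # pretend initialization
    barrier.wait()  # block until all NUM_WORKERS threads get here
    with events_lock:
        events.append(("run", name))  # real work starts after the barrier


threads = [threading.Thread(target=worker, args=(n,)) for n in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

phases = [phase for phase, _ in events]
print(phases)  # prints: ['init', 'init', 'init', 'run', 'run', 'run']
```

The order of the worker names within each phase can vary from run to run, but no `"run"` entry can appear before every thread has logged its `"init"` entry.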



<a class="anchor" id="download_sites:_threading_version"></a>
## Download Sites: Threading Version

As you probably guessed, writing a threaded program takes more effort. You might be surprised at how little extra effort it takes for simple cases, however. Here’s what the same program looks like with `threading`:



In [None]:
import concurrent.futures
import requests
import threading
import time

In [None]:
thread_local = threading.local()

In [None]:
def get_session():
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

In [None]:
def download_site(url):
    session = get_session()
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")

In [None]:
def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_site, sites)

In [None]:
sites = [
    "https://www.jython.org",
    "http://olympus.realpython.org/dice",
] * 80
start_time = time.time()
download_all_sites(sites)
duration = time.time() - start_time
print(f"Downloaded {len(sites)} in {duration} seconds")

When you add `threading`, the overall structure is the same and you only needed to make a few changes. `download_all_sites()` changed from calling the function once per site to a more complex structure.



In this version, you’re creating a `ThreadPoolExecutor`, which seems like a complicated thing. Let’s break that down: `ThreadPoolExecutor` = `Thread` + `Pool` + `Executor`.



You already know about the `Thread` part. That’s just a train of thought we mentioned earlier. The `Pool` portion is where it starts to get interesting. This object is going to create a pool of threads, each of which can run concurrently. Finally, the `Executor` is the part that’s going to control how and when each of the threads in the pool will run. It will execute the request in the pool.



Helpfully, the standard library implements `ThreadPoolExecutor` as a context manager so you can use the `with` syntax to manage creating and freeing the pool of `Threads`.



Once you have a `ThreadPoolExecutor`, you can use its handy `.map()` method. This method runs the passed-in function on each of the sites in the list. The great part is that it automatically runs them concurrently using the pool of threads it is managing.
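
If you want to see `.map()` at work without touching the network, here is a small sketch that swaps the download for a sleep. The function `fake_download()` and the `delays` list are assumptions made for this illustration:

```python
import concurrent.futures
import time


def fake_download(delay):
    """Stand-in for a network request: wait, then return the delay."""
    time.sleep(delay)
    return delay


delays = [0.2, 0.05, 0.1]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # .map() yields results in the order of the inputs,
    # not in the order the calls happen to finish.
    results = list(executor.map(fake_download, delays))

print(results)  # prints: [0.2, 0.05, 0.1]
```

One detail worth knowing is that an exception raised inside the function only surfaces when you retrieve that result from the iterator `.map()` returns. If you never iterate over the results, such errors pass silently.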



Those of you coming from other languages, or even Python 2, are probably wondering where the usual objects and functions are that manage the details you’re used to when dealing with `threading`, things like `Thread.start()`, `Thread.join()`, and `Queue`.



These are all still there, and you can use them to achieve fine-grained control of how your threads are run. But, starting with Python 3.2, the standard library added a higher-level abstraction called `Executors` that manages many of the details for you if you don’t need that fine-grained control.



The other interesting change in our example is that each thread needs to create its own `requests.Session()` object. When you’re looking at the documentation for `requests`, it’s not necessarily easy to tell, but reading [this issue](https://github.com/requests/requests/issues/2766), it seems fairly clear that you need a separate Session for each thread.



This is one of the interesting and difficult issues with `threading`. Because the operating system is in control of when your task gets interrupted and another task starts, any data that is shared between the threads needs to be protected, or thread-safe. Unfortunately `requests.Session()` is not thread-safe.



There are several strategies for making data accesses thread-safe depending on what the data is and how you’re using it. One of them is to use thread-safe data structures like `Queue` from Python’s `queue` module.



These objects use low-level primitives like [`threading.Lock`](https://docs.python.org/2/library/threading.html#lock-objects) to ensure that only one thread can access a block of code or a bit of memory at the same time. You are using this strategy indirectly by way of the `ThreadPoolExecutor` object.
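
As a rough sketch of the thread-safe data structure strategy, the following code passes work to three threads through a `queue.Queue` and collects the answers through a second one, with no explicit locking in your own code. The doubling task and the `None` sentinel that tells a worker to stop are choices made for this example:

```python
import queue
import threading

work_queue = queue.Queue()
result_queue = queue.Queue()


def worker():
    """Double each number from work_queue until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        result_queue.put(item * 2)


threads = [threading.Thread(target=worker) for _ in range(3)]
for thread in threads:
    thread.start()

for number in range(10):
    work_queue.put(number)
for _ in threads:
    work_queue.put(None)  # one sentinel per worker

for thread in threads:
    thread.join()

doubled = sorted(result_queue.get() for _ in range(10))
print(doubled)  # prints: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The results are sorted before printing because the order in which the workers finish is not predictable.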



Another strategy to use here is something called thread local storage. `threading.local()` creates an object that looks like a global but is specific to each individual thread. In your example, this is done with `thread_local` and `get_session()`:



In [None]:
thread_local = threading.local()

In [None]:
def get_session():
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

`local()` is in the `threading` module to specifically solve this problem. It looks a little odd, but you only want to create one of these objects, not one for each thread. The object itself takes care of separating accesses from different threads to different data.



When `get_session()` is called, the `session` it looks up is specific to the particular thread on which it’s running. So each thread will create a single session the first time it calls `get_session()` and then will simply use that session on each subsequent call throughout its lifetime.
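
You can check this behavior without making any requests. The sketch below keeps the same `get_session()` shape but uses a hypothetical `FakeSession` class in place of `requests.Session`, and records every session that gets created:

```python
import threading


class FakeSession:
    """Hypothetical stand-in for requests.Session in this sketch."""


thread_local = threading.local()
created = []  # every session object that was ever created
reused = []   # per-thread check that the second call reused the first
record_lock = threading.Lock()


def get_session():
    if not hasattr(thread_local, "session"):
        thread_local.session = FakeSession()
        with record_lock:
            created.append(thread_local.session)
    return thread_local.session


def worker():
    first = get_session()
    second = get_session()
    with record_lock:
        reused.append(first is second)


threads = [threading.Thread(target=worker) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(created), all(reused))  # prints: 3 True
```

Three threads produce three distinct sessions, and each thread gets its own session back on the second call.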



Finally, a quick note about picking the number of threads. You can see that the example code uses 5 threads. Feel free to play around with this number and see how the overall time changes. You might expect that having one thread per download would be the fastest but, at least on my system, it was not. I found the fastest results somewhere between 5 and 10 threads. If you go any higher than that, then the extra overhead of creating and destroying the threads erases any time savings.



The difficult answer here is that the correct number of threads is not a constant from one task to another. Some experimentation is required.
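
One way to run that experiment is a small timing loop like the sketch below. It replaces the real downloads with a short `time.sleep()` so it runs quickly and offline, which means the numbers it prints only illustrate the method and say nothing about real network behavior. The names `fake_download()` and `time_pool()` are made up for this example:

```python
import concurrent.futures
import time


def fake_download(_):
    """Stand-in for an I/O-bound call; sleeps instead of using the network."""
    time.sleep(0.02)


def time_pool(max_workers, num_tasks=20):
    """Return how long num_tasks fake downloads take with this pool size."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(fake_download, range(num_tasks)))
    return time.perf_counter() - start


timings = {workers: time_pool(workers) for workers in (1, 5, 10)}
for workers, seconds in timings.items():
    print(f"{workers:>2} workers: {seconds:.2f} seconds")
```

To test your own workload, you could swap `fake_download()` for the real `download_site()` and try a wider range of pool sizes.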



**Why the `threading` Version Rocks**



It’s fast! Here’s the fastest run of my tests. Remember that the non-concurrent version took more than 14 seconds:



```sh
$ ./io_threading.py
 [most output skipped]
Downloaded 160 in 3.7238826751708984 seconds
```

Here’s what its execution timing diagram looks like:



<img src="images/speed-up-your-python-program-with-concurrency/Threading.3eef48da829e.png" width="600px">

It uses multiple threads to have multiple open requests out to web sites at the same time, allowing your program to overlap the waiting times and get the final result faster! Yippee! That was the goal.



**The Problems with the `threading` Version**



Well, as you can see from the example, it takes a little more code to make this happen, and you really have to give some thought to what data is shared between threads.



Threads can interact in ways that are subtle and hard to detect. These interactions can cause race conditions that frequently result in random, intermittent bugs that can be quite difficult to find.


<a class="anchor" id="conclusion:_threading_in_python"></a>
## Conclusion: Threading in Python

You’ve now seen much of what Python `threading` has to offer and some examples of how to build threaded programs and the problems they solve. You’ve also seen a few instances of the problems that arise when writing and debugging threaded programs.



If you’d like to explore other options for concurrency in Python, check out [Speed Up Your Python Program With Concurrency](https://realpython.com/python-concurrency/).

If you’re interested in doing a deep dive on the `asyncio` module, go read [Async IO in Python: A Complete Walkthrough](https://realpython.com/async-io-python/).



Whatever you do, you now have the information and confidence you need to write programs using Python threading!



*Special thanks to reader JL Diaz for helping to clean up the introduction.*

