# Programming with threading in Python
Until now, we have used Python in a linear fashion: instructions were executed one after another, and each instruction had to finish before the next one could start.

However, Python's standard library provides several modules for "parallel programming", in which several sequences of instructions run at the same time, or almost at the same time.

We will take a closer look at the `threading` module, which offers a simple interface to create threads, i.e. portions of our code that will be executed at the same time.

To follow this chapter, you will need to know how to create classes and the basics of inheritance.


## Linear programming vs. parallel programming
So far, we have been working with "linear" programming. Take a look at this code:

```python
import time

print("Before sleep...")
time.sleep(2)
print("After sleep.")
```

```
Before sleep...
After sleep.
```


If you run this code, unsurprisingly, the first message `Before sleep...` is displayed, then the program pauses for two seconds. Finally, the second message `After sleep.` appears.

This is blocking code: during those two seconds, the script is blocked and does nothing else. This is where threads come in.

Threads allow you to execute several sequences of instructions at the same time. This is called "parallel programming": instead of following a single thread of instructions, the program runs several threads side by side.

## First thread


Another example of linear programming is below.

```python
import time

def thread_function(name):
    print("Thread {}: starting".format(name))
    time.sleep(2)
    print("Thread {}: finishing".format(name))

for i in range(5):
    thread_function(i)
```

```
Thread 0: starting
Thread 0: finishing
Thread 1: starting
Thread 1: finishing
Thread 2: starting
Thread 2: finishing
Thread 3: starting
Thread 3: finishing
Thread 4: starting
Thread 4: finishing
```


Each iteration takes 2 seconds, and because the code is blocking, the whole script takes about 10 seconds to run.
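To check the total duration yourself, you can time the loop with `time.perf_counter()`. This is a minimal sketch: the `delay` parameter is an addition for this example, and it shortens the sleep to 0.2 seconds only to keep the run quick, so five sequential calls should take at least about 1 second.

```python
import time

def thread_function(name, delay=0.2):
    # delay is a hypothetical parameter, shortened from 2 seconds for speed
    print("Thread {}: starting".format(name))
    time.sleep(delay)
    print("Thread {}: finishing".format(name))

start = time.perf_counter()
for i in range(5):
    thread_function(i)  # each call blocks until it returns
elapsed = time.perf_counter() - start
print("Sequential run took {:.2f} seconds".format(elapsed))
```

The elapsed time grows with the number of calls, because each call waits for the previous one to finish.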


Let's tweak the code so it runs in parallel. 

To do this, we will create a class that inherits from the `Thread` class and initialize `Thread` in our class's constructor. Our class must also define a ``run()`` method: this is the code the thread executes once we call the thread's ``start()`` method.

```python
import time
from threading import Thread

class ThreadFunction(Thread):
    def __init__(self, name):
        # Initialize the parent Thread class; name becomes the thread's name
        super().__init__(name=str(name))

    def run(self):
        # Executed in the new thread once start() is called
        print("Thread {}: starting".format(self.name))
        time.sleep(2)
        print("Thread {}: finishing".format(self.name))

for i in range(5):
    thread = ThreadFunction(i)
    thread.start()  # start() runs run() in a separate thread
```

```
Thread 0: starting
Thread 1: starting
Thread 2: starting
Thread 3: startingThread 4: starting

Thread 0: finishingThread 1: finishing

Thread 4: finishingThread 2: finishing

Thread 3: finishing
```


Here, each thread also sleeps for 2 seconds. But unlike in the previous example, the threads run in parallel: they all sleep at the same time rather than one after another, so the whole script takes only about 2 seconds instead of 10. Notice also that the output lines can get mixed together, because several threads print at the same moment.
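A quick way to confirm the total duration is to time the threads and wait for all of them with `join()` (covered at the end of this chapter). This sketch also shows a second way to create threads that the standard library supports: passing a function to `Thread(target=..., args=...)` instead of subclassing. The 0.2-second `delay` is an assumption chosen to keep the example fast.

```python
import time
from threading import Thread

def thread_function(name, delay=0.2):
    # delay is a hypothetical parameter, shortened from 2 seconds for speed
    print("Thread {}: starting".format(name))
    time.sleep(delay)
    print("Thread {}: finishing".format(name))

start = time.perf_counter()
# No subclass needed: target is the function, args its positional arguments
threads = [Thread(target=thread_function, args=(i,)) for i in range(5)]
for thread in threads:
    thread.start()  # run thread_function(i) in a new thread
for thread in threads:
    thread.join()   # wait until every thread has finished
elapsed = time.perf_counter() - start
print("Threaded run took {:.2f} seconds".format(elapsed))
```

Because the five sleeps overlap, the measured time should stay close to a single delay rather than five of them. Subclassing is handy when the thread carries its own state; `target=` is often shorter for a single function.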

Parallel programming can be very convenient, but it also has its pitfalls. We will now look at some of them and the methods that exist to avoid them.

## Thread synchronization
Working with multiple streams of instructions brings its share of difficulties. At first glance, it seems very convenient to have several parts of our code running at the same time: while a task that may take a long time is running (downloading information from a website, for example), we can do something else instead of just waiting for the resource to arrive.

But development becomes correspondingly more complicated. Keep in mind that, at any given moment, different threads may have reached different points in their code.

**Let's look at an example:**

```python
number = 1
number += 1
```

The second line is the one that interests us here: `number += 1`. If this line runs in one of your threads and `number` is shared by several threads, you might get strange results. Not every time, and that is precisely the problem: most of the time everything works, but occasionally the result is wrong.

Let's say this variable is used to count something (the number of times a certain operation is executed, for example). If you are unlucky, two threads will execute this line, but the number will only be increased by 1.

This is because `number += 1` actually does three things:

* It will retrieve the value of the variable `number`;

* It will add 1 to it;

* It will write the result to the variable `number`.
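You can see these three steps with the standard `dis` module, which disassembles Python bytecode. The exact instruction names vary between Python versions (recent versions use `BINARY_OP` where older ones used `INPLACE_ADD`), so treat the listing as illustrative; `increment` is a hypothetical function written for this example.

```python
import dis

number = 1

def increment():
    global number
    number += 1  # one line of source, several bytecode instructions

# Print the full disassembly of increment()
dis.dis(increment)

# Collect the instruction names to see the read / add / write pattern
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]
print(opnames)
```

Look for a load of `number`, an addition, and a store back to `number`: a thread switch can happen between any of these instructions.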

Draw these steps on a sheet of paper, then imagine the same steps being performed by a second thread.

Let's assume that `thread_1` and `thread_2` run almost at the same time:

* `thread_1` starts executing the instruction. It performs steps 1 and 2 (it reads the value of `number` and adds 1 to it) but not yet step 3 (the variable `number` has not been modified yet);

* Then `thread_2` executes the instruction (all three steps this time): it reads `number`, adds 1 to it, and writes the result to the variable;

* Finally, `thread_1` performs step 3 and writes its result to the variable. But this result is based on the old value of `number` (the value read before `thread_2` modified it). In the end, after both threads have run, `number` has been incremented only once instead of twice.

As you can see, a very simple line of code can produce unexpected results when several threads execute it at the same time.
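Real race conditions are hard to reproduce on demand, because they depend on when the interpreter and the operating system switch between threads. The sketch below forces the exact interleaving described above using two `threading.Event` objects, so it loses an update every time. The function names `thread_1_work` and `thread_2_work` are hypothetical, and the three steps are written out by hand to mirror `number += 1`.

```python
import threading

number = 1
t1_has_read = threading.Event()
t2_has_written = threading.Event()

def thread_1_work():
    global number
    value = number            # step 1: read (gets 1)
    value = value + 1         # step 2: add (computes 2)
    t1_has_read.set()         # let thread 2 run now
    t2_has_written.wait()     # pause before step 3
    number = value            # step 3: write 2, overwriting thread 2's result

def thread_2_work():
    global number
    t1_has_read.wait()        # start only after thread 1 has read number
    value = number            # reads the old value, 1
    value = value + 1
    number = value            # writes 2
    t2_has_written.set()

t1 = threading.Thread(target=thread_1_work)
t2 = threading.Thread(target=thread_2_work)
t1.start()
t2.start()
t1.join()
t2.join()

print(number)  # 2, not 3: one increment was lost
```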

The problem is even more obvious when different threads access the same resource, for example when several threads write to the same file.


```python
from threading import Thread

class MyThread(Thread):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def run(self):
        print(self.text)
        # Every thread appends its text to the same file
        with open('threads.txt', 'a') as f:
            f.write(self.text)
```


```python
thread_1 = MyThread("My First thread! ")
thread_2 = MyThread("My Second thread! ")
thread_3 = MyThread("My third thread! ")
thread_4 = MyThread("My fourth thread! ")

thread_1.start()
thread_2.start()
thread_3.start()
thread_4.start()
```

```
My First thread! 
My Second thread! 
My third thread! 
My fourth thread! 
```



```python
with open("threads.txt") as f:
    print(f.read())
```

The order of the results is unpredictable. Sometimes you will get the expected result, but at other times you may get surprising results. For example, one run of the script produced this file content:


``'My Second thread! My First thread! My third thread! My fourth thread! '``

In this run, thread 2 wrote to the file before thread 1, even though thread 1 was started first. If your program depends on a particular order, this can cause surprises.


## Lock
There are several ways to "synchronize" our threads, i.e. to make some of the code run only when no other thread is using the shared resource. The simplest synchronization mechanism is the lock.

A lock is an object provided by `threading` that is extremely simple to use: before the instructions that use our shared resource, we acquire the lock, which blocks the other threads. If another thread wants to use the resource, it must wait until the lock is released.
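Here is a minimal sketch of that idea applied to the shared counter from the previous section. It uses `threading.Lock`, whose `acquire()` and `release()` methods can be called explicitly or, more safely, through a `with` statement, which releases the lock even if an exception is raised. The names `counter`, `counter_lock`, `add_many` and `add_many_explicit` are illustrative.

```python
from threading import Lock, Thread

counter = 0
counter_lock = Lock()

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:   # acquire the lock; other threads wait here
            counter += 1     # no other thread using this lock can interleave
                             # the lock is released when the block ends

def add_many_explicit(times):
    global counter
    for _ in range(times):
        counter_lock.acquire()
        try:
            counter += 1
        finally:
            counter_lock.release()  # always release, even after an error

threads = [Thread(target=add_many, args=(10_000,)) for _ in range(2)]
threads += [Thread(target=add_many_explicit, args=(10_000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Both functions protect the same read/add/write sequence; the `with` form is usually preferred because it cannot forget the `release()`.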

Rather than a long explanation, here is our code, slightly modified to use a lock.

```python
from threading import Thread, RLock

lock = RLock()

class SyncThread(Thread):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def run(self):
        # Only one thread at a time can run the code inside this block
        with lock:
            print(self.text)
            with open('synch_thread.txt', 'a') as file:
                file.write(self.text)
```

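1. We import `RLock` from the `threading` module.
1. We create a lock and store it in the `lock` variable.
1. In our `run` method, we wrap the code that uses the shared resources (the console and the file) in a `with lock:` block: the lock is acquired at the start of the block and released at the end, so only one thread at a time can run it.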
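The example uses `RLock` (a *reentrant* lock) rather than the plain `Lock`. The difference only matters when the same thread tries to acquire the lock again while already holding it: an `RLock` allows this and counts the acquisitions, whereas a `Lock` would block forever. This sketch uses `acquire(blocking=False)` so that the `Lock` case returns `False` instead of deadlocking.

```python
from threading import Lock, RLock

rlock = RLock()
with rlock:
    # The thread that holds an RLock can acquire it again
    reacquired_rlock = rlock.acquire(blocking=False)
    rlock.release()  # every acquire needs a matching release

lock = Lock()
with lock:
    # A second blocking acquire on a plain Lock from the same thread would hang;
    # blocking=False makes it return False immediately instead
    reacquired_lock = lock.acquire(blocking=False)

print(reacquired_rlock)  # True
print(reacquired_lock)   # False
```

For the simple `with lock:` block above, a plain `Lock` would work just as well; `RLock` becomes useful when a method holding the lock calls another method that also acquires it.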

```python
thread_1 = SyncThread("Thread 1 /")
thread_2 = SyncThread("Thread 2 /")
thread_3 = SyncThread("Thread 3 /")
thread_4 = SyncThread("Thread 4 /")

thread_1.start()
thread_2.start()
thread_3.start()
thread_4.start()
```
```
Thread 1 /
Thread 2 /
Thread 3 /
Thread 4 /
```



```python
with open('synch_thread.txt') as f:
    print(f.read())
```

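Each thread now prints and writes its text in one piece, without another thread interfering in the middle. Note, however, that a lock guarantees mutual exclusion, not order: threads started in sequence will often acquire the lock in that sequence, but nothing guarantees it, so the texts may occasionally appear in a different order.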

## Join 

Let's look at another scenario. In the example below, we start a thread and then call `print`. The `print` does not wait for the thread to finish before running, which can be a problem in some cases.

```python
thread = ThreadFunction(1)
thread.start()
print("hello")
```

```
Thread 1: starting
hello
Thread 1: finishing
```


As you can see, `hello` is printed before the thread has finished. If the rest of the script depends on the thread's work, this is not the behavior we want.

To control this behavior, we can use the `join()` method, which blocks until the thread has finished executing; only then does the script continue normally.

```python
thread = ThreadFunction(1)
thread.start()
thread.join()
print("hello")
```

```
Thread 1: starting
Thread 1: finishing
hello
```
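`join()` is especially useful when the main program needs results produced by several threads. In this minimal sketch, each thread stores a value in a shared list (the `square` function and the `results` list are illustrative); the main thread reads the list only after joining every thread, so it is guaranteed to be complete. Each thread writes to its own index, so in CPython no lock is needed here.

```python
import time
from threading import Thread

results = [None] * 5

def square(index):
    time.sleep(0.1)                 # stand-in for slow work
    results[index] = index * index  # each thread writes only its own slot

threads = [Thread(target=square, args=(i,)) for i in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # block until this thread has finished

print(results)  # [0, 1, 4, 9, 16]
```

Starting all threads first and joining them afterwards keeps them running concurrently; calling `join()` right after each `start()` would make the program sequential again.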


## In summary
* There are several mechanisms for parallel programming, including the threads offered in the `threading` module of the standard library.

* One way to create a thread is to define a class that inherits from `threading.Thread`, override its `run()` method, and call the `start()` method of an instance.

* We can use locks to synchronize our threads, so that only one thread at a time runs a section of code that uses a shared resource.

* We can use the `join()` method to wait for a thread to finish executing.

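* Generally, threads are most useful when your code is I/O-bound, that is, when it spends a significant amount of time waiting on input or output. In the standard CPython interpreter, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads rarely speed up CPU-bound code. An example is downloading data from a list of URLs in parallel: the code can start requesting data from the next URL while still waiting for the previous one to respond.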
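For I/O-bound work like this, the `concurrent.futures` module listed in the resources below offers a higher-level interface: `ThreadPoolExecutor` creates and manages a pool of threads for you. In this sketch, the network is replaced by `time.sleep`, and `fake_download` and the example URLs are hypothetical placeholders, so it runs without internet access.

```python
import time
from concurrent.futures import ThreadPoolExecutor

URLS = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]

def fake_download(url):
    # Hypothetical stand-in for a real download: the thread just waits
    time.sleep(0.2)
    return "content of " + url

# The with block waits for all submitted work before exiting
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() runs fake_download in the pool and yields results in input order
    pages = list(executor.map(fake_download, URLS))

for page in pages:
    print(page)
```

The pool handles starting and joining the threads, which avoids writing the start/join loops by hand.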

## Additional resources

* [Async IO](https://realpython.com/async-io-python/)
* [Concurrent futures](https://docs.python.org/3/library/concurrent.futures.html)
