# Table of Contents
 <p><div class="lev1 toc-item"><a href="#Threading" data-toc-modified-id="Threading-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Threading</a></div><div class="lev2 toc-item"><a href="#Multithreading" data-toc-modified-id="Multithreading-11"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Multithreading</a></div><div class="lev2 toc-item"><a href="#Locks" data-toc-modified-id="Locks-12"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Locks</a></div>

In [1]:
# 1. magic for inline plot
# 2. magic to print version
# 3. magic so that the notebook will reload external python modules
# 4. an IPython magic to enable retina (high resolution) plots
# https://gist.github.com/minrk/3301035
%matplotlib inline
%load_ext watermark
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import time
import threading

%watermark -a 'Ethen' -d -t -v

Ethen 2018-01-20 10:49:14 

CPython 3.5.2
IPython 6.2.1


# Threading

In Python, threading allows us to run multiple I/O bound tasks concurrently, so if we have two tasks on our hands, task A and task B, we don't have to wait for task A to finish before running task B. The keyword here is "I/O bound". Python has a **GIL (Global Interpreter Lock)** that prevents threads from actually running Python code in parallel. The GIL is necessary because the CPython interpreter is not thread safe, so a single global lock is enforced whenever threads access Python objects. At any time only one thread can hold the GIL and execute Python bytecode. This is what makes threads unsuitable for CPU intensive tasks in Python.
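To make the CPU bound case concrete, here is a minimal sketch that times a pure-Python loop run twice in a row versus in two threads. The helper names (`count_down`, `time_sequential`, `time_threaded`) are made up for illustration, and the exact timings depend on the machine. On a standard CPython build with the GIL, we would expect the threaded version to be no faster than the sequential one.

```python
import threading
import time


def count_down(n):
    # a pure-Python loop: CPU bound, so the running thread keeps needing the GIL
    while n > 0:
        n -= 1


def time_sequential(n, n_tasks):
    # run the tasks one after another in the main thread
    start = time.time()
    for _ in range(n_tasks):
        count_down(n)
    return time.time() - start


def time_threaded(n, n_tasks):
    # run each task in its own thread and wait for all of them
    start = time.time()
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(n_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.time() - start


n = 2000000
print('sequential: ', time_sequential(n, 2))
print('threaded:   ', time_threaded(n, 2))
```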

On the other hand, threads are efficient and beneficial for tasks that are not CPU intensive. The benefit of threading in Python appears when our problems are network bound or data input/output (I/O) bound. This means that the Python interpreter is waiting for the result of a function call that's working with data from an external source, such as a network address or hard disk. While one thread waits on I/O, it releases the GIL so that other threads can run.

For example, consider Python code that scrapes many web URLs. By adding a new thread for each download, the code can fetch multiple data sources concurrently and combine the results once the downloads have finished. This means that each subsequent download is not waiting on the download of earlier web pages. In this case the program is now bound by the bandwidth limitations of the client/server(s) instead.
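The scraping scenario can be sketched as follows. To keep the example self-contained, the download is simulated with `time.sleep`; `fake_download` and the `example.com` URLs are hypothetical stand-ins, not a real scraper. Each thread stores its result under its own key in a shared dictionary, and we combine the results after every thread has finished.

```python
import threading
import time


def fake_download(url, delay, results):
    # hypothetical stand-in for a network request; like waiting on a socket,
    # sleeping lets other threads run in the meantime
    time.sleep(delay)
    results[url] = 'content of {}'.format(url)


urls = ['http://example.com/page{}'.format(i) for i in range(4)]
results = {}
start = time.time()

# one thread per url, each writing to a different key of the shared dict
threads = []
for url in urls:
    thread = threading.Thread(target=fake_download, args=(url, 0.2, results))
    threads.append(thread)
    thread.start()

# wait for every download before using the results
for thread in threads:
    thread.join()

elapsed = time.time() - start
print('downloaded {} pages in {:.2f} seconds'.format(len(results), elapsed))
```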

## Multithreading

Let's look at some examples of working with threads. Before creating our thread, we first define a function that does nothing but sleep for a specified amount of time.

In [2]:
def sleeper(n_time):
    name = threading.current_thread().name
    print('I am {}. Going to sleep for {} seconds'.format(name, n_time))
    time.sleep(n_time)
    print('{} has woken up from sleep'.format(name))

We then initialize our thread with the `Thread` class from the `threading` module.

- `target`: accepts the function that we're going to execute.
- `name`: naming the thread; this allows us to easily differentiate between threads when we have multiple threads.
- `args`: pass in the argument to our function here.

In [3]:
# we call .start to start executing the function from the thread
n_time = 2
thread = threading.Thread(target = sleeper, name = 'thread1', args = (n_time,))
thread.start()

I am thread1. Going to sleep for 2 seconds
thread1 has woken up from sleep


Normally, when part of a program sleeps for a few seconds, we have to wait for it to wake up before the rest of the program can continue. Threads let us avoid this. If we think of the main program as the main thread and the thread we created as a separate thread, the code chunk below demonstrates concurrency: we don't have to wait for the spawned thread to finish before running the rest of our program.

In [None]:
# hello is printed "before" the wake up message from the function
thread = threading.Thread(target = sleeper, name = 'thread2', args = (n_time,))
thread.start()

print()
print('hello')

Sometimes, we don't want the main thread to continue until the thread we defined has finished executing its function. To do this, we can use the `.join` method, which is what people refer to as a blocking call: it blocks the calling thread (here, the main program) until the thread finishes its task.

In [None]:
# hello is printed "after" the wake up message from the function
thread = threading.Thread(target = sleeper, name = 'thread3', args = (n_time,))
thread.start()
thread.join()

print()
print('hello')
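`.join` also accepts an optional `timeout` argument in seconds, which is useful when we only want to wait for a bounded amount of time. Since `.join` returns `None` either way, we check `.is_alive()` afterwards to see whether the thread actually finished. The sketch below uses a hypothetical `short_nap` function and short sleep times to keep it quick.

```python
import threading
import time


def short_nap(n_time):
    time.sleep(n_time)


thread = threading.Thread(target=short_nap, name='napper', args=(0.5,))
thread.start()

# wait at most 0.1 seconds; the thread sleeps longer than that
thread.join(timeout=0.1)
still_running = thread.is_alive()
print('still running after timeout:', still_running)

# without a timeout, join blocks until the thread is done
thread.join()
finished = not thread.is_alive()
print('finished after join:', finished)
```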

The following code chunk showcases how to initialize and utilize multiple threads.

In [None]:
n_time = 2
n_threads = 5
start = time.time()

# create n_threads number of threads and store them in a list
threads = []
for i in range(n_threads):
    name = 'thread_{}'.format(i)
    # sleeper only takes n_time, so args is a one-element tuple
    args = (n_time,)
    thread = threading.Thread(target = sleeper, name = name, args = args)
    threads.append(thread)
    # we can start the thread while we're creating it, or move
    # this to its own loop
    thread.start()

# we could instead start the thread in a separate loop
# for thread in threads:
#     thread.start()

# ensure all threads have finished before executing main program
for thread in threads:
    thread.join()

elapsed = time.time() - start
print()
print('Elapsed time: ', elapsed)

From the result above, we can observe from the elapsed time that it doesn't take `n_threads * n_time` seconds to finish all the tasks. Because the threads sleep concurrently, the total is close to `n_time` instead.
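To see the difference side by side, here is a minimal sketch that runs the same sleeping task sequentially and then with one thread per task. The helper names (`nap`, `run_sequential`, `run_threaded`) are made up for this comparison, and the sleep time is shortened so the cell finishes quickly. We would expect the sequential run to take roughly `n_tasks * n_time` seconds and the threaded run roughly `n_time` seconds.

```python
import threading
import time


def nap(n_time):
    time.sleep(n_time)


def run_sequential(n_tasks, n_time):
    # each nap has to finish before the next one starts
    start = time.time()
    for _ in range(n_tasks):
        nap(n_time)
    return time.time() - start


def run_threaded(n_tasks, n_time):
    # all naps overlap, so we wait roughly for a single nap
    start = time.time()
    threads = [threading.Thread(target=nap, args=(n_time,))
               for _ in range(n_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.time() - start


sequential_time = run_sequential(n_tasks=5, n_time=0.2)
threaded_time = run_threaded(n_tasks=5, n_time=0.2)
print('sequential: ', sequential_time)
print('threaded:   ', threaded_time)
```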

## Locks

The next topic is **Locks**. Locks are used when multiple threads are trying to access the same variable. By using locks we can prevent multiple threads from modifying the same object simultaneously, which can potentially corrupt our data.
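As a quick preview of the mechanics, the sketch below guards a shared value with `threading.Lock`. The `Account` class and `worker` function are hypothetical and exist only for illustration. Using the lock in a `with` statement acquires it on entry and releases it on exit, even if an exception is raised in between.

```python
import threading


class Account:
    """Hypothetical shared object, used only for illustration."""

    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()

    def deposit(self, amount):
        # only one thread at a time can execute the body of this with block
        with self.lock:
            self.balance += amount


def worker(account, n_deposits):
    for _ in range(n_deposits):
        account.deposit(1)


account = Account()
threads = [threading.Thread(target=worker, args=(account, 10000))
           for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# every deposit was applied under the lock, so none of them is lost
print(account.balance)
```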

For example, suppose we have a program that does some kind of I/O processing and simply keeps track of how many items we have processed.