# Threading: Overview
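By default, Python runs your code one statement at a time, but the `threading` module comes in handy when you want a program to do several things at once. In the standard CPython interpreter, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threading cannot be used for parallel CPU computation. It is, however, well suited to I/O-bound work such as web scraping, where the processor sits idle waiting for data.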

Threading can be a game-changer here because many network and data-I/O scripts spend most of their time waiting for data from a remote source. When the downloads are independent (e.g., scraping separate websites), threads can wait on several data sources concurrently, and the results can be combined at the end. For CPU-intensive work, there is little benefit to using the `threading` module.
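To make the I/O-bound case concrete, here is a minimal sketch that uses `time.sleep()` as a stand-in for waiting on a network response (an assumption: no real downloads happen). The `fake_download` function and the 0.2-second delay are hypothetical. Five sequential waits take about a second, while five threads waiting at the same time finish in roughly the time of one.

```python
import threading
import time

def fake_download(source, results):
    """Hypothetical stand-in for a network request. time.sleep() releases
    the GIL, much like waiting on a socket would."""
    time.sleep(0.2)
    results[source] = f"data from {source}"

sources = [f"site-{i}" for i in range(5)]

# Sequential: each wait happens one after another (about 5 x 0.2 s).
results_seq = {}
start = time.perf_counter()
for src in sources:
    fake_download(src, results_seq)
sequential_time = time.perf_counter() - start

# Threaded: the five waits overlap (about 0.2 s in total).
results_thr = {}
start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(src, results_thr))
           for src in sources]
for t in threads:
    t.start()
for t in threads:
    t.join()  # combine results only after every "download" has finished
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Each thread writes to its own key, so the shared dictionary needs no extra coordination in this sketch; shared values that several threads update are covered in the Locks section.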


### Basic Use
Threading is included in the Python standard library, so there is no package to install. We simply import the `threading` module:

```python
import threading
from queue import Queue
import time
```

To create a thread, pass the function to run as `target` and its arguments as a tuple via `args`, then call `start()` to launch the thread:

```python
def testThread(num):
    print(num)

if __name__ == '__main__':
    for i in range(5):
        t = threading.Thread(target=testThread, args=(i,))
        t.start()
```

```
0
1
2
3
4
```


When running this as a script, it's good practice to place the thread-launching code under an `if __name__ == '__main__':` guard, so that it only runs when the script is executed directly (not when it is imported as a module). This isn't needed in a notebook, so the examples below omit it.

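Creating a `Thread` object does not run anything by itself; the thread begins executing only when `start()` is called. Note that the order in which threads print is not guaranteed. Here the output happens to come out in order because each thread finishes almost immediately.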

```python
def worker(num):
    """thread worker function"""
    print('Worker: %s' % num)

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

# keep the Thread objects so we can wait for all of them to finish
for t in threads:
    t.join()
```

```
Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
```
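To see what `start()` and `join()` do to a thread's state, the short sketch below checks `is_alive()` at each stage. It uses a `threading.Event` (not part of the original examples) to keep the hypothetical `waiting_worker` running until the main thread lets it finish, so each state is observed deterministically. It also shows that a `Thread` object can be started only once.

```python
import threading

release_worker = threading.Event()  # the worker waits on this so we can observe it while running

def waiting_worker():
    release_worker.wait()  # block until the main thread sets the event

t = threading.Thread(target=waiting_worker)
before_start = t.is_alive()   # False: constructed, but not started yet

t.start()
while_running = t.is_alive()  # True: the worker is blocked in wait()

release_worker.set()          # let the worker return
t.join()                      # wait for the thread to finish
after_join = t.is_alive()     # False: the thread has terminated

try:
    t.start()                 # a Thread object can only be started once
    restart = "started again"
except RuntimeError:
    restart = "RuntimeError"

print(before_start, while_running, after_join, restart)
```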


### Locks

You'll often want your threads to read or modify variables shared between them, and to do that safely you'll need something known as a lock. A thread acquires the lock before touching the shared data and releases it afterwards; any other thread that tries to acquire the same lock in the meantime must wait until it is released. The lock is not attached to the variable itself: it only protects the data if every thread that touches it agrees to acquire the same lock first.

Imagine two functions that both increment a shared variable by 1. A lock ensures that one function can read the variable, perform its calculation, and write the result back before the other function can access the same variable.
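Here is a minimal sketch of that scenario, assuming four threads that each increment a shared `counter` 10,000 times (both numbers are arbitrary). The read-modify-write is split into three steps to make the critical section explicit. Because every thread holds `counter_lock` for the whole sequence, no update is lost and the final value is always 40,000; without the lock, that result would not be guaranteed.

```python
import threading

counter = 0                      # shared variable, updated by every thread
counter_lock = threading.Lock()  # guards every access to `counter`

def increment_many(times):
    global counter
    for _ in range(times):
        counter_lock.acquire()
        try:
            # read-modify-write: no other thread can touch `counter` in between
            current = counter
            current += 1
            counter = current
        finally:
            counter_lock.release()  # always release, even if the body fails

threads = [threading.Thread(target=increment_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```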

The same problem shows up when several threads print: their output can interleave and come out jumbled. You can use a dedicated print lock to ensure that only one thread prints at a time.
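A minimal sketch of a print lock, assuming a hypothetical `safe_print` helper that every thread calls instead of `print()`. Each line is printed while holding `print_lock`, so lines from different threads never mix mid-line, although the order in which the threads' lines appear can still vary from run to run.

```python
import threading

print_lock = threading.Lock()  # one lock shared by every thread that prints

def safe_print(*args, **kwargs):
    """Hypothetical helper: print while holding print_lock."""
    with print_lock:
        print(*args, **kwargs)

def report(worker_id):
    for step in range(3):
        safe_print(f"worker {worker_id}: step {step}")

threads = [threading.Thread(target=report, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```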

Locks are the simplest synchronization primitive in Python. A `Lock` has only two states, locked and unlocked (surprise). It is created in the unlocked state and has two principal methods, `acquire()` and `release()`. If the lock is unlocked, `acquire()` locks it and returns `True` immediately; if it is already locked, `acquire()` blocks until another thread calls `release()`, then locks it and returns `True`. `release()` should only be called in the locked state: it sets the state to unlocked and returns immediately. If `release()` is called in the unlocked state, a `RuntimeError` is raised.
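The state transitions described above can be checked directly. This sketch uses `Lock.locked()` to inspect the state, and `acquire(blocking=False)` so that the second acquisition attempt returns `False` instead of blocking forever in a single-threaded example.

```python
import threading

lock = threading.Lock()
states = []  # record (step, result) pairs so the transitions are easy to inspect

states.append(("new", lock.locked()))                       # a new Lock starts unlocked
states.append(("acquire", lock.acquire()))                  # unlocked -> locked, returns True at once
states.append(("try again", lock.acquire(blocking=False)))  # already locked: returns False instead of blocking
lock.release()                                              # locked -> unlocked
states.append(("after release", lock.locked()))

try:
    lock.release()                                          # releasing an unlocked Lock is an error
    states.append(("double release", "no error"))
except RuntimeError:
    states.append(("double release", "RuntimeError"))

for step, result in states:
    print(f"{step}: {result}")
```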

Here’s code that uses a Lock primitive to safely access a shared variable:

```python
from threading import Lock, Thread

lock = Lock()
g = 0

def add_one():
    """
    Just used for demonstration. It's bad to use the 'global'
    statement in general.
    """
    global g
    lock.acquire()
    g += 1
    lock.release()

def add_two():
    global g
    lock.acquire()
    g += 2
    lock.release()

threads = []
for func in [add_one, add_two]:
    threads.append(Thread(target=func))
    threads[-1].start()

# Wait for the threads to complete before moving on with the main script.
for thread in threads:
    thread.join()

print(g)
```

```
3
```


The output is simply 3, but now we can be sure that the two functions never modify the global variable `g` at the same time, even though they run in two different threads. Locks thus prevent inconsistent results by allowing only one thread at a time to modify shared data.
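A `Lock` also works as a context manager: `with lock:` calls `acquire()` on entry and `release()` on exit, even if the body raises an exception. The sketch below uses hypothetical names (`shared_lock`, `total`, `risky_update`) so it doesn't clash with the example above, and shows that the lock is released after an error.

```python
import threading

shared_lock = threading.Lock()  # hypothetical names, separate from the example above
total = 0

def risky_update():
    """Hypothetical update that fails after modifying the shared value."""
    global total
    with shared_lock:            # calls shared_lock.acquire() on entry
        total += 1
        # leaving the block, normally or via an exception, calls release()
        raise ValueError("something went wrong mid-update")

try:
    risky_update()
except ValueError:
    pass

print(total, shared_lock.locked())
```

With a bare `acquire()`/`release()` pair, the same exception would have skipped `release()` and left the lock held, so any other thread calling `acquire()` would block forever.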

## Setting the thread count of CPython C extension libraries

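You can also control the number of threads used by C extension libraries that do their own multithreading: think NumPy, SciPy, and anything that calls on them. These libraries typically delegate heavy linear algebra to a multithreaded backend (such as OpenBLAS or MKL) or to OpenMP, which reads environment variables such as `OMP_NUM_THREADS` when it is loaded (backend-specific variables like `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` also exist).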

We can set the number of threads externally, before launching Python, or internally, from within Python before importing the package that uses threads. Changing the variable after the package has been imported typically has no effect.

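### External setting of threads

Set the variable in the shell before launching Python (or Jupyter). Running the `export` from a notebook `%%bash` cell would only change that cell's subshell, not the already-running kernel.

```bash
# run this in the shell environment (bash here) before starting Python;
# for example, we set the total thread count to 2
export OMP_NUM_THREADS=2
```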
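If Python itself launches the process that does the numerical work, the same external setting can be applied by passing an environment to the child. A minimal sketch using `subprocess.run`; the child here only prints the variable it inherited, standing in (as an assumption) for a real script that imports NumPy.

```python
import os
import subprocess
import sys

# Copy the current environment and override the thread count for the child only.
child_env = dict(os.environ, OMP_NUM_THREADS="2")

# The child is a fresh interpreter, so the variable is already set
# before it imports any threaded library.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```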

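### Internal setting of threads (within Python)

```python
import os

# Set this before importing numpy (or anything that imports it):
# the threading backend reads it when the library is loaded.
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
```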