Multithreading & Multiprocessing in Python
==========================================

# General Understanding

Concurrency is the art of doing multiple tasks at the same time. 'Same time' is a somewhat confusing term: if there are multiple processors, tasks can truly run simultaneously; otherwise, we can at least make progress on other tasks while one task is blocked waiting on I/O.

## Multi-Threading
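Multithreading in Python lets us take advantage of the time an operation spends blocked on I/O, for example when calling multiple web services, databases, or network endpoints at the same time. In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so threads do not speed up CPU-bound code.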

## Multi-Processing
To take advantage of the multiple processors present on the machine, we need to use multiprocessing. Each process runs its own Python interpreter (with its own GIL), so CPU-bound work can run in parallel.
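
As a rough illustration of CPU-bound work, the sketch below (not part of the original notebook) spreads a pure-Python computation across worker processes with `multiprocessing.Pool`. The `sum_of_squares` helper and the workload sizes are arbitrary assumptions chosen for illustration; the actual speed-up depends on how many cores the machine has.

```python
import multiprocessing as mp

def sum_of_squares(n):
    # Pure-Python, CPU-bound work: threads would not speed this up in CPython
    # because of the GIL, but separate processes can run it in parallel.
    return sum(i * i for i in range(n))

def parallel_sums(sizes):
    # Pool() starts os.cpu_count() worker processes by default;
    # map() splits the inputs among them and returns results in input order.
    with mp.Pool() as pool:
        return pool.map(sum_of_squares, sizes)

if __name__ == '__main__':
    sizes = [200_000, 400_000, 600_000, 800_000]
    print(parallel_sums(sizes))
```

The `if __name__ == '__main__':` guard matters on platforms that start children with the `spawn` method (Windows, and macOS by default), where each child re-imports the main module.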


## Example without multithreading or multiprocessing

In the example below, the code runs sequentially, without multithreading.
It takes about 4 seconds because the function is called twice and each call sleeps for 2 seconds.

In [None]:
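import time

def calculate_raise_to(x, y):
    # Simulate a slow, blocking operation (e.g. waiting on I/O)
    time.sleep(2)
    print([i ** y for i in x])

numbers = [1, 2, 3, 4, 5, 6, 7]

start = time.time()
# The two calls run one after the other, so their sleeps add up (~4 seconds)
calculate_raise_to(numbers, 2)
calculate_raise_to(numbers, 3)
print('time taken', round(time.time() - start, 2), 'seconds')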

## Example with multithreading

In the example below, the same two calls run in separate threads.
Both threads sleep (wait) concurrently, so the whole run finishes in just over 2 seconds instead of 4.

In [22]:
import time
import threading

def calculate_raise_to(x, y):
    # Simulate a slow, blocking operation (e.g. waiting on I/O)
    time.sleep(2)
    print([i ** y for i in x])

numbers = [1, 2, 3, 4, 5, 6, 7]

start = time.time()
t1 = threading.Thread(target=calculate_raise_to, args=(numbers, 2))
t2 = threading.Thread(target=calculate_raise_to, args=(numbers, 3))
# Start both threads so that their sleeps overlap
t1.start()
t2.start()
# Wait for both threads to finish before measuring the elapsed time
t1.join()
t2.join()
print('time taken', round(time.time() - start, 2), 'seconds')

[1, 4, 9, 16, 25, 36, 49]
[1, 8, 27, 64, 125, 216, 343]
time taken 2.0 seconds
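
The same pattern can also be written with the standard library's higher-level `concurrent.futures.ThreadPoolExecutor`, which manages the threads and collects return values for us. This is a sketch, not the notebook's original approach; the `delay` parameter is a hypothetical addition so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def calculate_raise_to(x, y, delay=0.5):
    # Simulate a blocking I/O call (e.g. a web service request) with sleep
    time.sleep(delay)
    return [i ** y for i in x]

numbers = [1, 2, 3, 4, 5, 6, 7]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns Future objects; result() blocks until that call finishes
    futures = [executor.submit(calculate_raise_to, numbers, power) for power in (2, 3)]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(results)
print('time taken', round(elapsed, 2), 'seconds')
```

Unlike `threading.Thread(target=...)`, `submit()` gives access to the function's return value, so there is no need for a shared global to collect results.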


## Global variables with multithreading

All threads run within the same process, so global variables are shared across threads.

In the example below, notice that the global `results` list filled in by the worker thread is also visible to the main thread.

In [5]:
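import threading

results = []

def calculate_raise_to(x, y):
    # `global` is optional here: we only mutate the shared list, we never rebind it
    global results
    results.append([i ** y for i in x])
    print('results is', results)

numbers = [1, 2, 3, 4, 5, 6, 7]

t1 = threading.Thread(target=calculate_raise_to, args=(numbers, 2))
t1.start()
t1.join()
# The main thread sees the list that the worker thread filled in
print('results is', results)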

results is [[1, 4, 9, 16, 25, 36, 49]]
results is [[1, 4, 9, 16, 25, 36, 49]]
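
Because threads share globals, concurrent updates to the same object can interfere with each other. A common remedy is `threading.Lock`. The sketch below guards a read-modify-write update so the final count is exact; the `counter` global and the thread and iteration counts are illustrative assumptions.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    global counter  # required here because we rebind the name, not just mutate it
    for _ in range(times):
        # `counter += 1` is a read followed by a write; without the lock another
        # thread could run in between and one of the updates would be lost.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('counter is', counter)
```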


## Example with multiprocessing

In the example below, the same function runs in two separate processes. Each call sleeps for 5 seconds, but because the processes run in parallel, the whole run takes just over 5 seconds rather than 10.

If we take a snapshot of the running Python processes before and after starting ours, we can see two new processes appear, and they disappear once they have finished.

In [30]:
import time
import multiprocessing as mp
import psutil  # third-party package: pip install psutil

def print_python_processes(msg):
    print(msg)
    # Fetch only the 'name' attribute; processes we may not inspect report None
    for proc in psutil.process_iter(attrs=['name']):
        if 'python' in (proc.info['name'] or ''):
            print(proc)

def calculate_raise_to(x, y):
    # Sleep long enough for the child processes to show up in the snapshot
    time.sleep(5)
    print([i ** y for i in x])

numbers = [1, 2, 3, 4, 5, 6, 7]

if __name__ == '__main__':
    start = time.time()
    # name= sets the multiprocessing name, not the OS process name psutil reports
    p1 = mp.Process(target=calculate_raise_to, args=(numbers, 2), name='Process_test_1_123')
    p2 = mp.Process(target=calculate_raise_to, args=(numbers, 3), name='Process_test_2_abc')
    print_python_processes('processes before we start our process')
    p1.start()
    p2.start()
    print_python_processes('processes after we start our process')
    p1.join()
    p2.join()
    print_python_processes('processes after we end our process')
    print('time taken', round(time.time() - start, 2), 'seconds')

processes before we start our process
psutil.Process(pid=1304, name='python3.7', started='09:08:53')
psutil.Process(pid=1361, name='python3.7', started='09:09:03')
psutil.Process(pid=1371, name='python3.7', started='09:09:14')
psutil.Process(pid=2570, name='python3.7', started='13:50:29')
psutil.Process(pid=3029, name='python3.7', started='16:31:37')
psutil.Process(pid=3034, name='python3.7', started='16:31:38')
processes after we start our process
psutil.Process(pid=1304, name='python3.7', started='09:08:53')
psutil.Process(pid=1361, name='python3.7', started='09:09:03')
psutil.Process(pid=1371, name='python3.7', started='09:09:14')
psutil.Process(pid=2570, name='python3.7', started='13:50:29')
psutil.Process(pid=3029, name='python3.7', started='16:31:37')
psutil.Process(pid=3034, name='python3.7', started='16:31:38')
psutil.Process(pid=3415, name='python3.7', started='17:00:45')
psutil.Process(pid=3416, name='python3.7', started='17:00:45')
[1, 4, 9, 16, 25, 36, 49]
[1, 8, 27, 64, 12
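
If `psutil` is not available, the standard library can give a similar before/after picture for the program's own children: `multiprocessing.active_children()` lists child processes that are still alive, and `Process.exitcode` reports how each one ended. This is a minimal sketch; the `slow_task` helper and its short `time.sleep(0.5)` are assumptions chosen so the children are still running when we look.

```python
import multiprocessing as mp
import time

def slow_task():
    # Keep the child alive long enough for the parent to observe it
    time.sleep(0.5)

def observe_children():
    before = len(mp.active_children())
    procs = [mp.Process(target=slow_task, name=f'worker_{i}') for i in range(2)]
    for p in procs:
        p.start()
    during = len(mp.active_children())
    for p in procs:
        p.join()
    after = len(mp.active_children())
    # exitcode is 0 after a clean exit (it is None while the process is running)
    return before, during, after, [p.exitcode for p in procs]

if __name__ == '__main__':
    print(observe_children())
```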

## Global variables with multiprocessing

Each process has its own separate address space, so global variables are not shared: the child process modifies only its own copy.

In the example below, notice that the global `results` list in the parent process is ***not*** filled in.

In [3]:
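import multiprocessing

results = []

def calculate_raise_to(x, y):
    global results
    # This appends to the child process's own copy of `results`
    results.append([i ** y for i in x])
    print('results is', results)

numbers = [1, 2, 3, 4, 5, 6, 7]

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=calculate_raise_to, args=(numbers, 2))
    p1.start()
    p1.join()
    # The parent's `results` is untouched by the child
    print('results is', results)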

results is [[1, 4, 9, 16, 25, 36, 49]]
results is []
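
One way to get a result back from a child process is to send it explicitly through a `multiprocessing.Queue`, which pickles the object and passes it to the parent. This is a sketch that reuses the same computation; the extra `queue` parameter and the `run_in_child` helper are hypothetical additions for illustration.

```python
import multiprocessing

def calculate_raise_to(x, y, queue):
    # Runs in the child: send the result to the parent instead of appending
    # to a global that only exists in the child's memory
    queue.put([i ** y for i in x])

def run_in_child(numbers, power):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=calculate_raise_to, args=(numbers, power, queue))
    p.start()
    result = queue.get()  # read before join(): joining first can deadlock on large results
    p.join()
    return result

if __name__ == '__main__':
    print('result is', run_in_child([1, 2, 3, 4, 5, 6, 7], 2))
```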


## Share memory between processes

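As described above, we need something special to share information across processes.

There are several ways to do this; two of them are the shared-memory objects `multiprocessing.Array` and `multiprocessing.Value`.

The example below shows both: the parent updates the shared objects while the child is sleeping, the child sees those updates, and the parent in turn sees the child's final changes.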

In [9]:
import multiprocessing
import time

def calculate_raise_to(x, y, shared_results, shared_value):
    print('result before sleep is', shared_results[:])
    print('value before sleep is', shared_value.value)
    # While the child sleeps, the parent changes the shared objects
    time.sleep(10)
    print('result after sleep is', shared_results[:])
    print('value after sleep is', shared_value.value)
    for index, value in enumerate(x):
        shared_results[index] = shared_results[index] + (value ** y)
    print('results before we return', shared_results[:])
    shared_value.value = 10
    print('value before we return', shared_value.value)

if __name__ == '__main__':
    numbers = [1, 2, 3, 4, 5]
    shared_value = multiprocessing.Value('i', 1)               # shared C int, initialised to 1
    shared_results = multiprocessing.Array('i', len(numbers))  # shared C int array, zero-filled
    p1 = multiprocessing.Process(target=calculate_raise_to,
                                 args=(numbers, 2, shared_results, shared_value))
    p1.start()
    time.sleep(5)
    # These writes happen while the child is still sleeping, so it will see them
    shared_results[0] = 10
    shared_value.value = 2
    p1.join()
    print('results after we return', shared_results[:])
    print('value after we return', shared_value.value)


result before sleep is [0, 0, 0, 0, 0]
value before sleep is 1
result after sleep is [10, 0, 0, 0, 0]
value after sleep is 2
results before we return [11, 4, 9, 16, 25]
value before we return 10
results after we return [11, 4, 9, 16, 25]
value after we return 10
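
Shared `Value` and `Array` objects are created with an internal lock by default, but `value.value += 1` is still a separate read and write. When several processes update the same value, a sketch like the one below holds `get_lock()` around each update so no increments are lost. The `add_many` and `run_workers` helpers and the worker and iteration counts are hypothetical choices for illustration.

```python
import multiprocessing

def add_many(shared_counter, times):
    for _ in range(times):
        # get_lock() returns the lock that synchronizes access to this Value
        with shared_counter.get_lock():
            shared_counter.value += 1

def run_workers(n_workers=4, times=1000):
    counter = multiprocessing.Value('i', 0)  # shared C int, initialised to 0
    workers = [multiprocessing.Process(target=add_many, args=(counter, times))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

if __name__ == '__main__':
    print('final counter is', run_workers())
```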
