# Parallel Computing

## Test Functions

In [1]:
import time

In [2]:
def long_io(i, res_dict): # a function simulating IO, user input, etc. (no computation)
    print(f'function {i} started at {time.ctime()}')
    time.sleep(2)
    res = i*2
    res_dict[i] = res
    print(f'function {i} finished at {time.ctime()}')
    return res

In [3]:
res = {}
[long_io(i, res) for i in range(4)]
res

function 0 started at Mon Jul  1 20:16:53 2019
function 0 finished at Mon Jul  1 20:16:55 2019
function 1 started at Mon Jul  1 20:16:55 2019
function 1 finished at Mon Jul  1 20:16:57 2019
function 2 started at Mon Jul  1 20:16:57 2019
function 2 finished at Mon Jul  1 20:16:59 2019
function 3 started at Mon Jul  1 20:16:59 2019
function 3 finished at Mon Jul  1 20:17:01 2019


{0: 0, 1: 2, 2: 4, 3: 6}

In [4]:
def long_comp(i, res_dict): # a function involving heavy computation
    print(f'function {i} started at {time.ctime()}')
    x = i
    for y in range(5000000):
        if x % 2 == 0:
            x = (x-y)**2
        else:
            x = (x+y)**0.5
    res_dict[i] = x
    print(f'function {i} finished at {time.ctime()}')
    return x

In [5]:
res = {}
[long_comp(i, res) for i in range(4)]
res

function 0 started at Mon Jul  1 20:17:01 2019
function 0 finished at Mon Jul  1 20:17:03 2019
function 1 started at Mon Jul  1 20:17:03 2019
function 1 finished at Mon Jul  1 20:17:06 2019
function 2 started at Mon Jul  1 20:17:06 2019
function 2 finished at Mon Jul  1 20:17:08 2019
function 3 started at Mon Jul  1 20:17:08 2019
function 3 finished at Mon Jul  1 20:17:10 2019


{0: 2236.5678097446853,
 1: 2236.5678097446853,
 2: 2236.5678097446853,
 3: 2236.5678097446853}

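Time-intensive functions are either CPU-bound (limited by calculations) or limited by other factors, such as I/O, network response times, or user interaction. A test function is defined above for each case: long_io only waits (via time.sleep), while long_comp runs a pure-Python arithmetic loop. Run one after the other, 4 calls of either function take about 8 to 9 s in total.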
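A rough way to tell the two cases apart is to compare wall-clock time with CPU time. The sketch below uses two small hypothetical stand-ins, fake_io and busy_loop, instead of the functions above (to keep it fast), and a hypothetical helper measure; time.perf_counter and time.process_time are standard library calls.

```python
import time

def measure(func, *args):
    """Hypothetical helper: return (wall-clock seconds, CPU seconds) spent in func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process
    func(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def fake_io():
    time.sleep(0.2)  # waiting: the CPU is idle during the sleep

def busy_loop():
    total = 0
    for k in range(2_000_000):
        total += k * k  # pure-Python arithmetic keeps the CPU busy
    return total

for name, f in [("fake_io", fake_io), ("busy_loop", busy_loop)]:
    wall, cpu = measure(f)
    print(f"{name}: wall {wall:.2f} s, CPU {cpu:.2f} s")
```

For fake_io the CPU time should be close to zero, while for busy_loop it should be close to the wall-clock time.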

## Multithreading

In [6]:
import threading

In [7]:
# I/O-bound (non-computational) function
threads = []
res = {}
for i in range(4):
    threads.append(threading.Thread(target=long_io, args=(i, res)))
    threads[i].start()
print(f'all threads started at {time.ctime()}')
for i in range(4):
    threads[i].join()
print(f'all threads finished at {time.ctime()}')
res

function 0 started at Mon Jul  1 20:17:10 2019
function 1 started at Mon Jul  1 20:17:10 2019
function 2 started at Mon Jul  1 20:17:10 2019
function 3 started at Mon Jul  1 20:17:10 2019
all threads started at Mon Jul  1 20:17:10 2019
function 0 finished at Mon Jul  1 20:17:12 2019
function 1 finished at Mon Jul  1 20:17:12 2019
function 2 finished at Mon Jul  1 20:17:12 2019
function 3 finished at Mon Jul  1 20:17:12 2019
all threads finished at Mon Jul  1 20:17:12 2019


{0: 0, 1: 2, 2: 4, 3: 6}

With multithreading, all 4 calls finish in about 2 s, the same time as a single call: each thread spends its time waiting in time.sleep, which releases the GIL, so the waits overlap.
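The same I/O-bound pattern can also be written with concurrent.futures.ThreadPoolExecutor from the standard library, which collects the return values directly instead of through a shared dictionary. This is a sketch; fake_io is a hypothetical stand-in for long_io with a shorter sleep and no dictionary argument.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    time.sleep(0.5)  # stands in for waiting on disk, network, or a user
    return i * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # map runs fake_io in the worker threads and yields results in input order
    results = list(executor.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

print(results)                      # [0, 2, 4, 6]
print(f"elapsed: {elapsed:.1f} s")  # close to 0.5 s, not 4 x 0.5 s
```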


Threads share the memory space of the main process, so a mutable data structure such as a dictionary can be used to pass information to and from the threads.
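Because the memory is shared, updates from several threads can interfere. Writing to distinct keys, as long_io does with res_dict[i], does not conflict, but a read-modify-write on the same entry should be protected by a threading.Lock. A minimal sketch with a hypothetical shared counter:

```python
import threading

counts = {"hits": 0}  # one dictionary, visible to every thread
lock = threading.Lock()

def worker(n_increments):
    for _ in range(n_increments):
        # "read, add 1, write back" is several steps; the lock stops two
        # threads from interleaving them and losing an update
        with lock:
            counts["hits"] += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts["hits"])  # 40000
```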

In [8]:
# computation-bound (CPU-bound) function
threads = []
res = {}
for i in range(4):
    threads.append(threading.Thread(target=long_comp, args=(i, res)))
    threads[i].start()
print(f'all threads started at {time.ctime()}')
for i in range(4):
    threads[i].join()
print(f'all threads finished at {time.ctime()}')
res

function 0 started at Mon Jul  1 20:17:12 2019
function 1 started at Mon Jul  1 20:17:12 2019
function 2 started at Mon Jul  1 20:17:13 2019
function 3 started at Mon Jul  1 20:17:13 2019
all threads started at Mon Jul  1 20:17:13 2019
function 0 finished at Mon Jul  1 20:17:21 2019
function 3 finished at Mon Jul  1 20:17:21 2019
function 2 finished at Mon Jul  1 20:17:21 2019
function 1 finished at Mon Jul  1 20:17:22 2019
all threads finished at Mon Jul  1 20:17:22 2019


{0: 2236.5678097446853,
 3: 2236.5678097446853,
 2: 2236.5678097446853,
 1: 2236.5678097446853}

For the computation-intensive function, running 4 instances as threads takes about 10 s, roughly 4 times as long as a single instance and no faster than running them one after the other.
This holds even on a machine with multiple CPU cores.

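Reason: The Global Interpreter Lock (GIL) in CPython allows only one thread per process to execute Python bytecode at any given time. CPU-bound threads therefore take turns instead of running in parallel on several cores. Threads that are waiting, e.g. in time.sleep or on I/O, release the GIL, which is why the I/O-bound example above still benefits from threading.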
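The effect can be reproduced with a smaller, self-contained sketch, where busy is a hypothetical CPU-bound stand-in for long_comp. On a standard CPython build with the GIL, the threaded timing is expected to be about the same as the serial one; on the experimental free-threaded builds of CPython 3.13 and later, the result may differ.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    total = 0
    for k in range(n):
        total += k % 7  # pure-Python work: the thread needs the GIL the whole time
    return total

N = 1_000_000

start = time.perf_counter()
serial = [busy(N) for _ in range(4)]
t_serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = list(executor.map(busy, [N] * 4))
t_threaded = time.perf_counter() - start

print(f"serial: {t_serial:.2f} s, 4 threads: {t_threaded:.2f} s")
# how often CPython asks the running thread to release the GIL (0.005 s by default)
print(sys.getswitchinterval())
```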

## Multiprocessing

In [9]:
import multiprocessing

In [10]:
# I/O-bound (non-computational) function
pool = multiprocessing.Pool(processes=4)
processes = []
res = {}
for i in range(4):
    processes.append(pool.apply_async(long_io, args=(i, res)))
print(f'all processes started at {time.ctime()}')
pool.close() # close pool so that it does not accept further submissions
pool.join() # wait until all processes are finished
print(f'all processes finished at {time.ctime()}')
res

function 1 started at Mon Jul  1 20:17:22 2019
function 0 started at Mon Jul  1 20:17:22 2019
function 3 started at Mon Jul  1 20:17:22 2019
function 2 started at Mon Jul  1 20:17:22 2019
all processes started at Mon Jul  1 20:17:22 2019
function 0 finished at Mon Jul  1 20:17:24 2019
function 1 finished at Mon Jul  1 20:17:24 2019
function 3 finished at Mon Jul  1 20:17:24 2019
function 2 finished at Mon Jul  1 20:17:24 2019
all processes finished at Mon Jul  1 20:17:24 2019


{}

Note that the dictionary passed as a function argument is not updated by the processes (in contrast to the threads shown above). Each worker process receives a pickled copy of the arguments and does not share memory with the other processes or with the main process, so long_io only modifies the worker's own copy of res.
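If processes do need to write into a shared dictionary, the standard library offers multiprocessing.Manager, whose dict() returns a proxy object that can be passed to workers. A minimal sketch, assuming a hypothetical worker store_double in place of long_io; every access goes through a separate manager process, which is slower than a plain dictionary. The if __name__ == "__main__" guard is needed where new processes start a fresh interpreter (the default on Windows and macOS).

```python
import multiprocessing

def store_double(i, res_dict):
    # hypothetical worker: with a manager proxy, this write is sent to the manager process
    res_dict[i] = i * 2
    return i * 2

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared = manager.dict()  # proxy to a dict living in the manager process
        with multiprocessing.Pool(processes=4) as pool:
            pool.starmap(store_double, [(i, shared) for i in range(4)])
        result = shared.copy()  # plain dict copy, taken before the manager shuts down
    print(dict(sorted(result.items())))  # {0: 0, 1: 2, 2: 4, 3: 6}
```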

The return values can be retrieved from the AsyncResult objects returned by apply_async, using their get() method:

In [11]:
[process.get() for process in processes]

[0, 2, 4, 6]
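For the common case of calling one function on many inputs, Pool.map is a shorter equivalent of apply_async followed by get(). A sketch with hypothetical fake_io and divide functions; it also shows that get() re-raises an exception raised in the worker process, which would otherwise go unnoticed.

```python
import multiprocessing
import time

def fake_io(i):
    time.sleep(0.5)  # hypothetical stand-in for long_io without the dictionary argument
    return i * 2

def divide(a, b):
    return a / b  # raises ZeroDivisionError when b == 0

if __name__ == "__main__":
    error = None
    with multiprocessing.Pool(processes=4) as pool:
        # map blocks until every call has finished and keeps the input order
        results = pool.map(fake_io, range(4))

        # get() re-raises an exception that occurred in the worker
        failing = pool.apply_async(divide, (1, 0))
        try:
            failing.get(timeout=10)
        except ZeroDivisionError as exc:
            error = exc
            print("worker raised:", exc)
    print(results)  # [0, 2, 4, 6]
```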

In [12]:
# computation-bound (CPU-bound) function
pool = multiprocessing.Pool(processes=4)
processes = []
res = {}
for i in range(4):
    processes.append(pool.apply_async(long_comp, args=(i, res)))
print(f'all processes started at {time.ctime()}')
pool.close() # close pool so that it does not accept further submissions
pool.join() # wait until all processes are finished
print(f'all processes finished at {time.ctime()}')
res

function 0 started at Mon Jul  1 20:17:24 2019
function 1 started at Mon Jul  1 20:17:24 2019
function 2 started at Mon Jul  1 20:17:24 2019
function 3 started at Mon Jul  1 20:17:24 2019
all processes started at Mon Jul  1 20:17:24 2019
function 0 finished at Mon Jul  1 20:17:27 2019
function 1 finished at Mon Jul  1 20:17:27 2019
function 3 finished at Mon Jul  1 20:17:27 2019
function 2 finished at Mon Jul  1 20:17:27 2019
all processes finished at Mon Jul  1 20:17:27 2019


{}

In [13]:
{i: process.get() for i, process in enumerate(processes)}

{0: 2236.5678097446853,
 1: 2236.5678097446853,
 2: 2236.5678097446853,
 3: 2236.5678097446853}

With multiprocessing, the 4 instances of the computation-intensive function were executed in parallel (here on 4 cores), so all of them finished in about 3 s, roughly the time a single instance needs.
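The concurrent.futures module offers ProcessPoolExecutor as a higher-level alternative to multiprocessing.Pool. The sketch below, with busy as a hypothetical CPU-bound stand-in for long_comp, times 4 calls serially and in 4 worker processes; on a machine with at least 4 free cores, the parallel run is expected to take roughly a quarter of the serial time plus process start-up overhead.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def busy(n):
    total = 0
    for k in range(n):
        total += k % 7  # pure-Python, CPU-bound work
    return total

N = 2_000_000

if __name__ == "__main__":
    start = time.perf_counter()
    serial = [busy(N) for _ in range(4)]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as executor:
        # each call runs in a worker process with its own interpreter and its own GIL
        parallel = list(executor.map(busy, [N] * 4))
    t_parallel = time.perf_counter() - start

    print(f"serial: {t_serial:.2f} s, 4 processes: {t_parallel:.2f} s")
```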


Different processes do not share memory with each other or with the main process (in contrast to threads). This may sound like a disadvantage compared to threads, but in most cases it is actually an advantage:

* Multiprocessing sidesteps the GIL. The GIL exists to protect the interpreter's internal state (such as reference counts) from concurrent access by threads; since each process runs its own interpreter with its own GIL, computation-intensive functions can run in parallel on multiple CPU cores.
* Pure functions, where all input is given as function arguments and all output is in the return value, work well with multiprocessing.
* Side effects through global variables or shared mutable objects are avoided, which encourages cleaner and more modular code.
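The last point can be seen with a deliberately impure function: the hypothetical square_and_count increments a global counter as a side effect. Under multiprocessing, the return values come back to the parent, but the increments happen in the workers' copies of the global and never reach the main process.

```python
import multiprocessing

counter = 0  # module-level global in the parent process

def square_and_count(x):
    global counter
    counter += 1  # only changes this worker process's own copy of counter
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        squares = pool.map(square_and_count, range(5))
    print(squares)  # [0, 1, 4, 9, 16]
    print(counter)  # 0: the parent's global was never modified
```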

## Conclusion

* For tasks that are not CPU-bound, such as I/O, user interaction, or waiting for network responses, use threading: threads are lightweight and create less overhead than processes.
* For CPU-bound tasks, use multiprocessing. In CPython, multithreading gives no speed-up in this case because of the GIL.
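Since concurrent.futures gives threads and processes the same interface, the choice above can be reduced to a single switch. A sketch with a hypothetical helper run_all and two hypothetical task functions; it is one possible way to encode the rule, not a general-purpose scheduler.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_all(func, inputs, cpu_bound):
    """Hypothetical helper: threads for waiting tasks, processes for CPU-bound ones."""
    executor_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    with executor_cls(max_workers=4) as executor:
        return list(executor.map(func, inputs))

def wait_and_double(i):
    time.sleep(0.2)  # I/O-like: mostly waiting
    return i * 2

def sum_of_squares(n):
    return sum(k * k for k in range(n))  # CPU-bound

if __name__ == "__main__":
    print(run_all(wait_and_double, range(4), cpu_bound=False))  # [0, 2, 4, 6]
    print(run_all(sum_of_squares, [10, 100], cpu_bound=True))   # [285, 328350]
```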

More information is given here:
https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b