# Parallel programming for CPU

## Process-based parallelism via the concurrent.futures module
----

## ProcessPoolExecutor (concurrent.futures)

The `ProcessPoolExecutor` class is an `Executor` subclass that uses a pool of processes to execute calls asynchronously. It is built on the `multiprocessing` module, which lets it side-step the **Global Interpreter Lock**, but this also means that only picklable objects can be executed and returned. Each process is a true system process and shares no memory with the others. If shared state is needed, the `multiprocessing` module provides features for sharing data and passing messages between processes, so in many cases converting from threads to processes is as simple as changing a few import statements.
- “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Picklable types include `None`, `True`, and `False`; integers, floating-point numbers, and complex numbers; strings, bytes, and bytearrays; and tuples, lists, sets, and dictionaries containing only picklable objects. The full list is in the pickle documentation: https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
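
As a rough illustration of the picklability requirement, the sketch below checks a few objects with the standard `pickle` module. The helper name `is_picklable` and the sample values are made up for this example and are not part of `concurrent.futures`.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain data built from picklable pieces survives a round trip unchanged.
data = {"numbers": [1, 2.5, 3 + 4j], "flags": (None, True, False), "raw": b"abc"}
restored = pickle.loads(pickle.dumps(data))
print("round trip ok:", restored == data)

# Lambdas and generators are typical objects that cannot be pickled,
# so they cannot be sent to (or returned from) a worker process.
print("lambda picklable:", is_picklable(lambda x: 2 * x))
print("generator picklable:", is_picklable(i for i in range(3)))
```

In practice this means the function handed to a process pool should be defined at module level with `def`, and its arguments and return values should be plain data like the dictionary above.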

**NOTE** An important difference between processes and threads is that each child process may need to import the script containing the target function (this is the case with the `spawn` start method, the default on Windows and macOS). It is therefore important to wrap the main part of the application in an `if __name__ == '__main__':` guard so that this part is not executed by every child process. Alternatively, the target function can be stored in a separate file that is then imported into the main script.

In [1]:
%%writefile process_id.py
from concurrent import futures
import os
import time

def do_work(n):
    time.sleep(n)
    return (n, os.getpid())

if __name__ == '__main__':
    tasks = range(1,5)
    ex = futures.ProcessPoolExecutor(max_workers=len(tasks))
    results = ex.map(do_work, tasks)
    for n, pid in results:
        print('ran task {} in process {}'.format(n, pid))

Overwriting process_id.py


In [2]:
!python process_id.py

ran task 1 in process 3952291
ran task 2 in process 3952294
ran task 3 in process 3952295
ran task 4 in process 3952296


Just to check: which PID does each task report when the same work is run with the `ThreadPoolExecutor` class? Threads all live inside the parent process, so every task should report the same PID.

In [3]:
from concurrent import futures
import os
import time

def do_work(n):
    time.sleep(n)
    return (n, os.getpid())

tasks = range(1,5)
ex = futures.ThreadPoolExecutor(max_workers=len(tasks))
results = ex.map(do_work, tasks)
for n, pid in results:
    print('ran task {} in process {}'.format(n, pid))

ran task 1 in process 3951951
ran task 2 in process 3951951
ran task 3 in process 3951951
ran task 4 in process 3951951
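
To see that the tasks above share one process but can still run on different threads, the thread identifier can be reported alongside the PID. This is a minimal sketch that adds `threading.get_ident()` to the example; the helper `run_tasks` and the shortened sleep times are assumptions made here to keep it quick.

```python
from concurrent import futures
import os
import threading
import time


def do_work(n):
    # Shorter sleep than the original example so the sketch finishes quickly.
    time.sleep(0.1 * n)
    return (n, os.getpid(), threading.get_ident())


def run_tasks(tasks):
    # The with-block shuts the pool down once all results have been collected.
    with futures.ThreadPoolExecutor(max_workers=len(tasks)) as ex:
        return list(ex.map(do_work, tasks))


results = run_tasks(range(1, 5))
for n, pid, tid in results:
    print('ran task {} in process {}, thread {}'.format(n, pid, tid))
```

The PID column should be identical on every line, while the thread identifiers are free to differ.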


## Compute-bound code (threads vs processes)

### Example of CPU-bound code (threads vs processes)

In [182]:
%%writefile prime.py
from concurrent import futures
import math
import time
import random

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
            
    return True

def processes(PRIMES, num_workers):
    # Run is_prime over PRIMES in a pool of worker processes.
    with futures.ProcessPoolExecutor(max_workers=num_workers) as ex:
        return list(ex.map(is_prime, PRIMES))

def threads(PRIMES, num_workers):
    # Same work in a pool of threads, for comparison.
    with futures.ThreadPoolExecutor(max_workers=num_workers) as ex:
        return list(ex.map(is_prime, PRIMES))

if __name__ == '__main__':
    
    
    PRIMES = [0, 2, 4, 5, 7, 10]

    # make it more intensive
    list_size = 1000
    PRIMES = [random.randrange(100000000000000, 500000000000000, 1) for i in range(list_size)]

    p_start = time.time()
    result_p = processes(PRIMES,16)
    p_end = time.time()
    t_start = time.time()
    result_t = threads(PRIMES,16)
    t_end = time.time()
    
    print("\n\nTimings:")
    print("Processes: {}".format(p_end - p_start))    
    print("Threads  : {}".format(t_end - t_start))
    
    #print(result_p)
    #print("N_prime ", result_p.count(True))
    

Overwriting prime.py


In [183]:
!python prime.py



Timings:
Processes: 2.0788538455963135
Threads  : 25.077348947525024
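
The timings above are consistent with the Global Interpreter Lock letting only one thread run Python bytecode at a time, so threads add little for CPU-bound work. One way to check this is to compare the thread pool against a plain serial loop. The sketch below does that under some assumptions: a standard GIL-enabled CPython build, a deliberately small made-up workload (`NUMBERS`), and hypothetical helpers `timed`, `run_serial`, and `run_threads`. Actual timings depend on the machine, so none are quoted here.

```python
import math
import time
from concurrent import futures


def is_prime(n):
    """Trial division by odd numbers up to sqrt(n), as in prime.py."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True


def timed(label, func):
    """Call func(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print("{:<8}: {:.3f} s".format(label, time.perf_counter() - start))
    return result


def run_serial(numbers):
    return [is_prime(n) for n in numbers]


def run_threads(numbers, num_workers=4):
    with futures.ThreadPoolExecutor(max_workers=num_workers) as ex:
        return list(ex.map(is_prime, numbers))


# Hypothetical small workload so the comparison finishes quickly.
NUMBERS = list(range(10**9, 10**9 + 200))

serial = timed("Serial", lambda: run_serial(NUMBERS))
threaded = timed("Threads", lambda: run_threads(NUMBERS))
print("same answers:", serial == threaded)
```

If the two timings come out similar, the thread pool is adding scheduling overhead without adding parallelism, which is the situation where a process pool is worth trying.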


### Further process control via the multiprocessing module

The [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module mirrors `threading`, except that instead of a `Thread` class it provides a `Process` class. Each `Process` is a true system process without shared memory, but `multiprocessing` provides features for sharing data and passing messages between processes, so that in many cases converting from threads to processes is as simple as changing a few import statements.

`multiprocessing` is a package that supports spawning processes using an API similar to the `threading` module. It offers both local and remote concurrency, effectively side-stepping the **Global Interpreter Lock** by using subprocesses instead of threads. As a result, it allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
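
To make the threading-like API and the message passing described above more concrete, here is a minimal sketch that starts one `Process` per input and collects the results through a `multiprocessing.Queue`. The function names `square`, `square_worker`, and `run_squares` are invented for this illustration.

```python
import multiprocessing as mp
import os


def square(n):
    return n * n


def square_worker(n, queue):
    """Runs in a child process: compute the result and send it back to the parent."""
    queue.put((n, square(n), os.getpid()))


def run_squares(numbers):
    """Start one Process per number, much like starting one Thread per number."""
    queue = mp.Queue()
    procs = [mp.Process(target=square_worker, args=(n, queue)) for n in numbers]
    for p in procs:
        p.start()
    # Collect one message per child before joining, so no child is left
    # waiting to flush its result into the queue.
    results = [queue.get(timeout=30) for _ in procs]
    for p in procs:
        p.join()
    return sorted(results)


if __name__ == '__main__':
    for n, sq, pid in run_squares([1, 2, 3, 4]):
        print('{} squared is {} (computed in process {})'.format(n, sq, pid))
```

Because the children share no memory with the parent, the queue is the only route by which the results come back; the `if __name__ == '__main__':` guard keeps the child processes from re-running the launch code when they import this script.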

The multiprocessing module also introduces APIs which do not have analogs in the threading module. A prime example of this is the `Pool` object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism). The following example demonstrates the common practice of defining such functions in a module so that child processes can successfully import that module.