## Joblib

A common pitfall is resource contention. We want to parallelize tasks in a way that matches our resources: e.g. if we have multiple CPUs, then ideally we parallelize work that is CPU-bound, not something like reading from or writing to a disk or database, since we usually only have one of those. Otherwise our parallel tasks end up contending for that one shared resource.
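A minimal sketch of matching the number of workers to the number of CPUs for a CPU-bound task, using only the standard library. The `count_primes` function and the input sizes are hypothetical stand-ins for any computation-heavy work; they are not part of the original notes.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    # hypothetical workload sizes, chosen only for illustration
    limits = [10_000, 20_000, 30_000, 40_000]

    # use at most one worker per CPU; extra workers would only
    # compete with each other for the same cores
    n_workers = min(len(limits), os.cpu_count() or 1)
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        results = list(executor.map(count_primes, limits))

    print(results)
```

The worker count is capped at the CPU count because the task spends its time computing; for a task that mostly waits on a single disk or database, adding workers this way would likely not help.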


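A common operation is map: given a function M that operates on a single item, map applies M to every item independently.

M = function(item) ...
map(M, items) = [M(item1), M(item2), ...]

Because each call only depends on its own item, the calls can be run in parallel.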
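As a small illustration of the map pattern, the sketch below uses Python's built-in `map` with a hypothetical `double` function (not from the original text) playing the role of M.

```python
def double(item):
    """Hypothetical per-item function M, used only for illustration."""
    return item * 2

items = [1, 2, 3]

# map applies the function to every item independently
mapped = list(map(double, items))

# equivalent to calling the function on each item by hand
by_hand = [double(items[0]), double(items[1]), double(items[2])]

print(mapped)
print(mapped == by_hand)
```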

In [1]:
import multiprocessing
from joblib import Parallel, delayed
n_cores = multiprocessing.cpu_count()
n_cores

8

In [3]:
def square(num):
    """a function that we wish to run in parallel"""
    return num * num

# we create a Parallel object specifying the number of jobs (cores),
# wrap our function with delayed, and pass in the range or the
# input items we want to iterate over

# Parallel also takes care of returning the results in the same
# order as the inputs
Parallel(n_jobs = n_cores)( delayed(square)(i) for i in range(10) )

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [4]:
# modify A in place
A = [0, 0, 0, 0]

def update_array(i):
    global A
    A[i] = i

# when we run this in a normal for loop
for i in range(4):
    update_array(i)
    
A

[0, 1, 2, 3]

In [8]:
A = [0, 0, 0, 0]

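# when we run the parallel task, each worker process gets its own copy
# of the array A, so the global array A in the parent process is not
# modified; the Parallel call itself returns one None per task, since
# update_array has no return value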
Parallel(n_jobs = -1)( delayed(update_array)(i) for i in range(4) )
A

[None, None, None, None]

In [9]:
# so the better way to do this is to simply have a function that
# returns the information from the parallelized task
def update_array2(i):
    return i

Parallel(n_jobs = -1)( delayed(update_array2)(i) for i in range(4) )

[0, 1, 2, 3]
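The same idea can be sketched with the standard library's `multiprocessing.Pool`: the worker returns an (index, value) pair, and the parent process, which owns `A`, performs the update. The names `compute` and `apply_updates` and the value `i * 10` are hypothetical and chosen only for illustration.

```python
from multiprocessing import Pool

def compute(i):
    """Return the index and the new value instead of mutating a global."""
    return i, i * 10

def apply_updates(array, updates):
    """Write the returned (index, value) pairs into the array in the parent."""
    for index, value in updates:
        array[index] = value
    return array

if __name__ == '__main__':
    A = [0, 0, 0, 0]

    # the workers only compute and return values
    with Pool(processes=2) as pool:
        updates = pool.map(compute, range(4))

    # the parent process is the only one that modifies A
    print(apply_updates(A, updates))
```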

In [None]:
# example of the scikit-learn library,
# changing n_jobs

- [Youtube: Parallel Programming in R and Python](https://www.youtube.com/watch?v=FIS_LsOzxYo)

## Parallel Processing

Modern computers often have multiple cores, and we should leverage them to increase performance. To do so, we need to divide a problem into sub-problems that can be tackled independently. We'll first look at the `multiprocessing` library.

The following sections follow the tutorials given in the YouTube links below.

- [Youtube: Multiprocessing - Intermediate Python Programming p.10](https://www.youtube.com/watch?v=oEYDqQ1pq9o&t=599s)
- [Youtube: Getting returned values from Processes - Intermediate Python Programming p.11](https://www.youtube.com/watch?v=kUKOEuPJXGc)

In [None]:
from multiprocessing import Process

def spawn(num):
    print('Spawned {}'.format(num))

if __name__ == '__main__':
    for i in range(5):
        # create a process with the function as the target and pass
        # the function's arguments through args; args expects a tuple,
        # so (i,) creates a one-element tuple, then start the process
        p = Process( target = spawn, args = (i,) )
        p.start()
        
        # .join() waits for the process to complete; calling it here,
        # inside the loop, means each process finishes before the next
        # one starts, so the output comes out in the pre-defined order;
        # we wouldn't need this if the tasks are independent of each
        # other and the order of completion does not matter
        p.join()
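Because `.join()` is called inside the loop, the processes in the cell above effectively run one after another. A sketch of the alternative, assuming the tasks are independent: start every process first, then join them all at the end. The `message` helper is hypothetical and only exists to keep the printed text in one place.

```python
from multiprocessing import Process

def message(num):
    """Build the text a worker prints (hypothetical helper)."""
    return 'Spawned {}'.format(num)

def spawn(num):
    print(message(num))

if __name__ == '__main__':
    # start all processes first so they can run at the same time
    processes = [Process(target=spawn, args=(i,)) for i in range(5)]
    for p in processes:
        p.start()

    # then wait for all of them to finish; the print order may vary
    for p in processes:
        p.join()
```

With this layout the output order is no longer guaranteed, which is the trade-off for letting the processes overlap.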

Next, we'll see how processes can communicate results back to the parent, so that instead of printing output we can return values.

In [None]:
from multiprocessing import Pool

def job(num):
    return num * 2

# we create a Pool; the processes argument is the number of worker
# processes, i.e. how many tasks can be processed at the same time;
# after that we call map to apply the function to every item of an
# iterable; the same pool can be reused for multiple map calls (tasks)
with Pool(processes = 10) as p:
    data = p.map( job, range(5) )

data