# Parallization

**Table of contents**<a id='toc0_'></a>    
- 1. [Serial problem](#toc1_)    
- 2. [Parallization with joblib](#toc2_)    
- 3. [Parallization with Numba](#toc3_)    
- 4. [Limitations](#toc4_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

You will be introduced to how to use the **parallization**.

In [2]:
import time
import joblib

import numpy as np
import numba as nb

from scipy import optimize

import matplotlib.pyplot as plt
plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":"0.25","grid.linestyle":"--"})
plt.rcParams.update({'font.size': 14})
import quantecon as qe # pip install quantecon

Collecting quantecon
  Downloading quantecon-0.7.0-py3-none-any.whl (214 kB)
     ---------------------------------------- 0.0/214.8 kB ? eta -:--:--
     --------------------- ---------------- 122.9/214.8 kB 3.6 MB/s eta 0:00:01
     -------------------------------------- 214.8/214.8 kB 3.3 MB/s eta 0:00:00
Installing collected packages: quantecon
Successfully installed quantecon-0.7.0
Note: you may need to restart the kernel to use updated packages.


In [3]:
import psutil
CPUs = psutil.cpu_count()
CPUs_list = set(np.sort([1,2,4,*np.arange(8,CPUs+1,4)])) 
print(f'This computer has {CPUs} CPUs')
print(f'{CPUs_list = }')

This computer has 8 CPUs
CPUs_list = {8, 1, 2, 4}


## 1. <a id='toc1_'></a>[Serial problem](#toc0_)

Assume we need to **solve the following optimization problem**

In [None]:
def solver(alpha,beta,gamma):

    def obj(x):
        return (x[0]-np.exp(alpha))**2 + (x[1]-np.exp(beta))**2 + (x[2]-np.exp(gamma))**2

    return optimize.minimize(obj,np.array([0.0,0.0,0.0]),method='Nelder-Mead')

$n$ times:

In [None]:
n = 4000
alphas = np.random.uniform(size=n)
betas = np.random.uniform(size=n)
gammas = np.random.uniform(size=n)

def serial_solver(alphas,betas,gammas):
    results = [solver(alpha,beta,gamma) for (alpha,beta,gamma) in zip(alphas,betas,gammas)]
    return [result.x for result in results]

%time xopts = serial_solver(alphas,betas,gammas)

Wall time: 32.4 s


## 2. <a id='toc2_'></a>[Parallization with joblib](#toc0_)

**Joblib** can be used to run python code in **parallel**.

1. ``joblib.delayed(FUNC)(ARGS)`` create a task to call  ``FUNC`` with ``ARGS``.
2. ``joblib.Parallel(n_jobs=K)(TASKS)`` execute the tasks in ``TASKS`` in ``K`` parallel processes.


In [None]:
def parallel_solver_joblib(alphas,betas,gammas,n_jobs=1):

    tasks = (joblib.delayed(solver)(alpha,beta,gamma) for (alpha,beta,gamma) in zip(alphas,betas,gammas))
    results = joblib.Parallel(n_jobs=n_jobs)(tasks)
    
    return [result.x for result in results]
    
for n_jobs in CPUs_list:
    print(f'n_jobs = {n_jobs}')
    %time xopts = parallel_solver_joblib(alphas,betas,gammas,n_jobs=n_jobs)
    print(f'')

n_jobs = 8
Wall time: 16.4 s

n_jobs = 1
Wall time: 30.9 s

n_jobs = 2
Wall time: 17.2 s

n_jobs = 4
Wall time: 12.4 s



**Drawback:** The inputs to the functions are serialized and copied to each parallel process.

[More on Joblib](https://joblib.readthedocs.io/en/latest/index.html) ([examples](https://joblib.readthedocs.io/en/latest/parallel.html))

## 3. <a id='toc3_'></a>[Parallization with Numba](#toc0_)

Using solver from `QuantEcon` (see [documentation](https://quanteconpy.readthedocs.io/en/latest/index.html)).

In [None]:
@nb.njit
def solver_nb(alpha,beta,gamma):

    def obj(x,alpha,beta,gamma):
        return (x[0]-alpha)**2 + (x[1]-beta)**2 + (x[2]-gamma)**2

    res = qe.optimize.nelder_mead(obj,np.array([0.0,0.0,0.0]),args=(alpha,beta,gamma))

    return res.x


**Serial version:**

In [None]:
@nb.njit
def serial_solver_nb(alphas,betas,gammas):

    n = alphas.size
    xopts = np.zeros((n,3))

    for i in range(n):
        xopts[i,:] = solver_nb(alphas[i],betas[i],gammas[i])

%time serial_solver_nb(alphas,betas,gammas)
%time serial_solver_nb(alphas,betas,gammas)

Wall time: 19.5 s
Wall time: 3.9 s


**Parallel version:**

In [None]:
@nb.njit(parallel=True)
def parallel_solver_nb(alphas,betas,gammas):

    n = alphas.size
    xopts = np.zeros((n,3))

    for i in nb.prange(n):
        xopts[i,:] = solver_nb(alphas[i],betas[i],gammas[i])

%time parallel_solver_nb(alphas,betas,gammas)
%time parallel_solver_nb(alphas,betas,gammas)

Wall time: 15.3 s
Wall time: 3.21 s


## 4. <a id='toc4_'></a>[Limitations](#toc0_)

**Parallization** can not always be used. Some problems are inherently sequential. 


If the result from a previous iteration of the loop is required in a later iteration, the cannot be executed seperately in parallel.<br>(except in some special cases such as summing). 

The larger the proportion of the code, which can be run in parallel is, the larger the potential speed-up is.<br>
This is called **Amdahl's Law**.

<img src="https://github.com/NumEconCopenhagen/lectures-2019/raw/master/11/amdahls_law.png" alt="amdahls_law" width=40% />