In [None]:
import warnings
warnings.filterwarnings('ignore')

# References 
The material and images presented here are based on the following sources:
* [Parallel Python with Joblib (ADMIN Magazine)](http://www.admin-magazine.com/HPC/Articles/Parallel-Python-with-Joblib)
* [joblib examples gallery](https://joblib.readthedocs.io/en/latest/auto_examples/index.html)


# How to make your computations faster!

---
# Joblib

* A set of tools to provide **lightweight pipelining in Python**
    * In particular
        * transparent disk-caching 
        * embarrassingly parallel computing 

Joblib lets you:
* easily parallelize computations in Python
* avoid repetitive and costly computations
* store intermediate results so that later runs can warm-start
* memory-map arrays stored on disk

## Embarrassingly parallel computing 


* One definition: 

```
    Problems whose input objects can be processed independently and concurrently are referred to as embarrassingly parallel
    
```
* Example from our daily routine: **grid search**, where each hyperparameter combination can be evaluated independently of the others



* Parallel processing of tasks on multiple CPUs
<img src="./images/job-F01_reference.jpg" alt="drawing" width="700"/>
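Grid search is a good illustration of an embarrassingly parallel problem: every grid point can be scored without looking at any other. The sketch below is illustrative only. `evaluate` is a hypothetical scoring function standing in for "train a model and return its validation score", and the hyperparameter values are made up. Each grid point is dispatched to a worker with `joblib.Parallel` and `delayed`.

```python
from itertools import product

from joblib import Parallel, delayed


def evaluate(learning_rate, depth):
    # Hypothetical scoring function: in practice this would train a model
    # and return its validation score. Each call depends only on its inputs.
    return -(learning_rate - 0.1) ** 2 - (depth - 4) ** 2


# Illustrative hyperparameter grid (values chosen arbitrarily).
learning_rates = [0.01, 0.1, 1.0]
depths = [2, 4, 8]
grid = list(product(learning_rates, depths))

# Every grid point is independent, so all evaluations can run concurrently.
# n_jobs=-1 asks joblib to use all available CPUs.
scores = Parallel(n_jobs=-1)(delayed(evaluate)(lr, d) for lr, d in grid)

# Parallel returns results in the same order as the inputs,
# so scores can be zipped back with the grid.
best_score, best_params = max(zip(scores, grid))
print(best_params, best_score)
```

Because `Parallel` preserves input order, no extra bookkeeping is needed to match scores to parameters.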

In [None]:
import time
from math import sqrt

def f(x):
    # simulate a slow computation (0.5 s per call)
    time.sleep(0.5)
    return sqrt(x)

In [None]:
%%time
from math import sqrt
from joblib import Parallel, delayed
Parallel(n_jobs=-1)(delayed(f)(i) for i in range(10))  # n_jobs=-1 uses all available CPUs


In [None]:
%%time
from math import sqrt
[f(i) for i in range(10)]  # sequential baseline: 10 calls x 0.5 s, roughly 5 s in total

<div class="alert alert-success">

<b>EXERCISE</b>:

<ul>
 <li> Compare parallel processing with joblib and with the multiprocessing library by looking at <b>parallel_joblib.py</b> and <b>parallel_multiprocessing</b>
  </li>
 <li> <b>Note:</b> run the two scripts in a terminal, not in the notebook </li>
</ul>

</div>

## Caching: avoiding repetitive and costly computations
* Joblib provides a `Memory.cache` method that can be used as a decorator for functions with one or more arguments
* With caching, the results are saved to disk by the `Memory` object
* This means: **if the results are already in the cache, the function won't compute them again!**

* Memory cache of joblib
<img src="./images/job-F02_reference.jpg" alt="drawing" width="700"/>


In [None]:
from joblib import Memory 

In [None]:
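# Memory() with the default location=None does no caching at all,
# so give it a directory where results can be stored
memory = Memory(location='./cachedir', verbose=0)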

In [None]:
@memory.cache
def f(x):
    time.sleep(0.5)
    return sqrt(x)

In [None]:
%%time
Parallel(n_jobs=-1)(delayed(f)(i) for i in range(10))  # run this cell twice: the second run reads the results from the cache

<div class="alert alert-success">

<b>EXERCISE</b>:

<ul>
 <li> Let's look at <b>memory_function.py</b> and run the script to better understand the memory function.</li>

</ul>

</div>
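To see the cache avoid recomputation without relying on timings alone, the sketch below records every time the function body really executes. It assumes a throwaway cache directory created with `tempfile.mkdtemp()`; the names `slow_sqrt` and `calls` are illustrative, not part of joblib.

```python
import tempfile
import time
from math import sqrt

from joblib import Memory

calls = []  # records every time the function body actually executes

cachedir = tempfile.mkdtemp()  # throwaway cache directory for this sketch
memory = Memory(location=cachedir, verbose=0)


@memory.cache
def slow_sqrt(x):
    calls.append(x)
    time.sleep(0.5)
    return sqrt(x)


start = time.perf_counter()
first = slow_sqrt(16)   # computed, then written to the cache on disk
t_first = time.perf_counter() - start

start = time.perf_counter()
second = slow_sqrt(16)  # same argument: loaded from disk, body not executed
t_second = time.perf_counter() - start

print(first, second, calls)
print(f"first call: {t_first:.2f}s, second call: {t_second:.2f}s")

memory.clear(warn=False)  # remove the cached results when done
```

Only the first call appends to `calls`; the second call returns the stored result, which is why it is much faster.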

In [None]:
%run memory_function.py

## Storing intermediate results
* This section is inspired by this [example](https://joblib.readthedocs.io/en/latest/auto_examples/nested_parallel_memory.html#sphx-glr-auto-examples-nested-parallel-memory-py) from the joblib documentation

* Using `joblib.Memory` and `joblib.Parallel`, we will cache intermediate results

* Let's have a look at `nested_parallel_memory.py`
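The pattern behind caching intermediate results is a cached "expensive" step called from inside functions that run in parallel. This is a minimal sketch of that idea, not the content of `nested_parallel_memory.py`: `column_stats` and `normalise_column` are hypothetical names, the cache lives in a temporary directory, and the data is random.

```python
import tempfile

import numpy as np
from joblib import Memory, Parallel, delayed

memory = Memory(location=tempfile.mkdtemp(), verbose=0)


@memory.cache
def column_stats(data, column):
    # Hypothetical "expensive" intermediate step, cached on disk
    # for each (data, column) argument pair.
    return data[:, column].mean(), data[:, column].std()


def normalise_column(data, column):
    # Downstream step that reuses the cached intermediate result.
    mean, std = column_stats(data, column)
    return (data[:, column] - mean) / std


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))

# The first pass computes and caches the statistics; the second pass is
# expected to find them on disk instead of recomputing them.
for _ in range(2):
    columns = Parallel(n_jobs=2)(
        delayed(normalise_column)(data, c) for c in range(data.shape[1])
    )

normalised = np.column_stack(columns)
print(normalised.mean(axis=0).round(6), normalised.std(axis=0).round(6))
```

Because the cache is on disk, worker processes can share it: a result computed by one worker can be reused by another.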

In [None]:
%run nested_parallel_memory.py

## Memmap
* The `joblib.Parallel` docstring describes two relevant parameters:

```

max_nbytes int, str, or None, optional, 1M by default
      Threshold on the size of arrays passed to the workers that
      triggers automated memory mapping in temp_folder. 
      Can be an int in Bytes, or a human-readable string, e.g., '1M' for 1 megabyte.
      Use None to disable memmapping of large arrays.
      Only active when backend="loky" or "multiprocessing".
    
mmap_mode: {None, 'r+', 'r', 'w+', 'c'}
       Memmapping mode for numpy arrays passed to workers.
       See 'max_nbytes' parameter documentation for more details
 
```

* This is useful when working with large data or arrays: instead of copying the entire array into every worker, joblib writes it once to a memory-mapped file on disk (in `temp_folder`) and each worker reads the data from that memory map
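A minimal sketch of automatic memmapping, assuming the default `loky` backend: the array is about 16 MB, well above the `'1M'` threshold, so joblib should hand it to the workers as a read-only memory map (`mmap_mode="r"`) rather than a pickled copy. `chunk_sum` is a hypothetical helper that also reports the array type each worker received.

```python
import numpy as np
from joblib import Parallel, delayed


def chunk_sum(arr, start, stop):
    # Report the type the worker actually received, plus a partial result.
    return type(arr).__name__, float(arr[start:stop].sum())


# About 16 MB of float64, above the max_nbytes threshold of 1M.
data = np.arange(2_000_000, dtype=np.float64)
bounds = [(i, i + 500_000) for i in range(0, data.size, 500_000)]

results = Parallel(n_jobs=2, max_nbytes="1M", mmap_mode="r")(
    delayed(chunk_sum)(data, start, stop) for start, stop in bounds
)

types = {name for name, _ in results}      # expected to contain numpy's memmap type
total = sum(partial for _, partial in results)
print(types, total, float(data.sum()))
```

Setting `max_nbytes=None` instead would disable memmapping, and each worker would receive its own copy of `data`.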

In [None]:
%run parallel_memmap.py