# <i>On demand recomputing (disk-caching)</i> using [Joblib](https://joblib.readthedocs.io/en/latest/index.html)

### Install Joblib package

In [1]:
!pip install joblib



### Import the Memory class

In [2]:
from joblib import Memory

### Import other modules

In [3]:
import numpy as np
from time import sleep 

### Ignore any warnings raised by Jupyter notebook

In [4]:
import warnings
warnings.filterwarnings('ignore')

### Let's create our cache directory

In [5]:
pwd = "/kaggle/working/"
cache_dir = pwd + 'cache_dir'
mem = Memory(cache_dir)

#### Directory with name '/kaggle/working/cache_dir/' has been created

In [6]:
!ls -ld $pwd*/

drwxr-xr-x 3 root root 4096 May 15 11:46 /kaggle/working/cache_dir/


### Define some large inputs

In [7]:
input1 = np.vander(np.arange(10**4)).astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
input2 = np.vander(np.random.uniform(low=0,high=10**5, size=5000))
print("Shape of input1: ",input1.shape)
print("Shape of input2: ",input2.shape)

Shape of input1:  (10000, 10000)
Shape of input2:  (5000, 5000)


<br>

## There are two ways to use Memory.cache

### Method 1: Passing a function to Memory.cache

#### Define function

In [8]:
def func(x):
    print("Example of Computationally intensive function!")
    print("The result is not cached for this particular input")
    sleep(4.0)
    return np.square(x)

#### Pass it to Memory.cache function

In [9]:
func_mem = mem.cache(func, verbose=0)

#### Before we begin, let's check the cache directory size

In [10]:
!du -sh $cache_dir

16K	/kaggle/working/cache_dir


#### Let's check out some caching results

In [11]:
%%time
input1_result = func_mem(input1)

Example of Computationally intensive function!
The result is not cached for this particular input
CPU times: user 4.82 s, sys: 1.96 s, total: 6.78 s
Wall time: 10.8 s


In [12]:
%%time
input1_cache_result = func_mem(input1)

CPU times: user 1.61 s, sys: 498 ms, total: 2.11 s
Wall time: 2.1 s


#### Compare the execution times. When we call <i>func_mem</i> again with the same argument, i.e. input1, we get the <span style="color:red">cached result instead of repeating the computation</span>: the wall time drops from 10.8 s to 2.1 s.

<i><u>Note</u>: Memory.cache only caches the value returned by the function. On a cache hit the function body is not executed, so its print statements produce no output.</i>
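
The note above can be checked in isolation. This is a minimal sketch, assuming a throwaway cache directory created with `tempfile.mkdtemp()` rather than the notebook's `cache_dir`; the names `demo_memory`, `slow_square` and `calls` are made up for illustration. The list records every real execution of the function body.

```python
import tempfile
from joblib import Memory

# Throwaway cache directory (illustrative; not the notebook's cache_dir)
demo_memory = Memory(tempfile.mkdtemp(), verbose=0)

calls = []  # grows each time the function body actually runs

@demo_memory.cache
def slow_square(x):
    calls.append(x)  # side effect: skipped on a cache hit
    return x * x

first = slow_square(3)   # cache miss: the body runs and the result is stored
second = slow_square(3)  # cache hit: the result is loaded from disk

print(first, second, len(calls))  # 9 9 1
```

Both calls return the same value, but the body ran only once, which is why the print statements in `func` appear only on the first call.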

#### The Memory class uses fast cryptographic hashing of the input arguments to check whether a result has already been computed
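
To get a feel for content-based hashing, the sketch below uses the public helper `joblib.hash`. It is only an illustration of the idea, under the assumption that it behaves like the hashing Memory applies to arguments; the array names are hypothetical.

```python
import numpy as np
import joblib

a = np.arange(5)
b = np.arange(5)  # same content as a, but a different object
c = np.arange(6)  # different content

# The hash depends on the array's content, not on object identity
same = joblib.hash(a) == joblib.hash(b)
different = joblib.hash(a) != joblib.hash(c)

print(same, different)  # True True
```

Two arrays with equal content map to the same key, so a second call with an equal input can be served from the cache even if the array was rebuilt in between.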

#### The result for input2 hasn't been cached

In [13]:
%%time
input2_result = func_mem(input2)

Example of Computationally intensive function!
The result is not cached for this particular input
CPU times: user 1.2 s, sys: 400 ms, total: 1.6 s
Wall time: 5.6 s


#### Calling func_mem again with input2 now returns the cached result; compare the wall time with the previous cell

In [14]:
%%time
input2_cache_result = func_mem(input2)

CPU times: user 406 ms, sys: 118 ms, total: 524 ms
Wall time: 523 ms


<br>

#### Let's again check the cache directory size.

In [15]:
!du -sh $cache_dir

954M	/kaggle/working/cache_dir


#### *The cache directory has grown from 16K to 954M*

<br>

### Method 2: Memory.cache as a decorator

In [16]:
@mem.cache(verbose=0)
def func_as_decorator(x):
    print("Example of Computationally intensive function!")
    print("The result is not cached for this particular input")
    sleep(3.0)
    return np.square(x)

In [17]:
%%time
input1_decorator_result = func_as_decorator(input1)

Example of Computationally intensive function!
The result is not cached for this particular input
CPU times: user 4.79 s, sys: 1.91 s, total: 6.7 s
Wall time: 9.71 s


#### Notice the time difference in execution

In [18]:
%%time
input1_decorator_result = func_as_decorator(input1)

CPU times: user 1.59 s, sys: 623 ms, total: 2.21 s
Wall time: 2.21 s


<br>

## Using memmapping (memory mapping) when working with NumPy arrays

### Memmapping speeds up cache lookup when reloading large NumPy arrays

In [19]:
cache_dir2 = pwd + 'cache_dir2'
memory2 = Memory(cache_dir2, mmap_mode='c')

In [20]:
@memory2.cache(verbose=0)
def func_memmap(x):
    print("Example of Computationally intensive function!")
    print("The result is not cached for this particular input")
    sleep(3.0)
    return np.square(x)

In [21]:
%%time
input1_memmap = func_memmap(x=input1)

Example of Computationally intensive function!
The result is not cached for this particular input
CPU times: user 4.77 s, sys: 1.39 s, total: 6.16 s
Wall time: 9.16 s


In [22]:
%%time
input1_memmap = func_memmap(x=input1)

CPU times: user 1.51 s, sys: 0 ns, total: 1.51 s
Wall time: 1.51 s


[Check the time difference in execution when using memory map vs non memory map](#Notice-the-time-difference-in-execution)
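
As a rough illustration of what `mmap_mode` changes, the sketch below caches a small array and inspects what comes back on a cache hit. It assumes a throwaway cache directory and hypothetical names (`demo_memory`, `square_array`, `data`); the array is tiny, so it shows the returned type rather than any speed-up.

```python
import tempfile
import numpy as np
from joblib import Memory

# Throwaway cache with copy-on-write memory mapping, as in the cells above
demo_memory = Memory(tempfile.mkdtemp(), mmap_mode='c', verbose=0)

@demo_memory.cache
def square_array(x):
    return np.square(x)

data = np.arange(12, dtype=float).reshape(3, 4)

square_array(data)             # first call: computes and stores the result
reloaded = square_array(data)  # second call: loaded from the cache

print(isinstance(reloaded, np.memmap))
print(np.array_equal(reloaded, np.square(data)))
```

With `mmap_mode` set, the cached array is mapped from the file on disk instead of being read fully into memory, which is where the saving on large arrays comes from.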

<br>

## Clearing the cache

### Clear function's cache

In [23]:
# Disk utilization before clearing the function caches
!du -sh $cache_dir

1.7G	/kaggle/working/cache_dir


In [24]:
func_mem.clear()
func_as_decorator.clear()

In [25]:
# Disk utilization after clearing the function caches
!du -sh $cache_dir

28K	/kaggle/working/cache_dir


#### Notice above the disk utilization of "*/kaggle/working/cache_dir*" before and after clearing function cache

### Erase complete cache directory

In [26]:
mem.clear()

#### Let's check if the cache directory has been cleared

In [27]:
!du -sh $cache_dir

8.0K	/kaggle/working/cache_dir
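
The difference between clearing one function's cache and clearing the whole store can also be seen without looking at disk usage. This is a minimal sketch with a throwaway cache directory; `demo_memory`, `double`, `triple` and `runs` are hypothetical names, and the counters track how often each body really executes.

```python
import tempfile
from joblib import Memory

demo_memory = Memory(tempfile.mkdtemp(), verbose=0)
runs = {"double": 0, "triple": 0}  # counts real executions of each body

@demo_memory.cache
def double(x):
    runs["double"] += 1
    return 2 * x

@demo_memory.cache
def triple(x):
    runs["triple"] += 1
    return 3 * x

double(5)
triple(5)                      # both results are computed and cached

double.clear(warn=False)       # drop only the results cached for double
double(5)                      # recomputed
triple(5)                      # still served from the cache
after_func_clear = dict(runs)  # {'double': 2, 'triple': 1}

demo_memory.clear(warn=False)  # drop everything in this cache directory
double(5)
triple(5)                      # both recomputed
after_full_clear = dict(runs)  # {'double': 3, 'triple': 2}

print(after_func_clear, after_full_clear)
```

Clearing a single function leaves the other functions' cached results intact, while `Memory.clear()` forces every cached function to recompute.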


## Congratulations on completing disk-caching using [Joblib](https://joblib.readthedocs.io/en/latest/index.html)!

## Looking forward to your feedback in the comments section below
### If you liked this kernel please hit the Upvote button.

# Next - Learn how to parallelize `for loops` using [Joblib](https://joblib.readthedocs.io/en/latest/index.html) in the easiest way: https://www.kaggle.com/karanpathak/parallelize-loops-using-joblib