# How to use joblib.Memory



### In this notebook we will learn how to use joblib.Memory


We will use `joblib.Memory` to cache the results of both a plain function and a class method.

## Without joblib.Memory

`costly_compute` emulates a computationally expensive process (it simply sleeps for 5 seconds). Later we will avoid repeating this cost by caching its result with `joblib.Memory`.

In [11]:
import time
import numpy as np

def costly_compute(data, column_index=0):
    """Simulate an expensive computation"""
    time.sleep(5)  # stand-in for a slow computation
    # data[column_index] indexes the first axis, so it returns a row
    return data[column_index]


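Set a random seed so that the generated data are deterministic: the same seed produces the same pseudo-random values on every run. Identical inputs are also what allow `joblib.Memory` to recognize a call it has already cached.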

In [12]:
rng = np.random.RandomState(42)
data = rng.randn(int(1e5), 10)
start = time.time()
data_trans = costly_compute(data)
end = time.time()

print('\nThe function took {:.2f} s to compute.'.format(end - start))
print('\nThe transformed data are:\n {}'.format(data_trans))


The function took 5.00 s to compute.

The transformed data are:
 [ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
  1.57921282  0.76743473 -0.46947439  0.54256004]
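To see what "deterministic" means in practice, here is a small sketch comparing two generators seeded with 42, as in the cell above, against one seeded with 0. The variable names are illustrative only. The first value drawn with seed 42 is the same `0.49671415` that opens the row printed above.

```python
import numpy as np

# Two independent generators created with the same seed (42, as in the cell above)
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

sample_a = rng_a.randn(3, 2)
sample_b = rng_b.randn(3, 2)

# Same seed -> identical pseudo-random values, run after run
print(np.array_equal(sample_a, sample_b))  # True

# A different seed produces different values
sample_c = np.random.RandomState(0).randn(3, 2)
print(np.array_equal(sample_a, sample_c))  # False
```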


## Caching the result of a function to avoid recomputing

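There is no point in recomputing a result when the function and its inputs have not changed. `joblib.Memory` caches a function's return values on disk in a chosen directory (here `./cachedir`) and reloads them when the function is called again with the same arguments.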
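`memory.cache` can also be used as a decorator, which is equivalent to the `f = memory.cache(f)` assignment used in the next cell. The minimal sketch below assumes a throwaway directory from `tempfile.mkdtemp()` (so it does not touch `./cachedir`). It uses a hypothetical `square` function that records each real execution in a list, which makes cache hits visible: on a hit, the function body, and therefore its side effect, does not run.

```python
import tempfile

from joblib import Memory

# Hypothetical scratch cache; named demo_memory so the notebook's `memory` is untouched
demo_memory = Memory(tempfile.mkdtemp(), verbose=0)

calls = []  # one entry per time the function body actually runs


@demo_memory.cache  # same as: square = demo_memory.cache(square)
def square(x):
    calls.append(x)
    return x * x


print(square(4))   # 16, computed and written to the cache
print(square(4))   # 16, loaded from the cache; the body is skipped
print(len(calls))  # 1
```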



In [16]:
from joblib import Memory
location = './cachedir'
memory = Memory(location, verbose=0)


def costly_compute_cached(data, column_index=0):
    """Simulate an expensive computation"""
    time.sleep(5)
    return data[column_index]

costly_compute_cached = memory.cache(costly_compute_cached)
start = time.time()
data_trans = costly_compute_cached(data)
end = time.time()

print('\nThe function took {:.2f} s to compute.'.format(end - start))
print('\nThe transformed data are:\n {}'.format(data_trans))
    


The function took 5.05 s to compute.

The transformed data are:
 [ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
  1.57921282  0.76743473 -0.46947439  0.54256004]


On the first call nothing is cached yet, so the elapsed time is the time to compute the result plus the time to write it to disk.
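The sketch below makes that first-call behaviour visible. It assumes a joblib version that provides `MemorizedFunc.check_call_in_cache` (present in recent 1.x releases), uses a temporary directory instead of `./cachedir`, and shortens the sleep to 0.5 s. The function and variable names are hypothetical. Before the first call the result is not in the cache; afterwards it is, and the cache consists of ordinary files under the directory given to `Memory`.

```python
import os
import tempfile
import time

import numpy as np
from joblib import Memory

scratch_dir = tempfile.mkdtemp()  # hypothetical scratch location
scratch_memory = Memory(scratch_dir, verbose=0)


def slow_first_row(arr):
    time.sleep(0.5)  # shortened stand-in for the 5 s computation
    return arr[0]


slow_first_row_cached = scratch_memory.cache(slow_first_row)
arr = np.random.RandomState(42).randn(1000, 10)

print(slow_first_row_cached.check_call_in_cache(arr))  # False: nothing stored yet

start = time.time()
slow_first_row_cached(arr)  # computes, then writes the result to disk
print('first call: {:.2f} s'.format(time.time() - start))

print(slow_first_row_cached.check_call_in_cache(arr))  # True: result is on disk

# Count the files joblib wrote under the cache directory
n_files = sum(len(files) for _, _, files in os.walk(scratch_dir))
print(n_files > 0)  # True
```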


In [22]:
start = time.time()
data_trans = costly_compute_cached(data)
end = time.time()
                  
print('\nThe function took {:.2f} s to compute.'.format(end - start))
print('\nThe transformed data are:\n {}'.format(data_trans))                  


The function took 0.01 s to compute.

The transformed data are:
 [ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
  1.57921282  0.76743473 -0.46947439  0.54256004]


On the second call, the elapsed time drops to about 0.01 s because the result is loaded from the cache on disk instead of being recomputed.
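A cache hit requires the same arguments: joblib hashes the argument values, including the contents of NumPy arrays, so a different argument or a modified array is a cache miss and triggers a fresh computation. Here is a small sketch with a hypothetical `pick_row` function and a temporary cache directory, where the `computed` list records which calls actually ran.

```python
import tempfile
import time

import numpy as np
from joblib import Memory

keyed_memory = Memory(tempfile.mkdtemp(), verbose=0)  # hypothetical scratch cache
computed = []  # index of every call whose body actually ran


@keyed_memory.cache
def pick_row(arr, index):
    computed.append(index)
    time.sleep(0.5)  # shortened stand-in for an expensive step
    return arr[index]


arr = np.arange(20).reshape(4, 5)

pick_row(arr, 0)      # miss: computed and stored
pick_row(arr, 0)      # hit: same arguments, loaded from disk
pick_row(arr, 1)      # miss: different index
pick_row(arr + 1, 0)  # miss: array contents changed, so the hash differs

print(computed)  # [0, 1, 0]
```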

## Using joblib.Memory with a method



In [27]:
def _costly_compute_cached(data, column):
    time.sleep(5)
    return data[column]

class Algorithm(object):
    """A class which is using the previous function"""
    
    def __init__(self, column=0):
        self.column = column
        
    def transform(self, data):
        costly_compute = memory.cache(_costly_compute_cached)
        return costly_compute(data, self.column)

transformer = Algorithm()
start = time.time()
data_trans = transformer.transform(data)
end = time.time()

    
print('\nThe function took {:.2f} s to compute.'.format(end - start))
print('\nThe transformed data are:\n {}'.format(data_trans))    


The function took 0.01 s to compute.

The transformed data are:
 [ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
  1.57921282  0.76743473 -0.46947439  0.54256004]
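In the `Algorithm` class, `memory.cache` is called inside `transform`, so a new wrapper object is created on every call; results are still correct because all wrappers share the same on-disk cache. The sketch below shows one alternative, assuming it is acceptable to wrap the module-level function once and reuse it. `CachedAlgorithm`, `_select_row`, and the temporary cache directory are hypothetical. Only the data and `self.column` are passed to the cached function, so the instance itself is never part of the hashed arguments, and two instances with the same column share one cache entry.

```python
import tempfile
import time

import numpy as np
from joblib import Memory

method_memory = Memory(tempfile.mkdtemp(), verbose=0)  # hypothetical scratch cache
runs = []  # one entry per real (uncached) execution


def _select_row(arr, column):
    runs.append(column)
    time.sleep(0.5)  # shortened stand-in for the 5 s computation
    return arr[column]


# Wrap once at module level instead of on every transform() call
_select_row_cached = method_memory.cache(_select_row)


class CachedAlgorithm(object):
    """Hypothetical variant of Algorithm that reuses one cached function."""

    def __init__(self, column=0):
        self.column = column

    def transform(self, arr):
        # self is not passed to the cached function, so it is not hashed
        return _select_row_cached(arr, self.column)


arr = np.random.RandomState(42).randn(1000, 10)
first = CachedAlgorithm(column=0).transform(arr)   # computed
second = CachedAlgorithm(column=0).transform(arr)  # new instance, same cache entry
print(np.array_equal(first, second), len(runs))    # True 1
```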
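
The first `transform` call in cell `In [27]` already took only 0.01 s, most likely because the cell had been run before and the results stored in `./cachedir` persist on disk between runs. With an empty cache, the first call would take about 5 s and only subsequent calls would be fast.
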
In [None]:
start = time.time()
data_trans = transformer.transform(data)