# Caching expensive calculations

Say we have a calculation that is expensive to perform, and its results are useful enough that we want to access them from several places in our code.

In [1]:
from functools import lru_cache
from collections import Counter
import time
import torch
import numpy as np
from torchvision.datasets import ImageFolder

In [2]:
class ExtendedFolder(ImageFolder):
    def __init__(self): pass
    
    def get_img_dims(self):
        time.sleep(5) # looking at a lot of files
        return [(10, 20), (10, 20), (40, 40)]
    
class Analytics():
    def __init__(self, dataset):
        self.dataset = dataset
        
    def get_dim_counts(self):
        return Counter(self.dataset.get_img_dims())
    
def get_noise_masks(dims):
    return [np.random.randn(*dim) for dim in dims]

In [3]:
%%time
dataset = ExtendedFolder()
analytics = Analytics(dataset)

print(f'Counts per dimension: {analytics.get_dim_counts()}')
# getting noise masks to experiment with regularization
msks = get_noise_masks(dataset.get_img_dims()) # making a second call to dataset.get_img_dims()

Counts per dimension: Counter({(10, 20): 2, (40, 40): 1})
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 10 s


How can we access the results without having to run the calculation again? We could store them somewhere and run the calculation conditionally.

In [4]:
class ExtendedFolder(ImageFolder):
    def __init__(self):
        self._img_dims = None
    
    def get_img_dims(self):
        if self._img_dims is None: # an empty result would otherwise be recomputed on every call
            time.sleep(5) # looking at a lot of files
            self._img_dims = [(10, 20), (10, 20), (40, 40)]
        return self._img_dims
    
class Analytics():
    def __init__(self, dataset):
        self.dataset = dataset
        
    def get_dim_counts(self):
        return Counter(self.dataset.get_img_dims())
    
def get_noise_masks(dims):
    return [np.random.randn(*dim) for dim in dims]

In [5]:
%%time
dataset = ExtendedFolder()
analytics = Analytics(dataset)

print(f'Counts per dimension: {analytics.get_dim_counts()}')
msks = get_noise_masks(dataset.get_img_dims())

Counts per dimension: Counter({(10, 20): 2, (40, 40): 1})
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 5.01 s


But this is not very elegant and requires additional lines of code. We can do better.

There is a technique tailored to solving exactly this problem. It is called [memoization](https://en.wikipedia.org/wiki/Memoization). In Python, all it takes to enable it is adding the `lru_cache` decorator from `functools`.
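
To make the idea concrete, here is a minimal hand-rolled sketch of memoization, assuming every argument is hashable. The names `memoize`, `slow_square` and `calls` are hypothetical and only serve the illustration; the standard library version adds a size limit, keyword-argument handling and statistics on top of this.

```python
import functools

def memoize(func):
    """Hypothetical helper: cache results of func, keyed by positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:          # first time we see these arguments
            cache[args] = func(*args)  # run the real calculation once
        return cache[args]             # afterwards, answer from the dict

    return wrapper

calls = []  # records every time the function body actually runs

@memoize
def slow_square(x):
    calls.append(x)
    return x * x

first = slow_square(4)   # computed
second = slow_square(4)  # looked up, the body does not run again
third = slow_square(5)   # new argument, computed
```

After these three calls the function body has run only twice, once per distinct argument.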

In [6]:
class ExtendedFolder(ImageFolder):
    def __init__(self): pass
    
    @lru_cache()
    def get_img_dims(self):
        time.sleep(5) # looking at a lot of files
        return [(10, 20), (10, 20), (40, 40)]
    
class Analytics():
    def __init__(self, dataset):
        self.dataset = dataset
        
    def get_dim_counts(self):
        return Counter(self.dataset.get_img_dims())
    
def get_noise_masks(dims):
    return [np.random.randn(*dim) for dim in dims]

In [7]:
%%time
dataset = ExtendedFolder()
analytics = Analytics(dataset)

print(f'Counts per dimension: {analytics.get_dim_counts()}')
msks = get_noise_masks(dataset.get_img_dims())

Counts per dimension: Counter({(10, 20): 2, (40, 40): 1})
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 5.01 s
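
One detail worth keeping in mind when decorating a method: `self` is just another argument, so the cache is keyed on the instance. The sketch below uses a hypothetical stand-in class `Folder` (not a real `ImageFolder`) and a `scans` counter in place of `time.sleep` to show that each instance triggers its own calculation.

```python
from functools import lru_cache

class Folder:
    scans = 0  # counts how often the expensive body actually runs

    @lru_cache()
    def get_img_dims(self):
        Folder.scans += 1
        return [(10, 20), (10, 20), (40, 40)]

a, b = Folder(), Folder()
a.get_img_dims()  # miss: computed for instance a
a.get_img_dims()  # hit: same instance, served from the cache
b.get_img_dims()  # miss: a different instance is a different cache key

info = Folder.get_img_dims.cache_info()
```

The cache also holds a reference to every instance it has seen, so for many short-lived objects storing the result on the instance may be the better fit.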


Our code is clean and easy to read.

On top of that, we get the full benefits of memoization!
Results are cached on a per-argument basis.

In [8]:
arr_A = np.random.rand(1000, 1000)

@lru_cache()
def invert_and_multiply(coeff):
    return np.linalg.inv(arr_A) * coeff # return the result so it is what gets cached

In [9]:
%%time
invert_and_multiply(2)

CPU times: user 760 ms, sys: 28 ms, total: 788 ms
Wall time: 697 ms


In [10]:
%%time
invert_and_multiply(2)

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 11.9 µs


In [11]:
%%time
invert_and_multiply(5)

CPU times: user 572 ms, sys: 12 ms, total: 584 ms
Wall time: 385 ms
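
Because results are looked up by argument, `lru_cache` needs the arguments to be hashable. A small sketch of what that means in practice, using a hypothetical function `total`:

```python
from functools import lru_cache

@lru_cache()
def total(values):
    return sum(values)

tuple_result = total((1, 2, 3))  # tuples are hashable, so this call is cached

try:
    total([1, 2, 3])             # lists are not hashable
    list_error = None
except TypeError as err:
    list_error = err             # lru_cache cannot build a cache key
```

Converting a list to a tuple before the call is one common way around this.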


We also get access to diagnostic information about the cache.

In [12]:
invert_and_multiply.cache_info()

CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
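
Here `hits` and `misses` count cache lookups, `maxsize` is the entry limit (128 by default) and `currsize` is the number of entries currently stored. Once the limit is reached, the least recently used entry is dropped. The sketch below shows this with a deliberately tiny cache; the function `double` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def double(x):
    return 2 * x

double(1)  # miss
double(1)  # hit
double(2)  # miss
double(3)  # miss; the cache is full, so the least recently used entry (1) is evicted

info = double.cache_info()

double(1)  # miss again, because the entry for 1 was evicted
info_after = double.cache_info()
```

Passing `maxsize=None` removes the limit, at the cost of a cache that can grow without bound.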

And we can clear the cache at will.

In [13]:
%%time
invert_and_multiply(2)

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 11.9 µs


In [14]:
invert_and_multiply.cache_clear()

In [15]:
%%time
invert_and_multiply(2)

CPU times: user 460 ms, sys: 0 ns, total: 460 ms
Wall time: 250 ms
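
Clearing the cache also resets its statistics, which is easy to miss when reading `cache_info()` afterwards. A small sketch with a hypothetical function `triple`:

```python
from functools import lru_cache

@lru_cache()
def triple(x):
    return 3 * x

triple(1)  # miss
triple(1)  # hit
before = triple.cache_info()

triple.cache_clear()  # drops the stored results and zeroes the counters
after = triple.cache_info()

triple(1)  # computed again, counted as a fresh miss
final = triple.cache_info()
```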
