# Caching function results
This notebook demonstrates how to use the MLRepo to cache function calls. Caching the results of time-consuming functions is especially useful when working interactively in Jupyter notebooks.
Here we show how to use pailab's ml_cache decorator to cache function results.
## Setup

In [9]:
import numpy as np
from pailab.ml_repo.repo import MLRepo, MLObjectType
from pailab.ml_repo.repo_objects import RawData, RepoInfoKey, RepoObject

import logging
logging.basicConfig(level=logging.ERROR)

In [10]:
ml_repo = MLRepo(user='job_runner_user')

For demonstration we use a simple example: we create two RawData objects, one holding the input data of a function evaluated at random points and one holding artificially created model output (we add a random number to the function values to simulate model output with error).

In [11]:
x = np.random.rand(10000, 3)
y = x[:, 1] * x[:, 2] + x[:, 0]
y_approx = y + x[:, 1] * np.random.rand(10000)
data = RawData(x, ['x0', 'x1', 'x2'], y, ['f'], repo_info={RepoInfoKey.NAME: 'eval', RepoInfoKey.CATEGORY: MLObjectType.TRAINING_DATA})
data_eval = RawData(y_approx, ['eval'],  repo_info={RepoInfoKey.NAME: 'error', RepoInfoKey.CATEGORY: MLObjectType.EVAL_DATA})

In [12]:
ml_repo.add([data, data_eval])

{'error': 'd1b75694-2d49-11ea-a433-fc084a6691eb',
 'eval': 'd1b75694-2d49-11ea-a433-fc084a6691eb'}

## Using the ml_cache decorator

We define a simple function that computes the pointwise distance between the model function values and the input function values. To show what can be cached, the function returns a string, a float, a numpy array, and the numpy array wrapped in a RawData object.
A print statement shows whether the function body has been executed. We use the ml_cache decorator to avoid re-evaluation when the inputs have not changed.

In [13]:
from pailab.tools.tools import ml_cache

@ml_cache
def eval_error(input_data, eval_data, factor):
    error = input_data.y_data-eval_data.x_data
    error_repo = RawData(error, ['error'], repo_info={RepoInfoKey.NAME: 'error', RepoInfoKey.CATEGORY: MLObjectType.CACHED_VALUE})
    error_mean = error.mean()
    print('Has been evaluated!')
    return 'super', factor*error_mean, y, error_repo, error

We evaluate the function for the first time.

In [14]:
results = eval_error(data, data_eval, 1.0, cache = ml_repo)
print(results)

Has been evaluated!
('super', -0.25087897301168705, array([1.36528749, 1.14104518, 0.91320373, ..., 1.00610814, 1.02412198,
       1.14861662]), <pailab.ml_repo.repo_objects.RawData object at 0x000001E93C796630>, array([[-0.62862863],
       [-0.48248166],
       [-0.46978257],
       ...,
       [-0.02572784],
       [-0.13859842],
       [-0.20757022]]))


If we call the function again with the same arguments, the function is not executed (note the missing 'Has been evaluated!' message); the cached results are returned instead.

In [18]:
results = eval_error(data, data_eval, 1.0, cache = ml_repo)
print(results)

('super', -0.25087897301168705, array([1.36528749, 1.14104518, 0.91320373, ..., 1.00610814, 1.02412198,
       1.14861662]), <pailab.ml_repo.repo_objects.RawData object at 0x000001E93C7B84E0>, array([[-0.62862863],
       [-0.48248166],
       [-0.46978257],
       ...,
       [-0.02572784],
       [-0.13859842],
       [-0.20757022]]))


Now we call the function with a modified argument, which triggers a new function evaluation.

In [20]:
results = eval_error(data, data_eval, 2.0, cache = ml_repo)
print(results)

Has been evaluated!
('super', -0.5017579460233741, array([1.36528749, 1.14104518, 0.91320373, ..., 1.00610814, 1.02412198,
       1.14861662]), <pailab.ml_repo.repo_objects.RawData object at 0x000001E93C7B8B70>, array([[-0.62862863],
       [-0.48248166],
       [-0.46978257],
       ...,
       [-0.02572784],
       [-0.13859842],
       [-0.20757022]]))


Calling the function with the previous argument again returns the cached result without evaluating the function.

In [21]:
results = eval_error(data, data_eval, 1.0, cache = ml_repo)
print(results)

('super', -0.25087897301168705, array([1.36528749, 1.14104518, 0.91320373, ..., 1.00610814, 1.02412198,
       1.14861662]), <pailab.ml_repo.repo_objects.RawData object at 0x000001E93C7B8438>, array([[-0.62862863],
       [-0.48248166],
       [-0.46978257],
       ...,
       [-0.02572784],
       [-0.13859842],
       [-0.20757022]]))
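
The behavior seen in the four calls above (evaluate, reuse, evaluate for a new argument, reuse again) can be illustrated with a minimal sketch of an argument-keyed cache. This is not pailab's implementation of ml_cache: the decorator `simple_cache`, the function `scaled_mean`, and the plain dict `store` standing in for the repository are all hypothetical names, and the key construction via pickle and SHA-256 is an assumption chosen for simplicity.

```python
import functools
import hashlib
import pickle


def simple_cache(func):
    """Hypothetical stand-in for a caching decorator (not pailab's ml_cache)."""

    @functools.wraps(func)
    def wrapper(*args, cache=None, **kwargs):
        if cache is None:
            # Without a cache store, simply evaluate the function.
            return func(*args, **kwargs)
        # Build a key from the function name and a hash of the pickled arguments.
        payload = pickle.dumps((args, sorted(kwargs.items())))
        key = (func.__name__, hashlib.sha256(payload).hexdigest())
        if key not in cache:
            # Unknown arguments: evaluate once and remember the result.
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records every real evaluation


@simple_cache
def scaled_mean(values, factor):
    calls.append(factor)
    return factor * sum(values) / len(values)


store = {}  # plays the role of the repository passed as cache=...
first = scaled_mean([1.0, 2.0, 3.0], 1.0, cache=store)   # evaluated
second = scaled_mean([1.0, 2.0, 3.0], 1.0, cache=store)  # served from store
third = scaled_mean([1.0, 2.0, 3.0], 2.0, cache=store)   # evaluated, new argument
fourth = scaled_mean([1.0, 2.0, 3.0], 1.0, cache=store)  # served from store
```

After these four calls, `calls` contains only the two factors that triggered a real evaluation, mirroring the two 'Has been evaluated!' messages printed above.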


In [22]:
from pailab.tools.tree import MLTree
MLTree.add_tree(ml_repo)

With the tree attached to the repo, we can list the history of the cached eval_error object.


In [23]:
ml_repo.tree.cache.eval_error.history()

{'eval_error': [{'RepoInfoKey.AUTHOR': 'job_runner_user',
   'RepoInfoKey.COMMIT_DATE': '2020-01-02 11:22:46.355949',
   'RepoInfoKey.COMMIT_MESSAGE': '',
   'RepoInfoKey.NAME': 'eval_error'},
  {'RepoInfoKey.AUTHOR': 'job_runner_user',
   'RepoInfoKey.COMMIT_DATE': '2020-01-02 11:23:43.978998',
   'RepoInfoKey.COMMIT_MESSAGE': '',
   'RepoInfoKey.NAME': 'eval_error'}]}
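
To read the history output above programmatically, one could count the stored versions and measure the time between them. The sketch below assumes that `history()` returns a plain dict shaped exactly as printed; the literal `history` is copied from that output rather than fetched from a repository, and the variable names are illustrative.

```python
from datetime import datetime

# Copied from the printed output above (assumption: history() returns this structure).
history = {
    'eval_error': [
        {'RepoInfoKey.AUTHOR': 'job_runner_user',
         'RepoInfoKey.COMMIT_DATE': '2020-01-02 11:22:46.355949',
         'RepoInfoKey.COMMIT_MESSAGE': '',
         'RepoInfoKey.NAME': 'eval_error'},
        {'RepoInfoKey.AUTHOR': 'job_runner_user',
         'RepoInfoKey.COMMIT_DATE': '2020-01-02 11:23:43.978998',
         'RepoInfoKey.COMMIT_MESSAGE': '',
         'RepoInfoKey.NAME': 'eval_error'},
    ]
}

# Parse the commit timestamps of all stored versions.
commit_dates = [
    datetime.strptime(entry['RepoInfoKey.COMMIT_DATE'], '%Y-%m-%d %H:%M:%S.%f')
    for entry in history['eval_error']
]

n_versions = len(commit_dates)
seconds_between = (commit_dates[-1] - commit_dates[0]).total_seconds()
print(n_versions, seconds_between)
```

The two entries presumably correspond to the two real evaluations (factor 1.0 and factor 2.0); the repeated calls served from the cache did not add further versions.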