# Store and load `skopt` optimization results

Mikhail Pak, October 2016.

In [1]:
import numpy as np
np.random.seed(777)

## Problem statement

We often want to store optimization results in a file. This can be useful, for example,

* if you want to share your results with colleagues;
* if you want to archive and/or document your work;
* or if you want to postprocess your results in a different Python instance or on another computer.

The process of converting an object into a byte stream that can be stored in a file is called _serialization_.
Conversely, _deserialization_ means loading an object from a byte stream.
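
As a minimal illustration of these two terms, the sketch below round-trips a plain dictionary through an in-memory byte string using the standard `pickle` module. The dictionary `record` and its contents are made up for illustration and are not part of `skopt`.

```python
import pickle

# Hypothetical stand-in for an object you want to store
record = {"x": [0.5], "fun": -0.25, "n_calls": 15}

# Serialization: object -> byte stream
payload = pickle.dumps(record)

# Deserialization: byte stream -> a new object with equal contents
restored = pickle.loads(payload)

print(type(payload).__name__)   # bytes
print(restored == record)       # True
print(restored is record)       # False
```

The restored object is equal to the original but is a separate object in memory.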

**Warning:** Deserialization is not secure against malicious or erroneous code. Never load serialized data from untrusted or unauthenticated sources!
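
One way to make "authenticated" concrete is to ship a message authentication code alongside the serialized bytes and check it before unpickling. The sketch below assumes that sender and receiver share a secret key; `SECRET_KEY`, `sign` and `load_if_authentic` are hypothetical names chosen for this example. The check only detects tampering in transit or at rest. It does not make it safe to unpickle data whose author you do not trust.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice, keep it out of source control
SECRET_KEY = b"replace-with-a-real-secret"


def sign(payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the serialized payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def load_if_authentic(payload: bytes, tag: bytes):
    """Unpickle only if the tag matches; otherwise raise ValueError."""
    if not hmac.compare_digest(sign(payload), tag):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


payload = pickle.dumps({"fun": -0.25})
tag = sign(payload)

print(load_if_authentic(payload, tag))   # {'fun': -0.25}

try:
    # A modified payload no longer matches the tag and is rejected
    load_if_authentic(payload + b"tampered", tag)
except ValueError as err:
    print(err)
```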

## Simple example

We will use the same optimization problem as in the [`bayesian-optimization.ipynb`](https://github.com/scikit-optimize/scikit-optimize/blob/master/examples/bayesian-optimization.ipynb) notebook:

In [2]:
from skopt import gp_minimize

noise_level = 0.1

def obj_fun(x, noise_level=noise_level):
    return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() * noise_level

res = gp_minimize(obj_fun,            # the function to minimize
                  [(-2.0, 2.0)],      # the bounds on each dimension of x
                  x0=[0.],            # the starting point
                  acq_func="LCB",     # the acquisition function (optional)
                  n_calls=15,         # the number of evaluations of f including at x0
                  n_random_starts=0,  # the number of random initialization points
                  random_state=777)

As long as your Python session is active, you can access all the optimization results via the `res` object.

So how can you store this data in a file? We will present the two most common alternatives:

* using the [`pickle`](https://docs.python.org/3/library/pickle.html) module;
* using the [`joblib`](https://pythonhosted.org/joblib/) module.

Of course, there are also other, less frequently used modules for data serialization, e.g. [`dill`](http://trac.mystic.cacr.caltech.edu/project/pathos/wiki/dill/User_Guide.html).

## Using `pickle`

`pickle` is the most widespread data serialization library. In fact, it is a part of the Python standard library, i.e. you do not have to install any additional libraries.

Let's try to pickle our result to a file named `result_pickle.pkl`:

In [3]:
import pickle

with open('result_pickle.pkl', 'wb') as f:
    pickle.dump(res, f)

Now we can load this file using `pickle.load()`:

In [4]:
with open('result_pickle.pkl', 'rb') as f:
    res_pickle_loaded = pickle.load(f)

res_pickle_loaded.fun

-0.17487957729512074

## Using `joblib`

`joblib` is an external library for lightweight pipelining in Python. It also conveniently provides the `joblib.dump()` and `joblib.load()` functions for serialization and deserialization, which are optimized for large NumPy arrays.

If you have not used it before, install joblib by typing `pip install joblib` in your terminal. The usage is very similar to the `pickle` alternative:

In [5]:
import joblib

with open('result_joblib.pkl', 'wb') as f:
    joblib.dump(res, f)

In fact, you can skip opening the file manually and specify the filename as the second argument:

In [6]:
joblib.dump(res, 'result_joblib.pkl');

Use `joblib.load()` to load from file:

In [7]:
res_loaded_joblib = joblib.load('result_joblib.pkl')

res_loaded_joblib.fun

-0.17487957729512074

## What should I use?

The main advantage of `pickle` is that it is part of the standard library. Hence, stored results can be loaded virtually everywhere, provided the same Python version is used. On the other hand, `joblib` is better suited to large data; you should consider it if you have very large objects, e.g. if the number of function evaluations is very large.
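
If file size is the main concern, compression can help with either library. The sketch below uses only the standard library (`gzip` plus `pickle`) on a made-up list of numbers standing in for a large result; `joblib.dump()` also accepts a `compress` argument that serves a similar purpose. The data and variable names are illustrative assumptions.

```python
import gzip
import pickle

# Hypothetical stand-in for a large result: many repeated function values
values = [0.0] * 10_000

raw = pickle.dumps(values)
compressed = gzip.compress(raw)

# Decompress and deserialize to get the original data back
restored = pickle.loads(gzip.decompress(compressed))

print(len(raw), len(compressed))   # compressed size is much smaller here
print(restored == values)          # True
```

How much compression helps depends on the data: highly repetitive values like these shrink far more than noisy measurements would.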

## Possible problems

* __Python version incompatibility:__ In general, objects serialized in Python 2 cannot be deserialized in Python 3 and vice versa. This is true for both `pickle` and `joblib`.
* __Security issues:__ Once again, do not load any files from untrusted sources.
* __Unable to serialize the objective function:__ This happens if your objective function is non-trivial (e.g. you call MATLAB from Python) and cannot be serialized using either of the libraries. In this case, a possible workaround is to create a deep copy of the results object and to delete the function before serializing it: 

In [8]:
from copy import deepcopy

res_no_fun = deepcopy(res)
del res_no_fun.specs['args']['func']
joblib.dump(res_no_fun, 'result_joblib_no_fun.pkl')

res_loaded_no_fun = joblib.load('result_joblib_no_fun.pkl')

Notice that the objective function is missing in the loaded object:

In [9]:
res_loaded_no_fun.specs['args'].keys()

dict_keys(['base_estimator', 'xi', 'kappa', 'dimensions', 'n_restarts_optimizer', 'x0', 'n_random_starts', 'callback', 'n_points', 'acq_optimizer', 'n_calls', 'verbose', 'acq_func', 'y0', 'random_state'])

You can also directly delete the `.specs['args']['func']` field in the original result if you are sure you will not need it later.

Be careful: if you create a shallow copy (e.g. with `copy.copy()`) of the results object and then delete `.specs['args']['func']` in the copy, this field will be deleted in the original object as well, because both objects share the same nested `specs` dictionary.
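
The difference between the two kinds of copy can be seen on a small stand-in object. The sketch below uses a plain nested dictionary named `result` with a lambda as its objective function; both are assumptions made for illustration and do not reproduce the real `skopt` result class.

```python
import copy
import pickle

# Hypothetical stand-in for a result object: a dict with a nested `specs`
# entry holding an objective function that cannot be pickled (a lambda)
result = {"fun": -0.25,
          "specs": {"args": {"func": lambda x: x[0] ** 2, "n_calls": 15}}}

try:
    pickle.dumps(result)
except (pickle.PicklingError, AttributeError) as err:
    print("cannot pickle:", type(err).__name__)

# Deep copy: nested dictionaries are duplicated, the original is untouched
deep = copy.deepcopy(result)
del deep["specs"]["args"]["func"]
print("func" in result["specs"]["args"])      # True
restored = pickle.loads(pickle.dumps(deep))   # pickling now succeeds

# Shallow copy: only the outer dict is new, the nested `specs` is shared
shallow = copy.copy(result)
del shallow["specs"]["args"]["func"]
print("func" in result["specs"]["args"])      # False
```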