# Loading and Saving - Python / Theano Tutorial

Following

http://deeplearning.net/software/theano/tutorial/loading_and_saving.html

Python's standard way of saving class instances is `pickle`. Many Theano objects can be serialized and deserialized with it. One limitation, though, is that `pickle` does not save the code or data of a class along with the instance being serialized, so reloading objects created by an earlier version of a class can be problematic.

## The Basics of Pickling

There are two modules, `pickle` and `cPickle`. `cPickle` offers much the same functionality but is implemented in C and is much faster.

In [1]:
import cPickle

In [2]:
class myobj(object):
    def __init__(self, value):
        self.myval = value

In [3]:
myo = myobj(43)

In [4]:
myo.myval

43

In [5]:
f = file('obj.save', 'wb')
cPickle.dump(myo, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()

**NOTE**: always use the `protocol=cPickle.HIGHEST_PROTOCOL` option: the resulting file can be dozens of times smaller than with the default protocol. Also, opening the file in binary mode is required for portability.
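
As a rough illustration of the size difference, the sketch below pickles the same list with protocol 0 (the text protocol, which is the default on Python 2) and with the highest protocol available. The `data` list and the variable names are made up for this example, and the import fallback is only there so the snippet also runs on Python 3, where `cPickle` no longer exists as a separate module. The exact ratio depends on the data, so no particular number should be expected.

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically

# Hypothetical payload: a list of floats standing in for model parameters.
data = [float(i) / 3 for i in range(10000)]

# Protocol 0 writes each float as text; the binary protocols pack it in 8 bytes.
text_size = len(pickle.dumps(data, protocol=0))
binary_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

print('protocol 0: %d bytes' % text_size)
print('highest protocol: %d bytes' % binary_size)
```

On Python 3 the default is already a binary protocol, so the gain over the default is smaller there than on Python 2.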

To load:

In [6]:
f = file('obj.save', 'rb')
loaded_obj = cPickle.load(f)
f.close()
loaded_obj.myval

43

You may pickle several objects into the same file and then load them all (in the same order):

In [7]:
myo2 = myobj(42)
myo3 = myobj(41)
f = file('objects.save', 'wb')
for obj in [myo, myo2, myo3]:
    cPickle.dump(obj, f, protocol=cPickle.HIGHEST_PROTOCOL)
    
f.close()

Then:

In [8]:
f = file('objects.save', 'rb')
loaded_objects = []
for i in range(3):
    loaded_objects.append(cPickle.load(f))
    
f.close()

for obj in loaded_objects:
    print obj.myval

43
42
41
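
The loop above hard-codes the number of objects to read. When the count is not known in advance, one possible approach is to keep calling `load` until it raises `EOFError`, which is what `pickle` does at the end of the file. This is only a sketch: `load_all` is a helper name invented here, plain integers stand in for the `myobj` instances, and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def load_all(path):
    """Return every object pickled into `path`, in the order it was written."""
    objects = []
    with open(path, 'rb') as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:
                # No more pickles in the file.
                break
    return objects


# Write three objects back to back into one file.
path = os.path.join(tempfile.mkdtemp(), 'objects.save')
with open(path, 'wb') as f:
    for value in [43, 42, 41]:
        pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)

values = load_all(path)
print(values)
```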


## Short-Term Serialization

If we are confident that the class instance we're serializing will be deserialized by a compatible version of the code, then pickling the whole model is an adequate solution. This is the case, for example, when you save models and reload them during the same execution of a program, or when the class is really stable.

We can control what `pickle` will save with the `__getstate__` and `__setstate__` methods. This is especially useful if the model contains a link to a dataset that you don't want to pickle along with every instance of the model.

We define methods along the lines of:

    def __getstate__(self):
        state = dict(self.__dict__)
        del state['training_set']
        return state

    def __setstate__(self, d):
        self.__dict__.update(d)
        # Reload the dataset from disk, closing the file when done.
        with open(self.training_set_file, 'rb') as f:
            self.training_set = cPickle.load(f)
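
To see the two methods working together, here is a minimal self-contained sketch. The `Model` class, its `weights` attribute, the `_load_training_set` helper, and the `train.pkl` file are all invented for illustration; only the bodies of `__getstate__` and `__setstate__` follow the pattern shown above.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


class Model(object):
    """Hypothetical model that keeps a reference to its training set."""

    def __init__(self, training_set_file):
        self.training_set_file = training_set_file
        self.weights = [0.1, 0.2]
        self.training_set = self._load_training_set()

    def _load_training_set(self):
        with open(self.training_set_file, 'rb') as f:
            return pickle.load(f)

    def __getstate__(self):
        # Copy the instance dict and drop the dataset before pickling.
        state = dict(self.__dict__)
        del state['training_set']
        return state

    def __setstate__(self, d):
        # Restore the saved attributes, then reload the dataset from disk.
        self.__dict__.update(d)
        self.training_set = self._load_training_set()


# Create a stand-in dataset file.
data_path = os.path.join(tempfile.mkdtemp(), 'train.pkl')
with open(data_path, 'wb') as f:
    pickle.dump(list(range(1000)), f, protocol=pickle.HIGHEST_PROTOCOL)

model = Model(data_path)
blob = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
print(len(restored.training_set))
```

The pickled `blob` holds only the file name and the weights. The dataset is read again from `training_set_file` at load time, so that file must still exist wherever the model is unpickled.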

## Robust Serialization

This sort of serialization uses some Theano-specific functions. It serializes objects using Python's pickling protocol, but any `ndarray` or `CudaNdarray` objects contained within the object are saved as separate NPY files. These NPY files and the pickled file are saved together in a single ZIP file.

The main advantage here is that we don't need Theano installed to look at the values of the shared variables that we pickled. We can load them manually with NumPy:

    import numpy
    numpy.load('model.zip')
    
This is a good serialization method when sharing a model with people who might not have Theano installed or might be using a different Python version, etc.

See

http://deeplearning.net/software/theano/library/misc/pkl_utils.html#theano.misc.pkl_utils.dump

and

http://deeplearning.net/software/theano/library/misc/pkl_utils.html#theano.misc.pkl_utils.load

## Long-Term Serialization

If the implementation of the class we want to save is very unstable, we should save and load only the immutable and required parts of the class. In this case, we want to define `__getstate__` and `__setstate__` in terms of what we want to save rather than what we want to exclude:

    def __getstate__(self):
        return (self.W, self.b)

    def __setstate__(self, state):
        W, b = state
        self.W = W
        self.b = b

If we later rename `W` and `b` to `weights` and `bias`, older pickle files remain usable as long as we update the functions to reflect the new names:

    def __getstate__(self):
        return (self.weights, self.bias)

    def __setstate__(self, state):
        W, b = state
        self.weights = W
        self.bias = b
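
The rename scenario can be simulated in a single script by pickling an instance of the first version of a class and then redefining the class under the same name before loading. This is a sketch under that assumption: the `Layer` class and its constructor arguments are hypothetical, and in practice the two versions would live in different revisions of the same module rather than in one file.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


class Layer(object):
    """First version: the attributes are called W and b."""

    def __init__(self, W, b):
        self.W = W
        self.b = b

    def __getstate__(self):
        return (self.W, self.b)

    def __setstate__(self, state):
        W, b = state
        self.W = W
        self.b = b


# Pickle produced by the "old" code.
old_pickle = pickle.dumps(Layer([1.0, 2.0], 0.5),
                          protocol=pickle.HIGHEST_PROTOCOL)


class Layer(object):  # same name, simulating an updated class definition
    """Second version: the attributes were renamed to weights and bias."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __getstate__(self):
        return (self.weights, self.bias)

    def __setstate__(self, state):
        # The saved tuple has the same layout as before; only names changed.
        W, b = state
        self.weights = W
        self.bias = b


restored = pickle.loads(old_pickle)
print(restored.weights)
print(restored.bias)
```

Because the pickle stores only the `(W, b)` tuple and not the attribute names, the old file loads into the new class without any conversion step.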