# Saving models

In [1]:
import dill
import pickle

import numpy as np

from sklearn.preprocessing import FunctionTransformer

## pickle

Pickle is the simplest way to save Python objects, including sklearn models and transformers.

### Basic

Here is a simple sklearn transformer that returns an array with one row per input observation, where every row contains the same fixed string.

In [2]:
def test(X):
    return np.array([["im from pickle"]]*X.shape[0])

transformer_obj = FunctionTransformer(test)
transformer_obj.fit_transform(np.array([[2],[3],[4]]))

array([['im from pickle'],
       ['im from pickle'],
       ['im from pickle']], dtype='<U14')

Here is one way to save it using pickle. After saving, the object is deleted from Python memory.

In [3]:
with open("saving_models_files/test_transformer.pkl", "wb") as f:
    pickle.dump(transformer_obj, f)
del transformer_obj

Now let's load the transformer from the file - all goes well.

In [4]:
with open("saving_models_files/test_transformer.pkl", "rb") as f:
    loaded_transformer = pickle.load(f)
loaded_transformer.transform(np.array([[1],[2]]))

array([['im from pickle'],
       ['im from pickle']], dtype='<U14')

### Troubles with functions

When using pickle to save models, there is one nuance: pickle stores a function by reference (its module and name), not its code, so the functions you use in your pipeline must be available wherever you deploy it.
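To see what "available" means here, the following minimal sketch uses only the standard library. It shows that pickle records a function as a module name plus a function name and looks that name up again on load. `json.dumps` is used purely as a convenient importable function; it is an assumption of this illustration and not part of the original example.

```python
import json
import pickle

# Pickle a function that lives in an importable module.
payload = pickle.dumps(json.dumps)

# The payload holds the module and function names, not the function body.
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Loading resolves the name again and returns the very same function object.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(has_module_name, has_function_name, same_object, len(payload))
```

If the name cannot be resolved at load time, there is nothing in the payload to rebuild the function from.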

Here the transformer is created and saved as in the previous section, but after saving, both the transformer and the function it uses are deleted.

In [5]:
def test(X):
    return np.array([["im from pickle"]]*X.shape[0])

transformer_obj = FunctionTransformer(test)
transformer_obj.fit_transform(np.array([[2],[3],[4]]))

with open("saving_models_files/test_transformer.pkl", "wb") as f:
    pickle.dump(transformer_obj, f)
del transformer_obj, test

Now let's try to load the transformer from the file. We get an error saying that the requested function cannot be found.

In [6]:
try:
    with open(
        "saving_models_files/test_transformer.pkl", "rb"
    ) as f:
        loaded_transformer = pickle.load(f)
except Exception as e:
    print("Got exception:", e)

Got exception: Can't get attribute 'test' on <module '__main__'>
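One way to avoid this error with plain pickle is to keep the functions in an importable module rather than in the notebook itself. The sketch below uses only the standard library and rests on several assumptions: the module name `my_transforms` and the function `label_rows` are hypothetical, the module is written to a temporary directory purely for illustration, and a bare function stands in for the transformer.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical helper module, created on the fly for this illustration.
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_transforms.py").write_text(
    "def label_rows(n_rows):\n"
    "    return ['im from pickle'] * n_rows\n"
)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
my_transforms = importlib.import_module("my_transforms")

payload = pickle.dumps(my_transforms.label_rows)

# Simulate a fresh process that has not imported the module yet.
del my_transforms
del sys.modules["my_transforms"]

# pickle imports my_transforms by name, so loading succeeds.
restored = pickle.loads(payload)
result = restored(2)

# Without the module on the import path, the same payload cannot be loaded.
sys.path.remove(module_dir)
del sys.modules["my_transforms"]
error_message = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as e:
    error_message = str(e)

print(result)
print("Got exception:", error_message)
```

The same idea applies to a deployed pipeline: the module that defines the functions has to be installed or shipped alongside the pickle file.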


## dill

In the [troubles with functions](#pickle) section, I mentioned that pickle doesn't store the functions used in sklearn constructions, only references to them. Using dill can help solve this problem, because it can serialize a function defined in the current session together with its code. Let's try the same example using the `dill` module instead of `pickle`.

The following cell creates a sklearn transformer with specific behaviour:

In [15]:
def test(X):
    return np.array([["im from dill"]]*X.shape[0])

transformer_obj = FunctionTransformer(test)
transformer_obj.fit_transform(np.array([[2],[3],[4]]))

array([['im from dill'],
       ['im from dill'],
       ['im from dill']], dtype='<U12')

Now let's save it and immediately delete both the transformer and the function it uses.

In [16]:
with open("saving_models_files/test_transformer.pkl", "wb") as f:
    dill.dump(transformer_obj, f)
del transformer_obj, test

After loading it with dill, it still keeps its behaviour:

In [18]:
with open("saving_models_files/test_transformer.pkl", "rb") as f:
    transformer_loaded = dill.load(f)
transformer_loaded.transform(np.array([[2],[3]]))

array([['im from dill'],
       ['im from dill']], dtype='<U12')
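The same effect can be seen without sklearn or files. The sketch below is an illustration under stated assumptions: `make_labeler` and `label_rows` are hypothetical names, and `recurse=True` is an optional dill setting chosen here so that only the globals the function actually references are stored. It round-trips a locally defined function through `dill.dumps` and `dill.loads` after deleting the originals.

```python
import dill


def make_labeler(label):
    # label_rows is local to make_labeler, so it has no importable name
    # that a by-reference pickle could point to.
    def label_rows(n_rows):
        return [label] * n_rows

    return label_rows


labeler = make_labeler("im from dill")

# dill stores the function's code and its closure in the payload.
payload = dill.dumps(labeler, recurse=True)
del labeler, make_labeler

# The function is rebuilt from the payload alone.
restored = dill.loads(payload)
result = restored(2)
print(result)
```

Because the code travels with the payload, the loading environment still needs dill itself and any libraries the function calls, such as numpy in the transformer above.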