

# Introduction
<hr style = "border:2px solid black" ></hr>


**What?** Model serialisation with pickle, joblib, skops



# What is model persistence?
<hr style = "border:2px solid black" ></hr>


- After training a scikit-learn model, you usually want to persist it so that it can be reused later without retraining.
- The following sections show three ways to persist a scikit-learn model: `pickle`, `joblib`, and `skops`.



# Imports
<hr style = "border:2px solid black" ></hr>

In [1]:
from sklearn import svm
from sklearn import datasets
import pickle
from joblib import dump, load
import skops.io as sio

# Load dataset and fit a dummy model
<hr style = "border:2px solid black" ></hr>

In [2]:
clf = svm.SVC()
X, y = datasets.load_iris(return_X_y=True)
clf.fit(X, y)

SVC()

# `pickle`
<hr style = "border:2px solid black" ></hr>

In [3]:
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
print("prediction", clf2.predict(X[0:1]))
print("target: ", y[0])

prediction [0]
target:  0
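
The round trip above stays in memory. A minimal sketch of the same idea with a file on disk follows, assuming a temporary directory and a hypothetical file name `model.pkl` (neither is part of the original notebook):

```python
import pickle
import tempfile
from pathlib import Path

from sklearn import datasets, svm

# Fit the same kind of toy model used in this notebook.
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# "model.pkl" is a hypothetical file name chosen for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"

    # Write the pickled estimator to disk in binary mode.
    with open(model_path, "wb") as f:
        pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Read it back; only do this with files you created yourself.
    with open(model_path, "rb") as f:
        clf_from_file = pickle.load(f)

# The reloaded model should predict exactly like the original one.
same_predictions = bool((clf_from_file.predict(X) == clf.predict(X)).all())
print("identical predictions:", same_predictions)
```

Comparing predictions on the whole dataset, rather than on a single row, is a slightly stronger sanity check that the reloaded estimator behaves like the original.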


# `joblib`
<hr style = "border:2px solid black" ></hr>



- For scikit-learn models specifically, it may be better to use joblib's replacement for pickle (`dump` and `load`).
- It is more efficient on objects that carry large NumPy arrays internally, as is often the case for fitted scikit-learn estimators.
- Unlike `pickle.dumps`, it writes to a file on disk rather than returning a string.



In [4]:
dump(clf, './filename.joblib')

['./filename.joblib']

In [5]:
!ls *.joblib

filename.joblib


In [6]:
clf3 = load('filename.joblib') 

In [7]:
print("prediction", clf3.predict(X[0:1]))
print("target: ", y[0])

prediction [0]
target:  0
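
Because joblib is aimed at objects holding large NumPy arrays, its `compress` argument can be useful. The sketch below uses a plain dictionary as a stand-in for a fitted estimator; the array contents and file names are assumptions made for illustration only:

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load

# Stand-in for a fitted estimator: an object holding a large NumPy array.
payload = {"weights": np.zeros((1000, 100)), "name": "demo"}

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = Path(tmp_dir) / "payload_raw.joblib"
    small_path = Path(tmp_dir) / "payload_small.joblib"

    # Default: no compression.
    dump(payload, str(raw_path))
    # compress=3 trades a little CPU time for a smaller file.
    dump(payload, str(small_path), compress=3)

    raw_size = raw_path.stat().st_size
    compressed_size = small_path.stat().st_size

    # Loading works the same way for both files.
    restored = load(str(small_path))

arrays_match = bool(np.array_equal(restored["weights"], payload["weights"]))
print("raw bytes:", raw_size, "compressed bytes:", compressed_size)
print("arrays match:", arrays_match)
```

An all-zeros array compresses unusually well, so the size difference here is more dramatic than it would be for real model weights.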


# `skops`
<hr style = "border:2px solid black" ></hr>


- `skops` provides a more secure format via the `skops.io` module.
- It avoids using pickle and only loads files whose types and function references are trusted, either by default or by the user.



In [8]:
obj = sio.dumps(clf)

In [11]:
# get_untrusted_types only accepts keyword arguments, so pass the bytes as data=
unknown_types = sio.get_untrusted_types(data=obj)
clf = sio.loads(obj, trusted=unknown_types)

In [12]:
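# Trust every type in the payload; only do this for data you produced yourself.
# Recent skops releases may reject trusted=True and expect an explicit list of type names instead.
clf = sio.loads(obj, trusted=True)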

In [13]:
print("prediction", clf.predict(X[0:1]))
print("target: ", y[0])

prediction [0]
target:  0


# Clean-up
<hr style = "border:2px solid black" ></hr>

In [None]:
!rm *.joblib

In [None]:
!ls *.joblib

# Conclusions
<hr style = "border:2px solid black" ></hr>


- `pickle` (and `joblib` by extension) has some issues regarding maintainability and security. Because of this:
    - Never unpickle untrusted data, as it could lead to malicious code being executed upon loading.
    - While models saved using one version of scikit-learn might load in other versions, this is entirely unsupported and inadvisable. Operations performed on such data could also give different and unexpected results.
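
To make the security warning concrete, here is a small, harmless sketch using only the standard library. The class name `NotReallyAModel` is hypothetical; it stands in for a tampered file that someone presents as a saved model:

```python
import pickle


class NotReallyAModel:
    """Hypothetical stand-in for a tampered 'model' file."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A real attacker would put something far less friendly here.
        return (eval, ("6 * 7",))


payload = pickle.dumps(NotReallyAModel())

# Loading runs the embedded call; no model object comes back at all.
result = pickle.loads(payload)
print(type(result).__name__, result)  # prints: int 42
```

The expression is evaluated inside `pickle.loads` itself, before the caller has any chance to inspect what was loaded. This is why checking the object after loading is not a defence.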



# References
<hr style = "border:2px solid black" ></hr>


- [Model persistence in scikit-learn](https://scikit-learn.org/stable/model_persistence.html)

