## How to save a trained classifier or a feature model?

Once you have trained a classifier (your final predictive model), it is common practice to pickle it, so that you don't have to go back and re-train it every time you want to use it.

"To pickle" an object means to serialize it: the Python object is converted into a byte stream that can be saved to disk and loaded again later, possibly somewhere else. The object can be almost any Python object; in this case it will be an estimator.
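As a minimal sketch of what serialization means, the snippet below round-trips a plain dictionary through `pickle` in memory. The `settings` dictionary is a made-up stand-in for any picklable object and is not part of the original example.

```python
import pickle

# Hypothetical object used only for illustration; any picklable object works.
settings = {"kernel": "rbf", "C": 1.0, "classes": [0, 1, 2]}

# Serialize to a bytes object (pickle.dump writes the same bytes to a file).
payload = pickle.dumps(settings)

# Deserialize the bytes into a new object with the same contents.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == settings)    # True: same contents
print(restored is settings)    # False: it is a separate copy
```

An estimator goes through exactly the same steps; only the object being serialized changes.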

In [1]:
from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)  

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [2]:
import pickle
# The "pickle/" directory must already exist.
# The with-block closes the file even if dump() raises an error.
with open("pickle/test_classifier.pkl", "wb") as pickle_out:
    pickle.dump(clf, pickle_out)

The classifier can then be loaded back like this:

In [3]:
# Load the object back from the pickle file
with open("pickle/test_classifier.pkl", "rb") as pickle_in:
    pickled_classifier = pickle.load(pickle_in)

pickled_classifier

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
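One way to check that the round trip preserved the fitted model is to compare predictions before and after loading. The following is a sketch under a few assumptions: it writes to a temporary directory rather than the `pickle/` folder used above, and the names `restored` and `same_predictions` are introduced here for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import datasets, svm

# Re-create the classifier from the cells above so the sketch is self-contained.
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Use a temporary directory so nothing depends on an existing "pickle/" folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_classifier.pkl")
    with open(path, "wb") as f:
        pickle.dump(clf, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored estimator should predict exactly what the original does.
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

Keep in mind that a pickle should be loaded with the same scikit-learn version that produced it, and only from a source you trust, since unpickling can execute arbitrary code.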

**Something similar can be done using joblib**

joblib is more efficient than pickle, especially for objects that carry large NumPy arrays internally.

In [4]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib
joblib.dump(clf, 'pickle/test_joblib_classifier.pkl') 

['pickle/test_joblib_classifier.pkl',
 'pickle/test_joblib_classifier.pkl_01.npy',
 'pickle/test_joblib_classifier.pkl_02.npy',
 'pickle/test_joblib_classifier.pkl_03.npy',
 'pickle/test_joblib_classifier.pkl_04.npy',
 'pickle/test_joblib_classifier.pkl_05.npy',
 'pickle/test_joblib_classifier.pkl_06.npy',
 'pickle/test_joblib_classifier.pkl_07.npy',
 'pickle/test_joblib_classifier.pkl_08.npy',
 'pickle/test_joblib_classifier.pkl_09.npy',
 'pickle/test_joblib_classifier.pkl_10.npy',
 'pickle/test_joblib_classifier.pkl_11.npy']

In [5]:
clf = joblib.load('pickle/test_joblib_classifier.pkl') 
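If a single output file is preferable to the many `.npy` files listed above, `joblib.dump` accepts a `compress` argument. The sketch below assumes a standalone `joblib` installation; the compression level of 3, the temporary directory, and the `clf.joblib` file name are choices made for this illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm

# Re-create the classifier so the sketch is self-contained.
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clf.joblib")
    # compress takes a level from 0 to 9; with compression, everything
    # is stored in one file.
    written = joblib.dump(clf, path, compress=3)
    restored = joblib.load(path)

print(len(written))  # number of files written
print(np.array_equal(clf.predict(X), restored.predict(X)))
```

Compression trades some save and load time for a smaller file, so it is worth measuring on your own models.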

## Pickle a Pipeline

Pickling a pipeline works in much the same way:

In [17]:
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA

pipe = Pipeline([('pca', PCA()),
                 ('svc', svm.SVC(C=10))])
pipe.fit(X, y)

Pipeline(steps=[('pca', PCA(copy=True, n_components=None, whiten=False)), ('svc', SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])

In [22]:
joblib.dump(pipe, 'pickle/pipe.pkl') 

['pickle/pipe.pkl',
 'pickle/pipe.pkl_01.npy',
 'pickle/pipe.pkl_02.npy',
 'pickle/pipe.pkl_03.npy',
 'pickle/pipe.pkl_04.npy',
 'pickle/pipe.pkl_05.npy',
 'pickle/pipe.pkl_06.npy',
 'pickle/pipe.pkl_07.npy',
 'pickle/pipe.pkl_08.npy',
 'pickle/pipe.pkl_09.npy',
 'pickle/pipe.pkl_10.npy',
 'pickle/pipe.pkl_11.npy',
 'pickle/pipe.pkl_12.npy',
 'pickle/pipe.pkl_13.npy',
 'pickle/pipe.pkl_14.npy',
 'pickle/pipe.pkl_15.npy']

In [26]:
pipe_pickled = joblib.load('pickle/pipe.pkl') 

In [27]:
print(pipe_pickled.steps[0][1])
print(pipe_pickled.steps[1][1])

PCA(copy=True, n_components=None, whiten=False)
SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
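The reloaded pipeline can be used directly for prediction, and its steps can also be looked up by name through `named_steps` instead of by position. The following is a sketch of that check; the temporary directory and the variable names `pipe_restored`, `step_names`, `restored_C` and `same_predictions` are introduced here for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

# Re-create the pipeline from the cells above so the sketch is self-contained.
X, y = datasets.load_iris(return_X_y=True)
pipe = Pipeline([("pca", PCA()), ("svc", svm.SVC(C=10))]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipe.pkl")
    joblib.dump(pipe, path)
    pipe_restored = joblib.load(path)

# Steps keep their names and hyperparameters after the round trip.
step_names = [name for name, _ in pipe_restored.steps]
restored_C = pipe_restored.named_steps["svc"].C
print(step_names, restored_C)

# The whole pipeline (PCA transform followed by SVC) predicts as before.
same_predictions = np.array_equal(pipe.predict(X), pipe_restored.predict(X))
print(same_predictions)
```

Saving the whole pipeline rather than only the final classifier keeps the fitted preprocessing together with the model, so new data goes through the same PCA transform before it is classified.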
