<a href="https://colab.research.google.com/github/marcelounb/ML-Mastery-with-Python-Course/blob/master/chap17_Save_and_Load_ML_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finalize Your Model with pickle


Pickle is the standard way of serializing objects in Python. You can use the pickle module to serialize a trained machine learning model and save the serialized format to a file. Later you can load this file to deserialize your model and use it to make new predictions.

In [0]:
from pandas import read_csv 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
from pickle import dump 
from pickle import load

In [0]:
# load data 
filename = '/content/diabetes_moddd.csv' 
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
dataframe = read_csv(filename, names=names) 
array = dataframe.values 
X = array[:,0:8] 
Y = array[:,8]

In [4]:
# Fit the model on 67% of the data, holding out 33% for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=7) 
model = LogisticRegression(max_iter=200) 
model.fit(X_train, Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=200,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [10]:
model.score(X_train, Y_train)

0.7762645914396887

In [0]:
# save the model to disk (the with block closes the file handle afterwards)
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    dump(model, f)

In [12]:
# some time later...

# load the model from disk (the with block closes the file handle afterwards)
with open(filename, 'rb') as f:
    loaded_model = load(f)
result = loaded_model.score(X_test, Y_test)
result

0.7874015748031497
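A quick way to convince yourself that nothing is lost in serialization is to compare predictions before and after a round trip. The sketch below is illustrative only: it uses synthetic data from `make_classification` as a stand-in for the diabetes CSV, and the names `X_demo`, `y_demo`, `demo_model` and `restored_model` are made up for this example. It serializes to bytes in memory with `pickle.dumps` and `pickle.loads` rather than going through a file.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the diabetes data (assumption: 8 numeric features).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)

demo_model = LogisticRegression(max_iter=200).fit(X_demo, y_demo)

# Serialize to bytes in memory and restore, instead of writing a file.
restored_model = pickle.loads(pickle.dumps(demo_model))

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(
    demo_model.predict(X_demo), restored_model.predict(X_demo)
)
print(same_predictions)
```

The same comparison works with a model loaded from a file. It is a cheap sanity check to run right after saving.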

# Finalize Your Model with Joblib


The Joblib library is part of the SciPy ecosystem and provides utilities for pipelining Python jobs. It also provides utilities for efficiently saving and loading Python objects that make use of NumPy data structures. This can be useful for machine learning algorithms that have a lot of parameters or store the entire dataset (e.g. k-Nearest Neighbors). The example below demonstrates how you can train a logistic regression model on the Pima Indians onset of diabetes dataset, save the model to file using Joblib and load it to make predictions on the unseen test set.


In [0]:
# Save Model Using joblib 
from pandas import read_csv 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
from joblib import dump as dump2
from joblib import load as load2

In [13]:
# X and Y are reused from the pickle example above, where they were loaded like this:
# filename = '/content/diabetes_moddd.csv'
# names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# dataframe = read_csv(filename, names=names)
# array = dataframe.values
# X = array[:,0:8]
# Y = array[:,8]

In [14]:
# Fit the model on 67% of the data, holding out 33% for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=7) 
model = LogisticRegression(max_iter=200) 
model.fit(X_train, Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=200,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [0]:
# save the model to disk (joblib accepts a file name directly)
filename = 'finalized_model2.sav'
dump2(model, filename)

In [19]:
# some time later...


# load the model from disk (joblib accepts a file name directly)
loaded_model = load2(filename)
result = loaded_model.score(X_test, Y_test)
result

0.7874015748031497
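Joblib is most useful for models that carry large NumPy arrays, such as k-Nearest Neighbors, which keeps the training data. The sketch below is an illustration under assumptions: it uses synthetic data instead of the diabetes CSV, a temporary directory instead of a real model path, and made-up names (`knn`, `restored_knn`, `knn_demo.joblib`). It passes the `compress` argument of `joblib.dump` to shrink the file on disk.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; k-NN stores these arrays inside the fitted model.
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=7)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "knn_demo.joblib")  # hypothetical file name
    dump(knn, path, compress=3)  # compress takes a level from 0 to 9
    file_size = os.path.getsize(path)
    restored_knn = load(path)

# The restored model should give the same predictions as the original.
knn_matches = np.array_equal(knn.predict(X_demo), restored_knn.predict(X_demo))
print(file_size, knn_matches)
```

Higher compression levels trade slower saving and loading for smaller files, so a middle value is a reasonable starting point.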

# Tips for Finalizing Your Model

**Python Version**. Take note of the Python version. You almost certainly require the same major (and maybe minor) version of Python used to serialize the model when you later load it and deserialize it.


**Library Versions**. The versions of all major libraries used in your machine learning project almost certainly need to be the same when deserializing a saved model. This applies to NumPy and scikit-learn, but is not limited to them.
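One way to act on the two version tips is to record the environment next to the saved model and compare it at load time. The sketch below is one possible approach, not an established convention: the helpers `environment_snapshot` and `find_mismatches` and the default package list are hypothetical names chosen for this example. It relies on `importlib.metadata`, which is available from Python 3.8.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages=("numpy", "scikit-learn", "joblib")):
    """Return the Python version and the installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def find_mismatches(saved, current):
    """List readable differences between a saved snapshot and the current one."""
    problems = []
    if saved["python"] != current["python"]:
        problems.append(f"python: saved {saved['python']}, current {current['python']}")
    for name, saved_version in saved["packages"].items():
        current_version = current["packages"].get(name)
        if saved_version != current_version:
            problems.append(f"{name}: saved {saved_version}, current {current_version}")
    return problems


snapshot = environment_snapshot()
# Round-trip through JSON, as if the snapshot had been written next to the model file.
saved_snapshot = json.loads(json.dumps(snapshot))
print(find_mismatches(saved_snapshot, environment_snapshot()))
```

In practice the JSON text would be written to a file beside the model, and any reported mismatch treated as a warning before the model is deserialized.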

**Manual Serialization**. You might like to manually output the parameters of your learned model so that you can use them directly in scikit-learn or another platform in the future. Often the techniques used internally by machine learning algorithms to make predictions are a lot simpler than those used to learn the parameters, and may be easy to implement in custom code that you have control over.
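For a binary logistic regression, manual serialization can be as small as exporting the coefficients and intercept and re-implementing the decision rule. The sketch below is illustrative and makes several assumptions: it uses synthetic data rather than the diabetes CSV, stores the parameters as JSON, and uses a hypothetical helper named `predict_from_params`. It covers the two-class case only.

```python
import json

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the diabetes data (assumption: 8 numeric features).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)
demo_model = LogisticRegression(max_iter=200).fit(X_demo, y_demo)

# Export only the learned parameters as plain JSON text.
params_json = json.dumps({
    "coef": demo_model.coef_[0].tolist(),
    "intercept": float(demo_model.intercept_[0]),
    "classes": demo_model.classes_.tolist(),
})


def predict_from_params(params, X):
    """Predict binary class labels from exported parameters using only NumPy."""
    scores = X @ np.asarray(params["coef"]) + params["intercept"]
    # A positive score means a probability above 0.5, i.e. the second class.
    classes = np.asarray(params["classes"])
    return classes[(scores > 0).astype(int)]


manual_predictions = predict_from_params(json.loads(params_json), X_demo)
agreement = np.mean(manual_predictions == demo_model.predict(X_demo))
print(agreement)
```

Because the exported file holds only numbers and labels, it does not depend on the Python or scikit-learn version used for training, at the cost of maintaining the prediction code yourself.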
