#**Save and Load Machine Learning Models**

Finding an accurate machine learning model is not the end of the project.

In this post you will discover how to save and load your machine learning model in Python using scikit-learn.

This allows you to save your model to a file and load it later to make predictions.

##**1-Pickle**

Pickle is the standard way of serializing objects in Python.

You can use the pickle module to serialize your trained machine learning models and save the serialized format to a file.

Later you can load this file to deserialize your model and use it to make new predictions.

The example below demonstrates how you can train a K-Nearest Neighbors classifier on the Iris dataset, serialize the trained model to an in-memory byte string with pickle.dumps, and load it back to make predictions on the unseen test set.


In [None]:
# PICKLE

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()

X = iris.data 
y = iris.target 

# Split dataset into train and test 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3) 

# import KNeighborsClassifier model 
from sklearn.neighbors import KNeighborsClassifier as KNN 
knn = KNN(n_neighbors = 3) 

# train model 
knn.fit(X_train, y_train) 

import pickle 

# Serialize the trained model to a bytes object (kept in memory, not written to disk).
saved_model = pickle.dumps(knn) 

# Load the pickled model 
knn_from_pickle = pickle.loads(saved_model) 

# Use the loaded pickled model to make predictions 
knn_from_pickle.predict(X_test) 


array([0, 0, 0, 2, 0, 0, 1, 0, 1, 0, 1, 2, 2, 1, 2, 0, 2, 0, 0, 2, 0, 0,
       1, 2, 1, 2, 1, 2, 0, 0, 0, 0, 1, 0, 1, 0, 1, 2, 1, 0, 2, 2, 1, 0,
       2])
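
The cell above keeps the pickled model in memory as a bytes object. To persist the model to a file and load it later, as described at the start of this section, one option is pickle.dump and pickle.load with a file opened in binary mode. The following is a minimal sketch under a few assumptions: the file name knn_model.pkl, the temporary directory, and random_state=0 are illustrative choices and not part of the original example.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data and split it (random_state is an assumption, added for repeatability)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train the same kind of model as in the cell above
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Hypothetical file location inside a fresh temporary directory
model_path = os.path.join(tempfile.mkdtemp(), "knn_model.pkl")

# Write the pickled model to disk (binary mode is required)
with open(model_path, "wb") as f:
    pickle.dump(knn, f)

# Some time later: read the model back from disk
with open(model_path, "rb") as f:
    knn_from_file = pickle.load(f)

# The reloaded model should give the same predictions as the original
same_predictions = bool((knn_from_file.predict(X_test) == knn.predict(X_test)).all())
print(same_predictions)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.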

##**2-Joblib**

Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.

It also provides utilities for efficiently saving and loading Python objects that contain NumPy arrays.

This can be useful for machine learning algorithms that have a lot of parameters or store the entire training dataset (like K-Nearest Neighbors).

The example below demonstrates how you can train a K-Nearest Neighbors classifier on the Iris dataset, save the model to a file using joblib, and load it to evaluate its accuracy on the unseen test set.

In [None]:
# JOBLIB

import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()

X = iris.data 
y = iris.target 

# Split dataset into train and test 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3) 

# import KNeighborsClassifier model 
from sklearn.neighbors import KNeighborsClassifier as KNN 
model = KNN(n_neighbors = 3) 

# train model 
model.fit(X_train, y_train) 

# save the model to disk
filename = 'finalized_model.sav'
joblib.dump(model, filename)
 
# some time later...
 
# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, y_test)
print(result)

0.9333333333333333
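
Because K-Nearest Neighbors stores its training data inside the fitted model, the saved file grows with the dataset. joblib.dump accepts a compress argument that may help here. The sketch below saves the same model with and without compression and checks that the compressed copy still predicts identically. The file names, the temporary directory, and the compression level 3 are illustrative assumptions; on a dataset as small as Iris the size difference is modest.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Fit a KNN model on the full Iris data (no split needed for this size comparison)
X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Hypothetical output paths inside a fresh temporary directory
out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, "model_plain.joblib")
compressed_path = os.path.join(out_dir, "model_compressed.joblib")

# Save once without compression and once with compress=3 (levels range from 0 to 9)
joblib.dump(model, plain_path)
joblib.dump(model, compressed_path, compress=3)

# Report both file sizes in bytes
print(os.path.getsize(plain_path), os.path.getsize(compressed_path))

# joblib.load detects the compression automatically
loaded_model = joblib.load(compressed_path)
predictions_match = np.array_equal(loaded_model.predict(X), model.predict(X))
print(predictions_match)
```

Higher compression levels trade slower saving and loading for smaller files, so the right setting depends on how often the model is written and read.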
