# Python: How to Save and Load ML Models

**Object serialization**
Saving an ML model is a form of object serialization: representing an object as a stream of bytes so that it can be stored on disk, sent over a network, or saved to a database.

**Deserialization**
Restoring (reloading) a saved ML model from that stream of bytes is known as deserialization.
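
To make the two definitions concrete, here is a minimal sketch using Python's built-in pickle module. The dictionary model_state is a hypothetical stand-in for a trained model (it is not part of this notebook); pickle.dumps turns it into a stream of bytes, and pickle.loads rebuilds an equal but separate object from those bytes.

```python
import pickle

# Hypothetical stand-in for a trained model (illustration only)
model_state = {"coef": [0.5, -1.2], "intercept": 0.1}

# Serialization: object -> bytes
payload = pickle.dumps(model_state)
print(type(payload))

# Deserialization: bytes -> a new, equal object
restored = pickle.loads(payload)
print(restored == model_state)   # True: same content
print(restored is model_state)   # False: a separate object
```

The same byte stream could just as well be written to a file, which is what the two approaches below do.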

In this notebook, we explore two ways to save and reload ML models in Python with scikit-learn, and we discuss the pros and cons of each:

1) Pickle Approach
2) Joblib Approach

**ML Model Creation**

For this demo, we will train a basic logistic regression model on the Iris dataset.

Dataset used: Iris

Model: Logistic regression using scikit-learn

In [4]:
# Import Required packages 
from sklearn.linear_model import LogisticRegression  
from sklearn.datasets import load_iris  
from sklearn.model_selection import train_test_split


In [5]:
# Load the data
Iris_data = load_iris()  


In [6]:
# Split data
x_train, x_test, y_train, y_test = train_test_split(Iris_data.data, 
                                                    Iris_data.target, 
                                                    test_size = 0.2, 
                                                    random_state = 18)  

In [8]:
# Define the Model
log_reg = LogisticRegression(C = 0.1,  
                              max_iter = 20, 
                              fit_intercept = True, 
                              solver = 'liblinear')

# Train the Model
log_reg.fit(x_train, y_train)  

LogisticRegression(C=0.1, max_iter=20, solver='liblinear')

**Approach 1 : Pickle approach**

In the following cells, the log_reg model created in the previous step is saved to a file and then loaded back as a new object called pickle_log_reg.
The loaded model is then used to calculate the accuracy score and predict outcomes on unseen (test) data.

In [9]:
# Import pickle Package
import pickle

In [65]:
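# Save the model to a file in the "save" subdirectory of the current working directory
import os

pkl_filename = "save/pickle_log_reg.pkl"
os.makedirs("save", exist_ok=True)  # create the target directory if it does not exist

with open(pkl_filename, 'wb') as file:
    pickle.dump(log_reg, file)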


In [66]:
# Load the Model back from file
with open(pkl_filename, 'rb') as file:  
    pickle_log_reg = pickle.load(file)

pickle_log_reg

LogisticRegression(C=0.1, max_iter=20, solver='liblinear')

In [67]:
# Use the Reloaded Model to 
# Calculate the accuracy score and predict target values

# Calculate the Score 
score = pickle_log_reg.score(x_test, y_test)  

# Print the Score
print("Test score: {0:.2f} %".format(100 * score))  

# Predict the Labels using the reloaded Model
y_pred = pickle_log_reg.predict(x_test)  

y_pred

Test score: 96.67 %


array([1, 1, 1, 0, 0, 0, 2, 0, 2, 1, 2, 1, 0, 2, 0, 2, 0, 2, 0, 0, 1, 2,
       2, 1, 2, 0, 0, 0, 2, 2])

**Reflecting on the Pickle approach:**

Pros of Pickle:

1) Saving and restoring a model is quick; each takes about two lines of code.
2) It is useful when you have already optimized the model's parameters on the training data, because reloading the model avoids repeating that step.


Cons of Pickle:

1) It saves only the model object, not the test results or any of the data.
2) Unpickling can execute arbitrary code, so load pickle files only from sources you trust.
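
One possible way around the limitation that Pickle does not save test results or data is to pickle a dictionary that bundles the model with whatever else should travel with it. The sketch below is an illustration under assumptions, not part of the original notebook: the values are hypothetical stand-ins (in the notebook they would be log_reg, score, x_test and y_test), and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the notebook objects (illustration only);
# in the notebook these would be log_reg, score, x_test and y_test.
model = {"C": 0.1, "solver": "liblinear"}
test_score = 0.9667
x_test = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]
y_test = [0, 1]

# Bundle the model together with its evaluation data and result
bundle = {
    "model": model,
    "test_score": test_score,
    "x_test": x_test,
    "y_test": y_test,
}

# Save the whole bundle to a single file in a temporary directory
bundle_path = os.path.join(tempfile.mkdtemp(), "model_bundle.pkl")
with open(bundle_path, "wb") as f:
    pickle.dump(bundle, f)

# Load it back; every entry is restored in one call
with open(bundle_path, "rb") as f:
    restored = pickle.load(f)

print(restored["test_score"])
```

A dictionary keeps each entry addressable by name, which tends to be easier to read later than a positional tuple.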

**Approach 2 : Joblib approach**

Joblib is a separate package that scikit-learn depends on, so it is installed alongside scikit-learn. It is intended as a replacement for Pickle for objects that contain large data, in particular large NumPy arrays.

This approach still saves the ML model in a pickle-based format. Because Joblib is installed together with scikit-learn, which we already use to build our models, no extra installation is needed; we only have to import joblib.

In [68]:
# Import the joblib package (installed as a dependency of scikit-learn)
import joblib

In [69]:
# Save the model to a file in the "save" subdirectory of the current working directory
joblib_file = "save/joblib_log_reg.joblib"  
joblib.dump(log_reg, joblib_file)

['save/joblib_log_reg.joblib']

In [70]:
# Load from file
joblib_log_reg = joblib.load(joblib_file)

joblib_log_reg

LogisticRegression(C=0.1, max_iter=20, solver='liblinear')

In [71]:
# Use the Reloaded Joblib Model to 
# Calculate the accuracy score and predict target values

# Calculate the Score 
score2 = joblib_log_reg.score(x_test, y_test)  

# Print the Score
print("Test score: {0:.2f} %".format(100 * score2))  

# Predict the Labels using the reloaded Model
y_pred2 = joblib_log_reg.predict(x_test)  

y_pred2

Test score: 96.67 %


array([1, 1, 1, 0, 0, 0, 2, 0, 2, 1, 2, 1, 0, 2, 0, 2, 0, 2, 0, 0, 1, 2,
       2, 1, 2, 0, 0, 0, 2, 2])

**Reflecting on the Joblib approach:**

Pros of Joblib:

1) Joblib offers a slightly simpler workflow than Pickle.
2) While pickle.dump and pickle.load require an open file object, Joblib accepts both file objects and string filenames.
3) Joblib is optimized for models that contain large NumPy arrays. Older Joblib releases stored each large array in a separate file, while recent releases write a single file by default; either way, the save and restore calls stay the same.
4) Joblib also supports several compression methods, such as 'zlib', 'gzip', and 'bz2', each with different levels of compression.
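
The filename/file-object flexibility and the compression option can be sketched as follows. This is an illustration under assumptions rather than part of the original notebook: obj is a hypothetical stand-in for a model, the files go to a temporary directory, and the gzip level of 3 is an arbitrary choice.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a model holding a large block of data (illustration only)
obj = {"weights": [0.0] * 10_000, "labels": [0, 1, 2]}

out_dir = tempfile.mkdtemp()

# 1) String filename plus compression, given as a (method, level) tuple
gz_path = os.path.join(out_dir, "model.joblib.gz")
joblib.dump(obj, gz_path, compress=("gzip", 3))

# 2) An already-open file object, without compression
plain_path = os.path.join(out_dir, "model.joblib")
with open(plain_path, "wb") as f:
    joblib.dump(obj, f)

# joblib.load detects the compression on its own
from_gz = joblib.load(gz_path)
with open(plain_path, "rb") as f:
    from_file = joblib.load(f)

# Compare the sizes of the compressed and uncompressed files
print(os.path.getsize(gz_path), os.path.getsize(plain_path))
```

Higher compression levels generally produce smaller files at the cost of slower saving and loading, so the level is worth tuning against the size of the model.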