Save and load a trained model using:
- pickle
- sklearn joblib
A trained model can be saved to a file and later loaded back into memory to make predictions.

Working with a model takes two steps:
- Train the model on the training data set.
- Make predictions using the fitted model.

A model trained on more data is generally more accurate, but the training step can be very time consuming when the data set is large. Saving the trained model to a file lets us load it later to make predictions, so we do not have to retrain the model every time.
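
The train-once, reuse-later idea can be sketched as a small helper that loads a saved model when the file already exists and trains one only when it does not. This is a minimal sketch rather than part of the lesson itself: the helper name `load_or_train`, the temporary directory, and the in-line arrays are assumptions made for illustration, and the five data points are the rows displayed by `df.head()` in this lesson.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# The five (area, price) rows displayed by df.head() in this lesson.
AREA = np.array([[2600], [3000], [3200], [3600], [4000]])
PRICE = np.array([550000, 565000, 610000, 680000, 725000])


def load_or_train(path):
    """Return (model, was_trained); train and save only if no file exists.

    Hypothetical helper for illustration, not an sklearn function.
    """
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f), False
    model = LinearRegression().fit(AREA, PRICE)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model, True


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model_pickle"
    first_model, first_trained = load_or_train(model_path)    # trains and saves
    second_model, second_trained = load_or_train(model_path)  # loads from disk
    first_pred = first_model.predict([[5000]])[0]
    second_pred = second_model.predict([[5000]])[0]

print(first_trained, second_trained)
```

The second call skips training entirely, which is where the time saving comes from on a large data set.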

# First approach: pickle

In [1]:
# Use the linear regression from lesson 2 as an example.
import pickle  # Lets us serialize a Python object into a file.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

In [2]:
df = pd.read_csv("datasets/homeprices2.csv")
df.head()

,area,price
0,2600,550000
1,3000,565000
2,3200,610000
3,3600,680000
4,4000,725000


In [3]:
model = linear_model.LinearRegression()
model.fit(df[['area']], df.price)

LinearRegression()

In [4]:
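# Pass a DataFrame with the same column name used in fit(); recent sklearn
# versions warn about missing feature names when given a bare list.
model.predict(pd.DataFrame({'area': [5000]}))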

array([859554.79452055])

In [5]:
import pickle  # Serializes a Python object into a file (already imported in cell 1).

In [6]:
# Create the file model_pickle in the working directory and save the model in it.
# 'wb': write mode, binary file.
with open('model_pickle', 'wb') as f:
    pickle.dump(model, f)

In [7]:
# Load the model from the file into memory.
with open('model_pickle', 'rb') as f:  # 'rb': read mode, binary file.
    mp = pickle.load(f)
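
Two caveats apply when loading a pickled model: `pickle.load` can execute arbitrary code, so only files from a trusted source should be loaded, and a model pickled with one sklearn version is not guaranteed to load correctly in another. One possible guard, sketched below, is to store the library version next to the model and compare it at load time. The dictionary layout and its key names are an illustrative convention assumed here, not an sklearn format, and the round trip is done in memory with `pickle.dumps` and `pickle.loads` to keep the sketch short.

```python
import pickle

import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression

# Fit on the five (area, price) rows displayed by df.head() in this lesson.
model = LinearRegression().fit(
    np.array([[2600], [3000], [3200], [3600], [4000]]),
    np.array([550000, 565000, 610000, 680000, 725000]),
)

# Bundle the model with the sklearn version that produced it.
# The key names are an assumption for illustration only.
payload = {"sklearn_version": sklearn.__version__, "model": model}
blob = pickle.dumps(payload)  # in-memory equivalent of pickle.dump to a file

# Only unpickle data you trust: unpickling can run arbitrary code.
restored = pickle.loads(blob)
version_matches = restored["sklearn_version"] == sklearn.__version__
if not version_matches:
    print("Warning: model was saved with sklearn", restored["sklearn_version"])
restored_model = restored["model"]
```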

In [8]:
# Test whether model and mp yield the same predictions.
ab = pd.DataFrame({'area': list(range(100, 600, 100))})
ab

,area
0,100
1,200
2,300
3,400
4,500


In [9]:
ab_model_preds = model.predict(ab[['area']])
ab_mp_preds = mp.predict(ab[['area']])

In [10]:
ab['model predictions'] = ab_model_preds
ab['mp predictions'] = ab_mp_preds

As expected, the original model and the unpickled mp produce the same predictions; the values appear in the combined table displayed in cell 15.
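
Instead of comparing the two prediction columns by eye, the check can be made programmatic. The following is a minimal sketch under a few assumptions: the model is refitted from the five rows displayed earlier, NumPy arrays stand in for the DataFrame columns, and the pickle round trip happens in memory rather than through the `model_pickle` file.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit on the five (area, price) rows displayed by df.head() in this lesson.
model = LinearRegression().fit(
    np.array([[2600], [3000], [3200], [3600], [4000]]),
    np.array([550000, 565000, 610000, 680000, 725000]),
)
mp = pickle.loads(pickle.dumps(model))  # in-memory save and load

# Same test areas as the ab DataFrame: 100, 200, 300, 400, 500.
areas = np.arange(100, 600, 100).reshape(-1, 1)
model_preds = model.predict(areas)
mp_preds = mp.predict(areas)

# True when every pair of predictions agrees within floating-point tolerance.
same = np.allclose(model_preds, mp_preds)
print(same)
```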

# Second approach: sklearn joblib

See the sklearn model persistence documentation for details on joblib. joblib offers effectively the same functionality as pickle, but it can be more efficient for objects that carry large numpy arrays internally, as fitted sklearn estimators often do.
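
Because joblib is aimed at objects holding large arrays, it also accepts a `compress` argument in `joblib.dump` to trade some save and load time for a smaller file. The sketch below is illustrative only: the dictionary `big_state` is a stand-in assumed here for a fitted estimator with a large internal array, and the file names are made up for the example.

```python
import os
import tempfile

import joblib
import numpy as np

# Stand-in for a fitted estimator that holds a large array internally.
big_state = {"weights": np.zeros((500, 500))}

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "state_raw.joblib")
    small_path = os.path.join(tmp, "state_compressed.joblib")

    joblib.dump(big_state, raw_path)                # default: no compression
    joblib.dump(big_state, small_path, compress=3)  # compression level 3

    raw_size = os.path.getsize(raw_path)
    small_size = os.path.getsize(small_path)
    restored = joblib.load(small_path)  # load works the same either way

print(raw_size, small_size)
```

An array of zeros compresses unusually well, so the size difference here is larger than it would be for real model parameters.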

In [11]:
import joblib

In [12]:
# Save model to file 'model_joblib' in working directory.
joblib.dump(model, 'model_joblib')

['model_joblib']

In [13]:
# Load the model object. 
mj = joblib.load('model_joblib')

In [14]:
# Test to see whether it gives the same predictions as model and mp. 
ab['mj_predictions'] = mj.predict(ab[['area']])

In [15]:
ab

,area,model predictions,mp predictions,mj_predictions
0,100,194195.205479,194195.205479,194195.205479
1,200,207773.972603,207773.972603,207773.972603
2,300,221352.739726,221352.739726,221352.739726
3,400,234931.506849,234931.506849,234931.506849
4,500,248510.273973,248510.273973,248510.273973


In [28]:
print(f"""Other information, such as the coefficient and intercept, is also stored:
model coefficient: {round(model.coef_[0], 2)}
mp coefficient: {round(mp.coef_[0], 2)}
mj coefficient: {round(mj.coef_[0], 2)}
model intercept: {round(model.intercept_, 2)}
mp intercept: {round(mp.intercept_, 2)}
mj intercept: {round(mj.intercept_, 2)}""")

Other information, such as the coefficient and intercept, is also stored:
model coefficient: 135.79
mp coefficient: 135.79
mj coefficient: 135.79
model intercept: 180616.44
mp intercept: 180616.44
mj intercept: 180616.44
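
Since the coefficient and intercept survive the round trip, a prediction can be reconstructed by hand as coefficient × area + intercept and compared with `predict`. This is a minimal sketch assuming the model is refitted from the five rows displayed by `df.head()` and round-tripped in memory with pickle; the variable names `manual` and `predicted` are chosen here for illustration.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit on the five (area, price) rows displayed by df.head() in this lesson.
model = LinearRegression().fit(
    np.array([[2600], [3000], [3200], [3600], [4000]]),
    np.array([550000, 565000, 610000, 680000, 725000]),
)
mp = pickle.loads(pickle.dumps(model))  # in-memory save and load

area = 5000
# A linear regression prediction is slope * x + intercept.
manual = mp.coef_[0] * area + mp.intercept_
predicted = mp.predict([[area]])[0]
print(round(manual, 2), round(predicted, 2))
```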
