## Saving and Loading a Model
In this exercise, you will train a simple linear regression model on the fish toxicity dataset and use it to make predictions. You will then save the model to disk, load it back in, and make a second set of predictions with the loaded model. Finally, you will compare the predictions from the original model with those from the loaded model.

In [1]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [2]:
# Read in the data: the file has no header row and is semicolon-separated
_headers = ['CIC0', 'SM1', 'GATS1i', 'NdsCH', 'Ndssc', 'MLOGP', 'response']
df = pd.read_csv(
    'https://raw.githubusercontent.com/'
    'PacktWorkshops/The-Data-Science-Workshop'
    '/master/Chapter06/Dataset/qsar_fish_toxicity.csv',
    names=_headers,
    sep=';',
)

In [3]:
df.head()

,CIC0,SM1,GATS1i,NdsCH,Ndssc,MLOGP,response
0,3.26,0.829,1.676,0,1,1.453,3.77
1,2.189,0.58,0.863,0,0,1.348,3.115
2,2.125,0.638,0.831,0,0,1.348,3.531
3,3.027,0.331,1.472,1,0,1.807,3.51
4,2.094,0.827,0.86,0,0,1.886,5.39


In [4]:
# Split the data into features and labels
features = df.drop('response', axis=1).values
labels = df[['response']].values
# Hold out 20% for evaluation, then split that portion into validation and test sets
X_train, X_eval, y_train, y_eval = train_test_split(features, labels, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_eval, y_eval, random_state=0)
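
The second call to `train_test_split` does not pass `test_size`, so scikit-learn falls back to its default of 0.25. The sketch below, which uses a made-up 100-row array rather than the fish toxicity data, shows how the two splits combine into roughly 80% training, 15% validation, and 5% test.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data only (assumption): 100 rows, 2 features
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# First split: 80% training, 20% held out for evaluation
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Second split: default test_size of 0.25 applies to the held-out 20%
X_val, X_test, y_val, y_test = train_test_split(X_eval, y_eval, random_state=0)

sizes = (len(X_train), len(X_val), len(X_test))
print(sizes)  # (80, 15, 5)
```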

In [5]:
# Create a linear regression model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()

In [6]:
# Use the model to predict on the validation set
y_pred = model.predict(X_val)

In [8]:
# Import joblib
import joblib

In [9]:
# Save the model to disk
joblib.dump(model, './model.joblib')

['./model.joblib']

In [10]:
# Load the saved model as a new model
m2 = joblib.load('./model.joblib')

In [11]:
# Use the loaded model to predict on the same validation set
m2_preds = m2.predict(X_val)

In [13]:
# Compare the two sets of predictions side by side
ys = pd.DataFrame(dict(predicted=y_pred.reshape(-1), m2=m2_preds.reshape(-1)))
ys.head()

,predicted,m2
0,4.155885,4.155885
1,6.398238,6.398238
2,5.183181,5.183181
3,3.771333,3.771333
4,4.593059,4.593059


The first five predictions from the original model are identical to those from the model that was saved and loaded back in. This is the expected result: joblib stores the fitted model together with its learned parameters, so the loaded model produces the same predictions as the original without any retraining.
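
`ys.head()` only displays the first five rows. As a fuller check, the following sketch compares the learned parameters and every prediction programmatically. It is self-contained and runs on synthetic data (an assumption, standing in for the fish toxicity dataset); the names `original`, `restored`, and the temporary file path are illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): 50 rows, 6 features as in the exercise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = X @ np.arange(1.0, 7.0) + rng.normal(scale=0.1, size=50)

original = LinearRegression().fit(X, y)

# Save to a temporary directory and load the model back in
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(original, path)
    restored = joblib.load(path)

# Compare the fitted parameters, then all predictions
same_params = bool(
    np.array_equal(original.coef_, restored.coef_)
    and np.array_equal(original.intercept_, restored.intercept_)
)
same_preds = bool(np.allclose(original.predict(X), restored.predict(X)))
print(same_params, same_preds)
```

In the exercise itself, the equivalent check would be `np.allclose(y_pred, m2_preds)` after importing NumPy.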