
This notebook shows how to persist a trained machine learning model in Python with scikit-learn.

It covers two techniques:

1.   The pickle API for serializing standard Python objects.
2.   The joblib API for efficiently serializing Python objects with NumPy arrays.

**The pickle API for serializing standard Python objects.**

In [0]:
# Save Model Using Pickle
import pandas as pd
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle



In [0]:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(url, names=names)
array = dataframe.values

In [39]:
array.shape

(768, 9)

In [7]:
dataframe.head()

   preg  plas  pres  skin  test  mass   pedi  age  class
0     6   148    72    35     0  33.6  0.627   50      1
1     1    85    66    29     0  26.6  0.351   31      0
2     8   183    64     0     0  23.3  0.672   32      1
3     1    89    66    23    94  28.1  0.167   21      0
4     0   137    40    35   168  43.1  2.288   33      1


In [0]:
# Count missing (null) values in each column
pd.isnull(dataframe).sum()

In [48]:
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
print(len(X), len(Y))
print(X.size, Y.size)

768 768
6144 768


In [49]:
# Fit the model on training set
model = LogisticRegression()
model.fit(X_train, Y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
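
The warning above suggests either scaling the data or raising `max_iter`. The following is a minimal sketch of that suggestion, not the notebook's original method. It assumes synthetic data from `make_classification` as a stand-in for the diabetes dataset so that it runs without network access. The names `X_demo`, `y_demo`, and `scaled_model` are illustrative, and the accuracy it prints will differ from the scores recorded in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 8 features)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=7
)

# Standardize the features, then fit logistic regression with a higher
# iteration limit than the default of 100
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

accuracy = scaled_model.score(X_te, y_te)
print(accuracy)
```

Because the scaler is part of the pipeline, saving `scaled_model` would persist the fitted scaling parameters together with the classifier.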

In [51]:
# save the model to disk
filename = 'finalized_model.sav'
# the with statement closes the file once the model has been written
with open(filename, 'wb') as f:
    pickle.dump(model, f)




In [52]:
!pwd
!ls 

/content
finalized_model.sav  sample_data


In [53]:
# some time later...
filename = 'finalized_model.sav'
# load the model from disk
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_test, Y_test)
print(result)

0.7874015748031497
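
A pickled scikit-learn model is not guaranteed to load correctly under a different scikit-learn version, so it can help to record the version next to the model and to check that the restored model predicts the same values. The following is a hedged sketch of that idea, assuming a small synthetic dataset and a temporary directory. The `payload` dictionary layout and the file name `demo_model.pkl` are illustrative choices, not part of the pickle API.

```python
import os
import pickle
import tempfile

import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset (assumption: stands in for the real training data)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Store the model together with the library version used to train it
payload = {"model": demo_model, "sklearn_version": sklearn.__version__}
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(path, "wb") as f:
    pickle.dump(payload, f)

# Some time later: load the file and compare versions before using the model
with open(path, "rb") as f:
    restored = pickle.load(f)
if restored["sklearn_version"] != sklearn.__version__:
    print("Warning: model was saved with a different scikit-learn version")

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(
    demo_model.predict(X_demo), restored["model"].predict(X_demo)
)
print(same_predictions)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.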


**The joblib API for efficiently serializing Python objects with NumPy arrays.**

In [0]:
import joblib

In [56]:
# save the model to disk
filename = 'finalized_model.sav'
joblib.dump(model, filename)
 


['finalized_model.sav']

In [57]:
!ls

finalized_model.sav  sample_data


In [59]:
# some time later...
filename = 'finalized_model.sav'
# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)

0.7874015748031497
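
`joblib.dump` also accepts a `compress` argument, which can reduce file size for models that hold large NumPy arrays. The following is a minimal sketch under stated assumptions: synthetic data, a temporary directory, and an illustrative file name `demo_model.joblib`, chosen to differ from the pickle file so that one does not overwrite the other. The compression level 3 is an arbitrary example value.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset (assumption: stands in for the real training data)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Save with compression; joblib.dump returns the list of files it wrote
path = os.path.join(tempfile.mkdtemp(), "demo_model.joblib")
written = joblib.dump(demo_model, path, compress=3)

# Load the model back and confirm the learned coefficients survived
restored = joblib.load(path)
coef_match = np.allclose(demo_model.coef_, restored.coef_)
print(written == [path], coef_match)
```

As with pickle, load joblib files only from trusted sources.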
