___
<h1> Machine Learning </h1>
<h2> Systems Engineering and Computer Technologies / Engenharia de Sistemas e Tecnologias Informáticas
(LESTI)</h2>
<h3> Instituto Superior de Engenharia / Universidade do Algarve </h3>

[LESTI](https://ise.ualg.pt/curso/1941) / [ISE](https://ise.ualg.pt) / [UAlg](https://www.ualg.pt)

Pedro J. S. Cardoso (pcardoso@ualg.pt)

___

# Model persistence

In this section we will see how to save a model and load it for later use. This is known as model persistence.

Saving a model in scikit-learn is straightforward. It can be done with Python's built-in persistence module, pickle, or with joblib. joblib is a replacement for pickle that is more efficient on objects containing large NumPy arrays, as fitted estimators often do. Unlike pickle, it has no equivalent of `pickle.dumps`: it writes the model to a file rather than returning it as an in-memory string.
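To make the difference concrete, here is a minimal sketch (not part of the original notebook) that serializes the same kind of SVC both ways. `pickle.dumps`/`pickle.loads` keep everything in memory as a byte string, while `joblib.dump`/`joblib.load` go through a file. The temporary directory from `tempfile.mkdtemp` is an assumption used only to keep the example self-contained.

```python
import os
import pickle
import tempfile

import joblib
from sklearn import datasets, svm

# Same model settings as the SVC trained later in this section
digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])

# pickle can serialize to (and restore from) an in-memory byte string
payload = pickle.dumps(clf)
clf_from_bytes = pickle.loads(payload)

# joblib writes to a file instead (here, inside a temporary directory)
path = os.path.join(tempfile.mkdtemp(), 'digits-svc.joblib')
joblib.dump(clf, path)
clf_from_file = joblib.load(path)

print(type(payload), len(payload))
print(clf_from_bytes.predict(digits.data[-1:]), clf_from_file.predict(digits.data[-1:]))
```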

Saving a model makes it possible to:
- reuse the model without having to retrain (and possibly reconfigure) it.
- share the model with others.
- compare different models.
- use it as part of a larger application or workflow.
- train and save the model on a remote machine with more resources, and use it later for predictions, since predicting is usually much faster than training.
- use it as part of a service / web application / mobile application.
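The first point above, reusing a model without retraining, often takes the form of a load-or-train step. The sketch below uses a hypothetical helper, `load_or_train`, and a file name chosen for illustration; it trains the digits SVC used later in this section only when no saved copy exists yet.

```python
import os

import joblib
from sklearn import datasets, svm


def load_or_train(path):
    """Hypothetical helper: reuse a saved model if present, otherwise train and save one."""
    if os.path.exists(path):
        # A persisted model exists: load it, no retraining needed
        return joblib.load(path), False
    digits = datasets.load_digits()
    model = svm.SVC(gamma=0.001, C=100.)
    model.fit(digits.data, digits.target)
    # Create the destination folder if needed, then persist the fitted model
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    joblib.dump(model, path)
    return model, True


model, trained = load_or_train('models/digits-svc-cached.joblib')
print('trained now' if trained else 'loaded from disk')
```

Running the cell a second time prints `loaded from disk`, since the file written by the first run is reused.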

## Training the model
Let's start by training a model, as we did before.

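```python
from sklearn import datasets, svm

# Load the 8x8 handwritten digits dataset
digits = datasets.load_digits()
# Train on every sample except the last one, which is kept for a prediction test
training_set = digits.data[:-1]
target_set = digits.target[:-1]

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(training_set, target_set)
```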

# Saving & loading the model

There are several ways to save a model. We will look at two of them: pickle and joblib.

## pickle
A scikit-learn model can be saved with Python's built-in persistence module, pickle:

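```python
import os
import pickle

# Make sure the destination folder exists before writing
os.makedirs('models', exist_ok=True)

with open('models/digits-svc.pickle', 'wb') as f:
    pickle.dump(clf, f)
```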

Later, we can reload it:

```python
with open('models/digits-svc.pickle', 'rb') as f:
    clf_copy = pickle.load(f)
```

and use it to make a prediction for the last sample, which was left out of training:

```python
prediction = clf_copy.predict([digits.data[-1]])[0]
'predicting {} for {}'.format(prediction, digits.target[-1])
```

```
'predicting 8 for 8'
```
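Before relying on a persisted model, it is worth checking that the restored copy behaves exactly like the original on all samples, not just one. A minimal sketch, retraining the same SVC so it is self-contained and round-tripping it through `pickle.dumps`/`pickle.loads` instead of a file:

```python
import pickle

import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])

# Round-trip through pickle in memory (a file behaves the same way)
clf_copy = pickle.loads(pickle.dumps(clf))

# The restored model should reproduce the original predictions exactly
same = np.array_equal(clf.predict(digits.data), clf_copy.predict(digits.data))
print('identical predictions:', same)
```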

## joblib

For scikit-learn models it may be preferable to use joblib's replacement for pickle (`joblib.dump` and `joblib.load`), which is more efficient on objects that carry large NumPy arrays internally, as fitted estimators often do. It writes to a file rather than to an in-memory string, which is usually what you want anyway if you plan to copy the model to another machine.
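`joblib.dump` also accepts a `compress` argument (an integer from 0 to 9 selects the zlib compression level), which can shrink the file at the cost of some CPU time. A sketch, assuming a temporary directory for the output files; the actual sizes depend on your library versions, so none are shown here:

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])

tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, 'digits-svc.joblib')
compressed_path = os.path.join(tmp_dir, 'digits-svc-compressed.joblib')

# dump returns the list of files it wrote
written = joblib.dump(clf, plain_path)
# compress=3: moderate zlib compression, smaller file, slower dump/load
joblib.dump(clf, compressed_path, compress=3)

print(written)
print('plain:', os.path.getsize(plain_path), 'bytes')
print('compressed:', os.path.getsize(compressed_path), 'bytes')

# Loading is the same call; joblib detects the compression itself
clf_compressed = joblib.load(compressed_path)
```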

```python
import joblib
joblib.dump(clf, 'models/digits-svc.joblib')
```

```
['models/digits-svc.joblib']
```

As before, we can later reload it:

```python
clf_copy = joblib.load('models/digits-svc.joblib')
```

and use it to make predictions:

```python
prediction = clf_copy.predict([digits.data[-1]])[0]
'predicting {} for {}'.format(prediction, digits.target[-1])
```

```
'predicting 8 for 8'
```
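Pickled scikit-learn models are not guaranteed to load, or to behave identically, under a different scikit-learn version, so it can help to store the version alongside the model. The dictionary layout below (`'model'`, `'sklearn_version'`) is a hypothetical convention for illustration, not a scikit-learn or joblib format:

```python
import os
import tempfile

import joblib
import sklearn
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])

# Hypothetical bundle: the model plus the library version it was trained with
bundle = {'model': clf, 'sklearn_version': sklearn.__version__}
path = os.path.join(tempfile.mkdtemp(), 'digits-svc-bundle.joblib')
joblib.dump(bundle, path)

loaded = joblib.load(path)
if loaded['sklearn_version'] != sklearn.__version__:
    # Different version: the model may fail to load or behave differently
    print('Warning: model saved with scikit-learn', loaded['sklearn_version'])
model = loaded['model']
print(model.predict(digits.data[-1:]))
```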

# Model from Orange
You can also use models trained elsewhere, for example in Orange. If needed, install and run the Orange application, then run the iris.ows workflow located in the Orange folder of this repository.

In other words, use Orange to train and save a model, then load it here and use it for predictions.

```python
# You might need to install Orange3 and PyQt5: unpickling an Orange model
# requires the Orange classes to be importable
# !pip install Orange3
# !pip install PyQt5

import pickle
from sklearn.datasets import load_iris

iris = load_iris()

# Load the model saved by the Orange workflow
with open('../week2/models/iris_orange_knn_model.pkcls', 'rb') as model_file:
    knn = pickle.load(model_file)

knn
```

```
SklModelClassification(skl_model=KNeighborsClassifier(metric='euclidean', n_neighbors=8))  # params={'n_neighbors': 8, 'weights': 'uniform', 'algorithm': 'auto', 'metric': 'euclidean', 'metric_params': None}
```
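The note about installing Orange3 matters because a pickle does not contain the model's code: it records the module and name of each class, and `pickle.load` imports them again. The sketch below shows this with a plain scikit-learn `KNeighborsClassifier` (an assumption standing in for the Orange model): the module path and class name appear verbatim in the pickled bytes, so the same library must be installed wherever the model is loaded.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
model = KNeighborsClassifier(n_neighbors=8, metric='euclidean').fit(iris.data, iris.target)

payload = pickle.dumps(model)
module_name = type(model).__module__  # a module inside the sklearn package
class_name = type(model).__name__

# The pickle stores where the class lives (module + name), not the class code
print(module_name, module_name.encode() in payload)
print(class_name, class_name.encode() in payload)
```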

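Then make predictions. Note that the Orange model's `predict` returns a tuple: the predicted class indices and the class probabilities.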

```python
pred = knn.predict(iris.data)
pred
```

```
(array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2.,
        2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
        2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
        2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]),
 array([[1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        [1.   , 0.   , 0.   ],
        ...
```

Next, check the accuracy. Note that it is computed over samples that were (mostly) used to train the model, so it is an optimistic figure and not a good measure of how the model performs on unseen data.

```python
iris.target == pred[0]
```

```
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,
        ...
```

```python
# Fraction of samples whose predicted class matches the true class (accuracy)
accuracy = (iris.target == pred[0]).mean()
print("Accuracy is", accuracy)
```

```
Accuracy is 0.98
```
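Because the 0.98 above is measured on training data, a fairer estimate uses samples the model has not seen. The Orange model cannot be retrained here, so this sketch assumes a plain scikit-learn `KNeighborsClassifier` with the hyperparameters from the Orange model's repr (`n_neighbors=8`, `metric='euclidean'`), named `skl_knn` so it does not overwrite `knn`. Any preprocessing Orange applies is not reproduced, so the numbers may differ; the split size and `random_state` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Assumed stand-in for the Orange model (same hyperparameters, no Orange preprocessing)
skl_knn = KNeighborsClassifier(n_neighbors=8, metric='euclidean')

# Stratified hold-out split: the test samples are never seen during training
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0)
skl_knn.fit(X_train, y_train)
holdout_accuracy = accuracy_score(y_test, skl_knn.predict(X_test))

# 5-fold cross-validation: each sample is used for testing exactly once
cv_scores = cross_val_score(skl_knn, iris.data, iris.target, cv=5)

print('hold-out accuracy:', holdout_accuracy)
print('cross-validation accuracy: %.3f +/- %.3f' % (cv_scores.mean(), cv_scores.std()))
```

Cross-validation averages over several splits, so it is usually less sensitive to one lucky or unlucky split than a single hold-out set.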


You can also inspect the probability of each class for each sample, which gives a better idea of the model's confidence. For this, use the `predict_proba` method, available in many scikit-learn classifiers.
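For a k-nearest-neighbors classifier with uniform weights (as in the Orange model's parameters), these probabilities are vote fractions: the share of the `k` neighbors that belong to each class. A sketch with a scikit-learn `KNeighborsClassifier` (assumed to mirror the Orange model's settings, named `skl_knn` so it does not overwrite `knn`) checks that each row sums to 1, that every entry is a multiple of 1/8, and that the most probable class matches `predict`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# Assumed stand-in for the Orange model: 8 neighbors, Euclidean distance, uniform weights
skl_knn = KNeighborsClassifier(n_neighbors=8, metric='euclidean').fit(iris.data, iris.target)

proba = skl_knn.predict_proba(iris.data)  # shape (n_samples, n_classes)

# Each row is a probability distribution over skl_knn.classes_
row_sums = proba.sum(axis=1)
# With uniform weights, probability = (neighbors of that class) / n_neighbors
votes = proba * skl_knn.n_neighbors
# The predicted class is the most probable one
predicted = skl_knn.classes_[proba.argmax(axis=1)]

# Samples whose 8 neighbors do not all agree
uncertain = np.flatnonzero(proba.max(axis=1) < 1)
print('samples where the neighbors disagree:', uncertain)
```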

```python
knn.predict_proba(iris.data)
```

array([[1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.   , 0.   , 0.   ],
       [1.