## Exporting trained models

This notebook contains examples of exporting trained models from the `sklearn` library.

### Loading common packages
Installing `sklearn-export` using pip.

In [None]:
import sys
!{sys.executable} -m pip install sklearn_export

Loading common packages.

In [182]:
import numpy as np
import json
from sklearn.datasets import load_iris, load_diabetes, load_breast_cancer
from sklearn_export import Export
from sklearn.model_selection import train_test_split

### Exporting a simple linear classifier

One of the simplest classification models is Logistic Regression.

In [183]:
from sklearn.linear_model import LogisticRegression

Let us split the iris dataset into train and test sets.

In [184]:
# Load the data and split it into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

Now, we can train a Logistic Regression classifier on iris training data.

In [185]:
# Training a multiclass model
clf = LogisticRegression()
clf = clf.fit(X_train, y_train)

Exporting the model with `sklearn-export` is straightforward. Since the iris dataset has more than two classes, the model is exported as a multiclass classifier.

In [186]:
# Exporting the model
export = Export(clf)
result = export.to_json(filename='classifier.json')
result

{'type': 'MulticlassLogisticRegression',
 'coefficients': [-0.4111635345026534,
  0.5014427857156071,
  -0.09027925121295816,
  0.9567487657508645,
  -0.24516272354360685,
  -0.7115860422072594,
  -2.40210144301465,
  -0.21668110335362667,
  2.618782546368286,
  -1.0169844325985156,
  -0.7962492446803879,
  1.8132336772789102],
 'numRows': 3,
 'numColumns': 4,
 'intercept': [9.269282335394589, 1.9347554260562954, -11.204037761450788]}

Loading the file gives the model data back as a dict.

In [187]:
# Open the JSON file and parse it into a dict (same content as result above).
# The context manager closes the file once it has been read.
with open('classifier.json') as f:
    model_data = json.load(f)
model_data

{'coefficients': [-0.4111635345026534,
  0.5014427857156071,
  -0.09027925121295816,
  0.9567487657508645,
  -0.24516272354360685,
  -0.7115860422072594,
  -2.40210144301465,
  -0.21668110335362667,
  2.618782546368286,
  -1.0169844325985156,
  -0.7962492446803879,
  1.8132336772789102],
 'intercept': [9.269282335394589, 1.9347554260562954, -11.204037761450788],
 'numColumns': 4,
 'numRows': 3,
 'type': 'MulticlassLogisticRegression'}
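
The `coefficients` field is a flat list, so the `numRows` by `numColumns` matrix has to be rebuilt before it can be used. The sketch below assumes the list is stored in column-major order. That assumption is inferred from the `order='F'` reshape in this notebook's prediction function, not from the `sklearn-export` documentation. `unflatten_column_major` is a hypothetical helper and not part of the library.

```python
# Flat coefficient list and shape copied from the exported iris model above
flat = [-0.4111635345026534, 0.5014427857156071, -0.09027925121295816,
        0.9567487657508645, -0.24516272354360685, -0.7115860422072594,
        -2.40210144301465, -0.21668110335362667, 2.618782546368286,
        -1.0169844325985156, -0.7962492446803879, 1.8132336772789102]
num_rows, num_columns = 3, 4


def unflatten_column_major(values, n_rows, n_cols):
    """Rebuild an n_rows x n_cols matrix from a column-major flat list."""
    if len(values) != n_rows * n_cols:
        raise ValueError("length of values does not match n_rows * n_cols")
    # In column-major order, element (i, j) sits at index j * n_rows + i,
    # so row i is every n_rows-th value starting at offset i.
    return [values[i::n_rows] for i in range(n_rows)]


coef_matrix = unflatten_column_major(flat, num_rows, num_columns)
for class_index, row in enumerate(coef_matrix):
    print(class_index, [round(w, 3) for w in row])
```

Each row of `coef_matrix` then holds the four feature weights of one class.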

Prediction with Logistic Regression is a linear combination of the inputs and the coefficients, plus the intercept, followed by a sigmoid activation. For a multiclass problem the output holds one score per class and the predicted class is the one with the highest score; for a binary problem it is just the probability that a sample belongs to class one. This is easy to implement as a Python function.

In [188]:
# A logistic regression prediction implemented with NumPy only (no sklearn needed)
def logistic_regression_predict(X, model_data, threshold=0.5):
    
    # Loading structures from model_data
    numRows = model_data['numRows']
    numColumns = model_data['numColumns']
    coefs = np.asarray(model_data['coefficients']).reshape(numRows, numColumns, order='F')
    intercepts = np.asarray(model_data['intercept'])
    
    # Sigmoid function
    sigmoid = lambda z: 1/(1 + np.exp(-z))
    
    # Prediction using Logistic Regression
    h = sigmoid(np.dot(X, coefs.T) + intercepts)
    
    # Verify if it is a binary or multiclass model
    if model_data['type'] == 'BinaryLogisticRegression':    
        return np.where(h >= threshold, 1, 0).flatten()
    else:
        return np.argmax(h, axis=1)

Let us test it. The predictions match `clf.predict(X_test)`.

In [189]:
y_test_pred = logistic_regression_predict(X_test, model_data)
y_test_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0])
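
Because the sigmoid is monotonic, taking the `argmax` of the sigmoid outputs picks the same class as taking the `argmax` of the raw linear scores. However, the sigmoid values of a multiclass model do not sum to one. If normalized class probabilities are needed, a softmax over the scores is one option. The sketch below uses only the standard library. It assumes the model was fit with a multinomial (softmax) objective and that the coefficients are stored in column-major order, as the `order='F'` reshape above suggests. `softmax_probabilities` and the sample values are illustrative and not part of `sklearn-export`.

```python
import math

# Model data copied from the exported iris classifier above
model_data = {
    'type': 'MulticlassLogisticRegression',
    'coefficients': [-0.4111635345026534, 0.5014427857156071, -0.09027925121295816,
                     0.9567487657508645, -0.24516272354360685, -0.7115860422072594,
                     -2.40210144301465, -0.21668110335362667, 2.618782546368286,
                     -1.0169844325985156, -0.7962492446803879, 1.8132336772789102],
    'numRows': 3,
    'numColumns': 4,
    'intercept': [9.269282335394589, 1.9347554260562954, -11.204037761450788],
}


def softmax_probabilities(x, model_data):
    """Return one probability per class for a single sample x (a list of floats)."""
    n_rows = model_data['numRows']
    flat = model_data['coefficients']
    # Assumed column-major layout: the weights of class i are flat[i::n_rows]
    rows = [flat[i::n_rows] for i in range(n_rows)]
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(rows, model_data['intercept'])]
    # Subtract the largest score before exponentiating for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sample: sepal length, sepal width, petal length, petal width (cm)
sample = [5.1, 3.5, 1.4, 0.2]
probs = softmax_probabilities(sample, model_data)
predicted_class = probs.index(max(probs))
print(predicted_class, [round(p, 4) for p in probs])
```

The predicted class is unchanged by the choice between sigmoid and softmax; only the interpretation of the output values differs.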

We can also train a binary model and export it using `sklearn-export`.

In [190]:
# Load the data and split it into train and test sets
X_bin, y_bin = load_breast_cancer(return_X_y=True)
X_train_bin, X_test_bin, y_train_bin, y_test_bin = train_test_split(X_bin, y_bin, test_size=0.15, random_state=42)

# Training a binary model
clf = LogisticRegression(max_iter=10000)
clf = clf.fit(X_train_bin, y_train_bin)

# Exporting the model
export = Export(clf)
result = export.to_json(filename='bin_classifier.json')
result

{'type': 'BinaryLogisticRegression',
 'coefficients': [1.0626825351029405,
  0.207700479130765,
  -0.3151109050235008,
  0.021851677397795394,
  -0.17248139556008463,
  -0.2354072815720348,
  -0.5431771843727237,
  -0.3018785250723484,
  -0.279126469718788,
  -0.03712698707083015,
  -0.09519598168145216,
  1.4983661241518935,
  -0.27943868911610087,
  -0.08106420275054811,
  -0.02640924428136126,
  0.05258594188216266,
  -0.04579926137569867,
  -0.038933782237984355,
  -0.0425608497308384,
  0.011144362638684105,
  0.15892309566430088,
  -0.4765936832970375,
  -0.026876210631710383,
  -0.016469784651761566,
  -0.3436359030204391,
  -0.7592426124916646,
  -1.457670507197651,
  -0.5709249630073063,
  -0.8469757152408383,
  -0.10466683627688866],
 'numRows': 1,
 'numColumns': 30,
 'intercept': [25.67346438652212]}

The process to load and apply the model is the same, since the function `logistic_regression_predict` supports binary, multiclass and multinomial classification.

In [191]:
# Open the JSON file and parse it into a dict (same content as result above)
with open('bin_classifier.json') as f:
    model_data_bin = json.load(f)

# Applying the model
y_test_pred = logistic_regression_predict(X_test_bin, model_data_bin)
y_test_pred

array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0])
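
Both JSON files are loaded the same way, so the loading step can be wrapped in a small helper. The sketch below is one possible version. `load_exported_model` is a hypothetical name, and the consistency checks assume only the fields that appear in the exports shown in this notebook: `type`, `coefficients`, `intercept` and, for classifiers, `numRows` and `numColumns`.

```python
import json
import os
import tempfile


def load_exported_model(path):
    """Load an exported model JSON file and run basic consistency checks."""
    # The context manager closes the file even if parsing fails
    with open(path) as f:
        model_data = json.load(f)
    for key in ('type', 'coefficients', 'intercept'):
        if key not in model_data:
            raise ValueError(f"missing field: {key}")
    # Classifier exports in this notebook also carry the matrix shape
    if 'numRows' in model_data and 'numColumns' in model_data:
        expected = model_data['numRows'] * model_data['numColumns']
        if len(model_data['coefficients']) != expected:
            raise ValueError("coefficients do not match numRows * numColumns")
        if len(model_data['intercept']) != model_data['numRows']:
            raise ValueError("intercept length does not match numRows")
    return model_data


# Usage with a small hand-written file standing in for an exported model
example = {'type': 'BinaryLogisticRegression', 'coefficients': [0.5, -0.25],
           'numRows': 1, 'numColumns': 2, 'intercept': [0.1]}
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, 'example_model.json')
    with open(example_path, 'w') as f:
        json.dump(example, f)
    loaded = load_exported_model(example_path)
print(loaded['type'], len(loaded['coefficients']))
```

Failing early on a malformed file avoids a harder-to-read reshape error later in the prediction function.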

### Exporting a simple linear regression model

Another common machine learning model is Linear Regression. 

In [192]:
from sklearn.linear_model import LinearRegression

To illustrate Linear Regression, let us use the diabetes dataset.

In [193]:
# Load the data and split it into train and test sets
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

Let us train a linear regression model.

In [194]:
# Training a linear regression model
reg = LinearRegression()
reg = reg.fit(X_train, y_train)

Now, we can export the model using `sklearn-export`.

In [195]:
# Exporting the model
export = Export(reg)
result = export.to_json(filename='regression.json')
result

{'type': 'LinearRegression',
 'coefficients': [48.99364574197365,
  -259.21566993946186,
  546.1448869918012,
  334.8479656112388,
  -941.2605767800529,
  522.9857566947101,
  188.0695888010302,
  288.08908840310215,
  734.4797225998652,
  65.97918206039108],
 'intercept': [151.55991925367033]}

To load the model data, we follow the same process.

In [196]:
# Open the JSON file and parse it into a dict (same content as result above).
# The context manager closes the file once it has been read.
with open('regression.json') as f:
    model_data = json.load(f)
model_data

{'coefficients': [48.99364574197365,
  -259.21566993946186,
  546.1448869918012,
  334.8479656112388,
  -941.2605767800529,
  522.9857566947101,
  188.0695888010302,
  288.08908840310215,
  734.4797225998652,
  65.97918206039108],
 'intercept': [151.55991925367033],
 'type': 'LinearRegression'}

To predict new values with linear regression, we only need a linear combination of the inputs and the coefficients, plus the intercept.
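
Written out for a single sample $x$ with $n$ features, coefficients $w_1, \dots, w_n$ and intercept $b$ (symbols introduced here for illustration only), the prediction is:

$$\hat{y} = \sum_{j=1}^{n} w_j x_j + b$$

For the exported diabetes model above, $n = 10$ and $b \approx 151.56$.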

In [197]:
# A linear regression prediction implemented with NumPy only (no sklearn needed)
def linear_regression_predict(X, model_data):
    coefs = np.asarray(model_data['coefficients'])
    intercepts = np.asarray(model_data['intercept'])
    h = np.dot(X, coefs) + intercepts
    return h

Let us test it. The values match `reg.predict(X_test)`.

In [198]:
linear_regression_predict(X_test, model_data)

array([141.37778758, 180.7255943 , 134.36241958, 293.08707521,
       123.23097217,  94.87463471, 258.19647894, 181.05434575,
        88.84583359, 107.96288979,  95.02533371, 166.97965719,
        53.13754119, 206.14160297, 100.04688003, 130.09140001,
       220.56476013, 251.12055486, 193.64660562, 218.10183003,
       207.07382965,  90.10258179,  73.04159918, 188.20478373,
       155.62378976, 158.18178827, 186.62437427, 177.98242562,
        49.34448643, 108.8547319 , 177.19143043,  86.40053583,
       132.93825524, 183.45477116, 176.95402859, 188.3834104 ,
       123.89790908, 119.36510788, 148.90535094,  60.85929783,
        74.49415162, 108.22594591, 162.71784406, 156.01863679,
       172.08866209,  62.93589272,  72.63730594, 118.15249179,
        52.23777769, 167.50005324, 153.84129513,  62.19915028,
       102.36601143, 111.35188515, 172.55444537, 154.7830712 ,
        96.3971005 , 209.27124268, 120.84600962,  81.30997871,
       188.96205708, 206.45482559, 140.0458645 , 105.76