# Exploring training vs. deployment requirements
> 20-10-2020

In this notebook we illustrate the differences between model training and model deployment in a bit more depth, using a simple logistic regression model as an example. This notebook accompanies the following Medium post: [Exploiting the differences between model training and prediction](https://medium.com/p/40f087e52923/).

---

> The code accompanying the section on **Model training**

---

### Data generation
We start by generating data to which we can fit our example logistic regression model. The code below generates 1000 observations according to the following simple model:

$\Pr(y = 1 \mid x) = \frac{1}{1 + e^{-(.75 + 1.5x_1 - .5x_2)}}$.

Thus, we have a simple logistic model with two features and parameters $\beta_0 = .75$ (intercept), $\beta_1 = 1.5$, and $\beta_2=-.5$.

In [25]:
import numpy as np
np.random.seed(66)  # Set seed for replication

# Simulate Data Generating Process
n = 1000  # 1000 observations
x1 = np.random.uniform(-2,2,n)  # x_1 & x_2 between -2 and 2
x2 = np.random.uniform(-2,2,n)
p = 1 / (1 + np.exp( -1*(.75 + 1.5*x1 - .5*x2) ))  # Implement DGP

y = np.random.binomial(1, p, n)  # Draw outcomes

# Create dataset and print first few lines:
data = np.column_stack((x1,x2,y))
print(data[:10])

[[-1.38284969  0.11827429  0.        ]
 [-1.46520176  1.97455925  0.        ]
 [-0.54925814 -0.84398759  0.        ]
 [ 0.7164355   1.84642289  0.        ]
 [-1.22219977  0.07416729  0.        ]
 [-0.99515847 -0.08198656  0.        ]
 [ 1.03366557 -0.42144807  1.        ]
 [ 0.23047436  1.66806784  1.        ]
 [ 0.05921167 -0.63964774  1.        ]
 [-0.12880055  1.93380488  1.        ]]


### Model training
After generating the example data, we can fit the model. 

In [26]:
from sklearn.linear_model import LogisticRegression  # import sklearn LogisticRegression

# Fit the model
mod = LogisticRegression().fit(data[:, :2], data[:, 2])  # columns 0-1 are the features, column 2 the outcome

In [27]:
# Print the number of iterations
print(f'The number of iterations is: {mod.n_iter_}.')

The number of iterations is: [7].
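
The iteration count reported above reflects that fitting is an iterative optimisation over the whole dataset. To make that concrete, here is a minimal sketch that fits the same kind of model by plain gradient ascent on the mean log-likelihood. It is an illustration only and not the solver `sklearn` uses (its default is L-BFGS with an L2 penalty); the function name `fit_logistic_gd`, the learning rate, and the number of steps are assumptions chosen for this example.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.5, n_steps=2000):
    """Fit an unregularised logistic regression by gradient ascent (illustrative only)."""
    Xb = np.column_stack((np.ones(len(X)), X))  # prepend a column of ones for the intercept
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        p = 1 / (1 + np.exp(-Xb @ beta))  # current predicted probabilities
        grad = Xb.T @ (y - p) / len(y)    # gradient of the mean log-likelihood
        beta += lr * grad                 # every update touches all observations
    return beta

# Same data generating process and seed as in the data generation cell
np.random.seed(66)
n = 1000
x1 = np.random.uniform(-2, 2, n)
x2 = np.random.uniform(-2, 2, n)
p_true = 1 / (1 + np.exp(-(0.75 + 1.5 * x1 - 0.5 * x2)))
y = np.random.binomial(1, p_true, n)
X = np.column_stack((x1, x2))

beta_hat = fit_logistic_gd(X, y)
print(beta_hat)  # estimates of (beta_0, beta_1, beta_2)
```

Because this sketch applies no regularisation, its estimates need not match the `sklearn` fit exactly. The point is that training needs the full dataset and many passes over it.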


---
> The code accompanying the section on **Generating predictions**
---

### Inspecting the fitted model
We inspect the fitted model parameters: the intercept followed by the two coefficients.

In [28]:
# Retrieve the fitted model parameters:
b = np.concatenate((mod.intercept_, mod.coef_.flatten()))

print(f'Fitted model parameters: {b}.')


Fitted model parameters: [ 0.84576563  1.39541631 -0.47393112].
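
With these parameters in hand, generating a prediction needs only a dot product and the logistic function. The sketch below checks this against `sklearn`'s own `predict_proba`. It regenerates the data and refits the model so that it runs on its own; the helper name `predict_proba_manual` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Regenerate the example data and refit, so that this cell is self-contained
np.random.seed(66)
n = 1000
x1 = np.random.uniform(-2, 2, n)
x2 = np.random.uniform(-2, 2, n)
p = 1 / (1 + np.exp(-(0.75 + 1.5 * x1 - 0.5 * x2)))
y = np.random.binomial(1, p, n)
X = np.column_stack((x1, x2))
mod = LogisticRegression().fit(X, y)
b = np.concatenate((mod.intercept_, mod.coef_.flatten()))

def predict_proba_manual(b, X):
    """Pr(y = 1 | x) computed from the parameter vector alone (hypothetical helper)."""
    return 1 / (1 + np.exp(-(b[0] + X @ b[1:])))

manual = predict_proba_manual(b, X)
reference = mod.predict_proba(X)[:, 1]  # probability of class 1 according to sklearn
print(np.allclose(manual, reference))
```

Neither the training data nor the fitting code is needed at this stage: the three numbers in `b` are all that a deployment has to carry.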


---
> The code accompanying the section on **Exploiting the differences for (edge) deployment**
---
### Inspecting the sizes of various objects
Here we inspect the sizes, in bytes, of several of the generated objects. Note that we use the `get_size` function [introduced in this post](https://goshippo.com/blog/measure-real-size-any-python-object/).

In [29]:
import sys

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size


print(f'Size of the dataset: {get_size(data)}.')
print(f'Size of the model: {get_size(mod)}.')
print(f'Size of the coefficients: {get_size(b)}.')

Size of the dataset: 24496.
Size of the model: 2424.
Size of the coefficients: 216.
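
Most of the 216 bytes reported for the coefficients is Python and NumPy object overhead: the three `float64` values themselves occupy only 24 bytes (`b.nbytes`). As a rough sketch of how small the payload can get, the snippet below stores the parameters as raw single-precision bytes. The parameter values are copied from the output printed earlier, and the choice of `float32` is an assumption about the precision a deployment would tolerate.

```python
import numpy as np

# Fitted parameters as printed earlier in this notebook
b = np.array([0.84576563, 1.39541631, -0.47393112])

# Assumption: single precision is accurate enough for the deployed model
raw = b.astype(np.float32).tobytes()
print(len(raw))  # 3 parameters x 4 bytes each

# Restoring the parameters from the raw bytes
b_restored = np.frombuffer(raw, dtype=np.float32)
print(np.max(np.abs(b_restored - b)))  # rounding error introduced by float32
```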


In [30]:
# Serialize the model to a bytes object
import pickle
s = pickle.dumps(mod)

# Write the pickled model to disk; the context manager closes the file
with open("model.pickle", "wb") as f:
    pickle.dump(mod, f)

print(f'Size of the pickled model object: {get_size(s)}.')

Size of the pickled model object: 841.
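
The pickled bytes are small, but they are only useful on a machine that can unpickle them. The sketch below round-trips a model through `pickle` to show what the loading side needs. It fits a tiny stand-in model on toy data (an assumption made so the cell runs on its own), not the model from this notebook.

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small stand-in model on toy data, so that this cell is self-contained
X = np.array([[-1.0, 0.5], [-0.5, 1.0], [0.5, -1.0], [1.0, 0.0]])
y = np.array([0, 0, 1, 1])
mod = LogisticRegression().fit(X, y)

s = pickle.dumps(mod)
restored = pickle.loads(s)  # works only where scikit-learn can be imported

row = np.array([[1, 1]])
print(mod.decision_function(row), restored.decision_function(row))
```

Unpickling therefore pulls in the Python runtime, NumPy, and scikit-learn on the target device, regardless of how few bytes the pickle itself takes. As with any pickle, only load data from a trusted source.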


### Converting the model to WebAssembly
The code below converts the fitted model to a WebAssembly binary using the [sclblpy](https://pypi.org/project/sclblpy/) package.

In [31]:
import sclblpy as sp

In [32]:
row = np.array([1,1])  # Example feature vector

# Upload the model to convert to .WASM (note, no documentation uploaded)
sp.upload(mod, row)

We will simply use LogisticRegression as its name without further documentation.
Your model was successfully uploaded to Scailable!
NOTE: After transpiling, we will send you an email and your model will be available at https://admin.sclbl.net.
Or, alternatively, you can use the 'endpoints()' function to list all your uploaded models. 



True

In [33]:
# After uploading, inspect the results
ep = sp.endpoints()

You currently own the following endpoints:

  1: LogisticRegression, 
   - cfid: 9cd6ecd0-12e3-11eb-8eec-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=9cd6ecd0-12e3-11eb-8eec-9600004e79cc&exin=%5B%5B1%2C%201%5D%5D 

  2: Add - for js client, 
   - cfid: 27d21872-c4ff-11ea-816c-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=27d21872-c4ff-11ea-816c-9600004e79cc&exin=%5B1%2C2%2C3%2C4%5D 

  3: Simple linear regression demo, 
   - cfid: e871d8e5-b2e2-11ea-a47d-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=e871d8e5-b2e2-11ea-a47d-9600004e79cc&exin=%5B%5B2%2C%205%5D%5D 

  4: XGBoost breast cancer model, 
   - cfid: 007bdbaa-b093-11ea-a47d-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=007bdbaa-b093-11ea-a47d-9600004e79cc&exin=%5B%5B17.99%2C%2010.38%2C%20122.8%2C%201001.0%2C%200.1184%2C%200.2776%2C%200.3001%2C%200.1471%2C%200.2419%2C%200.07871%2C%201.095%2C%200.9053%2C%208.589%2C%20153.4%2C%200.006399%2C%200.04904%

In [34]:
# Compute the decision function locally for comparison
result = mod.decision_function(row.reshape(1,-1))
print(result)  # The Scailable .WASM returns this same value

[1.76725083]


> **Note:** The prediction is not the class label but rather the output of the `decision_function()`. See the [`sklearn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.decision_function) for details. If the output is $>0$, the label $1$ is predicted.
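
As a small worked example of the note above, the decision value printed for the feature vector `[1, 1]` can be turned into a class label and a probability on the consuming side. The variable names are illustrative.

```python
import numpy as np

decision = 1.76725083  # decision_function output for [1, 1], as printed above

label = int(decision > 0)                          # label 1 if the decision value is positive
probability = float(1 / (1 + np.exp(-decision)))   # logistic function gives Pr(y = 1 | x)

print(label, round(probability, 3))
```

Here the decision value is positive, so the predicted label is 1, with a probability of roughly 0.85.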

### Consuming the REST endpoint
To wrap up: the REST endpoint running the deployed WebAssembly model can be tested [here](https://admin.sclbl.net/run.html?cfid=9cd6ecd0-12e3-11eb-8eec-9600004e79cc&exin=%5B%5B1%2C%201%5D%5D). The generated .WASM executable can be downloaded [here](https://cdn.sclbl.net:8000/file/9cd6ecd0-12e3-11eb-8eec-9600004e79cc.wasm).