# Exploring training vs. deployment requirements
> 19-10-2020

In this notebook we illustrate the differences between model training and model deployment in a bit more depth, using a simple logistic regression model as an example.

## The data
We start by generating some data to which we can fit our example logistic regression model. The code below generates 1000 observations according to the following simple model:

$\Pr(y = 1 \mid x) = \frac{1}{1 + e^{-(0.75 + 1.5x_1 - 0.5x_2)}}$.

Thus, we have $\beta_0 = 0.75$, $\beta_1 = 1.5$, and $\beta_2 = -0.5$.
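As a quick check of the formula, the following minimal sketch evaluates the true success probability at a single point using only the standard library. The helper name `true_prob` is ours for illustration and is not used elsewhere in this notebook.

```python
import math

# True coefficients of the data generating process stated above
BETA_0, BETA_1, BETA_2 = 0.75, 1.5, -0.5

def true_prob(x1, x2):
    """Pr(y = 1 | x) under the data generating process (hypothetical helper)."""
    z = BETA_0 + BETA_1 * x1 + BETA_2 * x2
    return 1.0 / (1.0 + math.exp(-z))

# At the origin only the intercept matters
print(round(true_prob(0.0, 0.0), 4))  # → 0.6792
```

Because $\beta_1 > 0$ and $\beta_2 < 0$, the probability increases with $x_1$ and decreases with $x_2$.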

In [33]:
import numpy as np

# Simulate Data Generating Process
n = 1000  # 1000 observations
x1 = np.random.uniform(-2,2,n)  # x_1 & x_2 between -2 and 2
x2 = np.random.uniform(-2,2,n)
p = 1 / (1 + np.exp( -1*(.75 + 1.5*x1 - .5*x2) ))  # Implement DGP

y = np.random.binomial(1, p, n)  # Draw outcomes

# Create dataset and print first few lines:
data = np.column_stack((x1,x2,y))
print(data[:10])

print(data[:,[0,1]])

[[ 0.21699199  1.70626983  0.        ]
 [ 0.33088623  0.31927093  1.        ]
 [ 0.42139769 -0.56196383  1.        ]
 [ 1.62725099  1.67986145  1.        ]
 [-0.0407933  -0.53697263  1.        ]
 [-1.29861553  0.08937192  0.        ]
 [ 0.78383487  1.43907342  1.        ]
 [ 1.27006773  0.05205827  1.        ]
 [-1.03640642 -0.81844398  0.        ]
 [-1.73163517  1.64711652  0.        ]]
[[ 0.21699199  1.70626983]
 [ 0.33088623  0.31927093]
 [ 0.42139769 -0.56196383]
 ...
 [-1.30713503 -0.84853104]
 [-0.35355419  1.11673014]
 [ 0.02514672  0.41085014]]


## Fitting the model
After generating the example data, we can fit the model. We then print the estimated coefficients together with the number of solver iterations (`n_iter_`) to make explicit that training is an iterative procedure over the data.

In [34]:
from sklearn.linear_model import LogisticRegression

# Fit on the two feature columns; the third column holds the binary outcome
mod = LogisticRegression().fit(data[:, [0, 1]], data[:, 2])

In [49]:
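# Collect intercept and slopes in one vector: [beta_0, beta_1, beta_2]
b = np.concatenate((mod.intercept_, mod.coef_.flatten()))

print(b)            # estimated coefficients
print(mod.n_iter_)  # number of solver iterations used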

[ 0.7911692   1.58688224 -0.45307928]
[8]
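To make the iterative nature of training more concrete, here is a minimal sketch that fits the same kind of model with plain gradient descent on the unpenalized log-likelihood. This is not what scikit-learn does internally (the fitted model above uses the `lbfgs` solver with an L2 penalty); the fixed seed, the step size, and the number of iterations are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption made for reproducibility

# Simulate data from the same data generating process as above
n = 1000
X = np.column_stack((np.ones(n), rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)))
true_beta = np.array([0.75, 1.5, -0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

beta = np.zeros(3)   # start from all-zero coefficients
step = 0.5           # hypothetical step size
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))   # current predicted probabilities
    grad = X.T @ (p - y) / n          # gradient of the mean negative log-likelihood
    beta -= step * grad               # every update touches all n observations

print(beta)
```

Each update needs the full dataset, whereas the end result is just three numbers. That asymmetry is the core difference between training and deployment requirements.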


In [51]:
import sys

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size

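# Approximate in-memory sizes in bytes
print(get_size(data))  # full training dataset
print(get_size(mod))   # fitted model object
print(get_size(b))     # coefficient vector only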

24496
2424
216


In [53]:
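import pickle

# Serialize the fitted model to a byte string and report its size
s = pickle.dumps(mod)
print(get_size(s))

# Also write the model to disk; the context manager closes the file handle
with open("model.pickle", "wb") as f:
    pickle.dump(mod, f)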

841
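For comparison, the information strictly needed to generate predictions is only the three coefficients. The sketch below packs them as raw 64-bit floats using the standard library. This is an illustration of the lower bound on size, not a description of how any particular deployment tool stores a model; the coefficient values are copied from the printed output earlier in this notebook.

```python
import struct

# Coefficients as printed earlier in this notebook (intercept, beta_1, beta_2)
b = [0.7911692, 1.58688224, -0.45307928]

# Pack the three values as little-endian 64-bit floats
payload = struct.pack("<3d", *b)
print(len(payload))  # 24 bytes: 3 values x 8 bytes each

# Unpacking recovers the coefficients exactly
restored = list(struct.unpack("<3d", payload))
print(restored == b)
```

The pickled model is considerably larger than this because it also carries the class reference, hyperparameters, and other fitted attributes.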


In [54]:
import sclblpy as sp


*** Thanks for importing sclblpy! ***
You can use the 'upload()' function to upload your models.
To inspect your currently uploaded models, use `endpoints()`.
Check the docs at https://pypi.org/project/sclblpy/ for more info. 



In [55]:
sp.upload(mod)

TypeError: upload() missing 1 required positional argument: 'feature_vector'

In [63]:
# Example feature vector (one value per feature), required by upload()
row = np.array([1, 1])
sp.upload(mod, row)

Your stored user credentials have been removed. 
You will have to re-enter your username and password next time.
Successfully removed your user credentials.
We will simply use LogisticRegression as its name without further documentation.
Please provide your username: maurits.kaptein@scailable.net
Please type your password: ········
Would you like us to store your user credentials (y/n)? y
Your model was successfully uploaded to Scailable!
NOTE: After transpiling, we will send you an email and your model will be available at https://admin.sclbl.net.
Or, alternatively, you can use the 'endpoints()' function to list all your uploaded models. 



True

In [64]:
sp.endpoints()

You currently own the following endpoints:

  1: LogisticRegression, 
   - cfid: 09694abc-1219-11eb-8eec-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=09694abc-1219-11eb-8eec-9600004e79cc&exin=%5B%5B1%2C%201%5D%5D 

  2: Add - for js client, 
   - cfid: 27d21872-c4ff-11ea-816c-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=27d21872-c4ff-11ea-816c-9600004e79cc&exin=%5B1%2C2%2C3%2C4%5D 

  3: Simple linear regression demo, 
   - cfid: e871d8e5-b2e2-11ea-a47d-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=e871d8e5-b2e2-11ea-a47d-9600004e79cc&exin=%5B%5B2%2C%205%5D%5D 

  4: XGBoost breast cancer model, 
   - cfid: 007bdbaa-b093-11ea-a47d-9600004e79cc 
   - example: https://admin.sclbl.net/run.html?cfid=007bdbaa-b093-11ea-a47d-9600004e79cc&exin=%5B%5B17.99%2C%2010.38%2C%20122.8%2C%201001.0%2C%200.1184%2C%200.2776%2C%200.3001%2C%200.1471%2C%200.2419%2C%200.07871%2C%201.095%2C%200.9053%2C%208.589%2C%20153.4%2C%200.006399%2C%200.04904%

[{'agent': 'toolchain',
  'cfid': '09694abc-1219-11eb-8eec-9600004e79cc',
  'created_day': '19 Oct 2020',
  'created_time': '16:40:30',
  'cycles': '1',
  'docs': '-- EMPTY --',
  'exampleinput': '[[1, 1]]',
  'exampleoutput': '[1.0]',
  'filename': 'b54c7068-7453-4444-a802-e41b286d31b6-0733929d-1219-11eb-98cf-9600004e7a82.wasm',
  'id': 192,
  'jwt_secured': False,
  'location': 'https://cdn.sclbl.net:8000/file/09694abc-1219-11eb-8eec-9600004e79cc.wasm',
  'name': 'LogisticRegression',
  'profit': '0.1',
  'updated_day': '19 Oct 2020',
  'updated_time': '16:40:30'},
 {'agent': 'user',
  'cfid': '27d21872-c4ff-11ea-816c-9600004e79cc',
  'created_day': '13 Jul 2020',
  'created_time': '13:51:15',
  'cycles': '1',
  'docs': 'Model add',
  'exampleinput': '[1,2,3,4]',
  'exampleoutput': '',
  'filename': 'sclbl-intsum.wasm',
  'id': 166,
  'jwt_secured': False,
  'location': 'https://cdn.sclbl.net:8000/file/27d21872-c4ff-11ea-816c-9600004e79cc.wasm',
  'name': 'Add - for js client',
  'pr

In [73]:
print(row)
mod.predict_log_proba(row.reshape(1, -1))

[1 1]


array([[-2.0611449 , -0.13617274]])

In [79]:
mod.predict_log_proba(np.array([1,1]).reshape(1, -1))

array([[-2.0611449 , -0.13617274]])
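The same log-probabilities can be reproduced without scikit-learn, using only the fitted coefficients. The sketch below hard-codes the values printed earlier in this notebook, so the result should agree with the output above up to the rounding of those printed coefficients. The function is an illustrative helper, not part of any library.

```python
import math

# Coefficients as printed earlier in this notebook (intercept, beta_1, beta_2)
b = [0.7911692, 1.58688224, -0.45307928]

def predict_log_proba(x1, x2):
    """Return [log Pr(y=0|x), log Pr(y=1|x)] from the coefficients alone (illustrative helper)."""
    z = b[0] + b[1] * x1 + b[2] * x2
    log_p1 = -math.log1p(math.exp(-z))  # log of 1 / (1 + e^{-z})
    log_p0 = -z + log_p1                # log of e^{-z} / (1 + e^{-z})
    return [log_p0, log_p1]

print(predict_log_proba(1, 1))
```

At deployment time, generating a prediction thus requires three stored numbers and a handful of arithmetic operations, with no dependency on the training data or the training library.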

In [74]:
mod.get_params()

{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'auto',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'lbfgs',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}