# Deployment (Including Serialization)
This notebook walks through the basics of setting up a model to be served from a web server.

In [1]:
%matplotlib inline 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
plt.style.use('ggplot')

We can use the `joblib` library to deserialize the serialized pipeline. However, all of the custom transformer classes must first be loaded into the current scope, or deserialization will fail:

In [2]:
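# Commented out after the first run: without the transformer classes in scope, this raises the error below.
# pipe = joblib.load("train_pipe.joblib")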

AttributeError: module '__main__' has no attribute 'FeatureSelector'
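The failure happens because pickle, which `joblib` builds on, stores a class by reference (its module and name) and not its code. Below is a minimal sketch of that behavior using only the standard library. The module name `demo_transformers` and the stripped-down `FeatureSelector` are hypothetical stand-ins for illustration, not the real pipeline classes.

```python
import pickle
import sys
import types

# Hypothetical stand-in module; the real transformer classes live elsewhere.
demo = types.ModuleType("demo_transformers")
source = (
    "class FeatureSelector:\n"
    "    def __init__(self, feature_names):\n"
    "        self.feature_names = feature_names\n"
)
exec(source, demo.__dict__)
sys.modules["demo_transformers"] = demo

# The payload records only a reference to demo_transformers.FeatureSelector
# plus the instance state; the class body itself is not stored.
payload = pickle.dumps(demo.FeatureSelector(["temp", "hum"]))

# Simulate a session in which the class has not been defined yet.
saved_class = demo.FeatureSelector
del demo.FeatureSelector
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

# Once the class is available again, the same bytes load without trouble.
demo.FeatureSelector = saved_class
restored = pickle.loads(payload)
print(type(error).__name__, restored.feature_names)
```

The same rule applies to the serialized pipeline: every class it references must be importable under the module name recorded at dump time.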

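I've put all the relevant transformers in a separate script called `pipeline.py`, and we can import them all in one go. The star import binds the class names directly in the notebook's `__main__` namespace, which is where the error above shows the serialized pipeline looking for them: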

In [3]:
from pipeline import *

In [4]:
import pipeline
dir(pipeline)

['BaseEstimator',
 'DateTimeExpander',
 'FeatureSelector',
 'FeelsLikeExpander',
 'LagExpander',
 'TargetDropper',
 'Temp',
 'TransformerMixin',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'feels_like',
 'pd']

In [5]:
! cat pipeline.py

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class FeatureSelector(BaseEstimator, TransformerMixin):

    def __init__(self, feature_names, ts_index):
        self.feature_names = feature_names
        self.index = ts_index

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.set_index(pd.to_datetime(X[self.index]))
        return X[self.feature_names]

class DateTimeExpander(BaseEstimator, TransformerMixin):

    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        dts = pd.Series(X.index).dt
        X["dts_month"] = dts.month.values
        X["dts_hour"] = dts.hour.values
        X["dts_day_of_week"] = dts.dayofweek.values

        return X

from meteocalc import Temp, feels_like
class FeelsLikeExpander(BaseEstimator, TransformerMixin):

    def __init__(self, temp_col, hum_col, windspeed_col, atemp_col):
   

Now the pipeline can be deserialized correctly:

In [6]:
pipe = joblib.load("train_pipe.joblib")

We can see that the steps from the pipeline are perfectly preserved:

In [7]:
pipe.steps



[('feat_pipe',
  Pipeline(steps=[('feat_select',
                   FeatureSelector(feature_names=['temp', 'hum', 'windspeed',
                                                  'cnt'],
                                   ts_index=None)),
                  ('feat_dts', DateTimeExpander()),
                  ('feat_feels',
                   FeelsLikeExpander(atemp_col='atemp', hum_col=None,
                                     temp_col=None, windspeed_col=None)),
                  ('feat_lag', LagExpander(lag_col=None)),
                  ('target_dropper', TargetDropper(target_col=None))])),
 ('scaler', MinMaxScaler()),
 ('regressor', LinearRegression())]

Now we can load some data to test the deserialized pipeline. We don't need to worry about a train/test split here; this is just to verify that it works.

In [8]:
dat = pd.read_csv("../data/bike-hour-raw.csv")

Since the scikit-learn APIs are vectorized, we can request and retrieve many predictions at once:

In [13]:
pipe.predict(dat[:10])

array([  6.30861094,  10.18964081,  19.43346503,  33.97426817,
        43.16755554,  37.72667015,  56.25009971,  60.13230485,
        80.01827321, 110.57811815])

When we want to make requests against a webserver, we'll need to *serialize* the data on our end in order to transmit it as a web request.

(Launch the server from the other notebook before running the cells below.)

In [22]:
serialized_input = dat[:10].to_json()
serialized_input

'{"temp":{"0":3.28,"1":2.34,"2":2.34,"3":3.28,"4":3.28,"5":3.28,"6":2.34,"7":1.4,"8":3.28,"9":7.04},"hum":{"0":81.0,"1":80.0,"2":80.0,"3":75.0,"4":75.0,"5":75.0,"6":80.0,"7":86.0,"8":75.0,"9":76.0},"windspeed":{"0":0.0,"1":0.0,"2":0.0,"3":0.0,"4":0.0,"5":6.0032,"6":0.0,"7":0.0,"8":0.0,"9":0.0},"casual":{"0":3,"1":8,"2":5,"3":3,"4":0,"5":0,"6":2,"7":1,"8":1,"9":8},"registered":{"0":13,"1":32,"2":27,"3":10,"4":1,"5":1,"6":0,"7":2,"8":7,"9":6},"cnt":{"0":16,"1":40,"2":32,"3":13,"4":1,"5":1,"6":2,"7":3,"8":8,"9":14},"dtetime":{"0":"2011-01-01 00:00:00","1":"2011-01-01 01:00:00","2":"2011-01-01 02:00:00","3":"2011-01-01 03:00:00","4":"2011-01-01 04:00:00","5":"2011-01-01 05:00:00","6":"2011-01-01 06:00:00","7":"2011-01-01 07:00:00","8":"2011-01-01 08:00:00","9":"2011-01-01 09:00:00"}}'
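The string above is column-oriented: each column name maps to a dictionary of row label to value, and the integer row labels become string keys. As a rough sketch of how a receiver could turn that shape back into rows using only the standard library, the helper `to_records` below is a hypothetical name introduced for illustration, and the payload is a two-row excerpt of the output above.

```python
import json

# Two rows and two columns taken from the serialized output above.
serialized = '{"temp":{"0":3.28,"1":2.34},"hum":{"0":81.0,"1":80.0}}'

def to_records(columns):
    """Turn {column: {row_label: value}} into a list of per-row dicts."""
    row_labels = list(next(iter(columns.values())))
    return [
        {name: values[label] for name, values in columns.items()}
        for label in row_labels
    ]

columns = json.loads(serialized)
records = to_records(columns)
print(records)
```

On the server side, a library such as pandas can rebuild a DataFrame from the same string, but the structure being transmitted is just this nested dictionary.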

With properly serialized data, we can pass the payload as *POST* data inside a request, and our server can pick it up from there.

In [23]:
import requests 
  
url = "http://127.0.0.1:5000"
response = requests.post(url, data={"input": serialized_input})
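The actual server lives in the other notebook and is not shown here. To illustrate the request and response cycle, the sketch below is a hypothetical stand-in built on the standard library only: it reads the form-encoded `input` field, decodes the JSON, and returns a JSON list. `PredictHandler` and `fake_predict` are assumed names, and `fake_predict` simply echoes the `temp` column where a real server would call the pipeline's `predict`.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode
from urllib.request import ProxyHandler, build_opener

def fake_predict(columns):
    # Placeholder for the model: one number per row (here, just the temp value).
    return [float(value) for value in columns["temp"].values()]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the form-encoded body and pull out the "input" field.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        columns = json.loads(form["input"][0])
        body = json.dumps(fake_predict(columns)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging

# Port 0 asks the OS for any free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the same form-encoded POST shape as the request above.
payload = urlencode({"input": '{"temp":{"0":3.28,"1":2.34}}'}).encode("utf-8")
opener = build_opener(ProxyHandler({}))  # bypass any proxy settings for localhost
url = f"http://127.0.0.1:{server.server_port}"
with opener.open(url, data=payload, timeout=5) as resp:
    predictions = json.loads(resp.read())

server.shutdown()
server.server_close()
print(predictions)
```

Passing a dictionary as `data` to `requests.post` produces the same form-encoded body that `parse_qs` decodes here.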

In [24]:
response.ok

True

In [25]:
response.json()

[6.308610938795027,
 10.189640807547036,
 19.433465029011387,
 33.97426816928325,
 43.167555539713504,
 37.72667015287527,
 56.25009970813392,
 60.13230485249136,
 80.0182732113939,
 110.57811814908152]
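As a quick sanity check, the values returned by the server can be compared with the local predictions printed earlier. The sketch below copies both sets of numbers from the outputs in this notebook. The tolerance of `1e-8` is an assumption chosen because the local array is displayed rounded to eight decimal places.

```python
import math

# Local predictions as displayed by pipe.predict (rounded to 8 decimals).
local = [6.30861094, 10.18964081, 19.43346503, 33.97426817, 43.16755554,
         37.72667015, 56.25009971, 60.13230485, 80.01827321, 110.57811815]

# Predictions returned by the server via response.json().
remote = [6.308610938795027, 10.189640807547036, 19.433465029011387,
          33.97426816928325, 43.167555539713504, 37.72667015287527,
          56.25009970813392, 60.13230485249136, 80.0182732113939,
          110.57811814908152]

# Same length, and every pair agrees up to the display rounding.
matches = len(local) == len(remote) and all(
    math.isclose(a, b, abs_tol=1e-8) for a, b in zip(local, remote)
)
print(matches)
```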