## Using BentoML with scikit-learn

In this notebook we will wrap a Logistic Regression classifier (trained on the Titanic dataset) in a REST API endpoint using `bentoml`.

BentoML makes this much easier. For details on `BentoML`, head over to https://github.com/bentoml/BentoML.

In [19]:
# Import dependencies
import pandas as pd
import numpy as np
import warnings

warnings.filterwarnings("ignore")

In [20]:
# Load the dataset in a DataFrame object
url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"
df = pd.read_csv(url)

include = ['Age', 'Sex', 'Embarked', 'Survived'] # Three features plus the target
df_ = df[include].copy() # Copy so that later edits do not act on a view of df

In [21]:
# Data preprocessing: replace missing numeric values with 0 and one-hot encode the categoricals
categoricals = []
for col, col_type in df_.dtypes.items():  # iteritems() was removed in pandas 2.0
    if col_type == 'O':
        categoricals.append(col)
    else:
        # Assign the result back instead of filling in place on a column selection
        df_[col] = df_[col].fillna(0)

df_ohe = pd.get_dummies(df_, columns=categoricals, dummy_na=True)
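To see what this preprocessing produces, here is a small sketch on a made-up three-row frame (the rows are hypothetical, not taken from the Titanic data). It fills the missing `Age` with 0 and shows that `dummy_na=True` adds a `_nan` indicator column for each categorical.

```python
import pandas as pd

# Toy frame with one missing value per column (made-up rows, not from the dataset)
toy = pd.DataFrame({
    "Age": [22.0, None, 35.0],
    "Sex": ["male", "female", None],
    "Embarked": ["S", None, "C"],
})

# Numeric column: replace the missing value with 0
toy["Age"] = toy["Age"].fillna(0)

# Categorical columns: one indicator column per category, plus one for missing values
encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"], dummy_na=True)
print(list(encoded.columns))
```

Note that the toy frame never contains `Embarked == "Q"`, so no `Embarked_Q` column appears. The set of dummy columns depends on the values present in the data being encoded.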

In [None]:
# Ready to call lr.fit() :P
from sklearn.linear_model import LogisticRegression

dependent_variable = 'Survived'
x = df_ohe[df_ohe.columns.difference([dependent_variable])]
y = df_ohe[dependent_variable]

lr = LogisticRegression()
lr.fit(x, y)

In [35]:
%%writefile passenger_classifier.py
from bentoml import BentoService, api, env, artifacts
from bentoml.artifact import PickleArtifact
from bentoml.handlers import JsonHandler

# The service file runs on its own, so import pandas explicitly here
import pandas as pd

# You can also import your own python module here and BentoML will automatically
# figure out the dependency chain and package all those python modules

@artifacts([PickleArtifact('model')])
@env(conda_pip_dependencies=["scikit-learn"])
class PassengerClassifier(BentoService):
    
    @api(JsonHandler)
    def predict(self, json):
        # Arbitrary preprocessing or feature fetching code can be placed here 
        query_df = pd.DataFrame(json)
        query = pd.get_dummies(query_df)
        column_names = ['Age',
                     'Embarked_C',
                     'Embarked_Q',
                     'Embarked_S',
                     'Embarked_nan',
                     'Sex_female',
                     'Sex_male',
                     'Sex_nan']
        query = query.reindex(columns=column_names, fill_value=0)
        prediction = self.artifacts.model.predict(query)
    
        return prediction

Overwriting passenger_classifier.py
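The `reindex` call in `predict` matters because a request rarely contains every category seen during training. The sketch below uses a single hypothetical passenger and shows how the encoded query is aligned to the eight training columns, with missing indicators filled with 0. The column list is the alphabetical order that `columns.difference` produced when the model was fitted.

```python
import pandas as pd

# Same column list as in the service definition
column_names = ['Age',
                'Embarked_C',
                'Embarked_Q',
                'Embarked_S',
                'Embarked_nan',
                'Sex_female',
                'Sex_male',
                'Sex_nan']

# Hypothetical single-passenger payload
query_df = pd.DataFrame([{"Age": 30, "Sex": "female", "Embarked": "S"}])

# Encoding the query alone only yields indicators for the values it contains
query = pd.get_dummies(query_df)
print(list(query.columns))

# Align to the training layout; absent indicator columns become 0
aligned = query.reindex(columns=column_names, fill_value=0)
print(aligned.iloc[0].to_dict())
```

Any extra column in the query that is not in `column_names` is dropped by `reindex`, so an unseen category would be silently ignored rather than raising an error.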


In [36]:
# 1) import the custom BentoService defined above
from passenger_classifier import PassengerClassifier

# 2) `pack` it with required artifacts
svc = PassengerClassifier.pack(model=lr)

# 3) save packed BentoService as archive
svc.save('./bento_archive', version='v0.0.1')
# save() returns the path the archive was written to

'./bento_archive/PassengerClassifier/0.0.v0.0.1'

When you run `bentoml serve ./bento_archive/PassengerClassifier/0.0.v0.0.1/`, you should see output similar to the following:
```
* Serving Flask app "PassengerClassifier" (lazy loading)
* Environment: production
  WARNING: Do not use the development server in a production environment.
  Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
```

You can now use an API client such as Postman to test your API endpoint. Send a request to the endpoint with the following configuration in Postman:

![](https://i.ibb.co/R21pF1f/Screen-Shot-2019-04-19-at-12-01-02-PM.png)

When you hit the `Send` button, you should get a response like the following:
```
[
    0,
    1,
    0,
    0
]
```
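The same kind of request can also be sent from Python instead of Postman. The following is a minimal sketch using only the standard library. The `/predict` path (taken from the API method name), the passenger records, and the helper names `build_request` and `send` are assumptions for illustration. The exact payload used in the screenshot above is not reproduced here.

```python
import json
import urllib.request

# Assumed endpoint: host and port from the server log, path from the API method name
URL = "http://127.0.0.1:5000/predict"

# Hypothetical passengers; the keys match the raw columns used for training
passengers = [
    {"Age": 85, "Sex": "male", "Embarked": "S"},
    {"Age": 24, "Sex": "female", "Embarked": "C"},
]

def build_request(records, url=URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(records, url=URL):
    """Send the request and decode the JSON response (server must be running)."""
    with urllib.request.urlopen(build_request(records, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Uncomment once `bentoml serve` is running:
# print(send(passengers))
```

With the server running, the response should be a list of 0/1 values, one per record sent.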

Feel free to map these integer values (0 = did not survive, 1 = survived) to more meaningful messages.
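One simple way to do that mapping is a lookup dictionary. In this sketch the message wording and the helper name `describe` are made up for illustration. Only the 0/1 encoding of `Survived` comes from the dataset.

```python
# Assumed message text; only the 0/1 encoding comes from the dataset
LABELS = {0: "did not survive", 1: "survived"}

def describe(predictions):
    """Translate a list of 0/1 predictions into readable messages."""
    return [LABELS[int(p)] for p in predictions]

response = [0, 1, 0, 0]  # the sample response shown above
messages = describe(response)
print(messages)
```

The same lookup could be placed inside `predict` before returning, so that the API itself responds with messages instead of integers.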