# BentoML and Production

In this project, we return to the [Credit Scoring dataset](https://github.com/gastonstat/CreditScoring/) from module 6, but this time the goal is to put our machine learning model into production. We take the XGBoost model with the optimized parameters found at the end of module 6 and use the [BentoML](https://www.bentoml.com/) package to package it for serving.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline

We are going to load a version of the dataset with all the preprocessing already done.

In [2]:
df = pd.read_csv('../data/CreditScoringClean.csv')

df.head()

Unnamed: 0,status,seniority,home,time,age,marital,records,job,expenses,income,assets,debt,amount,price
0,ok,9,rent,60,30,married,no,freelance,73,129.0,0.0,0.0,800,846
1,ok,17,rent,60,58,widow,no,fixed,48,131.0,0.0,0.0,1000,1658
2,default,10,owner,36,46,married,yes,freelance,90,200.0,3000.0,0.0,2000,2985
3,ok,0,rent,60,24,single,no,fixed,63,182.0,2500.0,0.0,900,1325
4,ok,0,rent,36,26,single,no,fixed,46,107.0,0.0,0.0,310,910


## Data Preparation
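The data is split into a full training set (80%) and a test set (20%). Because the hyperparameters were already tuned in module 6, no separate validation set is carved out here, and the model is trained on the full 80%.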

In [3]:
from sklearn.model_selection import train_test_split

df_train_full, df_test = train_test_split(df, test_size=0.2, random_state=11)

print("Size of full set: {}".format(len(df)))
print("Size of train set: {} -> {:0.1f} %".format(len(df_train_full), 100 * len(df_train_full) / len(df)))
print("Size of test set: {} -> {:0.1f} %".format(len(df_test), 100 * len(df_test) / len(df)))

Size of full set: 4454
Size of train set: 3563 -> 80.0 %
Size of test set: 891 -> 20.0 %
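If a separate validation set is also wanted, as in a 60%/20%/20% scheme, one option is to call `train_test_split` a second time on the full training portion. The sketch below uses a stand-in list of row indices instead of the real DataFrame; the names `rows`, `rows_train` and `rows_val` are illustrative, and reusing `random_state=11` for the second split is an assumption.

```python
from sklearn.model_selection import train_test_split

# Stand-in for the 4454 rows of the dataset
rows = list(range(4454))

# First split: 80% full train / 20% test, as in the cell above
rows_train_full, rows_test = train_test_split(rows, test_size=0.2, random_state=11)

# Second split: 25% of the remaining 80% is 20% of the whole dataset
rows_train, rows_val = train_test_split(rows_train_full, test_size=0.25, random_state=11)

print(len(rows_train), len(rows_val), len(rows_test))
```

Taking 25% of the 80% portion is what yields a validation set the same size as the test set.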


In [4]:
from sklearn.feature_extraction import DictVectorizer

# Generating the binary target (True = default)
y_train = (df_train_full.status == 'default').values
y_test = (df_test.status == 'default').values

# Deleting status
del df_train_full['status']
del df_test['status']

# Filling missing values with zeros
dict_train = df_train_full.fillna(0).to_dict(orient='records')
dict_test = df_test.fillna(0).to_dict(orient='records')

# One-hot encoding
dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(dict_train)
X_test = dv.transform(dict_test)

Now, we train the model and test it on unseen data (the test set):

In [5]:
import xgboost as xgb
from sklearn.metrics import roc_auc_score

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

xgb_params = {'eta': 0.1,
              'max_depth': 3,
              'min_child_weight': 1,
              'objective': 'binary:logistic',
              'nthread': 8,
              'seed': 1,
              'eval_metric': 'auc'}

model_xgb_final = xgb.train(xgb_params, 
                            dtrain,
                            num_boost_round=180)

y_pred_train = model_xgb_final.predict(dtrain)
y_pred_test = model_xgb_final.predict(dtest)

xgb_score_train = roc_auc_score(y_train, y_pred_train)
xgb_score_test = roc_auc_score(y_test, y_pred_test)

print("ROC AUC Train = {:0.3f}".format(xgb_score_train))
print("ROC AUC Test = {:0.3f}".format(xgb_score_test))

ROC AUC Train = 0.923
ROC AUC Test = 0.833
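One way to read these scores: ROC AUC is the probability that a randomly chosen defaulting client receives a higher predicted score than a randomly chosen non-defaulting client. The sketch below computes it by brute force over all positive/negative pairs on a made-up example; `roc_auc_pairwise` is a hypothetical helper for illustration, not how `roc_auc_score` is implemented.

```python
def roc_auc_pairwise(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y]
    neg = [s for y, s in zip(y_true, y_score) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy labels and scores (not taken from the credit scoring data)
y_demo = [False, False, True, True]
scores_demo = [0.1, 0.4, 0.35, 0.8]

auc_demo = roc_auc_pairwise(y_demo, scores_demo)
print(auc_demo)  # 3 of the 4 pairs are ranked correctly -> 0.75
```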


## BentoML
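To send the model to production, we first need to save it. For that, we are going to use BentoML. We store the fitted `DictVectorizer` together with the model as a custom object, and declare that `predict` can be batched along the first dimension.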

In [7]:
import bentoml

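bentoml.xgboost.save_model(
    "credit_risk_model",
    model_xgb_final,
    # Keep the fitted DictVectorizer next to the model
    custom_objects={"DictVectorizer": dv},
    # predict accepts batches stacked along dimension 0 (rows)
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)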

Model(tag="credit_risk_model:7apooicuusfmwqen", path="C:\Users\Filipe\bentoml\models\credit_risk_model\7apooicuusfmwqen\")

The call returns a tag that identifies this version of the model and the path where the model is saved.
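The `batchable` and `batch_dim` entries in the signature describe how inputs from several calls may be combined. As a conceptual illustration only (this is not BentoML's internal code), the NumPy sketch below stacks three single-row inputs along dimension 0, scores them in one call, and splits the result back per request. `fake_predict` is a made-up stand-in for the model.

```python
import numpy as np

def fake_predict(batch):
    """Stand-in for a model: returns one score per row."""
    return batch.sum(axis=1)

# Three independent requests, each with shape (1, n_features)
requests = [
    np.array([[1.0, 2.0]]),
    np.array([[3.0, 4.0]]),
    np.array([[5.0, 6.0]]),
]

# batch_dim = 0: concatenate along the row axis into a (3, 2) batch
batch = np.concatenate(requests, axis=0)
scores = fake_predict(batch)

# Split the scores back into one result per request
per_request = np.split(scores, len(requests))
print(per_request)
```

Scoring one larger batch is usually cheaper than scoring each row separately, which is the motivation for marking a signature as batchable.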

Now, we take a sample from the test set to test our service.

In [8]:
import json

request = df_test.iloc[0].to_dict()
print(json.dumps(request, indent=2))

{
  "seniority": 3,
  "home": "owner",
  "time": 36,
  "age": 26,
  "marital": "single",
  "records": "no",
  "job": "freelance",
  "expenses": 35,
  "income": 0.0,
  "assets": 60000.0,
  "debt": 3000.0,
  "amount": 800,
  "price": 1000
}
