## Homework

> Note: sometimes your answer might not match one of the options exactly. That's fine. 
Select the option that's closest to your solution.

The goal of this homework is to familiarize you with BentoML and how to build and test an ML production service.

## Background

You are a new recruit at ACME corp. Your manager is emailing you about your first assignment.


## Email from your manager

Good morning, recruit! It's good to have you here! I have an assignment for you. One of our data scientists has built
a credit risk model in a Jupyter notebook. I need you to run the notebook, save the model with BentoML, and see
how big the model is. If it's greater than a certain size, I'm going to have to request additional resources from
our infra team. Please let me know how big it is.

Thanks,

Mr McManager

In [12]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

from sklearn.ensemble import RandomForestClassifier

import xgboost as xgb

import json

In [2]:
import bentoml

## Question 1

* Install BentoML
* What's the version of BentoML you installed?
* Use `--version` to find out
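
Besides the CLI flag, the installed version can also be read from Python without importing the package. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example, and it returns `None` when the package is not installed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: prints the BentoML version, or None if it is not installed
print(installed_version("bentoml"))
```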

In [3]:
version = bentoml.__version__

print(version)

1.0.7


## Question 2

How big approximately is the saved BentoML model? Size can slightly vary depending on your local development environment.
Choose the size closest to your model.

* 924kb
* 724kb
* 114kb
* 8kb
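
One way to cross-check the reported size is to add up the files in the saved model's directory. The sketch below assumes the model is stored as ordinary files under the `path` that BentoML prints when saving; `dir_size_kib` is a hypothetical helper, and the number BentoML reports may be computed slightly differently.

```python
import os


def dir_size_kib(path):
    """Sum the sizes of all files under `path` and return the total in KiB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / 1024
```

Calling `dir_size_kib(...)` with the `path` shown in the `Model(tag=..., path=...)` output should give a figure in the same range as the `Size` column of `bentoml models list`.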

In [4]:
data = 'CreditScoring.csv'
df = pd.read_csv(data)

In [5]:
df.columns = df.columns.str.lower()

status_values = {
    1: 'ok',
    2: 'default',
    0: 'unk'
}

df.status = df.status.map(status_values)

home_values = {
    1: 'rent',
    2: 'owner',
    3: 'private',
    4: 'ignore',
    5: 'parents',
    6: 'other',
    0: 'unk'
}

df.home = df.home.map(home_values)

marital_values = {
    1: 'single',
    2: 'married',
    3: 'widow',
    4: 'separated',
    5: 'divorced',
    0: 'unk'
}

df.marital = df.marital.map(marital_values)

records_values = {
    1: 'no',
    2: 'yes',
    0: 'unk'
}

df.records = df.records.map(records_values)

job_values = {
    1: 'fixed',
    2: 'partime',
    3: 'freelance',
    4: 'others',
    0: 'unk'
}

df.job = df.job.map(job_values)

for c in ['income', 'assets', 'debt']:
    df[c] = df[c].replace(to_replace=99999999, value=np.nan)

df = df[df.status != 'unk'].reset_index(drop=True)

In [6]:
df_train, df_test = train_test_split(df, test_size=0.2, random_state=11)

df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

y_train = (df_train.status == 'default').astype('int').values
y_test = (df_test.status == 'default').astype('int').values

del df_train['status']
del df_test['status']

In [7]:
dv = DictVectorizer(sparse=False)

train_dicts = df_train.fillna(0).to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)

test_dicts = df_test.fillna(0).to_dict(orient='records')
X_test = dv.transform(test_dicts)

### Models

#### 1. Random Forest

In [8]:
rf = RandomForestClassifier(n_estimators=200,
                            max_depth=10,
                            min_samples_leaf=3,
                            random_state=1)
rf.fit(X_train, y_train)

#### 2. XGBoost

In [9]:
dtrain = xgb.DMatrix(X_train, label=y_train)

In [10]:
xgb_params = {
    'eta': 0.1, 
    'max_depth': 3,
    'min_child_weight': 1,

    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=175)

### Using BentoML

In [11]:
bentoml.xgboost.save_model(
    'credit_risk_model',
    model,
    custom_objects={
        'dictVectorizer': dv
    })

Model(tag="credit_risk_model:inkcu3su2kxxiaav", path="/home/igor-diniz/bentoml/models/credit_risk_model/inkcu3su2kxxiaav/")

In [13]:
request = df_test.iloc[0].to_dict()
print(json.dumps(request, indent=2))

{
  "seniority": 3,
  "home": "owner",
  "time": 36,
  "age": 26,
  "marital": "single",
  "records": "no",
  "job": "freelance",
  "expenses": 35,
  "income": 0.0,
  "assets": 60000.0,
  "debt": 3000.0,
  "amount": 800,
  "price": 1000
}


In [14]:
modelBML = bentoml.xgboost.save_model(
    'credit_risk_model',
    model,
    custom_objects={
        'dictVectorizer': dv
    })

In [15]:
modelBML.info.to_dict()

{'name': 'credit_risk_model',
 'version': 'sopdfmsu2kxxiaav',
 'module': 'bentoml.xgboost',
 'labels': {},
 'options': {'model_class': 'Booster'},
 'metadata': {},
 'context': {'framework_name': 'xgboost',
  'framework_versions': {'xgboost': '1.6.2'},
  'bentoml_version': '1.0.7',
  'python_version': '3.10.8'},
 'signatures': {'predict': {'batchable': False}},
 'api_version': 'v2',
 'creation_time': '2022-10-26T02:04:51.484514+00:00'}

In [16]:
modelBML.validate

<bound method Model.validate of Model(tag="credit_risk_model:sopdfmsu2kxxiaav", path="/home/igor-diniz/bentoml/models/credit_risk_model/sopdfmsu2kxxiaav/")>

In [21]:
!bentoml models list

 Tag                           Module           Size        Creation Time
 credit_risk_model:sopdfmsu2…  bentoml.xgboost  197.77 KiB  2022-10-25 23:04:51
 credit_risk_model:inkcu3su2…  bentoml.xgboost  197.77 KiB  2022-10-25 23:02:36


So the saved model is `197.77 KiB`. None of the options matches exactly; the closest one is 114kb.
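
Picking the closest option is a simple distance comparison. A small sketch, assuming the options are in kilobytes and treating KiB and kb as roughly interchangeable at this precision:

```python
# Answer options from Question 2, in kilobytes
options_kb = [924, 724, 114, 8]

# Size reported by `bentoml models list` (KiB, treated as approximately kb here)
measured = 197.77

# Choose the option with the smallest absolute difference from the measurement
closest = min(options_kb, key=lambda option: abs(option - measured))
print(closest)  # 114
```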

In [22]:
NewData = {
            "name": "Tim",
            "age": 37,
            "country": "US",
            "rating": 3.14
          }

In [35]:
!curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel
!bentoml models import coolmodel.bentomodel

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1724  100  1724    0     0   1626      0  0:00:01  0:00:01 --:--:--  1624


In [47]:
modelyaml = {
    'name': 'mlzoomcamp_homework',
    'version': 'qtzdz3slg6mwwdu5',
    'module': 'bentoml.sklearn',
    'labels': {},
    'options': {},
    'metadata': {},
    'context': {
        'framework_name': 'sklearn',
        'framework_versions': {'scikit-learn': '1.1.1'},
        'bentoml_version': '1.0.7',
        'python_version': '3.9.12'},
    'signatures': {'predict': {'batchable': False}},
    'api_version': 'v1',
    'creation_time': '2022-10-13T20:42:14.411084+00:00'}


In [49]:
!pip install locust

Traceback (most recent call last):
  File "/bin/pip", line 11, in <module>
    load_entry_point('pip==20.0.2', 'console_scripts', 'pip')()
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/main.py", line 73, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/usr/lib/python3/dist-packages/pip/_internal/commands/__init__.py", line 96, in create_command
    module = importlib.import_module(module_path)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, 

In [50]:
!curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1728  100  1728    0     0   1653      0  0:00:01  0:00:01 --:--:--  1653


In [51]:
!bentoml models import coolmodel2.bentomodel

Model(tag="mlzoomcamp_homework:jsi67fslz6txydu5") imported


In [62]:
from bentoml.io import JSON


# Fetch the saved model reference (includes its metadata and custom objects)
model_ref = bentoml.xgboost.get('credit_risk_model:latest')
# Load the DictVectorizer that was saved with the model under custom_objects
dv = model_ref.custom_objects['dictVectorizer']
# Create the model runner (runners can be scaled separately from the API server)
model_runner = model_ref.to_runner()
svc = bentoml.Service('credit_risk_classifier', runners=[model_runner])


# Define an endpoint on the BentoML service; input and output are both JSON
@svc.api(input=JSON(), output=JSON())
def classify(application_data):
    # Transform the client data with the DictVectorizer
    vector = dv.transform(application_data)
    # Predict with 'runner.predict.run(input)' instead of 'model.predict'
    prediction = model_runner.predict.run(vector)

    result = prediction[0]  # extract the prediction from the 1D array
    print('Prediction:', result)

    if result > 0.5:
        return {'Status': 'DECLINED'}
    elif result > 0.3:
        return {'Status': 'MAYBE'}
    else:
        return {'Status': 'APPROVED'}
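
The threshold logic in `classify` can be checked on its own, without starting a BentoML service. The sketch below copies the 0.5 and 0.3 cut-offs from the endpoint into a plain function; the name `decision_status` and its parameter names are made up for this example.

```python
def decision_status(probability, decline_above=0.5, review_above=0.3):
    """Map a predicted default probability to a status, using the cut-offs from `classify`."""
    if probability > decline_above:
        return {'Status': 'DECLINED'}
    elif probability > review_above:
        return {'Status': 'MAYBE'}
    return {'Status': 'APPROVED'}


# Try one probability from each band
for p in (0.72, 0.41, 0.05):
    print(p, decision_status(p))
```

Both comparisons are strict, so a probability of exactly 0.5 gives `MAYBE` and exactly 0.3 gives `APPROVED`.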
