# MLZoomcamp 2022 - Session #7 - Homework

Author: José Victor

* Goal: Get familiar with BentoML and learn how to build and test an ML production service.

* Dataset: [Credit Risk](https://raw.githubusercontent.com/gastonstat/CreditScoring/master/CreditScoring.csv)

In [2]:
%cd ..
%cd data

/home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/07-bento-production
/home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/07-bento-production/data


In [3]:
!wget https://raw.githubusercontent.com/gastonstat/CreditScoring/master/CreditScoring.csv

--2022-10-17 22:41:18--  https://raw.githubusercontent.com/gastonstat/CreditScoring/master/CreditScoring.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 182489 (178K) [text/plain]
Saving to: ‘CreditScoring.csv.1’


2022-10-17 22:41:19 (737 KB/s) - ‘CreditScoring.csv.1’ saved [182489/182489]



## Background

You are a new recruit at ACME corp. Your manager is emailing you about your first assignment.

## Email from your manager

Good morning, recruit! It's good to have you here! I have an assignment for you. I have a data scientist who built a credit risk model in a Jupyter notebook. I need you to run the notebook, save the model with BentoML and see how big the model is. If it's greater than a certain size, I'm going to have to request additional resources from our infra team. Please let me know how big it is.

Thanks,

Mr McManager

## Imports

In [4]:
import bentoml
import numpy as np
import pandas as pd
import xgboost as xgb

from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

## Preparing and cleaning data

In [8]:
df = pd.read_csv("CreditScoring.csv")
df.head()

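| | Status | Seniority | Home | Time | Age | Marital | Records | Job | Expenses | Income | Assets | Debt | Amount | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 9 | 1 | 60 | 30 | 2 | 1 | 3 | 73 | 129 | 0 | 0 | 800 | 846 |
| 1 | 1 | 17 | 1 | 60 | 58 | 3 | 1 | 1 | 48 | 131 | 0 | 0 | 1000 | 1658 |
| 2 | 2 | 10 | 2 | 36 | 46 | 2 | 2 | 3 | 90 | 200 | 3000 | 0 | 2000 | 2985 |
| 3 | 1 | 0 | 1 | 60 | 24 | 1 | 1 | 1 | 63 | 182 | 2500 | 0 | 900 | 1325 |
| 4 | 1 | 0 | 1 | 36 | 26 | 1 | 1 | 1 | 46 | 107 | 0 | 0 | 310 | 910 |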


In [9]:
df.columns = df.columns.str.lower()

In [10]:
status_values = {1: "ok",
                 2: "default",
                 0: "unk"}

home_values = {1: "rent",
               2: "owner",
               3: "private",
               4: "ignore",
               5: "parents",
               6: "other",
               0: "unk"}

marital_values = {1: "single",
                  2: "married",
                  3: "widow",
                  4: "separated",
                  5: "divorced",
                  0: "unk"}

records_values = {1: "no",
                  2: "yes",
                  0: "unk"}

job_values = {1: "fixed",
              2: "partime",
              3: "freelance",
              4: "others",
              0: "unk"}

In [11]:
df.status = df.status.map(status_values)
df.home = df.home.map(home_values)
df.marital = df.marital.map(marital_values)
df.records = df.records.map(records_values)
df.job = df.job.map(job_values)
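`Series.map` with a dict does not raise on unknown codes: any value that is not a key becomes `NaN`. A minimal sketch on a toy column (the code `9` is a hypothetical unmapped value, not taken from the dataset) shows why a quick `isna()` check after mapping is worthwhile:

```python
import pandas as pd

# Toy column of status codes; 9 is deliberately missing from the mapping
codes = pd.Series([1, 2, 0, 9])
status_values = {1: "ok", 2: "default", 0: "unk"}

labels = codes.map(status_values)

# Known codes become labels; the unmapped code silently turns into NaN
print(labels.tolist())
print(labels.isna().sum(), "unmapped value(s)")
```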

In [12]:
df.head()

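| | status | seniority | home | time | age | marital | records | job | expenses | income | assets | debt | amount | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ok | 9 | rent | 60 | 30 | married | no | freelance | 73 | 129 | 0 | 0 | 800 | 846 |
| 1 | ok | 17 | rent | 60 | 58 | widow | no | fixed | 48 | 131 | 0 | 0 | 1000 | 1658 |
| 2 | default | 10 | owner | 36 | 46 | married | yes | freelance | 90 | 200 | 3000 | 0 | 2000 | 2985 |
| 3 | ok | 0 | rent | 60 | 24 | single | no | fixed | 63 | 182 | 2500 | 0 | 900 | 1325 |
| 4 | ok | 0 | rent | 36 | 26 | single | no | fixed | 46 | 107 | 0 | 0 | 310 | 910 |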


In [13]:
df.describe().round()

| | seniority | time | age | expenses | income | assets | debt | amount | price |
|---|---|---|---|---|---|---|---|---|---|
| count | 4455.0 | 4455.0 | 4455.0 | 4455.0 | 4455.0 | 4455.0 | 4455.0 | 4455.0 | 4455.0 |
| mean | 8.0 | 46.0 | 37.0 | 56.0 | 763317.0 | 1060341.0 | 404382.0 | 1039.0 | 1463.0 |
| std | 8.0 | 15.0 | 11.0 | 20.0 | 8703625.0 | 10217569.0 | 6344253.0 | 475.0 | 628.0 |
| min | 0.0 | 6.0 | 18.0 | 35.0 | 0.0 | 0.0 | 0.0 | 100.0 | 105.0 |
| 25% | 2.0 | 36.0 | 28.0 | 35.0 | 80.0 | 0.0 | 0.0 | 700.0 | 1118.0 |
| 50% | 5.0 | 48.0 | 36.0 | 51.0 | 120.0 | 3500.0 | 0.0 | 1000.0 | 1400.0 |
| 75% | 12.0 | 60.0 | 45.0 | 72.0 | 166.0 | 6000.0 | 0.0 | 1300.0 | 1692.0 |
| max | 48.0 | 72.0 | 68.0 | 180.0 | 99999999.0 | 99999999.0 | 99999999.0 | 5000.0 | 11140.0 |


In [14]:
for c in ['income', 'assets', 'debt']:
    df[c] = df[c].replace(to_replace=99999999, value=np.nan)
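The `describe()` output above shows `99999999` as the maximum of `income`, `assets` and `debt`: a placeholder for missing values that drags the income mean up to about 763317 while the median is 120. The toy series below (made-up values) sketches why replacing the sentinel with `NaN` helps: pandas skips `NaN` when computing statistics.

```python
import numpy as np
import pandas as pd

# Made-up incomes; 99999999 plays the role of the dataset's "missing" placeholder
income = pd.Series([80, 120, 99999999, 166])

# Same call as in the cell above, applied to one toy column
cleaned = income.replace(to_replace=99999999, value=np.nan)

# The sentinel dominates the raw mean; the cleaned mean ignores the NaN
print(income.mean(), cleaned.mean())
```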

In [15]:
df = df[df.status != 'unk'].reset_index(drop=True)

In [16]:
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=11)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=11)
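The two calls yield a 60/20/20 split: the second `test_size=0.25` is taken from the 80% full-train part, and 0.8 × 0.25 = 0.2. A quick check on 1,000 dummy row indices (a stand-in for the real DataFrame):

```python
from sklearn.model_selection import train_test_split

rows = list(range(1000))  # stand-in for 1,000 DataFrame rows

# Same parameters as the cell above
full_train, test = train_test_split(rows, test_size=0.2, random_state=11)
train, val = train_test_split(full_train, test_size=0.25, random_state=11)

print(len(train), len(val), len(test))  # 600 200 200
```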

In [20]:
df_full_train = df_full_train.reset_index(drop=True)

In [21]:
y_full_train = (df_full_train.status == 'default').astype(int).values

In [22]:
del df_full_train['status']

In [23]:
dicts_full_train = df_full_train.to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X_full_train = dv.fit_transform(dicts_full_train)

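# Separate the target from the test features, as done for the full train set
y_test = (df_test.status == 'default').astype(int).values
dicts_test = df_test.drop(columns=['status']).to_dict(orient='records')
X_test = dv.transform(dicts_test)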

In [24]:
# get_feature_names() is deprecated since scikit-learn 1.0 (removed in 1.2)
features = list(dv.get_feature_names_out())

dfulltrain = xgb.DMatrix(X_full_train, label=y_full_train,
                         feature_names=features)

dtest = xgb.DMatrix(X_test, feature_names=features)



## Question 1

* Install BentoML
* What's the version of BentoML you installed?
* Use `--version` to find out

In [5]:
!pip install -q bentoml

In [6]:
!bentoml --version

bentoml, version 1.0.7


## Question 2

Run the notebook from module 6 and save the credit risk model with BentoML.

Approximately how big is the saved BentoML model?

* ( ) 924kb
* ( ) 724kb
* (X) 114kb
* ( ) 8kb

In [25]:
xgb_params = {
    'eta': 0.1, 
    'max_depth': 3,
    'min_child_weight': 1,

    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dfulltrain, num_boost_round=175)

In [28]:
bentoml.xgboost.save_model("credit_risk_model", 
                           model=model,
                           custom_objects={"dictVectorizer": dv})

In [34]:
!bentoml models list

[1m [0m[1mTag                         [0m[1m [0m[1m [0m[1mModule         [0m[1m [0m[1m [0m[1mSize      [0m[1m [0m[1m [0m[1mCreation Time      [0m[1m [0m
 credit_risk_model:pyte4nsor…  bentoml.xgboost  197.00 KiB  2022-10-17 23:13:44 
 credit_risk_model:npfxcnsor…  bentoml.xgboost  197.00 KiB  2022-10-17 23:06:04 
 credit_risk_model:6zdegqsoq…  bentoml.xgboost  196.30 KiB  2022-10-17 22:55:37 


## Another email from your manager

Great job, recruit! Looks like I won't have to go back to the procurement team. Thanks for the information.

However, I just got word from one of the teams using one of our ML services: they're saying our service is "broken" and they're trying to blame our model. I looked at the data they're sending and it's completely bogus. I don't want them to send bad data to us and blame our models. Could you write a Pydantic schema for the data they should be sending? That way, next time it will tell them it's their data that's bad and not our model.

Thanks,

Mr McManager

## Question 3

Say you have the following data that you're sending to your service:

```json
{
    "name": "Tim",
    "age": 37,
    "country": "US",
    "rating": 3.14
}
```

What would the Pydantic class look like? You can name the class `UserProfile`.

In [1]:
!pip install pydantic

Collecting pydantic
  Using cached pydantic-1.10.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.6 MB)
Installing collected packages: pydantic
Successfully installed pydantic-1.10.2


In [2]:
from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float
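A quick way to see the schema doing its job: valid data is parsed (and coerced to the declared types), while bogus data raises a `ValidationError` that names each offending field. In a BentoML 1.0 service, the same class can be passed to the input descriptor as `bentoml.io.JSON(pydantic_model=UserProfile)`, so bad requests should be rejected before they reach the model. The helper `check_payload` below is hypothetical:

```python
from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float


def check_payload(payload):
    """Return (profile, bad_fields); bad_fields lists the fields that failed validation."""
    try:
        return UserProfile(**payload), []
    except ValidationError as err:
        return None, [str(e["loc"][0]) for e in err.errors()]


good, good_errors = check_payload({"name": "Tim", "age": 37, "country": "US", "rating": 3.14})
# "age" is not an integer and "rating" is missing
bad, bad_errors = check_payload({"name": "Tim", "age": "thirty-seven", "country": "US"})

print(good)
print(bad_errors)
```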

## Email from your CEO

Good morning! I hear you're the one to go to if I need something done well! We've got a new model that a big client needs deployed ASAP. I need you to build a service with it and test it against the old model and make sure that it performs better, otherwise we're going to lose this client. All our hopes are with you!

Thanks,

CEO of Acme Corp

## Question 4

We've prepared a model for you that you can import using:

```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel
bentoml models import coolmodel.bentomodel
```

What version of scikit-learn was this model trained with?

* (X) 1.1.1
* ( ) 1.1.2
* ( ) 1.1.3
* ( ) 1.1.4
* ( ) 1.1.5

In [36]:
%cd ..
!curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel

/home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/07-bento-production
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1724  100  1724    0     0   1748      0 --:--:-- --:--:-- --:--:--  1746


In [44]:
%cd ~/bentoml/models

/home/jvictor/bentoml/models


In [None]:
!bentoml models import coolmodel.bentomodel

In [45]:
%ls

coolmodel.bentomodel  [0m[01;34mcredit_risk_model[0m/  [01;34mmlzoomcamp_homework[0m/


In [46]:
!bentoml models get mlzoomcamp_homework:latest

[91;40mname[0m[97;40m:[0m[97;40m [0m[40mmlzoomcamp_homework[0m[40m                                                       [0m
[91;40mversion[0m[97;40m:[0m[97;40m [0m[40mqtzdz3slg6mwwdu5[0m[40m                                                       [0m
[91;40mmodule[0m[97;40m:[0m[97;40m [0m[40mbentoml.sklearn[0m[40m                                                         [0m
[91;40mlabels[0m[97;40m:[0m[97;40m [0m[40m{[0m[40m}[0m[40m                                                                      [0m
[91;40moptions[0m[97;40m:[0m[97;40m [0m[40m{[0m[40m}[0m[40m                                                                     [0m
[91;40mmetadata[0m[97;40m:[0m[97;40m [0m[40m{[0m[40m}[0m[40m                                                                    [0m
[91;40mcontext[0m[97;40m:[0m[40m                                                                        [0m
[97;40m  [0m[91;40mframework_name[0m[97;40m:

## Question 5

Create a bento out of this scikit-learn model. This will require installing scikit-learn like this:

```bash
pip install scikit-learn
```

Hint: the output type for this endpoint should be `bentoml.io.NumpyNdarray()`.

Send this array to the bento:

```python
[[6.4, 3.5, 4.5, 1.2]]
```

You can use curl or the Swagger UI. What value does it return?

* ( ) 0
* (X) 1
* ( ) 2
* ( ) 3

In [21]:
!curl -X POST -H "Content-Type: application/json" --data "[[6.4, 3.5, 4.5, 1.2]]" http://0.0.0.0:3000/classify

[1]
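The same request can be sent from Python. Here is a minimal stdlib-only sketch, assuming the bento is served locally on BentoML's default port 3000 with the endpoint named `classify` as in the curl call above; the helper names are hypothetical:

```python
import json
import urllib.request

# Assumed address: local `bentoml serve` on port 3000, endpoint "classify"
DEFAULT_URL = "http://localhost:3000/classify"


def build_classify_request(features, url=DEFAULT_URL):
    """Build a POST request whose JSON body is the nested list of feature rows."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(features, url=DEFAULT_URL, timeout=10):
    """Send the request and decode the JSON response (a list of predicted classes)."""
    with urllib.request.urlopen(build_classify_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `classify([[6.4, 3.5, 4.5, 1.2]])` should return the same `[1]` as the curl call. Building the request separately from sending it keeps the payload easy to inspect.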

## Question 6

Make sure to serve your bento with `--production` for this question.

Install locust using:
```bash
pip install locust
```
Use the following locust file: [locustfile.py](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/07-bento-production/locustfile.py)

Make sure it points at your bento's endpoint (in case you didn't name your endpoint "classify").

![image](https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/07-bento-production/resources/classify-endpoint.png)

Configure 100 users with a ramp-up of 10 users per second. Click "Start Swarming" and make sure it is working.

Now download a second model with this command:
```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel
```
Or you can download with this link as well: [https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel](https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel)

Now import the model:
```bash
bentoml models import coolmodel2.bentomodel
```
Update your bento's runner tag and test with both models. Which model allows more traffic (more throughput) as you ramp up the traffic? Remember to stop and restart your bento service when changing the model tag: use Ctrl-C to stop the service, then run `bentoml serve --production` again.

Test the first model and the second model: which one performs better at higher volumes?

* ( ) The first model
* (X) The second model
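Throughput here means completed requests per second. Locust measures it for you; as a rough illustration of the idea (not a replacement for the locustfile), the hypothetical helper below fires `n_requests` calls across `n_workers` threads and divides the successes by the wall-clock time. `send_one` is an assumption: any zero-argument function that posts one request to the bento, for example one built on `urllib.request`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(send_one, n_requests=200, n_workers=10):
    """Call send_one() n_requests times on n_workers threads.

    Returns (succeeded, failed, successful_requests_per_second).
    """
    start = time.perf_counter()
    succeeded = failed = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(send_one) for _ in range(n_requests)]
        for future in futures:
            try:
                future.result()  # re-raises any exception from send_one
                succeeded += 1
            except Exception:
                failed += 1
    elapsed = time.perf_counter() - start
    rate = succeeded / elapsed if elapsed > 0 else float("inf")
    return succeeded, failed, rate
```

Running it once per model tag with the same `n_workers` gives a crude comparison; Locust's ramp-up additionally shows at which user count each model starts to struggle.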

In [16]:
%cd ~/bentoml/models

/home/jvictor/bentoml/models


In [18]:
!curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1728  100  1728    0     0   1836      0 --:--:-- --:--:-- --:--:--  1834


In [19]:
!bentoml models import coolmodel2.bentomodel

Model(tag="mlzoomcamp_homework:jsi67fslz6txydu5") imported


## Email from marketing

Hello ML person! I hope this email finds you well. I've heard there's this cool new ML model called Stable Diffusion. I hear if you give it a description of a picture it will generate an image. We need a new company logo and I want it to be fierce but also cool, think you could help out?

Thanks,

Mike Marketer

## Question 7 (optional)

Go to this Bento deployment of Stable Diffusion: [http://54.176.205.174/](http://54.176.205.174/) (or deploy it yourself)

Use the txt2image endpoint and update the prompt to: "A cartoon dragon with sunglasses". Don't change the seed; it should be 0 by default.

What is the resulting image?

#1

![image](https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/07-bento-production/resources/dragon1.jpeg)

#2

![image](https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/07-bento-production/resources/dragon2.jpeg)

#3 (X)

![image](https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/07-bento-production/resources/dragon3.jpeg)

#4

![image](https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/07-bento-production/resources/dragon4.jpeg)