## Question 1

* Install BentoML
* What's the version of BentoML you installed?
* Use `--version` to find out

In [3]:
!pip list | grep "bentoml"

bentoml                                      1.0.7


## Ans1: 1.0.7
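The same check can be done from Python without shelling out. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of BentoML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed BentoML version string, or None when BentoML is absent.
print(installed_version("bentoml"))
```

Returning `None` instead of raising keeps the check usable in a notebook cell even before BentoML is installed.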

## Question 2

Run the notebook that contains the XGBoost model from module 6 (the previous module) and save the model with BentoML. To make it easier, we have prepared this [notebook](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/07-bentoml-production/code/train.ipynb).


Approximately how big is the saved BentoML model? The size can vary slightly depending on your local development environment.
Choose the size closest to your model.

* 924kb
* 724kb
* 114kb
* 8kb

In [36]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

from sklearn.ensemble import RandomForestClassifier

import xgboost as xgb

In [21]:
!wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-06-trees/CreditScoring.csv

--2022-10-26 13:59:39--  https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-06-trees/CreditScoring.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 182489 (178K) [text/plain]
Saving to: ‘CreditScoring.csv.2’


2022-10-26 13:59:41 (250 KB/s) - ‘CreditScoring.csv.2’ saved [182489/182489]



In [37]:
data = './CreditScoring.csv'
df = pd.read_csv(data)
df.head(10)

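| | Status | Seniority | Home | Time | Age | Marital | Records | Job | Expenses | Income | Assets | Debt | Amount | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 9 | 1 | 60 | 30 | 2 | 1 | 3 | 73 | 129 | 0 | 0 | 800 | 846 |
| 1 | 1 | 17 | 1 | 60 | 58 | 3 | 1 | 1 | 48 | 131 | 0 | 0 | 1000 | 1658 |
| 2 | 2 | 10 | 2 | 36 | 46 | 2 | 2 | 3 | 90 | 200 | 3000 | 0 | 2000 | 2985 |
| 3 | 1 | 0 | 1 | 60 | 24 | 1 | 1 | 1 | 63 | 182 | 2500 | 0 | 900 | 1325 |
| 4 | 1 | 0 | 1 | 36 | 26 | 1 | 1 | 1 | 46 | 107 | 0 | 0 | 310 | 910 |
| 5 | 1 | 1 | 2 | 60 | 36 | 2 | 1 | 1 | 75 | 214 | 3500 | 0 | 650 | 1645 |
| 6 | 1 | 29 | 2 | 60 | 44 | 2 | 1 | 1 | 75 | 125 | 10000 | 0 | 1600 | 1800 |
| 7 | 1 | 9 | 5 | 12 | 27 | 1 | 1 | 1 | 35 | 80 | 0 | 0 | 200 | 1093 |
| 8 | 1 | 0 | 2 | 60 | 32 | 2 | 1 | 3 | 90 | 107 | 15000 | 0 | 1200 | 1957 |
| 9 | 2 | 0 | 5 | 48 | 41 | 2 | 1 | 2 | 90 | 80 | 0 | 0 | 1200 | 1468 |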


In [38]:
df.columns = df.columns.str.lower()

status_values = {
    1: 'ok',
    2: 'default',
    0: 'unk'
}

df.status = df.status.map(status_values)

home_values = {
    1: 'rent',
    2: 'owner',
    3: 'private',
    4: 'ignore',
    5: 'parents',
    6: 'other',
    0: 'unk'
}

df.home = df.home.map(home_values)

marital_values = {
    1: 'single',
    2: 'married',
    3: 'widow',
    4: 'separated',
    5: 'divorced',
    0: 'unk'
}

df.marital = df.marital.map(marital_values)

records_values = {
    1: 'no',
    2: 'yes',
    0: 'unk'
}

df.records = df.records.map(records_values)

job_values = {
    1: 'fixed',
    2: 'partime',
    3: 'freelance',
    4: 'others',
    0: 'unk'
}

df.job = df.job.map(job_values)

for c in ['income', 'assets', 'debt']:
    df[c] = df[c].replace(to_replace=99999999, value=np.nan)

df = df[df.status != 'unk'].reset_index(drop=True)


In [39]:
df_train, df_test = train_test_split(df, test_size=0.2, random_state=11)

df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

y_train = (df_train.status == 'default').astype('int').values
y_test = (df_test.status == 'default').astype('int').values

del df_train['status']
del df_test['status']


In [40]:
dv = DictVectorizer(sparse=False)

train_dicts = df_train.fillna(0).to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)

test_dicts = df_test.fillna(0).to_dict(orient='records')
X_test = dv.transform(test_dicts)

In [41]:
#Random Forest
rf = RandomForestClassifier(n_estimators=200,
                            max_depth=10,
                            min_samples_leaf=3,
                            random_state=1)
rf.fit(X_train, y_train)

RandomForestClassifier(max_depth=10, min_samples_leaf=3, n_estimators=200,
                       random_state=1)

In [42]:
#XGBOOST
dtrain = xgb.DMatrix(X_train, label=y_train)

In [43]:
xgb_params = {
    'eta': 0.1, 
    'max_depth': 3,
    'min_child_weight': 1,

    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=175)

In [44]:
#BentoML
import bentoml

In [45]:
bentoml.xgboost.save_model(
    'credit_risk_model',
    model,
    custom_objects={
        'dictVectorizer': dv
    })

Model(tag="credit_risk_model:rrsdo4cvacha3oj3", path="/home/lillian/bentoml/models/credit_risk_model/rrsdo4cvacha3oj3/")

In [46]:
import json

In [47]:
request = df_test.iloc[0].to_dict()
print(json.dumps(request, indent=2))

{
  "seniority": 3,
  "home": "owner",
  "time": 36,
  "age": 26,
  "marital": "single",
  "records": "no",
  "job": "freelance",
  "expenses": 35,
  "income": 0.0,
  "assets": 60000.0,
  "debt": 3000.0,
  "amount": 800,
  "price": 1000
}


## Ans2:  206.6 kB
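One way to measure the size is to sum the files in the model's directory on disk. The sketch below assumes the default local model store under `~/bentoml/models`, which matches the path printed by `save_model` above; the helper `dir_size_bytes` is a hypothetical name, and the kB figure uses 1 kB = 1000 bytes, which may differ from the unit BentoML's own CLI reports.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below `root`."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Assumption: models live in the default local store under ~/bentoml/models.
model_dir = Path.home() / "bentoml" / "models" / "credit_risk_model"

if model_dir.is_dir():
    # Each subdirectory is one saved version of the model.
    for version_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        size_kb = dir_size_bytes(version_dir) / 1000
        print(f"{version_dir.name}: {size_kb:.1f} kB")
else:
    print(f"{model_dir} not found; pass the path printed by save_model instead")
```

Because the custom `DictVectorizer` object is stored alongside the booster, the directory total covers both.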

## Question 3

Say you have the following data that you're sending to your service:

```json
{
  "name": "Tim",
  "age": 37,
  "country": "US",
  "rating": 3.14
}
```

What would the pydantic class look like? You can name the class `UserProfile`.

In [49]:
from pydantic import BaseModel

In [50]:
class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float
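
To check that the class matches the payload, it can be instantiated directly from the example data. This is a small sketch that repeats the class so it runs on its own; it relies only on `BaseModel` and `ValidationError` from pydantic.

```python
from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float


# The example payload from the question, as a Python dict.
payload = {"name": "Tim", "age": 37, "country": "US", "rating": 3.14}
profile = UserProfile(**payload)
print(profile.name, profile.age, profile.country, profile.rating)

# A payload with a missing field is rejected at construction time.
try:
    UserProfile(name="Tim", age=37, country="US")
except ValidationError as err:
    print("rejected:", len(err.errors()), "error(s)")
```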

## Question 4

We've prepared a model for you that you can import using:

```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel
bentoml models import coolmodel.bentomodel
```

What version of scikit-learn was this model trained with?

* 1.1.1
* 1.1.2
* 1.1.3
* 1.1.4
* 1.1.5

In [55]:
import bentoml
bentoml.models.get("mlzoomcamp_homework:qtzdz3slg6mwwdu5")

Model(tag="mlzoomcamp_homework:qtzdz3slg6mwwdu5", path="/home/lillian/bentoml/models/mlzoomcamp_homework/qtzdz3slg6mwwdu5")

In [56]:
!bentoml models get mlzoomcamp_homework:qtzdz3slg6mwwdu5

name: mlzoomcamp_homework
version: qtzdz3slg6mwwdu5
module: bentoml.sklearn
labels: {}
options: …

## Ans4: 1.1.1

## Question 5 

Create a bento out of this scikit-learn model. The output type for this endpoint should be `NumpyNdarray()`.

Send this array to the Bento:

```
[[6.4,3.5,4.5,1.2]]
```

You can use curl or the Swagger UI. What value does it return? 

* 0
* 1
* 2
* 3

(Make sure your environment has Scikit-Learn installed) 


In [57]:
model_ref = bentoml.sklearn.get("mlzoomcamp_homework:qtzdz3slg6mwwdu5")

In [58]:
model_ref.custom_objects

{}

In [70]:
# The endpoint expects a JSON body; 3000 is BentoML's default serving port (adjust if you changed it).
!curl -X POST -H "Content-Type: application/json" --data '[[6.4,3.5,4.5,1.2]]' http://localhost:3000/classify


## Ans5: 1
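For sending the array from Python rather than curl, here is a minimal standard-library sketch. It assumes the service listens on BentoML's default port 3000 and exposes an endpoint named `classify`; `build_request` and `send` are hypothetical helper names. The key point is that the array goes in the request body as JSON, not as a form field.

```python
import json
import urllib.request

# Assumption: the service listens on BentoML's default port 3000 and exposes
# an endpoint named "classify"; adjust both to match your service.
SERVICE_URL = "http://localhost:3000/classify"


def build_request(url: str, rows: list) -> urllib.request.Request:
    """Build a POST request carrying `rows` as a JSON array body."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


request = build_request(SERVICE_URL, [[6.4, 3.5, 4.5, 1.2]])
print(request.get_method(), request.full_url, request.data)
```

Once the bento is being served, `print(send(request))` shows the prediction returned by the endpoint.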

## Question 6

Make sure to serve your bento with `--production` for this question.

Install locust using:

```bash
pip install locust
```

Use the following locust file: [locustfile.py](locustfile.py)

Ensure that it points at your bento's endpoint (in case you didn't name your endpoint "classify").

<img src="resources/classify-endpoint.png">

Configure 100 users with a ramp-up rate of 10 users per second. Click "Start Swarming" and ensure that it is working.

Now download a second model with this command:

```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel
```

Or you can download with this link as well:
[https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel](https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel)

Now import the model:

```bash
bentoml models import coolmodel2.bentomodel
```

Update your bento's runner tag and test with both models. Which model allows more traffic (more throughput) as you ramp up the traffic?

**Hint 1**: Remember to restart your bento service after changing the model tag. Use Ctrl-C to stop the service between trials.

**Hint 2**: Increase the number of concurrent users to see which one has higher throughput.

Which model has better performance at higher volumes?

* The first model
* The second model

In [62]:
!bentoml models get mlzoomcamp_homework:jsi67fslz6txydu5

name: mlzoomcamp_homework
version: jsi67fslz6txydu5
module: bentoml.sklearn
labels: {}
options: …

In [63]:
!bentoml models get mlzoomcamp_homework:qtzdz3slg6mwwdu5

name: mlzoomcamp_homework
version: qtzdz3slg6mwwdu5
module: bentoml.sklearn
labels: {}
options: …

## Ans6:  model2
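Throughput here means completed requests per second under concurrent load. As a rough cross-check outside Locust, the sketch below times a batch of calls spread over a thread pool. It is not a replacement for the provided locustfile; `measure_throughput` is a hypothetical helper, and `call` stands for any function that sends one request to the endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_throughput(
    call: Callable[[], object], total_requests: int, concurrency: int
) -> float:
    """Run `call` `total_requests` times on `concurrency` threads.

    Returns completed calls per second over the whole batch.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call) for _ in range(total_requests)]
        for future in futures:
            future.result()  # re-raises any error from a worker thread
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


# Stand-in workload: a 10 ms sleep instead of a real HTTP request.
rate = measure_throughput(lambda: time.sleep(0.01), total_requests=50, concurrency=10)
print(f"{rate:.0f} calls/s")
```

Running the same measurement against each model tag, with the service restarted in between, gives two numbers that can be compared directly.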

## Question 7 (optional)

Go to this Bento deployment of Stable Diffusion: http://54.176.205.174/ (or deploy it yourself)

Use the txt2image endpoint and update the prompt to: "A cartoon dragon with sunglasses".
Don't change the seed; it should be 0 by default.

What is the resulting image?

### #1
<img src="resources/dragon1.jpeg">

### #2 
<img src="resources/dragon2.jpeg">

### #3 
<img src="resources/dragon3.jpeg">

### #4
<img src="resources/dragon4.jpeg">
