## Question 1

Install BentoML

### What's the version of BentoML you installed?

Use `bentoml --version` to find out.

In [None]:
# Answer: 1.0.7
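
The installed version can also be read from Python instead of the CLI. This is a minimal sketch, assuming the distribution is installed under the name `bentoml`; the helper name `installed_version` is illustrative and not part of BentoML.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string (1.0.7 in this notebook's environment) or None.
print(installed_version("bentoml"))
```

Returning `None` rather than raising keeps the check usable in environments where BentoML has not been installed yet.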

## Question 2
Run the notebook that contains the random forest model from module 6 (the previous module) and save the model with BentoML. To make it easier for you, we have prepared this notebook.

### How big approximately is the saved BentoML model? Size can slightly vary depending on your local development environment. Choose the size closest to your model.

* 924kb
* 724kb
* 114kb
* 8kb

In [3]:
# download data:
!curl --create-dirs -o './data/CreditScoring.csv' 'https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-06-trees/CreditScoring.csv'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  178k  100  178k    0     0  1047k      0 --:--:-- --:--:-- --:--:-- 1054k


In [7]:
# Build the dataset and train the models
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

from sklearn.ensemble import RandomForestClassifier

import xgboost as xgb

### Data preparation

data = './data/CreditScoring.csv'
df = pd.read_csv(data)

df.columns = df.columns.str.lower()

status_values = {
    1: 'ok',
    2: 'default',
    0: 'unk'
}

df.status = df.status.map(status_values)

home_values = {
    1: 'rent',
    2: 'owner',
    3: 'private',
    4: 'ignore',
    5: 'parents',
    6: 'other',
    0: 'unk'
}

df.home = df.home.map(home_values)

marital_values = {
    1: 'single',
    2: 'married',
    3: 'widow',
    4: 'separated',
    5: 'divorced',
    0: 'unk'
}

df.marital = df.marital.map(marital_values)

records_values = {
    1: 'no',
    2: 'yes',
    0: 'unk'
}

df.records = df.records.map(records_values)

job_values = {
    1: 'fixed',
    2: 'partime',
    3: 'freelance',
    4: 'others',
    0: 'unk'
}

df.job = df.job.map(job_values)

for c in ['income', 'assets', 'debt']:
    df[c] = df[c].replace(to_replace=99999999, value=np.nan)

df = df[df.status != 'unk'].reset_index(drop=True)

df_train, df_test = train_test_split(df, test_size=0.2, random_state=11)

df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

y_train = (df_train.status == 'default').astype('int').values
y_test = (df_test.status == 'default').astype('int').values

del df_train['status']
del df_test['status']

dv = DictVectorizer(sparse=False)

train_dicts = df_train.fillna(0).to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)

test_dicts = df_test.fillna(0).to_dict(orient='records')
X_test = dv.transform(test_dicts)

### Random forest

rf = RandomForestClassifier(n_estimators=200,
                            max_depth=10,
                            min_samples_leaf=3,
                            random_state=1)
rf.fit(X_train, y_train)

### XGBoost

# Note: feature names were removed from the DMatrix.
#
# It was:
#     features = dv.get_feature_names_out()
#     dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=features)
#
# Now it's:
#     dtrain = xgb.DMatrix(X_train, label=y_train)

dtrain = xgb.DMatrix(X_train, label=y_train)

xgb_params = {
    'eta': 0.1, 
    'max_depth': 3,
    'min_child_weight': 1,

    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=175)

In [10]:
import bentoml

# Save the XGBoost model together with the fitted DictVectorizer
bentoml.xgboost.save_model(
    'credit_risk_score',
    model=model,
    custom_objects={
        "dictVectorizer": dv
    })

Model(tag="credit_risk_score:573xgwsqikftuepw", path="/home/ubuntu/bentoml/models/credit_risk_score/573xgwsqikftuepw/")

In [None]:
# Answer: 197 KiB (closest option: 114kb)

# Command used:
# !bentoml models list
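
`bentoml models list` reports the size directly. As a rough cross-check, the files in the model directory can be summed by hand. This is only a sketch and not necessarily how BentoML computes its figure; `dir_size_kib` is a hypothetical helper, and the path is the one printed by `save_model` in this notebook, so it will differ on other machines.

```python
from pathlib import Path


def dir_size_kib(path: str) -> float:
    """Sum the sizes of all regular files under `path`, in KiB (1 KiB = 1024 bytes)."""
    total_bytes = sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
    return total_bytes / 1024


# Path taken from the save_model output in this notebook; adjust for your machine.
model_dir = "/home/ubuntu/bentoml/models/credit_risk_score/573xgwsqikftuepw/"
if Path(model_dir).is_dir():
    print(f"{dir_size_kib(model_dir):.1f} KiB")
```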

```
Another email from your manager
Great job recruit! Looks like I won't be having to go back to the procurement team. Thanks for the information.

However, I just got word from one of the teams that's using one of our ML services and they're saying our service is "broken" and they're trying to blame our model. I looked at the data they're sending and it's completely bogus. I don't want them to send bad data to us and blame us for our models. Could you write a pydantic schema for the data that they should be sending? That way next time it will tell them it's their data that's bad and not our model.

Thanks,

Mr McManager
```

## Question 3
Say you have the following data that you're sending to your service:
```
{
  "name": "Tim",
  "age": 37,
  "country": "US",
  "rating": 3.14
}
```
### What would the pydantic class look like? You can name the class UserProfile.

In [2]:
# Answer is below:
from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float
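
To see the schema reject bad data, the class can be exercised directly, outside any service. The sketch below assumes pydantic is installed; `validation_error` and the `bad` payload are illustrative names and data, not part of the homework. In a BentoML service the class would typically be passed to the input descriptor as `JSON(pydantic_model=UserProfile)`.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float


def validation_error(payload: dict) -> Optional[str]:
    """Return None if `payload` matches UserProfile, else the error text."""
    try:
        UserProfile(**payload)
        return None
    except ValidationError as exc:
        return str(exc)


good = {"name": "Tim", "age": 37, "country": "US", "rating": 3.14}
# Hypothetical bad payload: `age` is not a number and `rating` is missing.
bad = {"name": "Tim", "age": "thirty-seven", "country": "US"}

print(validation_error(good))  # None
print(validation_error(bad))   # text describing the `age` and `rating` problems
```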

```
Email from your CEO
Good morning! I hear you're the one to go to if I need something done well! We've got a new model that a big client needs deployed ASAP. I need you to build a service with it and test it against the old model and make sure that it performs better, otherwise we're going to lose this client. All our hopes are with you!

Thanks,

CEO of Acme Corp
```


## Question 4
We've prepared a model for you that you can import using:

```
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel
bentoml models import coolmodel.bentomodel
```

### What version of scikit-learn was this model trained with?

* 1.1.1
* 1.1.2
* 1.1.3
* 1.1.4
* 1.1.5

In [3]:
# Answer: 1.1.1
!curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1724  100  1724    0     0   4414      0 --:--:-- --:--:-- --:--:--  4409


In [4]:
!bentoml models import coolmodel.bentomodel

Model(tag="mlzoomcamp_homework:qtzdz3slg6mwwdu5") imported


In [5]:
!bentoml models get mlzoomcamp_homework:qtzdz3slg6mwwdu5

name: mlzoomcamp_homework
version: qtzdz3slg6mwwdu5
module: bentoml.sklearn
labels: {}
options:

## Question 5
Create a bento out of this scikit-learn model. This will require installing scikit-learn like this:

`pip install scikit-learn`

Hint: The output type for this endpoint should be NumpyNdarray()

Send this array to the bento:

[[6.4,3.5,4.5,1.2]]

### You can use curl or the Swagger UI. What value does it return?

* 0
* 1
* 2
* 3

In [1]:
# Answer: 1

*Created `service.py` with the content below:*

```
import bentoml
from bentoml.io import NumpyNdarray

# Load the imported model by tag and wrap it in a runner.
model_ref = bentoml.sklearn.get("mlzoomcamp_homework:qtzdz3slg6mwwdu5")
model_runner = model_ref.to_runner()

svc = bentoml.Service('mlzoomcamp_homework', runners=[model_runner])

# The hint asks for NumpyNdarray() output; the input is the 2-D feature array.
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_array):
    # Returns the predicted class for each row of the input.
    return model_runner.predict.run(input_array)
```

*Then ran the command below:*

`bentoml serve service.py:svc`
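
Besides curl or the Swagger UI, the array can be sent from Python. This is a minimal sketch using only the standard library. It assumes the service listens on `http://localhost:3000` (the default for `bentoml serve`) and that the endpoint is named `classify`; `build_request` and `send` are illustrative helper names.

```python
import json
import urllib.request


def build_request(rows, url="http://localhost:3000/classify"):
    """Build a POST request whose JSON body is the 2-D array `rows`."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request([[6.4, 3.5, 4.5, 1.2]])
# With the service running, uncomment to send it:
# print(send(request))
```

Building the request separately from sending it makes it easy to inspect the body and headers before the service is up.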

## Question 6
Make sure to serve your bento with `--production` for this question.

Install locust using:

`pip install locust`

Use the following locust file: locustfile.py

Ensure that it is pointed at your bento's endpoint (in case you didn't name your endpoint "classify").

Configure 100 users with a ramp-up rate of 10 users per second. Click "Start Swarming" and ensure that it is working.

Now download a second model with this command:

`curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel`

Or download it directly from this link: https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel

Now import the model:

`bentoml models import coolmodel2.bentomodel`

### Update your bento's runner tag and test with both models. Which model allows more traffic (more throughput) as you ramp up the traffic?

Hint 1: Remember to stop and restart your bento service after changing the model tag. Use Ctrl-C to stop the service between trials.

Hint 2: Increase the number of concurrent users to see which one has higher throughput

Which model has better performance at higher volumes?

* The first model
* The second model

In [2]:
# Answer: The second model
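
Throughput here means successful requests per second under concurrent load. The sketch below is not Locust; it is a rough standard-library stand-in that shows how such a number can be computed. `measure_throughput` is a hypothetical helper, and the `time.sleep` workload is a placeholder for a function that POSTs to your bento.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(call, n_requests=200, concurrency=20):
    """Run `call` n_requests times across `concurrency` threads.

    Returns a dict with the number of successful calls, the number of
    failures, and successful requests per second.
    """
    def attempt(_):
        try:
            call()
            return True
        except Exception:
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(attempt, range(n_requests)))
    elapsed = time.perf_counter() - start

    completed = sum(results)
    return {
        "completed": completed,
        "failed": n_requests - completed,
        "rps": completed / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in workload; replace with a function that POSTs to your bento.
stats = measure_throughput(lambda: time.sleep(0.001), n_requests=50, concurrency=10)
print(stats)
```

Running the same measurement against each model tag, with the service restarted in between, gives comparable numbers; Locust remains the better tool for ramping users and watching failures over time.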