In [22]:
import numpy as np
import pandas as pd

import bentoml
import json
import requests

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

from sklearn.ensemble import RandomForestClassifier

import xgboost as xgb

In [2]:
data = 'https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-06-trees/CreditScoring.csv'
df = pd.read_csv(data)

In [3]:
df.columns = df.columns.str.lower()

status_values = {
    1: 'ok',
    2: 'default',
    0: 'unk'
}

df.status = df.status.map(status_values)

home_values = {
    1: 'rent',
    2: 'owner',
    3: 'private',
    4: 'ignore',
    5: 'parents',
    6: 'other',
    0: 'unk'
}

df.home = df.home.map(home_values)

marital_values = {
    1: 'single',
    2: 'married',
    3: 'widow',
    4: 'separated',
    5: 'divorced',
    0: 'unk'
}

df.marital = df.marital.map(marital_values)

records_values = {
    1: 'no',
    2: 'yes',
    0: 'unk'
}

df.records = df.records.map(records_values)

job_values = {
    1: 'fixed',
    2: 'partime',
    3: 'freelance',
    4: 'others',
    0: 'unk'
}

df.job = df.job.map(job_values)

for c in ['income', 'assets', 'debt']:
    df[c] = df[c].replace(to_replace=99999999, value=np.nan)

df = df[df.status != 'unk'].reset_index(drop=True)

In [4]:
df_train, df_test = train_test_split(df, test_size=0.2, random_state=11)

df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

y_train = (df_train.status == 'default').astype('int').values
y_test = (df_test.status == 'default').astype('int').values

del df_train['status']
del df_test['status']

In [5]:
dv = DictVectorizer(sparse=False)

train_dicts = df_train.fillna(0).to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)

test_dicts = df_test.fillna(0).to_dict(orient='records')
X_test = dv.transform(test_dicts)

In [6]:
rf = RandomForestClassifier(n_estimators=200,
                            max_depth=10,
                            min_samples_leaf=3,
                            random_state=1)
rf.fit(X_train, y_train)

In [7]:
dtrain = xgb.DMatrix(X_train, label=y_train)

In [8]:
xgb_params = {
    'eta': 0.1, 
    'max_depth': 3,
    'min_child_weight': 1,

    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=175)

In [10]:
bentoml.xgboost.save_model(
    'credit_risk_model',
    model,
    custom_objects={
        'dictVectorizer': dv
    })

Model(tag="credit_risk_model:dmtxcn2trc7g4aaq", path="C:\Users\User\bentoml\models\credit_risk_model\dmtxcn2trc7g4aaq\")

## Question 1

* Install BentoML
* What's the version of BentoML you installed?
* Use `--version` to find out

In [9]:
!bentoml -v

bentoml, version 1.0.7


## Question 2

Run the notebook that contains the XGBoost model from module 6 (the previous module) and save the model with BentoML. To make it easier for you, we have prepared this [notebook](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/07-bentoml-production/code/train.ipynb).


How big approximately is the saved BentoML model? Size can slightly vary depending on your local development environment.
Choose the size closest to your model.

* 924kb
* 724kb
* 114kb
* 8kb

Go to the folder given as `path` in the output of the `save_model` cell above.
In my case that is `"C:\Users\User\bentoml\models\credit_risk_model\dmtxcn2trc7g4aaq\"`

I used Git Bash here, so I had to remove the trailing slash and wrap the path in single or double quotes. \
`cd 'C:/Users/User/bentoml/models/credit_risk_model/dmtxcn2trc7g4aaq'`

Alternatively, replacing the Windows backslashes `\` with Unix forward slashes `/` also works without the quotes. \
`cd C:/Users/User/bentoml/models/credit_risk_model/dmtxcn2trc7g4aaq`

Running `ls -las` there lists the sizes in kilobytes. The total is roughly 200 kB, so the closest option is 114 kB.
![Model_Size](./src/model_size.png)

Or simply run this on the command line: \
`bentoml models list`

In [16]:
!bentoml models list

 Tag                          Module           Size        Creation Time       
 credit_risk_model:dmtxcn2t…  bentoml.xgboost  197.77 KiB  2022-10-24 13:39:15 
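
The folder size can also be cross-checked from Python. The following is a minimal sketch using only the standard library; the helper names `to_posix` and `dir_size_kib` are made up for this example, and the result should only be expected to land in the same ballpark as the size BentoML reports.

```python
from pathlib import Path, PureWindowsPath


def to_posix(windows_path: str) -> str:
    """Convert a Windows-style path to forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


def dir_size_kib(folder: Path) -> float:
    """Sum the sizes of all regular files under `folder`, in KiB."""
    total_bytes = sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())
    return total_bytes / 1024


if __name__ == "__main__":
    # Path copied from the save_model output in this notebook; adjust to your own.
    model_dir = r"C:\Users\User\bentoml\models\credit_risk_model\dmtxcn2trc7g4aaq"
    print(to_posix(model_dir))

    folder = Path(model_dir)
    if folder.is_dir():  # only report a size when the folder exists on this machine
        print(f"{dir_size_kib(folder):.2f} KiB")
```

`to_posix` produces the forward-slash form that Git Bash accepts, and `dir_size_kib` walks the model folder recursively, so nested files are included in the total.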


In [19]:
# Details of the model..
!bentoml models get credit_risk_model:dmtxcn2trc7g4aaq

name: credit_risk_model                                                        
version: dmtxcn2trc7g4aaq                                                      
module: bentoml.xgboost                                                        
labels: {}                                                                     
options:                                                                       
  model_class: Booster                                                         
metadata: {}                                                                   
context:                                                                       
  framework_name: xgboost                                                      
  framework_versions:                                                          
    xgboost: 1.6.1                                                             
  bentoml_version: 1.0.7                                                       
  python_version: 3.9.12                

In [17]:
request = df_test.iloc[0].to_dict()
print(json.dumps(request, indent=2))

{
  "seniority": 3,
  "home": "owner",
  "time": 36,
  "age": 26,
  "marital": "single",
  "records": "no",
  "job": "freelance",
  "expenses": 35,
  "income": 0.0,
  "assets": 60000.0,
  "debt": 3000.0,
  "amount": 800,
  "price": 1000
}


## Another email from your manager

Great job, recruit! Looks like I won't be having to go back to the procurement team. Thanks for the information.

However, I just got word from one of the teams that's using one of our ML services and they're saying our service is "broken"
and they're trying to blame our model. I looked at the data they're sending and it's completely bogus. I don't want them
to send bad data to us and blame us for our models. Could you write a pydantic schema for the data that they should be sending?
That way next time it will tell them it's their data that's bad and not our model.

Thanks,

Mr McManager

## Question 3

Say you have the following data that you're sending to your service:

```json
{
  "name": "Tim",
  "age": 37,
  "country": "US",
  "rating": 3.14
}
```

What would the pydantic class look like? You can name the class `UserProfile`.


## Email from your CEO

Good morning! I hear you're the one to go to if I need something done well! We've got a new model that a big client
needs deployed ASAP. I need you to build a service with it and test it against the old model and make sure that it performs
better, otherwise we're going to lose this client. All our hopes are with you!

Thanks,

CEO of Acme Corp

The answer to Question 3:
```python
from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float
```
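
To see how the `UserProfile` schema rejects bad data, here is a small sketch. The `validate_profile` helper and the `bad` payload are hypothetical and only serve as an illustration; the calls used (`BaseModel`, `ValidationError`, `.errors()`) exist in pydantic.

```python
from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int
    country: str
    rating: float


def validate_profile(payload: dict):
    """Return (profile, None) for valid data, else (None, names of the bad fields)."""
    try:
        return UserProfile(**payload), None
    except ValidationError as exc:
        # Each error entry carries a "loc" tuple naming the offending field.
        return None, [err["loc"][0] for err in exc.errors()]


good = {"name": "Tim", "age": 37, "country": "US", "rating": 3.14}
# Hypothetical bad payload: "age" is not a number and "rating" is missing.
bad = {"name": "Tim", "age": "not a number", "country": "US"}

print(validate_profile(good))
print(validate_profile(bad))
```

In a BentoML service, the same class can be passed to the JSON IO descriptor as `JSON(pydantic_model=UserProfile)`, so that invalid requests are rejected before they reach the model.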

## Question 4

We've prepared a model for you that you can import using:

```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel
bentoml models import coolmodel.bentomodel
```

What version of scikit-learn was this model trained with?

* 1.1.1
* 1.1.2
* 1.1.3
* 1.1.4
* 1.1.5

In [20]:
!curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel.bentomodel

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1724  100  1724    0     0   1285      0  0:00:01  0:00:01 --:--:--  1287


The model is imported as follows, after which it shows up in the model list: \
`!bentoml models import coolmodel.bentomodel`

In [36]:
!bentoml models list

 Tag                          Module           Size        Creation Time       
 credit_risk_model:dmtxcn2t…  bentoml.xgboost  197.77 KiB  2022-10-24 13:39:15 
 mlzoomcamp_homework:qtzdz3…  bentoml.sklearn  5.79 KiB    2022-10-13 23:42:14 


In [45]:
# Get the full tag from the command line; it is truncated in the Jupyter notebook output.
!bentoml models get mlzoomcamp_homework:qtzdz3slg6mwwdu5

name: mlzoomcamp_homework                                                      
version: qtzdz3slg6mwwdu5                                                      
module: bentoml.sklearn                                                        
labels: {}                                                                     
options: {}                                                                    
metadata: {}                                                                   
context:                                                                       
  framework_name: sklearn                                                      
  framework_versions:                                                          
    scikit-learn: 1.1.1                                                        
  bentoml_version: 1.0.7                                                       
  python_version: 3.9.12                                                       
signatures:                             

## Question 5 

Create a bento out of this scikit-learn model. The output type for this endpoint should be `NumpyNdarray()`

Send this array to the Bento:

```
[[6.4,3.5,4.5,1.2]]
```

You can use curl or the Swagger UI. What value does it return? 

* 0
* 1
* 2
* 3

(Make sure your environment has Scikit-Learn installed) 

In [40]:
# Input array shape to enforce in service2.py
input_data = [[6.4,3.5,4.5,1.2]]
np.array(input_data).shape

(1, 4)

After starting `service2.py` as shown below, we run the next cell: \
`bentoml serve service2.py:svc`

In [44]:
print(requests.post(
    "http://localhost:3000/classify/",
    headers={"content-type": "application/json"},
    data=json.dumps(input_data)  # serialize the nested list as a JSON array
).text)

[1]
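
The request body sent to the endpoint is JSON text. For a nested list of numbers, `str()` and `json.dumps()` happen to produce the same string, but that stops being true for other values. The sketch below illustrates the difference; the `mixed` payload and the `is_valid_json` helper are made up for this example.

```python
import json

numeric_rows = [[6.4, 3.5, 4.5, 1.2]]

# For nested lists of numbers, Python's repr happens to be valid JSON.
assert json.loads(str(numeric_rows)) == numeric_rows
assert json.loads(json.dumps(numeric_rows)) == numeric_rows


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical payload with a string, None and a bool: repr uses single quotes,
# `None` and `True`, none of which are valid JSON tokens.
mixed = ["setosa", None, True]
print(is_valid_json(str(mixed)))
print(is_valid_json(json.dumps(mixed)))
```

Serializing with `json.dumps` is therefore the safer habit when the payload may contain anything other than plain numbers.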


## Question 6

Make sure to serve your bento with `--production` for this question.

Install locust using:

```bash
pip install locust
```

Use the following locust file: [locustfile.py](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/07-bento-production/locustfile.py)

Ensure that it is pointed at your bento's endpoint (in case you didn't name your endpoint "classify").

Configure 100 users with ramp time of 10 users per second. Click "Start Swarming" and ensure that it is working.

Now download a second model with this command:

```bash
curl -O https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel
```

Or you can download with this link as well:
[https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel](https://s3.us-west-2.amazonaws.com/bentoml.com/mlzoomcamp/coolmodel2.bentomodel)

Now import the model:

```bash
bentoml models import coolmodel2.bentomodel
```

Update your bento's runner tag and test with both models. Which model allows more traffic (more throughput) as you ramp up the traffic?

**Hint 1**: Remember to turn off and turn on your bento service between changing the model tag. Use Ctrl-C to close the service in between trials.

**Hint 2**: Increase the number of concurrent users to see which one has higher throughput

Which model has better performance at higher volumes?

* The first model
* The second model

I tested both models in the `production_tests` folder by switching the Python file referenced in `bentofile.yaml`: \
service3.py -> Model1 \
service4.py -> Model2 

For each model I rebuilt the bento, served it with `bentoml serve --production`, and measured it with Locust:

Locust test results for the first model:
![Model1Test](./src/model1test4.png)

And for the Second Model:
![Model2Test](./src/model2test4.png)


I think we can deduce that Model2 performs better because it sustains a higher RPS (requests per second) than Model1, even though its other reported stats look worse.
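
The throughput comparison can be written down as a small sketch. The `LoadTestRun` class is hypothetical, and the numbers are placeholders for illustration only; they are NOT the values from the Locust screenshots above.

```python
from dataclasses import dataclass


@dataclass
class LoadTestRun:
    name: str
    total_requests: int
    failures: int
    duration_s: float

    @property
    def rps(self) -> float:
        """Successful requests per second over the whole run."""
        return (self.total_requests - self.failures) / self.duration_s


def higher_throughput(a: LoadTestRun, b: LoadTestRun) -> LoadTestRun:
    """Return the run with the higher successful-request throughput."""
    return a if a.rps >= b.rps else b


# Placeholder numbers, not measured results.
run1 = LoadTestRun("Model1", total_requests=6000, failures=0, duration_s=60.0)
run2 = LoadTestRun("Model2", total_requests=9000, failures=0, duration_s=60.0)

winner = higher_throughput(run1, run2)
print(f"{winner.name}: {winner.rps:.1f} RPS")
```

Counting only successful requests matters here: a service that fails fast under load can show a high raw request rate without actually serving more traffic.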

---
The end of the Week 7 assignment.

---