## Homework

In this homework, we will use Bank credit scoring dataset from [here](https://www.kaggle.com/datasets/kapturovalexander/bank-credit-scoring/data).

> **Note**: sometimes your answer doesn't match one of the options exactly. That's fine. 
Select the option that's closest to your solution.

> **Note**: we recommend using python 3.10 in this homework.

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [1]:
%%sh
pip install pipenv
pipenv --version




[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


pipenv, version 2023.9.8


In [2]:
print("The pipenv version is 2023.9.8.")

The pipenv version is 2023.9.8.


## Question 2

* Use Pipenv to install Scikit-Learn version 1.3.1
* What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: you should create an empty folder for homework
and do it there. 


## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['job','duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

And then saved with Pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2023/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2023/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2023/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```

In [3]:
!pipenv install scikit-learn

[1;32mInstalling scikit-learn[0m[1;33m...[0m
[?25lResolving scikit-learn[33m...[0m
[2K✔ Installation Succeeded
[2K[32m⠋[0m Installing scikit-learn...
[1A[2K[1mInstalling dependencies from Pipfile.lock [0m[1m([0m[1m5bcbaf[0m[1m)[0m[1;33m...[0m
To activate this project's virtualenv, run [33mpipenv shell[0m.
Alternatively, run a command inside the virtualenv with [33mpipenv run[0m.


In [4]:
import json
with open('Pipfile.lock', 'rb') as pipfile:
    json_file = json.load(pipfile)
    first_hash = json_file['_meta']['hash']
    print(f"The first Pipfile.lock has is: {first_hash}.")

The first Pipfile.lock has is: {'sha256': '274a40143df6722a398a09f94cdbe356f1669d80b40c4059aa5e7fb0bc5bcbaf'}.


In [5]:
!mkdir homework_05
PREFIX="https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2023/05-deployment/homework"
!wget $PREFIX/model1.bin $PREFIX/dv.bin -P homework_05

mkdir: cannot create directory ‘homework_05’: File exists
--2023-10-16 22:04:12--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2023/05-deployment/homework/model1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 842 [application/octet-stream]
Saving to: ‘homework_05/model1.bin.1’


2023-10-16 22:04:12 (97.2 MB/s) - ‘homework_05/model1.bin.1’ saved [842/842]

--2023-10-16 22:04:12--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2023/05-deployment/homework/dv.bin
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 560 [application/octet-stream]
Saving to: ‘homework_05/dv.bin.1’


2023-10-16 22:04:12 (72.7 M

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "retired", "duration": 445, "poutcome": "success"}
```

What's the probability that this client will get a credit? 

* 0.162
* 0.392
* 0.652
* 0.902

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
8ebfdf20010cfc7f545c43e3b52fc8a1  model1.bin
924b496a89148b422c74a62dbc92a4fb  dv.bin
```

In [7]:
import pickle

model_file = "homework_05/model1.bin"
dv_file = "homework_05/dv.bin"

def read_pickle(file_path):
    with open(file_path, "rb") as f_in:
        py_object = pickle.load(f_in)
    return py_object

model = read_pickle(model_file)
dv = read_pickle(dv_file)

In [8]:
customer = {"job": "retired", "duration": 445, "poutcome": "success"}

In [9]:
def predict(customer):

    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[0, 1]
    get_credit = y_pred >= 0.5

    result = {"credit_probability": float(y_pred), "get_credit": bool(get_credit)}

    return result

In [10]:
credit_analysis = predict(customer)
credit_analysis

{'credit_probability': 0.9019309332297606, 'get_credit': True}

In [11]:
print(f"The probability of this customer get credit is {credit_analysis['credit_probability']:.3f}")

The probability of this customer get credit is 0.902


## Question 4

Now let's serve this model as a web service

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit?

* 0.140
* 0.440
* 0.645
* 0.845

**The api code is in predict.py**

In [23]:
url = 'http://localhost:9696/predict'
client_id = 'alpha-123'
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
response = requests.post(url, json=client).json()
print(response)

{'credit_probability': 0.13968947052356817, 'get_credit': False}


In [25]:
if response['get_credit'] == True:
    print(f'The client {client_id} will get credit, the probability is {response["credit_probability"]:.3f}.')
else:
    print(f'The client {client_id} will not get credit, the probability is {response["credit_probability"]:.3f}.')

The client alpha-123 will not get credit, the probability is 0.140.


## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.10.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.10.12-slim` and has a logistic regression model 
(a different one) as well a dictionary vectorizer inside. 

This is how the Dockerfile for this image looks like:

```docker 
FROM python:3.10.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.10.12-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.

## Question 5

Download the base image `svizor/zoomcamp-model:3.10.12-slim`. You can easily make it by using [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 47 MB
* 147 MB
* 374 MB
* 574 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

In [26]:
print("The image base size is 147MB.")

The image base size is 147MB.


## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like that:

```docker
FROM svizor/zoomcamp-model:3.10.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies form the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.

## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "retired", "duration": 445, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit now?

* 0.168
* 0.530
* 0.730
* 0.968

In [None]:
import requests

url = 'http://localhost:9696/predict'

client = {"job": "retired", "duration": 445, "poutcome": "success"}
response = requests.post(url, json=client).json()
print(response)