## Homework

In this homework, we will use the Bank credit scoring dataset from [here](https://www.kaggle.com/datasets/kapturovalexander/bank-credit-scoring/data).

> **Note**: sometimes your answer doesn't match one of the options exactly. That's fine. 
Select the option that's closest to your solution.

> **Note**: we recommend using Python 3.10 in this homework.

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [None]:
# Install a pip package in the current Jupyter kernel
import sys
import json
!{sys.executable} -m pip install pipenv
! pipenv --version

pipenv, version 2023.10.3

## Question 2

* Use Pipenv to install Scikit-Learn version 1.3.1
* What's the first hash for scikit-learn you get in Pipfile.lock?
* Answer: `sha256:0c275a06c5190c5ce00af0acbb61c06374087949f643ef32d355ece12c4db043`

> **Note**: you should create an empty folder for homework
and do it there. 

In [None]:
!{sys.executable} -m pipenv install scikit-learn==1.3.1

Installing scikit-learn==1.3.1...
Resolving scikit-learn==1.3.1...
✔ Installation Succeeded
Installing dependencies from Pipfile.lock (a4b71e)...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.


In [1]:
# Read the lock file from the current project folder, not from the filesystem root
with open("Pipfile.lock") as infile:
    print(json.load(infile)["default"]["scikit-learn"]["hashes"][0])

sha256:0c275a06c5190c5ce00af0acbb61c06374087949f643ef32d355ece12c4db043

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['job','duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

And then saved with Pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2023/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2023/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2023/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
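
To make the role of the `DictVectorizer` in the training code above more concrete, here is a rough pure-Python sketch of the one-hot encoding it performs: string values become `name=value` indicator columns, and numeric values are copied through unchanged. This is an illustration only, not scikit-learn's implementation. The `FEATURE_NAMES` list and the `vectorize` helper are hypothetical; the real `dv.bin` was fitted on the full dataset and has its own (larger) set of columns.

```python
# Illustrative sketch of dictionary vectorization (one-hot encoding).
# FEATURE_NAMES is a made-up vocabulary; the real dv.bin learned its own.
FEATURE_NAMES = [
    "duration",
    "job=retired",
    "job=unknown",
    "poutcome=failure",
    "poutcome=success",
]


def vectorize(record, feature_names=FEATURE_NAMES):
    """Turn one feature dict into a dense list of floats."""
    row = [0.0] * len(feature_names)
    index = {name: i for i, name in enumerate(feature_names)}
    for key, value in record.items():
        if isinstance(value, str):
            # Categorical value: switch on the matching "key=value" column.
            column = f"{key}={value}"
            if column in index:
                row[index[column]] = 1.0
        else:
            # Numeric value: copy it into the column named after the key.
            if key in index:
                row[index[key]] = float(value)
    return row


client = {"job": "retired", "duration": 445, "poutcome": "success"}
print(vectorize(client))  # [445.0, 1.0, 0.0, 0.0, 1.0]
```

Categories that are not in the vocabulary are simply left at zero in this sketch, so an unseen `job` value produces a row with no `job=...` column switched on.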

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "retired", "duration": 445, "poutcome": "success"}
```

What's the probability that this client will get a credit? 

* 0.162
* 0.392
* 0.652
* 0.902

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
8ebfdf20010cfc7f545c43e3b52fc8a1  model1.bin
924b496a89148b422c74a62dbc92a4fb  dv.bin
```
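
If `md5sum` is not available (for example on Windows), the same check can be sketched in Python with the standard `hashlib` module. The helper names `md5_of_file` and `check_downloads` are hypothetical; the expected digests are the ones listed above.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected digests copied from the md5sum output above.
EXPECTED = {
    "model1.bin": "8ebfdf20010cfc7f545c43e3b52fc8a1",
    "dv.bin": "924b496a89148b422c74a62dbc92a4fb",
}


def check_downloads(expected=EXPECTED):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, want in expected.items():
        try:
            results[name] = md5_of_file(name) == want
        except FileNotFoundError:
            results[name] = None
    return results


if __name__ == "__main__":
    print(check_downloads())
```

A `False` entry suggests a corrupted or partial download, in which case downloading the file again is the first thing to try.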

In [27]:
## Script to load these models with pickle, then predict

# Libraries and external functions
import pickle

# Load dictionary vectoriser and model file
dv_file = 'dv.bin'
model_file = 'model1.bin'

with open(dv_file, 'rb') as f_in:
    dv = pickle.load(f_in)

with open(model_file, 'rb') as f_in:
    model = pickle.load(f_in)

Client = {"job": "retired", "duration": 445, "poutcome": "success"}

def pred_Credit(Client):
    X = dv.transform(Client)
    y_pred = model.predict_proba(X)[0,1]
    Credit = y_pred >= 0.5

    # Now output into a dictionary
    result = {'Credit_prob': float(y_pred),
              'Give_Credit': bool(Credit)}
    return result

print(pred_Credit(Client))


{'Credit_prob': 0.9019309332297606, 'Give_Credit': True}


Answer is 0.902

## Question 4

Now let's serve this model as a web service

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit?

* 0.140
* 0.440
* 0.645
* 0.845


## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.10.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.10.12-slim` and has a logistic regression model 
(a different one) as well as a dictionary vectorizer inside. 

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.10.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.10.12-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.

In [28]:
# Libraries and external functions
import pickle
from flask import Flask, request, jsonify


# Load dictionary vectoriser and model file
dv_file = 'dv.bin'
model_file = 'model1.bin'

with open(dv_file, 'rb') as f_in:
    dv = pickle.load(f_in)

with open(model_file, 'rb') as f_in:
    model = pickle.load(f_in)


def pred_Credit(Client):
    # Same scoring function as in Question 3, repeated so the script runs on its own
    X = dv.transform(Client)
    y_pred = model.predict_proba(X)[0, 1]
    return {'Credit_prob': float(y_pred), 'Give_Credit': bool(y_pred >= 0.5)}


app = Flask('predict')

@app.route('/predict', methods=['POST'])
def predict():
    client = request.get_json()  # parse the JSON request body into a Python dict
    result = pred_Credit(client)
    return jsonify(result)

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=9696)

 * Serving Flask app 'predict'
 * Debug mode: on


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:9696
 * Running on http://192.168.0.18:9696
Press CTRL+C to quit
 * Restarting with stat
Traceback (most recent call last):
  File "/Users/marcusleiwe/anaconda3/envs/Ch5_MLzoomcamp/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/Users/marcusleiwe/anaconda3/envs/Ch5_MLzoomcamp/lib/python3.10/site-packages/traitlets/config/application.py", line 1052, in launch_instance
    app.initialize(argv)
  File "/Users/marcusleiwe/anaconda3/envs/Ch5_MLzoomcamp/lib/python3.10/site-packages/traitlets/config/application.py", line 117, in inner
    return method(app, *args, **kwargs)
  File "/Users/marcusleiwe/anaconda3/envs/Ch5_MLzoomcamp/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 689, in initialize
    self.init_sockets()
  File "/Users/marcusleiwe/anaconda3/envs/Ch5_MLzoomcamp/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 328, in 

SystemExit: 1

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


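The script above is saved as `predict-test.py`. Run it in a terminal with `python predict-test.py` rather than inside the notebook: Flask's debug reloader cannot restart the process from within a Jupyter kernel, which is what the traceback above shows.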

In [33]:
import requests

url = "http://localhost:9696/predict"
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
CreditQuery = requests.post(url, json=client).json()

print(CreditQuery)

{'Credit_prob': 0.13968947052356817, 'Give_Credit': False}


The answer is 0.140, the closest option to the returned probability of 0.1397.
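
Since the returned probability rarely matches an option exactly, picking the closest one can be sketched as a one-line helper. `closest_option` is a hypothetical name; the options and the probability are copied from this question.

```python
def closest_option(value, options):
    """Return the option with the smallest absolute distance to value."""
    return min(options, key=lambda option: abs(option - value))


# Options and probability copied from Question 4 above.
q4_options = [0.140, 0.440, 0.645, 0.845]
print(closest_option(0.13968947052356817, q4_options))  # 0.14
```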

Now that it works, I can move from the development server to a production one by using gunicorn.
In the terminal, run `gunicorn --bind 0.0.0.0:9696 predict-test:app`. Here `--bind` specifies the address and port to listen on, `predict-test` is the name of the module (the file name without `.py`), and `app` is the Flask application object defined in it.
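
As a rough illustration of how a `module:variable` target like this can be resolved, here is a simplified sketch based on `importlib`. It is not gunicorn's actual loader, and `resolve_target` is a hypothetical helper.

```python
import importlib


def resolve_target(target):
    """Split 'module:variable' and return the named object from the module."""
    module_name, _, variable_name = target.partition(":")
    # import_module takes the name as a plain string, so a file name that is
    # not a valid Python identifier (such as predict-test.py) can still load.
    module = importlib.import_module(module_name)
    return getattr(module, variable_name)


# Demonstrated on a standard-library module so the sketch runs anywhere.
dumps = resolve_target("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```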

Then test it again. It should give the same value as the cell above.

In [19]:
url = "http://localhost:9696/predict"
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
CreditQuery = requests.post(url, json=client).json()

print(CreditQuery)

{'Credit_prob': 0.13968947052356817, 'Give_Credit': False}


## Question 5

Download the base image `svizor/zoomcamp-model:3.10.12-slim`. You can easily get it by using the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 47 MB
* 147 MB
* 374 MB
* 574 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

In [22]:
! docker pull svizor/zoomcamp-model:3.10.12-slim
! docker images

3.10.12-slim: Pulling from svizor/zoomcamp-model
Digest: sha256:e8441100b9d8da56344f50c673eb2daded3c61ce9565e45c3592c02f34fb3149
Status: Image is up to date for svizor/zoomcamp-model:3.10.12-slim
docker.io/svizor/zoomcamp-model:3.10.12-slim

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview svizor/zoomcamp-model:3.10.12-slim
REPOSITORY              TAG            IMAGE ID       CREATED      SIZE
svizor/zoomcamp-model   3.10.12-slim   08266c8f0c4b   7 days ago   147MB


So the size is 147MB

## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.10.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.

Dockerfile can be found [here]()

## Question 6

Let's run your docker container!

In [None]:
# NOTE: the image name (the tag passed to `docker build -t`) must be appended to this command
! docker run -it --rm -p 9696:9696

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "retired", "duration": 445, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit now?

* 0.168
* 0.530
* 0.730
* 0.968
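
If `requests` is not installed on the machine that queries the container, the same call can be sketched with the standard library only. The URL below is an assumption (port 9696 published on `localhost`, with the `/predict` route from the Flask script), and `build_request` and `score_client` are hypothetical helper names.

```python
import json
from urllib import request


def build_request(url, client):
    """Build a POST request carrying the client dict as a JSON body."""
    body = json.dumps(client).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_client(url, client, timeout=10):
    """Send the client to the service and return the decoded JSON reply."""
    with request.urlopen(build_request(url, client), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed URL: container port 9696 published on localhost, /predict route.
    url = "http://localhost:9696/predict"
    client = {"job": "retired", "duration": 445, "poutcome": "success"}
    try:
        print(score_client(url, client))
    except OSError as err:  # URLError is a subclass of OSError
        print(f"Service not reachable: {err}")
```

Because the image ships a different model (`model2.bin`), the probability returned here is expected to differ from the Question 3 result for the same client.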


## Submit the results

- Submit your results here: https://forms.gle/gfruq6FGoLass3Ff9
- If your answer doesn't match options exactly, select the closest one.
- You can submit your solution multiple times. In this case, only the last submission will be used.


## Deadline

The deadline for submitting is October 16 (Monday), 23:00 CET. After that the form will be closed.