## Instructions
In this homework, we will use Credit Card Data from the previous homework.

Note: sometimes your answer doesn't match one of the options exactly. That's fine. Select the option that's closest to your solution.

## Question 1
-   Install Pipenv
-   What's the version of pipenv you installed?
-   Use `--version` to find out

```
!pip install pipenv
```

```
!pipenv --version
```

pipenv, version 2022.10.10
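If you want to check the version from Python rather than the shell, the standard library's `importlib.metadata` (Python 3.8+) can report it. This is a minimal sketch, and the helper name `installed_version` is illustrative. It reports the version installed for the *current* interpreter, which may not be the environment the `pipenv` command runs under.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Usage: installed_version("pipenv")
```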



## Question 2
-   Use Pipenv to install Scikit-Learn version 1.0.2
-   What's the first hash for scikit-learn you get in Pipfile.lock?

Note: you should create an empty folder for homework and do it there.

**Answer:** 08ef968f6b72033c16c479c966bf37ccd49b06ea91b765e1cc27afefe723920b
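The first hash can also be read programmatically. `Pipfile.lock` is a JSON file, and each locked package under `"default"` has a `"hashes"` list of `"sha256:..."` strings. The sketch below uses illustrative helper names. It strips the `sha256:` prefix so the result matches the answer format above.

```python
import json
from pathlib import Path


def load_lock(path="Pipfile.lock"):
    """Parse a Pipfile.lock (plain JSON) into a dict."""
    return json.loads(Path(path).read_text())


def first_hash(lock, package, section="default"):
    """First entry of the package's 'hashes' list, without the 'sha256:' prefix."""
    digest = lock[section][package]["hashes"][0]
    return digest.split(":", 1)[-1]


# Usage, from the homework folder:
# first_hash(load_lock(), "scikit-learn")
```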

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:
```
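from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df: the credit card DataFrame from the previous homework; y: its binary target
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)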
```
Note: You don't need to train the model. This code is just for your reference.
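For reference, here is an end-to-end sketch of the pickling step on a few made-up records, not the homework data. The file names `toy_dv.bin` and `toy_model.bin` are hypothetical. They are chosen so the downloaded `dv.bin` and `model1.bin` are not overwritten.

```python
import pickle

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up toy records with the same four features; NOT the homework data.
records = [
    {"reports": 0, "share": 0.03, "expenditure": 120.0, "owner": "yes"},
    {"reports": 3, "share": 0.001, "expenditure": 0.0, "owner": "no"},
    {"reports": 0, "share": 0.10, "expenditure": 300.0, "owner": "no"},
    {"reports": 2, "share": 0.002, "expenditure": 0.0, "owner": "yes"},
]
y = [1, 0, 1, 0]

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(records)
model = LogisticRegression(solver="liblinear").fit(X, y)

# One pickle file per object, mirroring dv.bin / model1.bin (hypothetical names).
with open("toy_dv.bin", "wb") as f_out:
    pickle.dump(dv, f_out)
with open("toy_model.bin", "wb") as f_out:
    pickle.dump(model, f_out)
```

Loading works the same way in reverse, with `pickle.load` on files opened in `'rb'` mode.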

And then saved with Pickle. Download them:
-   DictVectorizer
-   LogisticRegression

With `wget`:
```
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```

```
# Use the `wget` Python package, since Windows has no native `wget` command
!pip install wget
```

```
# Run once, if the files are not already downloaded
import wget

PREFIX = "https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework"
!python -m wget $PREFIX/model1.bin
!python -m wget $PREFIX/dv.bin
```


Saved under model1.bin

Saved under dv.bin
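Without the extra `wget` package, the standard library's `urllib.request.urlretrieve` can do the same job on any OS. This minimal sketch implements the "run once" logic. The helper `fetch_if_missing` is hypothetical, and it skips any file that already exists in the current folder.

```python
import os
import urllib.request

# Same PREFIX as in the shell example above.
PREFIX = "https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework"


def fetch_if_missing(name, prefix=PREFIX):
    """Download <prefix>/<name> into the current folder unless the file already exists."""
    if not os.path.exists(name):
        urllib.request.urlretrieve(f"{prefix}/{name}", name)
    return name


# Usage (needs network access the first time):
# for name in ("model1.bin", "dv.bin"):
#     fetch_if_missing(name)
```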


## Question 3

Let's use these models!
-   Write a script for loading these models with pickle
-   Score this client:

`{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}`

What's the probability that this client will get a credit card?
-   0.162
-   0.391
-   0.601
-   0.993

If you're getting errors when unpickling the files, check their checksum:

```
$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
```
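`md5sum` is not available on Windows by default, but `hashlib` computes the same hex digest. The sketch below reads each file in chunks and compares it against the checksums listed above. The helper name `md5sum` is illustrative.

```python
import hashlib


def md5sum(path, chunk_size=1 << 16):
    """Hex MD5 digest of a file, read in chunks (the value `md5sum` prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}

# Usage:
# for name, expected in EXPECTED.items():
#     print(name, "OK" if md5sum(name) == expected else "MISMATCH")
```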

**Answer:** The following cells first load the transformer (DictVectorizer) and the model (LogisticRegression). The same code is then moved into a script, and later into a Flask-based web service in the file `predict_prob.py`.

```
import numpy as np
import pickle
```

```
with open('model1.bin', 'rb') as f_model:
    model = pickle.load(f_model)
```

Unpickling `model1.bin` emits a Scikit-Learn version-mismatch warning (see https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations).


```
with open('dv.bin', 'rb') as f_dv:
    dv = pickle.load(f_dv)
```

Unpickling `dv.bin` emits a similar Scikit-Learn version-mismatch warning.


```
client = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
```

```
X = dv.transform([client])
y_prob_pred = model.predict_proba(X)[0, 1]  # probability of the positive class
round(y_prob_pred, 3)
```

0.162
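For a binary `LogisticRegression`, `predict_proba(X)[0, 1]` is the sigmoid of the linear score, `p = 1 / (1 + exp(-(w·x + b)))`, with `w = model.coef_[0]` and `b = model.intercept_[0]`. This small sketch reproduces the number by hand. The helper name is illustrative, and the result should agree with `predict_proba` up to floating-point rounding.

```python
import numpy as np


def logistic_proba(x, coef, intercept):
    """P(y = 1 | x) = sigmoid(w . x + b) for a binary logistic regression."""
    z = np.dot(x, coef) + intercept
    return 1.0 / (1.0 + np.exp(-z))


# With the objects loaded above:
# x = dv.transform([client])[0]
# logistic_proba(x, model.coef_[0], model.intercept_[0])
```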

To avoid this warning, and the risks of unpickling with a different Scikit-Learn version (or a different version of any other package), we run the script in the virtual environment created by `pipenv`, which pins Scikit-Learn 1.0.2.

## Question 4

Now let's serve this model as a web service
-   Install Flask and gunicorn (or waitress, if you're on Windows)
-   Write Flask code for serving the model
-   Now score this client using `requests`:

```
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```
What's the probability that this client will get a credit card?

-   0.274
-   0.484
-   0.698
-   0.928

**Answer:** The Flask-based web application is in the `predict_prob.py` script. It is served by `waitress-serve`, run inside the `pipenv` virtual environment, and it listens on port 9696 on all network interfaces (including the loopback address 127.0.0.1):
```
pipenv run waitress-serve --listen=0.0.0.0:9696 predict_prob:app
```
The `Pipfile`, the `Pipfile.lock`, and the virtual environment were created earlier with:
```
pipenv install numpy scikit-learn==1.0.2 flask gunicorn waitress
```

```
import requests
```

```
# The probability-predictor app must be running on port 9696 before executing this cell; otherwise the request fails.
url = 'http://localhost:9696/predict'
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

{'card': True, 'card_probability': 0.9282218018527452}

## Docker

Install [Docker](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/06-docker.md). We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.9.12-slim`. You'll need to use it (see Question 5 for an example).

This image is based on `python:3.9.12-slim` and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:
```
FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```
We already built it and then pushed it to [`svizor/zoomcamp-model:3.9.12-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

Note: You don't need to build this Docker image; it's just for your reference.

## Question 5

Download the base image `svizor/zoomcamp-model:3.9.12-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?
-   15 MB
-   125 MB
-   275 MB
-   415 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

**Answer:** 125 MB

## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:
```
FROM svizor/zoomcamp-model:3.9.12-slim
# add your stuff here
```
Now complete it:
-   Install all the dependencies from the Pipenv file
-   Copy your Flask script
-   Run it with Gunicorn

After that, you can build your docker image.
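Gunicorn's configuration file is itself Python, so the "Run it with Gunicorn" step can keep its settings out of the Dockerfile. Below is a hypothetical `gunicorn.conf.py`, passed with `gunicorn -c gunicorn.conf.py predict_prob2:app`. `bind` and `workers` are standard Gunicorn settings, and the worker count follows the `(2 x cores) + 1` rule of thumb from the Gunicorn docs.

```python
# gunicorn.conf.py: hypothetical config file for the container
import multiprocessing

# Listen on all interfaces inside the container, on the port that
# `docker run -p 9696:9696` maps to the host.
bind = "0.0.0.0:9696"

# Rule of thumb from the Gunicorn docs: (2 x number of cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1
```

Like the Flask script, this file would need to be copied into the image.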

## Question 6

Let's run your docker container!

After running it, score this client once again:
```
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```
What's the probability that this client will get a credit card now?
-   0.289
-   0.502
-   0.769
-   0.972

**Answer:** A Dockerfile and a new Flask-based script, `predict_prob2.py` (which uses the `model2.bin` already present in the base image), have been created. The new Docker image, tagged `card-predict`, is built with
```
docker build -t card-predict .
```
and then run as a web service on port 9696 with
```
docker run -it --rm -p 9696:9696 card-predict:latest
```
Note: stop any local `gunicorn`/`waitress` service before starting the Docker container. The container runs its own `gunicorn` service on port 9696, and otherwise both would compete for the same port.
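A TCP connection attempt is enough to check whether port 9696 is still taken before starting the container. This sketch uses only the standard library, and the helper `port_in_use` is illustrative.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


# Usage:
# if port_in_use(9696):
#     print("Port 9696 is busy: stop waitress/gunicorn before running the container")
```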

```
# The Docker container must be running on port 9696 before executing this cell; otherwise the request fails.
url = "http://localhost:9696/predict"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

{'card': True, 'card_probability': 0.7692649226628628}