# MLZoomcamp 2022 - Session #5 - Homework

Author: José Victor

* Dataset: [Econometric Analysis](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/AER_credit_card_data.csv)

## Imports

In [1]:
import pickle
import requests

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

In [2]:
%pip install -q pipenv

Note: you may need to restart the kernel to use updated packages.


In [3]:
!pipenv --version

pipenv, version 2022.9.24

## Question 2

* Use Pipenv to install Scikit-Learn version 1.0.2
* What's the first hash for scikit-learn you get in Pipfile.lock? 
Answer: `"sha256:08ef968f6b72033c16c479c966bf37ccd49b06ea91b765e1cc27afefe723920b"`
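If you'd rather not scroll through `Pipfile.lock` by hand, a few lines of Python can read the hash. This minimal sketch assumes the usual Pipfile.lock layout: a `default` section that maps each package name to an entry with a `hashes` list. The helper name `first_hash` is hypothetical.

```python
import json


def first_hash(lock_path, package, section="default"):
    """Return the first hash recorded for `package` in a Pipfile.lock file."""
    with open(lock_path, encoding="utf-8") as lock_file:
        lock = json.load(lock_file)
    # Runtime packages live under "default"; dev packages under "develop"
    return lock[section][package]["hashes"][0]


# Usage, from the folder that contains Pipfile.lock:
# print(first_hash("Pipfile.lock", "scikit-learn"))
```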

### Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

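```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# `df` (the credit card dataset) and `y` (the target) are assumed to be prepared beforehand
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')

# One-hot encode 'owner' and keep the numeric features as they are
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)
```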

Note: You don't need to train the model. This code is just for your reference.
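To see what `DictVectorizer` does with these records, here is a rough pure-Python imitation of `fit_transform` with `sparse=False`. Numeric values pass through, string values become one-hot columns named `key=value`, and the columns are sorted by name (scikit-learn's default `sort=True`). This is only an illustration, not scikit-learn's implementation; `dict_vectorize` and the second record (`"owner": "no"`) are hypothetical.

```python
def dict_vectorize(records):
    """Rough imitation of DictVectorizer(sparse=False).fit_transform(records)."""

    def expand(record):
        row = {}
        for key, value in record.items():
            if isinstance(value, str):
                # Categorical value -> one-hot column such as "owner=yes"
                row[f"{key}={value}"] = 1.0
            else:
                # Numeric value is kept as-is
                row[key] = float(value)
        return row

    expanded = [expand(record) for record in records]
    # Vocabulary: every column name seen during "fit", sorted alphabetically
    feature_names = sorted({name for row in expanded for name in row})
    # Columns a record doesn't have (e.g. "owner=no" for an owner) become 0.0
    matrix = [[row.get(name, 0.0) for name in feature_names] for row in expanded]
    return feature_names, matrix


records = [
    {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"},
    {"reports": 1, "share": 0.05, "expenditure": 2.5, "owner": "no"},  # hypothetical
]
names, X = dict_vectorize(records)
print(names)  # ['expenditure', 'owner=no', 'owner=yes', 'reports', 'share']
print(X[0])   # [0.12, 0.0, 1.0, 0.0, 0.001694]
```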

They were then saved with Pickle. Download them:

* DictVectorizer
* LogisticRegression

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
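On machines without `wget` (Windows, for example), the same two files can be fetched from Python. Here is a minimal sketch using the standard library's `urllib.request.urlretrieve`. The helper names are hypothetical, and the files are saved into the current directory.

```python
from urllib.request import urlretrieve

PREFIX = (
    "https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/"
    "course-zoomcamp/cohorts/2022/05-deployment/homework"
)
FILES = ["model1.bin", "dv.bin"]


def build_urls(prefix, names):
    """Join the base URL and each file name, like $PREFIX/model1.bin in the shell snippet."""
    return [f"{prefix}/{name}" for name in names]


def download_all(prefix=PREFIX, names=FILES):
    """Download every file into the current directory under its own name."""
    for name, url in zip(names, build_urls(prefix, names)):
        urlretrieve(url, name)


# download_all()  # requires network access
```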


In [4]:
!pipenv install -q scikit-learn==1.0.2

Installing scikit-learn==1.0.2...
Adding scikit-learn to Pipfile's [packages]...
✔ Installation Succeeded
Installing dependencies from Pipfile.lock (b0a961)...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```python
{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
```

What's the probability that this client will get a credit card?

* (X) 0.148
* ( ) 0.391
* ( ) 0.601
* ( ) 0.993

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
```
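`md5sum` isn't available on every system, but Python's `hashlib` computes the same digest. This minimal sketch assumes both files are in the current directory. `md5sum` and `verify_downloads` are hypothetical helper names, and the expected values are the checksums listed above.

```python
import hashlib

EXPECTED_MD5 = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}


def md5sum(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(expected=EXPECTED_MD5):
    """Map each file name to True if its checksum matches the expected one."""
    return {name: md5sum(name) == checksum for name, checksum in expected.items()}


# print(verify_downloads())  # True means the file matches its checksum
```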

In [5]:
client = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}

In [6]:
model_file_path = "/home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/05-deployment/homework/model1.bin"
dv_file_path = "/home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/05-deployment/homework/dv.bin"

with open(file=model_file_path, mode="rb") as model_file:
    model = pickle.load(model_file)

with open(file=dv_file_path, mode="rb") as dv_file:
    dv = pickle.load(dv_file)


In [7]:
X = dv.transform(client)
y_pred = model.predict_proba(X)[0,1]
print(y_pred)

0.16213414434326598
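For a binary `LogisticRegression`, the value returned by `predict_proba(X)[0, 1]` is the sigmoid of a linear score: $P(y=1 \mid x) = \sigma(w \cdot x + b)$ with $\sigma(z) = 1/(1+e^{-z})$. In scikit-learn, $w$ is `model.coef_[0]` and $b$ is `model.intercept_[0]`. The sketch below computes this by hand. The weights in the usage line are made up and are not the ones stored in `model1.bin`.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def positive_probability(x, weights, bias):
    """Probability of the positive class for one encoded row `x`."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)


# Hypothetical weights/bias for illustration; with a fitted model you could pass
# model.coef_[0] and model.intercept_[0] together with the row X[0].
print(positive_probability([1.0, 2.0], weights=[0.5, -0.25], bias=0.0))  # 0.5
```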


## Question 4

Now let's serve this model as a web service

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card?

* ( ) 0.274
* ( ) 0.484
* ( ) 0.698
* (X) 0.928
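The Flask script itself isn't included in this notebook. Below is a minimal sketch of what it could look like. It assumes the file is named `predict.py`, the pickles `model1.bin` and `dv.bin` are in the working directory, and the service answers with a `credit_card_probability` key, as in the responses shown here. The function names are hypothetical. The vectorizer and model are passed into an application factory so the route can be tried without the real pickles.

```python
import pickle

from flask import Flask, jsonify, request


def score_client(client, dv, model):
    """Encode one client dict and return the probability of getting a card."""
    X = dv.transform([client])
    return float(model.predict_proba(X)[0, 1])


def create_app(dv, model):
    """Build a Flask app with a single POST /predict endpoint."""
    app = Flask("credit_card")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        probability = score_client(client, dv, model)
        return jsonify({"credit_card_probability": probability})

    return app


def create_app_from_files(model_path="model1.bin", dv_path="dv.bin"):
    """Load the pickled model and vectorizer, then build the app."""
    with open(model_path, "rb") as model_file:
        model = pickle.load(model_file)
    with open(dv_path, "rb") as dv_file:
        dv = pickle.load(dv_file)
    return create_app(dv, model)
```

Recent gunicorn versions accept a factory call as the app target, so the service could be started with `gunicorn --bind 0.0.0.0:4242 "predict:create_app_from_files()"`. The `requests.post` call then targets `http://localhost:4242/predict`.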

In [8]:
!pipenv install -q flask gunicorn

Installing flask...
Adding flask to Pipfile's [packages]...
✔ Installation Succeeded
Installing gunicorn...
Adding gunicorn to Pipfile's [packages]...
✔ Installation Succeeded
Installing dependencies from Pipfile.lock (b0a961)...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

In [9]:
url = "http://localhost:4242/predict"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

{'credit_card_probability': 0.9282218018527452}

## Docker

Install [Docker](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/06-docker.md). We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.9.12-slim`. You'll need to use it (see Question 5 for an example).

This image is based on `python:3.9.12-slim` and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```
FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to `svizor/zoomcamp-model:3.9.12-slim`.

Note: You don't need to build this Docker image; it's just for your reference.

## Question 5

Download the base image `svizor/zoomcamp-model:3.9.12-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* ( ) 15 MB
* (X) 125 MB
* ( ) 275 MB
* ( ) 415 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.

### Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```
FROM svizor/zoomcamp-model:3.9.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn

After that, you can build your docker image.

In [10]:
!docker pull svizor/zoomcamp-model:3.9.12-slim

3.9.12-slim: Pulling from svizor/zoomcamp-model
Digest: sha256:10445b40653d5ac17ede84db17f42ae8c4090b347a979372b8102174498b33b9
Status: Image is up to date for svizor/zoomcamp-model:3.9.12-slim
docker.io/svizor/zoomcamp-model:3.9.12-slim


In [11]:
!docker images

REPOSITORY              TAG           IMAGE ID       CREATED          SIZE
homework                latest        6e8130b2e607   30 minutes ago   594MB
svizor/zoomcamp-model   3.9.12-slim   571a6fdc554b   47 hours ago     125MB


## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card now?

* ( ) 0.289
* ( ) 0.502
* ( ) 0.769
* (X) 0.972

In [12]:
%cd /home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/05-deployment/homework

!docker build -t homework .

/home/jvictor/vs_code/mlzoomcamp2022_jvscursulim/05-deployment/homework
[+] Building 0.1s (10/10)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 38B                                        0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/svizor/zoomcamp-model:3.9.12-s  0.0s
 => [1/6] FROM docker.io/svizor/zoomcamp-model:3.9.12-slim                 0.0s
 => [internal] load build context                                          0.0s
 => => transferring context: 975B                                          0.0s
 => CACHED

In [13]:
!docker images

REPOSITORY              TAG           IMAGE ID       CREATED          SIZE
homework                latest        1e97a6c9e300   3 seconds ago    594MB
<none>                  <none>        6e8130b2e607   33 minutes ago   594MB
svizor/zoomcamp-model   3.9.12-slim   571a6fdc554b   47 hours ago     125MB


```bash
docker run -it --rm -p 4242:4242 homework
```
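The container may need a moment before gunicorn accepts connections, so a request sent right after `docker run` can fail with a connection error. This small standard-library sketch waits until something is listening on the mapped port before the request is sent. `wait_for_port` is a hypothetical helper, and the 10-second default timeout is an arbitrary choice.

```python
import socket
import time


def wait_for_port(host="localhost", port=4242, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing listening yet (e.g. connection refused): retry shortly
            time.sleep(interval)
    return False


# if wait_for_port():
#     print(requests.post(url, json=client).json())
```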

In [14]:
url = "http://localhost:4242/predict"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

{'credit_card_probability': 0.9282218018527452}