# 1. Info

This notebook contains all the code needed to solve the homework for week five of the Machine Learning Zoomcamp.

## Import the required libraries

In [11]:
import pandas as pd
import numpy as np
import pickle
import requests
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
    features = ['job','duration', 'poutcome']
    dicts = df[features].to_dict(orient='records')

    dv = DictVectorizer(sparse=False)
    X = dv.fit_transform(dicts)

    model = LogisticRegression().fit(X, y)
```

> Note: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

* DictVectorizer
* LogisticRegression


# Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use --version to find out

In [7]:
!pipenv --version

pipenv, version 2023.6.11


# Question 2

* Use Pipenv to install Scikit-Learn version 1.3.1
* What's the first hash for scikit-learn you get in Pipfile.lock?

```json
"sha256": "60769f8685f3244e386c4352f0a932474ea30e1bb54e5e4cfe24967fbc1c6348"
```
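
Instead of reading `Pipfile.lock` by eye, the hash can also be looked up programmatically. The sketch below assumes the usual lock-file layout, where each package under the `default` section carries a `hashes` list; the helper name `first_package_hash` is hypothetical.

```python
import json


def first_package_hash(lock_path, package, section="default"):
    """Return the first entry of the `hashes` list for a package in Pipfile.lock."""
    with open(lock_path) as f_in:
        lock = json.load(f_in)
    return lock[section][package]["hashes"][0]


# Example call (assumes Pipfile.lock is in the current directory):
# first_package_hash("Pipfile.lock", "scikit-learn")
```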

# Question 3

Let's use these models!

Write a script for loading these models with pickle

Score this client:
```
{"job": "retired", "duration": 445, "poutcome": "success"}
```

What's the probability that this client will get a credit?

* 0.162
* 0.392
* 0.652
* 0.902

If you're getting errors when unpickling the files, check their checksum:
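
The expected checksum values are not reproduced in this notebook. As a rough sketch, a file's digest can be computed with `hashlib` and compared against the published value; the helper name `file_checksum` and the default choice of SHA-256 are assumptions, so use whichever algorithm the published checksums were generated with.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=8192):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f_in:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example calls (assume the downloaded files are in the current directory):
# file_checksum("./dv.bin")
# file_checksum("./model1.bin", algorithm="md5")
```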

In [8]:
dv_vectorizer = './dv.bin'
with open(dv_vectorizer, 'rb') as dv_file:
  dv = pickle.load(dv_file)

In [9]:
model_path = './model1.bin'
with open(model_path, 'rb') as model_file:
  model = pickle.load(model_file)

In [10]:
customer = {"job": "retired", "duration": 445, "poutcome": "success"}
X = dv.transform([customer])
model.predict_proba(X)[0,1].round(3)

0.902

# Question 4

Now let's serve this model as a web service

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using requests: `{"job": "unknown", "duration": 270, "poutcome": "failure"}`

What's the probability that this client will get a credit?

* 0.140
* 0.440
* 0.645
* 0.845
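
The Flask script itself is not included in this notebook. A minimal sketch of what it might look like is shown below; the `/predict` route and the `credit_probability` response key match the request cells in this notebook, while the app name, the helper names `load_pickle` and `create_app`, and the factory layout are assumptions made for illustration.

```python
import pickle

from flask import Flask, jsonify, request


def load_pickle(path):
    """Load a pickled object (vectorizer or model) from disk."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def create_app(dv, model):
    """Build a Flask app that scores one client per POST request."""
    app = Flask("credit")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        X = dv.transform([client])
        # Probability of the positive class for the single row in X.
        probability = float(model.predict_proba(X)[0][1])
        return jsonify({"credit_probability": probability})

    return app


# In the actual script, the app would be created at module level, e.g.:
# app = create_app(load_pickle("dv.bin"), load_pickle("model1.bin"))
```

Assuming the script is saved as `predict.py` (a hypothetical file name) with the module-level `app` line uncommented, it could be served with `gunicorn --bind 0.0.0.0:9696 predict:app`, which matches the port used in the requests below.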

In [14]:
url = "http://localhost:9696/predict"
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
resp = requests.post(url, json=client).json()

In [19]:
round(resp['credit_probability'],3)

0.14

# Docker

Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: svizor/zoomcamp-model:3.10.12-slim. You'll need to use it (see Question 5 for an example).

This image is based on python:3.10.12-slim and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```dockerfile
FROM python:3.10.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```
We already built it and then pushed it to svizor/zoomcamp-model:3.10.12-slim.

Note: You don't need to build this Docker image; it's just for your reference.

# Question 5

Download the base image svizor/zoomcamp-model:3.10.12-slim. You can do this with the `docker pull` command.

So what's the size of this base image?

* 47 MB
* 147 MB
* 374 MB
* 574 MB

You can get this information by running `docker images`; it will be in the "SIZE" column.

![Alt text](image.png)

# Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```dockerfile
FROM svizor/zoomcamp-model:3.10.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn

After that, you can build your docker image.

# Question 6

Let's run your docker container!

After running it, score this client once again:

```Python
url = "YOUR_URL"
client = {"job": "retired", "duration": 445, "poutcome": "success"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit now?

* 0.168
* 0.530
* 0.730
* 0.968

In [29]:
url = "http://localhost:9696/predict"
client = {"job": "retired", "duration": 445, "poutcome": "success"}
resp = requests.post(url, json=client).json()

In [30]:
round(resp['credit_probability'],3)

0.902