# Week 5: Deployment Homework

## Question 1

- Install Pipenv
- What's the version of pipenv you installed?

> Use `--version` to find out.

Output obtained:

```bash
$ pip install pipenv
...
$ pipenv --version
pipenv, version 2022.10.12
```

## Question 2

- Use pipenv to install Scikit-learn version 1.0.2
- What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: Create an empty folder for the homework and work there.

Install scikit-learn with:

```bash
$ pipenv install scikit-learn==1.0.2
```


In [1]:
import json

with open("Pipfile.lock", "r") as file:
    lock = json.load(file)

print(lock["default"]["scikit-learn"]["hashes"][0])

sha256:08ef968f6b72033c16c479c966bf37ccd49b06ea91b765e1cc27afefe723920b


## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

* [DictVectorizer](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
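
If `wget` is not available (for example on Windows), the same download can be sketched with Python's standard library. This is a minimal sketch, not part of the original homework: the helper names `artifact_url` and `download` are hypothetical, and the base URL is copied from the `PREFIX` variable above.

```python
import urllib.request
from pathlib import Path

# Same base URL as the PREFIX variable in the wget snippet.
PREFIX = (
    "https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/"
    "master/course-zoomcamp/cohorts/2022/05-deployment/homework"
)


def artifact_url(filename: str) -> str:
    """Build the download URL for one file (hypothetical helper)."""
    return f"{PREFIX}/{filename}"


def download(filename: str, target_dir: str = ".") -> Path:
    """Fetch one file into target_dir and return its local path."""
    target = Path(target_dir) / filename
    urllib.request.urlretrieve(artifact_url(filename), target)
    return target
```

Calling `download("model1.bin")` and `download("dv.bin")` from the homework folder should leave both files next to your script, as the `wget` commands do.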

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
```

What's the probability that this client will get a credit card? 

* 0.162
* 0.391
* 0.601
* 0.993

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
```
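
On systems without `md5sum`, the same check can be sketched in Python with `hashlib`. This is only an illustration: the helper names `md5_of_file` and `matches_expected` are hypothetical, and the expected digests are copied from the `md5sum` output above.

```python
import hashlib

# Expected digests, copied from the md5sum output above.
EXPECTED_MD5 = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}


def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected
```

For example, `matches_expected("dv.bin", EXPECTED_MD5["dv.bin"])` should return `True` for an intact download; a `False` result suggests the file was corrupted or saved as an HTML page instead of the binary.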

In [2]:
import pickle

with open("model1.bin", "rb") as model_fh, open("dv.bin", "rb") as dv_bin:
    model = pickle.load(model_fh)
    dv = pickle.load(dv_bin)

model, dv

(LogisticRegression(solver='liblinear'), DictVectorizer(sparse=False))

In [3]:
sample = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}

x = dv.transform(sample)
y_pred = model.predict_proba(x)[0][1]

print(f"{y_pred=:.3f}")

y_pred=0.162


## Question 4

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows) (I used FastAPI and uvicorn)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card?

* 0.274
* 0.484
* 0.698
* 0.928

First, install FastAPI and uvicorn:

```bash
$ pipenv install fastapi uvicorn
```

Then write the code to serve the model as a web service. Here pydantic validates the model input, so the handler receives a `Sample` object rather than a plain dictionary. We therefore need to call its `.dict()` method before passing it to the `DictVectorizer`. The code is in the `service.py` file:

```python
import pickle
from fastapi import FastAPI
from pydantic import BaseModel

class Sample(BaseModel):
    reports: int
    share: float
    expenditure: float
    owner: str

app = FastAPI()

with open("model1.bin", "rb") as model_fh, open("dv.bin", "rb") as dv_bin:
    model = pickle.load(model_fh)
    dv = pickle.load(dv_bin)

@app.post("/predict")
def predict(sample: Sample):
    # DictVectorizer expects a plain dict, not a pydantic model
    x = dv.transform(sample.dict())
    y = model.predict_proba(x)[0][1]

    return {
        "probability": y
    }

```

Start the local server to test the service:

```bash
$ uvicorn service:app
```

In [4]:
import requests

url = "http://127.0.0.1:8000/predict"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

{'probability': 0.9282218018527452}

## Docker

Install [Docker](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/06-docker.md). We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.9.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.9.12-slim` and has a logistic regression model
(a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.9.12-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.