In [2]:
import pickle
import requests
import warnings
warnings.filterwarnings("ignore")

---
## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out
---

In [3]:
!pipenv --version

pipenv, version 2022.10.4



---
## Question 2

* Use Pipenv to install Scikit-Learn version 1.0.2
* What's the first hash for scikit-learn you get in Pipfile.lock?

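Note: you should create an empty folder for the homework
and do it there.

Solution: in that folder, run `pipenv install scikit-learn==1.0.2` (the same as `pip install`, but with `pipenv`), then take the first entry of the `hashes` list under `scikit-learn` in `Pipfile.lock`.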
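You can also read the hash programmatically instead of scrolling through `Pipfile.lock`. A minimal sketch, assuming the usual lock layout where `Pipfile.lock` is JSON, installed packages live under the `"default"` section, and each package has a `"hashes"` list. The `first_hash` helper is hypothetical, not part of Pipenv:

```python
import json


def first_hash(lock_path, package="scikit-learn", section="default"):
    """Return the first hash recorded for `package` in a Pipfile.lock file.

    Assumes the standard layout: {"default": {"<package>": {"hashes": [...]}}}.
    """
    with open(lock_path) as f_in:
        lock = json.load(f_in)
    return lock[section][package]["hashes"][0]
```

Usage, from the homework folder: `print(first_hash("Pipfile.lock"))`.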

---
## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```python
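from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df is the training DataFrame and y the target vector (not shown here)
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)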
```

> **Note**: You don't need to train the model. This code is just for your reference.

They were then saved with Pickle. Download them:

* [DictVectorizer](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/cohorts/2022/05-deployment/homework/model1.bin?raw=true)

With `wget`:

```bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
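If `wget` isn't available (for example on a plain Windows setup), the same download can be done from Python. A minimal sketch using only the standard library's `urllib.request.urlretrieve`; the `download` helper name and the `dest_dir` parameter are illustrative, not part of the homework:

```python
import os
import urllib.request

PREFIX = "https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework"


def download(prefix, names, dest_dir="."):
    """Fetch each file `<prefix>/<name>` into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name in names:
        path = os.path.join(dest_dir, name)
        urllib.request.urlretrieve(f"{prefix}/{name}", path)
        paths.append(path)
    return paths
```

Usage: `download(PREFIX, ["model1.bin", "dv.bin"], "data")` puts both files where the notebook cells below expect them.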
---

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
```

What's the probability that this client will get a credit card? 

* 0.162
* 0.391
* 0.601
* 0.993

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
```
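The same check can be run from Python, which helps when `md5sum` isn't installed. A sketch with the standard library's `hashlib`, assuming the two files sit in one directory; the helper names are hypothetical and the expected values are the ones listed above:

```python
import hashlib
import os

# Checksums published with the homework (see the md5sum output above)
EXPECTED_MD5 = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(expected=EXPECTED_MD5, directory="."):
    """Return the file names whose MD5 differs from the expected value.

    A missing file raises FileNotFoundError.
    """
    return [
        name
        for name, checksum in expected.items()
        if md5sum(os.path.join(directory, name)) != checksum
    ]
```

An empty list from `mismatched_files(directory="data")` means both downloads are intact.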
---

In [4]:
with open("data/model1.bin", "rb") as f_model:
    model = pickle.load(f_model)

with open("data/dv.bin", "rb") as f_dv:
    dict_vectorizer = pickle.load(f_dv)

In [14]:
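client = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}

# DictVectorizer expects a list of dicts and returns a 2D feature matrix
X = dict_vectorizer.transform([client])

# predict_proba returns one row per sample: [P(class 0), P(class 1)]
probability = model.predict_proba(X)[0, 1]
print("{:.3f}".format(probability))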

0.162


In [15]:
# Test the ping/pong app.

# First, start the server in an environment where Flask is installed:
# `pipenv shell` activates the virtual environment created for this homework,
# then run: python ping.py

!curl localhost:9696/ping

Pong




In [17]:
# Or using the requests library...

print(requests.get('http://localhost:9696/ping').text)

Pong
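The notebook doesn't include `ping.py` itself. A minimal Flask app that would answer the requests above might look like this sketch; the app name and the exact return string are assumptions:

```python
from flask import Flask

app = Flask("ping")


@app.route("/ping", methods=["GET"])
def ping():
    # Plain-text body returned to curl / requests
    return "Pong"
```

To serve it on the port used above, call `app.run(host="0.0.0.0", port=9696)` from the script, or on Windows use `waitress-serve --listen=0.0.0.0:9696 ping:app`.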



---
## Question 4

Now let's serve this model as a web service

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card?

* 0.274
* 0.484
* 0.698
* 0.928
---

In [26]:
# DEBUG MODE:
# First, start the service in an environment where Flask is installed (e.g. the pipenv shell):
# python predict_churn.py

URL = "http://localhost:9696/predict"

client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}

response = requests.post(url=URL, json=client).json()

print("The type of response: ", type(response))
print("Positive : {}".format(response['churn']))
print("The probability: {:.3f}".format(response['probability']))

The type of response:  <class 'dict'>
Positive : True
The probability: 0.928


I'm on Windows, so I serve the app with waitress, running the following command inside the pipenv shell in Git Bash:

`waitress-serve --listen=0.0.0.0:9696 predict_churn:app`

Then, in a second Git Bash window (Windows 10), I activate the same environment with `python -m pipenv shell`.

Finally, running `python test.py` prints:

`The probability of customer to churn:  0.928
sending 25% discount.`
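The notebook runs `predict_churn.py` but doesn't show it. Below is a minimal sketch of what such a service might contain. It assumes the `probability` and `churn` response keys seen in the client cell, a 0.5 decision threshold, and the hypothetical helper names `load_pickle` and `create_app`. The factory takes the vectorizer and model as arguments so the route can be exercised with Flask's test client without the `.bin` files:

```python
import pickle

from flask import Flask, jsonify, request


def load_pickle(path):
    """Load one pickled object (DictVectorizer or model) from disk."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def create_app(dv, model, threshold=0.5):
    """Build a Flask app that scores one client per POST /predict request."""
    app = Flask("credit")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        X = dv.transform([client])
        # Column 1 of predict_proba is the probability of the positive class
        probability = float(model.predict_proba(X)[0, 1])
        return jsonify({
            "probability": probability,
            # Key kept as "churn" to match the response read in the client cell
            "churn": bool(probability >= threshold),
        })

    return app
```

`waitress-serve` and gunicorn look up `predict_churn:app`, so the script needs a module-level app, e.g. `app = create_app(load_pickle("dv.bin"), load_pickle("model1.bin"))` (adjust the paths to where the files live; the Docker version would load `model2.bin`).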

---
## Docker

Install [Docker](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/06-docker.md). We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.9.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.9.12-slim` and has a logistic regression model 
(a different one) as well as a dictionary vectorizer inside. 

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

We already built it and then pushed it to [`svizor/zoomcamp-model:3.9.12-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.


## Question 5

Download the base image `svizor/zoomcamp-model:3.9.12-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 15 Mb
* 125 Mb
* 275 Mb
* 415 Mb

You can get this information when running `docker images` - it'll be in the "SIZE" column.


## Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```docker
FROM svizor/zoomcamp-model:3.9.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with Gunicorn 

After that, you can build your docker image.


## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a credit card now?

* 0.289
* 0.502
* 0.769
* 0.972
---

---
Solution: Working with the Dockerfile on Windows 10 and Git Bash, I could not get the build to work with the `COPY` instruction from the base image's Dockerfile:
```docker 
FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```
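Instead, I used the `*` wildcard to copy everything in the build context:
`COPY ["*", "./"]`
This also copies unnecessary files into the image and can overwrite files with the same names, so it is only a temporary fix. The base image already contains `model2.bin` and `dv.bin` in `/app`, so they do not need to be copied again (and `model2.bin` is not among the files downloaded for this homework). What `pipenv install --system --deploy` does need is `Pipfile` and `Pipfile.lock`, which the wildcard happened to copy. The intended Dockerfile (assuming `gunicorn` is listed in the Pipfile alongside Flask) is:
```docker 
FROM svizor/zoomcamp-model:3.9.12-slim

RUN pip install pipenv

WORKDIR /app

# model2.bin and dv.bin are already in /app in the base image
COPY ["Pipfile", "Pipfile.lock", "./"]

RUN pipenv install --system --deploy

COPY ["predict_churn2.py", "./"]

EXPOSE 9696

ENTRYPOINT [ "gunicorn", "--bind=0.0.0.0:9696", "predict_churn2:app"]
```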

After building the image with `docker build -t zoomcamp-test .`, Git Bash on Windows 10 suggested prepending `winpty` to the `docker run` command:

`winpty docker run -it --rm -p 9696:9696 zoomcamp-test`

Then, in the pipenv shell in another Git Bash window, I ran `python test_docker.py`, which sends the client to the Docker container and prints the result.
I obtained the following result:

`The probability of customer to churn:  0.769
sending 25% discount.`

NOTE: I assumed we were working on churn prediction, but `model1.bin` and `model2.bin` predict credit card approval. Ignore the churn wording (e.g. 'sending 25% discount') and the use of 'churn' in file and function names: a higher probability here means a higher chance of being approved for a credit card.
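`test_docker.py` isn't shown either. A client along these lines would do the same job; it's a sketch that assumes the container was started with `-p 9696:9696`, that the service returns a `probability` key as in the earlier client cell, and that `score_client` is a hypothetical name. The `post` parameter is injectable only so the function can be tried without a running container:

```python
import requests

# Assumes the port mapping -p 9696:9696 from the docker run command above
URL = "http://localhost:9696/predict"


def score_client(client, url=URL, post=requests.post):
    """Send one client record to the service and return the approval probability."""
    response = post(url, json=client, timeout=10)
    # Fail loudly on HTTP errors instead of trying to parse an error page
    response.raise_for_status()
    return response.json()["probability"]
```

Usage: `score_client({"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"})` with the container running.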

---
The end of the Week 5 Assignment.

---