# Machine Learning Zoomcamp Homeworks

## Week 5
In this homework, we'll use the churn prediction model trained on a smaller set of features.

### Question #1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out



```bash
# answer to question #1
pipenv --version
```

```
pipenv, version 2020.11.15
```




### Question #2

* Use Pipenv to install Scikit-Learn version 1.0
* What's the first hash for scikit-learn you get in Pipfile.lock?

answer to question #2:

Open the "Pipfile.lock" file in the folder where you created the virtual environment.<br>
In your favorite editor, find the "scikit-learn" JSON key and its "hashes" sub-key (about 25 hashes in my case). The first hash value is:<br>
sha256:121f78d6564000dc5e968394f45aac87981fcaaf2be40cfcd8f07b2baa1e1829
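
The manual lookup described above can also be scripted. The following is a minimal sketch, assuming the usual `Pipfile.lock` layout in which runtime packages sit under a top-level `"default"` key; the helper names `load_lock` and `first_hash` are made up for this illustration.

```python
import json


def load_lock(path="Pipfile.lock"):
    """Read Pipfile.lock, which is a plain JSON document."""
    with open(path, encoding="utf-8") as lock_file:
        return json.load(lock_file)


def first_hash(lock, package, section="default"):
    """Return the first entry of the package's "hashes" list."""
    return lock[section][package]["hashes"][0]


# Usage, from the folder that contains Pipfile.lock:
# print(first_hash(load_lock(), "scikit-learn"))
```

The order of the hashes is whatever Pipenv wrote into the file, so the script should report the same value you see first in the editor.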

### Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

```
features = ['tenure', 'monthlycharges', 'contract']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```

> **Note**: You don't need to train the model. This code is just for your reference.

And then saved with Pickle. Download them:

* [DictVectorizer](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/05-deployment/homework/model1.bin?raw=true)

With wget:

```bash
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
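
If `wget` is not available (for example on Windows), the same two downloads can be sketched with the Python standard library. The helper names `build_urls` and `download_all` are hypothetical; only the URL prefix and file names come from the commands above.

```python
import os
from urllib.request import urlretrieve

PREFIX = ("https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/"
          "master/course-zoomcamp/05-deployment/homework")
FILES = ["model1.bin", "dv.bin"]


def build_urls(prefix=PREFIX, files=FILES):
    """Map each file name to its full download URL."""
    return {name: f"{prefix}/{name}" for name in files}


def download_all(target_dir="."):
    """Fetch every file into target_dir; needs network access."""
    for name, url in build_urls().items():
        urlretrieve(url, os.path.join(target_dir, name))


# Usage (not run here because it hits the network):
# download_all()
```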


### Question #3

Let's use these models!

* Write a script for loading these models with pickle
* Score this customer:

```json
{"contract": "two_year", "tenure": 12, "monthlycharges": 19.7}
```

What's the probability that this customer is churning? 

If you're getting errors when unpickling the files, check their checksum:

```bash
$ md5sum model1.bin dv.bin
5868e129bfbb309ba60bf750263afab1  model1.bin
c49b69f8a5a3c560882ff5daa3c0ff4d  dv.bin
```
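
On systems without `md5sum`, the same check can be approximated with `hashlib`. This is a small sketch; the function names are invented for the example, and the expected checksums are copied from the listing above.

```python
import hashlib

# Expected checksums, copied from the md5sum output above.
EXPECTED = {
    "model1.bin": "5868e129bfbb309ba60bf750263afab1",
    "dv.bin": "c49b69f8a5a3c560882ff5daa3c0ff4d",
}


def md5_of_stream(stream, chunk_size=8192):
    """Hash a binary stream chunk by chunk and return the hex digest."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the MD5 hex digest of the file at path."""
    with open(path, "rb") as binary_file:
        return md5_of_stream(binary_file)


def verify(expected=EXPECTED):
    """Return {file name: True/False} for each expected checksum."""
    return {name: md5_of_file(name) == checksum
            for name, checksum in expected.items()}


# Usage, from the folder that holds the downloaded files:
# print(verify())
```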

```python
# answer to question #3

import pickle


def load_model(path_to_model, path_to_dv):
    with open(path_to_model, 'rb') as model_file, open(path_to_dv, 'rb') as dv_file:
        model = pickle.load(model_file)
        dv = pickle.load(dv_file)
    return model, dv


def predict_churn(model, dv, customer_data):
    X = dv.transform([customer_data])
    y_pred = model.predict_proba(X)[0, 1]
    is_churning = y_pred >= 0.5

    result = {
        'churn': bool(is_churning),
        'churn_probability': float(y_pred)
    }
    return result


data = {"contract": "two_year", "tenure": 12, "monthlycharges": 19.7}
# forward slashes avoid invalid escape sequences such as '\h' and '\d' and work on every OS
model, dv = load_model(path_to_model='virtual-envs/hw05/models/model1.bin',
                       path_to_dv='virtual-envs/hw05/models/dv.bin')
prediction_result = predict_churn(model=model, dv=dv, customer_data=data)
print('Is Customer Churning? {}\nChurn Probability: {}'.format(prediction_result['churn'], round(prediction_result['churn_probability'], 3)))
```

```
Is Customer Churning? False
Churn Probability: 0.115
```
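
To make the two calls in `predict_churn` less opaque, here is a pure-Python sketch of what `dv.transform` and `model.predict_proba` roughly do. It is an illustration, not the pickled model: the contract categories are assumed, and the weights and bias are hypothetical values chosen only so the code runs, which is why the printed probability will not match the answer above.

```python
import math

# Assumed contract categories; the real ones are stored inside dv.bin.
CONTRACT_VALUES = ["month-to-month", "one_year", "two_year"]


def encode_customer(customer):
    """One-hot encode the string feature and pass numeric features through.

    Column order mimics DictVectorizer's sorted feature names:
    contract=<value> columns first, then monthlycharges, then tenure.
    """
    one_hot = [1.0 if customer["contract"] == value else 0.0
               for value in CONTRACT_VALUES]
    return one_hot + [float(customer["monthlycharges"]), float(customer["tenure"])]


def churn_probability(vector, weights, bias):
    """Logistic regression: sigmoid of the weighted sum plus the intercept."""
    score = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-score))


customer = {"contract": "two_year", "tenure": 12, "monthlycharges": 19.7}
vector = encode_customer(customer)

# Hypothetical weights and bias, NOT the values stored in model1.bin.
weights = [0.6, -0.1, -0.8, 0.02, -0.06]
bias = -0.5
probability = churn_probability(vector, weights, bias)

print(vector)  # [0.0, 0.0, 1.0, 19.7, 12.0]
print(probability >= 0.5)
```

The final comparison with 0.5 is the same threshold `predict_churn` applies to decide the `churn` flag.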


### Question #4

Now let's serve this model as a web service.

* Install Flask and Gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this customer using `requests`:

```python
url = "YOUR_URL"
customer = {"contract": "two_year", "tenure": 1, "monthlycharges": 10}
requests.post(url, json=customer).json()
```

What's the probability that this customer is churning?

```python
# answer to question #4
# save these lines as churn.py and keep it running with "python churn.py" or "python -m churn"

import pickle
from flask import Flask, request, jsonify
from waitress import serve


def load_model(path_to_model, path_to_dv):
    with open(path_to_model, 'rb') as model_file, open(path_to_dv, 'rb') as dv_file:
        model = pickle.load(model_file)
        dv = pickle.load(dv_file)
    return model, dv


def predict_churn(model, dv, customer_data):
    X = dv.transform([customer_data])
    y_pred = model.predict_proba(X)[0, 1]
    is_churning = y_pred >= 0.5

    result = {
        'churn': bool(is_churning),
        'churn_probability': float(y_pred)
    }
    return jsonify(result)


app = Flask('churn')

# load the model and the vectorizer once at startup instead of on every request;
# forward slashes keep the paths valid on Windows and inside a Linux container
model, dv = load_model(path_to_model='models/model1.bin',
                       path_to_dv='models/dv.bin')


@app.route('/churn/predict', methods=['POST'])
def predict():
    customer_data = request.get_json()
    return predict_churn(model=model, dv=dv, customer_data=customer_data)


if __name__ == "__main__":
    # app.run(debug=True, host='0.0.0.0', port=9696)
    serve(app, host='0.0.0.0', port=9696)
```



```python
# answer to question #4 (cont.)

import requests

api_url = "http://localhost:9696/churn/predict"
customer_data = {
    "contract": "two_year",
    "tenure": 1,
    "monthlycharges": 10
}

api_response = requests.post(url=api_url, json=customer_data).json()
print(api_response)
print('Is Customer Churning? {}\nChurn Probability: {}'.format(api_response['churn'], round(api_response['churn_probability'], 4)))
```

```
{'churn': True, 'churn_probability': 0.9988892771007961}
Is Customer Churning? True
Churn Probability: 0.9989
```


### Docker

Install [Docker](06-docker.md). We will use it for the next two questions.

For these questions, I prepared a base image: `agrigorev/zoomcamp-model:3.8.12-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.8.12-slim` and has a logistic regression model
(a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.8.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```

I already built it and then pushed it to [`agrigorev/zoomcamp-model:3.8.12-slim`](https://hub.docker.com/r/agrigorev/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.

### Question #5

Now create your own Dockerfile based on the image I prepared.

It should start like this:

```docker
FROM agrigorev/zoomcamp-model:3.8.12-slim
# add your stuff here
```

Now complete it:

* Install all the dependencies from the Pipenv file
* Copy your Flask script
* Run it with gunicorn 


When you build your image, what's the image id for `agrigorev/zoomcamp-model:3.8.12-slim`?

Look at the first step of your build log. It should look something like this:

```
$ docker some-command-for-building
Sending build context to Docker daemon  2.048kB
Step 1/N : FROM agrigorev/zoomcamp-model:3.8.12-slim
 ---> XXXXXXXXXXXX
Step 2/N : ....
```

You need this `XXXXXXXXXXXX`.

Alternatively, you can get this information when running `docker images` - it'll be in the "IMAGE ID" column.
Submitting DIGEST (long string starting with "sha256") is also fine.
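
If you saved the build log to a file, the ID can also be pulled out with a short script. This sketch assumes the step-by-step log format shown above (the newer BuildKit builder prints a different layout), and `base_image_id` is a hypothetical helper name.

```python
import re


def base_image_id(build_log, image):
    """Return the ID printed on the '--->' line after the FROM step, or None."""
    pattern = re.compile(
        r"^Step \d+/\S+ : FROM " + re.escape(image) + r"[ \t]*\r?\n[ \t]*---> (\S+)",
        re.MULTILINE,
    )
    match = pattern.search(build_log)
    return match.group(1) if match else None


# Sample text copied from the build log excerpt above.
sample_log = (
    "Sending build context to Docker daemon  2.048kB\n"
    "Step 1/N : FROM agrigorev/zoomcamp-model:3.8.12-slim\n"
    " ---> XXXXXXXXXXXX\n"
    "Step 2/N : ....\n"
)
print(base_image_id(sample_log, "agrigorev/zoomcamp-model:3.8.12-slim"))
```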

answer to question #5:<br>
IMAGE ID: f0f43f7bc6e0

### Question #6

Let's run your docker container!

After running it, score the same customer:

```python
url = "YOUR_URL"
customer = {"contract": "two_year", "tenure": 12, "monthlycharges": 10}
requests.post(url, json=customer).json()
```

What's the probability that this customer is churning?

```python
# answer to question #6

import requests

docker_instance_url = "http://localhost:9000/churn/predict"
customer2_data = {
    "contract": "two_year",
    "tenure": 12,
    "monthlycharges": 10
}

api_response_docker = requests.post(url=docker_instance_url, json=customer2_data).json()
print(api_response_docker)
print('Is Customer Churning? {}\nChurn Probability: {}'.format(api_response_docker['churn'], round(api_response_docker['churn_probability'], 3)))
```

```
{'churn': False, 'churn_probability': 0.32940789808151005}
Is Customer Churning? False
Churn Probability: 0.329
```


The response above comes from the running Docker container, and the churn probability is 0.3294.<br><br>
Note that this time port 9000 of the host machine is mapped to port 9696 of the container, where the app listens:<br>
_docker run -it --rm -p 9000:9696 churn-predict_<br><br>
The "Dockerfile" for this image has the following lines:<br>

```Dockerfile
FROM python:3.8.12-slim

RUN pip install pipenv

WORKDIR /app

COPY ["Pipfile", "Pipfile.lock", "./"]

RUN pipenv install --system --deploy

COPY ["churn.py", "./"]

COPY ["models/model1.bin", "models/dv.bin", "models/"]

EXPOSE 9696

ENTRYPOINT ["python", "churn.py"]
```