## Homework

> Note: sometimes your answer doesn't match one of the options exactly. 
> That's fine. 
> Select the option that's closest to your solution.

> Note: we recommend using Python 3.11 in this homework.


In this homework, we will use the Bank Marketing dataset. Download it from [here](https://archive.ics.uci.edu/static/public/222/bank+marketing.zip).

You can do it with `wget`:


```bash
wget https://archive.ics.uci.edu/static/public/222/bank+marketing.zip
unzip bank+marketing.zip
unzip bank.zip
```
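
Without `wget` and `unzip` (for example on Windows), Python's standard library can do the same job: download the archive, extract it, then extract the `bank.zip` it contains. This is a minimal sketch. The helper names `extract_nested_zip` and `download_dataset` are hypothetical, and the inner archive name `bank.zip` is taken from the commands above.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = "https://archive.ics.uci.edu/static/public/222/bank+marketing.zip"


def extract_nested_zip(outer_zip: Path, inner_name: str, dest_dir: Path) -> list[str]:
    """Extract outer_zip into dest_dir, then extract the inner archive it contains."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(outer_zip) as outer:
        outer.extractall(dest_dir)  # same as: unzip bank+marketing.zip
    with zipfile.ZipFile(dest_dir / inner_name) as inner:
        inner.extractall(dest_dir)  # same as: unzip bank.zip
        return inner.namelist()


def download_dataset(dest_dir: Path = Path(".")) -> list[str]:
    """Download the Bank Marketing archive and unpack it, including the inner bank.zip."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "bank+marketing.zip"
    urllib.request.urlretrieve(DATA_URL, archive)  # same as: wget DATA_URL
    return extract_nested_zip(archive, "bank.zip", dest_dir)
```

Calling `download_dataset()` returns the names of the files unpacked from `bank.zip`.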

## Question 1

* Install Pipenv
* What's the version of pipenv you installed?
* Use `--version` to find out

```bash
pipenv --version
```

```
pipenv, version 2024.0.2
```


## Question 2

* Use Pipenv to install Scikit-Learn version 1.5.2
* What's the first hash for scikit-learn you get in Pipfile.lock?

> **Note**: you should create an empty folder for the homework and do it there.

```bash
pipenv install scikit-learn==1.5.2
```

```
Courtesy Notice:
Pipenv found itself running within a virtual environment, so it will
automatically use that environment, instead of creating its own for any
project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore
that environment and create its own instead.
Creating a Pipfile for this project...
Installing scikit-learn==1.5.2...
Resolving scikit-learn==1.5.2...
Added scikit-learn to Pipfile's [packages] ...
✔ Installation Succeeded
Pipfile.lock not found, creating...
Locking [packages] dependencies...
Building requirements...
Resolving dependencies...
✔ Success!
Locking [dev-packages] dependencies...
Updated Pipfile.lock (adb15e5c5a21f13e221698df1bb36dc8d3bee16
```

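The `"hash"` block below comes from the `_meta` section of Pipfile.lock. It is the hash of the Pipfile itself (the same value printed in the `Updated Pipfile.lock (...)` line above), not a scikit-learn hash. The answer to this question is the first entry of the `"hashes"` list in the `"scikit-learn"` entry of the `"default"` section.

```json
"hash": {
    "sha256": "adb15e5c5a21f13e221698df1bb36dc8d3bee16bb778fdfcad74b126616757b9"
}
```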
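
To read the scikit-learn hash without scanning the lockfile by eye, here is a minimal sketch that parses Pipfile.lock as JSON. It assumes regular dependencies are stored under `"default"` (where `pipenv install` records them) and that each package entry has a `"hashes"` list; `first_package_hash` is a hypothetical helper name.

```python
import json
from pathlib import Path


def first_package_hash(lockfile: Path, package: str, section: str = "default") -> str:
    """Return the first entry of a package's "hashes" list in a Pipfile.lock.

    Regular dependencies live under "default" (dev dependencies under "develop");
    the "_meta" section holds the hash of the Pipfile, not of any package.
    """
    lock = json.loads(Path(lockfile).read_text())
    return lock[section][package]["hashes"][0]
```

For this homework, `first_package_hash(Path("Pipfile.lock"), "scikit-learn")` returns a string of the form `"sha256:..."`.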

## Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

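```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df is the Bank Marketing data as a pandas DataFrame and y is the binary
# target (subscribed or not); both are assumed to be loaded already
features = ['job', 'duration', 'poutcome']
dicts = df[features].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression().fit(X, y)
```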

> **Note**: You don't need to train the model. This code is just for your reference.
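
To see what `dv.transform` does to a client dictionary, here is a simplified pure-Python re-implementation of DictVectorizer's default behavior. It is our own sketch, not scikit-learn's code: string values become one-hot `feature=value` columns, numeric values pass through unchanged, feature names are sorted, and keys or categories not seen during fitting are ignored. It leaves out sparse output and list-valued features.

```python
def fit_feature_names(records: list[dict], separator: str = "=") -> list[str]:
    """Collect column names: "key=value" for strings, plain "key" for numbers."""
    names = set()
    for record in records:
        for key, value in record.items():
            names.add(f"{key}{separator}{value}" if isinstance(value, str) else key)
    return sorted(names)


def transform(records: list[dict], feature_names: list[str], separator: str = "=") -> list[list[float]]:
    """Build one dense row per record; unseen keys or categories are ignored."""
    index = {name: i for i, name in enumerate(feature_names)}
    rows = []
    for record in records:
        row = [0.0] * len(feature_names)
        for key, value in record.items():
            if isinstance(value, str):
                name, x = f"{key}{separator}{value}", 1.0  # one-hot indicator
            else:
                name, x = key, float(value)  # numeric value passes through
            if name in index:
                row[index[name]] = x
        rows.append(row)
    return rows


train = [
    {"job": "management", "duration": 400, "poutcome": "success"},
    {"job": "student", "duration": 280, "poutcome": "failure"},
]
names = fit_feature_names(train)
print(names)
# ['duration', 'job=management', 'job=student', 'poutcome=failure', 'poutcome=success']
print(transform([{"job": "retired", "duration": 100, "poutcome": "success"}], names))
# [[100.0, 0.0, 0.0, 0.0, 1.0]]
```

The unseen job `"retired"` produces no active column, which is why scoring a client with a new category does not raise an error.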


They were then saved with pickle. Download them:

* [DictVectorizer](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/dv.bin?raw=true)
* [LogisticRegression](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2024/05-deployment/homework/model1.bin?raw=true)

With `wget`:
```bash
PREFIX=https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
```
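
If `wget` isn't available (for example on Windows), the same two files can be fetched with Python's standard library. This is a minimal sketch; `download_files` is a hypothetical helper, and `PREFIX` is the URL prefix from the `wget` snippet above.

```python
import urllib.request
from pathlib import Path

PREFIX = "https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2024/05-deployment/homework"


def download_files(names: list[str], prefix: str = PREFIX, dest_dir: Path = Path(".")) -> list[Path]:
    """Download prefix/<name> for each name into dest_dir and return the local paths."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        target = dest_dir / name
        urllib.request.urlretrieve(f"{prefix}/{name}", target)  # like: wget $PREFIX/<name>
        paths.append(target)
    return paths
```

Usage: `download_files(["model1.bin", "dv.bin"])` saves both files into the current directory.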

## Question 3

Let's use these models!

* Write a script for loading these models with pickle
* Score this client:

```json
{"job": "management", "duration": 400, "poutcome": "success"}
```

What's the probability that this client will get a subscription? 

* 0.359
* 0.559
* 0.759
* 0.959

```python
import pickle

# Unpickling needs scikit-learn installed (the pipenv environment above)
model_file = "model1.bin"
with open(model_file, 'rb') as f_in:
    model = pickle.load(f_in)

dv_file = "dv.bin"
with open(dv_file, 'rb') as f_in:
    dv = pickle.load(f_in)

dv, model
```

```
(DictVectorizer(sparse=False), LogisticRegression(max_iter=250))
```

```python
cust = {"job": "management", "duration": 400, "poutcome": "success"}
X = dv.transform([cust])
# Column 1 holds the probability of the positive class (subscription)
model.predict_proba(X)[0, 1]
```

```
np.float64(0.7590966516879658)
```
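
For a binary `LogisticRegression`, column 1 of `predict_proba` is the probability of the second class in `model.classes_` (here, the subscription). It is the linear score passed through the sigmoid, where $\mathbf{x}$ is the one-hot row produced by `dv.transform`, $\mathbf{w}$ is `model.coef_[0]` and $b$ is `model.intercept_[0]`:

$$
P(\text{subscription} \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}
$$

Inverting the sigmoid, a probability of 0.759 corresponds to a linear score $\mathbf{w}^\top \mathbf{x} + b = \ln\frac{0.759}{0.241} \approx 1.15$.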

## Question 4

Now let's serve this model as a web service.

* Install Flask and gunicorn (or waitress, if you're on Windows)
* Write Flask code for serving the model
* Now score this client using `requests`:

```python
url = "YOUR_URL"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription?

* 0.335
* 0.535
* 0.735
* 0.935

```bash
pipenv install flask
pipenv install gunicorn
pipenv install requests
```

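```bash
gunicorn --bind 0.0.0.0:9696 predict:app
```

In `predict:app`, `predict` is the name of the Python module (the file `predict.py` without the `.py` extension) and `app` is the name of the Flask application instance inside that file. On Windows, where gunicorn doesn't run, waitress serves the same app: `waitress-serve --listen=0.0.0.0:9696 predict:app`.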

```python
import requests

url = "http://localhost:9696/predict"
client = {"job": "student", "duration": 280, "poutcome": "failure"}
requests.post(url, json=client).json()
```

```
{'subscription_probability': 0.335}
```

## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `svizor/zoomcamp-model:3.11.5-slim`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.11.5-slim` and has a logistic regression model
(a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.11.5-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]
```
We already built it and then pushed it to [`svizor/zoomcamp-model:3.11.5-slim`](https://hub.docker.com/r/svizor/zoomcamp-model).

> **Note**: You don't need to build this docker image, it's just for your reference.

## Question 5

Download the base image `svizor/zoomcamp-model:3.11.5-slim`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 130 MB
* 245 MB
* 330 MB

You can get this information when running `docker images` - it'll be in the "SIZE" column.


```bash
docker pull svizor/zoomcamp-model:3.11.5-slim
docker images
```

```
REPOSITORY              TAG           IMAGE ID       CREATED      SIZE
svizor/zoomcamp-model   3.11.5-slim   975e7bdca086   5 days ago   130MB
```
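
The same number can also be read programmatically. This minimal sketch shells out to `docker image inspect`, whose `Size` field is the image size in bytes. The helper names are hypothetical, and converting with 1 MB = 1,000,000 bytes assumes the decimal units that, to our understanding, `docker images` uses in its SIZE column.

```python
import subprocess


def image_size_bytes(image: str) -> int:
    """Return the image size in bytes, read from `docker image inspect`."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker reports an error (e.g. image not pulled)
    )
    return int(result.stdout.strip())


def to_megabytes(size_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes), rounded to 0.1."""
    return round(size_bytes / 1_000_000, 1)
```

Usage: `to_megabytes(image_size_bytes("svizor/zoomcamp-model:3.11.5-slim"))`.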
