## Homework

> Note: sometimes your answer doesn't match one of the options exactly. 
> That's fine. 
> Select the option that's closest to your solution.

We recommend using Python 3.12 or 3.13 for this homework.

In this homework, we're going to continue working with the lead scoring dataset. You don't need the dataset: we will provide the model for you.


## Question 1

* Install `uv`
* What's the version of uv you installed?
* Use `--version` to find out


## Initialize an empty uv project

Create an empty folder for the homework and initialize the uv project there (for example, with `uv init`).


## Question 2

* Use uv to install scikit-learn version 1.6.1
* What's the first hash for scikit-learn in the lock file (`uv.lock`)?
* Include the entire string starting with `sha256:`, without the quotes


## Models

We have prepared a pipeline with a dictionary vectorizer and a model.

It was trained (roughly) using this code:

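```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# df and y_train (training features and target) are not shown here
categorical = ['lead_source']
numeric = ['number_of_courses_viewed', 'annual_income']

# Fill missing values: 'NA' for the categorical column, 0 for numeric ones
df[categorical] = df[categorical].fillna('NA')
df[numeric] = df[numeric].fillna(0)

# DictVectorizer takes a list of dicts, one per row
train_dict = df[categorical + numeric].to_dict(orient='records')

pipeline = make_pipeline(
    DictVectorizer(),
    LogisticRegression(solver='liblinear')
)

pipeline.fit(train_dict, y_train)
```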

> **Note**: You don't need to train the model. This code is just for your reference.

The pipeline was then saved with pickle. Download it [here](https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/cohorts/2025/05-deployment/pipeline_v1.bin).

With `wget`:

```bash
wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
```
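The training code fills missing values before vectorizing: `'NA'` for `lead_source` and `0` for the numeric columns. The pipeline built there contains only the vectorizer and the model, so those defaults are not applied when you score new records. If your records may have missing fields, you can apply the same defaults yourself. Below is a minimal sketch of such a helper for a single dict. `prepare_record` is a hypothetical name, and it is not part of the provided pipeline.

```python
import math

CATEGORICAL = ["lead_source"]
NUMERIC = ["number_of_courses_viewed", "annual_income"]


def _is_missing(value) -> bool:
    # Treat both None and float NaN as missing, as pandas' fillna does
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_record(raw: dict) -> dict:
    """Apply the training-time fill values to one record; extra keys are dropped."""
    record = {}
    for col in CATEGORICAL:
        value = raw.get(col)
        record[col] = "NA" if _is_missing(value) else value
    for col in NUMERIC:
        value = raw.get(col)
        record[col] = 0 if _is_missing(value) else value
    return record
```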


## Question 3

Let's use the model!

* Write a script for loading the pipeline with pickle
* Score this record:

```json
{
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}
```

What's the probability that this lead will convert? 

* 0.333
* 0.533
* 0.733
* 0.933
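One possible shape for the Question 3 script: load the file with `pickle`, then call `predict_proba` on a list that contains the record. This sketch makes two assumptions. First, scikit-learn is installed in the same environment, because unpickling needs it, ideally at the version the pipeline was saved with. Second, class `1` means the lead converted. The helper names are hypothetical.

```python
import pickle


def load_pipeline(path: str = "pipeline_v1.bin"):
    """Load the pickled DictVectorizer + LogisticRegression pipeline."""
    # Only unpickle files you trust: pickle can execute arbitrary code
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def predict_conversion(pipeline, record: dict) -> float:
    """Return the predicted probability of class 1 for one record."""
    # The pipeline expects a list of dicts; predict_proba returns one row per
    # record, with columns [P(class 0), P(class 1)]
    return float(pipeline.predict_proba([record])[0][1])
```

To use it, call `load_pipeline()` once, then call `predict_conversion(pipeline, record)` with the JSON record above as a Python dict.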

If you get errors when unpickling the file, check its checksum:

```bash
$ md5sum pipeline_v1.bin
7d17d2e4dfbaf1e408e1a62e6e880d49 *pipeline_v1.bin
```
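If `md5sum` isn't available (on macOS, for example, the tool is called `md5`), you can compute the same checksum in Python with the standard `hashlib` module. Here is a sketch; `file_md5` is a hypothetical helper name.

```python
import hashlib

# Checksum for pipeline_v1.bin, taken from the md5sum output above
EXPECTED_MD5 = "7d17d2e4dfbaf1e408e1a62e6e880d49"


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `file_md5("pipeline_v1.bin") == EXPECTED_MD5` is `False`, the download is probably incomplete or corrupted.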


## Question 4

Now let's serve this model as a web service.

* Install FastAPI
* Write FastAPI code for serving the model
* Now score this client using `requests`:

```python
import requests

url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
# Send the client as JSON and print the service's JSON response
print(requests.post(url, json=client).json())
```

What's the probability that this lead will convert?

* 0.334
* 0.534
* 0.734
* 0.934


## Docker

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `agrigorev/zoomcamp-model:2025`. 
You'll need to use it (see Question 5 for an example).

This image is based on `python:3.13.5-slim-bookworm` and contains
a pipeline with logistic regression (a different one)
as well as a dictionary vectorizer.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
```

We already built it and then pushed it to [`agrigorev/zoomcamp-model:2025`](https://hub.docker.com/r/agrigorev/zoomcamp-model).

> **Note**: You don't need to build this Docker image; it's just for your reference.


## Question 5

Download the base image `agrigorev/zoomcamp-model:2025`. You can do this with the [`docker pull`](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

* 45 MB
* 121 MB
* 245 MB
* 330 MB

You can find this information by running `docker images`: it's in the "SIZE" column.


## Dockerfile

Now create your own `Dockerfile` based on the image we prepared.

It should start like this:

```docker
FROM agrigorev/zoomcamp-model:2025
# add your stuff here
```

Now complete it:

* Install all the dependencies from pyproject.toml
* Copy your FastAPI script
* Run it with uvicorn 

After that, you can build your docker image.


## Question 6

Let's run your docker container!

After running it, score this client once again:

```python
import requests

url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
# Send the client as JSON and print the service's JSON response
print(requests.post(url, json=client).json())
```

What's the probability that this lead will convert?

* 0.39
* 0.59
* 0.79
* 0.99


## Submit the results

* Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2025/homework/hw05
* If your answer doesn't match options exactly, select the closest one



## Publishing to Docker Hub

This is just for reference: this is how we published the image to Docker Hub.

`Dockerfile_base`: 

```dockerfile
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
```

Publishing:

```bash
docker build -t mlzoomcamp2025_hw5 -f Dockerfile_base .
docker tag mlzoomcamp2025_hw5:latest agrigorev/zoomcamp-model:2025
docker push agrigorev/zoomcamp-model:2025
```


```python
## question 1
#!pip install uv
import uv
!uv --version
```

```
uv 0.9.5
```

```python
## question 2
!pipenv lock
```

```
Locking dependencies...
Building requirements...
Resolving dependencies....
✔ Success! Locking packages...
Locking dependencies...
Updated Pipfile.lock (8924bdcf803b9faae59e21af66cbfda2f6ebc6e464b9ea2ac58673899b01f5ad)!
```

```python
# import libraries
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
```

```python
!pipenv run python --version
```

```
Python 3.12.3
```

```python
!pipenv install
```

```
Creating a virtualenv for this project
Pipfile: /workspaces/machine-learning-zoomcamp-2025/05-deployment/Pipfile
Using /opt/conda/bin/python3.1 3.12.3 to create virtualenv...
created virtual environment CPython3.12.3.final.0-64 in 299ms
  creator CPython3Posix(dest=/home/codespace/.local/share/virtualenvs/05-deployment-7mY0q0oF, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, via=copy,
```

```python
!pipenv run python -c "import sklearn; print(sklearn.__version__)"
```

```
1.6.1
```

```python
## extract the hash using grep
!grep -A 10 '"scikit-learn": {' Pipfile.lock | grep -m 1 'sha256:' | sed 's/.*"\(sha256:[^"]*\)".*/\1/'
```

```
sha256:0650e730afb87402baa88afbf31c07b84c98272622aaba002559b614600ca691
```


```python
## preparing models
#!wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
```

```python
## question 3
!pipenv install flask joblib requests
```

```
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Installing flask...
✔ Installation Succeeded
Installing joblib...
✔ Installation Succeeded
Installing requests...
✔ Installation Succeeded
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Installing dependencies from Pipfile.lock (c8acdf)...
All dependencies are now up-to-date!
Upgrading flask, joblib, requests in dependencies.
Building requirements...
Resolving dependencies....
✔ Success! Locking packages...
Building requirements...
Resolving dependencies....
✔ Success! Locking packages...
To activate this project's virtualenv, run pipenv shell.
```

```python
# run predict-test.py
!pipenv run python predict-test.py
```

```
0.5336072702798061
```

```python
## question 4
!pipenv install typing pydantic fastapi uvicorn
```

```
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Installing typing...
✔ Installation Succeeded
Installing pydantic...
✔ Installation Succeeded
Installing fastapi...
✔ Installation Succeeded
Installing uvicorn...
✔ Installation Succeeded
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Installing dependencies from Pipfile.lock (c8acdf)...
All dependencies are now up-to-date!
Upgrading typing, pydantic, fastapi, uvicorn in dependencies.
Building requirements...
Resolving dependencies....
✔ Success! Locking packages...
Building requirements...
Resolving dependencies....
✔ Success! Locking packages...
```

```python
## run the uv lock
!pipenv run uv lock
```

```
Using CPython 3.12.3 interpreter at: /opt/conda/bin/python3
Resolved 127 packages in 1.60s
Removed blinker v1.9.0
Removed flask v3.1.2
Removed itsdangerous v2.2.0
Removed werkzeug v3.1.3
```

```python
# run predict-testv2.py
!pipenv run python predict-testv2.py
```

```
Conversion Probability: 0.5340417283801275
Converted: True
```

```python
## question 5: check the image size after docker pull agrigorev/zoomcamp-model:2025
!pipenv run docker images
```

```
REPOSITORY                       TAG       IMAGE ID       CREATED         SIZE
raven0205/05-deployment/models   v1.0      c1f01e39d0da   5 minutes ago   767MB
agrigorev/zoomcamp-model         2025      4a9ecc576ae9   2 days ago      121MB
```

```python
!pipenv run docker ps
```

```
CONTAINER ID   IMAGE          COMMAND     CREATED         STATUS         PORTS     NAMES
4dcfcdab7de8   4a9ecc576ae9   "python3"   2 minutes ago   Up 2 minutes             dazzling_northcutt
```

```python
## question 6: run the prediction on JSON input
!pipenv run curl -X POST \
      -H "Content-Type: application/json" \
      -d '{"lead_source": "organic_search", "number_of_courses_viewed": 4, "annual_income": "80304.0"}' \
      http://localhost:9696/predict
```

```
{"c_probability":0.5340417283801275,"converted":true}
```