05 homework

### Question 1
Install uv

What's the version of uv you installed?

Use --version to find out

#### Install uv

```
curl -LsSf https://astral.sh/uv/install.sh | sh
uv --version
```

#### Initialize an empty uv project

Create an empty folder for the homework and work inside it.

* Create an empty project:
```
uv init <name> --python 3.12
```

Change to the folder `<name>`:
```
cd <name>
```

The `.venv` directory has not been created yet.
Run `uv sync` to create it:
```
uv sync
```

Add dependencies
```
uv add pandas matplotlib
```

Activate .venv
```
source .venv/bin/activate
```
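To confirm that the activated interpreter really is the one inside `.venv`, a quick check from Python can help. This is a minimal sketch; `in_virtualenv` is a hypothetical helper name, and it relies only on the standard `sys.prefix` / `sys.base_prefix` distinction.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints `False` after activation, the shell is probably still resolving `python` to the system interpreter.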







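* Edit pyproject.toml
* Add `scikit-learn==1.6.1` to `dependencies`
* Run `uv sync`

Alternatively, `uv add scikit-learn==1.6.1` updates pyproject.toml, the lock file, and the environment in one step.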





### Question 2
Use uv to install Scikit-Learn version 1.6.1

What's the first hash for Scikit-Learn you get in the lock file?

Include the entire string starting with sha256:, don't include quotes

```
[[package]]
name = "scikit-learn"
version = "1.6.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "joblib" },
    { name = "numpy" },
    { name = "scipy" },
    { name = "threadpoolctl" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9e/a5/4ae3b3a0755f7b35a280ac90b28817d1f380318973cff14075ab41ef50d9/scikit_learn-1.6.1.tar.gz", hash = "sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e", size = 7068312, upload-time = "2025-01-10T08:07:55.348Z" }
```

### Question 3
Let's use the model!

Write a script for loading the pipeline with pickle

Score this record:
```
{
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}
```

What's the probability that this lead will convert?

* 0.333
* 0.533
* 0.733
* 0.933

If you're getting errors when unpickling the files, check their checksum:

```
$ md5sum pipeline_v1.bin
7d17d2e4dfbaf1e408e1a62e6e880d49 *pipeline_v1.bin
```
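On systems without `md5sum` (for example plain Windows), the same check can be done from Python. This is a minimal sketch: `md5_of_file` is a hypothetical helper name, and the file location `pipeline_v1.bin` in the usage part is an assumption about where the file was downloaded.

```python
import hashlib
from pathlib import Path

# Checksum quoted in the homework text above
EXPECTED_MD5 = "7d17d2e4dfbaf1e408e1a62e6e880d49"


def md5_of_file(path, chunk_size: int = 8192) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    model_path = Path("pipeline_v1.bin")  # assumed download location
    if model_path.exists():
        print("checksum matches:", md5_of_file(model_path) == EXPECTED_MD5)
    else:
        print(f"{model_path} not found")
```

A mismatch usually means the download was incomplete or the file was saved as an HTML page instead of the binary.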

In [1]:
import pickle

In [None]:
# Load the pipeline from file.
# scikit-learn must be installed, otherwise unpickling fails.
with open('hw5/pipeline_v1.bin', 'rb') as f_in:
    pipeline = pickle.load(f_in)


input_data = [
    {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
    }
]

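# predict_proba returns one row per record: [P(class 0), P(class 1)]
result = pipeline.predict_proba(input_data)[0]

print(result)

[0.46639273 0.53360727]

The second value is the probability of the positive class, so the closest option is 0.533.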


### Question 4
Now let's serve this model as a web service

Install FastAPI

Write FastAPI code for serving the model

Now score this client using requests:

```
url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()
```

What's the probability that this client will get a subscription?
* 0.334
* 0.534
* 0.734
* 0.934

In [None]:
# FastAPI app that serves the pickled pipeline

import pickle
from pydantic import BaseModel
from fastapi import FastAPI
import uvicorn


# The field names say "churn", but the value is the probability
# that the lead converts.
class PredictResponse(BaseModel):
    churn_probability: float
    churn: bool


app = FastAPI(title="Q4")

# Load the pipeline once at startup, not on every request
with open('hw5/pipeline_v1.bin', 'rb') as f_in:
    pipeline = pickle.load(f_in)


def predict_single(input_data: dict) -> float:
    # Wrap the record in a list so the pipeline receives a batch of one
    result = pipeline.predict_proba([input_data])[0, 1]
    return float(result)


@app.post("/predict")
def predict(input_data: dict) -> PredictResponse:
    prob = predict_single(input_data)

    return PredictResponse(
        churn_probability=prob,
        churn=prob >= 0.5
    )


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=9696)





In [None]:
# request to the deployed model
import requests

url = "http://localhost:9696/predict"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()

{'churn_probability': 0.5340417283801275, 'churn': True}
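If `requests` is not installed in the environment that calls the service, the same POST can be sketched with the standard library only. This is an illustration, not part of the homework: `build_request` and `score_lead` are hypothetical helper names, and the URL in the usage comment assumes the service above is listening on port 9696.

```python
import json
import urllib.request


def build_request(url: str, lead: dict) -> urllib.request.Request:
    """Build a JSON POST request carrying one lead record."""
    return urllib.request.Request(
        url,
        data=json.dumps(lead).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_lead(url: str, lead: dict, timeout: float = 5.0) -> dict:
    """Send the lead to the service and return the decoded JSON reply."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(build_request(url, lead), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the service to be running):
# score_lead("http://localhost:9696/predict", client)
```

The explicit timeout keeps the call from hanging indefinitely if the server was never started.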

### Docker
Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: agrigorev/zoomcamp-model:2025. You'll need to use it (see Question 5 for an example).

This image is based on python:3.13.5-slim-bookworm and contains a pipeline with logistic regression (a different one) as well as a dictionary vectorizer.

This is what the Dockerfile for this image looks like:

```
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
```

We already built it and pushed it to agrigorev/zoomcamp-model:2025.

Note: You don't need to build this Docker image; it's just for your reference.

### Question 5

Download the base image agrigorev/zoomcamp-model:2025. You can do this with the `docker pull` command.

So what's the size of this base image?
* 45 MB
* 121 MB
* 245 MB
* 330 MB

You can get this information by running `docker images`; it is in the "SIZE" column.

```
C:\>docker images
REPOSITORY                     TAG               IMAGE ID       CREATED       SIZE
agrigorev/zoomcamp-model       2025              4a9ecc576ae9   2 days ago    121MB
```

### Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```
FROM agrigorev/zoomcamp-model:2025
# add your stuff here
```

Now complete it:

* Install all the dependencies from pyproject.toml
* Copy your FastAPI script
* Run it with uvicorn

After that, you can build your Docker image.

### Question 6
Let's run your docker container!

After running it, score this client once again:

```
url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()
```

What's the probability that this lead will convert?
* 0.39
* 0.59
* 0.79
* 0.99

### Submit the results

Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2025/homework/hw05

If your answer doesn't match options exactly, select the closest one
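Picking the closest option is a one-line minimization over absolute differences. A minimal sketch, where `closest_option` is a hypothetical helper name and the numbers are the Question 6 result and options from this notebook:

```python
def closest_option(value: float, options: list[float]) -> float:
    """Return the option with the smallest absolute distance to value."""
    return min(options, key=lambda option: abs(option - value))


# Question 6: score returned by the containerized service vs. the listed options
print(closest_option(0.9933071490756734, [0.39, 0.59, 0.79, 0.99]))  # prints 0.99
```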
### Publishing to Docker hub
This is just for reference: it shows how we published the image to Docker hub.

Dockerfile_base:

```
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
```

Publishing:

```
docker build -t mlzoomcamp2025_hw5 -f Dockerfile_base .
docker tag mlzoomcamp2025_hw5:latest agrigorev/zoomcamp-model:2025
docker push agrigorev/zoomcamp-model:2025
```
In [None]:
# create Dockerfile
# build image: docker build -t myapp .
# run container: docker run -it --rm -p 9696:9696 myapp


In [8]:
import requests

url = "http://localhost:9696/predict"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()

{'churn_probability': 0.9933071490756734, 'churn': True}