# Homework 5

We recommend using Python 3.12 or 3.13 for this homework.

In this homework, we're going to continue working with the lead scoring dataset. You don't need the dataset: we will provide the model for you.

In [1]:
!python --version

Python 3.13.9


## Question 1

- Install uv
- What's the version of uv you installed?
- Use --version to find out

### Initialize an empty uv project
Create an empty folder for the homework and initialize the uv project there.

In [2]:
!uv --version

uv 0.9.5 (d5f39331a 2025-10-21)


In [3]:
# uv 0.9.5 (d5f39331a 2025-10-21)

## Question 2

- Use uv to install Scikit-Learn version 1.6.1
- What's the first hash for Scikit-Learn you get in the lock file?
- Include the entire string starting with sha256:, don't include quotes


In [4]:
# Commands used:
# uv init
# uv python install 3.13
# uv python pin 3.13
# uv add "scikit-learn==1.6.1"

In [5]:
# first hash for scikit-learn
# sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e

### Models
We have prepared a pipeline with a dictionary vectorizer and a model.

It was trained (roughly) using this code:

    categorical = ['lead_source']
    numeric = ['number_of_courses_viewed', 'annual_income']

    df[categorical] = df[categorical].fillna('NA')
    df[numeric] = df[numeric].fillna(0)

    train_dict = df[categorical + numeric].to_dict(orient='records')

    pipeline = make_pipeline(
        DictVectorizer(),
        LogisticRegression(solver='liblinear')
    )

    pipeline.fit(train_dict, y_train)

Note: You don't need to train the model. This code is just for your reference.

The pipeline was then saved with pickle. You can download it from the URL below.

With wget:

    wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin

## Question 3
Let's use the model!

- Write a script for loading the pipeline with pickle
- Score this record:
    
    {
        "lead_source": "paid_ads",
        "number_of_courses_viewed": 2,
        "annual_income": 79276.0
    }

What's the probability that this lead will convert?

- 0.333
- 0.533
- 0.733
- 0.933

If you're getting errors when unpickling the files, check their checksum:

    $ md5sum pipeline_v1.bin 
    7d17d2e4dfbaf1e408e1a62e6e880d49 *pipeline_v1.bin
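
If `md5sum` is not available (for example on Windows), the same check can be done from Python with the standard library. This is a minimal sketch: `md5_of_file` is a helper name chosen here, and the expected value is the checksum quoted above.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "7d17d2e4dfbaf1e408e1a62e6e880d49"  # checksum quoted in the homework


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


model_path = Path("pipeline_v1.bin")
if model_path.exists():
    actual = md5_of_file(model_path)
    print("checksum OK" if actual == EXPECTED_MD5 else f"checksum mismatch: {actual}")
```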

In [6]:
import pickle

In [7]:
datum = {"lead_source": "paid_ads", "number_of_courses_viewed": 2, "annual_income": 79276.0 }
datum

{'lead_source': 'paid_ads',
 'number_of_courses_viewed': 2,
 'annual_income': 79276.0}

In [8]:
with open('pipeline_v1.bin', 'rb') as f_in:
    dv, model = pickle.load(f_in)

In [9]:
dv, model

(DictVectorizer(), LogisticRegression(solver='liblinear'))

In [10]:
X = dv.transform([datum])
X

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 3 stored elements and shape (1, 8)>
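
The 3 stored elements follow from how `DictVectorizer` encodes a record: a string-valued field becomes a single one-hot column named `field=value`, and a numeric field keeps its own column. The 8 columns are presumably the two numeric fields plus one column per `lead_source` value seen during training. The pure-Python sketch below mimics only the naming rule for one record; `encode_record` is a hypothetical helper for illustration, not part of scikit-learn, and it ignores the fitted vocabulary that fixes the column order.

```python
def encode_record(record: dict) -> dict:
    """Mimic DictVectorizer's naming rule for a single record (illustration only)."""
    encoded = {}
    for field, value in record.items():
        if isinstance(value, str):
            # Categorical value: one column per (field, value) pair, set to 1.0
            encoded[f"{field}={value}"] = 1.0
        else:
            # Numeric value: the field keeps its own column
            encoded[field] = float(value)
    return encoded


datum = {"lead_source": "paid_ads", "number_of_courses_viewed": 2, "annual_income": 79276.0}
encoded = encode_record(datum)
print(encoded)
print(len(encoded))  # 3 non-zero entries, matching the sparse matrix above
```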

In [11]:
model.predict(X)

array([1])

In [12]:
model.predict_proba(X)[0,1]

np.float64(0.5336072702798061)

In [13]:
# 0.533

## Question 4

Now let's serve this model as a web service.

- Install FastAPI
- Write FastAPI code for serving the model
- Now score this client using requests:
  
        url = "YOUR_URL"
        client = {
            "lead_source": "organic_search",
            "number_of_courses_viewed": 4,
            "annual_income": 80304.0
        }
        requests.post(url, json=client).json()

What's the probability that this client will get a subscription?

- 0.334
- 0.534
- 0.734
- 0.934

In [14]:
import pickle
with open('pipeline_v1.bin', 'rb') as f_in:
    model = pickle.load(f_in)
model

In [15]:
# uv add fastapi

In [16]:
import requests

In [17]:
url = "http://0.0.0.0:9696/predict"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()

{'probability of getting subscribed = ': 0.5340417283801275,
 'isConverted = ': True}

In [18]:
# 0.534

### Docker
Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: agrigorev/zoomcamp-model:2025. You'll need to use it (see Question 5 for an example).

This image is based on python:3.13.5-slim-bookworm and contains a pipeline with a dictionary vectorizer and a logistic regression model (a different one from pipeline_v1.bin).

This is what the Dockerfile for this image looks like:

    FROM python:3.13.5-slim-bookworm
    WORKDIR /code
    COPY pipeline_v2.bin .

We already built it and then pushed it to agrigorev/zoomcamp-model:2025.

Note: You don't need to build this Docker image; it's just for your reference.


In [19]:
# docker pull agrigorev/zoomcamp-model:2025

## Question 5

Download the base image agrigorev/zoomcamp-model:2025. You can get it with the docker pull command.

So what's the size of this base image?

- 45 MB
- 121 MB
- 245 MB
- 330 MB

You can get this information by running docker images; it is in the "SIZE" column.

In [20]:
!docker images

REPOSITORY                 TAG       IMAGE ID       CREATED         SIZE
predict-lead               latest    1e7e2d1e299d   5 minutes ago   568MB
agrigorev/zoomcamp-model   2025      14d79fde0bbf   6 days ago      181MB
dpage/pgadmin4             latest    2a830466aafd   4 months ago    812MB
postgres                   14        c0aab7962b28   4 months ago    623MB
mysql                      8.0       bf79508626d6   12 months ago   832MB


In [21]:
# Observed size: 181MB. Closest option: 121 MB

### Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

    FROM agrigorev/zoomcamp-model:2025
    # add your stuff here

Now complete it:

- Install all the dependencies from pyproject.toml
- Copy your FastAPI script
- Run it with uvicorn
- After that, you can build your docker image.
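
The steps above could be sketched as the following Dockerfile. It rests on several assumptions: the FastAPI script is named `predict.py` and exposes an object called `app` (both hypothetical names), uvicorn is listed in `pyproject.toml`, and `uv.lock` and `.python-version` sit next to `pyproject.toml`. Because the base image ships `pipeline_v2.bin` in `/code`, the script would need to load that file inside the container.

```dockerfile
FROM agrigorev/zoomcamp-model:2025

# Copy the uv binaries from the official uv image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# The base image already uses /code, where pipeline_v2.bin is stored
WORKDIR /code

# Put the project's virtual environment first on PATH
ENV PATH="/code/.venv/bin:$PATH"

# Install the locked dependencies
COPY pyproject.toml uv.lock .python-version ./
RUN uv sync --locked

# Copy the FastAPI script (hypothetical file name)
COPY predict.py ./

EXPOSE 9696

ENTRYPOINT ["uvicorn", "predict:app", "--host", "0.0.0.0", "--port", "9696"]
```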

## Question 6
Let's run your docker container!

After running it, score this client once again:

    url = "YOUR_URL"

    client = {
        "lead_source": "organic_search",
        "number_of_courses_viewed": 4,
        "annual_income": 80304.0
    }

    requests.post(url, json=client).json()

What's the probability that this lead will convert?

- 0.39
- 0.59
- 0.79
- 0.99


In [22]:
# docker build -t predict-lead . 
# docker run -it --rm -p 9696:9696 predict-lead

In [23]:
url = "http://0.0.0.0:9696/predict"
client = { "lead_source": "organic_search", "number_of_courses_viewed": 4, "annual_income": 80304.0 }
requests.post(url, json=client).json()

{'probability of getting subscribed = ': 0.5340417283801275,
 'isConverted = ': True}

In [24]:
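# 0.59 (selected answer)
# Note: the response above (0.534) is identical to the Question 4 result from pipeline_v1.bin,
# which suggests the service in this run loaded pipeline_v1.bin rather than the
# pipeline_v2.bin shipped in the base image.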