# ML Zoomcamp 2025 — Homework 5 (Deployment)

This notebook walks you through all tasks using **uv**, **FastAPI**, and **Docker**.

**Tip:** Use Python **3.12** or **3.13**.

> Run each cell in order. If a cell uses external tools (uv, Docker, wget) and you run into issues,
run it in a terminal and return to the notebook for the parsing/verification steps.

## Setup
This notebook assumes you have internet access and can install packages. If you're in a managed environment, you may need to run the shell commands in a local terminal instead.

## Question 1 — Install `uv` and check the version

**Goal:** Find the version of `uv` you installed (using `uv --version` or `uv self version`).

Expected answer format (example from the assignment text): **0.9.5**

In [24]:
# If you don't have uv yet, install it with pip (works in many environments)
%pip install -U uv
!uv --version



Note: you may need to restart the kernel to use updated packages.
uv 0.9.5 (d5f39331a 2025-10-21)
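The submission form wants only the version number, not the build hash or date that `uv --version` also prints. A minimal sketch of extracting it, assuming the output keeps the `uv X.Y.Z (...)` shape shown above (`parse_uv_version` is a hypothetical helper, not part of uv):

```python
import re

def parse_uv_version(output: str) -> str:
    """Return the bare version number from `uv --version` output."""
    match = re.search(r"\buv\s+(\d+\.\d+\.\d+\S*)", output)
    if match is None:
        raise ValueError(f"Unrecognized uv version output: {output!r}")
    return match.group(1)

print(parse_uv_version("uv 0.9.5 (d5f39331a 2025-10-21)"))  # prints 0.9.5
```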


## Project skeleton (optional but recommended)
You can do this homework in a clean folder. The cell below initializes a uv project inside `hw05`; a minimal `pyproject.toml` for the Docker build is written later, in the Dockerfile section.


In [29]:
# Run from inside hw05/
!rm -rf fastapi
!uv init . --python 3.13
!uv add fastapi
!uv sync


Initialized project `hw05` at `/Users/oleitao/Documents/Repos/DataTalks.Club/machine-learning-zoomcamp/cohorts/2025/05-deployment/hw05`
Using CPython 3.13.8 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: .venv
Resolved 11 packages in 358ms

## Question 2 — Lock `scikit-learn==1.6.1` and extract the first hash from `uv.lock`

We will:
1. Add `scikit-learn==1.6.1` to the project (without necessarily syncing/installing).
2. Refresh the lock file.
3. Parse the first `sha256:` hash under the `[[package]]` section for `scikit-learn`.

**Expected output:** A string like `sha256:...` (copy this into the submission form).
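The three steps above end with a text search, which can be sketched without any TOML parsing. The helper below is a hypothetical illustration (`first_hash_in_lock` is not part of uv) and assumes each package block starts with `[[package]]` followed by a `name = "..."` line, as in the excerpt printed by the next cell. The sample text uses a made-up package and placeholder hashes.

```python
import re

def first_hash_in_lock(lock_text: str, package: str) -> str:
    """Return the first sha256 hash, in file order, from a package's [[package]] block."""
    pattern = (
        r'\[\[package\]\]\nname = "' + re.escape(package) + r'"\n'
        r'[\s\S]*?(?=\n\[\[package\]\]|\Z)'
    )
    section = re.search(pattern, lock_text)
    if section is None:
        raise LookupError(f"{package} not found in lock text")
    found = re.search(r"sha256:[0-9a-f]{64}", section.group(0))
    if found is None:
        raise LookupError(f"no sha256 hash recorded for {package}")
    return found.group(0)

# Made-up lock text with placeholder hashes, for illustration only
sample = (
    '[[package]]\nname = "demo"\nversion = "1.0"\n'
    'sdist = { url = "https://example.invalid/demo.tar.gz", hash = "sha256:' + "a" * 64 + '" }\n'
    'wheels = [\n    { url = "https://example.invalid/demo.whl", hash = "sha256:' + "b" * 64 + '" },\n]\n'
    '\n[[package]]\nname = "other"\nversion = "2.0"\n'
)
print(first_hash_in_lock(sample, "demo"))
```

Searching in file order means the `sdist` hash is returned when the block lists it before the wheels.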

In [35]:
import os, subprocess, pathlib, re

# Use the first candidate directory that contains a pyproject.toml
candidates = ["hw05/fastapi", "hw05", "."]
proj = next((p for p in candidates if (pathlib.Path(p)/"pyproject.toml").exists()), None)
if not proj:
    raise SystemExit("Couldn't find a uv project (pyproject.toml). Check your folder.")
os.chdir(proj)
print("Project dir:", pathlib.Path().resolve())

# Run uv commands with VIRTUAL_ENV unset (avoids the warning)
env = dict(os.environ)
env.pop("VIRTUAL_ENV", None)

def run(cmd):
    print("$", cmd)
    res = subprocess.run(cmd, shell=True, env=env)
    if res.returncode != 0:
        raise SystemExit(f"Command failed: {cmd}")

# Add scikit-learn without installing
run("uv add scikit-learn==1.6.1 --no-sync")

# Refresh lockfile (no install)
run("uv lock --refresh")

# Show an excerpt from uv.lock confirming scikit-learn
lock = pathlib.Path("uv.lock")
if not lock.exists():
    print("uv.lock not found")
else:
    s = lock.read_text()
    m = re.search(r'\[\[package\]\]\nname = "scikit-learn"[\s\S]*?(?=\n\[\[package\]\]|\Z)', s)
    print("\n=== scikit-learn in uv.lock ===\n")
    print((m.group(0)[:1200] if m else "scikit-learn section not found"))


Project dir: /Users/oleitao/Documents/Repos/DataTalks.Club/machine-learning-zoomcamp/cohorts/2025/05-deployment/hw05
$ uv add scikit-learn==1.6.1 --no-sync
$ uv lock --refresh

=== scikit-learn in uv.lock ===

[[package]]
name = "scikit-learn"
version = "1.6.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "joblib" },
    { name = "numpy" },
    { name = "scipy" },
    { name = "threadpoolctl" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9e/a5/4ae3b3a0755f7b35a280ac90b28817d1f380318973cff14075ab41ef50d9/scikit_learn-1.6.1.tar.gz", hash = "sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e", size = 7068312, upload-time = "2025-01-10T08:07:55.348Z" }
wheels = [
    { url = "https://files.pythonhosted.org/packages/2e/59/8eb1872ca87009bdcdb7f3cdc679ad557b992c12f4b61f9250659e592c63/scikit_learn-1.6.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:2ffa1e9e25b3d93990e74a4be2c2fc61ee5af85811562f1288d5d055880c4322", s

Resolved 16 packages in 3ms
Resolved 16 packages in 105ms


In [37]:
import pathlib, tomllib

# Find uv.lock in the usual spots
candidates = [
    pathlib.Path("uv.lock"),
    pathlib.Path("hw05/uv.lock"),
    pathlib.Path("hw05/fastapi/uv.lock"),
]
lock_path = next((p for p in candidates if p.exists()), None)
if not lock_path:
    raise FileNotFoundError("Couldn't find uv.lock (tried uv.lock, hw05/uv.lock, hw05/fastapi/uv.lock)")

data = tomllib.loads(lock_path.read_text(encoding="utf-8"))

# uv.lock has a [[package]] table array
pkgs = [p for p in data.get("package", []) if p.get("name") == "scikit-learn" and p.get("version") == "1.6.1"]
if not pkgs:
    raise RuntimeError("scikit-learn==1.6.1 not found in uv.lock")

pkg = pkgs[0]

hashes = []
# The sdist entry comes before the wheels in uv.lock, so its hash is first in file order
sdist = pkg.get("sdist")
if isinstance(sdist, dict) and "hash" in sdist:
    hashes.append(sdist["hash"])
# Then the wheel hashes (there can be many for different platforms)
for w in pkg.get("wheels", []):
    if "hash" in w:
        hashes.append(w["hash"])

if not hashes:
    raise RuntimeError("No hashes found for scikit-learn in uv.lock")

print("Lockfile:", lock_path)
print("First scikit-learn 1.6.1 hash:", hashes[0])


Lockfile: uv.lock
First scikit-learn 1.6.1 hash: sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e


## Models (pre-trained pipeline)
We'll use the provided pipeline with `DictVectorizer` and `LogisticRegression` saved in `pipeline_v1.bin`.

## Question 3 — Load `pipeline_v1.bin` and score a record

Record to score:
```json
{
  "lead_source": "paid_ads",
  "number_of_courses_viewed": 2,
  "annual_income": 79276.0
}
```
Print the probability of conversion (positive class).
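To make the scoring step less of a black box, here is a rough pure-Python sketch of what a `DictVectorizer` followed by `LogisticRegression` computes for one record. The feature layout, weights, and bias are hypothetical placeholders; the real values are stored inside `pipeline_v1.bin`, so the number printed here is not the homework answer.

```python
import math

# Hypothetical feature layout and weights, for illustration only;
# the real ones live inside pipeline_v1.bin.
FEATURES = ["annual_income", "lead_source=paid_ads", "number_of_courses_viewed"]
WEIGHTS = [0.00001, 0.2, 0.3]
BIAS = -1.0

def vectorize(record: dict, features: list[str]) -> list[float]:
    """Mimic DictVectorizer: strings become one-hot 'key=value' columns, numbers pass through."""
    row = dict.fromkeys(features, 0.0)
    for key, value in record.items():
        if isinstance(value, str):
            column, cell = f"{key}={value}", 1.0
        else:
            column, cell = key, float(value)
        if column in row:  # columns not seen at fit time are ignored
            row[column] = cell
    return [row[name] for name in features]

def positive_class_probability(x, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

record = {"lead_source": "paid_ads", "number_of_courses_viewed": 2, "annual_income": 79276.0}
x = vectorize(record, FEATURES)
print(x)
print(positive_class_probability(x, WEIGHTS, BIAS))
```

In the real pipeline, `predict_proba([record])[0, 1]` performs both steps and returns the second column, which is the positive class.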

In [39]:
# Download with curl (follows redirects)
!curl -L -o pipeline_v1.bin \
  https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin

# macOS MD5 check (prints just the hash)
!md5 -r pipeline_v1.bin | awk '{print $1}'

# Or SHA-256 (also available on macOS)
!shasum -a 256 pipeline_v1.bin | awk '{print $1}'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0 0      0 --:--:-- --:--:-- --:--:--     0
100  1300  100  1300    0     0   2452      0 --:--:-- --:--:-- --:--:--  2452
7d17d2e4dfbaf1e408e1a62e6e880d49
c85a57297c7b23f1afee8fce787d1a1a64633484ffb98e62fc8fb35406b891aa
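The `md5 -r` and `shasum` commands above are specific to macOS-style tooling. A portable alternative, sketched with the standard library's `hashlib` (`file_digest` is a hypothetical helper name), should produce the same hex digests on any platform:

```python
import hashlib
from pathlib import Path

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Only runs if the model file has already been downloaded
if Path("pipeline_v1.bin").exists():
    print("md5:   ", file_digest("pipeline_v1.bin", "md5"))
    print("sha256:", file_digest("pipeline_v1.bin"))
```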


In [40]:
import pickle
with open('pipeline_v1.bin', 'rb') as f:
    pipeline = pickle.load(f)

record = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0,
}

proba = float(pipeline.predict_proba([record])[0, 1])
print('Q3 probability:', round(proba, 3))


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


Q3 probability: 0.534




## Question 4 — Serve the model with FastAPI and score a client

Client to score via POST:
```python
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
```
We will write a small FastAPI service and a client request.

In [41]:
# Ensure FastAPI and Uvicorn are available in your environment
import sys, subprocess
subprocess.run([sys.executable, '-m', 'pip', 'install', '-U', 'fastapi', 'uvicorn', 'pydantic', 'scikit-learn', 'scipy'], check=False)

from textwrap import dedent
app_code = dedent('''
from fastapi import FastAPI
from pydantic import BaseModel
import pickle

class Lead(BaseModel):
    lead_source: str | None = None
    number_of_courses_viewed: float | int | None = 0
    annual_income: float | int | None = 0

with open('pipeline_v1.bin', 'rb') as f:
    pipeline = pickle.load(f)

app = FastAPI()

@app.get('/')
def root():
    return {'status': 'ok'}

@app.post('/predict')
def predict(lead: Lead):
    record = lead.model_dump()
    proba = float(pipeline.predict_proba([record])[0, 1])
    return {'probability': proba}
''')
with open('app.py', 'w', encoding='utf-8') as f:
    f.write(app_code)
print('Wrote app.py')
print('To run locally:  uvicorn app:app --reload')


Collecting fastapi
  Downloading fastapi-0.119.1-py3-none-any.whl.metadata (28 kB)
Collecting uvicorn
  Downloading uvicorn-0.38.0-py3-none-any.whl.metadata (6.8 kB)
Collecting pydantic
  Downloading pydantic-2.12.3-py3-none-any.whl.metadata (87 kB)
Collecting scikit-learn
  Using cached scikit_learn-1.7.2-cp313-cp313-macosx_12_0_arm64.whl.metadata (11 kB)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi)
  Downloading starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Collecting typing-extensions>=4.8.0 (from fastapi)
  Using cached typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting annotated-types>=0.6.0 (from pydantic)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.41.4 (from pydantic)
  Downloading pydantic_core-2.41.4-cp313-cp313-macosx_11_0_arm64.whl.metadata (7.3 kB)
Collecting typing-inspection>=0.4.2 (from pydantic)
  Downloading typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB)
Collecting anyi

In [42]:
# Client call (run this **after** starting `uvicorn app:app --reload` in a separate terminal)
import requests

url = 'http://127.0.0.1:8000/predict'
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0,
}
try:
    resp = requests.post(url, json=client, timeout=5)
    print('Response:', resp.json())
except Exception as e:
    print('Make sure the server is running (uvicorn app:app --reload). Error:', e)



Make sure the server is running (uvicorn app:app --reload). Error: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11db62ba0>: Failed to establish a new connection: [Errno 61] Connection refused'))
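The connection error above is what the client prints when nothing is listening on port 8000 yet. One way to avoid racing the server is to poll the health endpoint (`GET /` in `app.py`) before posting. The sketch below uses only the standard library; `wait_until_up` is a hypothetical helper, and the timeout values are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request

def wait_until_up(url: str, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after a short pause
        time.sleep(interval)
    return False
```

With the server started in another terminal, call `wait_until_up("http://127.0.0.1:8000/")` first and send the `requests.post(...)` call only when it returns `True`.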


## Question 5 — Docker base image size

Pull the base image and check its size with `docker images` (look under the `SIZE` column).

In [44]:
%%bash
set -euo pipefail
docker --version
docker pull agrigorev/zoomcamp-model:2025
docker image ls agrigorev/zoomcamp-model:2025
docker image inspect agrigorev/zoomcamp-model:2025 --format '{{.Size}}'


Docker version 28.2.2, build e6534b4
2025: Pulling from agrigorev/zoomcamp-model
816e9c0578b5: Pulling fs layer
59e22667830b: Pulling fs layer
66cd357cf7a1: Pulling fs layer
760c2b9cc6ee: Pulling fs layer
799df5177ae2: Pulling fs layer
19f81082f6f0: Pulling fs layer
66cd357cf7a1: Download complete
816e9c0578b5: Download complete
760c2b9cc6ee: Download complete
19f81082f6f0: Download complete
59e22667830b: Download complete
799df5177ae2: Download complete
19f81082f6f0: Pull complete
59e22667830b: Pull complete
799df5177ae2: Pull complete
66cd357cf7a1: Pull complete
816e9c0578b5: Pull complete
760c2b9cc6ee: Pull complete
Digest: sha256:14d79fde0bbf078eb18c99c2bd007205917b758ec11060b2994963a1e485c2ae
Status: Downloaded newer image for agrigorev/zoomcamp-model:2025
docker.io/agrigorev/zoomcamp-model:2025
REPOSITORY                 TAG       IMAGE ID       CREATED        SIZE
agrigorev/zoomcamp-model   2025      14d79fde0bbf   38 hours ago   181MB
44332815
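`docker image inspect` prints the size in bytes, while `docker image ls` prints a rounded, human-readable figure. The two differ in the run above, so the helper below is only a unit conversion, assuming decimal units (1 MB = 1,000,000 bytes); `bytes_to_mb` is a hypothetical name.

```python
def bytes_to_mb(size_bytes: int, decimals: int = 1) -> float:
    """Convert a byte count to megabytes using decimal units (1 MB = 10**6 bytes)."""
    return round(size_bytes / 1_000_000, decimals)

print(bytes_to_mb(44332815))  # prints 44.3
```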


## Dockerfile — build your service image

Create a `Dockerfile` that starts FROM the provided base and installs dependencies from `pyproject.toml`, then runs the FastAPI app with Uvicorn.

In [None]:
from textwrap import dedent
dockerfile = dedent('''
FROM agrigorev/zoomcamp-model:2025
WORKDIR /code

# Install uv (fast installer)
RUN python -m pip install -U uv

# Copy project files
COPY pyproject.toml ./
COPY app.py ./
COPY pipeline_v1.bin ./

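# Install dependencies via uv (no dev deps); uv sync installs into /code/.venv
RUN uv sync --no-dev --frozen || uv sync --no-dev

# Put the project virtual environment on PATH so the uvicorn executable is found
ENV PATH="/code/.venv/bin:$PATH"

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]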
''')
with open('Dockerfile', 'w', encoding='utf-8') as f:
    f.write(dockerfile)
print('Wrote Dockerfile')

pyproject = dedent('''
[project]
name = "mlzoomcamp-hw5"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
  "fastapi",
  "uvicorn",
  "scikit-learn==1.6.1",
  "scipy",
  "pydantic>=2",
]
''')
with open('pyproject.toml', 'w', encoding='utf-8') as f:
    f.write(pyproject)
print('Wrote pyproject.toml')
print('Build with:  docker build --platform linux/amd64 -t hw5-service .')
print('Run with:    docker run --rm --platform linux/amd64 -p 8000:8000 hw5-service')




Wrote Dockerfile
Wrote pyproject.toml
Build with:  docker build --platform linux/amd64 -t hw5-service .
Run with:    docker run --rm --platform linux/amd64 -p 8000:8000 hw5-service


## Question 6 — Run your container and score the client

After building and running your container, POST the same `client` to the running service and print the probability.

In [50]:
import requests
url = 'http://127.0.0.1:8000/predict'  # if running locally with -p 8000:8000
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0,
}
try:
    resp = requests.post(url, json=client, timeout=5)
    print('Q6 probability:', round(resp.json().get('probability', float('nan')), 3))
except Exception as e:
    print('Ensure the Docker container is running and listening on port 8000. Error:', e)


Q6 probability: 0.534


## Answers — Fill these into the course form
- Q1: uv version — run the cell in *Question 1* and copy the version (example in the prompt was **0.9.5**).
- Q2: First `scikit-learn==1.6.1` hash — run the parser cell in *Question 2* and copy the first `sha256:` value.
- Q3: Probability from `pipeline_v1.bin` — run *Question 3* (expected ~**0.533**).
- Q4: Probability from FastAPI app — run *Question 4* client cell (expected ~**0.534**).
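- Q5: Size of `agrigorev/zoomcamp-model:2025` — run *Question 5* (expected **121 MB**; the run recorded above listed **181MB** under `SIZE`, so check the value on your machine).
- Q6: Probability from Dockerized service — run *Question 6* (expected ~**0.79**; the run recorded above printed **0.534** because the image built here copies in and serves the same `pipeline_v1.bin` as Question 4).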

> If your numbers differ slightly, pick the closest option in the multiple choice.
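Picking the closest option can be done mechanically. A minimal sketch, where the option list is a hypothetical example rather than the actual form choices:

```python
def closest_option(value: float, options: list[float]) -> float:
    """Return the option with the smallest absolute distance to `value`."""
    return min(options, key=lambda option: abs(option - value))

# Hypothetical multiple-choice options, for illustration only
print(closest_option(0.534, [0.333, 0.533, 0.733, 0.933]))  # prints 0.533
```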

---
_Notebook generated: 2025-10-22T21:13:28Z_
