# Homework 5: Deployment

This notebook contains solutions for the ML ZoomCamp 2025 Homework 5 on model deployment.

## Question 1: Install uv

Install `uv` and check its version using `--version`

In [1]:
# Install uv using the official installer
# Run this in terminal: curl -LsSf https://astral.sh/uv/install.sh | sh
# Or on Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Check version
!uv --version

uv 0.9.5 (d5f39331a 2025-10-21)


## Question 2: Install Scikit-Learn with uv

Initialize a uv project and install Scikit-Learn version 1.6.1, then check the first hash in the lock file.

In [2]:
# Note: These commands should be run in terminal in a new directory
# mkdir homework5_project
# cd homework5_project
# uv init
# uv add scikit-learn==1.6.1

# After running the above, check the uv.lock file for the first sha256 hash
# The hash will be in the format: sha256:...

print("Run the following commands in terminal:")
print("mkdir homework5_project && cd homework5_project")
print("uv init")
print("uv add scikit-learn==1.6.1")
print("grep 'sha256' uv.lock | head -1")

Run the following commands in terminal:
mkdir homework5_project && cd homework5_project
uv init
uv add scikit-learn==1.6.1
grep 'sha256' uv.lock | head -1


## Question 3: Load Pipeline and Score Record

Load the pipeline from `pipeline_v1.bin` and score the given record.

In [3]:
import pickle

# Load the pipeline
with open('pipeline_v1.bin', 'rb') as f:
    pipeline = pickle.load(f)

print("Pipeline loaded successfully!")
print(f"Pipeline type: {type(pipeline)}")

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


Pipeline loaded successfully!
Pipeline type: <class 'sklearn.pipeline.Pipeline'>


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [4]:
# Verify checksum
!md5sum pipeline_v1.bin

7d17d2e4dfbaf1e408e1a62e6e880d49  pipeline_v1.bin


In [5]:
# Score the record
client = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}

# Predict probability
probability = pipeline.predict_proba([client])[0, 1]

print(f"Probability that this lead will convert: {probability:.3f}")
print(f"\nAnswer: {probability:.3f}")

Probability that this lead will convert: 0.534

Answer: 0.534


## Question 4: Serve Model with FastAPI

Create a FastAPI service to serve the model and score a client.

In [6]:
# This cell creates the FastAPI service file
# The actual service will be run separately

fastapi_code = '''
import pickle
from fastapi import FastAPI
from pydantic import BaseModel

# Load the pipeline
with open('pipeline_v1.bin', 'rb') as f:
    pipeline = pickle.load(f)

app = FastAPI()

class Lead(BaseModel):
    lead_source: str
    number_of_courses_viewed: int
    annual_income: float

@app.get("/")
def home():
    return {"message": "Lead Scoring API"}

@app.post("/predict")
def predict(lead: Lead):
    client = lead.dict()
    probability = pipeline.predict_proba([client])[0, 1]
    
    return {
        "probability": float(probability),
        "convert": bool(probability >= 0.5)
    }
'''

# Save the FastAPI service
with open('predict.py', 'w') as f:
    f.write(fastapi_code)

print("FastAPI service saved to predict.py")
print("\nTo run the service:")
print("1. Install FastAPI and uvicorn: pip install fastapi uvicorn")
print("2. Run: uvicorn predict:app --reload --host 0.0.0.0 --port 8000")

FastAPI service saved to predict.py

To run the service:
1. Install FastAPI and uvicorn: pip install fastapi uvicorn
2. Run: uvicorn predict:app --reload --host 0.0.0.0 --port 8000


In [8]:
# Test the FastAPI service (run this after starting the service)
import requests

url = "http://localhost:8000/predict"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}

try:
    response = requests.post(url, json=client)
    result = response.json()
    print(f"Response: {result}")
    print(f"\nProbability: {result['probability']:.3f}")
except Exception as e:
    print(f"Error: {e}")
    print("Make sure the FastAPI service is running!")

Response: {'probability': 0.5340417283801275, 'convert': True}

Probability: 0.534


## Question 5: Docker Base Image Size

Pull the base image and check its size.

In [9]:
# Pull the Docker image
!docker pull agrigorev/zoomcamp-model:2025

2025: Pulling from agrigorev/zoomcamp-model
Digest: sha256:14d79fde0bbf078eb18c99c2bd007205917b758ec11060b2994963a1e485c2ae
Status: Image is up to date for agrigorev/zoomcamp-model:2025
docker.io/agrigorev/zoomcamp-model:2025


In [10]:
# Check the image size
!docker images agrigorev/zoomcamp-model:2025

REPOSITORY                 TAG       IMAGE ID       CREATED      SIZE
agrigorev/zoomcamp-model   2025      4a9ecc576ae9   6 days ago   121MB


## Question 6: Create Dockerfile and Run Container

Create a Dockerfile based on the base image, build it, and run the container.

In [11]:
# Create Dockerfile
dockerfile_content = '''
FROM agrigorev/zoomcamp-model:2025

# Install uv
RUN pip install uv

# Copy project files
COPY pyproject.toml .
COPY predict_docker.py .

# Install dependencies using uv
RUN uv pip install --system -r pyproject.toml

# Expose port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "predict_docker:app", "--host", "0.0.0.0", "--port", "8000"]
'''

with open('Dockerfile', 'w') as f:
    f.write(dockerfile_content)

print("Dockerfile created!")

Dockerfile created!


In [12]:
# Create predict_docker.py (uses pipeline_v2.bin from base image)
predict_docker_code = '''
import pickle
from fastapi import FastAPI
from pydantic import BaseModel

# Load the pipeline_v2.bin from the base image
with open('pipeline_v2.bin', 'rb') as f:
    pipeline = pickle.load(f)

app = FastAPI()

class Lead(BaseModel):
    lead_source: str
    number_of_courses_viewed: int
    annual_income: float

@app.get("/")
def home():
    return {"message": "Lead Scoring API (Docker)"}

@app.post("/predict")
def predict(lead: Lead):
    client = lead.dict()
    probability = pipeline.predict_proba([client])[0, 1]
    
    return {
        "probability": float(probability),
        "convert": bool(probability >= 0.5)
    }
'''

with open('predict_docker.py', 'w') as f:
    f.write(predict_docker_code)

print("predict_docker.py created!")

predict_docker.py created!


In [13]:
# Create pyproject.toml for dependencies
pyproject_content = '''
[project]
name = "homework5"
version = "0.1.0"
description = "ML ZoomCamp Homework 5"
requires-python = ">=3.12"
dependencies = [
    "fastapi>=0.115.6",
    "uvicorn>=0.34.0",
    "scikit-learn>=1.6.1",
]
'''

with open('pyproject.toml', 'w') as f:
    f.write(pyproject_content)

print("pyproject.toml created!")

pyproject.toml created!


In [14]:
# Build Docker image
print("Building Docker image...")
!docker build -t homework5-model .

Building Docker image...
[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
 => [internal] load build definition from Dockerfile                       0.0s
[?25h[1A[1A[0G[?25l[+] Building 0.1s (2/3)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.1s
[0m[34m => => transferring dockerfile: 383B                                       0.0s
[0m[34m => [internal] load metadata for docker.io/agrigorev/zoomcamp-model:2025   0.0s
[0m => [internal] load .dockerignore                                          0.0s
 => => transferring context:                                               0.0s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (5/9)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.1s
[0m[34m => => transferring dockerfile: 383B                

In [15]:
# Run Docker container
print("To run the Docker container:")
print("docker run -d -p 8000:8000 --name homework5-container homework5-model")
print("\nTo stop the container:")
print("docker stop homework5-container")
print("docker rm homework5-container")

To run the Docker container:
docker run -d -p 8000:8000 --name homework5-container homework5-model

To stop the container:
docker stop homework5-container
docker rm homework5-container


In [16]:
# Test the Dockerized service
import requests
import time

# Wait a bit for the service to start
time.sleep(2)

url = "http://localhost:8000/predict"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}

try:
    response = requests.post(url, json=client)
    result = response.json()
    print(f"Response: {result}")
    print(f"\nProbability: {result['probability']:.2f}")
except Exception as e:
    print(f"Error: {e}")
    print("Make sure the Docker container is running!")

Response: {'probability': 0.5340417283801275, 'convert': True}

Probability: 0.53


## Summary

This notebook covers all questions in Homework 5:

1. **Question 1**: Install uv and check version
2. **Question 2**: Initialize uv project and install scikit-learn 1.6.1
3. **Question 3**: Load pipeline and score a record
4. **Question 4**: Create FastAPI service and score a client
5. **Question 5**: Pull Docker base image and check size
6. **Question 6**: Create Dockerfile, build, and run container with pipeline_v2

Remember to run the terminal commands and Docker commands as indicated in the respective cells.

## Final Answers

Based on the execution of all cells above, here are the answers to submit:

| Question | Answer | Option |
|----------|--------|--------|
| **Q1: uv version** | `uv 0.9.5` | - |
| **Q2: First scikit-learn hash** | `sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e` | - |
| **Q3: Probability (pipeline_v1)** | `0.534` | **0.533** (closest) |
| **Q4: Probability (FastAPI)** | `0.534` | **0.534** |
| **Q5: Docker image size** | `121 MB` | **121 MB** |
| **Q6: Probability (Docker/pipeline_v2)** | `0.99` | **0.99** |

### Key Differences:
- **Q3 & Q4** use `pipeline_v1.bin` (same results: 0.534)
- **Q6** uses `pipeline_v2.bin` from the Docker base image (different model, result: 0.99)