# Q1

* Install `uv`
* What's the version of uv you installed?
* Use `--version` to find out


```bash
uv --version
uv 0.9.2 (141369ce7 2025-10-10)
```

# Q2

* Use uv to install scikit-learn version 1.6.1
* What's the first hash for scikit-learn in the lock file?
* Include the entire string starting with `sha256:`, without the quotes

In [1]:
import tomllib

with open("uv.lock", "rb") as f:
    data = tomllib.load(f)

sklearn_info = []
for package in data.get("package", []):
    if package.get("name") == "scikit-learn":
        sklearn_info.append(package)

sklearn_info

[{'name': 'scikit-learn',
  'version': '1.6.1',
  'source': {'registry': 'https://pypi.org/simple'},
  'dependencies': [{'name': 'joblib'},
   {'name': 'numpy'},
   {'name': 'scipy'},
   {'name': 'threadpoolctl'}],
  'sdist': {'url': 'https://files.pythonhosted.org/packages/9e/a5/4ae3b3a0755f7b35a280ac90b28817d1f380318973cff14075ab41ef50d9/scikit_learn-1.6.1.tar.gz',
   'hash': 'sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e',
   'size': 7068312,
   'upload-time': '2025-01-10T08:07:55.348Z'},
  'wheels': [{'url': 'https://files.pythonhosted.org/packages/0a/18/c797c9b8c10380d05616db3bfb48e2a3358c767affd0857d56c2eb501caa/scikit_learn-1.6.1-cp312-cp312-macosx_10_13_x86_64.whl',
    'hash': 'sha256:926f207c804104677af4857b2c609940b743d04c4c35ce0ddc8ff4f053cddc1b',
    'size': 12104516,
    'upload-time': '2025-01-10T08:06:40.009Z'},
   {'url': 'https://files.pythonhosted.org/packages/c4/b7/2e35f8e289ab70108f8cbb2e7a2208f0575dc704749721286519dcf35f6f/scikit_learn-1.

# Q3

We have prepared a pipeline with a dictionary vectorizer and a model.

It was trained (roughly) using this code:

```python
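from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

categorical = ['lead_source']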
numeric = ['number_of_courses_viewed', 'annual_income']

df[categorical] = df[categorical].fillna('NA')
df[numeric] = df[numeric].fillna(0)

train_dict = df[categorical + numeric].to_dict(orient='records')

pipeline = make_pipeline(
    DictVectorizer(),
    LogisticRegression(solver='liblinear')
)

pipeline.fit(train_dict, y_train)
```
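
To make the vectorizer step more concrete, here is a simplified pure-Python imitation of how a dictionary vectorizer turns records into numeric rows: string values become one-hot `key=value` columns, numbers keep their key, and columns are sorted by name. This is only a sketch of the idea, not scikit-learn's implementation, and `fit_feature_names` and `transform` are hypothetical names.

```python
def column_for(key: str, value) -> str:
    """String values get their own 'key=value' column; numbers share the key."""
    return f"{key}={value}" if isinstance(value, str) else key


def fit_feature_names(records: list[dict]) -> list[str]:
    """Collect every column seen in the training records, sorted by name."""
    return sorted({column_for(k, v) for record in records for k, v in record.items()})


def transform(record: dict, feature_names: list[str]) -> list[float]:
    """Encode one record as a dense row aligned with feature_names."""
    row = dict.fromkeys(feature_names, 0.0)
    for key, value in record.items():
        column = column_for(key, value)
        if column in row:  # categories not seen during fitting are ignored
            row[column] = 1.0 if isinstance(value, str) else float(value)
    return [row[name] for name in feature_names]


# Two made-up training records, including the 'NA' / 0 fill values used above.
train_records = [
    {"lead_source": "paid_ads", "number_of_courses_viewed": 2, "annual_income": 79276.0},
    {"lead_source": "NA", "number_of_courses_viewed": 0, "annual_income": 0},
]

feature_names = fit_feature_names(train_records)
encoded = transform(train_records[0], feature_names)
unseen = transform(
    {"lead_source": "organic_search", "number_of_courses_viewed": 4, "annual_income": 80304.0},
    feature_names,
)

print(feature_names)
print(encoded)
print(unseen)
```

In this toy setup `organic_search` never appeared during fitting, so its row has zeros in every `lead_source=...` column. The logistic regression then only ever sees these numeric rows.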

In [9]:
!wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin

--2025-10-21 09:49:02--  https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin [following]
--2025-10-21 09:49:02--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8002::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1300 (1.3K) [application/octet-stream]
Saving to: ‘pipeline_v1.bin’


Let's use the model!

* Write a script for loading the pipeline with pickle
* Score this record:

```json
{
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}
```

What's the probability that this lead will convert? 

In [2]:
import pickle

with open("pipeline_v1.bin", "rb") as file:
    pipeline = pickle.load(file)


customer = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}

pipeline.predict_proba(customer)[0,1]

np.float64(0.5336072702798061)

# Q4

Now let's serve this model as a web service.

* Install FastAPI
* Write FastAPI code for serving the model
* Now score this client using `requests`:

```python
import requests

url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()
```

What's the probability that this lead will convert?
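
As a rough sketch, the scoring logic that a `/predict` handler needs can be written without the web framework. The helper below assumes the pipeline exposes a scikit-learn style `predict_proba`. The names `predict_single` and `StubPipeline`, the response keys, and the 0.5 threshold are all assumptions made for illustration; the stub only stands in for the unpickled pipeline so the snippet runs on its own.

```python
def predict_single(pipeline, lead: dict, threshold: float = 0.5) -> dict:
    """Score one lead and build a JSON-serializable response."""
    # predict_proba returns one row per record; column 1 is the positive class.
    probability = float(pipeline.predict_proba([lead])[0][1])
    return {
        "conversion_probability": probability,
        "converted": probability >= threshold,
    }


class StubPipeline:
    """Hypothetical stand-in for the real pipeline: always predicts 0.6."""

    def predict_proba(self, records):
        return [[0.4, 0.6] for _ in records]


client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0,
}

result = predict_single(StubPipeline(), client)
print(result)
```

In a FastAPI app, a function like this would typically be called from the handler registered with `@app.post("/predict")`, with the real pipeline loaded once at startup via `pickle.load`. Casting to `float` keeps NumPy scalar types out of the response.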

In [4]:
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv")

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1462 entries, 0 to 1461
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   lead_source               1334 non-null   object 
 1   industry                  1328 non-null   object 
 2   number_of_courses_viewed  1462 non-null   int64  
 3   annual_income             1281 non-null   float64
 4   employment_status         1362 non-null   object 
 5   location                  1399 non-null   object 
 6   interaction_count         1462 non-null   int64  
 7   lead_score                1462 non-null   float64
 8   converted                 1462 non-null   int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 102.9+ KB


In [9]:
for col in df.select_dtypes(include='object').columns:
    print(f"{col}: {df[col].unique()}")

lead_source: ['paid_ads' 'social_media' 'events' 'referral' 'organic_search' nan]
industry: [nan 'retail' 'healthcare' 'education' 'manufacturing' 'technology'
 'other' 'finance']
employment_status: ['unemployed' 'employed' nan 'self_employed' 'student']
location: ['south_america' 'australia' 'europe' 'africa' 'middle_east' nan
 'north_america' 'asia']


In [17]:
import requests
url = 'http://localhost:9696/predict'
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()

{'churn_probability': 0.5340417283801275, 'churn': True}

# Q5

Install [Docker](https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/05-deployment/06-docker.md). 
We will use it for the next two questions.

For these questions, we prepared a base image: `agrigorev/zoomcamp-model:2025`. 
You'll need to use it (see Question 6 for an example).

This image is based on `python:3.13.5-slim-bookworm` and has
a pipeline with logistic regression (a different one)
as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

```docker 
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
```

We already built it and then pushed it to [`agrigorev/zoomcamp-model:2025`](https://hub.docker.com/r/agrigorev/zoomcamp-model)

Download the base image `agrigorev/zoomcamp-model:2025`. You can do this with the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command.

So what's the size of this base image?

In [None]:
!docker pull agrigorev/zoomcamp-model:2025

In [18]:
!docker image ls

REPOSITORY                 TAG           IMAGE ID       CREATED         SIZE
agrigorev/zoomcamp-model   2025          14d79fde0bbf   8 hours ago     181MB
predict                    latest        7fa8f6ed10ab   5 days ago      801MB
subscription               latest        0113bef5e0d5   5 days ago      801MB
predict-churn              latest        b2780c86c8a8   5 days ago      570MB
svizor/zoomcamp-model      3.11.5-slim   15d61790363f   12 months ago   197MB


# Q6

Now create your own `Dockerfile` based on the image we prepared.

It should start like this:

```docker
FROM agrigorev/zoomcamp-model:2025
# add your stuff here
```

Now complete it:

* Install all the dependencies from pyproject.toml
* Copy your FastAPI script
* Run it with uvicorn 

After that, you can build your docker image.



Let's run your docker container!

After running it, score this client once again:

```python
import requests

url = "YOUR_URL"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()
```

What's the probability that this lead will convert?


In [None]:
!docker build --platform=linux/amd64 -t predict .

In [None]:
!docker run -it --rm --platform=linux/amd64 -p 9696:9696 predict:latest

In [22]:
import requests
url = 'http://localhost:9696/predict'
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}
requests.post(url, json=client).json()

{'churn_probability': 0.5340417283801275, 'churn': True}