A complete, production-style machine learning project that teaches you how to take a model from a CSV file all the way to a monitored, containerised, CI/CD-driven service.
What you will build:
- A churn-prediction model trained with MLflow experiment tracking
- A FastAPI prediction service with Prometheus metrics
- A Streamlit web UI for interactive predictions
- A full Docker Compose stack (API + UI + Prometheus + Grafana)
- A GitHub Actions pipeline that tests, lints, and publishes Docker images
Contents:
- Architecture overview
- Prerequisites
- Project structure
- Step 1 — The dataset and ML problem
- Step 2 — Training the model
- Step 3 — Experiment tracking with MLflow
- Step 4 — Serving predictions with FastAPI
- Step 5 — Interactive UI with Streamlit
- Step 6 — Testing the API
- Step 7 — Containerisation with Docker
- Step 8 — Monitoring with Prometheus and Grafana
- Step 9 — Running the full stack with Docker Compose
- Step 10 — CI/CD with GitHub Actions
- Ports and service reference
┌──────────────────────────────────────────────────────┐
│ Developer Workflow │
│ │
│ train.py ──► models/ ──► api.py ──► tests/ │
│ │ │
│ └──► mlruns/ (MLflow local tracking) │
└──────────────────────────────────────────────────────┘
│
git push main
│
┌──────────────────────────────────────────────────────┐
│ GitHub Actions CI/CD │
│ │
│ ┌─────────┐ ┌──────┐ ┌─────────────────────┐ │
│ │ lint │ │ test │──►│ build & push image │ │
│ └─────────┘ └──────┘ └─────────────────────┘ │
│ │ │
│ ghcr.io/…/churnguard-api │
│ ghcr.io/…/churnguard-ui │
└──────────────────────────────────────────────────────┘
│
docker compose up
│
┌──────────────────────────────────────────────────────────────┐
│ Runtime Stack │
│ │
│ :8501 Streamlit UI ──────────────────────────────────────► │
│ └─► :8000 FastAPI ◄── :9090 Prometheus │
│ │ │ │
│ models/ :3000 Grafana │
└──────────────────────────────────────────────────────────────┘
| Tool | Minimum version | Purpose |
|---|---|---|
| Python | 3.11 | Training, API, UI |
| Docker | 24 | Containerisation |
| Docker Compose | v2 | Multi-service stack |
| Git | 2.x | Version control and CI/CD |
Install the Python dependencies for local development:
pip install -r requirements.api.txt
pip install pytest httpx streamlit plotly requests mlflow

churn_project/
├── src/
│ ├── train.py # Data preprocessing, model training, artifact export
│ ├── api.py # FastAPI prediction service with Prometheus metrics
│ └── streamlit_app.py # Interactive web UI
├── tests/
│ └── test_api.py # pytest test suite (18 tests)
├── monitoring/
│ └── prometheus.yml # Prometheus scrape configuration
├── models/ # Generated artifacts (model.pkl, scaler.pkl, feature_names.pkl)
├── mlruns/ # MLflow local tracking store
├── Dockerfile.api # Image for the FastAPI service
├── Dockerfile.streamlit # Image for the Streamlit UI
├── docker-compose.yml # Full 4-service stack
├── requirements.api.txt # API dependencies
├── requirements.streamlit.txt # UI dependencies
├── ruff.toml # Linter and formatter configuration
└── .github/
└── workflows/
└── ci.yml # GitHub Actions pipeline
Dataset: IBM Telco Customer Churn (≈7,000 customers, 19 input features).
Loaded at training time directly from GitHub — no manual download needed.
Goal: Binary classification — predict whether a customer will churn (Churn = 1) or stay (Churn = 0).
Input features:
| Feature | Type | Notes |
|---|---|---|
| gender | Binary | 0 = Female, 1 = Male |
| SeniorCitizen | Binary | 0 / 1 |
| Partner, Dependents | Binary | 0 = No, 1 = Yes |
| tenure | Integer | Months with the company |
| PhoneService, PaperlessBilling | Binary | 0 / 1 |
| MultipleLines | 0–2 | No phone service / No / Yes |
| InternetService | 0–2 | DSL / Fiber optic / No |
| OnlineSecurity … StreamingMovies | 0–2 | No internet / No / Yes |
| Contract | 0–2 | Month-to-month / 1-year / 2-year |
| PaymentMethod | 0–3 | Four payment methods |
| MonthlyCharges | Float | Must be > 0 |
| TotalCharges | Float | Must be ≥ 0 |
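The integer codes for the multi-class columns depend on how the category labels are numbered. As a minimal sketch, assuming the codes are assigned the way scikit-learn's LabelEncoder assigns them (distinct labels numbered in sorted order), the InternetService mapping from the table can be reproduced in plain Python. The helper name `label_codes` and the sample values are illustrative and not part of the project.

```python
def label_codes(values):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


# Illustrative column values (not read from the real dataset).
internet_service = ["Fiber optic", "DSL", "No", "DSL", "Fiber optic"]

codes = label_codes(internet_service)
encoded = [codes[v] for v in internet_service]

print(codes)    # {'DSL': 0, 'Fiber optic': 1, 'No': 2}
print(encoded)  # [1, 0, 2, 0, 1]
```

Any client that builds a request by hand has to use the same mapping, since the API only sees the integers.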
All training logic lives in src/train.py. Run it once before starting the API:
python src/train.py

What happens, step by step:
df = pd.read_csv(DATA_URL)
# Fix TotalCharges (whitespace rows arrive as empty strings)
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df.dropna(inplace=True)
# Encode binary and multi-class columns
for col in binary_cols:
    df[col] = (df[col] == "Yes").astype(int)
le = LabelEncoder()
for col in multi_cols:
    df[col] = le.fit_transform(df[col])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # stratify preserves class balance
)

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

Why scale? Random Forests are not distance-based and don't strictly require scaling, but it makes the pipeline consistent when you swap in other algorithms later.
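The scaler is fitted on the training split and only applied to the test split, so no test-set statistics leak into training. The following is a plain-Python sketch of that idea for a single column, assuming standardisation with the mean and population standard deviation (which is what StandardScaler computes). The function names and the tenure values are made up for illustration.

```python
from statistics import fmean, pstdev


def fit_standardiser(column):
    """Learn mean and population standard deviation from training values only."""
    mean = fmean(column)
    std = pstdev(column) or 1.0  # guard against a constant column
    return mean, std


def standardise(column, mean, std):
    """Apply previously learned statistics to any split."""
    return [(x - mean) / std for x in column]


# Illustrative tenure values in months (not taken from the dataset).
train_tenure = [1.0, 12.0, 24.0, 36.0, 72.0]
test_tenure = [6.0, 48.0]

mean, std = fit_standardiser(train_tenure)
train_scaled = standardise(train_tenure, mean, std)
test_scaled = standardise(test_tenure, mean, std)  # reuses the training statistics
```

The scaled training column has mean 0 and unit variance. The test column generally does not, which is expected.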
params = {
    "n_estimators": 200,
    "max_depth": 8,
    "min_samples_split": 5,
    "class_weight": "balanced",  # compensates for the ~27% churn minority class
    "random_state": 42,
}
model = RandomForestClassifier(**params)
model.fit(X_train_sc, y_train)

joblib.dump(model, "models/model.pkl")
joblib.dump(scaler, "models/scaler.pkl")
joblib.dump(list(X.columns), "models/feature_names.pkl")

Three separate files are saved intentionally:
- model.pkl: the trained classifier
- scaler.pkl: the fitted scaler (inference must apply the same scaling as training)
- feature_names.pkl: the ordered list of column names, so the API can reorder any incoming dictionary to match training order
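The reason for persisting the column order is that a JSON object carries no guaranteed field order, while the model expects features in exactly the positions it saw during training. Here is a small sketch of the reordering step using plain lists and dicts. `to_model_row` is a hypothetical helper, and the three-feature list stands in for the full contents of feature_names.pkl.

```python
def to_model_row(payload, feature_names):
    """Return the payload's values in training column order.

    Raises KeyError if a feature the model was trained on is missing.
    """
    missing = [name for name in feature_names if name not in payload]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [payload[name] for name in feature_names]


# Hypothetical three-feature subset; the real list comes from feature_names.pkl.
feature_names = ["gender", "tenure", "MonthlyCharges"]
payload = {"MonthlyCharges": 70.5, "gender": 1, "tenure": 12}

row = to_model_row(payload, feature_names)
print(row)  # [1, 12, 70.5]
```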
MLflow tracks every training run locally inside mlruns/.
mlflow ui # open http://localhost:5000 to see run history

Each run records:
- Metrics: accuracy, roc_auc
- Model artifact: the serialised RandomForestClassifier (skipped in CI to keep runs fast)
with mlflow.start_run():
    # ... train ...
    mlflow.log_metric("accuracy", acc)
    mlflow.log_metric("roc_auc", auc)
    if not os.getenv("CI"):
        mlflow.sklearn.log_model(model, name="model")

Why skip log_model in CI? Serialising a 200-tree forest adds several seconds and megabytes to every CI run. The final .pkl artefacts saved by joblib.dump are what the API actually loads, so the MLflow model artefact is optional.
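GitHub Actions sets `CI=true` for every job, which is what the `os.getenv("CI")` check relies on. One subtlety is that the plain check treats any non-empty string as true, so `CI=false` would also skip model logging. If that matters, a stricter parse is easy to write. The sketch below is one possible variant, not the project's code, and `running_in_ci` is a hypothetical name.

```python
import os


def running_in_ci(environ=None):
    """Return True when the CI variable is set to a truthy-looking value."""
    environ = os.environ if environ is None else environ
    value = environ.get("CI", "").strip().lower()
    return value in {"1", "true", "yes"}


print(running_in_ci({"CI": "true"}))   # True
print(running_in_ci({"CI": "false"}))  # False
print(running_in_ci({}))               # False
```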
src/api.py exposes three endpoints on port 8000.
uvicorn src.api:app --reload --port 8000

Open the auto-generated interactive docs: http://localhost:8000/docs
Pydantic validates every incoming request before it reaches the model:
class CustomerFeatures(BaseModel):
    gender: int = Field(..., ge=0, le=1)
    tenure: int = Field(..., ge=0)
    MonthlyCharges: float = Field(..., gt=0)  # must be strictly positive
    # ... 16 more fields

If a request violates any constraint, FastAPI returns 422 Unprocessable Entity automatically, with no custom error handling needed.
@app.post("/predict", response_model=PredictionResponse)
def predict(customer: CustomerFeatures):
    data = pd.DataFrame([customer.model_dump()])[feature_names]  # enforce column order
    data_scaled = scaler.transform(data)
    proba = model.predict_proba(data_scaled)[0][1]  # probability of churn
    prediction = bool(proba >= 0.5)
    risk = "high" if proba >= 0.7 else "medium" if proba >= 0.4 else "low"
    return PredictionResponse(churn_probability=round(float(proba), 4),
                              churn_prediction=prediction, risk_level=risk)

curl http://localhost:8000/health
# {"status":"ok","model":"RandomForestClassifier","version":"1.0.0"}

Used by Docker to decide when the container is ready to accept traffic.
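The /predict handler applies two independent sets of thresholds: 0.5 for the boolean prediction, and 0.7 / 0.4 for the risk tier. They can disagree in the middle band, which is easier to see when the logic is pulled out into a pure function. The sketch below mirrors the thresholds from the handler. The function name `classify` and its parameters are illustrative.

```python
def classify(proba, decision_threshold=0.5, high=0.7, medium=0.4):
    """Return (churn_prediction, risk_level) for a churn probability."""
    if not 0.0 <= proba <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    prediction = proba >= decision_threshold
    if proba >= high:
        risk = "high"
    elif proba >= medium:
        risk = "medium"
    else:
        risk = "low"
    return prediction, risk


for p in (0.10, 0.45, 0.55, 0.85):
    print(p, classify(p))
# 0.1 (False, 'low')
# 0.45 (False, 'medium')
# 0.55 (True, 'medium')
# 0.85 (True, 'high')
```

A customer at 0.45 is predicted to stay but is still flagged as medium risk, so a consumer of the API should not treat the two fields as redundant.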
curl http://localhost:8000/metrics

Three metrics are exposed:
| Metric | Type | Labels |
|---|---|---|
| churnguard_requests_total | Counter | endpoint, status |
| churnguard_request_latency_seconds | Histogram | endpoint |
| churnguard_churn_predicted_total | Counter | — |
src/streamlit_app.py provides a no-code interface for exploring predictions.
streamlit run src/streamlit_app.py
# http://localhost:8501

The sidebar collects all 19 customer features through dropdowns, sliders, and number inputs. When you click Predict churn risk, the app:
- Encodes the UI inputs into the same integer format the API expects
- POSTs to /predict
- Displays a probability gauge, prediction card, and risk-tier recommendations
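The POST step in the list above can be sketched with the standard library alone. This is a stand-in for illustration, assuming the base URL is read from an `API_URL` environment variable with a localhost default; the Streamlit app itself may well use the `requests` package instead. `build_predict_request` is a hypothetical helper, and it only builds the request without sending it.

```python
import json
import os
import urllib.request


def build_predict_request(features, base_url=None):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    base_url = base_url or os.getenv("API_URL", "http://localhost:8000")
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical partial payload; the real request carries all 19 features.
request = build_predict_request({"gender": 1, "tenure": 12, "MonthlyCharges": 70.5})
print(request.get_method(), request.full_url)
# Sending it would look like: urllib.request.urlopen(request, timeout=5)
```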
The API URL is configured via an environment variable:
API_URL=http://api:8000 streamlit run src/streamlit_app.py # Docker internal DNS
API_URL=http://localhost:8000 streamlit run src/streamlit_app.py # local dev

The test suite in tests/test_api.py uses FastAPI's TestClient, which runs the app in-process, so no network is required.
pytest tests/ -v

18 tests across 4 classes:
| Class | What it covers |
|---|---|
| TestHealth | /health returns 200 with correct structure |
| TestPredict | Probability in [0,1], boolean prediction, valid risk level, comparative high vs. low risk |
| TestValidation | Missing fields → 422, out-of-range values → 422, type errors → 422 |
| TestMetrics | /metrics returns 200, Prometheus content type, metric names present |
Key insight: Always test the contract your API exposes, not the model internals. If you change the thresholds or the model later, these tests catch regressions at the boundary.
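To make the idea of contract testing concrete, here is a sketch of a checker for the shape of a /predict response. It uses the field names from the handler (`churn_probability`, `churn_prediction`, `risk_level`) and says nothing about which value the model should produce. The helper `contract_violations` is hypothetical and is not taken from tests/test_api.py.

```python
VALID_RISK_LEVELS = {"low", "medium", "high"}


def contract_violations(response):
    """Return a list of contract violations for a /predict response body."""
    problems = []
    proba = response.get("churn_probability")
    is_number = isinstance(proba, (int, float)) and not isinstance(proba, bool)
    if not is_number or not 0.0 <= proba <= 1.0:
        problems.append("churn_probability must be a number in [0, 1]")
    if not isinstance(response.get("churn_prediction"), bool):
        problems.append("churn_prediction must be a boolean")
    if response.get("risk_level") not in VALID_RISK_LEVELS:
        problems.append("risk_level must be low, medium, or high")
    return problems


good = {"churn_probability": 0.82, "churn_prediction": True, "risk_level": "high"}
bad = {"churn_probability": 1.5, "churn_prediction": "yes", "risk_level": "severe"}

print(contract_violations(good))       # []
print(len(contract_violations(bad)))   # 3
```

A check like this keeps passing when the model is retrained, and fails only when the response format itself changes.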
The project uses two separate images so the API and UI can be scaled and deployed independently.
FROM python:3.11-slim
# curl is needed for Docker health checks
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
COPY requirements.api.txt .
RUN pip install -r requirements.api.txt
COPY src/ src/
# pre-trained artifacts baked into the image
COPY models/ models/
EXPOSE 8000
CMD ["uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "8000"]

Build and run manually:
python src/train.py # generate models/ first
docker build -f Dockerfile.api -t churnguard-api .
docker run -p 8000:8000 churnguard-api

FROM python:3.11-slim
COPY requirements.streamlit.txt .
RUN pip install -r requirements.streamlit.txt
COPY src/streamlit_app.py src/
EXPOSE 8501
CMD ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]

The Streamlit image contains no model files. It only talks to the API over HTTP.
FastAPI /metrics ──► Prometheus (scrapes every 15s) ──► Grafana (visualises)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: churnguard-api
    static_configs:
      - targets: ["api:8000"]
    metrics_path: /metrics

api resolves via Docker's internal DNS when running under Compose.
- Open http://localhost:3000 (admin / churnguard)
- Add data source → Prometheus → URL: http://prometheus:9090
- Create a dashboard with these PromQL queries:
# Request rate (requests per second)
rate(churnguard_requests_total[1m])
# 95th percentile latency
histogram_quantile(0.95, rate(churnguard_request_latency_seconds_bucket[5m]))
# Churn prediction rate
rate(churnguard_churn_predicted_total[5m])
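The 95th percentile query is an estimate, not an exact value: a Prometheus histogram stores only cumulative counts per bucket, and `histogram_quantile` interpolates linearly inside the bucket that contains the target rank. The following is a simplified sketch of that calculation, ignoring details such as label handling and rate windows. The bucket boundaries and counts are invented for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented latency buckets in seconds: (upper bound, cumulative request count).
latency_buckets = [(0.05, 60), (0.1, 90), (0.25, 98), (0.5, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, latency_buckets)
print(round(p95, 3))  # roughly 0.194
```

The precision of the result depends on how finely the buckets are spaced around the latencies of interest.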
docker-compose.yml wires up all four services:
api (port 8000) ← FastAPI + trained model
streamlit (port 8501) ← UI, depends on api being healthy
prometheus (port 9090) ← scrapes api/metrics every 15 s
grafana (port 3000) ← reads from prometheus
python src/train.py # build model artifacts once (needed by the api image)
docker compose up --build

Docker Compose respects the depends_on + condition: service_healthy chain:
prometheus ──► api ◄── streamlit
grafana ──► prometheus
The api service has a health check:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3

Streamlit will not start until the API passes its health check, preventing connection errors on startup.
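The dependency chain determines a startup order, and that order can be worked out with a topological sort. The sketch below models the four services and their dependencies as described above, using Python's standard `graphlib` module. It illustrates the ordering only and does not reflect how Compose is implemented.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on.
depends_on = {
    "api": set(),
    "streamlit": {"api"},
    "prometheus": {"api"},
    "grafana": {"prometheus"},
}

startup_order = list(TopologicalSorter(depends_on).static_order())
print(startup_order)  # api comes first, grafana comes after prometheus
```

The relative order of streamlit and prometheus is not fixed, because neither depends on the other.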
| Service | URL | Credentials |
|---|---|---|
| API docs | http://localhost:8000/docs | — |
| Streamlit UI | http://localhost:8501 | — |
| Prometheus | http://localhost:9090 | — |
| Grafana | http://localhost:3000 | admin / churnguard |
docker compose down # stop containers, keep volumes
docker compose down -v # stop containers and remove volumes (wipes Grafana state)

.github/workflows/ci.yml runs on every push and pull request to main.
push to main
│
├── lint (parallel)
│ ruff check src/ tests/
│ ruff format --check src/ tests/
│
├── test (parallel)
│ pip install dependencies
│ python src/train.py ← train model so artifacts exist for tests
│ pytest tests/ -v
│
└── build-and-push (only on main, requires test to pass)
python src/train.py ← bake fresh artifacts into the image
docker build Dockerfile.api ──► ghcr.io/…/churnguard-api:latest
docker build Dockerfile.streamlit ──► ghcr.io/…/churnguard-streamlit:latest
The models/ directory is excluded from git (it contains large binary files). The CI pipeline trains a fresh model so that:
- Tests always run against a real, loadable model
- The published Docker images contain up-to-date artifacts without committing binaries
Each published image gets two tags:
| Tag | Example | Purpose |
|---|---|---|
| sha-<commit> | sha-062da9e | Pinned, immutable reference |
| latest | latest | Convenience tag for the most recent build |
Always deploy using the SHA tag in production — latest can change under you.
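The tagging scheme is simple enough to express as a function. The sketch below assumes the SHA tag uses the first seven characters of the commit hash, which matches the `sha-062da9e` example but is otherwise an assumption. `image_tags` is a hypothetical helper, and the full commit hash is made up apart from that prefix.

```python
def image_tags(image, commit_sha, short_len=7):
    """Return the immutable SHA tag and the moving latest tag for an image."""
    if len(commit_sha) < short_len:
        raise ValueError("commit SHA is shorter than the requested prefix")
    return [f"{image}:sha-{commit_sha[:short_len]}", f"{image}:latest"]


tags = image_tags("churnguard-api", "062da9e0c1f4b7a9d3e2")
print(tags)  # ['churnguard-api:sha-062da9e', 'churnguard-api:latest']
```

A deployment manifest would reference the first tag, so that a rollback means pointing back at an earlier SHA.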
ruff.toml configures three rule sets:
select = ["E", "F", "I"] # pycodestyle errors, pyflakes, isort
ignore = ["E501"] # line length handled separately (100-char limit)

The lint job enforces both correctness (ruff check) and formatting (ruff format --check), so style debates never reach code review.
| Service | Port | Technology | Role |
|---|---|---|---|
| FastAPI | 8000 | FastAPI + Uvicorn | Predictions, health, metrics |
| Streamlit | 8501 | Streamlit | Web UI |
| Prometheus | 9090 | Prometheus | Metrics collection |
| Grafana | 3000 | Grafana | Monitoring dashboards |
| MLflow UI | 5000 | MLflow | Experiment tracking (local only) |
# 1. Install dependencies
pip install -r requirements.api.txt
pip install pytest httpx mlflow streamlit plotly requests
# 2. Train the model
python src/train.py
# 3a. Run locally (4 separate terminals)
uvicorn src.api:app --reload --port 8000
streamlit run src/streamlit_app.py
mlflow ui # optional — view experiment history
pytest tests/ -v # verify everything works
# 3b. Run with Docker Compose (all services, one command)
docker compose up --build
# 4. Check code quality
pip install ruff
ruff check src/ tests/
ruff format --check src/ tests/