# Module 03 – CommsCom Churn: Training & Serving

In this module you will:

1. Use the MLforEng CLI to train a **CommsCom churn model** on real data.
2. Inspect the saved model artifacts in `artifacts/pretrained/`.
3. Load the trained model in a notebook and run batch predictions.
4. (Optional) Call the **FastAPI churn endpoint** `/predict_churn`
   using real CommsCom customer records.

> You can run this module even if you did **not** do Modules 01–02.
> All required assets will be created on-the-fly if missing.


In [1]:
from pathlib import Path
import os
import sys

# Start from current directory and walk upwards until we find "mlforeng"
here = Path.cwd()
project_root = None
for p in [here, *here.parents]:
    if (p / "mlforeng").exists():
        project_root = p
        break

if project_root is None:
    raise RuntimeError(
        f"Could not locate 'mlforeng' package by walking up from {here}. "
        "Make sure this notebook is somewhere inside your MLforEng repo."
    )

print("Project root:", project_root)

if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

os.chdir(project_root)
print("CWD:", os.getcwd())

from mlforeng.data_churn import train_test_churn, load_churn_raw
from mlforeng.predict import load_trained_model, predict_dataframe


Project root: /Users/vgrover/Downloads/software/AIWorkshops/MLforEng
CWD: /Users/vgrover/Downloads/software/AIWorkshops/MLforEng


In [2]:
df = load_churn_raw()
df.head()


| | Customer ID | Gender | Age | Married | Number of Dependents | City | Zip Code | Latitude | Longitude | Number of Referrals | ... | Payment Method | Monthly Charge | Total Charges | Total Refunds | Total Extra Data Charges | Total Long Distance Charges | Total Revenue | Customer Status | Churn Category | Churn Reason |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0002-ORFBO | Female | 37 | Yes | 0 | Frazier Park | 93225 | 34.827662 | -118.999073 | 2 | ... | Credit Card | 65.6 | 593.3 | 0.0 | 0 | 381.51 | 974.81 | Stayed | | |
| 1 | 0003-MKNFE | Male | 46 | No | 0 | Glendale | 91206 | 34.162515 | -118.203869 | 0 | ... | Credit Card | -4.0 | 542.4 | 38.33 | 10 | 96.21 | 610.28 | Stayed | | |
| 2 | 0004-TLHLJ | Male | 50 | No | 0 | Costa Mesa | 92627 | 33.645672 | -117.922613 | 0 | ... | Bank Withdrawal | 73.9 | 280.85 | 0.0 | 0 | 134.6 | 415.45 | Churned | Competitor | Competitor had better devices |
| 3 | 0011-IGKFF | Male | 78 | Yes | 0 | Martinez | 94553 | 38.014457 | -122.115432 | 1 | ... | Bank Withdrawal | 98.0 | 1237.85 | 0.0 | 0 | 361.66 | 1599.51 | Churned | Dissatisfaction | Product dissatisfaction |
| 4 | 0013-EXCHZ | Female | 75 | Yes | 0 | Camarillo | 93010 | 34.227846 | -119.079903 | 3 | ... | Credit Card | 83.9 | 267.4 | 0.0 | 0 | 22.14 | 289.54 | Churned | Dissatisfaction | Network reliability |


In [3]:
df["Customer Status"].value_counts()


Customer Status
Stayed     4720
Churned    1869
Joined      454
Name: count, dtype: int64
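
The status column has three values, while a churn classifier needs a binary target. The sketch below shows one plausible labeling rule: treat `Churned` as the positive class, `Stayed` as the negative class, and leave out `Joined` customers, who are too new to have an outcome. This rule is an assumption for illustration; the source does not show how `train_test_churn` builds its target, and `churn_label` is a hypothetical helper.

```python
from collections import Counter

# Status counts copied from the value_counts() output above.
status_counts = {"Stayed": 4720, "Churned": 1869, "Joined": 454}

def churn_label(status):
    """Hypothetical rule: 1 = churned, 0 = stayed, None = excluded."""
    if status == "Churned":
        return 1
    if status == "Stayed":
        return 0
    return None  # "Joined": no outcome observed yet (assumed exclusion)

label_counts = Counter()
for status, n in status_counts.items():
    label = churn_label(status)
    if label is not None:
        label_counts[label] += n

n_labeled = sum(label_counts.values())
churn_rate = label_counts[1] / n_labeled
print(dict(label_counts), n_labeled, round(churn_rate, 3))
```

Under this rule roughly 28% of labeled customers churned, so a classifier that always predicts "stayed" would already reach about 72% accuracy. That is the baseline to keep in mind when reading the metrics later in the module.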

In [15]:
from pathlib import Path
import json
import subprocess

model_name = "commscom_rf_baseline"
model_dir = Path("artifacts/pretrained") / model_name
model_fp = model_dir / "model.joblib"
meta_fp = model_dir / "meta.json"

force_retrain = False  # set True if you upgrade sklearn or change the pipeline

if model_fp.exists() and not force_retrain:
    print(f"✅ Using existing model at {model_fp}")
else:
    if model_fp.exists():
        print(f"⚠️ Forcing retrain, deleting old model at {model_fp}")
        model_fp.unlink()  # actually remove the stale artifact before retraining
    else:
        print(f"⚠️ Model {model_name} not found. Training via CLI...")

    model_dir.mkdir(parents=True, exist_ok=True)

    cmd = [
        sys.executable,  # same interpreter as the notebook kernel (sys is imported in the first cell)
        "-m", "mlforeng.cli.train",
        "--dataset", "commscom_churn",
        "--model-family", "rf",
        "--test-size", "0.2",
        "--save-model-name", model_name,
    ]
    print("Running:", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print("CLI stdout:\n", result.stdout)
    if result.returncode != 0:
        print("CLI stderr:\n", result.stderr)
        raise RuntimeError("Training failed")

if meta_fp.exists():
    print("✅ Found meta.json")
    print(meta_fp.read_text())
else:
    print("⚠️ meta.json not found (training may not have completed correctly)")


✅ Using existing model at artifacts/pretrained/commscom_rf_baseline/model.joblib
✅ Found meta.json
{
  "config": {
    "dataset": "commscom_churn",
    "model_name": "rf",
    "n_samples": 1000,
    "n_features": 20,
    "test_size": 0.2,
    "save_model_name": "commscom_rf_baseline"
  },
  "metrics": {
    "accuracy": 0.858877086494689,
    "roc_auc": 0.9127333907368802
  },
  "extra": {
    "dataset": "commscom_churn",
    "n_train_rows": 5271,
    "n_features": 34,
    "num_cols": [
      "Age",
      "Number of Dependents",
      "Zip Code",
      "Latitude",
      "Longitude",
      "Number of Referrals",
      "Tenure in Months",
      "Avg Monthly Long Distance Charges",
      "Avg Monthly GB Download",
      "Monthly Charge",
      "Total Charges",
      "Total Refunds",
      "Total Extra Data Charges",
      "Total Long Distance Charges",
      "Total Revenue"
    ],
    "cat_cols": [
      "Gender",
      "Married",
      "City",
      "Offer",
      "Phone Service",
      "

In [16]:
loaded = load_trained_model(model_name)
loaded.path, loaded.dataset


(PosixPath('/Users/vgrover/Downloads/software/AIWorkshops/MLforEng/artifacts/pretrained/commscom_rf_baseline'),
 'commscom_churn')
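
The metadata printed above contains overlapping fields in `config` and `extra`, and they do not fully agree: `config.n_features` is 20 while `extra.n_features` is 34. A small check like the following can surface such mismatches when inspecting an artifact. `conflicting_keys` is a hypothetical helper, and the JSON literal is an excerpt of the printed output rather than the full file.

```python
import json

# Excerpt of the meta.json content printed above (other keys omitted).
meta_text = """
{
  "config": {"dataset": "commscom_churn", "n_samples": 1000, "n_features": 20},
  "extra": {"dataset": "commscom_churn", "n_train_rows": 5271, "n_features": 34}
}
"""
meta = json.loads(meta_text)

def conflicting_keys(meta, left="config", right="extra"):
    """Return keys present in both sections whose values differ."""
    shared = meta[left].keys() & meta[right].keys()
    return {
        key: (meta[left][key], meta[right][key])
        for key in sorted(shared)
        if meta[left][key] != meta[right][key]
    }

print(conflicting_keys(meta))
```

The source does not say why the two values differ. The 34 in `extra` is the one that agrees with the 34-column feature frame used for evaluation in this module.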

In [17]:
loaded.meta


{'config': {'dataset': 'commscom_churn',
  'model_name': 'rf',
  'n_samples': 1000,
  'n_features': 20,
  'test_size': 0.2,
  'save_model_name': 'commscom_rf_baseline'},
 'metrics': {'accuracy': 0.858877086494689, 'roc_auc': 0.9127333907368802},
 'extra': {'dataset': 'commscom_churn',
  'n_train_rows': 5271,
  'n_features': 34,
  'num_cols': ['Age',
   'Number of Dependents',
   'Zip Code',
   'Latitude',
   'Longitude',
   'Number of Referrals',
   'Tenure in Months',
   'Avg Monthly Long Distance Charges',
   'Avg Monthly GB Download',
   'Monthly Charge',
   'Total Charges',
   'Total Refunds',
   'Total Extra Data Charges',
   'Total Long Distance Charges',
   'Total Revenue'],
  'cat_cols': ['Gender',
   'Married',
   'City',
   'Offer',
   'Phone Service',
   'Multiple Lines',
   'Internet Service',
   'Internet Type',
   'Online Security',
   'Online Backup',
   'Device Protection Plan',
   'Premium Tech Support',
   'Streaming TV',
   'Streaming Movies',
   'Streaming Music',
 

In [18]:
splits = train_test_churn(test_size=0.2, stratify=True)
X_train, X_test = splits.X_train, splits.X_test
y_train, y_test = splits.y_train, splits.y_test

X_test.shape, y_test.shape


((1318, 34), (1318,))
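
The test-set shape can be cross-checked against the status counts shown earlier. The arithmetic below assumes that `Joined` customers are excluded before splitting and that the test size is rounded up, as scikit-learn's `train_test_split` does for fractional sizes. Both are inferences from the printed numbers, not confirmed behavior of `train_test_churn`.

```python
import math

stayed, churned, joined = 4720, 1869, 454   # from the value_counts() output
test_size = 0.2

n_labeled = stayed + churned                 # assumes "Joined" rows are dropped
n_test = math.ceil(test_size * n_labeled)    # fractional test sizes round up
n_train = n_labeled - n_test

print(n_labeled, n_train, n_test)
```

The result is 5271 training rows and 1318 test rows, which matches `n_train_rows` in `meta.json` and the `X_test` shape above, so the numbers are consistent with the exclusion assumption.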

In [19]:
from sklearn.metrics import classification_report, roc_auc_score

y_pred = predict_dataframe(loaded, X_test)

print("=== Loaded RF model – Classification report ===")
print(classification_report(y_test, y_pred, digits=3))

if hasattr(loaded.model, "predict_proba"):
    y_proba = loaded.model.predict_proba(X_test)[:, 1]
    roc_auc = roc_auc_score(y_test, y_proba)
    print("ROC–AUC:", roc_auc)
else:
    print("Model has no predict_proba; skipping ROC–AUC.")


=== Loaded RF model – Classification report ===
              precision    recall  f1-score   support

           0      0.858     0.962     0.907       944
           1      0.862     0.599     0.707       374

    accuracy                          0.859      1318
   macro avg      0.860     0.780     0.807      1318
weighted avg      0.859     0.859     0.850      1318

ROC–AUC: 0.9127333907368802
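
The report's rounded rates can be turned back into counts, which makes the error profile easier to see. The counts below are a reconstruction from the printed support, recall, and accuracy values, not an output of the notebook; they are the only integers consistent with those figures. Reading class 1 as "churned" is inferred from the class supports.

```python
# Class supports from the classification report (0 = stayed, 1 = churned).
n_stayed, n_churned = 944, 374

# Integer counts reconstructed from the rounded report values.
tp = 224                 # churners correctly flagged
fn = n_churned - tp      # churners the model missed
tn = 908                 # stayers correctly passed
fp = n_stayed - tn       # stayers wrongly flagged

recall_churn = tp / (tp + fn)
precision_churn = tp / (tp + fp)
recall_stay = tn / (tn + fp)
accuracy = (tp + tn) / (n_stayed + n_churned)

print(f"missed churners: {fn}, false alarms: {fp}")
print(round(precision_churn, 3), round(recall_churn, 3),
      round(recall_stay, 3), round(accuracy, 3))
```

With these predictions the model misses 150 of 374 churners while raising only 36 false alarms. If a missed churner costs more than an unnecessary retention offer, a lower decision threshold on the `predict_proba` scores may be worth exploring.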


## Serving the CommsCom churn model via FastAPI

To expose this model as an HTTP API, start the server in **a separate terminal**
(or another OpenShift AI workbench):

```bash
cd /path/to/MLforEng
source .venv/bin/activate   # if using virtualenv
python3 -m mlforeng.cli.serve --model-name commscom_rf_baseline
```


In [20]:
import numpy as np

# Reuse the X_test from above (must already be defined)
example_records = (
    X_test.head(3)        # take 3 customers
    .replace({np.nan: None})
    .to_dict(orient="records")
)

example_records


[{'Gender': 'Male',
  'Age': 33,
  'Married': 'Yes',
  'Number of Dependents': 2,
  'City': 'Crescent Mills',
  'Zip Code': 95934,
  'Latitude': 40.080342,
  'Longitude': -120.957805,
  'Number of Referrals': 4,
  'Tenure in Months': 13,
  'Offer': None,
  'Phone Service': 'Yes',
  'Avg Monthly Long Distance Charges': 13.77,
  'Multiple Lines': 'Yes',
  'Internet Service': 'Yes',
  'Internet Type': 'DSL',
  'Avg Monthly GB Download': 23.0,
  'Online Security': 'No',
  'Online Backup': 'Yes',
  'Device Protection Plan': 'Yes',
  'Premium Tech Support': 'Yes',
  'Streaming TV': 'Yes',
  'Streaming Movies': 'No',
  'Streaming Music': 'No',
  'Unlimited Data': 'Yes',
  'Contract': 'Month-to-Month',
  'Paperless Billing': 'Yes',
  'Payment Method': 'Bank Withdrawal',
  'Monthly Charge': 72.8,
  'Total Charges': 930.05,
  'Total Refunds': 0.0,
  'Total Extra Data Charges': 0,
  'Total Long Distance Charges': 179.01,
  'Total Revenue': 1109.06},
 {'Gender': 'Female',
  'Age': 73,
  'Married':
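
The `.replace({np.nan: None})` step matters because the records are sent as JSON, and JSON has no representation for NaN. The standard-library sketch below shows the difference on a hypothetical two-field record: Python's `json` module emits a non-standard `NaN` token by default and raises `ValueError` in strict mode, while `None` serializes to a valid `null`.

```python
import json
import math

record = {"Offer": float("nan"), "Age": 33}   # hypothetical record with a missing value

# Default mode emits the bare token NaN, which is not valid JSON.
print(json.dumps(record))

# Strict mode refuses to serialize NaN at all.
try:
    json.dumps(record, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# Replacing NaN with None produces standard JSON null.
cleaned = {
    key: None if isinstance(value, float) and math.isnan(value) else value
    for key, value in record.items()
}
cleaned_json = json.dumps(cleaned, allow_nan=False)
print(strict_ok, cleaned_json)
```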

In [21]:
import requests

BASE_URL = "http://127.0.0.1:8000"  # adjust if using a route on OpenShift AI


In [22]:
# Health check: model_name should be 'commscom_rf_baseline' before calling /predict_churn
resp = requests.get(f"{BASE_URL}/health")
print(resp.status_code, resp.json())


200 {'status': 'ok', 'model_name': 'cli_logreg_test', 'dataset': None}


In [23]:
# Churn predictions
payload = {"records": example_records}
resp = requests.post(f"{BASE_URL}/predict_churn", json=payload)
resp.status_code, resp.json()


(400,
 {'detail': "/predict_churn endpoint requires a model trained on 'commscom_churn' dataset, but current model dataset is 'None'."})
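
The 400 response is consistent with the `/health` output: the server is serving `cli_logreg_test` with no dataset recorded, not `commscom_rf_baseline`. A client-side guard along these lines can catch that before any prediction request is sent. `serving_problems` is a hypothetical helper; it assumes only the three `/health` fields shown above, and that a correctly started server reports `commscom_churn` as its dataset.

```python
def serving_problems(health, expected_model, expected_dataset):
    """Compare a /health payload with what the notebook expects."""
    problems = []
    if health.get("status") != "ok":
        problems.append(f"status is {health.get('status')!r}, expected 'ok'")
    if health.get("model_name") != expected_model:
        problems.append(
            f"serving {health.get('model_name')!r}, expected {expected_model!r}"
        )
    if health.get("dataset") != expected_dataset:
        problems.append(
            f"dataset is {health.get('dataset')!r}, expected {expected_dataset!r}"
        )
    return problems

# Payload copied from the health check output above.
health = {"status": "ok", "model_name": "cli_logreg_test", "dataset": None}

problems = serving_problems(health, "commscom_rf_baseline", "commscom_churn")
for problem in problems:
    print("Problem:", problem)
```

If the list is non-empty, restart the server with `--model-name commscom_rf_baseline` and re-run the health check before posting records.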

## Summary

In this module, we:

- Used the **MLforEng CLI** to train a CommsCom churn model on real data.
- Saved the model and metadata to `artifacts/pretrained/commscom_rf_baseline/`.
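- Loaded the saved artifact in a notebook and evaluated it on a regenerated 20% test split, reproducing the accuracy and ROC AUC recorded in `meta.json`.
- Started the **FastAPI** server for that model.
- Called `/predict_churn` with real CommsCom customer records. The endpoint returns churn predictions only when the server is serving a model trained on `commscom_churn`; otherwise it responds with HTTP 400.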

You can run this module:
- locally on macOS (Jupyter + venv + Docker/Podman),
- or inside an **OpenShift AI workbench**, with almost no changes.

In later modules, we'll:
- wire this training into **OpenShift AI Pipelines** for automated retraining,
- and integrate with a **fine-tuned Llama 3 + LoRA** service for churn explanations and retention messages.

