# 05 – Automated CommsCom Churn Training (Pipeline-Ready)

In this module you will:

1. Use **OpenShift AI Data Science Pipelines** to run an automated training job
   for the CommsCom churn model.
2. Upload the provided pipeline definition
   `pipelines/commscom_churn_pipeline.yaml` into your OpenShift AI project.
3. Create and run a pipeline **Run**.
4. Inspect the **logs** and **output artifacts** (model + metadata) in the
   Pipelines UI.

> You do **not** need to build images or write YAML yourself in this module.
> Everything is prebuilt; you will just use the OpenShift AI UI and this repo.


In [None]:
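from pathlib import Path

# Walk up from the current directory until we reach the repo root ("MLforEng").
project_root = Path.cwd()
while project_root.name != "MLforEng" and project_root != project_root.parent:
    project_root = project_root.parent

# If the loop stopped at the filesystem root, the repo folder was never found.
if project_root.name != "MLforEng":
    raise FileNotFoundError("Could not find an 'MLforEng' folder above the current directory.")

print("Project root:", project_root)

# Check that the pipeline definition used in Step 1 is present.
pipeline_yaml = project_root / "pipelines" / "commscom_churn_pipeline.yaml"
pipeline_yaml, pipeline_yaml.exists()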


## Step 1 – Upload the pipeline YAML into OpenShift AI

1. In a new browser tab, open the **OpenShift AI** web console.
2. Go to **Data Science Projects** and select the project prepared for this lab
   (for example: `commscom-ml`).
3. In the left panel, click **Pipelines**.
4. Click **Import pipeline** (or **Create pipeline** → **Upload a file**).
5. In the dialog:
   - Click **Browse** / **Choose file**.
   - Navigate to your workbench home → `MLforEng/pipelines/`.
   - Select `commscom_churn_pipeline.yaml`.
6. Give the pipeline a name, for example:

   - **Name**: `CommsCom Churn Training Pipeline`

7. Click **Create** / **Import**.

If the import is successful, you should see your pipeline listed under the name
you entered (for example, `CommsCom Churn Training Pipeline`) with a small graph
icon. When you click into it, you will see **one step** called
`train-commscom-churn` (or similar).
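
The UI import above can also be scripted. The following is a minimal sketch and not part of the lab: it assumes the `kfp` SDK is installed in your workbench, and that you know the route of your project's pipeline server and have a bearer token for it. The helper names `make_client` and `upload_pipeline` are hypothetical.

```python
from pathlib import Path


def make_client(host: str, token: str):
    """Build a Kubeflow Pipelines client (assumes the `kfp` package is installed)."""
    import kfp  # imported lazily so upload_pipeline() can be used without kfp

    return kfp.Client(host=host, existing_token=token)


def upload_pipeline(client, yaml_path: Path, name: str):
    """Upload a pipeline YAML; `client` is expected to be a kfp.Client."""
    if not yaml_path.is_file():
        raise FileNotFoundError(f"Pipeline definition not found: {yaml_path}")
    return client.upload_pipeline(
        pipeline_package_path=str(yaml_path),
        pipeline_name=name,
    )


# Example usage (placeholders, not real values):
# client = make_client("https://<pipeline-server-route>", "<token>")
# upload_pipeline(
#     client,
#     Path("pipelines/commscom_churn_pipeline.yaml"),
#     "CommsCom Churn Training Pipeline",
# )
```

Passing the client in as an argument keeps the upload helper easy to try with a stand-in object before pointing it at a real pipeline server.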


## Step 2 – Create and run a Pipeline Run

1. In the Pipelines page, click on your newly imported pipeline
   (e.g. `CommsCom Churn Training Pipeline`).
2. Click **Create run** (or **Start a new run**).
3. In the run form:
   - **Run name**: e.g. `commscom-churn-run-<your-initials>`.
   - Under **Parameters**, leave the defaults:
     - `model_family` = `rf`
     - `test_size` = `0.2`
4. Click **Start** / **Create run**.

You will be taken to the **Run details** page.

- You should see a DAG with a single node / step.
- The status will transition: `Pending` → `Running` → `Succeeded`
  (if everything is configured correctly).

> If the run fails (for example, with an image pull error such as
> `ImagePullBackOff`), check with the instructor. This usually means the
> training image is not accessible, or the project does not have a Pipelines
> server / storage connection configured.
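
For reference, the run form above maps onto a small set of arguments. The following is a minimal sketch of submitting the same run from code, assuming a `kfp.Client`-like object is passed in and that `test_size` is accepted as a float (the pipeline definition may declare it differently). The helper names `build_run_name` and `start_run` are hypothetical.

```python
# Default parameters shown in the run form above.
DEFAULT_ARGUMENTS = {"model_family": "rf", "test_size": 0.2}


def build_run_name(initials: str) -> str:
    """Return a run name such as 'commscom-churn-run-ab'."""
    return f"commscom-churn-run-{initials.strip().lower()}"


def start_run(client, pipeline_file: str, initials: str, **overrides):
    """Submit a run; `client` is expected to be a kfp.Client."""
    unknown = set(overrides) - set(DEFAULT_ARGUMENTS)
    if unknown:
        raise ValueError(f"Unknown pipeline parameters: {sorted(unknown)}")

    arguments = {**DEFAULT_ARGUMENTS, **overrides}
    # A test split only makes sense as a fraction strictly between 0 and 1.
    if not 0.0 < float(arguments["test_size"]) < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")

    return client.create_run_from_pipeline_package(
        pipeline_file,
        arguments=arguments,
        run_name=build_run_name(initials),
    )
```

Rejecting unknown parameter names before submitting catches typos locally instead of after the run has been queued.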


## Step 3 – Inspect the training logs

1. In the **Run details** view, click on the step
   (e.g. `train-commscom-churn`).
2. Open the **Logs** tab.

You should see output similar to what you saw when running training locally:

- `=== CommsCom churn pipeline training step ===`
- `Model family: rf`
- A **classification report** for the test set.
- `ROC–AUC: ...`
- `Saved model to /tmp/output/model.joblib`
- `Saved meta to /tmp/output/meta.json`
- `=== Training step complete ===`

This confirms that:

- The pipeline successfully launched the prebuilt
  `quay.io/.../mlforeng-churn-train` image.
- The training script `mlforeng.pipeline.train_churn_step` ran inside the
  container.
- The model artifacts were written to `/tmp/output` inside the container.


## Step 4 – Inspect the output artifacts (model + metadata)

1. Still on the step details view, switch to the **Artifacts** tab.
2. You should see an artifact named something like `model_dir`.
3. Click on `model_dir`.

You will see the files that were written by the training container and copied
into the pipeline output:

- `model.joblib`
- `meta.json`

These files are stored in your project’s **object storage** (MinIO/S3), as set
up in the Data Science Pipelines server configuration. The Pipelines UI only
gives you a view into that location.
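
If you have credentials for that object storage, you can look at the same files directly. The following is a minimal sketch, assuming an S3-compatible client such as `boto3.client("s3")`; the bucket name, key prefix, and endpoint are placeholders that depend on how your pipeline server was configured, and the helper name `list_artifact_files` is hypothetical.

```python
def list_artifact_files(s3_client, bucket: str, prefix: str) -> dict:
    """Return {object key: size in bytes} for objects stored under `prefix`.

    `s3_client` is expected to behave like boto3.client("s3"). A single
    list_objects_v2 call returns at most 1000 keys, which is enough for
    the handful of files produced by one run.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches the prefix.
    return {obj["Key"]: obj["Size"] for obj in response.get("Contents", [])}


# Example usage (placeholders, not real values):
# import boto3
# s3 = boto3.client(
#     "s3",
#     endpoint_url="https://<object-storage-route>",
#     aws_access_key_id="<access-key>",
#     aws_secret_access_key="<secret-key>",
# )
# list_artifact_files(s3, "<bucket>", "<run-prefix>/")
```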

In a production setup, a second pipeline step or a separate CI/CD job might:

- retrieve this artifact,
- register it in a model registry,
- or update a serving deployment configuration.

For this workshop, the goal is to understand:

- how custom training logic can be packaged as a container,
- how it is orchestrated by OpenShift AI Pipelines,
- and how model artifacts are captured and versioned centrally.


## Optional – Use the pipeline-trained model in local serving

For the purposes of this workshop, the automated training pipeline and the
local serving examples are **separate tracks**:

- The pipeline writes `model.joblib` + `meta.json` to S3/MinIO.
- The FastAPI serving examples expect models under
  `artifacts/pretrained/<model_name>/` inside the repo.
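
Bridging the two tracks comes down to copying the two files into that folder layout. The following is a minimal sketch, assuming both files have already been downloaded from the Pipelines UI into a local folder; the helper name `install_pipeline_model` is hypothetical and not part of the repo.

```python
import json
import shutil
from pathlib import Path

REQUIRED_FILES = ("model.joblib", "meta.json")


def install_pipeline_model(download_dir: Path, repo_root: Path, model_name: str) -> Path:
    """Copy downloaded pipeline outputs into artifacts/pretrained/<model_name>/."""
    missing = [name for name in REQUIRED_FILES if not (download_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {download_dir}: {missing}")

    # Fail early if meta.json is not valid JSON (for example, a truncated download).
    json.loads((download_dir / "meta.json").read_text(encoding="utf-8"))

    target = repo_root / "artifacts" / "pretrained" / model_name
    target.mkdir(parents=True, exist_ok=True)
    for name in REQUIRED_FILES:
        shutil.copy2(download_dir / name, target / name)
    return target


# Example usage (the download folder is a placeholder):
# install_pipeline_model(Path("<download-folder>"), Path("."), "commscom_rf_pipeline")
```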

If you want to demonstrate using a pipeline-trained model with the local
FastAPI server, you can:

1. Download `model.joblib` + `meta.json` from the `model_dir` artifact in the
   Pipelines UI.
2. Place them under a new folder, for example:

   ```text
   artifacts/pretrained/commscom_rf_pipeline/
     model.joblib
     meta.json
