
# Poverty Classifier System Demo 

Learning Team 1 (MSDS 2025 PT-B): 
- Asiado, Jian 
- Dorado, Joshua
- Fajardo, Jethro

### **About**

This notebook demonstrates how to interact with the **Poverty Classifier** and documents its major components end‑to‑end.

This classifier was trained on the Malawi Integrated Household Living Conditions Survey (IHS) 2010–2011, conducted by the Malawi National Statistical Office (NSO).

Source: https://microdata.worldbank.org/index.php/catalog/3016/study-description

**Components overview**
- **Airflow** orchestrates five main pipelines:
    1. Data Download and Extraction
    2. Data Preprocessing
    3. Training
    4. Model Evaluation
    5. Drift Detection

- **MLflow** tracks experiments (params, metrics, artifacts) and manages the **Model Registry**.
  Experiment tracking compares the following ML models side by side:
    - KNN
    - Logistic Regression
    - Random Forest
    - XGBoost

- **FastAPI** exposes the **`/predict`** and **`/model`** endpoints for online inference and model introspection.
- **Evidently AI** generates **data drift** and **performance monitoring** reports; the reports are logged as **MLflow artifacts**.



## 1) Setup Instructions

> Run everything with Docker. Adjust ports as needed in your `docker-compose.yml`.

**Start the stack**
```bash
docker compose up --build
```

For subsequent runs:
```bash
docker compose up -d
```

**Default URLs**
- Airflow UI: <http://localhost:8080>
- MLflow UI: <http://localhost:5000>
- FastAPI service (inference): <http://localhost:8000>
- (Optional) Evidently Monitoring UI (if used): <http://localhost:8501>

> Tip: Confirm services are healthy before running the cells below.
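
One way to act on this tip is a small reachability check against the default URLs listed above. The sketch below uses only the standard library; the helper names (`default_fetch`, `check_services`) are illustrative and not part of the project. It only confirms that each port answers HTTP requests, which is weaker than a true health check.

```python
import urllib.error
import urllib.request

# Default URLs from this notebook; adjust if you changed ports.
SERVICES = {
    "Airflow UI": "http://localhost:8080",
    "MLflow UI": "http://localhost:5000",
    "FastAPI": "http://localhost:8000",
}


def default_fetch(url, timeout=5):
    """Return the HTTP status code for `url`, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_services(services, fetch=default_fetch):
    """Map each service name to True if its URL answered at all."""
    return {name: fetch(url) is not None for name, url in services.items()}
```

Calling `check_services(SERVICES)` returns a dictionary such as `{"Airflow UI": True, ...}`; any `False` entry suggests checking `docker compose ps` before continuing.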


In [None]:

# (Optional) If running outside Docker, ensure these are installed.
# !pip install requests mlflow pandas ipython
# Evidently is only needed if you plan to generate reports locally:
# !pip install evidently

print("Environment ready. If you're running in Docker, dependencies should already be baked in.")



## 2) Configuration

The base URLs for all services are defined below.


In [None]:

import os

# ---- Inference / API ----
FASTAPI_BASE = os.getenv("FASTAPI_BASE", "http://localhost:8000")

# ---- MLflow ----
MLFLOW_TRACKING_URI = os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000")
MLFLOW_EXPERIMENT_NAME = os.getenv("MLFLOW_EXPERIMENT_NAME", "Default")
MLFLOW_MODEL_NAME = os.getenv("MLFLOW_MODEL_NAME", "my_model")  # name in Model Registry
MLFLOW_DRIFT_REPORT_PATH = os.getenv("MLFLOW_DRIFT_REPORT_PATH", "reports/drift_report.html")  # relative artifact path

FASTAPI_PREDICT_URL = f"{FASTAPI_BASE}/predict"
FASTAPI_MODEL_URL = f"{FASTAPI_BASE}/model"

print("FASTAPI_PREDICT_URL:", FASTAPI_PREDICT_URL)
print("FASTAPI_MODEL_URL  :", FASTAPI_MODEL_URL)
print("MLFLOW_TRACKING_URI:", MLFLOW_TRACKING_URI)
print("Experiment name     :", MLFLOW_EXPERIMENT_NAME)
print("Registry model name :", MLFLOW_MODEL_NAME)
print("Drift report relpath:", MLFLOW_DRIFT_REPORT_PATH)



## 3) Prediction Request

Simulate a client making a **POST** request to the `/predict` endpoint.
- Replace `sample_payload` with your model's expected schema.
- The server should respond with a JSON payload containing the prediction:
    - 1 if poor
    - 0 if not poor
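
To make the 1/0 labels easier to read, the response can be mapped to text. This is a minimal sketch that assumes the response shape documented for `/predict` in this notebook (a `predictions` list of objects with `hid` and `label`); the helper name `describe_predictions` is hypothetical.

```python
# Label meanings as described above: 1 = poor, 0 = not poor.
LABEL_NAMES = {1: "poor", 0: "not poor"}


def describe_predictions(response_json):
    """Return (hid, readable label) pairs from a /predict response body."""
    rows = []
    for item in response_json.get("predictions", []):
        label = item.get("label")
        # Unexpected labels are reported rather than silently dropped.
        rows.append((item.get("hid"), LABEL_NAMES.get(label, f"unknown ({label})")))
    return rows
```

For example, `describe_predictions(resp.json())` could be called right after a successful request to print one readable line per household.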


In [None]:

import json
import requests

# Example payload — adjust fields to match your FastAPI schema
sample_payload = {
    "instances": [
        {
        "hid": "101010160009",
        "iid": 1,
        "ind_sex": 2,
        "ind_relation": 1,
        "ind_age": 31,
        "ind_educfath": 2.0,
        "ind_educmoth": 1.0,
        "ind_language": 11.0,
        "ind_religion": 3.0,
        "ind_marital": 1.0,
        "ind_readwrite": 1,
        "ind_rwchichewa": 1.0,
        "ind_rwenglish": 1.0,
        "ind_educ01": 1.0,
        "ind_educ02": 0,
        "ind_educ03": 8.0,
        "ind_educ04": 2.0,
        "ind_educ05": 0.0,
        "ind_educ06": 0,
        "ind_educ07": 0.0,
        "ind_educ08": 2.0,
        "ind_educ09": 0,
        "ind_educ10": 0,
        "ind_educ11": 0,
        "ind_educ12": 0,
        "ind_health1": 0.0,
        "ind_health2": 0,
        "ind_health3": 0.0,
        "ind_health4": 0,
        "ind_health5": 0.0,
        "ind_health6": 0,
        "ind_health7": 0,
        "ind_health8": 0,
        "ind_breakfast": 0,
        "ind_birthplace": 1.0,
        "ind_birthattend": 2.0,
        "ind_work1": 0.0,
        "ind_work2": 1.0,
        "ind_work3": 0,
        "ind_work4": 0,
        "ind_work5": 0,
        "ind_work6": 0.0,
        "wta_hh": 126.56
        }
    ]
}

try:
    resp = requests.post(FASTAPI_PREDICT_URL, json=sample_payload, timeout=10)
    resp.raise_for_status()
    print("Status:", resp.status_code)
    print("Response JSON:")
    print(json.dumps(resp.json(), indent=2))
except requests.exceptions.RequestException as e:
    print("\n[Prediction request failed]")
    print("Make sure the FastAPI service is running and the endpoint schema matches the payload.")
    print("Error:", e)
    print("Troubleshooting: check Docker logs, ports, and your service health.")


### 🔹  Response (from `/predict`)

The API returns predictions in the following JSON format:

```json
{
  "predictions": [
    {
      "hid": "101010160009",
      "label": 1
    }
  ]
}
```



## 4) Model Information Retrieval

This cell calls the `/model` endpoint to fetch details like:
- **model name/version**
- **hyperparameters**
- **training metadata** (e.g., run ID, timestamp)

Use the response to document what the production model is and how it was trained.
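
For documentation purposes, the response can be condensed into a few lines of text. The sketch below assumes hypothetical field names (`model_name`, `model_version`, `run_id`); only `hyperparameters` is taken from the cell in this section, so adjust the keys to whatever your `/model` endpoint actually returns.

```python
def summarize_model_info(info):
    """Build a short text summary from a /model response dictionary.

    The keys used here are assumptions; missing keys fall back to 'unknown'.
    """
    lines = [
        f"model: {info.get('model_name', 'unknown')} "
        f"(version {info.get('model_version', 'unknown')})",
        f"run_id: {info.get('run_id', 'unknown')}",
    ]
    hyperparameters = info.get("hyperparameters") or {}
    # Sort keys so the summary is stable between calls.
    for key in sorted(hyperparameters):
        lines.append(f"  {key}={hyperparameters[key]}")
    return "\n".join(lines)
```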


In [None]:

try:
    resp = requests.get(FASTAPI_MODEL_URL, timeout=10)
    resp.raise_for_status()
    print("Status:", resp.status_code)
    model_info = resp.json()
    print(json.dumps(model_info, indent=2))
    # Example: Explain a field if present
    if isinstance(model_info, dict) and "hyperparameters" in model_info:
        hp = model_info.get("hyperparameters", {})
        if "max_depth" in hp:
            print("\nExplanation: 'max_depth' controls the maximum depth of each decision tree.")
except requests.exceptions.RequestException as e:
    print("\n[Model info request failed]")
    print("Ensure FastAPI is reachable and the /model endpoint is implemented.")
    print("Error:", e)



## 5) Drift Detection Demonstration

In this section, we fetch the **Evidently AI drift report** logged as an artifact in MLflow and render it inline.
- We look up the run behind the latest version in the **Model Registry** (falling back to the most recent run in the experiment) and attempt to download `reports/drift_report.html`.
- If not found, we display guidance to generate/log a report in your pipeline.


In [None]:

import mlflow
from IPython.display import IFrame, display, HTML
import os

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

def _find_latest_run_id_from_experiment(experiment_name: str):
    exp = mlflow.get_experiment_by_name(experiment_name)
    if exp is None:
        return None
    runs = mlflow.search_runs(experiment_ids=[exp.experiment_id], order_by=["start_time DESC"], max_results=50)
    if runs is None or runs.empty:
        return None
    # Prefer the most recent run that finished successfully
    for _, row in runs.iterrows():
        if row.get("status") == "FINISHED":
            return row["run_id"]
    # Fallback: the most recent run regardless of status
    return runs.iloc[0]["run_id"]

def _find_latest_run_id_from_registry(model_name: str):
    try:
        client = mlflow.tracking.MlflowClient()
        # Prefer Production, then Staging, else most recent version
        for stage in ["Production", "Staging"]:
            versions = client.get_latest_versions(model_name, stages=[stage])
            if versions:
                return versions[0].run_id
        # Fallback: most recent version by last_updated_timestamp
        versions = client.search_model_versions(f"name='{model_name}'")
        if versions:
            latest = sorted(versions, key=lambda v: v.last_updated_timestamp, reverse=True)[0]
            return latest.run_id
    except Exception:
        return None
    return None

run_id = _find_latest_run_id_from_registry(MLFLOW_MODEL_NAME) or _find_latest_run_id_from_experiment(MLFLOW_EXPERIMENT_NAME)

if run_id is None:
    display(HTML("<b>No MLflow run found.</b> Make sure experiments are logged and the Model Registry has versions."))
else:
    print("Using MLflow run_id:", run_id)
    # Try to download the drift report artifact
    try:
        local_path = mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path=MLFLOW_DRIFT_REPORT_PATH)
        if os.path.exists(local_path):
            print("Drift report downloaded to:", local_path)
            # Render in an iframe
            display(IFrame(src=local_path, width="100%", height=600))
        else:
            display(HTML(f"<b>Artifact not found:</b> {MLFLOW_DRIFT_REPORT_PATH}. Check your pipeline artifact paths."))
    except Exception as e:
        display(HTML(f"<b>Failed to download artifacts:</b> {e}. Confirm MLflow URI, run permissions, and artifact path."))



## 6) Interpreting the Drift Report 

Proposed Summary of Findings:

- **Data drift**: e.g., 4/20 features show significant drift (KS-test p < 0.05).
- **Prediction drift**: Production prediction distribution shifted right vs. training.
- **Target drift**: e.g., accuracy decreased from 0.88 → 0.81 week-over-week.
- **Next steps**: Retrain with recent data; re-check calibration; update data validation rules.
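
To illustrate what a statement like "4/20 features show significant drift (KS-test p < 0.05)" means, here is a standard-library sketch of a two-sample Kolmogorov–Smirnov test applied per feature. It is not Evidently's implementation: the p-value uses a common asymptotic approximation, the function names are hypothetical, and the sample values are synthetic.

```python
import math
from bisect import bisect_right


def ks_two_sample(reference, current):
    """Return (D, approximate p-value) for a two-sample KS test."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    # D is the largest gap between the two empirical CDFs; evaluating
    # them at the observed values is sufficient for step functions.
    d = max(
        abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m)
        for x in ref + cur
    )
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def drifted_features(reference, current, alpha=0.05):
    """Names of features whose KS p-value falls below alpha."""
    return [
        name for name in reference
        if ks_two_sample(reference[name], current[name])[1] < alpha
    ]


# Synthetic data: ind_age is shifted by 15 in "current", wta_hh is unchanged.
reference = {
    "ind_age": [20 + i % 40 for i in range(200)],
    "wta_hh": [100 + i % 50 for i in range(200)],
}
current = {
    "ind_age": [35 + i % 40 for i in range(200)],
    "wta_hh": [100 + i % 50 for i in range(200)],
}
print(drifted_features(reference, current))
```

The share of drifted features is then `len(drifted_features(...)) / len(reference)`, which is the kind of ratio the summary above reports.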



## 7) Reproducibility Notes

- This notebook is designed to **Run All** from a clean kernel.
- If you changed configuration (ports, model name, or experiment), update the **Configuration** cell and **restart & run all**.
- Version pinning (example): keep `mlflow`, `fastapi`, `evidently`, and model framework versions aligned with your Docker images.
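
To compare the notebook environment against the versions in your Docker images, the installed package versions can be listed with the standard library. This is a small sketch; the helper name `installed_versions` is illustrative, and the package list simply mirrors the ones named above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["mlflow", "fastapi", "evidently"]).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```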



## 8) Troubleshooting

- **Connection errors**: Verify containers are healthy (`docker compose ps`) and ports aren’t blocked.
- **404/422 from `/predict`**: Ensure your request schema matches the FastAPI model input.
- **MLflow artifacts missing**: Confirm your training pipeline logs the Evidently HTML report (e.g., `reports/drift_report.html`).  
- **Authentication**: If your MLflow server requires auth, set the appropriate env vars/tokens.
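
For 422 responses in particular, FastAPI returns a `detail` list whose entries carry `loc`, `msg`, and `type`, which usually points straight at the offending field. The helper below is a hypothetical sketch for turning that body into readable lines; the field name in the usage example is taken from the sample payload in this notebook.

```python
def explain_validation_error(body):
    """Turn a FastAPI error body into a list of readable messages."""
    detail = body.get("detail", [])
    if isinstance(detail, str):
        # Plain HTTP errors such as 404 carry a single string.
        return [detail]
    messages = []
    for err in detail:
        # Drop the leading "body" segment so the path starts at the payload.
        loc = [str(part) for part in err.get("loc", []) if part != "body"]
        messages.append(f"{'.'.join(loc)}: {err.get('msg', '')}")
    return messages
```

For example, after a failed request you could call `explain_validation_error(e.response.json())` inside the `except` block, provided the error carries a response.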
