# Community Hospital MLOps - Complete Demonstration

End-to-end MLOps workflow covering data generation, feature store, model training, serving, and monitoring, plus an overview of CI/CD.

**Runtime:** ~5 minutes

**Prerequisites:** `pip install -r requirements.txt`

In [None]:
import sys
from pathlib import Path
project_root = Path.cwd().parent
sys.path.append(str(project_root))

import pandas as pd
import numpy as np
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("✓ Setup complete")

## 1. Generate Sample Dataset

Creates 1000 synthetic heart disease patient records with 13 clinical features. Data includes demographics, vital signs, and cardiac measurements suitable for binary classification.
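To show what "synthetic records suitable for binary classification" means in practice, here is a minimal stdlib-only sketch. It is not `HeartDiseaseDataGenerator`, whose internals this notebook does not show. The 13 feature names come from the test patient in Section 4. The value ranges, the logistic label model, and the `make_patients` helper are illustrative assumptions.

```python
import math
import random

# Feature names follow the test patient used in Section 4; the value ranges and
# the label model are illustrative assumptions, not HeartDiseaseDataGenerator's.
FEATURES = [
    "age", "sex", "chest_pain_type", "resting_bp", "cholesterol",
    "fasting_blood_sugar", "resting_ecg", "max_heart_rate", "exercise_angina",
    "st_depression", "slope", "vessels", "thalassemia",
]


def make_patients(n_patients, seed=333):
    """Return n_patients synthetic records with 13 features and a binary label."""
    rng = random.Random(seed)  # seeded so repeated runs yield identical records
    records = []
    for patient_id in range(n_patients):
        row = {
            "patient_id": patient_id,
            "age": rng.randint(29, 77),
            "sex": rng.randint(0, 1),
            "chest_pain_type": rng.randint(0, 3),
            "resting_bp": rng.randint(94, 200),
            "cholesterol": rng.randint(126, 400),
            "fasting_blood_sugar": int(rng.random() < 0.15),
            "resting_ecg": rng.randint(0, 2),
            "max_heart_rate": rng.randint(71, 202),
            "exercise_angina": int(rng.random() < 0.33),
            "st_depression": round(rng.uniform(0.0, 6.2), 1),
            "slope": rng.randint(0, 2),
            "vessels": rng.randint(0, 3),
            "thalassemia": rng.randint(0, 3),
        }
        # Hypothetical logistic risk model: age, exercise angina, ST depression
        # and affected vessels raise the log-odds; a higher max heart rate lowers them.
        logit = (
            0.05 * (row["age"] - 54)
            + 1.0 * row["exercise_angina"]
            + 0.6 * row["st_depression"]
            + 0.7 * row["vessels"]
            - 0.02 * (row["max_heart_rate"] - 150)
            - 3.5
        )
        probability = 1.0 / (1.0 + math.exp(-logit))
        row["heart_disease"] = int(rng.random() < probability)  # binary target
        records.append(row)
    return records


patients = make_patients(1000)
print(f"{len(patients)} patients, {sum(p['heart_disease'] for p in patients)} positive")
```

Drawing the label from a probability, rather than thresholding the score directly, keeps some label noise. That makes the classification task non-trivial.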

In [None]:
from src.utils.data_generator import HeartDiseaseDataGenerator

generator = HeartDiseaseDataGenerator(n_patients=1000, random_seed=333)
full_data = generator.save_datasets()

print(f"Generated {generator.n_patients} patients across {generator.n_months} months")
print("Output: src/feature_store/heart_disease_features/data/")

In [None]:
# Quick data preview
entity_df = pd.read_parquet(project_root / "src/feature_store/heart_disease_features/data/entity_data_sample/monthly_data.parquet")
print(f"Entity data: {entity_df.shape[0]} records, {entity_df.shape[1]} columns")
print(f"Columns: {entity_df.columns.tolist()}")

# Check base data
base_df = pd.read_parquet(project_root / "src/feature_store/heart_disease_features/data/base_data/patients.parquet")
print(f"\nBase data: {base_df.shape[0]} records, {base_df.shape[1]} columns")
print(f"Columns: {base_df.columns.tolist()}")

entity_df.head(3)

## 2. Feature Store Setup (FEAST)

FEAST provides consistent feature definitions for training and serving. The setup below registers the entities and feature views in the FEAST registry. Online materialization (loading feature values into the online store for real-time serving) requires additional configuration and is skipped in this demo.
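The "consistent for training" guarantee rests on point-in-time correctness. FEAST's `get_historical_features` builds training rows by joining, for each entity and event timestamp, the most recent feature value at or before that timestamp, subject to the feature view's TTL (the TTL listed for each view in the next cell). The plain-Python sketch below only illustrates that rule; it is not FEAST's implementation. `point_in_time_lookup` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, patient_id, event_time, ttl):
    """Return the newest row for patient_id with event_timestamp <= event_time.

    Rows from after event_time are never used (that would leak future data
    into training), and a row older than `ttl` counts as missing.
    """
    candidates = [
        row for row in feature_rows
        if row["patient_id"] == patient_id and row["event_timestamp"] <= event_time
    ]
    if not candidates:
        return None
    latest = max(candidates, key=lambda row: row["event_timestamp"])
    if event_time - latest["event_timestamp"] > ttl:
        return None  # stale: outside the feature view's time-to-live
    return latest


# Hypothetical monthly feature rows for one patient.
rows = [
    {"patient_id": 1, "event_timestamp": datetime(2024, 1, 1), "cholesterol": 210},
    {"patient_id": 1, "event_timestamp": datetime(2024, 3, 1), "cholesterol": 245},
]
print(point_in_time_lookup(rows, 1, datetime(2024, 2, 15), timedelta(days=60)))
# → the January row (cholesterol 210), never the March one
```

With a 30-day TTL, the same mid-February query would return `None`, because the January row is 45 days old.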

In [None]:
from feast import FeatureStore
import subprocess

repo_path = project_root / "src/feature_store/heart_disease_features"

# Apply feature definitions to FEAST registry
print("Applying feature definitions to FEAST...")
result = subprocess.run(
    ["feast", "apply"],
    cwd=str(repo_path),
    capture_output=True,
    text=True
)

if result.returncode == 0:
    print("✓ Feature definitions applied")
else:
    print(f"Error applying features: {result.stderr}")

# Initialize feature store
store = FeatureStore(repo_path=str(repo_path))

print(f"\nFeature views: {len(store.list_feature_views())}")
for fv in store.list_feature_views():
    print(f"  - {fv.name}: {len(fv.schema)} features (TTL: {fv.ttl})")

In [None]:
# Materialize features to online store for real-time serving
# NOTE: Materialization requires event_timestamp in all source files
# Skipping for demo - would be: store.materialize_incremental(end_date=datetime.now())
print("✓ Feature definitions registered (materialization skipped for demo)")

In [None]:
# Online feature retrieval
# NOTE: store.get_online_features(...) requires materialization first, so it is skipped here.
# The inference cell in Section 4 passes feature values directly as a dict instead.
print("Online feature retrieval skipped (requires materialization)")

## 3. Model Training & Promotion

XGBoost classifier with Hyperopt optimization. Includes 5-fold cross-validation, bias detection across protected attributes, and champion/challenger comparison for automated promotion decisions.
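The notebook does not show how `HeartDiseaseTrainer` decides whether a challenger replaces the champion. Here is a minimal sketch of such a gate, assuming promotion requires passing the bias check and beating the champion's ROC-AUC by a margin. `ModelCandidate`, `promotion_decision`, `min_auc_gain`, and `max_violations` are hypothetical names and thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCandidate:
    name: str
    roc_auc: float
    bias_violations: int


def promotion_decision(challenger: ModelCandidate, champion: Optional[ModelCandidate],
                       min_auc_gain=0.01, max_violations=0):
    """Return (promote, reason) for a hypothetical champion/challenger gate."""
    # Fairness is checked first so a more accurate but biased model is never promoted.
    if challenger.bias_violations > max_violations:
        return False, f"{challenger.name} failed the bias check"
    if champion is None:
        return True, "no champion registered yet"
    gain = challenger.roc_auc - champion.roc_auc
    if gain < min_auc_gain:
        return False, f"ROC-AUC gain {gain:+.4f} is below {min_auc_gain}"
    return True, f"ROC-AUC gain {gain:+.4f}"


champion = ModelCandidate("v1", roc_auc=0.85, bias_violations=0)
challenger = ModelCandidate("v2", roc_auc=0.90, bias_violations=0)
print(promotion_decision(challenger, champion))  # → (True, 'ROC-AUC gain +0.0500')
```

Requiring a minimum gain, rather than any improvement, avoids swapping models on differences that are within cross-validation noise.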

In [None]:
from src.models.heart_disease.train import HeartDiseaseTrainer

trainer = HeartDiseaseTrainer()
result = trainer.run_training_pipeline()

print(f"\n✓ Training complete")
print(f"  Run ID: {result['run_id']}")
print(f"  ROC-AUC: {result['metrics']['roc_auc']:.4f}")
print(f"  Accuracy: {result['metrics']['accuracy']:.4f}")
print(f"  F1 Score: {result['metrics']['f1']:.4f}")

In [None]:
import subprocess

def start_mlflow_ui(backend_store_uri="sqlite:///notebooks/src/mlruns/mlflow.db", port=8080):
    """
    Launch the MLflow UI in a background subprocess, equivalent to running:

        mlflow ui --port <port> --backend-store-uri <backend_store_uri>

    Pass the URI of the tracking database the training run logged to
    (for example sqlite:///model_registry/mlflow.db).
    Returns the Popen handle so the server can be stopped with .terminate().
    """
    command = [
        "mlflow", "ui",
        "--port", str(port),
        "--backend-store-uri", backend_store_uri,
    ]
    return subprocess.Popen(command)

mlflow_process = start_mlflow_ui()
print("MLflow UI starting at http://localhost:8080 (stop with mlflow_process.terminate())")

## 4. Model Inference & Serving

Real-time predictions with SHAP explainability. Each prediction includes probability score, risk categorization, and feature contribution analysis showing which clinical factors drove the decision.
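SHAP contributions are additive. The base value plus the sum of the per-feature contributions equals the model's raw output. For an XGBoost binary classifier, that raw output is typically in log-odds, so a sigmoid turns it into a probability. The sketch below assembles a result shaped like the one read in the next cell (`prediction`, `probability`, `risk_level`, `explanation.top_features`). The contribution values, the 0.5 decision threshold, the risk-band cut-offs, and `summarize_prediction` itself are hypothetical. `HeartDiseaseInference`'s actual logic is not shown here.

```python
import math


def summarize_prediction(base_value, contributions, threshold=0.5, bands=(0.3, 0.7)):
    """Turn log-odds SHAP contributions into a prediction summary.

    SHAP values are additive: base_value + sum(contributions) equals the
    model's raw (log-odds) output, so the sigmoid of that sum is the probability.
    """
    margin = base_value + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-margin))
    low, high = bands  # hypothetical risk-band cut-offs
    if probability < low:
        risk_level = "low"
    elif probability < high:
        risk_level = "medium"
    else:
        risk_level = "high"
    # Rank by magnitude so strong protective factors (negative values) also surface.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {
        "prediction": int(probability >= threshold),
        "probability": probability,
        "risk_level": risk_level,
        "explanation": {"top_features": [
            {"feature": name, "contribution": value} for name, value in ranked
        ]},
    }


# Hypothetical contributions in log-odds units.
summary = summarize_prediction(
    base_value=-0.2,
    contributions={"st_depression": 0.9, "vessels": 0.6,
                   "max_heart_rate": -0.3, "cholesterol": 0.1},
)
print(summary["risk_level"], f"{summary['probability']:.2%}")  # → high 75.03%
```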

In [None]:
from src.models.heart_disease.inference import HeartDiseaseInference

inferencer = HeartDiseaseInference(model_alias="champion")

# Test patient with high-risk profile
test_patient = {
    'age': 62,
    'sex': 1,
    'chest_pain_type': 3,
    'resting_bp': 155,
    'cholesterol': 280,
    'fasting_blood_sugar': 1,
    'resting_ecg': 1,
    'max_heart_rate': 135,
    'exercise_angina': 1,
    'st_depression': 3.2,
    'slope': 2,
    'vessels': 2,
    'thalassemia': 2
}

result = inferencer.predict(test_patient, explain=True)  # explain=True populates result['explanation']

print(f"Prediction: {result['prediction']} ({result['risk_level']} risk)")
print(f"Probability: {result['probability']:.2%}")
print(f"\nTop Contributing Features:")
for feat in result['explanation']['top_features'][:5]:
    print(f"  - {feat['feature']}: {feat['contribution']:+.4f}")

## 5. Drift & Bias Detection

Continuous monitoring using Evidently for dataset drift and custom metrics for bias. Tracks performance degradation, feature distribution changes, and fairness across demographics. Weekly automated checks trigger retraining when needed.
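Evidently selects a statistical test per column, and the two-sample Kolmogorov-Smirnov (KS) test is one of its options for numerical features. It then judges the dataset as a whole by the share of drifted columns. The stdlib sketch below mirrors that two-level idea but simplifies it: it thresholds the raw KS statistic instead of using a p-value. `dataset_drift`, `stat_threshold`, and `share_threshold` are hypothetical. The notebook does not show what `DriftDetector(drift_threshold=0.1)` actually compares.

```python
def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Step both CDFs past every sample equal to x, then compare them at x.
        while i < n and ref[i] == x:
            i += 1
        while j < m and cur[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def dataset_drift(reference, current, stat_threshold=0.1, share_threshold=0.5):
    """Flag features whose KS statistic exceeds stat_threshold, then flag the
    dataset when the drifted share reaches share_threshold (both hypothetical)."""
    drifted = [name for name in reference
               if ks_statistic(reference[name], current[name]) > stat_threshold]
    share = len(drifted) / len(reference)
    return {"drifted_features": drifted, "drift_share": share,
            "drift_detected": share >= share_threshold}


reference = {"cholesterol": list(range(150, 250)), "age": list(range(30, 80))}
current = {"cholesterol": [v + 20 for v in range(150, 250)], "age": list(range(30, 80))}
print(dataset_drift(reference, current))
```

A +20 shift over a 100-point-wide uniform range moves the CDF by 0.2, which exceeds the 0.1 cut-off. The unchanged `age` column scores 0.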

In [None]:
from src.monitoring.drift_detector import DriftDetector
from src.monitoring.bias_detector import BiasDetector

# Load reference data; the "current" batch is simulated from it below
ref_data = pd.read_parquet(project_root / "src/feature_store/heart_disease_features/data/entity_data_sample/monthly_data.parquet")

# Simulate drift for demo - shift cholesterol and resting_bp distributions upward
rng = np.random.default_rng(42)  # seeded so the simulated drift is reproducible
current_data = ref_data.copy()
current_data['cholesterol'] = (current_data['cholesterol'] + rng.normal(20, 10, len(current_data))).clip(100, 400).astype(int)
current_data['resting_bp'] = (current_data['resting_bp'] + rng.normal(10, 5, len(current_data))).clip(80, 200).astype(int)

# Detect drift
drift_detector = DriftDetector(drift_threshold=0.1)
feature_cols = [col for col in ref_data.columns if col not in ['patient_id', 'month_key']]

drift_results = drift_detector.detect_dataset_drift(
    reference_data=ref_data[feature_cols],
    current_data=current_data[feature_cols]
)

print("Drift Detection:")
print(f"  Drift Detected: {drift_results['drift_detected']}")
print(f"  Drift Score: {drift_results['drift_score']:.4f}")
print(f"  Drifted Features: {drift_results['n_drifted_features']}/{drift_results['n_features_checked']}")
if drift_results['drifted_features']:
    print(f"  Features: {drift_results['drifted_features']}")

In [None]:
# Bias monitoring
labels = pd.read_parquet(project_root / "src/feature_store/heart_disease_features/data/label_data/outcomes.parquet")
base_data = pd.read_parquet(project_root / "src/feature_store/heart_disease_features/data/base_data/patients.parquet")

# Merge entity data with base data (for age/sex) and labels
test_data = ref_data.merge(base_data[['patient_id', 'age', 'sex']], on='patient_id')
test_data = test_data.merge(labels, on=['patient_id', 'month_key']).sample(200, random_state=42)

X_test = test_data[feature_cols]
y_test = test_data['heart_disease'].values

# Batch predict
y_pred = []
y_proba = []
for _, row in X_test.iterrows():
    res = inferencer.predict(row.to_dict(), explain=False)
    y_pred.append(res['prediction'])
    y_proba.append(res['probability'])

# Create protected attributes
protected_data = pd.DataFrame({
    'sex': test_data['sex'].values,
    'age_group': pd.cut(test_data['age'], bins=[0, 40, 60, 100], labels=['young', 'middle', 'senior'])
})

bias_detector = BiasDetector(protected_attributes=['sex', 'age_group'], threshold=0.1)
bias_results = bias_detector.analyze_bias(y_test, np.array(y_pred), np.array(y_proba), protected_data)

print(f"\nBias Analysis:")
print(f"  Fair: {bias_results['overall_summary']['fair']}")
print(f"  Violations: {bias_results['overall_summary']['total_violations']}")
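The cell above reports `Fair` and `Violations` for `threshold=0.1`, but the metrics inside `BiasDetector` are not shown. One common check is the demographic parity difference: the gap in positive-prediction rates between groups of a protected attribute. Here is a minimal sketch, assuming a gap above the threshold counts as one violation. `demographic_parity_check` and the sample data are hypothetical. Because `BiasDetector` also receives true labels and probabilities, it may compute error-rate metrics (such as equalized odds) as well.

```python
def demographic_parity_check(y_pred, groups, threshold=0.1):
    """Compare positive-prediction rates across groups of one protected attribute.

    A violation is recorded when the largest gap between any two groups'
    rates exceeds `threshold` (hypothetical interpretation of threshold=0.1).
    """
    counts = {}
    for pred, group in zip(y_pred, groups):
        positives, total = counts.get(group, (0, 0))
        counts[group] = (positives + int(pred), total + 1)
    rates = {group: pos / total for group, (pos, total) in counts.items()}
    gap = max(rates.values()) - min(rates.values())
    return {"rates": rates, "gap": gap, "violation": gap > threshold}


check = demographic_parity_check(
    y_pred=[1, 1, 0, 0, 1, 0, 0, 0],
    groups=["M", "M", "M", "M", "F", "F", "F", "F"],
)
print(check)  # → {'rates': {'M': 0.5, 'F': 0.25}, 'gap': 0.25, 'violation': True}
```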

## 6. CI/CD & DevOps Overview

GitHub Actions workflows automate testing, training, and deployment. Workflows include: CI pipeline (linting, tests), scheduled model training, champion/challenger promotion with approval gates, and weekly drift detection alerts.

In [None]:
# List CI/CD workflows
workflows_dir = project_root / ".github/workflows"
workflows = sorted(workflows_dir.glob("*.yml")) if workflows_dir.exists() else []

print("GitHub Actions Workflows:")
for wf in workflows:
    print(f"  - {wf.name}")
if not workflows:
    print(f"  (no .yml files found in {workflows_dir})")

print("\nWorkflow Descriptions:")
print("  ci.yml: Linting, testing, data validation on every PR")
print("  model-training.yml: Weekly automated model retraining")
print("  model-promotion.yml: Champion/challenger promotion with gates")
print("  drift-detection.yml: Weekly drift monitoring and alerting")

## 7. Documentation & Next Steps

**Deployment Options:**
- Local: SQLite + Parquet (current)
- Cloud: MLflow tracking server backed by PostgreSQL, FEAST with Redis (online store) and BigQuery (offline store), model serving via FastAPI/Azure ML

**Scaling to 100 Models:**
The same structure applies to any model: `src/models/model_name/` containing `train.py`, `inference.py`, and `config.yaml`. Shared utilities (`mlflow_utils`, `feast_utils`) keep patterns consistent across all models.
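A shared layout only helps at 100 models if it is enforced, for example by a check in the CI pipeline. Here is a minimal sketch, assuming every model lives directly under `src/models/` and the three files named above are the required set. `check_model_layout` and the `readmission_risk` model in the demo are hypothetical, not part of the repository's shared utilities.

```python
import tempfile
from pathlib import Path

# Per-model files named in the scaling note above.
REQUIRED_FILES = ("train.py", "inference.py", "config.yaml")


def check_model_layout(models_root):
    """Map each model directory under models_root to the required files it lacks."""
    problems = {}
    for model_dir in sorted(p for p in Path(models_root).iterdir() if p.is_dir()):
        if model_dir.name.startswith(("_", ".")):
            continue  # skip __pycache__ and hidden directories
        missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
        if missing:
            problems[model_dir.name] = missing
    return problems


# Demo on a throwaway tree; in the repo you would call
# check_model_layout(project_root / "src/models") instead.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    complete, partial = root / "heart_disease", root / "readmission_risk"
    for model_dir in (complete, partial, root / "__pycache__"):
        model_dir.mkdir()
    for name in REQUIRED_FILES:
        (complete / name).touch()
    (partial / "train.py").touch()
    demo_problems = check_model_layout(root)

print(demo_problems)  # → {'readmission_risk': ['inference.py', 'config.yaml']}
```

An empty dict means every model directory follows the convention. A CI step could fail the build whenever the dict is non-empty.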

**Persona Guides:**
- Data Scientist: docs/PERSONAS/DATA_SCIENTIST.md
- Data Engineer: docs/PERSONAS/DATA_ENGINEER.md
- DevOps: docs/PERSONAS/DEVOPS.md
- Business: docs/PERSONAS/BUSINESS.md

**View Dashboard:**
```bash
streamlit run src/dashboard/app.py
```

**Architecture:** See docs/ARCHITECTURE.md for complete technical design

In [None]:
print("✓ Demo Complete")
print("\nYou've seen:")
print("  1. Sample dataset generation")
print("  2. Feature store (FEAST)")
print("  3. Model training & promotion")
print("  4. Model serving & inference")
print("  5. Drift & bias detection")
print("  6. CI/CD processes")
print("  7. Documentation & scaling patterns")