# Healthcare ML Model Monitoring with Databricks Lakehouse Monitoring

This notebook sets up monitoring for the healthcare insurance risk prediction model using the native Databricks Lakehouse Monitoring APIs, supplemented by custom fairness, business, and drift metrics.

## Monitoring Architecture

1. **Inference Monitoring** - Tracks model predictions and prediction drift
2. **Feature Store Monitoring** - Monitors feature quality and distribution
3. **Baseline Data Monitoring** - Tracks upstream data quality
4. **Custom Healthcare Metrics** - Fairness, bias, and business KPIs

## Setup Requirements

- Databricks SDK >= 0.28.0
- Unity Catalog tables populated with data
- Appropriate permissions to create monitors
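
The SDK requirement above can be verified before any monitor is created. The following is a minimal sketch, not part of the project's modules: the helper names `parse_version`, `meets_minimum`, and `installed_sdk_version` are hypothetical, and the version parsing is deliberately simple (it ignores pre-release suffixes).

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '0.28.0' into (0, 28, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.28.0") -> bool:
    """Compare two version strings numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_sdk_version(package: str = "databricks-sdk"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

In a notebook cell, `meets_minimum(installed_sdk_version() or "0")` would indicate whether the install cell below still needs to run.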


In [None]:
# Install/upgrade databricks-sdk
%pip install "databricks-sdk>=0.28.0" --quiet
dbutils.library.restartPython()


## 1. Initialize Monitoring Infrastructure

**Note**: If you've updated the monitoring modules, restart Python to reload them.


In [None]:
import sys
import os

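# Add monitoring module to path
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
project_root = os.path.dirname(os.path.dirname(notebook_path))
monitoring_path = os.path.join(project_root, "03-monitoring")

# notebookPath() may omit the "/Workspace" filesystem prefix; fall back to the prefixed path if it exists
if not os.path.isdir(monitoring_path) and os.path.isdir("/Workspace" + monitoring_path):
    monitoring_path = "/Workspace" + monitoring_path

if monitoring_path not in sys.path:
    sys.path.insert(0, monitoring_path)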

# Import monitoring modules
from lakehouse_monitoring import HealthcareMonitorManager, MonitorRefreshManager, MonitorAnalyzer
from custom_metrics import FairnessMetricsCalculator, BusinessMetricsCalculator, DriftDetector, calculate_all_custom_metrics

print("✓ Monitoring modules imported successfully")


In [None]:
# Configuration
CATALOG = "juan_dev"
SCHEMA = "healthcare_data"

# Get current user email for notifications
user_email = dbutils.notebook.entry_point.getDbutils().notebook().getContext().userName().get()

print("Configuration:")
print(f"  Catalog: {CATALOG}")
print(f"  Schema: {SCHEMA}")
print(f"  User: {user_email}")


## 2. Create Lakehouse Monitors

Create monitors for all three tables in the monitoring architecture.
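
The `schedule_cron` value passed when creating the monitors uses Quartz cron syntax, which has six fields with seconds first, unlike five-field Unix cron. The sketch below shows how such an expression is assembled; `daily_quartz_cron` is a hypothetical helper, not part of `lakehouse_monitoring`, and the time zone the schedule runs in depends on how the monitor is configured.

```python
def daily_quartz_cron(hour: int, minute: int = 0) -> str:
    """Build a Quartz cron expression that fires once a day at hour:minute.

    Quartz field order: seconds minutes hours day-of-month month day-of-week.
    '?' means "no specific value" and is used for day-of-week here.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute must be in 0..59, got {minute}")
    return f"0 {minute} {hour} * * ?"
```

Calling `daily_quartz_cron(6)` produces the same string the notebook passes as `schedule_cron`.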


In [None]:
# Initialize the monitor manager
monitor_manager = HealthcareMonitorManager(
    catalog=CATALOG,
    schema=SCHEMA,
    user_email=user_email
)


In [None]:
# Create all monitors with scheduled daily refresh at 6 AM UTC
import json

monitor_results = monitor_manager.create_all_monitors(
    schedule_cron="0 0 6 * * ?",  # Daily at 6 AM UTC
    notification_emails=[user_email]
)

# Display results
print("\nMonitor Creation Results:")
print(json.dumps(monitor_results, indent=2, default=str))


## 3. Trigger Initial Monitor Refresh

Manually trigger the first refresh to generate initial metrics.
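
Waiting for a refresh to complete usually amounts to polling its state until it reaches a terminal value or a timeout expires. The sketch below illustrates that pattern under stated assumptions: `wait_for_refresh` and the `get_state` callable are hypothetical, the state names are assumed, and `MonitorRefreshManager` may implement its wait differently.

```python
import time
from typing import Callable

# Assumed terminal state names; adjust to whatever the refresh API reports.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELED"}


def wait_for_refresh(
    get_state: Callable[[], str],
    timeout_seconds: float = 1800,
    poll_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"refresh still in state {state!r} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; a caller would normally leave them at their defaults.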


In [None]:
# Initialize refresh manager
refresh_manager = MonitorRefreshManager(monitor_manager)

# Refresh all monitors and wait for completion
refresh_results = refresh_manager.refresh_all_monitors(
    wait_for_completion=True,
    timeout_seconds=1800  # 30-minute timeout per monitor
)

print("\nRefresh Results:")
print(json.dumps(refresh_results, indent=2, default=str))


## 4. Calculate Custom Healthcare Metrics

Compute fairness, bias, and business metrics specific to healthcare.
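
One common fairness check compares the rate of high-risk predictions across demographic groups. The sketch below computes a demographic parity ratio in plain Python. It is an illustration only: the function names, the column names, and the 0.8 threshold (the "four-fifths rule") are assumptions, and `FairnessMetricsCalculator` may use different metrics or thresholds.

```python
from collections import defaultdict

FAIRNESS_RATIO_THRESHOLD = 0.8  # assumed four-fifths rule, not taken from custom_metrics


def positive_rate_by_group(rows, group_key, flag_key):
    """Return the share of rows flagged positive within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[flag_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means identical rates."""
    if not rates:
        raise ValueError("rates must contain at least one group")
    highest = max(rates.values())
    if highest == 0:
        return 1.0
    return min(rates.values()) / highest


# Toy rows for illustration only, not real patient data
toy_rows = [
    {"gender": "F", "high_risk": True},
    {"gender": "F", "high_risk": False},
    {"gender": "M", "high_risk": True},
    {"gender": "M", "high_risk": False},
    {"gender": "M", "high_risk": False},
    {"gender": "M", "high_risk": False},
]
toy_rates = positive_rate_by_group(toy_rows, "gender", "high_risk")
toy_ratio = demographic_parity_ratio(toy_rates)
violation = toy_ratio < FAIRNESS_RATIO_THRESHOLD
```

A ratio well below the chosen threshold is the kind of condition a `fairness_threshold_violation` flag could represent.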


In [None]:
# Calculate all custom metrics
custom_metrics = calculate_all_custom_metrics(
    spark=spark,
    predictions_table=f"{CATALOG}.{SCHEMA}.ml_patient_predictions",
    baseline_table=f"{CATALOG}.{SCHEMA}.dim_patients",  # Use dim_patients for baseline
    output_schema=f"{CATALOG}.{SCHEMA}",
    catalog=CATALOG,
    schema=SCHEMA
)

print("\n✓ Custom metrics calculated and saved to Unity Catalog")


### Display Fairness, Business, and Drift Metrics
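
The drift output reports a PSI (Population Stability Index) score per column. As a reference for reading those scores, here is a minimal sketch of the standard formula, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the baseline and current proportions in bin i. The function names and the fixed bin edges are assumptions; `DriftDetector` may bin values or set its `requires_action` threshold differently.

```python
import math


def bin_proportions(values, edges, epsilon=1e-6):
    """Share of values in each [edges[i], edges[i+1]) bin; the last bin includes its upper edge.

    Values outside the edges are ignored. Empty bins are floored at epsilon
    so the logarithm in the PSI formula stays defined.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [max(count / total, epsilon) for count in counts]


def population_stability_index(baseline, current, edges):
    """Sum of (actual - expected) * ln(actual / expected) over all bins."""
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

A widely used rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant shift; whether the project uses these cut-offs is not stated here.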


In [None]:
import pandas as pd

# Display all custom metrics
fairness_df = custom_metrics['fairness']
business_df = custom_metrics['business']
drift_df = custom_metrics['drift']

print("=" * 80)
print("FAIRNESS METRICS")
print("=" * 80)
display(fairness_df)

print("\n" + "=" * 80)
print("BUSINESS METRICS")
print("=" * 80)
display(business_df)

print("\n" + "=" * 80)
print("DRIFT ANALYSIS")
print("=" * 80)
display(drift_df)

# Check for alerts
if fairness_df['fairness_threshold_violation'].any():
    print("\n⚠️ FAIRNESS ALERT: Threshold violations detected!")

significant_drift = drift_df[drift_df['requires_action'].eq(True)]
if not significant_drift.empty:
    print("\n⚠️ DRIFT ALERT: Significant drift detected!")
    print("Columns requiring attention:")
    for _, row in significant_drift.iterrows():
        print(f"  - {row['column_name']}: PSI = {row['psi_score']:.3f}")


## 5. Query Lakehouse Monitor Metrics

Access the metrics tables generated by Databricks Lakehouse Monitoring.
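
Lakehouse Monitoring writes two metric tables per monitored table into the monitor's output schema, named `<table>_profile_metrics` and `<table>_drift_metrics`. The sketch below derives those names and builds a simple query. The helper names are hypothetical, the output schema is assumed to be whatever the monitor was created with, and ordering by `window.start` assumes the profile table's `window` struct column; `MonitorAnalyzer` may query the tables differently.

```python
def monitor_metric_tables(monitored_table: str, output_schema: str) -> dict:
    """Derive the metric table names for a monitored table.

    monitored_table: fully qualified name, e.g. 'catalog.schema.table'.
    output_schema:   'catalog.schema' the monitor writes its metrics to.
    """
    table_name = monitored_table.split(".")[-1]
    return {
        "profile": f"{output_schema}.{table_name}_profile_metrics",
        "drift": f"{output_schema}.{table_name}_drift_metrics",
    }


def latest_profile_query(profile_table: str, limit: int = 20) -> str:
    """Build a SQL query returning the most recent profile metric rows."""
    return (
        f"SELECT * FROM {profile_table} "
        f"ORDER BY window.start DESC LIMIT {int(limit)}"
    )
```

In the notebook, the resulting string could be passed to `spark.sql(...)` and the result shown with `display(...)`.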


In [None]:
# Initialize analyzer
analyzer = MonitorAnalyzer(monitor_manager, spark)

# Get profile metrics for inference table
inference_profile = analyzer.get_profile_metrics(
    monitor_manager.inference_table,
    limit=20
)

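# Compare against None explicitly: DataFrame truthiness is ambiguous (pandas) or always true (Spark)
if inference_profile is not None: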
    print("Inference Profile Metrics:")
    display(inference_profile)
else:
    print("No profile metrics available yet. Ensure monitor has been refreshed.")


## 6. Generate Executive Monitoring Summary


In [None]:
# Generate comprehensive summary
summary = analyzer.generate_monitoring_summary()

print("\n" + "=" * 80)
print("HEALTHCARE ML MODEL MONITORING SUMMARY")
print("=" * 80)
print(f"\nTimestamp: {summary['timestamp']}")
print("\nMonitor Status:")

for monitor_name, monitor_info in summary['monitors'].items():
    print(f"\n{monitor_name.upper()}:")
    for key, value in monitor_info.items():
        print(f"  {key}: {value}")

# Create summary dashboard data
summary_data = {
    "metric_timestamp": pd.Timestamp.now(),
    "total_predictions": int(business_df['total_predictions'].iloc[0]),
    "high_risk_percentage": float(business_df['high_risk_percentage'].iloc[0]),
    "daily_avg_predictions": float(business_df['daily_avg_predictions'].iloc[0]),
    "mean_prediction": float(business_df['mean_prediction'].iloc[0]),
    "fairness_violations": int(fairness_df['fairness_threshold_violation'].sum()),
    "significant_drift_count": int(drift_df['requires_action'].sum()),
    "monitors_active": len([m for m in summary['monitors'].values() if m.get('status') != 'error'])
}

summary_df = pd.DataFrame([summary_data])
print("\n\nMonitoring Summary:")
display(summary_df)

# Save summary to Unity Catalog
summary_spark_df = spark.createDataFrame(summary_df)
summary_spark_df.write.mode("append").saveAsTable(
    f"{CATALOG}.{SCHEMA}.monitoring_summary_history"
)
print(f"\n✓ Summary saved to {CATALOG}.{SCHEMA}.monitoring_summary_history")


## Summary

This notebook has:

1. ✓ Created Lakehouse Monitors for inference, features, and baseline data
2. ✓ Triggered initial monitor refreshes
3. ✓ Calculated custom healthcare-specific metrics (fairness, business, drift)
4. ✓ Queried monitor metrics and flagged fairness and drift alerts
5. ✓ Saved monitoring summary for historical tracking

### Next Steps

- Review the auto-generated Databricks dashboards in the monitor assets directories
- Set up additional custom alerts based on your requirements
- Schedule this notebook to run daily for continuous monitoring
- Integrate monitoring metrics into your MLOps pipeline

### Accessing Monitor Dashboards

Databricks automatically creates interactive dashboards for each monitor. Access them via:
1. Navigate to the table in Catalog Explorer
2. Click on the "Quality" tab
3. View the monitoring dashboard

Or find them in the workspace at:
- `/Workspace/Users/{your_email}/databricks_lakehouse_monitoring/`
