# Calibration and Online Metrics

**Purpose:** Evaluate model calibration and track online performance metrics (MAE, RMSE, Bias).
**Author:** Roo Code
**Date:** 2026-02-16

## Setup
This notebook imports the `ml_heating` project's modules directly from `src`, so the project root must be on `sys.path`.

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os

# Ensure project root is in path for src imports
project_root = os.path.abspath(os.path.join(os.getcwd(), "../../"))
if project_root not in sys.path:
    sys.path.append(project_root)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta, timezone

# Project imports
from src import config
from src.analysis import DataLoader, plotting
from src.prediction_metrics import PredictionMetrics

# Configure Plotting
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 1. Data Loading
Fetch historical data for performance evaluation. Default is the last 14 days.

In [None]:
# Initialize Loader
loader = DataLoader()

# Define Time Range
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(days=14)

print(f"Fetching data from {start_time} to {end_time}...")

# Fetch Data
df = loader.fetch_training_data(
    start_time=start_time,
    end_time=end_time
)

print(f"Loaded {len(df)} rows")
df.head()
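
The rolling window in the metrics section assumes 30-minute samples. A minimal sketch for checking that assumption, assuming the frame returned by `fetch_training_data` has a `DatetimeIndex` (not confirmed here). `median_sampling_interval` is a hypothetical helper, and the index below is synthetic:

```python
import pandas as pd

def median_sampling_interval(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Return the median spacing between consecutive timestamps."""
    deltas = pd.Series(index).sort_values().diff().dropna()
    return deltas.median()

# Synthetic 30-minute index standing in for the loaded frame's index (assumption)
demo_index = pd.date_range("2026-02-01", periods=96, freq="30min", tz="UTC")

interval = median_sampling_interval(demo_index)
samples_per_day = int(pd.Timedelta(days=1) / interval)
print(f"Median interval: {interval}, samples per 24h: {samples_per_day}")
```

If the real index is a `DatetimeIndex`, calling `median_sampling_interval(df.index)` would show whether `window=48` really corresponds to 24 hours.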

## 2. Metrics Calculation
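Calculate MAE, RMSE, and Bias for the prediction period. Bias is the mean of actual minus predicted, so a positive value means the predictions run low on average.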

In [None]:
if not df.empty and 'outlet_temperature' in df.columns and 'ml_target_temperature' in df.columns:
    # Assuming 'ml_target_temperature' is the prediction and 'outlet_temperature' is the actual
    # In a real scenario, we might need to align timestamps or use a specific prediction column
    
    actual = df['outlet_temperature']
    predicted = df['ml_target_temperature'] # Placeholder for prediction
    
    # Calculate Metrics
    mae = np.mean(np.abs(actual - predicted))
    rmse = np.sqrt(np.mean((actual - predicted)**2))
    bias = np.mean(actual - predicted)
    
    print(f"MAE: {mae:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"Bias: {bias:.4f}")
    
    # Rolling Metrics
    rolling_mae = (actual - predicted).abs().rolling(window=48).mean() # 24h window (assuming 30m steps)
    
    plt.figure(figsize=(12, 6))
    rolling_mae.plot(label='Rolling MAE (24h)')
    plt.title('Rolling Mean Absolute Error')
    plt.ylabel('MAE (°C)')
    plt.legend()
    plt.show()
else:
    print("Missing required columns for metrics calculation.")
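
The three metrics can also be wrapped in a small function that makes the handling of missing values explicit and reports how many samples were used. This is a sketch on made-up numbers, not the project's `PredictionMetrics` class; `regression_metrics` is a hypothetical name:

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and bias over pairs where both values are finite."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = np.isfinite(actual) & np.isfinite(predicted)
    if not mask.any():
        raise ValueError("no overlapping finite samples")
    err = actual[mask] - predicted[mask]  # actual minus predicted, as in the cell above
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),
        "n": int(mask.sum()),
    }

# Made-up temperatures; the NaN pair is dropped before computing the metrics
demo = regression_metrics([30.0, 32.0, np.nan, 35.0], [29.0, 34.0, 33.0, 35.0])
print(demo)
```

RMSE is never smaller than MAE. A large gap between the two suggests that a few large errors dominate.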

## 3. Error Distribution
Analyze the distribution of prediction errors.

In [None]:
if not df.empty and 'outlet_temperature' in df.columns and 'ml_target_temperature' in df.columns:
    errors = df['outlet_temperature'] - df['ml_target_temperature']
    
    plt.figure(figsize=(10, 6))
    plt.hist(errors, bins=30, alpha=0.7, color='blue', edgecolor='black')
    plt.title('Prediction Error Distribution')
    plt.xlabel('Error (°C)')
    plt.ylabel('Frequency')
    plt.axvline(0, color='red', linestyle='--')
    plt.show()
else:
    print("No data for error distribution.")
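
A histogram is easier to compare across runs when a few summary numbers accompany it. The sketch below computes the mean, sample standard deviation, selected percentiles, and the share of errors within a tolerance band. `summarize_errors` and the 0.5 °C default tolerance are illustrative assumptions, not values from the project:

```python
import numpy as np

def summarize_errors(errors, tolerance=0.5):
    """Summarize finite errors; `tolerance` is a hypothetical band in °C."""
    errors = np.asarray(errors, dtype=float)
    errors = errors[np.isfinite(errors)]
    p05, p50, p95 = np.percentile(errors, [5, 50, 95])
    return {
        "mean": float(errors.mean()),
        "std": float(errors.std(ddof=1)),  # sample standard deviation
        "p05": float(p05),
        "p50": float(p50),
        "p95": float(p95),
        "within_tol": float(np.mean(np.abs(errors) <= tolerance)),
    }

# Made-up errors; the NaN entry is ignored
summary = summarize_errors([-1.0, 0.0, 1.0, np.nan])
print(summary)
```

With the notebook's data, the call would be `summarize_errors(errors)` after the cell above has run.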

## 4. Calibration Check
Check whether the model is well calibrated. For classifiers, calibration means that predicted probabilities match observed frequencies.
For this regression model, the analogous check is that the residuals show no trend or change in spread across the range of predicted values.

In [None]:
if not df.empty and 'outlet_temperature' in df.columns and 'ml_target_temperature' in df.columns:
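    # Recompute residuals here so this cell does not depend on the previous one
    errors = df['outlet_temperature'] - df['ml_target_temperature']

    plt.figure(figsize=(10, 6))
    plt.scatter(df['ml_target_temperature'], errors, alpha=0.5)
    plt.title('Residuals vs Predicted Value')
    plt.xlabel('Predicted Value (°C)')
    plt.ylabel('Residuals (°C)')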
    plt.axhline(0, color='red', linestyle='--')
    plt.show()
else:
    print("No data for calibration check.")
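
A simple numeric companion to the scatter plot is a least-squares line through the residuals. A slope near zero is consistent with residuals that do not depend on the predicted value, while a clear nonzero slope points to a scale error. This is one possible check on synthetic data, not the notebook's established method; `residual_trend` is a hypothetical helper:

```python
import numpy as np

def residual_trend(predicted, residuals):
    """Fit residual = slope * predicted + intercept by least squares."""
    predicted = np.asarray(predicted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    mask = np.isfinite(predicted) & np.isfinite(residuals)
    slope, intercept = np.polyfit(predicted[mask], residuals[mask], deg=1)
    return float(slope), float(intercept)

# Synthetic case: actual = 1.1 * predicted - 3, so residuals grow with the prediction
pred = np.linspace(25.0, 45.0, 50)
resid = (1.1 * pred - 3.0) - pred

slope, intercept = residual_trend(pred, resid)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A straight-line fit only detects linear trends. Curvature or changing spread still needs the scatter plot or a binned summary.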

## 5. Conclusions
Summarize performance findings here.