# River ML Regression Metrics Investigation

## For Estimated Time of Arrival (ETA) Project

This notebook investigates all available regression metrics in River ML and determines the best configuration for the ETA prediction project.

### Research Sources:
- [River ML Documentation](https://riverml.xyz/latest/api/metrics/)
- [Ten quick tips for improving ETA predictions](https://peerj.com/articles/cs-3259/)
- [A review of travel and arrival-time prediction methods](https://pmc.ncbi.nlm.nih.gov/articles/PMC8444094/)

### Key Findings from Literature:
- **Most common metric combination**: MAE + RMSE + MAPE (used in 15-23% of ETA studies)
- **Primary metric**: MAE (used in 33% of studies)
- **Secondary metrics**: RMSE (22%), MAPE (18%), R² (5%)
- **Best practice**: Report MAE, RMSE, and MAPE together, since each captures a different aspect of ETA error (typical error, sensitivity to large errors, relative error)

In [1]:
from river import metrics, utils
import inspect

## 1. List All Available Metrics in River

In [2]:
# Get all items from river.metrics
all_metrics = [item for item in dir(metrics) if not item.startswith('_')]
print(f"Total items in river.metrics: {len(all_metrics)}")
print("\nAll metrics:")
for m in sorted(all_metrics):
    obj = getattr(metrics, m)
    if inspect.isclass(obj):
        print(f"  {m} (class)")

Total items in river.metrics: 79

All metrics:
  Accuracy (class)
  AdjustedMutualInfo (class)
  AdjustedRand (class)
  BalancedAccuracy (class)
  ClassificationReport (class)
  CohenKappa (class)
  Completeness (class)
  ConfusionMatrix (class)
  CrossEntropy (class)
  F1 (class)
  FBeta (class)
  FowlkesMallows (class)
  GeometricMean (class)
  Homogeneity (class)
  Jaccard (class)
  LogLoss (class)
  MAE (class)
  MAPE (class)
  MCC (class)
  MSE (class)
  MacroF1 (class)
  MacroFBeta (class)
  MacroJaccard (class)
  MacroPrecision (class)
  MacroRecall (class)
  MicroF1 (class)
  MicroFBeta (class)
  MicroJaccard (class)
  MicroPrecision (class)
  MicroRecall (class)
  MultiFBeta (class)
  MutualInfo (class)
  NormalizedMutualInfo (class)
  Precision (class)
  R2 (class)
  RMSE (class)
  RMSLE (class)
  ROCAUC (class)
  Rand (class)
  Recall (class)
  RollingROCAUC (class)
  SMAPE (class)
  Silhouette (class)
  VBeta (class)
  WeightedF1 (class)
  WeightedFBeta (class)
  WeightedJa

## 2. Identify Regression Metrics

Regression metrics evaluate predictions of continuous values, such as travel time in seconds.

In [3]:
# Regression metrics available in River
REGRESSION_METRICS = [
    'MAE',      # Mean Absolute Error
    'MAPE',     # Mean Absolute Percentage Error
    'MSE',      # Mean Squared Error
    'R2',       # Coefficient of Determination
    'RMSE',     # Root Mean Squared Error
    'RMSLE',    # Root Mean Squared Logarithmic Error
    'SMAPE',    # Symmetric Mean Absolute Percentage Error
]

print("Regression Metrics for ETA Prediction:")
print("=" * 50)
for metric_name in REGRESSION_METRICS:
    metric_class = getattr(metrics, metric_name)
    print(f"\n{metric_name}:")
    print(f"  Signature: {metric_name}{inspect.signature(metric_class)}")
    print(f"  bigger_is_better: {metric_class().bigger_is_better}")

Regression Metrics for ETA Prediction:

MAE:
  Signature: MAE()
  bigger_is_better: False

MAPE:
  Signature: MAPE()
  bigger_is_better: False

MSE:
  Signature: MSE()
  bigger_is_better: False

R2:
  Signature: R2()
  bigger_is_better: True

RMSE:
  Signature: RMSE()
  bigger_is_better: False

RMSLE:
  Signature: RMSLE()
  bigger_is_better: False

SMAPE:
  Signature: SMAPE()
  bigger_is_better: False


## 3. Detailed Analysis of Each Regression Metric

In [4]:
# MAE - Mean Absolute Error
print("MAE - Mean Absolute Error")
print("=" * 50)
print(metrics.MAE.__doc__)
print("\nFormula: MAE = (1/n) * Σ|y_true - y_pred|")
print("Use case: Primary metric for ETA - interpretable in same units as target (seconds)")
print("Optimal: Lower is better")

MAE - Mean Absolute Error
Mean absolute error.

Examples
--------

>>> from river import metrics

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]

>>> metric = metrics.MAE()

>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)
...     print(metric.get())
0.5
0.5
0.333
0.5

>>> metric
MAE: 0.5



Formula: MAE = (1/n) * Σ|y_true - y_pred|
Use case: Primary metric for ETA - interpretable in same units as target (seconds)
Optimal: Lower is better


In [5]:
# RMSE - Root Mean Squared Error
print("RMSE - Root Mean Squared Error")
print("=" * 50)
print(metrics.RMSE.__doc__)
print("\nFormula: RMSE = sqrt((1/n) * Σ(y_true - y_pred)²)")
print("Use case: Penalizes large errors more than MAE - important for ETA")
print("Optimal: Lower is better")

RMSE - Root Mean Squared Error
Root mean squared error.

Examples
--------

>>> from river import metrics

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]

>>> metric = metrics.RMSE()
>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)
...     print(metric.get())
0.5
0.5
0.408248
0.612372

>>> metric
RMSE: 0.612372



Formula: RMSE = sqrt((1/n) * Σ(y_true - y_pred)²)
Use case: Penalizes large errors more than MAE - important for ETA
Optimal: Lower is better
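The claim that RMSE penalizes large errors more than MAE can be checked with plain arithmetic. The sketch below uses plain Python rather than River, and the error values (in seconds) are made up for illustration, not taken from project data.

```python
import math


def mae(errors):
    """Mean absolute error of a list of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error of a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical ETA errors in seconds; both lists share the same total absolute error.
even_errors = [60, 60, 60, 60]   # four trips, each off by one minute
one_outlier = [0, 0, 0, 240]     # three exact trips, one off by four minutes

for label, errors in [("even", even_errors), ("outlier", one_outlier)]:
    print(f"{label:<8} MAE={mae(errors):.1f}  RMSE={rmse(errors):.1f}")
```

Both lists have an MAE of 60 seconds, but RMSE doubles from 60 to 120 seconds when the same total error is concentrated in a single trip.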


In [6]:
# MAPE - Mean Absolute Percentage Error
print("MAPE - Mean Absolute Percentage Error")
print("=" * 50)
print(metrics.MAPE.__doc__)
print("\nFormula: MAPE = (100/n) * Σ|y_true - y_pred| / |y_true|")
print("Use case: Scale-independent metric - useful for comparing across different trip lengths")
print("Optimal: Lower is better")
print("Warning: Undefined when y_true = 0")

MAPE - Mean Absolute Percentage Error
Mean absolute percentage error.

Examples
--------

>>> from river import metrics

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]

>>> metric = metrics.MAPE()
>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)

>>> metric
MAPE: 32.738095



Formula: MAPE = (100/n) * Σ|y_true - y_pred| / |y_true|
Use case: Scale-independent metric - useful for comparing across different trip lengths
Optimal: Lower is better
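To make the scale-independence point concrete, here is a small plain-Python sketch of the MAPE formula above. It is not River's implementation; in particular, raising an error on a zero target is an assumption made here to surface the undefined case. The trip durations are hypothetical.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        # Assumption of this sketch: refuse zero targets instead of guessing a value.
        raise ValueError("MAPE is undefined when a true value is 0")
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100 * total / len(y_true)


# Same data as the docstring example above.
docstring_value = mape([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])

# The same 300-second miss on a 10-minute trip and on a 1-hour trip.
short_trip = mape([600], [900])
long_trip = mape([3600], [3900])

print(f"docstring data: {docstring_value:.6f}")
print(f"short trip: {short_trip:.1f}%  long trip: {long_trip:.1f}%")
```

The identical 300-second error counts as 50% on the short trip but only about 8.3% on the long one, which is what makes MAPE comparable across trip lengths.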


In [7]:
# SMAPE - Symmetric Mean Absolute Percentage Error
print("SMAPE - Symmetric Mean Absolute Percentage Error")
print("=" * 50)
print(metrics.SMAPE.__doc__)
print("\nFormula: SMAPE = (100/n) * Σ|y_true - y_pred| / ((|y_true| + |y_pred|) / 2)")
print("Use case: More robust than MAPE - bounded between 0% and 200%")
print("Optimal: Lower is better")

SMAPE - Symmetric Mean Absolute Percentage Error
Symmetric mean absolute percentage error.

Examples
--------

>>> from river import metrics

>>> y_true = [0, 0.07533, 0.07533, 0.07533, 0.07533, 0.07533, 0.07533, 0.0672, 0.0672]
>>> y_pred = [0, 0.102, 0.107, 0.047, 0.1, 0.032, 0.047, 0.108, 0.089]

>>> metric = metrics.SMAPE()
>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)

>>> metric
SMAPE: 37.869392



Formula: SMAPE = (100/n) * Σ|y_true - y_pred| / ((|y_true| + |y_pred|) / 2)
Use case: More robust than MAPE - bounded between 0% and 200%
Optimal: Lower is better
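The 0% to 200% bound and the symmetry follow directly from the formula. The plain-Python sketch below (not River's code) evaluates the per-sample term on hypothetical trip durations; treating a 0-versus-0 pair as zero error is an assumption of this sketch.

```python
def smape_term(y_true, y_pred):
    """Per-sample symmetric absolute percentage error, in percent."""
    denominator = (abs(y_true) + abs(y_pred)) / 2
    if denominator == 0:
        return 0.0  # assumption: a 0-vs-0 pair counts as zero error
    return 100 * abs(y_true - y_pred) / denominator


def smape(y_true, y_pred):
    """Mean of the per-sample terms."""
    terms = [smape_term(t, p) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


worst_case = smape_term(600, 0)   # predicting 0 s for a 600 s trip hits the upper bound
over = smape_term(600, 900)       # over-prediction by 300 s
under = smape_term(900, 600)      # under-prediction by 300 s

# Same data as the docstring example above.
docstring_value = smape(
    [0, 0.07533, 0.07533, 0.07533, 0.07533, 0.07533, 0.07533, 0.0672, 0.0672],
    [0, 0.102, 0.107, 0.047, 0.1, 0.032, 0.047, 0.108, 0.089],
)

print(worst_case, over, under)
print(f"{docstring_value:.3f}")
```

Swapping the true and predicted values leaves the term unchanged (40% both ways), whereas MAPE would give 50% and about 33% for the same pair.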


In [8]:
# R2 - Coefficient of Determination
print("R² - Coefficient of Determination")
print("=" * 50)
print(metrics.R2.__doc__)
print("\nFormula: R² = 1 - (SS_res / SS_tot)")
print("Use case: Shows proportion of variance explained by the model")
print("Optimal: Higher is better (1.0 = perfect, 0.0 = baseline, negative = worse than baseline)")

R² - Coefficient of Determination
Coefficient of determination ($R^2$) score

The coefficient of determination, denoted $R^2$ or $r^2$, is the proportion
of the variance in the dependent variable that is predictable from the
independent variable(s). [^1]

Best possible score is 1.0 and it can be negative (because the model can be
arbitrarily worse). A constant model that always predicts the expected
value of $y$, disregarding the input features, would get a $R^2$ score of
0.0.

$R^2$ is not defined when less than 2 samples have been observed. This
implementation returns 0.0 in this case.

Examples
--------
>>> from river import metrics

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]

>>> metric = metrics.R2()

>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)
...     print(metric.get())
0.0
0.9183
0.9230
0.9486

References
----------
[^1]: [Coefficient of determination (Wikipedia)](https://en.wikipedia.org/wiki/Coefficient_of_determination)



Formula: R²
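The formula R² = 1 - (SS_res / SS_tot) can be worked through by hand on the docstring data. The sketch below is a batch computation in plain Python, not River's incremental implementation; the "bad" prediction list is made up to show a negative score.

```python
def r2(y_true, y_pred):
    """Coefficient of determination computed from residual and total sums of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3, -0.5, 2, 7]

good = r2(y_true, [2.5, 0.0, 2, 8])        # docstring predictions
baseline = r2(y_true, [2.875] * 4)         # always predict the mean of y_true
bad = r2(y_true, [7, 2, -0.5, 3])          # hypothetical predictions worse than the mean

print(f"good={good:.4f}  baseline={baseline:.4f}  bad={bad:.4f}")
```

The three cases correspond to the interpretation printed above: close to 1.0 for a good fit, 0.0 for the constant-mean baseline, and a negative value when the model does worse than that baseline.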

In [9]:
# MSE - Mean Squared Error
print("MSE - Mean Squared Error")
print("=" * 50)
print(metrics.MSE.__doc__)
print("\nFormula: MSE = (1/n) * Σ(y_true - y_pred)²")
print("Use case: Heavily penalizes large errors - squared units")
print("Optimal: Lower is better")

MSE - Mean Squared Error
Mean squared error.

Examples
--------

>>> from river import metrics

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]

>>> metric = metrics.MSE()

>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)
...     print(metric.get())
0.25
0.25
0.1666
0.375



Formula: MSE = (1/n) * Σ(y_true - y_pred)²
Use case: Heavily penalizes large errors - squared units
Optimal: Lower is better


In [10]:
# RMSLE - Root Mean Squared Logarithmic Error
print("RMSLE - Root Mean Squared Logarithmic Error")
print("=" * 50)
print(metrics.RMSLE.__doc__)
print("\nFormula: RMSLE = sqrt((1/n) * Σ(log(y_pred + 1) - log(y_true + 1))²)")
print("Use case: Useful when targets span several orders of magnitude")
print("Optimal: Lower is better")
print("Note: Less sensitive to large errors than RMSE")

RMSLE - Root Mean Squared Logarithmic Error
Root mean squared logarithmic error.

Examples
--------

>>> from river import metrics

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]

>>> metric = metrics.RMSLE()
>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)

>>> metric
RMSLE: 0.357826



Formula: RMSLE = sqrt((1/n) * Σ(log(y_pred + 1) - log(y_true + 1))²)
Use case: Useful when targets span several orders of magnitude
Optimal: Lower is better
Note: Less sensitive to large errors than RMSE
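A short plain-Python sketch (not River's code) shows why RMSLE behaves like a relative-error measure: a prediction that is 50% too high produces nearly the same RMSLE on a short and a long trip, while RMSE grows tenfold. The trip durations are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x) as in the formula above."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Same data as the docstring example above.
docstring_value = rmsle([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])

# A prediction 50% too high on a 10-minute trip and on a 100-minute trip.
short_rmse, long_rmse = rmse([600], [900]), rmse([6000], [9000])
short_rmsle, long_rmsle = rmsle([600], [900]), rmsle([6000], [9000])

print(f"docstring data: {docstring_value:.6f}")
print(f"RMSE : short={short_rmse:.0f}  long={long_rmse:.0f}")
print(f"RMSLE: short={short_rmsle:.3f}  long={long_rmsle:.3f}")
```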


## 4. Rolling Metrics for Concept Drift Detection

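River provides a `Rolling` wrapper in `river.utils` that computes a metric over a sliding window of the most recent samples. Because a windowed metric reflects only recent performance, a sustained rise in it can reveal concept drift that a cumulative metric would average away.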

In [11]:
# Rolling wrapper documentation
print("Rolling Wrapper (river.utils.Rolling)")
print("=" * 50)
print(utils.Rolling.__doc__)
print(f"\nSignature: {inspect.signature(utils.Rolling)}")

Rolling Wrapper (river.utils.Rolling)
A generic wrapper for performing rolling computations.

This can be wrapped around any object which implements both an `update` and a `revert` method.
Inputs to `update` are stored in a queue. Elements of the queue are popped when the window is
full.

Parameters
----------
obj
    An object that implements both an `update` method and a `rolling `method.
window_size
    Size of the window.

Examples
--------

For instance, here is how you can compute a rolling average over a window of size 3:

>>> from river import stats, utils

>>> X = [1, 3, 5, 7]
>>> rmean = utils.Rolling(stats.Mean(), window_size=3)

>>> for x in X:
...     rmean.update(x)
...     print(rmean.get())
1.0
2.0
3.0
5.0



Signature: (obj: 'Rollable', window_size: 'int')
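The docstring describes the mechanism in words: inputs to `update` are queued, and the oldest input is reverted once the window is full. The following is a minimal sketch of that update/revert pattern in plain Python. `RunningMAE` and `RollingWindow` are hypothetical stand-ins written for this notebook, not River's classes.

```python
from collections import deque


class RunningMAE:
    """Toy MAE that supports both update and revert (stand-in, not River's MAE)."""

    def __init__(self):
        self.total = 0.0
        self.n = 0

    def update(self, y_true, y_pred):
        self.total += abs(y_true - y_pred)
        self.n += 1

    def revert(self, y_true, y_pred):
        self.total -= abs(y_true - y_pred)
        self.n -= 1

    def get(self):
        return self.total / self.n if self.n else 0.0


class RollingWindow:
    """Keeps the last `window_size` inputs and reverts the oldest when full."""

    def __init__(self, obj, window_size):
        self.obj = obj
        self.window_size = window_size
        self.window = deque()

    def update(self, *args):
        if len(self.window) == self.window_size:
            self.obj.revert(*self.window.popleft())  # drop the oldest sample
        self.obj.update(*args)
        self.window.append(args)

    def get(self):
        return self.obj.get()


rolling = RollingWindow(RunningMAE(), window_size=2)
readings = []
for y_true, y_pred in [(100, 110), (100, 120), (100, 130)]:
    rolling.update(y_true, y_pred)
    readings.append(rolling.get())

print(readings)
```

With a window of 2, the third reading averages only the errors 20 and 30, because the first error of 10 has been reverted.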


In [12]:
# Example: Rolling MAE
window_size = 1000
print(f"Example: Rolling MAE with window_size={window_size}")
print("=" * 50)

rolling_mae = utils.Rolling(metrics.MAE(), window_size=window_size)
print(f"Type: {type(rolling_mae)}")
print(f"Window size: {window_size} samples")
print("\nUsage:")
print("  rolling_mae.update(y_true, y_pred)")
print(f"  rolling_mae.get()  # Returns MAE over last {window_size} samples")

Example: Rolling MAE with window_size=1000
Type: <class 'river.utils.rolling.Rolling'>
Window size: 1000 samples

Usage:
  rolling_mae.update(y_true, y_pred)
  rolling_mae.get()  # Returns MAE over last 1000 samples


In [13]:
# Test rolling metrics with sample data
# Note: no random seed is set, so the printed values vary from run to run
import random

# Create rolling metrics
rolling_mae = utils.Rolling(metrics.MAE(), window_size=100)
rolling_rmse = utils.Rolling(metrics.RMSE(), window_size=100)

# Simulate ETA predictions (travel time in seconds)
print("Simulating ETA predictions (window_size=100):")
print("-" * 50)

for i in range(200):
    y_true = random.uniform(300, 3600)  # 5 minutes to 1 hour
    y_pred = y_true + random.uniform(-300, 300)  # ±5 minutes error
    
    rolling_mae.update(y_true, y_pred)
    rolling_rmse.update(y_true, y_pred)
    
    if (i + 1) % 50 == 0:
        print(f"After {i+1} samples:")
        print(f"  Rolling MAE:  {rolling_mae.get():.2f} seconds")
        print(f"  Rolling RMSE: {rolling_rmse.get():.2f} seconds")

Simulating ETA predictions (window_size=100):
--------------------------------------------------
After 50 samples:
  Rolling MAE:  154.44 seconds
  Rolling RMSE: 178.72 seconds
After 100 samples:
  Rolling MAE:  153.51 seconds
  Rolling RMSE: 175.67 seconds
After 150 samples:
  Rolling MAE:  150.20 seconds
  Rolling RMSE: 172.79 seconds
After 200 samples:
  Rolling MAE:  141.87 seconds
  Rolling RMSE: 165.99 seconds
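The values of roughly 150 and 173 seconds are what the simulation setup predicts. For an error drawn uniformly from [-a, a], the expected absolute error is a/2 and the root of the expected squared error is a/√3. The sketch below checks this for a = 300 in plain Python; the seed and sample count are arbitrary choices made here.

```python
import math
import random

half_width = 300.0  # errors are drawn from uniform(-300, 300), as in the cell above

# Closed-form expectations for a uniform error on [-a, a].
expected_mae = half_width / 2
expected_rmse = half_width / math.sqrt(3)

# Large simulated sample to compare against the closed-form values.
rng = random.Random(0)  # arbitrary seed, only for repeatability of this sketch
errors = [rng.uniform(-half_width, half_width) for _ in range(100_000)]
simulated_mae = sum(abs(e) for e in errors) / len(errors)
simulated_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"expected MAE = {expected_mae:.1f} s, expected RMSE = {expected_rmse:.1f} s")
```

With only 100 samples per window, the rolling values above scatter around these expectations, which explains the drift of a few seconds between checkpoints even though the error distribution never changes.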


## 5. Recommended Metrics Configuration for ETA

Based on the literature findings and the per-metric analysis above:

### Primary Metrics (for MLflow logging):
1. **MAE** - Most interpretable, same units as target (seconds)
2. **RMSE** - Penalizes large errors, important for user experience
3. **MAPE** - Scale-independent, useful for different trip lengths

### Secondary Metrics:
4. **R²** - Goodness of fit measure
5. **SMAPE** - More robust than MAPE when y_true is small
6. **MSE** - For model optimization (squared error)
7. **RMSLE** - For log-scale comparison

### Rolling Metrics (for drift detection):
- **RollingMAE** (window_size=1000) - Detect recent performance changes
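One way a rolling MAE could be turned into a drift signal is to compare it against the cumulative MAE. The sketch below is an illustrative assumption, not a method from the cited sources or from River: the function name `drift_flags`, the ratio rule, and the threshold of 1.5 are all hypothetical, and the error stream is synthetic.

```python
from collections import deque


def drift_flags(abs_errors, window_size, ratio_threshold):
    """Flag samples where the windowed MAE exceeds the cumulative MAE by a ratio."""
    window = deque(maxlen=window_size)  # oldest error is dropped automatically
    total = 0.0
    flags = []
    for n, err in enumerate(abs_errors, start=1):
        window.append(err)
        total += err
        rolling_mae = sum(window) / len(window)
        overall_mae = total / n
        window_full = len(window) == window_size
        flags.append(window_full and rolling_mae > ratio_threshold * overall_mae)
    return flags


# Synthetic stream: 20 trips with a 100 s error, then 10 trips with a 400 s error.
abs_errors = [100.0] * 20 + [400.0] * 10
flags = drift_flags(abs_errors, window_size=5, ratio_threshold=1.5)
first_flag = flags.index(True)

print(f"first flagged sample index: {first_flag}")
```

In this synthetic stream the shift begins at index 20 and is flagged two samples later, once enough large errors have entered the 5-sample window. A larger window gives steadier estimates but reacts more slowly.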

In [14]:
# Final recommended configuration for ETA
print("RECOMMENDED METRICS CONFIGURATION FOR ETA")
print("=" * 60)
print("""
# =============================================================================
# REGRESSION METRICS CONFIGURATION (Research-based optimal setup for ETA)
# =============================================================================
# Sources:
#   - River ML Documentation: https://riverml.xyz/dev/api/metrics/
#   - ETA Prediction Best Practices: https://peerj.com/articles/cs-3259/
#   - Travel Time Prediction Review: https://pmc.ncbi.nlm.nih.gov/articles/PMC8444094/
# =============================================================================

# PRIMARY METRICS (most important for ETA prediction)
primary_metrics = {
    "MAE": metrics.MAE(),      # Mean Absolute Error (seconds) - Primary metric
    "RMSE": metrics.RMSE(),    # Root Mean Squared Error - Penalizes large errors
    "MAPE": metrics.MAPE(),    # Mean Absolute Percentage Error - Scale-independent
}

# SECONDARY METRICS (additional insights)
secondary_metrics = {
    "R2": metrics.R2(),        # Coefficient of Determination - Goodness of fit
    "SMAPE": metrics.SMAPE(),  # Symmetric MAPE - More robust when y_true is small
    "MSE": metrics.MSE(),      # Mean Squared Error - For optimization
    "RMSLE": metrics.RMSLE(),  # Root Mean Squared Log Error - Log-scale comparison
}

# ROLLING METRICS (for concept drift detection)
# Window size: 1000 samples provides stable estimates while detecting drift
rolling_metrics = {
    "RollingMAE": utils.Rolling(metrics.MAE(), window_size=1000),
}

# BEST MODEL SELECTION CRITERION
# For ETA: Minimize MAE (lower is better)
BEST_METRIC_CRITERIA = {
    "Estimated Time of Arrival": ("MAE", "minimize"),
}
""")

RECOMMENDED METRICS CONFIGURATION FOR ETA

# REGRESSION METRICS CONFIGURATION (Research-based optimal setup for ETA)
# Sources:
#   - River ML Documentation: https://riverml.xyz/dev/api/metrics/
#   - ETA Prediction Best Practices: https://peerj.com/articles/cs-3259/
#   - Travel Time Prediction Review: https://pmc.ncbi.nlm.nih.gov/articles/PMC8444094/

# PRIMARY METRICS (most important for ETA prediction)
primary_metrics = {
    "MAE": metrics.MAE(),      # Mean Absolute Error (seconds) - Primary metric
    "RMSE": metrics.RMSE(),    # Root Mean Squared Error - Penalizes large errors
    "MAPE": metrics.MAPE(),    # Mean Absolute Percentage Error - Scale-independent
}

# SECONDARY METRICS (additional insights)
secondary_metrics = {
    "R2": metrics.R2(),        # Coefficient of Determination - Goodness of fit
    "SMAPE": metrics.SMAPE(),  # Symmetric MAPE - More robust when y_true is small
    "MSE": metrics.MSE(),      # Mean Squared Error - For optimization
    "RMSLE": metrics.RM

## 6. Metrics Summary Table

In [15]:
# Create summary table
print("\nMETRICS SUMMARY TABLE FOR ETA PREDICTION")
print("=" * 80)
print(f"{'Metric':<15} {'Type':<12} {'Optimal':<12} {'Use Case':<40}")
print("-" * 80)

metrics_info = [
    ("MAE", "Primary", "Lower", "Main metric - interpretable in seconds"),
    ("RMSE", "Primary", "Lower", "Penalizes large errors (late arrivals)"),
    ("MAPE", "Primary", "Lower", "Scale-independent percentage error"),
    ("R²", "Secondary", "Higher", "Variance explained (can be negative)"),
    ("SMAPE", "Secondary", "Lower", "Robust percentage error (0-200%)"),
    ("MSE", "Secondary", "Lower", "Squared error for optimization"),
    ("RMSLE", "Secondary", "Lower", "Log-scale error comparison"),
    ("RollingMAE", "Drift", "Lower", "Windowed MAE for drift detection"),
]

for metric, mtype, optimal, use_case in metrics_info:
    print(f"{metric:<15} {mtype:<12} {optimal:<12} {use_case:<40}")


METRICS SUMMARY TABLE FOR ETA PREDICTION
Metric          Type         Optimal      Use Case                                
--------------------------------------------------------------------------------
MAE             Primary      Lower        Main metric - interpretable in seconds  
RMSE            Primary      Lower        Penalizes large errors (late arrivals)  
MAPE            Primary      Lower        Scale-independent percentage error      
R²              Secondary    Higher       Variance explained (can be negative)    
SMAPE           Secondary    Lower        Robust percentage error (0-200%)        
MSE             Secondary    Lower        Squared error for optimization          
RMSLE           Secondary    Lower        Log-scale error comparison              
RollingMAE      Drift        Lower        Windowed MAE for drift detection        


## 7. Implementation Code for ETA Training Script

This is the recommended metrics configuration to be added to `estimated_time_of_arrival_river.py`:

In [16]:
print("""
# =============================================================================
# METRICS CONFIGURATION (Research-based optimal args for ETA)
# =============================================================================
# Sources:
#   - River ML Documentation: https://riverml.xyz/dev/api/metrics/
#   - ETA Prediction Best Practices: https://peerj.com/articles/cs-3259/
#   - Travel Time Prediction Review: https://pmc.ncbi.nlm.nih.gov/articles/PMC8444094/
#
# Key insights:
#   - MAE + RMSE + MAPE is the most common metric combination (15-23% of studies)
#   - MAE is the primary metric (used in 33% of ETA studies)
#   - Rolling metrics help detect concept drift (traffic pattern changes)
# =============================================================================

from river import metrics, utils

# -----------------------------------------------------------------------------
# REGRESSION METRICS (all metrics have no configurable parameters)
# -----------------------------------------------------------------------------
regression_metric_classes = {
    # PRIMARY METRICS (most important for ETA)
    "MAE": metrics.MAE,        # Mean Absolute Error - Primary metric
    "RMSE": metrics.RMSE,      # Root Mean Squared Error - Large error penalty
    "MAPE": metrics.MAPE,      # Mean Absolute Percentage Error - Scale-independent
    # SECONDARY METRICS (additional insights)
    "R2": metrics.R2,          # Coefficient of Determination - Goodness of fit
    "SMAPE": metrics.SMAPE,    # Symmetric MAPE - Robust percentage error
    "MSE": metrics.MSE,        # Mean Squared Error - For optimization
    "RMSLE": metrics.RMSLE,    # Root Mean Squared Log Error - Log-scale
}

# Note: River regression metrics have NO configurable parameters
# Unlike classification metrics, they don't need pos_val, cm, beta, etc.

# =============================================================================
# INSTANTIATE ALL METRICS
# =============================================================================
regression_metrics = {
    name: metric_class()
    for name, metric_class in regression_metric_classes.items()
}

# -----------------------------------------------------------------------------
# ROLLING METRICS (for concept drift detection)
# -----------------------------------------------------------------------------
# Window size: 1000 samples
# - Provides stable estimates while detecting recent performance changes
# - For ETA with ~10 samples/second, this covers ~100 seconds of data
rolling_metrics = {
    "RollingMAE": utils.Rolling(metrics.MAE(), window_size=1000),
}
""")


# METRICS CONFIGURATION (Research-based optimal args for ETA)
# Sources:
#   - River ML Documentation: https://riverml.xyz/dev/api/metrics/
#   - ETA Prediction Best Practices: https://peerj.com/articles/cs-3259/
#   - Travel Time Prediction Review: https://pmc.ncbi.nlm.nih.gov/articles/PMC8444094/
#
# Key insights:
#   - MAE + RMSE + MAPE is the most common metric combination (15-23% of studies)
#   - MAE is the primary metric (used in 33% of ETA studies)
#   - Rolling metrics help detect concept drift (traffic pattern changes)

from river import metrics, utils

# -----------------------------------------------------------------------------
# REGRESSION METRICS (all metrics have no configurable parameters)
# -----------------------------------------------------------------------------
regression_metric_classes = {
    # PRIMARY METRICS (most important for ETA)
    "MAE": metrics.MAE,        # Mean Absolute Error - Primary metric
    "RMSE": metrics.RMSE,      # Root Mean Squared Erro

## 8. Key Differences from Classification Metrics (TFD)

| Aspect | Classification (TFD) | Regression (ETA) |
|--------|---------------------|------------------|
| **Parameters** | Many (pos_val, cm, beta, n_thresholds) | None |
| **Shared confusion matrix (`cm`)** | Yes (efficiency optimization) | N/A |
| **Primary Metric** | FBeta (β=2) | MAE |
| **Rolling** | RollingROCAUC | RollingMAE |
| **Best Model Selection** | Maximize FBeta | Minimize MAE |
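The last row of the table implies that model selection needs to know the direction of the metric. The sketch below shows one way to encode that. The dictionary keys, the helper `select_best`, and all run names and scores are hypothetical values chosen for illustration.

```python
# Selection rules taken from the table above; the keys are illustrative labels.
BEST_METRIC_CRITERIA = {
    "classification": ("FBeta", "maximize"),
    "regression": ("MAE", "minimize"),
}


def select_best(results, metric_name, direction):
    """Return the name of the run with the best value for `metric_name`."""
    choose = max if direction == "maximize" else min
    return choose(results, key=lambda run: results[run][metric_name])


# Hypothetical evaluation results; names and numbers are made up.
regression_runs = {"run_a": {"MAE": 152.0}, "run_b": {"MAE": 141.5}}
classification_runs = {"run_c": {"FBeta": 0.81}, "run_d": {"FBeta": 0.77}}

best_regression = select_best(regression_runs, *BEST_METRIC_CRITERIA["regression"])
best_classification = select_best(classification_runs, *BEST_METRIC_CRITERIA["classification"])

print(best_regression, best_classification)
```

Keeping the direction next to the metric name avoids hard-coding `min` or `max` at each call site, which matters when the same selection code serves both projects.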

In [17]:
print("\n" + "=" * 60)
print("INVESTIGATION COMPLETE")
print("=" * 60)
print("""
Summary:
- River regression metrics have NO configurable parameters
- Recommended metrics: MAE, RMSE, MAPE, R², SMAPE, MSE, RMSLE
- Rolling metrics: Use utils.Rolling() wrapper with window_size=1000
- Best model selection: Minimize MAE

Next steps:
1. Update estimated_time_of_arrival_river.py with new metrics configuration
2. Add RollingMAE for drift detection
3. Update Reflex dashboard to display all metrics
""")


INVESTIGATION COMPLETE

Summary:
- River regression metrics have NO configurable parameters
- Recommended metrics: MAE, RMSE, MAPE, R², SMAPE, MSE, RMSLE
- Rolling metrics: Use utils.Rolling() wrapper with window_size=1000
- Best model selection: Minimize MAE

Next steps:
1. Update estimated_time_of_arrival_river.py with new metrics configuration
2. Add RollingMAE for drift detection
3. Update Reflex dashboard to display all metrics
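As a starting point for the first next step, the sketch below shows how a dictionary of metrics could be updated sample by sample and read out for logging. To stay dependency-free, it uses two hypothetical stand-in classes that mirror the `update(y_true, y_pred)` / `get()` interface seen in the docstrings above; in the training script these would be the River metric instances.

```python
import math


class MeanAbsError:
    """Stand-in with the same update/get interface as the metrics above."""

    def __init__(self):
        self._sum = 0.0
        self._n = 0

    def update(self, y_true, y_pred):
        self._sum += abs(y_true - y_pred)
        self._n += 1

    def get(self):
        return self._sum / self._n if self._n else 0.0


class RootMeanSqError:
    """Stand-in RMSE that accumulates squared errors."""

    def __init__(self):
        self._sum_sq = 0.0
        self._n = 0

    def update(self, y_true, y_pred):
        self._sum_sq += (y_true - y_pred) ** 2
        self._n += 1

    def get(self):
        return math.sqrt(self._sum_sq / self._n) if self._n else 0.0


regression_metrics = {"MAE": MeanAbsError(), "RMSE": RootMeanSqError()}

# (y_true, y_pred) pairs from the docstring examples above.
stream = [(3, 2.5), (-0.5, 0.0), (2, 2), (7, 8)]
for y_true, y_pred in stream:
    for metric in regression_metrics.values():
        metric.update(y_true, y_pred)

# One snapshot per logging step, keyed by metric name.
snapshot = {name: round(metric.get(), 6) for name, metric in regression_metrics.items()}
print(snapshot)  # {'MAE': 0.5, 'RMSE': 0.612372}
```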

