## Summary
Static thresholds ("alert when CPU > 80%") generate alert fatigue because they don't account for normal workload patterns. CPU at 85% might be expected at 2pm Tuesday during month-end processing but alarming at 3am Sunday. Dynamic baselines learn what "normal" looks like for each time window and flag deviations from that pattern.
PerformanceMonitor's `compare_analysis` already does a primitive version of this (compare current 4 hours vs same window 28 hours ago). This issue tracks making baselines time-of-day and day-of-week aware.
## Core Concept
- Collect 30+ days of historical metrics
- Build per-metric baselines segmented by time-of-day and day-of-week (e.g., "Tuesday 2pm CPU is typically 70-85%")
- Compute confidence bands (e.g., mean ± 2-3 standard deviations)
- Flag current values that fall outside the expected band for this specific time window
- Continuously update baselines as workloads evolve
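The steps above can be sketched in a few lines of Python. Names like `build_baselines` and `is_anomalous` are illustrative only, not existing project code:

```python
from collections import defaultdict
from statistics import mean, stdev

def build_baselines(samples):
    """samples: iterable of (weekday, hour, value) tuples drawn from 30+ days
    of history. Returns {(weekday, hour): (mean, stdev)} per time bucket."""
    buckets = defaultdict(list)
    for weekday, hour, value in samples:
        buckets[(weekday, hour)].append(value)
    # stdev needs at least two samples; thin buckets are simply omitted
    return {k: (mean(v), stdev(v)) for k, v in buckets.items() if len(v) >= 2}

def is_anomalous(baselines, weekday, hour, value, k=3.0):
    """Flag values outside mean +/- k standard deviations for this bucket."""
    if (weekday, hour) not in baselines:
        return False  # no history for this bucket: degrade gracefully
    mu, sigma = baselines[(weekday, hour)]
    return abs(value - mu) > k * sigma
```

The continuous-update step would simply re-run `build_baselines` over a rolling window (or fold new samples into running aggregates) so the bands track workload drift.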
## Which Metrics to Baseline
### High value (clear daily/weekly patterns)
- CPU utilization
- Batch Requests/sec
- Wait stats (total wait time per type)
- Session/connection counts
- Query duration aggregates
### Medium value
- Memory utilization (tends to be more stable)
- I/O latency
- TempDB usage
- Blocking event counts
## Where This Applies
### Analysis Engine (both Dashboard and Lite)
The inference engine's fact scoring could incorporate baseline deviation as an amplifier. A CPU reading of 85% with a baseline of 80±5% scores low (normal). The same 85% with a baseline of 40±10% scores high (anomalous). This makes the engine's findings context-aware without changing the rule structure.
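One way the amplifier could work is as a capped z-score multiplier. This is a hypothetical sketch, not the engine's actual scoring code:

```python
def deviation_amplifier(value, baseline_mean, baseline_std, cap=3.0):
    """Scale a fact's score by how far the value sits from its time-bucket
    baseline. Returns 1.0 (no amplification) within one standard deviation,
    grows with the z-score, and is capped so one wild reading can't dominate."""
    if baseline_std <= 0:
        return 1.0  # degenerate baseline: leave the score untouched
    z = abs(value - baseline_mean) / baseline_std
    return min(max(z, 1.0), cap)

# 85% CPU against an 80 +/- 5 baseline: z = 1.0, no amplification
# 85% CPU against a 40 +/- 10 baseline: z = 4.5, capped at 3.0
```

Because the amplifier is a multiplier on the existing score rather than a new rule, the rule structure stays untouched, as the section above suggests.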
### Alert Thresholds
Instead of fixed thresholds, alerts could fire on "deviation from baseline exceeds N standard deviations." This directly addresses alert fatigue — the #1 cited barrier to faster incident response (per a 2024 industry survey).
### Trend Charts (both Dashboard and Lite)
Overlay a shaded "expected range" band on metric charts. Visually, the user sees the metric line and a band showing what's normal. When the line exits the band, something changed. This is the visual equivalent of the annotation markers from issue #688 but for statistical context rather than discrete events.
### `compare_analysis` Enhancement
The existing `compare_analysis` MCP tool compares two time windows. With baselines, it could compare the current window against the expected baseline for this time of day/week rather than a fixed offset, making the comparison more meaningful.
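A sketch of how that lookup might differ from the fixed-offset comparison, assuming a `baselines` mapping of `(weekday, hour) -> (mean, stdev)` maintained elsewhere (function names are hypothetical):

```python
from datetime import datetime

def expected_band(baselines, when: datetime, k=2.0):
    """Return the (low, high) expected range for a timestamp's
    (weekday, hour) bucket, or None when that bucket has no history."""
    stats = baselines.get((when.weekday(), when.hour))
    if stats is None:
        return None
    mu, sigma = stats
    return (mu - k * sigma, mu + k * sigma)

def compare_to_baseline(baselines, when, current_mean, k=2.0):
    """Compare the current window against the learned band for this specific
    time of day and day of week, instead of 'the same window 28 hours ago'."""
    band = expected_band(baselines, when, k)
    if band is None:
        return "no-baseline"
    low, high = band
    if current_mean < low:
        return "below-expected"
    if current_mean > high:
        return "above-expected"
    return "within-expected"
```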
## Data Requirements
### Dashboard
Historical data is already in the `PerformanceMonitor` SQL Server database. Baseline computation could be a scheduled calculation (SQL Agent job or application-level) that maintains a baseline table with per-metric, per-hour-of-day, per-day-of-week statistics.
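The scheduled calculation might maintain a table shaped like the one below. This is a runnable stand-in using Python's bundled sqlite3 rather than SQL Server, and every table and column name is hypothetical, not the actual PerformanceMonitor schema:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metric_sample (metric TEXT, collected_at TEXT, value REAL);
    CREATE TABLE metric_baseline (
        metric TEXT, day_of_week INTEGER, hour_of_day INTEGER,
        mean_value REAL, variance_value REAL, sample_count INTEGER
    );
""")
conn.executemany(
    "INSERT INTO metric_sample VALUES (?, ?, ?)",
    [("cpu_pct", "2024-01-02 14:05:00", 70.0),   # Tuesdays, 2pm bucket
     ("cpu_pct", "2024-01-02 14:35:00", 80.0),
     ("cpu_pct", "2024-01-09 14:10:00", 75.0)],
)
# Aggregate raw samples into per-metric, per-hour-of-day, per-day-of-week
# statistics. SQLite lacks STDEV, so store the population variance
# (avg(x*x) - avg(x)^2) and take the square root at read time.
conn.execute("""
    INSERT INTO metric_baseline
    SELECT metric,
           CAST(strftime('%w', collected_at) AS INTEGER) AS day_of_week,
           CAST(strftime('%H', collected_at) AS INTEGER) AS hour_of_day,
           AVG(value),
           AVG(value * value) - AVG(value) * AVG(value),
           COUNT(*)
    FROM metric_sample
    GROUP BY metric, day_of_week, hour_of_day
""")
mean_value, variance_value, n = conn.execute(
    "SELECT mean_value, variance_value, sample_count FROM metric_baseline"
).fetchone()
```

On SQL Server the same shape would come from `DATEPART(WEEKDAY, ...)`/`DATEPART(HOUR, ...)` grouping with `AVG` and `STDEV`, run by the Agent job.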
### Lite
Historical data is in DuckDB/Parquet. Baseline computation could run as part of the collector cycle or on-demand. DuckDB's analytical query capabilities make time-bucketed aggregation efficient.
Both apps need at least 2-4 weeks of data before baselines become meaningful. New installations should gracefully degrade to static thresholds until sufficient history exists.
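The graceful-degradation rule could be as simple as the following sketch (the function name, tuple shape, and `min_samples` cutoff are all illustrative):

```python
def threshold_breached(value, baseline, static_limit, min_samples=14, k=3.0):
    """Use the learned band only when the bucket has enough history;
    otherwise fall back to the static threshold a new installation starts with.
    `baseline` is (mean, stdev, sample_count) or None when no history exists."""
    if baseline is None or baseline[2] < min_samples:
        return value > static_limit           # static-threshold fallback
    mu, sigma, _ = baseline
    return abs(value - mu) > k * sigma        # baseline-deviation check
```

As history accumulates past the cutoff, alerts switch automatically from the fixed limit to the per-bucket band, with no configuration change required.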
## Design Notes
- Start simple: mean and standard deviation per metric per hour-of-day per day-of-week
- More sophisticated approaches (seasonal decomposition, exponential smoothing) can come later
- The baseline computation itself is cheap — it's an aggregation over data that's already stored
- The UX challenge is communicating "this is unusual for this time" vs "this crossed a fixed threshold" — the shaded band on charts is the clearest way
- Applies to both Dashboard and Lite, plus MCP analysis tools