Problem
The ErikAI analysis engine currently only does "always bad" (aggregate analysis) and "bad actor" (per-query) detection. The "oh shit" mode — detecting acute deviations from a server's own baseline — is not implemented.
We have a crude CPU_SPIKE fact (max >> avg heuristic), but real anomaly detection requires:
- Rolling baselines per server per metric (mean + standard deviation)
- Spike detection when a metric deviates beyond a threshold (e.g., 2+ standard deviations)
- Context-aware baselines (business hours vs overnight, weekday vs weekend)
- Correlation: "during this 4-minute spike window, what queries were running?"
Without this, the engine can't answer "what just changed?" — only "what's chronically wrong?"
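As an added illustration of the four requirements above (not ErikAI's own code), the following sketch keeps a rolling mean and standard deviation per server, metric and context bucket, flags samples beyond a z-score threshold, merges them into spike windows, and asks which queries overlapped each window. The server names, CPU values, query log and the 08:00-18:00 business-hours boundary are constructed assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Constructed one-minute samples for a single server/metric: (server, metric, timestamp, value).
# Normal load sits around 22% CPU; minutes 10-13 carry a constructed 96-99% spike.
SAMPLES = [("sql01", "cpu_pct", datetime(2024, 3, 12, 14, m), v) for m, v in
           enumerate([22, 20, 24, 21, 23, 22, 20, 24, 21, 23, 97, 99, 98, 96, 22, 21])]
# Constructed query log: (server, start, end, query text). Only the first overlaps the spike.
QUERIES = [
    ("sql01", datetime(2024, 3, 12, 14, 9), datetime(2024, 3, 12, 14, 14), "SELECT ... (HammerDB scan)"),
    ("sql01", datetime(2024, 3, 12, 14, 0), datetime(2024, 3, 12, 14, 2), "UPDATE ... (short)"),
    ("sql02", datetime(2024, 3, 12, 14, 10), datetime(2024, 3, 12, 14, 12), "SELECT ... (other server)"),
]

def context_bucket(ts):
    """Baseline context: weekday/weekend and business/overnight (business hours assumed 08:00-18:00)."""
    return ("weekend" if ts.weekday() >= 5 else "weekday", "business" if 8 <= ts.hour < 18 else "overnight")

def detect_spikes(samples, threshold=2.0, min_history=5):
    """Rolling per-(server, metric, context) baseline; emit an anomaly fact when |z| >= threshold."""
    history, facts = defaultdict(list), []
    for server, metric, ts, value in samples:
        hist = history[(server, metric, context_bucket(ts))]
        if len(hist) >= min_history:
            mu, sigma = mean(hist), pstdev(hist)
            z = (value - mu) / sigma if sigma > 0 else (0.0 if value == mu else float("inf"))
            if abs(z) >= threshold:
                facts.append({"server": server, "metric": metric, "at": ts, "value": value,
                              "baseline": round(mu, 1), "z": round(z, 1)})
                continue  # keep the spike out of the baseline so the rest of it isn't masked
        hist.append(value)
    return facts

def spike_windows(facts, max_gap_s=60):
    """Merge consecutive facts for one server/metric into (server, metric, start, end) windows."""
    windows = []
    for f in facts:
        w = windows[-1] if windows else None
        if w and w[0] == f["server"] and w[1] == f["metric"] and (f["at"] - w[3]).total_seconds() <= max_gap_s:
            w[3] = f["at"]
        else:
            windows.append([f["server"], f["metric"], f["at"], f["at"]])
    return [tuple(w) for w in windows]

def correlate(windows, queries):
    """Answer 'what was running during this window?': queries on the same server that overlap it."""
    return {w: [q for s, qs, qe, q in queries if s == w[0] and qs <= w[3] and qe >= w[2]] for w in windows}
```

Excluding flagged samples from the baseline avoids the masking effect where the first minute of a spike inflates the deviation enough to hide the rest; a genuine level shift then needs a separate re-baselining rule.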
Design Reference
See erikai-design.md memory file, section "How anomaly detection works" — covers baseline comparisons, context-aware baselines, and anomaly-as-fact integration.
Impact
Bursty workloads, sudden regressions, and transient blocking events are invisible to aggregate analysis. Users running HammerDB for 2 hours on an otherwise idle server see diluted findings instead of "CPU spiked to 99% at 14:49 because of this query."