Goal
Add monitoring, alerting, and kill the single-VPS SPOF (Phase 4 of #164).
Tasks
markuslindenberg/icecast_exporter (icecast_listeners on :9146) for history with near-zero code. Bind all admin/metrics ports to 127.0.0.1; never expose them publicly. Don't scrape per-client endpoints (cardinality blow-up).
probe_success == 0 for 5m on the public stream URL (relay-down, independent of server metrics) + absent(<listeners metric>) == 1. CRITICAL: do NOT use "or on() vector(0)" inside an alert rule: vector(0) is always present and < 1, so the alert fires permanently; that trick is for Grafana panels only. Keep "stream-down" and "zero-listeners" as two separate alerts, or gate zero-listeners on a "show-scheduled" signal. Add recording-failure alerts: r2_upload_failures_total > 0, recording byte-rate stalls while stream up, ffmpeg/tee restarts.
blank.strip covers the audio path; consider an EBU-R128 tap for alerting.
Restart=always + hard memory cap (cgroup MemoryMax).
Acceptance
Stopping the stream fires exactly one "stream-down" alert (not a permanently-firing one); a missing recording fires an alert; relay auto-restarts under memory pressure; public mount is fronted by a CDN/second relay.
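A possible starting point for the alert rules described under Tasks, written as a Prometheus rule file. This is a sketch under assumptions, not a tested config: the job label stream_probe, the show_scheduled gauge, the severity labels, and every "for" duration except the 5m on the probe are hypothetical. The upload-failure rule uses increase() over a window instead of the raw counter comparison so that the alert can resolve after failures stop; that substitution is also an assumption.

```yaml
# Sketch only. Hypothetical names: job="stream_probe", show_scheduled.
groups:
  - name: stream-alerts
    rules:
      # Relay-down: blackbox probe of the public stream URL, independent of
      # server-side metrics.
      - alert: StreamDown
        expr: probe_success{job="stream_probe"} == 0
        for: 5m
        labels:
          severity: critical

      # The listener metric has disappeared entirely (exporter or server gone).
      # No "or on() vector(0)" fallback here: it would make the expression
      # return a series at all times.
      - alert: ListenerMetricAbsent
        expr: absent(icecast_listeners) == 1
        for: 5m
        labels:
          severity: critical

      # Kept separate from StreamDown and gated on a show being scheduled,
      # so an empty room outside show hours does not page anyone.
      - alert: ZeroListeners
        expr: (sum(icecast_listeners) == 0) and on() (show_scheduled == 1)
        for: 10m
        labels:
          severity: warning

      # Recording failure: any new upload failures in the last 15 minutes.
      - alert: RecordingUploadFailing
        expr: increase(r2_upload_failures_total[15m]) > 0
        labels:
          severity: critical
```

If a fallback of the "or on() vector(0)" kind is wanted for dashboards, it belongs in the Grafana panel query only, not in any expr above.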
Depends on
Phase-0 spike (peak listeners) and #164 Phase-2 stack + telemetry ticket.
Parent: #164 (Phase 4).