Context
Today only the Grafana datasources are provisioned in code (`deploy/pkg/k8s/monitoring.go:546`). Alert rules and dashboards live in Grafana's PostgreSQL DB and are edited via the UI, so they are not versioned, not reviewable, and lost if Grafana's DB ever resets.
Surfaced while triaging recurring `Publish Endpoint Latency` and `Availability dropped below 95%` alerts that turned out to be metric-pipeline artifacts post-deploy. I wanted to add `for: 10m` and `noDataState: OK` to both rules, but that can only be done via the UI today.
Proposal
Add a `grafana-alerts` ConfigMap mounted at `/etc/grafana/provisioning/alerting/`, mirroring the existing datasources pattern, and move alert rules into it. Likely worth doing dashboards and notification policies at the same time.
Caveats
- Provisioned alerts are read-only in the UI, so every change goes through a PR thereafter. Worth confirming the team is OK with that.
- Existing rules need to be exported (Grafana provisioning API) and committed.
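For the export step, something along these lines might work as a one-off. It is a sketch that assumes the Grafana version in use exposes the alerting provisioning export endpoint (`/api/v1/provisioning/alert-rules/export`) and that a service account token is available; the `GRAFANA_URL` and `GRAFANA_TOKEN` environment variable names are made up here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// exportURL builds the export endpoint URL, asking for YAML output.
func exportURL(baseURL string) string {
	return strings.TrimRight(baseURL, "/") + "/api/v1/provisioning/alert-rules/export?format=yaml"
}

// exportAlertRules fetches all alert rules in provisioning-file format.
func exportAlertRules(client *http.Client, baseURL, token string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, exportURL(baseURL), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("export failed: %s: %s", resp.Status, body)
	}
	return body, nil
}

func main() {
	// Hypothetical environment variable names; adjust to the local setup.
	baseURL, token := os.Getenv("GRAFANA_URL"), os.Getenv("GRAFANA_TOKEN")
	if baseURL == "" || token == "" {
		fmt.Fprintln(os.Stderr, "set GRAFANA_URL and GRAFANA_TOKEN to run the export")
		return
	}
	client := &http.Client{Timeout: 30 * time.Second}
	rules, err := exportAlertRules(client, baseURL, token)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Write the YAML to stdout so it can be redirected into the repo.
	os.Stdout.Write(rules)
}
```

The output can be redirected to a file and reviewed before committing, since exported rules may carry UI-generated UIDs and folder names worth tidying first.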
Acceptance
- `deploy/README.md` updated with the new edit workflow