fix(infra): grafana 12 expression migration + alert message improvements #3530
Grafana 12.x removed the classic_condition expression type. Replace all alert rule conditions with the reduce (refId B) + threshold (refId C) pattern required by modern Grafana.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request migrates the Grafana alert rules in the production and stage environments from the removed classic_condition expression type to reduce and threshold expressions. The structural changes are correct, but the alert descriptions in the annotations still reference the raw range query ($values.A) instead of the reduced result ($values.B). Notifications may therefore show incorrect or missing values; updating these references to $values.B is recommended.
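The reviewer's point can be illustrated with a small model of the three-step pipeline. This is a simplified Python sketch, not Grafana's implementation: the function names and the sample series are hypothetical, and it assumes, as the review suggests, that only a single-valued step such as the reduce result is usable as `$values.<refId>` in an annotation.

```python
from typing import Dict, List


def reduce_last(series: List[float]) -> float:
    """Model of a 'reduce' expression with the 'last' reducer: series -> one number."""
    return series[-1]


def threshold_gt(value: float, limit: float) -> bool:
    """Model of a 'threshold' expression with a 'gt' evaluator."""
    return value > limit


def evaluate(series_a: List[float], limit: float) -> Dict[str, object]:
    """Run A (range query result) -> B (reduce) -> C (threshold)."""
    b = reduce_last(series_a)
    c = threshold_gt(b, limit)
    # Assumption: a range query yields many points, so there is no single
    # "value of A" to print; the reduced value B is what a message can show.
    return {"template_values": {"B": b}, "firing": c}


# Hypothetical sample: a usage ratio climbing past a 0.9 limit.
result = evaluate([0.71, 0.78, 0.93], limit=0.9)
print(result)
```

In this model the annotation would read the value under `B`, which matches the review's recommendation to reference `$values.B`.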
Code review
No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add severity-based notification policy routes:
  * critical: 1h repeat
  * warning (default): 24h repeat
  * error state: 7d repeat (avoid spam from rule errors)
- Translate alert titles, summaries, descriptions to Korean
- Add action guidance to descriptions (commands to investigate)
- Display current value with proper formatting (humanize1024 for bytes)
- Adjust 'for' duration:
  * Node Not Ready: 2m (faster critical detection)
  * RDS DatabaseConnections: 10m (absorb traffic spikes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
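The severity-based routing in this commit can be pictured as a first-match lookup from alert labels to a repeat interval. The Python sketch below is a simplified model, not Grafana's routing code. The route table is hypothetical, and matching error-state alerts on `alertname=DatasourceError` is an assumption, since the commit message does not say which label the PR uses.

```python
from typing import Dict, List, Tuple

# Hypothetical route table mirroring the commit message: (label matchers, repeat interval).
# Assumption: error-state alerts are identified by alertname=DatasourceError.
# The error route is listed first so an erroring critical rule does not repeat hourly.
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"alertname": "DatasourceError"}, "7d"),
    ({"severity": "critical"}, "1h"),
]
DEFAULT_REPEAT = "24h"  # warning (default)


def repeat_interval(labels: Dict[str, str]) -> str:
    """Return the repeat interval of the first route whose matchers all match."""
    for matchers, interval in ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return interval
    return DEFAULT_REPEAT


print(repeat_interval({"severity": "critical"}))
print(repeat_interval({"severity": "warning"}))
print(repeat_interval({"alertname": "DatasourceError", "severity": "critical"}))
```

Putting the error route ahead of the severity routes is a design choice of this sketch; whether the PR orders its routes the same way is not stated.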
Code review
No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code
… syntax
- Remove unquoted matcher values per Grafana provisioning docs format
- Fix node-not-ready: drop `== 0` filter from PromQL, use threshold lt 1
  Previous logic returned series only when value=0 (NotReady), then
  threshold gt 0 evaluated to false → alert never fired

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
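The node-not-ready bug described in this commit can be reproduced with a toy model. The Python sketch below is illustrative only: the helper names and sample values are hypothetical, the readiness gauge is assumed to be 1 for Ready and 0 for NotReady, and "no data" is treated as not firing, which simplifies how Grafana handles that state.

```python
from typing import List, Optional


def promql_filter_eq_zero(samples: List[float]) -> List[float]:
    """Model of `<expr> == 0`: a PromQL comparison filter keeps only matching samples."""
    return [s for s in samples if s == 0]


def reduce_last(samples: List[float]) -> Optional[float]:
    """Model of a 'last' reducer; None stands in for 'no data'."""
    return samples[-1] if samples else None


def fires(value: Optional[float], op: str, limit: float) -> bool:
    """Model of a threshold step; in this sketch, no data never satisfies it."""
    if value is None:
        return False
    return value > limit if op == "gt" else value < limit


not_ready = [1.0, 1.0, 0.0]  # node just went NotReady
ready = [1.0, 1.0, 1.0]      # node stays Ready

# Old rule: filter to value == 0, then threshold gt 0. The only surviving value is 0.
old_not_ready = fires(reduce_last(promql_filter_eq_zero(not_ready)), "gt", 0)
# New rule: no filter, threshold lt 1.
new_not_ready = fires(reduce_last(not_ready), "lt", 1)
new_ready = fires(reduce_last(ready), "lt", 1)

print(old_not_ready, new_not_ready, new_ready)  # False True False
```

The old rule could only ever see the value 0, which is never greater than 0, so it stayed silent exactly when the node was NotReady.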
Description
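This PR fixes an issue where every alert rule fired in an error state because Grafana 12.x removed the `classic_condition` expression type, and it improves the quality of the alert messages.

Changes:
- Expression type migration (required fix)
  * Replace `classic_condition` with the `reduce` (refId B) + `threshold` (refId C) pattern
- Differentiated notification policies (spam prevention)
- Alert messages translated to Korean, with action guidance
  * Current value shown in readable form (bytes → human-readable via `humanize1024`)
  * Investigation commands included (`kubectl top pod -A`, `df -h`, etc.)
  * `for` durations adjusted

Additional context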
invalid command type in expression 'C': 'classic_condition' is not a recognized expression type
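For reference, the `humanize1024` formatting used for byte values in the alert messages can be approximated as follows. This is a Python sketch that assumes the template function behaves like the Prometheus function of the same name (divide by 1024 repeatedly, print four significant digits, append a binary prefix). It is not the code Grafana runs.

```python
import math


def humanize1024(value: float) -> str:
    """Approximate Prometheus-style humanize1024: scale by 1024 and add a binary prefix."""
    # Small, NaN, and infinite values are printed without a prefix.
    if abs(value) <= 1 or math.isnan(value) or math.isinf(value):
        return f"{value:.4g}"
    prefix = ""
    for candidate in ["ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]:
        if abs(value) < 1024:
            break
        prefix = candidate
        value /= 1024
    return f"{value:.4g}{prefix}"


print(humanize1024(1536))        # 1.5ki
print(humanize1024(3221225472))  # 3Gi
```

A value such as 3221225472 bytes would then appear in a notification as `3Gi` rather than as a ten-digit number.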