Alerts: Improve HAProxyReloadFail alert
openshift/router#209 reworks the HAProxy reload failure metric so that
the HAProxyReloadFail alert can be improved. It replaces the
template_router_reload_fails metric with template_router_reload_failure,
a simple boolean gauge, which allows the HAProxyReloadFail alert to fire
for the duration of the HAProxy reload outage. Previously, the
HAProxyReloadFail alert would fire for ~5 minutes regardless of whether
reloads were still failing on the router. This commit also drops the
HAProxyReloadFail alert to warning severity.
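
The difference between the two alert expressions can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the router's or Prometheus's actual behavior: it assumes the old metric was a counter incremented once per failed reload attempt, that no new reload is attempted until routes change again, and it approximates `increase(...[5m])` as a plain difference over five one-minute samples (real Prometheus extrapolates). The `for: 5m` clause is not modeled, and the helper names `simulate`, `condition_minutes_old`, and `condition_minutes_new` are hypothetical.

```python
from typing import List, Tuple

WINDOW = 5  # minutes; mirrors the [5m] range in the old expression


def simulate(fail_at: int = 2, recover_at: int = 20,
             total: int = 25) -> Tuple[List[int], List[int]]:
    """Return per-minute samples of (counter, gauge) for one reload outage.

    Assumption: a single reload attempt fails at `fail_at`, and the next
    attempt (which succeeds) only happens at `recover_at`.
    """
    counter, gauge = [], []
    fails = 0    # hypothetical counter: total failed reload attempts
    failing = 0  # boolean gauge: 1 while the last reload attempt failed
    for t in range(total):
        if t == fail_at:
            fails += 1
            failing = 1
        if t == recover_at:
            failing = 0
        counter.append(fails)
        gauge.append(failing)
    return counter, gauge


def condition_minutes_old(counter: List[int], window: int = WINDOW) -> List[int]:
    """Minutes where a simplified increase(counter[window]) > 0 holds."""
    active = []
    for t, value in enumerate(counter):
        # Compare against the sample `window` minutes back (or the first one).
        earlier = counter[t - window] if t >= window else counter[0]
        if value - earlier > 0:
            active.append(t)
    return active


def condition_minutes_new(gauge: List[int]) -> List[int]:
    """Minutes where the boolean gauge expression `gauge == 1` holds."""
    return [t for t, value in enumerate(gauge) if value == 1]


if __name__ == "__main__":
    counter, gauge = simulate()
    print("old condition true at minutes:", condition_minutes_old(counter))
    print("new condition true at minutes:", condition_minutes_new(gauge))
```

Under these assumptions, the counter-based condition holds only for the five minutes after the failed attempt, while the gauge-based condition holds for the whole outage, which matches the motivation given in the commit message.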
sgreene570 committed Nov 11, 2020
1 parent dd93c33 commit c19ceeb
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions manifests/0000_90_ingress-operator_03_prometheusrules.yaml
@@ -10,12 +10,12 @@ spec:
   - name: openshift-ingress.rules
     rules:
     - alert: HAProxyReloadFail
-      expr: increase(template_router_reload_fails[5m]) > 0
+      expr: template_router_reload_failure == 1
       for: 5m
       labels:
-        severity: critical
+        severity: warning
       annotations:
-        message: "HAProxy reloads have failed on {{ $labels.pod }}. Router is not respecting recently created or modified routes"
+        message: "HAProxy reloads are failing on {{ $labels.pod }}. Router is not respecting recently created or modified routes"
     - alert: HAProxyDown
       expr: haproxy_up == 0
       for: 5m