Alerts: Improve HAProxyReloadFail alert
openshift/router#209 reworks the HAProxy reload failure metric so that the
HAProxyReloadFail alert can be improved. The new template_router_reload_failure
metric introduced in openshift/router#209 replaces the template_router_reload_fails
metric. It is a simple boolean gauge (1 while reloads are failing, 0 otherwise),
which allows the HAProxyReloadFail alert to fire for the full duration of an
HAProxy reload outage. Previously, the HAProxyReloadFail alert would fire for
~5 minutes regardless of whether reloads were still failing on the router.
This commit also lowers the HAProxyReloadFail alert to warning severity.
sgreene570 committed Nov 11, 2020
1 parent 49d3e86 commit e1bcb1b
Showing 1 changed file with 27 additions and 0 deletions.
manifests/0000_90_ingress-operator_03_prometheusrules.yaml: 27 additions, 0 deletions
@@ -0,0 +1,27 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-operator
  namespace: openshift-ingress-operator
  labels:
    role: alert-rules
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
spec:
  groups:
  - name: openshift-ingress.rules
    rules:
    - alert: HAProxyReloadFail
      expr: template_router_reload_failure == 1
      for: 5m
      labels:
        severity: warning
      annotations:
        message: "HAProxy reloads are failing on {{ $labels.pod }}. Router is not respecting recently created or modified routes"
    - alert: HAProxyDown
      expr: haproxy_up == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        message: "HAProxy metrics are reporting that the router is down"
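
The commit's main claim is that, with a 0/1 gauge and `for: 5m`, HAProxyReloadFail stays firing exactly as long as reloads keep failing and resolves once they succeed again. One way to check that locally is a Prometheus rule unit test run with `promtool test rules`. This is a hypothetical sketch, not part of the commit: it assumes the `groups:` list from the manifest's `spec` has been copied into a plain rules file named `haproxy-rules.yaml` (promtool reads plain rule files, not the `PrometheusRule` custom resource), and the pod name `router-default-abc` is made up.

```yaml
# Hypothetical promtool unit test: promtool test rules haproxy-rules-test.yaml
# Assumes haproxy-rules.yaml contains the `groups:` list from the manifest's spec.
rule_files:
  - haproxy-rules.yaml

evaluation_interval: 1m

tests:
  # Reloads fail from 0m through 7m, then succeed again from 8m onward.
  - interval: 1m
    input_series:
      - series: 'template_router_reload_failure{pod="router-default-abc"}'
        values: '1 1 1 1 1 1 1 1 0 0 0'
    alert_rule_test:
      # Condition true for only 4m: the alert is still pending, so nothing fires.
      - eval_time: 4m
        alertname: HAProxyReloadFail
        exp_alerts: []
      # Condition true for 6m, past the 5m `for` window: the alert fires.
      - eval_time: 6m
        alertname: HAProxyReloadFail
        exp_alerts:
          - exp_labels:
              severity: warning
              pod: router-default-abc
            exp_annotations:
              message: "HAProxy reloads are failing on router-default-abc. Router is not respecting recently created or modified routes"
      # Gauge is back to 0: the expression returns no series and the alert resolves.
      - eval_time: 9m
        alertname: HAProxyReloadFail
        exp_alerts: []

  # HAProxy reported down for 6m: HAProxyDown fires at critical severity.
  - interval: 1m
    input_series:
      - series: 'haproxy_up{pod="router-default-abc"}'
        values: '0 0 0 0 0 0 0'
    alert_rule_test:
      - eval_time: 6m
        alertname: HAProxyDown
        exp_alerts:
          - exp_labels:
              severity: critical
              pod: router-default-abc
            exp_annotations:
              message: "HAProxy metrics are reporting that the router is down"
```

promtool only compares firing alerts and adds the `alertname` label itself, which is why the pending (4m) and resolved (9m) evaluations expect an empty list and `exp_labels` lists only the series and rule labels.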
