Commit

Added alerting.restarts.enabled
bastianeicher committed Jan 17, 2024
1 parent 9d8a9b7 commit d7aa745
Showing 4 changed files with 15 additions and 0 deletions.
1 change: 1 addition & 0 deletions charts/generic-service/README.md
@@ -141,6 +141,7 @@ app:
| `alerting.enabled` | `false` | Deploys Prometheus alert rule for issues like unavailable pods or high memory use |
| `alerting.pod.maxStartupSeconds` | `120` | The maximum amount of time a Pod is allowed to take for startup |
| `alerting.pod.maxAgeSeconds` | | The maximum allowed age of a `Pod` in seconds (useful to ensure regular deployments) |
| `alerting.restarts.enabled` | `true` | Deploys Prometheus alert rule for unexpected container restarts |
| `alerting.memory.enabled` | `true` | Enables alerts relating to memory usage |
| `alerting.memory.maxUsageFactor` | `0.9` | The maximum usage factor of the memory limit (between `0` and `1`) |
| `alerting.memory.quotaBufferFactor` | `1.0` | Multiplied with `resources.*.memory` to determine minimum allowed unused memory quota in namespace |
2 changes: 2 additions & 0 deletions charts/generic-service/templates/alerts.yaml
@@ -45,6 +45,7 @@ spec:
{{- end }}
{{- end }}

{{- if .Values.alerting.restarts.enabled }}
- alert: ContainerRestart
# Avoid constantly retriggering during crash loops by comparing over interval slightly longer than CrashLoopBackOff upper limit (5m).
# Don't trigger during startup grace period (service might just be waiting for dependencies).
@@ -56,6 +57,7 @@ spec:
topic: availability
annotations: {{- include "generic-service.alert-annotations" . | nindent 12 }} crash/restart
description: '{{"{{ $labels.pod }}"}} has crashed/restarted.'
{{- end }}

{{- if .Values.alerting.pod.maxAgeSeconds }}
- alert: PodTooOld
10 changes: 10 additions & 0 deletions charts/generic-service/values.schema.json
@@ -784,6 +784,16 @@
},
"additionalProperties": false
},
"restarts": {
"type": "object",
"properties": {
"enabled": {
"type": "boolean",
"default": true,
"description": "Deploys Prometheus alert rule for unexpected container restarts"
}
}
},
"memory": {
"type": "object",
"properties": {
2 changes: 2 additions & 0 deletions charts/generic-service/values.yaml
@@ -154,6 +154,8 @@ alerting:
pod:
maxStartupSeconds: 120
maxAgeSeconds:
restarts:
enabled: true
memory:
enabled: true
maxUsageFactor: 0.9
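
Usage note: since `alerting.restarts.enabled` defaults to `true`, the `ContainerRestart` rule is only skipped when a deployment opts out explicitly. A minimal sketch of such an override, assuming a values file passed to Helm with `-f` (the filename `my-values.yaml` is hypothetical and not part of this commit):

```yaml
# my-values.yaml (hypothetical override file, not part of this commit)
alerting:
  enabled: true      # chart default is false; alert rules are only deployed when this is set
  restarts:
    enabled: false   # opt out of the ContainerRestart alert rule added by this commit
```

With this override, the other alert rules are presumably rendered as before and only the block guarded by `.Values.alerting.restarts.enabled` in `templates/alerts.yaml` is omitted.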
