diff --git a/mission-control-chart b/mission-control-chart
index ba28f078..5f2fde6c 160000
--- a/mission-control-chart
+++ b/mission-control-chart
@@ -1 +1 @@
-Subproject commit ba28f078b57102bf6259041e041ed4fce36060eb
+Subproject commit 5f2fde6cebc65f39d701ad4311d30c7c0d660b9a
diff --git a/mission-control/docs/guide/notifications/concepts/grouping.md b/mission-control/docs/guide/notifications/concepts/grouping.md
new file mode 100644
index 00000000..4fa6c7ae
--- /dev/null
+++ b/mission-control/docs/guide/notifications/concepts/grouping.md
@@ -0,0 +1,24 @@
+---
+title: Grouping
+sidebar_custom_props:
+  icon: group
+---
+
+Mission Control may generate multiple related notifications within a short time window. Instead of sending each alert separately,
+you can use notification grouping to merge multiple events into a single message.
+
+_Example_: When multiple Helm releases fail to upgrade because of a common unavailable dependency,
+you can use notification grouping to merge the notifications for all affected Helm releases into a single message.
+
+The `groupBy` parameter defines how notifications are grouped.
+You can group by:
+
+- `type` (the config type)
+- `description`
+- `status_reason`
+- `label` in the format `label:app`
+- `tag` in the format `tag:namespace`
+
+```yaml title="" file=/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
+
+```
diff --git a/mission-control/docs/guide/notifications/concepts/inhibition.mdx b/mission-control/docs/guide/notifications/concepts/inhibition.mdx
new file mode 100644
index 00000000..beb48200
--- /dev/null
+++ b/mission-control/docs/guide/notifications/concepts/inhibition.mdx
@@ -0,0 +1,20 @@
+---
+title: Inhibition
+sidebar_custom_props:
+  icon: block
+---
+
+import Inhibition from '../../../reference/notifications/_inhibition.mdx';
+
+Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
+you can use notification inhibition to suppress redundant notifications based on the resource hierarchy.
+
+_Example_: When a Kubernetes pod becomes unhealthy, its ReplicaSet and Deployment also become unhealthy.
+If you have a notification set up to alert on `config.unhealthy`, you'll receive three separate notifications for the same root cause.
+
+```yaml title="deployment-with-inhibition.yaml" file=/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml
+```
+
+
+
+
diff --git a/mission-control/docs/guide/notifications/concepts/repeat-interval.mdx b/mission-control/docs/guide/notifications/concepts/repeat-interval.mdx
index b36b1a8b..cca6c6ac 100644
--- a/mission-control/docs/guide/notifications/concepts/repeat-interval.mdx
+++ b/mission-control/docs/guide/notifications/concepts/repeat-interval.mdx
@@ -4,7 +4,7 @@ sidebar_custom_props:
   icon: dedupe
 ---
 
-The repeat interval determines the duration between subsequent notifications after an initial successful delivery.
+The repeat interval sets the minimum time between subsequent related notifications after an initial successful delivery.
 
 ```yaml title="deployment-failed.yaml"
 apiVersion: mission-control.flanksource.com/v1
@@ -14,24 +14,33 @@ metadata:
   namespace: default
 spec:
   events:
-    - config.healthy
     - config.unhealthy
-    - config.warning
-    - config.unknown
   filter: config.type == "Kubernetes::Deployment"
   to:
     email: alerts@acme.com
   repeatInterval: 2h
+  groupBy:
+    - type
+  groupByInterval: 12h
 ```
 
-## Grouping Per Resource per Source Event
+With the above notification in place, if a Kubernetes Deployment's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system sends only one notification in that period.
 
-The `repeatInterval` applies per unique resource per unique event source. This prevents duplicate notifications when the same resource triggers the same event type multiple times within the interval window. It still allows notifications for different event types on the same resource within the window.
+### Repeat groups
 
-### Example:
+The repeat interval works together with [notification grouping](./grouping.md).
+If multiple notifications fall into the same group, only one notification is sent for that group within the repeat interval.
 
-With the above notification in place, if a Kubernetes Helm release's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system limits notifications to just two: one for the `config.healthy` event and one for the `config.unhealthy` event.
+#### Example
 
-However, if the Helm release health shifts to `warning` during this same period, it triggers an additional notification. This occurs because the warning status is considered a separate event source.
+Deployment A becomes unhealthy due to a missing storage class, triggering a notification.
+Soon after, Deployment B also becomes unhealthy for the same reason. Because B is grouped with A, no additional notification is sent during the repeat interval.
+After the 2-hour interval passes, if Deployment C also becomes unhealthy for the same reason, a new notification is sent for C, and it also includes A and B.
 
-The notification throttling mechanism operates independently for each distinct resource. As a result, other Helm releases are not affected by this limitation.
+
+
+| Time  | Deployment | Status    | Action taken                                                  |
+|-------|------------|-----------|---------------------------------------------------------------|
+| 10:00 | A          | Unhealthy | Notification sent _(first in group)_                          |
+| 10:15 | B          | Unhealthy | Suppressed by the repeat interval _(grouped with A for 12h)_  |
+| 12:10 | C          | Unhealthy | Notification sent _(repeat interval expired)_; the message includes A, B, and C _(the group is still active because `groupByInterval` is 12h)_ |
\ No newline at end of file
diff --git a/mission-control/docs/guide/notifications/concepts/wait-for.mdx b/mission-control/docs/guide/notifications/concepts/wait-for.mdx
index 96783a77..d8c26fd0 100644
--- a/mission-control/docs/guide/notifications/concepts/wait-for.mdx
+++ b/mission-control/docs/guide/notifications/concepts/wait-for.mdx
@@ -51,30 +51,4 @@ spec:
   waitFor: 5m
   //highlight-next-line
   waitForEvalPeriod: 30s
-```
-:::
-
-### Grouping Notifications
-
-Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
-you can use notification grouping to consolidate multiple events into a single message.
-
-_Example_: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy.
-If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications at the very least for the same cause.
-
-The `groupBy` parameter allows you to define how notifications should be grouped.
-Grouping can be done via
-- `type` (type of the config)
-- `description`
-- `status_reason`
-- `labels` in the format `labels:app`
-- `tags` in the format `tag:namespace`
-
-:::info
-Grouping only works with waitFor.
-Hence, a waitFor duration is required
-:::
-
-
-```yaml title="" file=/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
 ```
\ No newline at end of file
diff --git a/mission-control/docs/reference/notifications/_inhibition.mdx b/mission-control/docs/reference/notifications/_inhibition.mdx
new file mode 100644
index 00000000..f4aec630
--- /dev/null
+++ b/mission-control/docs/reference/notifications/_inhibition.mdx
@@ -0,0 +1,33 @@
+
+
diff --git a/mission-control/docs/reference/notifications/_notification.mdx b/mission-control/docs/reference/notifications/_notification.mdx
index 7ca3d7c2..86e9f978 100644
--- a/mission-control/docs/reference/notifications/_notification.mdx
+++ b/mission-control/docs/reference/notifications/_notification.mdx
@@ -1,3 +1,5 @@
+import Inhibition from '../../reference/notifications/_inhibition.mdx';
+
 /` of the playbook that should be triggered when this notification fires.",
     scheme: "string",
   },
+  {
+    field: "inhibitions",
+    description: "Inhibitions suppress redundant notifications based on the resource hierarchy.",
+    scheme: "[`[]Inhibition`](#inhibition)",
+  }
 ]}/>
 
 :::info Single Recipient
 Only one recipient can be specified
 :::
+
+### Inhibition
+
+
diff --git a/modules/canary-checker b/modules/canary-checker
index ab9f263f..8a07be11 160000
--- a/modules/canary-checker
+++ b/modules/canary-checker
@@ -1 +1 @@
-Subproject commit ab9f263f1a357fda68b32a1ecceba465fa11f037
+Subproject commit 8a07be11ce6cac1bba80983cd0d0fd886bf94015
diff --git a/modules/config-db b/modules/config-db
index 13663fa2..e420722a 160000
--- a/modules/config-db
+++ b/modules/config-db
@@ -1 +1 @@
-Subproject commit 13663fa292091000c5b088a25322af9d7aa965f5
+Subproject commit e420722ac1eb0e5bea7d60e4ac535ef71bd85ae0
diff --git a/modules/duty b/modules/duty
index d8af8506..7456d2b4 160000
--- a/modules/duty
+++ b/modules/duty
@@ -1 +1 @@
-Subproject commit d8af8506a672319e8c3013a1fee6ce6604091b27
+Subproject commit 7456d2b41d75356da25ea4819281420efd7fe071
diff --git a/modules/mission-control b/modules/mission-control
index c4e98493..1e72c906 160000
--- a/modules/mission-control
+++ b/modules/mission-control
@@ -1 +1 @@
-Subproject commit c4e984932a2d95ece00f79066881ce20ce942b79
+Subproject commit 1e72c906c4d8ebd998f737bb61370e95b90c54f5
diff --git a/modules/mission-control-chart b/modules/mission-control-chart
index 100b5816..5f2fde6c 160000
--- a/modules/mission-control-chart
+++ b/modules/mission-control-chart
@@ -1 +1 @@
-Subproject commit 100b5816751795a138dc19a43e0e4a8986370c19
+Subproject commit 5f2fde6cebc65f39d701ad4311d30c7c0d660b9a
diff --git a/modules/mission-control-registry b/modules/mission-control-registry
index 81297714..5d644939 160000
--- a/modules/mission-control-registry
+++ b/modules/mission-control-registry
@@ -1 +1 @@
-Subproject commit 81297714d0ee27fc6be69085c41f7694417caa08
+Subproject commit 5d6449397af9a8fa246b600b684f36ec01b91c32
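To make the interaction between `groupBy`, `repeatInterval`, and `groupByInterval` in the `repeat-interval.mdx` example table easier to reason about, here is a minimal Python simulation. It is a hypothetical model, not Mission Control's implementation: `Config`, `group_key`, `Group`, `Notifier`, `on_unhealthy`, and `at` are invented for this sketch. It also makes two assumptions the docs do not state: a group stays open for `groupByInterval` measured from its first event, and the repeat interval is measured from the group's last sent notification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical model of the documented groupBy / repeatInterval / groupByInterval
# behaviour. It is NOT Mission Control's implementation; all names are invented.


@dataclass
class Config:
    name: str
    type: str
    status_reason: str = ""
    description: str = ""
    labels: dict = field(default_factory=dict)
    tags: dict = field(default_factory=dict)


def group_key(cfg: Config, group_by: list) -> tuple:
    """Build a grouping key from selectors such as "type" or "label:app"."""
    parts = []
    for selector in group_by:
        if selector.startswith("label:"):
            parts.append(cfg.labels.get(selector.split(":", 1)[1], ""))
        elif selector.startswith("tag:"):
            parts.append(cfg.tags.get(selector.split(":", 1)[1], ""))
        else:
            # "type", "description" or "status_reason" map to Config attributes.
            parts.append(getattr(cfg, selector))
    return tuple(parts)


@dataclass
class Group:
    opened: datetime  # assumption: the group lives for groupByInterval from here
    members: list = field(default_factory=list)
    last_sent: Optional[datetime] = None


class Notifier:
    def __init__(self, group_by, repeat_interval, group_by_interval):
        self.group_by = group_by
        self.repeat_interval = repeat_interval
        self.group_by_interval = group_by_interval
        self.groups = {}

    def on_unhealthy(self, cfg: Config, now: datetime) -> Optional[list]:
        """Return the resources named in the sent notification, or None if suppressed."""
        key = group_key(cfg, self.group_by)
        group = self.groups.get(key)
        if group is None or now - group.opened >= self.group_by_interval:
            group = Group(opened=now)  # no active group: start a fresh one
            self.groups[key] = group
        if cfg.name not in group.members:
            group.members.append(cfg.name)
        # Assumption: the repeat interval runs from the group's last sent notification.
        if group.last_sent is not None and now - group.last_sent < self.repeat_interval:
            return None
        group.last_sent = now
        return list(group.members)


def at(hhmm: str) -> datetime:
    """Turn "HH:MM" into a datetime on an arbitrary fixed day."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2024, 1, 1, hour, minute)


if __name__ == "__main__":
    notifier = Notifier(["type"], timedelta(hours=2), timedelta(hours=12))
    for t, name in [("10:00", "A"), ("10:15", "B"), ("12:10", "C")]:
        cfg = Config(name, "Kubernetes::Deployment", "missing storage class")
        print(t, name, notifier.on_unhealthy(cfg, at(t)))
    # 10:00 A ['A']
    # 10:15 B None
    # 12:10 C ['A', 'B', 'C']
```

Running the script reproduces the three rows of the table: A is sent, B is suppressed, and C triggers a notification that lists A, B, and C. In this model, adding `status_reason` to the selectors (for example `["type", "status_reason"]`) would keep Deployments that fail for unrelated reasons out of the storage-class group.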