24 changes: 24 additions & 0 deletions mission-control/docs/guide/notifications/concepts/grouping.md
@@ -0,0 +1,24 @@
---
title: Grouping
sidebar_custom_props:
  icon: group
---

Mission Control may generate multiple related notifications within a short time window. Instead of sending each alert separately,
you can use notification grouping to merge multiple events into a single message.

_Example_: When multiple Helm releases fail to upgrade because of a common unavailable dependency,
you can use notification grouping to merge the notifications for all the affected Helm releases into a single message.

The `groupBy` parameter lets you define how to group notifications.
You can group by:

- `type` (type of the config)
- `description`
- `status_reason`
- `label` in the format `label:app`
- `tag` in the format `tag:namespace`

```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}

```
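
As a rough sketch of the Helm release case above, a notification could group by config type and by the `namespace` tag. The resource name, the filter value `Kubernetes::HelmRelease`, and the recipient address are illustrative assumptions and are not taken from the referenced fixture; only the `groupBy` keys and their formats come from the list above.

```yaml
# Hypothetical example: name, filter, and recipient are placeholders.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: helm-release-health # assumed name
  namespace: default
spec:
  events:
    - config.unhealthy
  filter: config.type == "Kubernetes::HelmRelease" # assumed config type
  to:
    email: alerts@acme.com
  groupBy:
    - type # group by the type of the config
    - tag:namespace # and by the value of the `namespace` tag
```

With a configuration along these lines, unhealthy Helm releases that share a type and namespace would be expected to land in one group rather than producing one message each.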
20 changes: 20 additions & 0 deletions mission-control/docs/guide/notifications/concepts/inhibition.mdx
@@ -0,0 +1,20 @@
---
title: Inhibition
sidebar_custom_props:
  icon: block
---

import Inhibition from '../../../reference/notifications/_inhibition.mdx';

Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
you can use notification inhibition to suppress notifications for related resources based on the resource hierarchy.

_Example_: When a Kubernetes pod becomes unhealthy, its ReplicaSet and Deployment also become unhealthy.
If you have a notification set up to alert on `config.unhealthy`, you'll receive three separate notifications for the same cause.

```yaml title="deployment-with-inhibition.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml
```
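
The following is a minimal sketch of how the `inhibitions` fields documented in this change (`from`, `to`, `direction`, `depth`) might fit together for the Deployment hierarchy described above. It is not the content of the referenced fixture; the resource name, recipient, and chosen values are assumptions for illustration only.

```yaml
# Hypothetical example: name, recipient, and values are placeholders.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: deployment-inhibition-sketch # assumed name
  namespace: default
spec:
  events:
    - config.unhealthy
  to:
    email: alerts@acme.com
  inhibitions:
    # Start at a Deployment and follow outgoing (child) relationships
    # up to two levels down, matching ReplicaSets and Pods.
    - from: Kubernetes::Deployment
      direction: outgoing
      depth: 2
      to:
        - Kubernetes::ReplicaSet
        - Kubernetes::Pod
```

The intent of such a rule is that one unhealthy hierarchy yields a single alert instead of one per related resource.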

<Inhibition />


@@ -4,7 +4,7 @@ sidebar_custom_props:
  icon: dedupe
---

The repeat interval determines the duration between subsequent notifications after an initial successful delivery.
The repeat interval determines the duration between subsequent related notifications after an initial successful delivery.

```yaml title="deployment-failed.yaml"
apiVersion: mission-control.flanksource.com/v1
@@ -14,24 +14,33 @@ metadata:
  namespace: default
spec:
  events:
    - config.healthy
    - config.unhealthy
    - config.warning
    - config.unknown
  filter: config.type == "Kubernetes::Deployment"
  to:
    email: alerts@acme.com
  repeatInterval: 2h
  groupBy:
    - type
  groupByInterval: 12h
```

## Grouping Per Resource per Source Event
With the above notification in place, if a Kubernetes Deployment's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system only sends one notification in that period.

The `repeatInterval` applies per unique resource per unique event source. This prevents duplicate notifications when the same resource triggers the same event type multiple times within the interval window. It still allows notifications for different event types on the same resource within the window.
### Repeat groups

### Example:
Repeat interval works in tandem with [notification grouping](./grouping.md).
If multiple notifications fall in the same group, only one notification will be sent for the group within the repeat interval.

With the above notification in place, if a Kubernetes Helm release's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system limits notifications to just two: one for the `config.healthy` event and one for the `config.unhealthy` event.
#### Example:

However, if the Helm release health shifts to `warning` during this same period, it triggers an additional notification. This occurs because the warning status is considered a separate event source.
Deployment A becomes unhealthy due to a missing storage class, triggering a notification.
Soon after, Deployment B also turns unhealthy for the same reason. Since it’s grouped with A, no additional notification is sent during the repeat interval.
After the 2-hour interval passes, if Deployment C also becomes unhealthy for the same issue, a new notification is sent for C, and it also includes A and B.

The notification throttling mechanism operates independently for each distinct resource. As a result, other Helm releases are not affected by this limitation.


| Time  | Deployment | Status    | Action taken |
|-------|------------|-----------|--------------|
| 10:00 | A | Unhealthy | Notification sent _(first in group)_ |
| 10:15 | B | Unhealthy | Suppressed due to repeat interval _(grouped with A for 12h)_ |
| 12:10 | C | Unhealthy | Notification sent _(repeat interval expired)_. Includes A, B, and C in the message _(group is still active since `groupByInterval` is 12h)_ |
26 changes: 0 additions & 26 deletions mission-control/docs/guide/notifications/concepts/wait-for.mdx
@@ -51,30 +51,4 @@ spec:
waitFor: 5m
//highlight-next-line
waitForEvalPeriod: 30s
```
:::

### Grouping Notifications

Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
you can use notification grouping to consolidate multiple events into a single message.

_Example_: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy.
If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications at the very least for the same cause.

The `groupBy` parameter allows you to define how notifications should be grouped.
Grouping can be done via
- `type` (type of the config)
- `description`
- `status_reason`
- `labels` in the format `labels:app`
- `tags` in the format `tag:namespace`

:::info
Grouping only works with waitFor.
Hence, a waitFor duration is required
:::


```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
```
33 changes: 33 additions & 0 deletions mission-control/docs/reference/notifications/_inhibition.mdx
@@ -0,0 +1,33 @@
<Fields
rows={[
{
field: 'depth',
scheme: 'int',
description: 'Defines how many levels of child or parent resources to traverse.'
},
{
field: 'direction',
scheme: '`incoming`|`outgoing`|`both`',
required: true,
description: 'Specifies the traversal direction in relation to the "From" resource. Can be "outgoing" (looks for child resources), "incoming" (looks for parent resources), or "all" (considers both).'
},
{
field: 'from',
scheme: '`string`',
required: true,
description: 'Specifies the starting resource type (for example, "Kubernetes::Deployment").'
},
{
field: 'soft',
scheme: 'bool',
description: 'When true, relates using soft relationships. Example: Deployment to Pod is a hard relationship, but Node to Pod is a soft relationship.'
},
{
field: 'to',
scheme: '`[]string`',
required: true,
description: 'Specifies the target resource types to look for from the `from` resource (for example, "Kubernetes::Pod"), resolved according to `direction`.'
}
]}
/>

16 changes: 16 additions & 0 deletions mission-control/docs/reference/notifications/_notification.mdx
@@ -1,3 +1,5 @@
import Inhibition from '../../reference/notifications/_inhibition.mdx';

<Fields withTemplates="true" rows={[
{
field: "events",
@@ -60,6 +62,11 @@
description: "Group notifications that are in waiting stage based on labels, tags and attributes. Only applicable when `waitFor` is provided. See [Grouping attributes](../../guide/notifications/concepts/wait-for#grouping-notifications)",
scheme: "[]string"
},
{
field: "groupByInterval",
description: "The maximum duration for which notifications will be grouped together before creating a new group. _(Default: 24h)_",
scheme: "duration"
},
{
field: "title",
description: "Channel dependent e.g. subject for email",
@@ -95,8 +102,17 @@
description: "Specify the `<namespace>/<name>` of the playbook to trigger when this notification fires.",
scheme: "string",
},
{
field: "inhibitions",
description: "Suppresses notifications for related resources based on the resource hierarchy.",
scheme: "[`[]Inhibition`](#inhibition)",
}
]}/>

:::info Single Recipient
Only one recipient can be specified
:::

### Inhibition

<Inhibition />
2 changes: 1 addition & 1 deletion modules/duty
Submodule duty updated 67 files
+2 −1 .gitignore
+45 −36 connection/environment.go
+1 −1 connection/kubernetes.go
+52 −0 connection/loki.go
+25 −0 connection/zz_generated.deepcopy.go
+22 −0 db/utils.go
+2 −0 functions/drop.sql
+24 −22 go.mod
+42 −40 go.sum
+67 −22 kubernetes/dynamic.go
+24 −1 migrate/migrate.go
+73 −0 migrate/migrate_test.go
+22 −0 models/config.go
+3 −2 models/connections.go
+124 −18 models/notifications.go
+20 −4 models/playbooks.go
+26 −0 query/notifications.go
+2 −0 rbac/objects.go
+11 −1 schema/apply.go
+123 −14 schema/notifications.hcl
+196 −11 schema/openapi/canary.schema.json
+196 −11 schema/openapi/canary.spec.schema.json
+199 −11 schema/openapi/component.schema.json
+199 −11 schema/openapi/component.spec.schema.json
+3 −0 schema/openapi/config_aws.schema.json
+3 −0 schema/openapi/config_azure.schema.json
+3 −0 schema/openapi/config_azuredevops.schema.json
+3 −0 schema/openapi/config_file.schema.json
+3 −0 schema/openapi/config_githubactions.schema.json
+106 −3 schema/openapi/config_kubernetes.schema.json
+3 −0 schema/openapi/config_kubernetesfile.schema.json
+3 −0 schema/openapi/config_sql.schema.json
+3 −0 schema/openapi/config_trivy.schema.json
+189 −4 schema/openapi/connection.definitions.json
+134 −9 schema/openapi/connection.schema.json
+5 −1 schema/openapi/health_databasebackupcheck.schema.json
+103 −0 schema/openapi/health_exec.schema.json
+8 −4 schema/openapi/health_folder.schema.json
+56 −1 schema/openapi/health_http.schema.json
+100 −0 schema/openapi/health_kubernetes.schema.json
+3 −0 schema/openapi/health_s3.schema.json
+106 −7 schema/openapi/notification.definitions.json
+80 −6 schema/openapi/notification.schema.json
+21 −0 schema/openapi/permission.schema.json
+25 −4 schema/openapi/permissiongroup.schema.json
+169 −0 schema/openapi/playbook-spec.schema.json
+224 −18 schema/openapi/playbook.definitions.json
+169 −0 schema/openapi/playbook.schema.json
+136 −3 schema/openapi/scrape_config.definitions.json
+113 −4 schema/openapi/scrape_config.schema.json
+112 −3 schema/openapi/scrape_config.spec.schema.json
+234 −19 schema/openapi/topology.definitions.json
+199 −11 schema/openapi/topology.schema.json
+199 −11 schema/openapi/topology.spec.schema.json
+84 −5 shell/shell.go
+28 −8 start.go
+20 −0 tests/kubernetes_test.go
+7 −7 tests/migration_dependency_test.go
+169 −48 tests/notification_test.go
+3 −1 tests/upstream_test.go
+1 −1 types/common.go
+10 −1 types/resource_selector.go
+3 −2 types/resource_selector_test.go
+2 −0 views/015_job_history.sql
+14 −2 views/018_playbooks.sql
+83 −5 views/021_notification.sql
+38 −0 views/037_notification_group_resources.sql
2 changes: 1 addition & 1 deletion modules/mission-control
Submodule mission-control updated 85 files
+1 −1 .github/workflows/scorecard.yml
+12 −2 .github/workflows/test.yml
+3 −1 .gitignore
+1 −1 Makefile
+10 −0 api/event.go
+3 −0 api/v1/connection_types.go
+3 −0 api/v1/notification_types.go
+44 −9 api/v1/playbook_actions.go
+97 −17 api/v1/zz_generated.deepcopy.go
+113 −9 artifacts/artifacts.go
+26 −7 cmd/server.go
+4 −0 config/crds/mission-control.flanksource.com_connections.yaml
+3 −0 config/crds/mission-control.flanksource.com_notifications.yaml
+472 −0 config/crds/mission-control.flanksource.com_playbooks.yaml
+5 −1 config/schemas/connection.schema.json
+3 −0 config/schemas/notification.schema.json
+189 −0 config/schemas/playbook-spec.schema.json
+189 −0 config/schemas/playbook.schema.json
+19 −8 db/connections.go
+125 −1 db/notifications.go
+1 −1 db/playbooks.go
+10 −0 fixtures/notifications/check-label-match-query.yaml
+11 −0 fixtures/notifications/component-match-query.yaml
+26 −9 fixtures/permissions/config-notification-group-playbook-permission.yaml
+14 −0 fixtures/permissions/connection-read.yaml
+34 −0 fixtures/playbooks/logs/cloudwatch.yaml
+39 −0 fixtures/playbooks/logs/loki.yaml
+38 −0 fixtures/playbooks/logs/opensearch.yaml
+34 −32 go.mod
+81 −60 go.sum
+534 −0 llm/cost.go
+56 −0 llm/cost_test.go
+8 −0 llm/gemini.go
+78 −12 llm/llm.go
+128 −0 logs/cloudwatch/search.go
+23 −0 logs/cloudwatch/types.go
+23 −0 logs/cloudwatch/zz_generated.deepcopy.go
+41 −0 logs/config.go
+45 −1 logs/logs.go
+59 −0 logs/loki/loki.go
+123 −0 logs/loki/types.go
+23 −0 logs/loki/zz_generated.deepcopy.go
+108 −0 logs/mapping.go
+76 −0 logs/mapping_test.go
+123 −0 logs/opensearch/search.go
+85 −0 logs/opensearch/types.go
+49 −0 logs/opensearch/zz_generated.deepcopy.go
+52 −0 logs/zz_generated.deepcopy.go
+1 −1 notification/cel.go
+2 −2 notification/context.go
+140 −44 notification/events.go
+24 −90 notification/job.go
+276 −37 notification/notification_test.go
+113 −12 notification/send.go
+17 −4 notification/shoutrrr.go
+6 −0 notification/suite_test.go
+1 −1 notification/templates/component.health
+1 −1 notification/templates/config.db.update
+1 −1 notification/templates/config.health
+8 −4 pkg/clients/git/connectors/git_access_token.go
+8 −4 pkg/clients/git/connectors/git_ssh.go
+134 −45 playbook/actions/ai.go
+12 −2 playbook/actions/ai_slack.go
+9 −5 playbook/actions/exec.go
+2 −4 playbook/actions/gitops.go
+95 −0 playbook/actions/logs.go
+1 −1 playbook/actions/pod.go
+29 −5 playbook/events.go
+36 −11 playbook/playbook.go
+96 −7 playbook/playbook_test.go
+0 −54 playbook/runner/artifacts.go
+15 −3 playbook/runner/exec.go
+24 −11 playbook/runner/runner.go
+0 −1 playbook/runner/template.go
+6 −0 playbook/suite_test.go
+55 −0 playbook/testdata/action-ai.yaml
+17 −0 playbook/testdata/action-exec-artifacts.yaml
+9 −0 playbook/testdata/connections/artifact.yaml
+11 −0 playbook/testdata/connections/gemini.yaml
+14 −0 playbook/testdata/permissions/allow-ai-artifacts-connection.yaml
+14 −0 playbook/testdata/permissions/allow-ai-gemini-connection.yaml
+14 −0 playbook/testdata/permissions/allow-exec-playbook-artifact.yaml
+9 −0 utils/bytes.go
+55 −0 utils/bytes_test.go
+30 −0 utils/dir.go