Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] Remove defaultRuleTaskTimeout and set ruleType specific timeout from kibana.yml #128294

Merged
merged 8 commits into from
Mar 28, 2022

Conversation

ersin-erdal
Copy link
Contributor

@ersin-erdal ersin-erdal commented Mar 22, 2022

fixes: #124807

doc preview: https://kibana_128294.docs-preview.app.elstc.co/guide/en/kibana/master/alert-action-settings-kb.html#alert-settings

Summary

This PR intends to remove defaultRuleTaskTimeout config and use config.execution.timeout key instead of it.
The new key config.execution.timeout can be modified by the users globally or per rule type.

To Verify

  • Add a sleep function into the task runner (just before this.ruleType.executor) code such as: await new Promise(r => setTimeout(r, 360000));
    That would delay your rule execution for 6 minutes.
  • Add the below config to your kibana.yml and run kibana
  • Create two rules (e.g. Index Threshold and a Security Rule) and wait your rules to be executed.

Index Threshold should timeout in 10s and the Security Rule in 5m.
You can see the error message in the rule detail page with the applied timeout config.

xpack.alerting.rules.execution:
    timeout: 30s
    ruleTypeOverrides:
      - id: '.index-threshold'
         timeout: 10s

To try global timeout config remove ruleTypeOverrides from your kibana.yml and run the kibana again...
You should see 30s in the error message on the rule details page.

xpack.alerting.rules.execution:
    timeout: 30s

Checklist

Delete any items that are not applicable to this PR.

  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list

1- Rule Type specific timeout from kibana.yml
xpack.alerting.rules.execution.ruleTypeOverrides

2- Timeout that applies to all rule types, from kibana.yml
xpack.alerting.rules.execution.timeout

3- ruleTaskTimeout from the source plugin
4- default timeout '5m'
@ersin-erdal ersin-erdal added backport:skip This commit does not require backporting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) release_note:feature Makes this part of the condensed release notes Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework v8.2.0 labels Mar 22, 2022
@@ -286,18 +285,21 @@ export class AlertingPlugin {
ActionGroupIds,
RecoveryActionGroupId
>
) {
) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted to an arrow function, so we can use this in the function.

@ersin-erdal ersin-erdal marked this pull request as ready for review March 22, 2022 19:30
@ersin-erdal ersin-erdal requested review from a team as code owners March 22, 2022 19:30
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

Copy link
Member

@jbudz jbudz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kibana-docker LGTM

@ymao1 ymao1 requested a review from lcawl March 23, 2022 12:27
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Verified the following:

  • When no value is set for xpack.alerting.rules.execution.timeout, rule type timeouts are 5m for all rules except security rules.
  • When value is set for xpack.alerting.rules.execution.timeout, value is applied to all rules
  • When value is set for specific rule type using xpack.alerting.rules.execution.ruleTypeOverrides, value is applied to that specific rule type.

Nice job! Tagging @lcawl to review copy and @jbudz for a question about the docker allowlist

@@ -207,3 +200,22 @@ Specifies the behavior when a new or changed rule has a schedule interval less t

`xpack.alerting.rules.execution.actions.max`::
Specifies the maximum number of actions that a rule can trigger each time detection checks run.

`xpack.alerting.rules.execution.timeout`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xpack.alerting.rules.execution.timeout

I know that it currently appears in our Rules UI, but "execution" is one of those words with violent connotations that we generally try to avoid. Is there some other term we can use that's less problematic? For example, "xpack.alerting.rules.tasks.timeout" or "xpack.alerting.rules.invocation.timeout", or just "xpack.alerting.rules.timeout"?

docs/settings/alert-action-settings.asciidoc Outdated Show resolved Hide resolved
+
For example, `20m`, `24h`, `7d`, `1w`. Default: `5m`.

`xpack.alerting.rules.execution.ruleTypeOverrides`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xpack.alerting.rules.execution.ruleTypeOverrides::

Likewise, is it possible to omit "execution" from this setting name too? For example, just use "xpack.alerting.rules.typeOverrides" instead?

For example, `20m`, `24h`, `7d`, `1w`. Default: `5m`.

`xpack.alerting.rules.execution.ruleTypeOverrides`::
Overrides the configs under `xpack.alerting.rules.execution` for the rule type with the given ID. The value is array of objects consists of the key(s) that are wanted to be overridden under execution object and the id of the specific rule type that is wanted to be modified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overrides the configs under xpack.alerting.rules.execution for the rule type with the given ID.

This makes me wonder if the configs referred to here are in some other file or UI. If not, this might be clearer:

"Overrides xpack.alerting.rules.* settings for a specific rule identifier.

docs/settings/alert-action-settings.asciidoc Outdated Show resolved Hide resolved
Copy link
Contributor

@mikecote mikecote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM! I tested the different mechanisms locally and saw it work! Great work


export const getRuleTaskTimeout = ({
config,
ruleTaskTimeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit: to make the naming of this variable clearer, what if it was called something like ruleTypeProposedTimeout?

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@ersin-erdal ersin-erdal merged commit c9aad65 into elastic:main Mar 28, 2022
@ersin-erdal ersin-erdal deleted the 124807-rule-task-timeout branch March 28, 2022 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework release_note:feature Makes this part of the condensed release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove ruleTaskTimeout configuration from rule types
7 participants