[RAM] [Flapping] Add Flapping Rules Settings #147774
@elasticmachine merge upstream |
…weiWu/kibana into issue-143529-flapping-config
Pinging @elastic/response-ops (Team:ResponseOps)
export const RULES_SETTINGS_SAVED_OBJECT_TYPE = 'rules_settings';
export const RULES_SETTINGS_SAVED_OBJECT_ID = 'rules-settings';

export const DEFAULT_FLAPPING_SETTINGS = {
I couldn't find the existing default values, so this will obviously need to be changed 🙂
I think it is here for now -> https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/lib/flapping_utils.ts
Since the flapping settings are space-specific, did we consider storing these in Advanced Settings / uiSettings instead of creating our own saved object? Seems like a perfect fit, though I'm not sure how well you can deal with feature controls with those ...
@pmuellr Thanks for the feedback. I did look at advanced settings. In my mind, I think the 3 complications with using
Although it is interesting to think about whether or not we want rules settings to be configurable via
Ya, we had a brief thought about figuring out how we could use an Advanced Setting for some of our config: #132183 - I think in some Slack chatter, we were thinking that perhaps we could make use of AS, but do the UX in the rules pages - as well as supporting the AS page. I think I saw someone doing that, but not completely sure.
Ya, I think that's the killer. I thought I had seen some setting that was conditionally enabled for editing, but can't see that now.
They actually recently renamed
@pmuellr, we were considering using Kibana settings, but if I am correct, we would need to save the full JSON blob of all the Kibana advanced settings each time there is an update (we did not think that was super cool), and then we would not have been able to create a specific sub feature or show who updated this setting and when.
Doing a drive by review, will let others do the more thorough review. I'm loving the new UI and the concept of settings! I left a few questions as I think about how rule executions will load these settings in the background.
<EuiText size="s">
  <FormattedMessage
    id="xpack.triggersActionsUI.rulesSettings.flapping.flappingSettingsDescription"
    defaultMessage="An alert will be considered flapping if it changes status {lookBackWindow} within the last {statusChangeThreshold}."
Suggested change:
- defaultMessage="An alert will be considered flapping if it changes status {lookBackWindow} within the last {statusChangeThreshold}."
+ defaultMessage="An alert will be considered flapping if it changes status {statusChangeThreshold} within the last {lookBackWindow}."
It also seems to be missing the nouns associated with the numbers, for example "if it changes status X times within the last Y runs ..."
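Read with the placeholders in the suggested order and the nouns filled in, the description amounts to a simple counting rule. A minimal sketch, assuming each rule run records whether the alert changed status; the isFlapping helper and the shape of the history array are illustrative assumptions, not code from this PR, and the real logic in flapping_utils.ts may differ:

```typescript
// Hypothetical helper: true when the alert changed status at least
// statusChangeThreshold times within the last lookBackWindow runs.
function isFlapping(
  statusChanges: boolean[], // one entry per rule run, oldest first (assumed shape)
  lookBackWindow: number,
  statusChangeThreshold: number
): boolean {
  // A non-positive window looks at no runs at all.
  if (lookBackWindow <= 0) {
    return false;
  }
  const recentRuns = statusChanges.slice(-lookBackWindow);
  const changeCount = recentRuns.filter((changed) => changed).length;
  return changeCount >= statusChangeThreshold;
}
```

With a history of [true, false, true, true], a look back window of 3 and a threshold of 2, the last three runs contain two status changes, so the alert counts as flapping.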
lookBackWindow: schema.number(),
statusChangeThreshold: schema.number(),
question: should we add UI / API validation to ensure lookBackWindow >= statusChangeThreshold?
yep we can do that
I added backend validation for this case, @mdefazio let me know if we want to have any frontend validation
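For readers following the thread, the check being discussed could look roughly like this. It is a sketch only: the validateFlappingSettings name and the error text are assumptions, and the validation actually added in the PR may be structured differently.

```typescript
interface FlappingSettings {
  enabled: boolean;
  lookBackWindow: number;
  statusChangeThreshold: number;
}

// Hypothetical validator: returns an error message, or undefined when the
// settings satisfy lookBackWindow >= statusChangeThreshold.
function validateFlappingSettings(settings: FlappingSettings): string | undefined {
  const { lookBackWindow, statusChangeThreshold } = settings;
  if (lookBackWindow < statusChangeThreshold) {
    return (
      `lookBackWindow (${lookBackWindow}) must be greater than or equal to ` +
      `statusChangeThreshold (${statusChangeThreshold})`
    );
  }
  return undefined;
}
```

A route handler could call this before saving and reject the request when a message is returned.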
This is looking good. I had some comments prepped but it looks like you've solved many of them already.
Looks like the copy has some mistakes (apologies if pulled directly from the mocks). Should be Alerts that quickly go between active and recovered...
I think the toggle label should just be Enable flapping detection (recommended)
We can also remove the spacer between the form description and the toggle so these are closer together.
@lcawl Do you have any thoughts on the tooltips for lookback and change threshold?
Re: validation question. Looks like the sliders were updated so the threshold is controlled by the lookback window.
Last point, which is perhaps more of a question: Should we keep the 'Documentation' and 'Settings' links in the top right even if we don't have rules installed? Would someone want to update settings/flapping detection before adding rules?
export const rulesSettingsMappings: SavedObjectsTypeMappingDefinition = {
  properties: {
    flapping: {
      properties: {
        enabled: {
          type: 'boolean',
        },
        lookBackWindow: {
          type: 'long',
        },
        statusChangeThreshold: {
          type: 'long',
        },
        createdBy: {
          type: 'keyword',
        },
        updatedBy: {
          type: 'keyword',
        },
        createdAt: {
          type: 'date',
        },
        updatedAt: {
          type: 'date',
        },
      },
    },
  },
};
Q: Do we need all these fields indexed, or can we add index: false to any of those?
yes I agree with you @JiaweiWu because this SO is global per space so there is no need to index it
+1, loading settings by ID is all we'll need at this time.
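To make the suggestion concrete, here is a sketch of the flapping properties with index: false on every field, which fits the case where the object is only ever loaded by ID. The FieldMapping type is a local stand-in written for this example rather than the Kibana type, and whether the PR ended up with exactly this shape is not shown in the thread.

```typescript
// Local stand-in for a saved object field mapping; not imported from Kibana.
type FieldMapping = {
  type: 'boolean' | 'long' | 'keyword' | 'date';
  index?: boolean;
};

// Same fields as the mapping under review, each marked as not indexed.
const flappingProperties: Record<string, FieldMapping> = {
  enabled: { type: 'boolean', index: false },
  lookBackWindow: { type: 'long', index: false },
  statusChangeThreshold: { type: 'long', index: false },
  createdBy: { type: 'keyword', index: false },
  updatedBy: { type: 'keyword', index: false },
  createdAt: { type: 'date', index: false },
  updatedAt: { type: 'date', index: false },
};
```

Fields marked this way are still stored and returned with the document, but they cannot be used in search queries.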
I'm thinking we may want to pull out the task_runner parts of this PR, since they seem to be non-operational. And made some other comments in that section, if we decide to keep it in. Kinda feels like we should wait till we start threading the settings through all the different parts of task_runner, before making some changes there.
@@ -110,6 +115,7 @@ export async function getRuleAttributes<Params extends RuleTypeParams>(

  const fakeRequest = getFakeKibanaRequest(context, spaceId, rawRule.attributes.apiKey);
  const rulesClient = context.getRulesClientWithRequest(fakeRequest);
  const rulesSettingsClient = context.getRulesSettingsClientWithRequest(fakeRequest);
I think there are a few problems with this.
1. Seems like an odd place to get the ruleSettingsClient. Presumably it has to be done somewhere in the rule task runner, to get the current values to pass along internally. If I was looking for it, I'd never look here though! I think the TaskRunner constructor would be a better place. I think loadRule() is used in other places as well, that probably don't need that client (just the SO attrs).
2. I don't think we want the fakeRequest here, since that's the user that created the rule. They may not have privs to read the flapping config (as I understand it), and so presumably use of this client by such a user would end up throwing an error. At least that's my general understanding of what's going on. We need to do auth when users are reading/writing the settings in the UX (and HTTP APIs), but we will also need to get these settings for every rule run, and so the client for THOSE usages will need to be one of the superuser things.
Maybe since the settings aren't threaded all the way through yet, we could just remove these bits, since it doesn't appear they're doing anything.
For #2: Good point. In fact, this made me realize that I've completely neglected to use the checkPrivileges APIs to enforce savedObject privileges at the rulesSettingsClient level, since I've been relying on the router tag property to gate APIs by features. So I will implement checkPrivileges shortly here as well.
In that case, what is our preferred method of "faking" a superuser? Is there a fakeRequest as a superuser? Or should we just have a programmatic bypass of the privilege checks (seems kind of...gross, haha)?
> In that case, what is our preferred method of "faking" a superuser? Is there a fakeRequest as a superuser? Or should we just have a programmatic bypass of the privilege checks (seems kind of...gross, haha)?

The recommended option is to create a scoped client that excludes the security extension. This will skip all the RBAC and pretty much uncouple a user from the SO client. You can find an example here: https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/rules_client_factory.ts#L94-L97
If that doesn't work and you can make the rules settings client work with a savedObjectsClient | savedObjectsRepository, you should be able to bypass RBAC whenever there isn't a user by creating a SO repository instead. The repository doesn't require a user and bypasses all the wrappers (encrypted saved objects, security, etc.), so it's less favourable but still works.
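The first option can be sketched as follows. Everything here is a local stand-in written for illustration: the SavedObjectsServiceLike interface, the getUnsecuredSavedObjectsClient helper, and the value given to SECURITY_EXTENSION_ID are assumptions, and the linked rules_client_factory.ts lines remain the reference for the real call and constant.

```typescript
// Placeholder value for this sketch; in Kibana the real constant comes from core.
const SECURITY_EXTENSION_ID = 'securityExtension';

interface ScopedClientOptions {
  excludedExtensions?: string[];
}

// Stand-in for the part of the saved objects service used here (assumed shape).
interface SavedObjectsServiceLike<Req, Client> {
  getScopedClient(request: Req, options?: ScopedClientOptions): Client;
}

// Hypothetical helper: ask for a scoped client with the security extension
// excluded, so reads made during rule runs skip the user's RBAC checks.
function getUnsecuredSavedObjectsClient<Req, Client>(
  savedObjects: SavedObjectsServiceLike<Req, Client>,
  request: Req
): Client {
  return savedObjects.getScopedClient(request, {
    excludedExtensions: [SECURITY_EXTENSION_ID],
  });
}
```

A rules settings client built on top of such a client would read the settings regardless of the privileges of the user who created the rule, which is the behaviour asked for in the review comment above.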
Hey @pmuellr about that |
This idea sounds good. Let's make sure the settings get updated across all the Kibanas when more than one Kibana is running (because only 1 Kibana will process the update settings request and the others need to be aware). So we may need some kind of per-Kibana polling / cache w/ TTL if we don't load it on every rule run.
Ya, we need to either load it every rule run (ugh!) or refresh it from the SO on an interval. Seems like Advanced Settings has the same challenge. I wonder if they have some way of avoiding doing the GETs every time ...
No they ask the user to refresh the page :(
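The "cache w/ TTL" idea from this exchange can be sketched as a small per-process cache. The TtlCache class, its constructor parameters, and the injectable clock are assumptions made for this example; nothing like it appears in the PR as shown here.

```typescript
// Hypothetical per-process cache: serves the last loaded value and reloads it
// once the TTL has elapsed.
class TtlCache<T> {
  private pending: Promise<T> | undefined;
  private expiresAt = 0;
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<T>, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    const time = this.now();
    let current = this.pending;
    if (current === undefined || time >= this.expiresAt) {
      const fresh = this.load();
      this.pending = fresh;
      this.expiresAt = time + this.ttlMs;
      // Do not keep serving a failed load; the next call retries.
      fresh.catch(() => {
        if (this.pending === fresh) {
          this.pending = undefined;
        }
      });
      current = fresh;
    }
    return current;
  }
}
```

Each Kibana instance would hold its own cache, so an update processed by one instance becomes visible on the others within at most one TTL. The trade-off is between that staleness window and the number of saved object reads per rule run.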
In my last commit I addressed the following review feedback:
|
LGTM!
Core changes LGTM
has been done + validation on the API side
💛 Build succeeded, but was flaky
const flappingEnableLabel = i18n.translate(
  'xpack.triggersActionsUI.rulesSettings.modal.enableFlappingLabel',
  {
    defaultMessage: 'Enabled flapping detection (recommended)',
In the Advanced Settings UI, it uses "on" and "off" instead of "enabled" and "disabled". Can we do the same here (i.e. the text changes from "On" to "Off" depending on the state of the toggle)?:
Suggested change:
- defaultMessage: 'Enabled flapping detection (recommended)',
+ defaultMessage: 'On (recommended)',
Summary
Resolves: #143529
Initial PR for adding flapping rule settings.
This PR adds a new saved object rules-settings with the schema:
It also adds 2 new endpoints:
GET /rules/settings/_flapping
POST /rules/settings/_flapping
The new rules settings saved object is instantiated per space, using a predetermined ID to enable OCC. This new saved object allows the user to control rules flapping settings for a given space. Access control to the new saved object is done through the Kibana features API. A new RulesSettingsClient was created and can be used to interact with the settings saved object. This saved object is instantiated lazily: when the code calls rulesSettingsClient.flapping().get or rulesSettingsClient.flapping().update, we will lazily create a new saved object if one does not exist for the current space. (I have explored bootstrapping this saved object elsewhere, but I think this is the easiest solution; I am open to change on this.)
We have set up the rules settings to support future rule settings sections by making the settings client and permissions modular, since permission control can be easily extended by using sub features.
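The lazy instantiation described above can be illustrated with an in-memory stand-in for the saved objects store. This is a sketch under stated assumptions: InMemorySettingsStore, getOrCreateFlappingSettings, and the placeholder default values are invented for the example (the review thread notes the real defaults were still to be settled), and the real client is asynchronous and talks to saved objects.

```typescript
interface FlappingSettings {
  enabled: boolean;
  lookBackWindow: number;
  statusChangeThreshold: number;
}

// Placeholder defaults for this sketch only; not the values used by Kibana.
const PLACEHOLDER_DEFAULTS: FlappingSettings = {
  enabled: true,
  lookBackWindow: 20,
  statusChangeThreshold: 4,
};

// Predetermined ID, as in the constants shown earlier in the review.
const RULES_SETTINGS_SAVED_OBJECT_ID = 'rules-settings';

// In-memory stand-in for a per-space store, keyed by "<space>:<id>".
class InMemorySettingsStore {
  private readonly docs = new Map<string, FlappingSettings>();

  get(spaceId: string, id: string): FlappingSettings | undefined {
    return this.docs.get(`${spaceId}:${id}`);
  }

  create(spaceId: string, id: string, attributes: FlappingSettings): FlappingSettings {
    this.docs.set(`${spaceId}:${id}`, attributes);
    return attributes;
  }

  size(): number {
    return this.docs.size;
  }
}

// Hypothetical lazy read: create the settings object on first access in a space.
function getOrCreateFlappingSettings(
  store: InMemorySettingsStore,
  spaceId: string
): FlappingSettings {
  const existing = store.get(spaceId, RULES_SETTINGS_SAVED_OBJECT_ID);
  if (existing !== undefined) {
    return existing;
  }
  return store.create(spaceId, RULES_SETTINGS_SAVED_OBJECT_ID, { ...PLACEHOLDER_DEFAULTS });
}
```

Because the ID is fixed, two callers racing to create the object in the same space target the same document. A real implementation would presumably need to handle the resulting conflict on create, for example by reading the object again.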
This PR doesn't contain integration for the task_runner to use the flapping settings, but I can do that in this PR if needed.
Rules settings feature and sub feature (under management)
Rules settings settings button
Rules settings modal
Disabled
Rules settings settings button with insufficient permissions
Rules settings modal with insufficient write subfeature permissions
Rules settings modal with insufficient read subfeature permissions
TODO: Integration testing, both API and E2E