Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponseOps][Alerting] Ability run a rule on-demand #139848

Merged
merged 21 commits into from
Sep 14, 2022

Conversation

doakalexi
Copy link
Contributor

@doakalexi doakalexi commented Aug 31, 2022

Resolves #50215

Summary

  • Added a button on the Rule and Connectors UI ("Rules" tab) that allows running a rule add hoc

Screen Shot 2022-09-01 at 7 55 57 AM

  • Leveraged the task manager's runSoon API

Checklist

To verify

  • Go to the rules list page, and select to run a rule
  • Verify that you see a toast saying that the rule is scheduled to run soon.
  • You can refresh and verify that the rule is scheduled to run now

@doakalexi doakalexi changed the title Adding button and endpoint to run rule soon [ResponseOps][Alerting] Ability run an alert immediately Aug 31, 2022
@doakalexi doakalexi added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting release_note:feature Makes this part of the condensed release notes labels Aug 31, 2022
@doakalexi doakalexi marked this pull request as ready for review September 1, 2022 15:13
@doakalexi doakalexi requested a review from a team as a code owner September 1, 2022 15:13
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few initial comments on the code. I'm wondering if we should do more with the user UX other than the toast message? I'm worried that users will click Run soon for a rule that takes a minute to run and get impatient and keep trying to run the rule over and over again.

@ymao1
Copy link
Contributor

ymao1 commented Sep 6, 2022

I'm worried that users will click Run soon for a rule that takes a minute to run and get impatient and keep trying to run the rule over and over again.

Maybe as part of the runSoon API, we can get the task manager document to make sure the current status isn't claiming or running? And return a message to the user like "Task already running"?

@doakalexi
Copy link
Contributor Author

I'm worried that users will click Run soon for a rule that takes a minute to run and get impatient and keep trying to run the rule over and over again.

Maybe as part of the runSoon API, we can get the task manager document to make sure the current status isn't claiming or running? And return a message to the user like "Task already running"?

I think that sounds good to do! With this change should I keep the toasts or change the ux to something else like you mentioned above?

@ymao1
Copy link
Contributor

ymao1 commented Sep 6, 2022

With this change should I keep the toasts or change the ux to something else like you mentioned above?

I think a toast is fine but might be worth pinging @mdefazio :)

@doakalexi
Copy link
Contributor Author

@mdefazio When a user selects Run Rule it will try to schedule the rule to run and if it is successful will show a success toast and if there is an error it will show an error toast. If the user tries to run a rule that is currently running, it will show a warning toast with the message 'Rule is already runnning'. I was wondering if you had any thoughts on this ux? Thanks!

@mdefazio
Copy link
Contributor

mdefazio commented Sep 6, 2022

@doakalexi Thanks for the ping. A few questions:

  1. Does the 'last response' (immediately) change to pending when the user runs the rule?
  2. Are we also providing this option on the rule detail page?
  3. What are the reasons someone wants to run the rule now? (Perhaps a dumb question, but I guess I'm wondering if we should push them to the rule detail page to do this).
  4. We should likely reorder the overflow menu—assuming we need it here—but having 'run rule now' and 'delete rule' so close may potentially be frustrating
  5. Similar to ques 3, if this is a really common ask for a user, perhaps we also change what options are exposed on hover (right now it's still Edit and Delete correct?).

If the user tries to run a rule that is currently running, it will show a warning toast with the message 'Rule is already runnning'. I was wondering if you had any thoughts on this ux?

My gut reaction is to disable the button if it's currently running and avoid this scenario altogether. Might be odd with the overflow menu scenario (which leads to my question about only having this on the rule detail view). But ideally we would avoid this last scenario.

Happy to hop on a zoom if it helps clear up some of these questions.

@mikecote
Copy link
Contributor

mikecote commented Sep 7, 2022

What are the reasons someone wants to run the rule now? (Perhaps a dumb question, but I guess I'm wondering if we should push them to the rule detail page to do this).

My main thinking would be to get a rule out of an error state. For example, the Elastic license expired and the user renewed it, they'd want the rule to get out of error state. Another example could be a transient error and the user wants to re-run a specific rule.

Another scenario could be modifying the rule and wanting to run the rule again with the updated configuration. They don't have a preview functionality today and will take some time to develop, so having this ad-hoc run functionality in the interim can be useful.

Similar to ques 3, if this is a really common ask for a user, perhaps we also change what options are exposed on hover (right now it's still Edit and Delete correct?).

I don't think it's common enough yet to warrant a dedicated button. We could add telemetry to count how often each cluster uses this feature.

My gut reaction is to disable the button if it's currently running and avoid this scenario altogether. Might be odd with the overflow menu scenario (which leads to my question about only having this on the rule detail view). But ideally we would avoid this last scenario.

There most likely will be a disconnect between loading the rules list / details page and the current state of the rule (running, idle, etc) which makes disabling the button not the best option (the rule may have finished running after the page loaded and run rule now be used). The API could be enhanced so when the user clicks we can provide feedback if the rule is currently running at that time.

For rules that were currently running and had a quick or short duration, it may not be necessary to add some UX as the rule ran when the user wanted it to. However, when a rule has been running for a while, the user may be trying to "run rule" to unblock it and some UX feedback could be useful.. And since we don't know the expected run duration of a currently running rule, the latter may be a better option (some UX feedback after clicking)..

@mdefazio
Copy link
Contributor

mdefazio commented Sep 7, 2022

Thanks for the explanations @mikecote

However, when a rule has been running for a while, the user may be trying to "run rule" to unblock it and some UX
feedback could be useful

Would this also lean towards having a 'Stop rule run' option as well? Assuming 'unblock' would mean stopping the process.

I wanted to summarize some of the points you mentioned as I think they are really useful:

  1. If I make edits to a rule, I would want to have the option to restart the rule interval upon saving (does it restart currently?). For instance, if I have an interval of 24 hrs, I don't want to wait 24 hours to test out my rule and see if it runs successfuly.
  2. Similarly, if a rule has failed, I want to have the option to run on demand as the fix may not be something within the rule configuration (so the above wouldn't apply). Although, in this scenario, if might be good to rename this option to 'Retry'.
  3. Any status of a rule depends on a page refresh..so being able to see if a rule has completed running would mean refreshing the data and does not happen on its own (ie, requires user interaction).

Assuming this option would be available for bulk actions as well? In which case, I can certainly see why some sort of messaging would be needed to inform the user that a rule is currently running.

Wondering if instead of saying 'Run soon', we say something like 'Restart rule interval'. I understand that the rule run may not start right away, but wondering if this is a bit more actionable or immediate, rather than doing some 'soon'?

@mikecote
Copy link
Contributor

mikecote commented Sep 7, 2022

Would this also lean towards having a 'Stop rule run' option as well? Assuming 'unblock' would mean stopping the process.

We have a couple of challenges before this can be accomplished. In the meantime, the timeout logic will kick in to automatically stop rules from running too long (5 minutes).

If I make edits to a rule, I would want to have the option to restart the rule interval upon saving (does it restart currently?). For instance, if I have an interval of 24 hrs, I don't want to wait 24 hours to test out my rule and see if it runs successfuly.

This isn't a bad idea, we would have to think through the technical challenges when editing in bulk to not overwhelm the system. Maybe a follow up issue could be created? We have some logic when users change rule interval, but it doesn't run right away, just adjusts the schedule to respect the new interval based on the last time it ran.

Assuming this option would be available for bulk actions as well? In which case, I can certainly see why some sort of messaging would be needed to inform the user that a rule is currently running.

Same as above, we may want to hold off on bulk capabilities to avoid the system from getting overwhelmed.

Similarly, if a rule has failed, I want to have the option to run on demand as the fix may not be something within the rule configuration (so the above wouldn't apply). Although, in this scenario, if might be good to rename this option to 'Retry'.

Wondering if instead of saying 'Run soon', we say something like 'Restart rule interval'. I understand that the rule run may not start right away, but wondering if this is a bit more actionable or immediate, rather than doing some 'soon'?

Summoning @lcawl for creative naming :)

@mdefazio
Copy link
Contributor

mdefazio commented Sep 7, 2022

Sounds good @mikecote Thanks for the quick responses. I've setup a meeting with @doakalexi to walkthrough it all and will post any edits.

@gmmorris gmmorris changed the title [ResponseOps][Alerting] Ability run an alert immediately [ResponseOps][Alerting] Ability run an alert on-demand Sep 8, 2022
@gmmorris gmmorris changed the title [ResponseOps][Alerting] Ability run an alert on-demand [ResponseOps][Alerting] Ability run a rule on-demand Sep 8, 2022
@mdefazio
Copy link
Contributor

mdefazio commented Sep 8, 2022

Spoke with @doakalexi and the outcome was:

  • Add in the Run rule option to the Rule detail overflow menu in the same order it is in the rules list overflow menu
  • Likely a good idea to have @lcawl take a look. Some of my hesitation is primarily around wording and if its clear. But I don't really have a good recommendation one way or the other from what it is now.
    • @doakalexi Might be useful to add screenshots to the PR summary of both toasts that appear when a user clicks on this. I was able to see the confirmation toast, but my rule was too fast to see the warning message.

Future changes:

  • We also discussed possible other placements of this option. We will hold off on this for now, but we think it's likely useful to have this in the flyout as well, similar to how we have the Connectors flyout buttons (Save | Save and test). So Save and Save and run.
  • In the scenario where there's an error, it might be good to have this exposed near the Last response indicator on the rule detail. But I think this isn't necessary at this point, but something to consider in future updates to the rule detail page.

@doakalexi
Copy link
Contributor Author

  • @doakalexi Might be useful to add screenshots to the PR summary of both toasts that appear when a user clicks on this. I was able to see the confirmation toast, but my rule was too fast to see the warning message.

Here is the screenshot of the toasts
Screen Shot 2022-09-08 at 9 36 02 AM

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Works as expected :) Left some minor comments. It might also be nice to update the docs with this new feature cc @lcawl

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
triggersActionsUi 597 599 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
triggersActionsUi 1.0MB 1.0MB +1.5KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
triggersActionsUi 92.3KB 92.4KB +130.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@ersin-erdal ersin-erdal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@doakalexi doakalexi merged commit 6499bae into elastic:main Sep 14, 2022
@kibanamachine kibanamachine added v8.5.0 backport:skip This commit does not require backporting labels Sep 14, 2022
@doakalexi doakalexi deleted the alerting/run-rule-immediately branch December 6, 2022 19:17
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Alerting release_note:feature Makes this part of the condensed release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability run an alert immediately
8 participants