Alerting: Support concurrent queries for saving alert instances #70525

Contributor

@grobinson-grafana commented Jun 22, 2023

What is this feature?

This pull request adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.

1 concurrent saver (default):

logger=ngalert.state.manager rule_uid=d96bf5f1-ecc1-4bca-991f-560a05a5a5bb org_id=1 t=2023-06-22T16:02:06.001373+01:00 level=debug msg="Saving alert states" count=4096 max_concurrent_state_savers=1
logger=ngalert.state.manager rule_uid=d96bf5f1-ecc1-4bca-991f-560a05a5a5bb org_id=1 t=2023-06-22T16:02:09.163225+01:00 level=debug msg="Saving alert states done" count=4096

10 concurrent savers:

logger=ngalert.state.manager rule_uid=d96bf5f1-ecc1-4bca-991f-560a05a5a5bb org_id=1 t=2023-06-22T16:03:06.241755+01:00 level=debug msg="Saving alert states" count=4096 max_concurrent_state_savers=10
logger=ngalert.state.manager rule_uid=d96bf5f1-ecc1-4bca-991f-560a05a5a5bb org_id=1 t=2023-06-22T16:03:07.314158+01:00 level=debug msg="Saving alert states done" count=4096

Why do we need this feature?

[Add a description of the problem the feature is trying to solve.]

Who is this feature for?

[Add information on what kind of user the feature is for.]

Which issue(s) does this PR fix?:

Fixes #

Special notes for your reviewer:

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

This commit adds support for concurrent queries when saving alert
instances to the database. This is an experimental feature in
response to some customers experiencing delays between rule evaluation
and sending alerts to Alertmanager, resulting in flapping. It is
disabled by default.
@grobinson-grafana added area/alerting, add to changelog, no-backport labels Jun 22, 2023
@grobinson-grafana added this to the 10.1.x milestone Jun 22, 2023
@grobinson-grafana self-assigned this Jun 22, 2023
@grobinson-grafana requested a review from a team June 22, 2023 15:11
@grobinson-grafana requested a review from a team as a code owner June 22, 2023 15:11
@grobinson-grafana requested review from zserge, mildwonkey and suntala and removed request for a team, torkelo, zserge, mildwonkey and suntala June 22, 2023 15:11
@grobinson-grafana force-pushed the grobinson/support-concurrent-queries-for-saving-alert-instances branch from 3656109 to 5669998 Compare June 22, 2023 17:22
Contributor

@stevesg left a comment

Nice change - really clean implementation.

My only nits which are non-blocking:

  • Naming, I'd have gone for max_state_save_concurrency, but that's purely subjective so I'm happy with it as-is.
  • A test to exercise a concurrency value other than 1, but I don't think it's worth the effort. The ForEachJob function in dskit is battle tested, so I'm not especially worried.

logger.Debug("Saving alert states done", "count", len(states))

logger.Debug("Saving alert states", "count", len(states), "max_concurrent_state_savers", st.maxConcurrentStateSavers)
_ = concurrency.ForEachJob(ctx, len(states), st.maxConcurrentStateSavers, saveState)
Contributor

Though the ForEachJob function will work as expected with concurrency=1, it will spawn a goroutine needlessly. This is probably nothing to worry about; I'm just flagging it.

@grobinson-grafana
Contributor Author

max_state_save_concurrency

I like it, will change it!

@grobinson-grafana
Contributor Author

concurrency=1 (default):

logger=ngalert.state.manager rule_uid=c104ea09-a698-4b06-834b-9c6f110d3cb3 org_id=1 t=2023-06-22T20:40:06.054643+01:00 level=debug msg="Saving alert states" count=4096 max_state_save_concurrency=1
logger=ngalert.state.manager rule_uid=c104ea09-a698-4b06-834b-9c6f110d3cb3 org_id=1 t=2023-06-22T20:40:09.399026+01:00 level=debug msg="Saving alert states done" count=4096 max_state_save_concurrency=1

concurrency=10

logger=ngalert.state.manager rule_uid=c104ea09-a698-4b06-834b-9c6f110d3cb3 org_id=1 t=2023-06-22T20:41:06.059383+01:00 level=debug msg="Saving alert states" count=4096 max_state_save_concurrency=10
logger=ngalert.state.manager rule_uid=c104ea09-a698-4b06-834b-9c6f110d3cb3 org_id=1 t=2023-06-22T20:41:07.058264+01:00 level=debug msg="Saving alert states done" count=4096 max_state_save_concurrency=10

Member

@JohnnyQQQQ left a comment

LGTM

@grobinson-grafana merged commit 7edbe72 into main Jun 23, 2023
11 checks passed
@grobinson-grafana deleted the grobinson/support-concurrent-queries-for-saving-alert-instances branch June 23, 2023 10:36
LudoVio pushed a commit that referenced this pull request Jun 26, 2023
santihernandezc pushed a commit that referenced this pull request Jun 28, 2023
(cherry picked from commit 7edbe72)
santihernandezc pushed a commit that referenced this pull request Jun 28, 2023
(cherry picked from commit 7edbe72)
santihernandezc pushed a commit that referenced this pull request Jun 28, 2023
(cherry picked from commit 7edbe72)
santihernandezc pushed a commit that referenced this pull request Jun 28, 2023
(cherry picked from commit 7edbe72)
santihernandezc added a commit that referenced this pull request Jun 29, 2023
…nces (#70869)

* Alerting: Support concurrent queries for saving alert instances (#70525)

(cherry picked from commit 7edbe72)

* remove changes in api_testing.go

---------

Co-authored-by: George Robinson <george.robinson@grafana.com>
@santihernandezc added backport v9.5.x and removed no-backport labels Jun 29, 2023
@grafana-delivery-bot
Contributor

Hello @santihernandezc!
Backport pull requests need to be either:

  • Pull requests which address bugs,
  • Urgent fixes which need product approval, in order to get merged,
  • Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If the pull request modifies CI behaviour, please add the type/ci label.
If none of the above applies, please consider removing the backport label and target the next major/minor release.
Thanks!

santihernandezc pushed a commit that referenced this pull request Jun 29, 2023
(cherry picked from commit 7edbe72)
harisrozajac pushed a commit that referenced this pull request Jun 29, 2023
harisrozajac pushed a commit that referenced this pull request Jun 30, 2023
@ricky-undeadcoders modified the milestones: 10.1.x, 10.1.0 Aug 1, 2023
grobinson-grafana added a commit that referenced this pull request Aug 16, 2023
…ces (#70921)

* Alerting: Support concurrent queries for saving alert instances (#70525)

(cherry picked from commit 7edbe72)

* Trigger PR automation

---------

Co-authored-by: George Robinson <george.robinson@grafana.com>
Labels
add to changelog, area/alerting, area/backend, backport v9.5.x, type/bug
6 participants