Cloudwatch Logs: Set Alerting timeout to datasource config's logsTimeout (#72611) #72611

idastambuk · 2023-07-31T15:09:57Z

What is this feature?

When getting results for CW Logs queries, the client first needs to get the ID of the query, then periodically poll for results with GetQueryResults. While the query is still in progress, the Status we get from AWS is "Running". Once a request returns a finished status ("Complete" in the AWS API), the response contains the complete results. More info here

In CloudwatchLogsQueryRunner, which is used for dashboards and Explore, a timeout set in the ConfigEditor (default: 30 minutes) controls how long we keep retrying GetQueryResults for Cloudwatch Logs queries.

After some users noticed they were getting "failed to query data: fetching of query results exceeded max number of attempts" errors in their alerts, we decided to make this option configurable (only for alerts). However, in the process we discovered that the 30 minute configuration wasn't being used on the backend for alert queries. Instead, the CW datasource plugin polled for results for a maximum of 8 seconds, which might be too short for some queries.

So instead of adding a new option, we decided to first reuse the existing timeout setting from ConfigEditor, which defaults to 30 minutes.
Note that this means some queries will now time out with the alerting runner's "context cancelled" error instead, since the default timeout when running queries in this context is 30s (https://github.com/grafana/grafana/blob/main/conf/defaults.ini#L1207). This can be increased per instance and doesn't need any change in the code.

The next step in improving this flow is to stop polling for results every second and instead do something like async-query-data, which increases the wait time with every request that still returns "Running".
After that, we could implement a query option input that gives the user more control over how often results are polled per query, but it might not be necessary after ☝🏻

Some additional things in this PR:

Made the tooltip in ConfigEditor clearer: this is not a retry after an error, but a retry after a "not yet done" response.
Defined the default 30 minute timeout in the executor struct so that tests can pass a shorter timeout (otherwise the test with the default value would run for 30 min 👀).
Moved the comment about logs queries closer to where the GetQueryResults requests happen (from datasource.go to log_sync_query.go), since I felt it was easy to miss where it was. Lmk if there's a better place.

Who is this feature for?

[Add information on what kind of user the feature is for.]

Which issue(s) does this PR fix?:

Fixes #63305

Special notes for your reviewer:
When this is merged, I will comment here and here to inform the user and customer support about the change.
TODO: Add to what's new

Please check that:

It works as expected from a user's perspective.
If this is a pre-GA feature, it is behind a feature toggle.
The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

pkg/tsdb/cloudwatch/cloudwatch.go

fridgepoet

Great work, Ida!

I am fairly certain executor does not need to have the timeout. You ought to be able to inject your custom timeout for tests through the Settings instead. This will simplify the code a little. Let me know if I have missed something, though!

pkg/tsdb/cloudwatch/cloudwatch.go

pkg/tsdb/cloudwatch/models/settings.go

pkg/tsdb/cloudwatch/cloudwatch.go

pkg/tsdb/cloudwatch/log_sync_query.go

pkg/tsdb/cloudwatch/log_sync_query_test.go

pkg/tsdb/cloudwatch/log_sync_query.go

pkg/tsdb/cloudwatch/cloudwatch.go

pkg/tsdb/cloudwatch/models/settings_test.go

public/app/plugins/datasource/cloudwatch/components/ConfigEditor.tsx

pkg/tsdb/cloudwatch/log_sync_query.go

fridgepoet · 2023-08-02T10:56:33Z

Great Go code! These are purely functional questions/suggestions and not a remark on the Go code.

grafana-delivery-bot bot added this to the 10.2.x milestone Jul 31, 2023

grafana-pr-automation bot added the datasource/CloudWatch, area/backend, and area/frontend labels Jul 31, 2023

idastambuk commented Jul 31, 2023

pkg/tsdb/cloudwatch/cloudwatch.go (outdated, resolved)

idastambuk added the "add to changelog", "no-backport", and "add to what's new" labels Jul 31, 2023

idastambuk added 2 commits July 31, 2023 18:01

Set logs polling in alerts to the value from settings

4ef09d4

Lint

63d30bd

idastambuk force-pushed the 63305-cloudwatch-logs-make-alertmaxattempts-configurable branch from 70aac6a to 63d30bd Compare July 31, 2023 16:02

idastambuk marked this pull request as ready for review July 31, 2023 16:15

idastambuk requested a review from a team as a code owner July 31, 2023 16:15

idastambuk requested review from fridgepoet and kevinwcyu and removed request for a team July 31, 2023 16:15

fridgepoet reviewed Aug 1, 2023

idastambuk added 2 commits August 1, 2023 18:52

Move parsing to settings.go

38c0778

Lint

484f483

fridgepoet reviewed Aug 2, 2023

pkg/tsdb/cloudwatch/log_sync_query.go (outdated, resolved)

fridgepoet approved these changes Aug 2, 2023

Add additional tests, make Duration struct etc.

44cee7a

idastambuk changed the title ~~Cloudwatch Logs: Set alert timeout from datasource config~~ Cloudwatch Logs: Use alert timeout from datasource config Aug 2, 2023

idastambuk added 2 commits August 3, 2023 13:06

Fix test

27139ee

Lint2.0

e61e0c0

idastambuk changed the title ~~Cloudwatch Logs: Use alert timeout from datasource config~~ Cloudwatch Logs: Set Alerting timeout to datasource config's logsTimeout (#72611) Aug 3, 2023

idastambuk merged commit abff6e2 into main Aug 3, 2023
16 checks passed

idastambuk deleted the 63305-cloudwatch-logs-make-alertmaxattempts-configurable branch August 3, 2023 17:35

idastambuk mentioned this pull request Aug 3, 2023

CloudWatch Logs: Make alertMaxAttempts configurable #63305

Closed

aishyandapalli pushed a commit to aishyandapalli/grafana that referenced this pull request Aug 16, 2023

Cloudwatch Logs: Set Alerting timeout to datasource config's logsTime…

a04aec0

…out (grafana#72611)

chauchausoup pushed a commit to chauchausoup/grafana that referenced this pull request Sep 15, 2023

Cloudwatch Logs: Set Alerting timeout to datasource config's logsTime…

aec9202

…out (grafana#72611)

zerok modified the milestones: 10.2.x, 10.2.0 Oct 23, 2023

dnhn mentioned this pull request Oct 24, 2023

grafana 10.2.0 Homebrew/homebrew-core#152264

Closed

BrewTestBot mentioned this pull request Oct 25, 2023

grafana 10.2.0 Homebrew/homebrew-core#152321

Closed
