Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CloudWatch Logs: Queries in an expression should run synchronously #64443

Merged
merged 2 commits into from
Mar 13, 2023

Conversation

iwysiu
Copy link
Contributor

@iwysiu iwysiu commented Mar 8, 2023

Cloudwatch Logs queries are generally run asynchronously in parts that are managed by the frontend (the frontend sends separate requests to start the query and then to poll for results). However, when a expression is used, it expects the query results to be returned from the first request. This pr adds a flag to the expression query that signals to the cloudwatch datasource that it should run the entire query synchronously in the backend and return the results (which we already handle for alerting).

The fixed bug can be reproed in the external dev as below:
Screenshot 2023-03-09 at 2 29 38 PM

Which issue(s) does this PR fix?:

Fixes #63735

Special notes for your reviewer:
Note that this doesn't solve the separate "wide format" multiple time series issue #63388, that will still require changing how we create data frames.
Also I suspect that this may be an issue in async redshift/athena.

@iwysiu iwysiu added backport v9.4.x Mark PR for automatic backport to v9.4.x add to changelog type/bug labels Mar 8, 2023
@iwysiu iwysiu added this to the 9.4.5 milestone Mar 8, 2023
@grafanabot
Copy link
Contributor

Hello @iwysiu!
Backport pull requests need to be either:

  • Pull requests which address bugs,
  • Urgent fixes which need product approval, in order to get merged,
  • Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If none of the above applies, please consider removing the backport label and target the next major/minor release.
Thanks!

@github-actions
Copy link
Contributor

github-actions bot commented Mar 8, 2023

Backend code coverage report for PR #64443

Plugin Main PR Difference
cloudwatch 82.4% 82.4% 0%

@github-actions
Copy link
Contributor

github-actions bot commented Mar 8, 2023

Frontend code coverage report for PR #64443
No changes

@iwysiu iwysiu marked this pull request as ready for review March 9, 2023 19:09
@iwysiu iwysiu requested review from a team as code owners March 9, 2023 19:09
@iwysiu iwysiu requested review from idastambuk and kevinwcyu and removed request for a team March 9, 2023 19:09
@iwysiu iwysiu changed the title CloudWatch Logs: queries in an expression should run synchronously CloudWatch Logs: Queries in an expression should run synchronously Mar 9, 2023
Copy link
Member

@sarahzinger sarahzinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool catch and makes sense to me! Might be good to get someone from plugins platform to weigh-in maybe? It feels like this an interesting edge case that could be at least better documented as part of our api or something? Otherwise I'm not sure how any other plugin dev would know to do this sort of thing you know?

Also lets add tests!

@@ -18,7 +18,7 @@ const (
alertPollPeriod = time.Second
)

func (e *cloudWatchExecutor) executeLogAlertQuery(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is used for things beyond alerting should it go in a separate file, seems a bit weird to have it in a file called log_alert probably right?

Also sort of basic on my part but how would one know if they should use a Sync vs Async Query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed! We have the async queries so that the panels can get and display partial results, which the cloudwatch logs frontend handles and just keeps pulling results until it gets a done message, but the problem is that alerting and expressions want to consume the data for other things and expect the first dataQuery call to return a frame with all the data. We want to return a sync query whenever the first call needs to get the whole result.

@@ -227,6 +229,7 @@ func (dn *DSNode) Execute(ctx context.Context, now time.Time, _ mathexp.Vars, s
},
Headers: dn.request.Headers,
}
req.Headers[FromExpressionHeaderName] = "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do other datasources deal with this problem? It feels worth asking grafana plugins platform folks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked! But honestly I think that Cloudwatch logs, redshift and athena are the only async datasources.

@@ -21,6 +21,8 @@ var (
logger = log.New("expr")
)

const FromExpressionHeaderName = "FromExpression"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a header, we generally do X-Grafana , perhaps X-Grafana-From-SSE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm not sure these become HTTP headers, and I guess it matches FromAlert, so okay to leave it.

Copy link
Contributor Author

@iwysiu iwysiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool catch and makes sense to me! Might be good to get someone from plugins platform to weigh-in maybe? It feels like this an interesting edge case that could be at least better documented as part of our api or something? Otherwise I'm not sure how any other plugin dev would know to do this sort of thing you know?

It would probably make sense to document this somewhere. I actually don't think that other people are doing async datasources, at least right now, but other people might. Also, tests added! I needed to make one of the functions mock-able to do it.

@@ -18,7 +18,7 @@ const (
alertPollPeriod = time.Second
)

func (e *cloudWatchExecutor) executeLogAlertQuery(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed! We have the async queries so that the panels can get and display partial results, which the cloudwatch logs frontend handles and just keeps pulling results until it gets a done message, but the problem is that alerting and expressions want to consume the data for other things and expect the first dataQuery call to return a frame with all the data. We want to return a sync query whenever the first call needs to get the whole result.

@@ -227,6 +229,7 @@ func (dn *DSNode) Execute(ctx context.Context, now time.Time, _ mathexp.Vars, s
},
Headers: dn.request.Headers,
}
req.Headers[FromExpressionHeaderName] = "true"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked! But honestly I think that Cloudwatch logs, redshift and athena are the only async datasources.

Copy link
Member

@sarahzinger sarahzinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@grafanabot grafanabot removed this from the 9.4.5 milestone Mar 13, 2023
@grafanabot
Copy link
Contributor

This pull request was removed from the 9.4.5 milestone because 9.4.5 is currently being released.

@iwysiu iwysiu added this to the 9.4.6 milestone Mar 13, 2023
@iwysiu iwysiu merged commit 74436d3 into main Mar 13, 2023
@iwysiu iwysiu deleted the iwysiu/63735 branch March 13, 2023 17:31
grafanabot pushed a commit that referenced this pull request Mar 13, 2023
iwysiu added a commit that referenced this pull request Mar 14, 2023
…nously (#64707)

CloudWatch Logs: Queries in an expression should run synchronously (#64443)

(cherry picked from commit 74436d3)

Co-authored-by: Isabella Siu <Isabella.siu@grafana.com>
fridgepoet added a commit that referenced this pull request Mar 20, 2023
fridgepoet added a commit that referenced this pull request Mar 20, 2023
…ously (#64443)" (#65036)

Revert "CloudWatch Logs: Queries in an expression should run synchronously (#64443)"

This reverts commit 74436d3.
grafanabot pushed a commit that referenced this pull request Mar 20, 2023
…ously (#64443)" (#65036)

Revert "CloudWatch Logs: Queries in an expression should run synchronously (#64443)"

This reverts commit 74436d3.

(cherry picked from commit 972e611)
fridgepoet added a commit that referenced this pull request Mar 20, 2023
… synchronously (#64443)" (#65067)

CloudWatch Logs: Revert "Queries in an expression should run synchronously (#64443)" (#65036)

Revert "CloudWatch Logs: Queries in an expression should run synchronously (#64443)"

This reverts commit 74436d3.

(cherry picked from commit 972e611)

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>
@zerok zerok mentioned this pull request Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CloudWatch Logs: applying Reduce expression results in an error
5 participants