fix: customizable cloudwatch batch size when querying aws #10851

Merged
merged 3 commits into influxdata:master from fix/10842 on Aug 1, 2022

Conversation

@powersj (Contributor) commented Mar 18, 2022

Fixes: #10842

@telegraf-tiger telegraf-tiger bot added the fix label Mar 18, 2022
@powersj powersj marked this pull request as draft March 18, 2022 17:20
@Hipska (Contributor) left a comment

Why is this approach used rather than just configuring metric_batch_size for the plugin?

@powersj powersj marked this pull request as ready for review March 24, 2022 19:51
@powersj (Contributor, Author) commented Mar 24, 2022

> Why is this approach used rather than just configuring metric_batch_size for the plugin?

This is an input, not an output. This batch size is specific to how the plugin gathers data to avoid a limit in AWS.
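
For reference, a plugin-level knob of this kind would look roughly like the sketch below; the field name, TOML key, and default are assumptions for illustration, not necessarily what this PR merges.

```go
// Illustrative sketch of a plugin-scoped batch size option; names and the
// default value here are assumptions, not the actual merged code.
type CloudWatch struct {
	// ... existing plugin fields elided ...

	// BatchSize caps how many metric data queries are packed into a single
	// GetMetricData request while gathering. It is independent of the
	// agent-level metric_batch_size, which only applies to outputs.
	BatchSize int `toml:"batch_size"`
}

func (c *CloudWatch) effectiveBatchSize() int {
	if c.BatchSize <= 0 {
		return 500 // the previously hard-coded GetMetricData batch size
	}
	return c.BatchSize
}
```

A user hitting the 413 could then lower this setting in their inputs.cloudwatch config without affecting how metrics are batched toward outputs.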

@powersj powersj changed the title from "fix: customizable batch size" to "fix: customizable cloudwatch batch size when querying aws" Mar 24, 2022
@powersj powersj added the ready for final review label Mar 24, 2022
@Hipska (Contributor) commented Mar 24, 2022

Oh, didn’t notice that.

@reimda (Contributor) commented Mar 24, 2022

Is there a way to determine a good batch size automatically? Maybe we could calculate how big a request will be, ask CloudWatch what the maximum request size is, and then have our code determine how many queries fit in a single request?

Adding a setting gives the user a way to fix the problem, but requires them to think more than maybe they should have to.

@powersj (Contributor, Author) commented Mar 25, 2022

> Is there a way to determine a good batch size automatically?

AWS's GetMetricData docs reference a limit of 500 metrics in a single request. This explains the currently hard-coded value of 500. Presumably, this is why we have not seen this issue before.

The error the user received was a 413: RequestEntityTooLarge: Request size 639468 exceeded 614400 bytes. The message was 25,068 bytes too large.

I do not see any payload limit on the CloudWatch service quotas page, nor any other reference to a 413 or request-too-large error in that document or in their docs on common errors.

In the code, during Gather we split the queries into batches and then launch a goroutine for each batch (rough sketch below). It is not clear to me how we could check that we will avoid the 413 error before launching those goroutines.
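
For context, the flow in question looks roughly like the simplified sketch below (placeholder names, not the plugin's actual code); the serialized request size that triggers the 413 only becomes known inside each goroutine, after the batch has already been formed.

```go
package sketch // illustrative only, not the plugin package

import "sync"

// Query stands in for the AWS SDK's MetricDataQuery type in this sketch;
// everything here is illustrative rather than the plugin's actual code.
type Query struct{ ID string }

// gatherInBatches splits the prepared queries into fixed-size batches and
// launches one goroutine per batch, each issuing one GetMetricData call.
func gatherInBatches(queries []Query, batchSize int, fetch func([]Query)) {
	var wg sync.WaitGroup
	for start := 0; start < len(queries); start += batchSize {
		end := start + batchSize
		if end > len(queries) {
			end = len(queries)
		}
		wg.Add(1)
		go func(batch []Query) {
			defer wg.Done()
			// The request body is only serialized here, so its byte size
			// cannot be checked before the goroutines are launched.
			fetch(batch)
		}(queries[start:end])
	}
	wg.Wait()
}
```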

@reimda, thoughts?

@Hipska (Contributor) commented Mar 25, 2022

I'm wondering whether this error actually comes from AWS or from some intermediate proxy.

@reimda (Contributor) commented Mar 30, 2022

If there's no way to know the byte limit, maybe we should watch for 413 errors and handle them afterward by splitting the batch in half and trying again?
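
For illustration, the split-on-failure idea could look like the sketch below, reusing the Query placeholder from the earlier sketch; isTooLarge, which would detect the 413 RequestEntityTooLarge response, is an assumed helper, not an existing one.

```go
// fetchWithSplit sends the batch as-is and, if the request is rejected as
// too large, splits it in half and retries each half recursively.
// Illustrative only.
func fetchWithSplit(batch []Query, fetch func([]Query) error, isTooLarge func(error) bool) error {
	err := fetch(batch)
	if err == nil || !isTooLarge(err) || len(batch) <= 1 {
		return err
	}
	mid := len(batch) / 2
	if err := fetchWithSplit(batch[:mid], fetch, isTooLarge); err != nil {
		return err
	}
	return fetchWithSplit(batch[mid:], fetch, isTooLarge)
}
```

A single query that is still too large would surface the error unchanged, and a batch that fails on every interval costs the failed attempt plus the retries each time.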

@reimda (Contributor) commented Mar 30, 2022

Maybe there is a documented API limit, just not a quota limit. The CloudWatch docs are not the easiest to sift through, but I would expect there to be something about this kind of limit there somewhere.

@powersj powersj removed the ready for final review label Apr 12, 2022
@powersj powersj force-pushed the fix/10842 branch 2 times, most recently from d6a921d to 4d385ea on July 15, 2022 17:28
@powersj (Contributor, Author) commented Jul 15, 2022

I have continued to go through the web docs, as well as the CloudWatch API Reference and the CloudWatch User Guide, and found no reference to this kind of limit.

I have pushed another commit that tries the splitting method, but I think I prefer the user setting over this. The original bug report shows the error occurring on every single request, which means the user would always be making 3x the requests (the failed attempt plus the two half-sized retries). This does seem to be a one-off issue, and a setting that lets the user tune the batch size makes more sense than something that introduces more complexity and is not easily tested.

@reimda, that said, I'd like to close this one way or another, so please let me know which way you want to go with this.

@reimda (Contributor) commented Jul 21, 2022

> I have pushed another commit that tries the splitting method, but I think I prefer the user setting over this. [...] I'd like to close this one way or another, so please let me know which way you want to go with this.

It looks like there's no way to tune the batch size automatically, so a setting is the next best thing. Would it be useful to have both the splitting and a user setting? That way, if the setting doesn't fit their payload they get splitting instead of failures, and if/when they notice the extra requests they can adjust the setting.

It does seem like this is a relatively rare situation. I'm also ok with a setting and no splitting if that's what you prefer.
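
For illustration, combining the two could look roughly like the sketch below, building on the Query and fetchWithSplit placeholders from the earlier sketches and omitting the per-batch goroutines from Gather for brevity; none of this is the merged implementation.

```go
// gatherAll starts from the user-configured batch size and only falls back
// to halving when AWS rejects a request as too large. Illustrative only.
func gatherAll(queries []Query, batchSize int, fetch func([]Query) error, isTooLarge func(error) bool) error {
	for start := 0; start < len(queries); start += batchSize {
		end := start + batchSize
		if end > len(queries) {
			end = len(queries)
		}
		if err := fetchWithSplit(queries[start:end], fetch, isTooLarge); err != nil {
			return err
		}
	}
	return nil
}
```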

@powersj powersj added the ready for final review label Jul 29, 2022
@Hipska Hipska added the area/aws and plugin/input labels Aug 1, 2022
@srebhan (Contributor) left a comment

Looks good to me.

@powersj powersj merged commit 196abb7 into influxdata:master Aug 1, 2022
@powersj powersj deleted the fix/10842 branch August 1, 2022 19:09
reimda pushed a commit that referenced this pull request Aug 15, 2022
Labels
area/aws, fix, plugin/input, ready for final review
Development

Successfully merging this pull request may close these issues: inputs.cloudwatch Error large request entity size (#10842)
4 participants