Optionally use GetMetricData instead of GetMetricStatistics #414
It's great to see this worked on! Please let me know if you need guidance or a deeper review. One question I have is what factors would make someone choose the existing implementation? When would this implementation be fit to become the default or only one? |
Hey @matthiasr 👋 The original issue mentioned a possible gain in costs, but then later it was ruled out. This official post is recommending using For instance - if you ask for |
So there is no reason to keep GetMetricStatistics? |
In general - seems like there isn't. Looking at YACE - it seems like they chose to keep using GetMetricStatistics if the dimensions are statically defined - though I think they intend to remove it completely, and an issue was opened about it a few months ago. I can think of a few reasons to keep both options available at first:
If at all possible I would prefer to have pagination right away – without
it, data is silently incomplete, and it's very difficult to predict or
detect whether that is the case.
I love your transition plan! It lets us gather experience with this method
while eventually simplifying the code. Would you like to keep driving that
forward?
…On Wed, Apr 20, 2022, 13:44 Or Shachar ***@***.***> wrote:
In general - seems like there isn't.
Looking at YACE - it seems like they chose to keep using GetMetricStatistics
if the dimensions are statically defined - though I think they intend to
remove it completely
<nerdswords/yet-another-cloudwatch-exporter#118 (comment)>
and an issue was opened
<nerdswords/yet-another-cloudwatch-exporter#520>
about it a few months ago.
I can think of a few reasons to keep both options available at first:
1. The first method, though slow and inefficient, is battle tested. We
want users to opt in to the new method at their own pace. Let's say that
this would be supported in version v14.x, in version v15.x we will swap
the default, and in version v16.x we will remove the old method.
2. My current implementation does not support pagination. The default
page size is 500 metrics. If, for instance, the list of dimensions is
bigger than that, it's safer to use GetMetricStatistics until we
introduce such support. (You can, of course, request that pagination
support be part of this PR - I'll work on it later on.)
I think I handled the batching. There is actually no need to support pagination, because we only care about a single datapoint. IIUC, a single call can retrieve up to 500 metrics, so I partitioned the queries into batches of up to 500 and generated requests accordingly. How do we generally handle runtime errors here?
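The batching described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `MetricBatcher`, the `partition` helper, and the string query IDs are hypothetical, and the 500-query limit is taken from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

class MetricBatcher {
    // Limit per GetMetricData request, as stated in the comment above.
    static final int MAX_QUERIES_PER_REQUEST = 500;

    /** Splits the queries into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> queries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < queries.size(); start += batchSize) {
            int end = Math.min(start + batchSize, queries.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(queries.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical query IDs standing in for real metric data queries.
        List<String> queryIds = new ArrayList<>();
        for (int i = 0; i < 1201; i++) {
            queryIds.add("q" + i);
        }
        List<List<String>> batches = partition(queryIds, MAX_QUERIES_PER_REQUEST);
        // 1201 queries become batches of 500, 500 and 201.
        System.out.println(batches.size() + " requests");
    }
}
```

Each batch would then be turned into one GetMetricData request, so the number of API calls grows with the number of batches rather than the number of metrics.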
A runtime error should cause the scrape to fail (return a non-2xx HTTP status). |
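As a rough sketch of that rule, assuming a hypothetical `handleScrape` wrapper around the collection step (the exporter's real HTTP handling is not shown in this thread), any runtime error during collection maps to a non-2xx status instead of a partial 200 response:

```java
import java.util.function.Supplier;

class ScrapeHandler {
    /** Minimal stand-in for an HTTP response: status code plus body. */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /**
     * Runs the (hypothetical) collection step. A runtime error fails the
     * whole scrape with a 500 rather than returning incomplete data.
     */
    static Response handleScrape(Supplier<String> collect) {
        try {
            return new Response(200, collect.get());
        } catch (RuntimeException e) {
            return new Response(500, "scrape failed: " + e.getMessage());
        }
    }
}
```

Failing the whole scrape makes the problem visible to Prometheus (the target's `up` series drops to 0) instead of silently exposing incomplete data.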
@matthiasr - I think my editor formatted the The easiest for me would be to configure the right formatter and re-format but if there isn't any convention I'll try to change it so that the file won't include formatting changes. |
There isn't a consistent style so far. Feel free to format the file, but please do so in a separate commit so the actual changes can be reviewed separately |
K - I think I cleaned the unrelated changes. I think it's even ready for the first code review :-) |
One thing I thought of is that we might want to add a separate internal metric to count the number of metrics requested from CloudWatch. With GetMetricStatistics it is simpler because the ratio is 1:1, but a single GetMetricData call can ask for up to 500 metrics.
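To illustrate why two separate counts are useful, here is a minimal sketch. The class `RequestAccounting` and its method names are hypothetical, and plain `LongAdder` counters stand in for whatever Prometheus metric types the exporter would actually register.

```java
import java.util.concurrent.atomic.LongAdder;

class RequestAccounting {
    // Number of CloudWatch API calls made.
    final LongAdder apiRequests = new LongAdder();
    // Number of metrics asked for across those calls.
    final LongAdder metricsRequested = new LongAdder();

    /** GetMetricStatistics: one call always asks for exactly one metric. */
    void recordGetMetricStatistics() {
        apiRequests.increment();
        metricsRequested.increment();
    }

    /** GetMetricData: one call may carry many queries (up to 500 per the thread). */
    void recordGetMetricData(int queriesInRequest) {
        apiRequests.increment();
        metricsRequested.add(queriesInRequest);
    }
}
```

With GetMetricData the two counters diverge, which is why the number of API calls alone no longer reflects how many metrics were requested.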
This looks great, thank you for organizing the code cleanly!
I agree that we should have a metric for metrics requested that maps to the expected cost.
Additionally, could you add a description of the setting to the README, with the trade offs between the two variants? Something along the lines of "same cost, faster, less tested, will become the default in the future".
Signed-off-by: or-shachar <or.shachar@wiz.io>
Hey @matthiasr ! made another commit, and rebased on top of the current master.
Feel free to correct me on anything :-) |
Also - if you happen to merge it - I prefer the squash and merge of course :) |
@matthiasr - do you think you'll get to it by the end of this week? :-) (no pressure... I'm just excited for it to be merged) |
Awesome, thanks a lot! |
FYI, I rolled this out to my exporters. It was a massive improvement to our ingestion. Previous gathering that was mostly failing is now consistently returning in a stable scrape duration with 10s of thousands of metrics per scrape. |
Please let me know if it is possible to go to GetMetricStatistics? Perhaps I do not understand how to enable this in the exporter config |
Trying to resolve #134 in an opt-in way, leaving the previous GetMetricStatistics option as the default.
Strategy:
Add a per-rule / global config called use_get_metric_data (boolean) to allow selection between GetMetricStatistics and the new GetMetricData.
Implementation:
- GetMetricStatisticsDataGetter: CloudWatchCollector will use it. This is why all existing UTs pass.
- When use_get_metric_data is set to true, we expect no calls to be made to cloudwatchClient.getMetricStatistics (added a UT for that).
- GetMetricDataDataGetter implementation.
Additional changes:
- Statistic -> Samples and ExtendedValue -> Samples, and we populate those lists accordingly.
- I did not implement all tests using GetMetricData, only a selection of them.
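The selection strategy described above might look roughly like the sketch below. Only the names `GetMetricStatisticsDataGetter`, `GetMetricDataDataGetter` and `use_get_metric_data` come from the description. The `DataGetter` interface, its `apiName` method, the `select` helper, and the rule-overrides-global precedence are assumptions made for illustration.

```java
class DataGetterSelection {
    /** Hypothetical common interface for both retrieval strategies. */
    interface DataGetter {
        String apiName();
    }

    static final class GetMetricStatisticsDataGetter implements DataGetter {
        public String apiName() {
            return "GetMetricStatistics";
        }
    }

    static final class GetMetricDataDataGetter implements DataGetter {
        public String apiName() {
            return "GetMetricData";
        }
    }

    /**
     * Picks the getter from the use_get_metric_data flag.
     * ruleSetting is null when the rule does not set the flag, in which case
     * the global value applies (assumed precedence, not confirmed by the PR).
     */
    static DataGetter select(Boolean ruleSetting, boolean globalSetting) {
        boolean useGetMetricData = ruleSetting != null ? ruleSetting : globalSetting;
        return useGetMetricData
                ? new GetMetricDataDataGetter()
                : new GetMetricStatisticsDataGetter();
    }
}
```

Because the global flag defaults to false in this sketch, existing configurations keep using GetMetricStatistics unless they opt in.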
TODO: