CloudWatch: Correctly add dimension values to labels #74847

iwysiu · 2023-09-13T14:30:47Z

What is this feature?

Fixes how dimension values are added to frame labels by fetching the values of any query dimension that uses a * wildcard when matchExact is set to false.
I looked into it, and the AWS API response doesn't give us the dimension values as-is. This PR gives the code in responseparser.go the values it needs: that code iterates over all the dimension values and checks whether each one is contained in the returned label (which, if not set by the user, contains all the differentiating dimension values separated by spaces).
This has the same problem it would have previously had, which is if a dimension value for dimension A is a subset of a value for dimension B it could get mislabeled, but I don't think there's a way for us to work around that.
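A minimal sketch of the selection rule described above. It assumes a hypothetical, trimmed-down Query type (the real models.CloudWatchQuery has many more fields) and a hypothetical wildcardDimensionKeys helper: only dimensions whose value is * on a query with matchExact off need their values looked up.

```go
package main

import "sort"

// Query is a hypothetical stand-in for a CloudWatch query; the real
// models.CloudWatchQuery has many more fields.
type Query struct {
	Namespace  string
	MetricName string
	Dimensions map[string][]string
	MatchExact bool
}

// wildcardDimensionKeys returns the dimension keys whose values would have to
// be looked up with ListMetrics: those set to "*" on a query with matchExact off.
func wildcardDimensionKeys(q Query) []string {
	if q.MatchExact {
		return nil
	}
	var keys []string
	for key, values := range q.Dimensions {
		for _, v := range values {
			if v == "*" {
				keys = append(keys, key)
				break // one "*" is enough to mark this key
			}
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	return keys
}
```

For a query on AWS/EC2 with InstanceId set to * and InstanceType set to a concrete value, only InstanceId is returned; with MatchExact set to true, nothing is.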

Which issue(s) does this PR fix?:

Fixes #72172

Special notes for your reviewer:
Should this be behind a feature toggle? I was checking pricing, and the additional ListMetrics call for every query could start costing money if users burn through their 1,000,000 free API calls. Or should we look into caching these values?

Please check that:

It works as expected from a user's perspective.
If this is a pre-GA feature, it is behind a feature toggle.
The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

fridgepoet · 2023-09-18T10:27:58Z

Woohoo! This looks good!

This has the same problem it would have previously had, which is if a dimension value for dimension A is a subset of a value for dimension B it could get mislabeled, but I don't think there's a way for us to work around that.

Could you illustrate the above situation with a test case? I didn't quite follow, and documenting it with a test case would be useful to keep it in mind.

Should this be behind a feature toggle? I was checking pricing, and the additional ListMetrics call for every query could start costing money if users burn through their 1,000,000 free API calls. Or should we look into caching these values?

How many people do we expect to fall into this condition?
I support the idea of caching these values, though!

I've just added some suggestions to harmonize our mock usage and move towards using testify/mock for our mocks.

pkg/tsdb/cloudwatch/mocks/cloudwatch_metric_api.go

pkg/tsdb/cloudwatch/get_dimension_values_test.go

pkg/tsdb/cloudwatch/time_series_query.go

pkg/tsdb/cloudwatch/time_series_query_test.go

pkg/tsdb/cloudwatch/get_dimension_values_test.go

iwysiu

Could you illustrate the above situation with a test case? I didn't quite follow, and documenting it with a test case would be useful to keep it in mind.

I'm kind of against adding a test that confirms we're doing things wrong, but an example would be:
Dimension 1 has values ["A", "A B"] and Dimension 2 has values ["B", "C"]. For Dimension 1="A" and Dimension 2="B", AWS will by default produce the label "A B". When we look up the labels, it's going to assign them as 1:"A B" and 2:"B".
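To make the example concrete, here is a minimal Go sketch of containment-based matching. It assumes a hypothetical assignDimensionValues helper that, for each dimension, keeps the longest known value contained in the label; that tie-breaking rule is an assumption, not necessarily what responseparser.go does, but it reproduces the 1:"A B" result described above.

```go
package main

import "strings"

// assignDimensionValues is a hypothetical sketch of containment-based label
// matching: for each dimension, it keeps the longest known value that appears
// as a substring of the label returned by AWS.
func assignDimensionValues(label string, known map[string][]string) map[string]string {
	result := make(map[string]string)
	for dim, values := range known {
		best := ""
		for _, v := range values {
			// Prefer the longest contained value, so "web-1" beats "web".
			if strings.Contains(label, v) && len(v) > len(best) {
				best = v
			}
		}
		if best != "" {
			result[dim] = best
		}
	}
	return result
}
```

With the label "i-123 web-1", InstanceId values ["i-123", "i-456"] and Role values ["web", "web-1"], the sketch yields InstanceId="i-123" and Role="web-1", as intended. With the label "A B" and the values from the example above, it yields Dimension 1="A B" and Dimension 2="B", which is the mislabeling.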

How many people do we expect to fall into this condition?

I'm not really sure. The reason I'm hesitant is that we would be doing this for every dimension every time a query is called, which, depending on how often a dashboard refreshes, could be a lot.

pkg/tsdb/cloudwatch/get_dimension_values_test.go

pkg/tsdb/cloudwatch/time_series_query.go

pkg/tsdb/cloudwatch/time_series_query_test.go

fridgepoet · 2023-09-19T07:27:57Z

I'm kind of against adding a test that confirms we're doing things wrong, but an example would be: Dimension 1 has values ["A", "A B"] and Dimension 2 has values ["B", "C"]. For Dimension 1="A" and Dimension 2="B", AWS will by default produce the label "A B". When we look up the labels, it's going to assign them as 1:"A B" and 2:"B".

What would be the "right" way to do it?

I'm not really sure. The reason I'm hesitant is that we would be doing this for every dimension every time a query is called, which, depending on how often a dashboard refreshes, could be a lot.

Just every time a query is called with a wildcard and matchExact false, right?
What would a cache solution be like?

pkg/tsdb/cloudwatch/mocks/cloudwatch_metric_api.go

iwysiu

What would be the "right" way to do it?

The best way to do it would be for AWS to pass us the dimensions as a map/array as part of the result instead of smushed into a string (or smushed into a string with a distinct separator character that can't appear in a label value). What this PR does is the best option within our control.
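For contrast, a hypothetical sketch of why a distinct separator would make parsing unambiguous. AWS does not return labels in this form; the separator (the ASCII unit separator, \x1f), the parseLabelWithSeparator name, and the assumption that values appear in the query's key order are all made up for illustration.

```go
package main

import "strings"

// parseLabelWithSeparator is hypothetical: it assumes a label whose dimension
// values are joined by a separator that can never occur inside a value, in the
// same order as keys. Splitting is then exact, with no substring guessing.
func parseLabelWithSeparator(label, sep string, keys []string) (map[string]string, bool) {
	parts := strings.Split(label, sep)
	if len(parts) != len(keys) {
		return nil, false // label does not match the expected shape
	}
	out := make(map[string]string, len(keys))
	for i, key := range keys {
		out[key] = parts[i]
	}
	return out, true
}
```

Under that assumption, the label "A B\x1fC" parses to Dimension 1="A B" and Dimension 2="C" regardless of which other values exist.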

Just every time a query is called with a wildcard and matchExact false, right?
What would a cache solution be like?

Yeah, but the free API request limit covers basically everything that isn't a GetMetricData request.
Doing the math, if you have a query refreshing every 2 seconds (on the high end) with 1 wildcard dimension, you have 30*60*24*30 = 1,296,000 ListMetricsPages requests a month, which costs $2.96 for the ListMetricsPages requests over the limit. Any additional wildcard dimension refreshing at the same rate is $12.96 a month.
Having done the math I'm going to look into implementing a simple cache that expires (after a day maybe?) to cut those down, because that's kind of ridiculous considering that we're not passing a time range parameter.

pkg/tsdb/cloudwatch/mocks/cloudwatch_metric_api.go

iwysiu · 2023-09-21T19:47:32Z

@fridgepoet if you would like to take another look now that it caches, the last 2 commits are the changes since you approved it

fridgepoet

Cache looks really nice!

fridgepoet · 2023-09-22T07:57:54Z

pkg/tsdb/cloudwatch/get_dimension_values_for_wildcards.go

+				continue
+			}
+
+			cacheKey := fmt.Sprintf("%s-%s-%s-%s", region, query.Namespace, query.MetricName, dimensionKey)


I'm wondering about edge cases, though I don't know everything about the uniqueness here.
Is there any risk for fetching the wrong data with the same key?
For example if there are custom Namespaces that happen to be the same region and same name, what happens there?

Are there any security issues?
To what extent could any of these be private to the account?
Will there be anything secure that we should not store in-memory too long?

How often can these values change, by the way? Will 24 hours be "too long"?

Good call on the security, I moved the cache to be by instance so that you have to have the related permissions to access the cache. I should have done that anyway to avoid causing future multi-tenancy issues.

I don't think labels should change frequently. 24 hours of expiration gets us to 30 queries a month per dimension, and we could up it a bit. I'm not sure what the best expiration would be, but 1 hour or 720 queries per dimension probably wouldn't be crazy out of the 1 million. We could make it configurable? They could resave the datasource to clear the cache if they need, though I don't know if all users would have access to that.
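A rough sketch of what caching per instance means, assuming a hypothetical datasourceInstance type that owns its own map; cacheKey reproduces the key format from the diff excerpt. In this sketch, because each datasource instance keeps a separate cache, two datasources (for example, two accounts with an identically named custom namespace in the same region) cannot read each other's entries even though their keys are identical.

```go
package main

import "fmt"

// datasourceInstance is a hypothetical stand-in for a per-datasource instance.
// Each instance owns its own map, so cached values are only reachable by
// queries that already run with that datasource's settings and credentials.
type datasourceInstance struct {
	dimensionValues map[string][]string
}

func newDatasourceInstance() *datasourceInstance {
	return &datasourceInstance{dimensionValues: make(map[string][]string)}
}

// cacheKey mirrors the key format shown in the diff excerpt.
func cacheKey(region, namespace, metricName, dimensionKey string) string {
	return fmt.Sprintf("%s-%s-%s-%s", region, namespace, metricName, dimensionKey)
}
```

One caveat of joining fields with "-": a namespace or metric name that itself contains "-" could, in principle, collide with a different combination (namespace "A-B" with metric "C" versus namespace "A" with metric "B-C"); a separator that cannot appear in those fields would avoid that.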

fridgepoet · 2023-09-22T08:08:58Z

pkg/tsdb/cloudwatch/get_dimension_values_for_wildcards_test.go

+		api.On("ListMetricsPages").Return(nil)
+		_, err := executor.getDimensionValuesForWildcards(context.Background(), api, "us-east-1", []*models.CloudWatchQuery{query})
+		assert.Nil(t, err)
+


Unfortunately this test does not assert quite what we think.
The query input is mutated by the results from the first call to ListMetricsPages, so what happens during the second call is that the dimension is no longer a wildcard. The loop continues.
The assertions still pass, however, since the result is that ListMetricsPages doesn't get called again.

I'm aware I'm stepping back here, but would it be possible to not mutate the queries []*models.CloudWatchQuery in getDimensionValuesForWildcards? We are already using a function signature which returns the result type we want. I think it may be misleading to mutate the input as well as provide the output in the function signature. Let's make some copies internally and return that, what do you think?

Changed to not mutate the queries.
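A minimal sketch of the non-mutating approach, assuming a hypothetical trimmed-down Query type and a resolved map keyed only by dimension key (a simplification for illustration). Each query is copied, including its Dimensions map and slices, before wildcard values are substituted, so the caller's queries keep their * values and a repeated call still takes the wildcard path.

```go
package main

// Query is a hypothetical stand-in for models.CloudWatchQuery.
type Query struct {
	MetricName string
	Dimensions map[string][]string
}

// withResolvedWildcards returns copies of queries with "*" dimension values
// replaced by looked-up values, leaving the input queries untouched.
func withResolvedWildcards(queries []*Query, resolved map[string][]string) []*Query {
	out := make([]*Query, 0, len(queries))
	for _, q := range queries {
		cp := *q // copies scalar fields; Dimensions is still shared here
		cp.Dimensions = make(map[string][]string, len(q.Dimensions))
		for key, values := range q.Dimensions {
			if vals, ok := resolved[key]; ok && len(values) == 1 && values[0] == "*" {
				cp.Dimensions[key] = append([]string(nil), vals...)
			} else {
				cp.Dimensions[key] = append([]string(nil), values...)
			}
		}
		out = append(out, &cp)
	}
	return out
}
```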

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>

iwysiu

Updated it to cache by instance!

Sorry about the rebase, I needed to get the fix for the instance manager and forgot that it would mess up the history of the pr.

I started filling out a hosted grafana readiness doc so we can be more reassured about the safety of it, and we can run it by them once our team approves the pr.

iwysiu · 2023-09-25T20:18:56Z

pkg/tsdb/cloudwatch/get_dimension_values_for_wildcards_test.go

+		api.On("ListMetricsPages").Return(nil)
+		_, err := executor.getDimensionValuesForWildcards(context.Background(), api, "us-east-1", []*models.CloudWatchQuery{query})
+		assert.Nil(t, err)
+


Changed to not mutate the queries.

iwysiu · 2023-09-25T20:24:28Z

pkg/tsdb/cloudwatch/get_dimension_values_for_wildcards.go

+				continue
+			}
+
+			cacheKey := fmt.Sprintf("%s-%s-%s-%s", region, query.Namespace, query.MetricName, dimensionKey)


Good call on the security, I moved the cache to be by instance so that you have to have the related permissions to access the cache. I should have done that anyway to avoid causing future multi-tenancy issues.

I don't think labels should change frequently. 24 hours of expiration gets us to 30 queries a month per dimension, and we could up it a bit. I'm not sure what the best expiration would be, but 1 hour or 720 queries per dimension probably wouldn't be crazy out of the 1 million. We could make it configurable? They could resave the datasource to clear the cache if they need, though I don't know if all users would have access to that.

iwysiu · 2023-09-26T18:35:03Z

(Made another commit to only cache when values are returned so users can't maliciously spam fake dimension keys)
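A sketch of that rule, assuming a hypothetical lookupDimensionValues helper over a plain map: cached values are returned when present; otherwise fetch is called and its result is stored only when non-empty, so made-up dimension keys never grow the cache.

```go
package main

// lookupDimensionValues is hypothetical: fetch stands in for the
// ListMetricsPages lookup, and only non-empty results are cached.
func lookupDimensionValues(cache map[string][]string, key string, fetch func() ([]string, error)) ([]string, error) {
	if values, ok := cache[key]; ok {
		return values, nil
	}
	values, err := fetch()
	if err != nil {
		return nil, err
	}
	if len(values) > 0 {
		cache[key] = values // empty results are not stored
	}
	return values, nil
}
```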

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>

grafana-delivery-bot bot added this to the 10.2.x milestone Sep 13, 2023

grafana-pr-automation bot added area/backend datasource/CloudWatch labels Sep 13, 2023

iwysiu added add to changelog no-backport labels Sep 13, 2023

iwysiu marked this pull request as ready for review September 13, 2023 18:37

iwysiu requested a review from a team as a code owner September 13, 2023 18:37

iwysiu requested review from fridgepoet and idastambuk and removed request for a team September 13, 2023 18:37

iwysiu changed the title from "CloudWatch: correctly add dimension values to labels" to "CloudWatch: Correctly add dimension values to labels" Sep 13, 2023

iwysiu marked this pull request as draft September 14, 2023 21:02

iwysiu force-pushed the iwysiu/72172 branch from 5e97c93 to 6bedced September 14, 2023 21:15

iwysiu marked this pull request as ready for review September 14, 2023 21:30

fridgepoet reviewed Sep 18, 2023

iwysiu commented Sep 18, 2023

fridgepoet reviewed Sep 19, 2023

pkg/tsdb/cloudwatch/mocks/cloudwatch_metric_api.go

fridgepoet approved these changes Sep 19, 2023

iwysiu commented Sep 19, 2023

pkg/tsdb/cloudwatch/mocks/cloudwatch_metric_api.go

iwysiu requested review from grafanabot and a team as code owners September 21, 2023 19:43

grafana-pr-automation bot added type/docs area/frontend labels Sep 21, 2023

iwysiu force-pushed the iwysiu/72172 branch from 0da4ca9 to d9a8bcf September 21, 2023 19:46

fridgepoet reviewed Sep 22, 2023

iwysiu added 2 commits September 25, 2023 11:55

CloudWatch: correctly add dimension values to labels

5bc878b

fix linting

c589c8b

iwysiu and others added 8 commits September 25, 2023 11:55

Apply suggestions from code review

c323afe

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>

shirley comments

ff0ca48

cache tag values

d55c74d

add feature toggle

35b4963

actually use toggle

0d2fd58

improve cache key and deep copy queries

ef499e6

add logging

380ba95

cache in instance

b825ef1

iwysiu force-pushed the iwysiu/72172 branch from 19a9d1b to b825ef1 September 25, 2023 18:56

remove cache key from logs

d6ff0b7

iwysiu commented Sep 25, 2023

fridgepoet approved these changes Sep 26, 2023

only cache if values were returned

e310f16

Merge remote-tracking branch 'origin' into iwysiu/72172

d6a46e4

iwysiu merged commit 06a35f5 into main Sep 27, 2023
19 checks passed

iwysiu deleted the iwysiu/72172 branch September 27, 2023 14:41

otilor pushed a commit to otilor/grafana that referenced this pull request Sep 28, 2023

CloudWatch: Correctly add dimension values to labels (grafana#74847)

7014fbc

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>

rwwiv pushed a commit that referenced this pull request Oct 2, 2023

CloudWatch: Correctly add dimension values to labels (#74847)

c2b4b32

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>

mildwonkey pushed a commit that referenced this pull request Oct 4, 2023

CloudWatch: Correctly add dimension values to labels (#74847)

8ed4ad7

Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>

iwysiu mentioned this pull request Oct 5, 2023

CloudWatch: remove cloudWatchWildCardDimensionValues feature toggle #76077

Closed

zerok modified the milestones: 10.2.x, 10.2.0 Oct 23, 2023

dnhn mentioned this pull request Oct 24, 2023

grafana 10.2.0 Homebrew/homebrew-core#152264

Closed

BrewTestBot mentioned this pull request Oct 25, 2023

grafana 10.2.0 Homebrew/homebrew-core#152321

Closed

iwysiu mentioned this pull request Mar 18, 2024

Cloudwatch: Dimension Labels Inconsistencies Bug with wildcard selected, Match Exact turned off, and an expression used #79809

Closed

iwysiu mentioned this pull request Apr 8, 2024

CloudWatch: Add labels for Metric Query type queries #85766

Merged
