Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cloudwatch exporter dependency #839

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

andriikushch
Copy link
Contributor

@andriikushch andriikushch commented May 14, 2024

PR Description

This PR adds a new configuration parameter for the Cloudwatch exporter. It allows to specify the AWS SDK version to use. v1 is the default version.

It was achieved by updating and integrating the version of the github.com/nerdswords/yet-another-cloudwatch-exporter package and all dependencies that it depends on.

According to the https://github.com/nerdswords/yet-another-cloudwatch-exporter/blob/a275f94e5d33b3b5330ecfced86c290e1db05928/pkg/config/feature_flags.go#L10 , using the AWS SDK v2, should improve performance.

Breaking change:

After this change, using aliases for the AWS service names is no longer allowed.

Which issue(s) this PR fixes

N.A.

A related change in the Agent project:

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@andriikushch andriikushch marked this pull request as ready for review May 16, 2024 14:29
@andriikushch andriikushch added outdated-dependency Related to an outdated dependency. dependencies Pull requests that update a dependency file labels May 16, 2024
@clayton-cornell clayton-cornell requested a review from a team May 16, 2024 22:14
@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label May 16, 2024
@clayton-cornell
Copy link
Contributor

Doc changes look OK.

Should this be backported? Or only in next?

@andriikushch
Copy link
Contributor Author

Doc changes look OK.

Should this be backported? Or only in next?

@clayton-cornell It would be nice to backport it. The same change is in the Agent repository:

@clayton-cornell
Copy link
Contributor

Added backports to Alloy v1.0 and v1.1

There's a parallel Agent PR already in process, so added the label backport-to-agent:no

@kgeckhart kgeckhart requested a review from thepalbi May 22, 2024 17:58
@@ -37,6 +40,7 @@ type Arguments struct {
Discovery []DiscoveryJob `alloy:"discovery,block,optional"`
Static []StaticJob `alloy:"static,block,optional"`
DecoupledScrape DecoupledScrapeConfig `alloy:"decoupled_scraping,block,optional"`
AWSSDKVersion string `alloy:"aws_sdk_version,attr,optional"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given there's only two options v1 and v2 with v1 being deprecated with an EOL date I think this should really be a boolean to enable or disable sdk v2 vs a string like this. We aren't anticipating the need to support a v3 anytime soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

CHANGELOG.md Outdated Show resolved Hide resolved
@@ -26,6 +28,7 @@ var defaults = Arguments{
Enabled: false,
ScrapeInterval: 5 * time.Minute,
},
AWSSDKVersion: "v2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should default to v1 for now to prevent breaking anyone on upgrade. We can make v2 the default following YACE changing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it 👍

Co-authored-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
Copy link
Contributor

@kgeckhart kgeckhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a chance to run it locally as well?

…ch.md

Co-authored-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
@andriikushch
Copy link
Contributor Author

Did you have a chance to run it locally as well?

Yes. I created the config with 3 cloudwatch exporters: without the new parameter, with aws_sdk_version_v2 set to true and to false. I used it to get EC2 metrics. And compared aws_ec2_cpuutilization_average graphs. They look similar (I think the difference is depending on the exact time of the getting data):
Screenshot 2024-05-28 at 13 54 15
Screenshot 2024-05-28 at 13 54 27
Screenshot 2024-05-28 at 13 54 35

@@ -61,7 +61,7 @@ func generateServicesDocSection() string {
var sb strings.Builder
for _, supportedSvc := range yaceConf.SupportedServices {
sb.WriteString(
fmt.Sprintf("- Namespace: `%s` or Alias: `%s`\n", supportedSvc.Namespace, supportedSvc.Alias),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to run this to update the service list in the docs. ATM they all still have the Alias.

@@ -23,4 +23,4 @@ jobs:
go-version: "1.22"
cache: true
- name: Test
run: make GO_TAGS="nodocker" test
run: CGO_LDFLAGS="-ld_classic $CGO_LDFLAGS" make GO_TAGS="nodocker" test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpaschalis, without CGO_LDFLAGS="-ld_classic build fails on mac.

Copy link
Member

@tpaschalis tpaschalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to block this, doing it to avoid getting this accidentally merged (I've yet to review the entire PR).

As per our backwards compatibility guarantees, we're no longer doing breaking changes in GA functionality.

Instead we'll need to find a way to support both the previous and current type values without affecting users' config files and queue any breaking changes for a v2.0 release in the future with the v2.0-breaking-change label

@kgeckhart
Copy link
Contributor

Instead we'll need to find a way to support both the previous and current type values without affecting users' config files and queue any breaking changes for a v2.0 release in the future with the v2.0-breaking-change label

I think this particular change is covered under this exception to the backwards compatibility guarantee,

Upstream changes: Much of the functionality of {{< param "PRODUCT_NAME" >}} is built on top of other software, such as OpenTelemetry Collector and Prometheus. If upstream software breaks compatibility, we may need to reflect this in {{< param "PRODUCT_NAME" >}}.

As the change is being driven by the upstream exporter in this case.

@tpaschalis
Copy link
Member

But also,

We try, whenever possible, to resolve these issues without breaking compatibility.

It's always a tradeoff, but I think it's a good opportunity to try and fix this without affecting users. On a surface level it doesn't feel like a compatibility layer to support aliasing these types wouldn't be so intrusive. WDYT?

@kgeckhart
Copy link
Contributor

kgeckhart commented Jun 3, 2024

But also,

We try, whenever possible, to resolve these issues without breaking compatibility.

It's always a tradeoff, but I think it's a good opportunity to try and fix this without affecting users. On a surface level it doesn't feel like a compatibility layer to support aliasing these types wouldn't be so intrusive. WDYT?

I agree in principal, but @cristiangreco intentionally dropped these aliases upstream which is why I'm in favor of dropping them here as well. They do not have any AWS specific meaning to them that makes them valuable for a customer and how they were used was harmful nerdswords/yet-another-cloudwatch-exporter#1131.

@cristiangreco
Copy link

But also,

We try, whenever possible, to resolve these issues without breaking compatibility.

It's always a tradeoff, but I think it's a good opportunity to try and fix this without affecting users. On a surface level it doesn't feel like a compatibility layer to support aliasing these types wouldn't be so intrusive. WDYT?

I agree in principal, but @cristiangreco intentionally dropped these aliases upstream which is why I'm in favor of dropping them here as well. They do not have any AWS specific meaning to them that makes them valuable for a customer and how they were used was harmful nerdswords/yet-another-cloudwatch-exporter#1131.

Yes the upstream change was intentional, so in theory the usage of aliases should be dropped in alloy as well.

@andriikushch
Copy link
Contributor Author

Hi @tpaschalis,

@kgeckhart and I discussed some options for this case. We agreed that we could introduce a translation layer that will also produce warning logs when aliases are used and inform that support of the aliases will be dropped in the upcoming releases.

This will give users enough time to update their configs and allow them to benefit from the new component version immediately.

What do you think?

@tpaschalis
Copy link
Member

I don't see the benefit of removing the translation layer though. Do you anticipate there being it an effort to maintain?

On surface it sounds like a simple string-to-string mapping that helps maintain user confidence in our backwards compatibility guarantees.

In any case, other members of the @grafana/grafana-alloy-maintainers can chime in here with their opinion, let's see what they say

@mattdurham
Copy link
Collaborator

If the aliases are indeed considered a bug then I feel that's reasonable to not have them and would follow our backwards compatibility. Otherwise I would strongly push for the aliasing.

@kgeckhart
Copy link
Contributor

kgeckhart commented Jun 7, 2024

The implementation is not going to be hard to do it's more the extra layer of indirection we have to consider when attempting to understand a customers configuration. The alias is not a real value to AWS while what we are forcing someone to use instead is the AWS CloudWatch namespace. Some aliases are easily relatable back to the CloudWatch namespace but for things like gwlb I would have to look at our mapping layer or google it to recognize it as AWS/GatewayELB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release/v1.0 backport release/v1.1 backport-to-agent:no PR should NOT be backported to the agent repo. dependencies Pull requests that update a dependency file outdated-dependency Related to an outdated dependency. type/docs Docs Squad label across all Grafana Labs repos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants