Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloudwatch: Fix Unexpected error #74420

Merged
merged 24 commits into from
Sep 25, 2023
Merged

Conversation

sarahzinger
Copy link
Member

@sarahzinger sarahzinger commented Sep 5, 2023

What is this feature?

Fixes issue where an unparsed error is thrown in Cloudwatch when a datasource is created:
Screenshot 2023-09-05 at 4 15 36 PM

In order to address fixing this, I ended up refactoring the /regions endpoint to follow the same architecture we now have in most of cloudwatch resource routes. This helps to ensure we return a well formatted error response like so:
Screenshot 2023-09-05 at 4 17 43 PM

rather than this string which our global error handling does not parse well:
Screenshot 2023-09-05 at 4 18 01 PM

Arguably there are a few reasons this error is happening, and many of them are worth tackling. Some of them I've covered in this pr and some of which I did not, figuring we could address it later:

  1. (not addressed) when regions doesn't have auth session yet it returns an error rather than our fallback list of regions. It is debatable to me what the expected behavior should be in this instance. Perhaps this is as desired...but then what's the point of fallback regions if we're not going to use them? We end up having fallback regions in our frontend as well as in our backend which seems wrong. I think probably we should someday make this return the fallback rather than an error, but that felt like something we could work on in the future.
  2. (not addressed) our UI is making a request for /regions before we have an auth saved, even though the frontend already has a default list of regions it can use (located in the grafana-aws-sdk). Again debatable to me what the behavior here should be. If we remove the error entirely and always use the fallback list, then it seems reasonable to me that we make this request right away. But if we're going to throw an error, then it seems strange, like we should wait until there is a saved configuration. But again I think probably it's fine as is once we stop returning errors.
  3. (addressed) we were trying to create a session with a default region of "", rather than throwing an error. This led to a confusing error message, now we return an error that clearly states we're trying to create an auth session with a default region.
  4. (addressed) we were return a string rather than an error object, which our frontend could not parse. We still throw the error but it is no longer noticable unless you have the network tab open. In my opinion this is a good reminder that we need to revisit error handling in Cloudwatch in general.

For such a tiny bug, I ended up re-writing a lot of code, because it felt like a good opportunity to clean up some tech debt and get better familiar with our backend architecture. But it also made me a bit concerned that I might have unexpected changes here, so I went ahead and put the new stuff behind a feature flag. I feel reasonably confident with it so I was going to have it enabled by default, but if it seems to be causing issues we'll have a quick way to turn it off without reverting the commit.

An interesting note here is that I saw we have a regionsCache which I've removed. I realized while trying to add it back, that we're caching regions under a key of Profile which is often undefined, which seems wrong. I think it's not unreasonable for us to add in a cache and create a proper cache key (maybe out of a hash of the auth settings the way we do with settings?) but for now since it seems it was often unused, I've just removed it, but let me know if I'm missing anything!

Which issue(s) does this PR fix?:

Fixes #73294

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

@sarahzinger sarahzinger changed the title Draft: Cloudwatch: Fix Unexpected error Cloudwatch: Fix Unexpected error Sep 13, 2023
@sarahzinger sarahzinger marked this pull request as ready for review September 13, 2023 18:41
@sarahzinger sarahzinger requested review from grafanabot and a team as code owners September 13, 2023 18:41
@sarahzinger sarahzinger requested review from fridgepoet and idastambuk and removed request for a team September 13, 2023 18:41
pkg/tsdb/cloudwatch/client_factory.go Show resolved Hide resolved
pkg/tsdb/cloudwatch/cloudwatch_integration_test.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/cloudwatch_test.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/client_factory.go Show resolved Hide resolved
pkg/tsdb/cloudwatch/cloudwatch_test.go Outdated Show resolved Hide resolved
@fridgepoet
Copy link
Member

Just some thoughts and totally defer to your judgement:
Pretty large number of changes and I feel a bit surprised that we feel a need for a feature flag. I wonder if we would have been able to break this up and felt more comfortable about delivering a smaller set of changes a bit at a time?

pkg/tsdb/cloudwatch/models/api.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/models/errors.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/routes/regions_test.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/routes/regions_test.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/routes/regions_test.go Outdated Show resolved Hide resolved
Co-authored-by: Shirley <4163034+fridgepoet@users.noreply.github.com>
pkg/tsdb/cloudwatch/mocks/regions.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/mocks/regions.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/services/regions_test.go Outdated Show resolved Hide resolved
pkg/tsdb/cloudwatch/routes/regions_test.go Outdated Show resolved Hide resolved
@sarahzinger sarahzinger merged commit ef441f0 into main Sep 25, 2023
19 checks passed
@sarahzinger sarahzinger deleted the sarahzinger/fix-unexpected-error branch September 25, 2023 18:19
rwwiv pushed a commit that referenced this pull request Oct 2, 2023
Fix unexpected error when creating a new cloudwatch datasource.

Involves a fair amount of refactoring, so if this causes unexpected issues related to region fetching we can turn this off with the cloudwatchNewRegionsHandler feature toggle, although we do not predict it will so we are enabling it to default to true and hope to remove it shortly.
@zerok zerok modified the milestones: 10.2.x, 10.2.0 Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cloudwatch: Unexpected token error when creating a new datasource
3 participants