Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote querier: use backoff retry only with 5xx errors #7004

Merged
merged 4 commits into from
Jan 3, 2024

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Dec 27, 2023

What this PR does

This PR allows usage of backoff retry in RemoteQuerier.sendRequest() only in case of errors with 5xx status codes.
Currently, all errors that RemoteQuerier.sendRequest() receives from httpgrpc.HTTPClient.Handle() contain a 5xx code. This behaviour might though change in the future, and we want to ensure that only 5xx errors are retried, while all other errors are immediately returned.

Which issue(s) this PR fixes or relates to

Part of #1830

Checklist

  • Tests updated.
  • [na] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [na] about-versioning.md updated with experimental features.

@duricanikolic duricanikolic self-assigned this Dec 27, 2023
@duricanikolic duricanikolic force-pushed the yuri/remote-querier-backoff branch 3 times, most recently from f49b44a to 39d0508 Compare December 27, 2023 16:44
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
pkg/ruler/remotequerier.go Show resolved Hide resolved
pkg/ruler/remotequerier.go Show resolved Hide resolved
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo a couple of nits)

Comment on lines 153 to 158
"context.Canceled error is retried": {
err: context.Canceled,
expectedError: context.Canceled,
expectedRetries: true,
},
"gRPC context.Canceled error is retried": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should retry in case of context canceled. In practice it will not be retried by the function because the backoff checks the context, but in this test it's retried due to how it's mocked (we don't cancel the context passed to the sendRequest() but we just return the context canceled).

Not sure how to proceed, but I think these 2 test cases are misleading. An option is the remove these 2 test cases, another option is to test the context cancellation case properly, which means cancelling the context passed to the sendRequest().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opted for canceling the context passed to sendRequest() and showing that the request is not retried.

CHANGELOG.md Outdated
@@ -5,6 +5,9 @@
### Grafana Mimir

* [CHANGE] Ingester: Increase default value of `-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes` and `-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes` to 100 MiB (previous default value was 10 MiB). #6764
* [CHANGE] Ruler: the following changes apply on remote evaluation: #7004
* Non-erroneous responses with status code `4xx` are treated as errors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is about how things are implemented internally, but hard to understand (and not even interesting) for the final user. I think you should just mention the backoff thing (next line) and remove this line.

expectedError: httpgrpc.Errorf(http.StatusBadRequest, errMsg),
expectedRetries: false,
},
"non-erroneous responses with status code 4xx are not retried and are converted to errors": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] I think calling them "non erroneous" is a bit misleading. I would simplify it.

Suggested change
"non-erroneous responses with status code 4xx are not retried and are converted to errors": {
"responses with status code 4xx are not retried and are converted to errors": {

expectedError: httpgrpc.ErrorFromHTTPResponse(erroneousResponse),
expectedRetries: false,
},
"non-erroneous responses with status code different from 4xx are not retried and are not converted to errors": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"non-erroneous responses with status code different from 4xx are not retried and are not converted to errors": {
"responses with status code 2xx are not retried": {

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
@duricanikolic duricanikolic marked this pull request as ready for review January 3, 2024 14:22
@duricanikolic duricanikolic requested review from a team as code owners January 3, 2024 14:22
@duricanikolic duricanikolic merged commit e5795f8 into main Jan 3, 2024
28 checks passed
@duricanikolic duricanikolic deleted the yuri/remote-querier-backoff branch January 3, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants