-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remote querier: use backoff retry only with 5xx errors #7004
Conversation
f49b44a
to
39d0508
Compare
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
39d0508
to
4b44677
Compare
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo a couple of nits)
pkg/ruler/remotequerier_test.go
Outdated
"context.Canceled error is retried": { | ||
err: context.Canceled, | ||
expectedError: context.Canceled, | ||
expectedRetries: true, | ||
}, | ||
"gRPC context.Canceled error is retried": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should retry in case of context canceled. In practice it will not be retried by the function because the backoff checks the context, but in this test it's retried due to how it's mocked (we don't cancel the context passed to the sendRequest()
but we just return the context canceled).
Not sure how to proceed, but I think these 2 test cases are misleading. An option is the remove these 2 test cases, another option is to test the context cancellation case properly, which means cancelling the context passed to the sendRequest()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opted for canceling the context passed to sendRequest()
and showing that the request is not retried.
CHANGELOG.md
Outdated
@@ -5,6 +5,9 @@ | |||
### Grafana Mimir | |||
|
|||
* [CHANGE] Ingester: Increase default value of `-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes` and `-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes` to 100 MiB (previous default value was 10 MiB). #6764 | |||
* [CHANGE] Ruler: the following changes apply on remote evaluation: #7004 | |||
* Non-erroneous responses with status code `4xx` are treated as errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is about how things are implemented internally, but hard to understand (and not even interesting) for the final user. I think you should just mention the backoff thing (next line) and remove this line.
pkg/ruler/remotequerier_test.go
Outdated
expectedError: httpgrpc.Errorf(http.StatusBadRequest, errMsg), | ||
expectedRetries: false, | ||
}, | ||
"non-erroneous responses with status code 4xx are not retried and are converted to errors": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I think calling them "non erroneous" is a bit misleading. I would simplify it.
"non-erroneous responses with status code 4xx are not retried and are converted to errors": { | |
"responses with status code 4xx are not retried and are converted to errors": { |
pkg/ruler/remotequerier_test.go
Outdated
expectedError: httpgrpc.ErrorFromHTTPResponse(erroneousResponse), | ||
expectedRetries: false, | ||
}, | ||
"non-erroneous responses with status code different from 4xx are not retried and are not converted to errors": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"non-erroneous responses with status code different from 4xx are not retried and are not converted to errors": { | |
"responses with status code 2xx are not retried": { |
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
What this PR does
This PR allows usage of backoff retry in
RemoteQuerier.sendRequest()
only in case of errors with5xx
status codes.Currently, all errors that
RemoteQuerier.sendRequest()
receives fromhttpgrpc.HTTPClient.Handle()
contain a5xx
code. This behaviour might though change in the future, and we want to ensure that only5xx
errors are retried, while all other errors are immediately returned.Which issue(s) this PR fixes or relates to
Part of #1830
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.