Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query-frontend: don't retry queries which error inside PromQL #4643

Merged
merged 12 commits into from
Apr 5, 2023

Conversation

bboreham
Copy link
Contributor

@bboreham bboreham commented Mar 31, 2023

What this PR does

Break out of the retry loop if the error has been decoded from a Prometheus error, or is a cancellation.

Extra drive-by fix: don't log "context canceled" as an error in the querier. This is noticeable when running TestQueryFrontendErrorMessageParity/query_time_range_exceeds_the_limit, for instance.

Which issue(s) this PR fixes or relates to

Fixes #4637

Checklist

  • Tests updated
  • NA Documentation added
  • CHANGELOG.md updated

@bboreham bboreham requested a review from a team as a code owner March 31, 2023 15:37
pkg/frontend/querymiddleware/retry_test.go Outdated Show resolved Hide resolved
pkg/frontend/querymiddleware/retry_test.go Outdated Show resolved Hide resolved
@bboreham
Copy link
Contributor Author

bboreham commented Apr 1, 2023

The change to not log "context canceled" as an error in the querier actually broke query-frontend, so I have removed it from this PR and moved to #4648.

util_log "github.com/grafana/mimir/pkg/util/log"
)

type intObservation interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but should be:

Suggested change
type intObservation interface {
type intObserver interface {

type retryMiddlewareMetrics struct {
retriesCount prometheus.Histogram
}

func newRetryMiddlewareMetrics(registerer prometheus.Registerer) *retryMiddlewareMetrics {
func newRetryMiddlewareMetrics(registerer prometheus.Registerer) intObservation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the value in this, I think we could perfectly use prometheus.Observer inteface and call float64 casting in all (1?) places.

@bboreham
Copy link
Contributor Author

bboreham commented Apr 4, 2023

@colega I have changed as you suggested.

pkg/api/error/error.go Outdated Show resolved Hide resolved
Copy link
Contributor

@colega colega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please consider my comment.

bboreham and others added 2 commits April 5, 2023 11:01
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
@bboreham bboreham merged commit b562fe2 into main Apr 5, 2023
@bboreham bboreham deleted the dont-retry-bad-data branch April 5, 2023 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Erroneous retry after over-limit error
3 participants