Return Canceled rather than Aborted when a Series request to a store-gateway is cancelled by the calling querier. #4007

Merged: @pracucci merged 2 commits into `main` from `charleskorn/store-gateway-return-canceled` on Jan 19, 2023.

@charleskorn (Contributor) commented Jan 19, 2023

What this PR does

Queriers make multiple requests to store-gateways simultaneously. If one of these requests fails, or if the querier decides to stop processing the request (e.g. because a query limit has been reached or the query is invalid), the querier cancels all in-flight store-gateway requests.

Previously, the store-gateway returned an Aborted gRPC error when the caller cancelled the request. However, these requests were then recorded in logs and metrics with status="error".

Canceled is the preferred status code when the caller cancels the request. Requests that return Canceled are also recorded in our logs and metrics with status="cancel", which more accurately reflects what happened. (While diagnosing an alert, I was confused by a high number of requests logged with status="error" that were in fact the store-gateway handling an expected scenario in the desired way; status="cancel" makes this much clearer.)

This PR changes the Series endpoint on store-gateways to return Canceled when the caller (e.g. a querier) cancels the request.

This does not require any changes on the querier side, as we already have special handling for the scenario where the querier cancels the request.

Which issue(s) this PR fixes or relates to

(none)

Checklist

  • Tests updated
  • [n/a] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@charleskorn marked this pull request as ready for review January 19, 2023 00:46
@charleskorn requested a review from a team as a code owner January 19, 2023 00:46

@replay (Contributor) left a comment:

This looks good to me, thanks!

@dimitarvdimitrov (Contributor) left a comment:

LGTM, thank you

@pracucci (Collaborator) left a comment:

LGTM! We can merge this, but can you also look at LabelNames() and LabelValues()? I think they suffer the same issue.

@pracucci merged commit 20ea888 into main Jan 19, 2023
@pracucci deleted the charleskorn/store-gateway-return-canceled branch January 19, 2023 13:53

@charleskorn (Contributor, Author) replied:

> LGTM! We can merge this, but can you also look at LabelNames() and LabelValues()? I think they suffer the same issue.

Yep, I'll take a look at that shortly.

fayzal-g added a commit that referenced this pull request Jan 20, 2023
* Update test

* Add missing changelog entries for commits since Mimir 2.5 (#4006)

All other commits weren't user-facing or were helm-chart specific.

See #3979

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Add --concurrency support to 'mimirtool rules sync' command (#3996)

* Add --concurrency support to 'mimirtool rules sync' command

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Update pkg/mimirtool/commands/rules.go

Co-authored-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>

* store-gateway: ExpandedPostings shortcut: avoid LabelValues unless necessary (#3872)

* Return `Canceled` rather than `Aborted` when a `Series` request to a store-gateway is cancelled by the calling querier. (#4007)

* Return Canceled rather than Aborted when a Series request to a store-gateway is cancelled.

* Add changelog entry.

* Update mimir-prometheus, add support for align_evaluation_time_on_interval. (#4013)

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

* Fix title of guide in link text; reword phrase. (#4008)

* Fix ExampleInitLogger to work in UTC (#4016)

The test didn't pass in my time zone (tm).

```text
--- FAIL: ExampleInitLogger (0.00s)
got:
ts=1970-01-01T01:00:00+01:00 caller=log_test.go:31 level=info test=1
ts=1970-01-01T01:00:00+01:00 caller=log_test.go:33 level=info msg="test 3"
want:
ts=1970-01-01T00:00:00Z caller=log_test.go:31 level=info test=1
ts=1970-01-01T00:00:00Z caller=log_test.go:33 level=info msg="test 3"
```

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Create outline of Mimir 2.6 release notes (#4002)

Includes notable features and bugfixes based on the CHANGELOG. Helm changes
to be filled out later by product and engineering.

See #3979

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Fix post-merge comments from PR #4013. (#4014)

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

* Update CODEOWNERS to include mimir-ruler-and-alertmanager-maintainers (#4019)

For those who only want notifications re the ruler or Alertmanager.

* Remove internal use of store.max-query-length (#4017)

Make deprecation of the option more obvious and attempt to remove
any use of store.max-query-length in our documentation, jsonnet, helm,
and integration tests.

See #2793
See #3825

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* [otlp] Update OTel Collector to latest release (#3852)

* [otlp] Update otel collector dependency to latest
* Update code to deal with deprecated functions

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* [otlp] Docs: Highlight common issues with OTLP --> Prometheus (#3629)

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>

* make it possible to inject memberlist kv codecs (#4018)

* make it possible to inject memberlist kv codecs

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add comment

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* improve comment wording

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Limits and errors for ephemeral storage (#4004)

* Add limits for ephemeral storage.
* Add new reason when ingestion of ephemeral metrics fails.
* Add tests for max ephemeral series limit.
* Introduce new discard reasons when ingesting ephemeral series.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

* Reduce maintainership and step down as team member. (#4023)

* Reduce maintainership and step down as team member.

My future priorities will be on the alerting aspects of Mimir, so I think it is
right to reduce my maintainership accordingly and allow others to take my place.
Similarly, remove myself as a team member.

* Sort previous team members.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
charleskorn added a commit that referenced this pull request Jan 25, 2023
…ateway is cancelled by the caller, and return Internal otherwise.

See #4007 for explanation.
charleskorn added a commit that referenced this pull request Feb 3, 2023
…ateway is cancelled by the caller, and return Internal otherwise.

See #4007 for explanation.
pracucci pushed a commit that referenced this pull request Feb 3, 2023
…tore-gateway is cancelled by the caller, and return `Internal` otherwise for all requests. (#4061)

* Return Canceled when a LabelNames or LabelValues request to a store-gateway is cancelled by the caller, and return Internal otherwise.

See #4007 for explanation.

* Add changelog entry.