Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce default distributor - ingester push timeout to 2s #2728

Merged
merged 3 commits into from
Sep 7, 2022

Conversation

56quarters
Copy link
Contributor

Signed-off-by: Nick Pillitteri nick.pillitteri@grafana.com

What this PR does

Reduce the default timeout to 2s from 20s since some resources from the
request to each ingester are held in memory of the distributor until the
ingester responds or the request times out. This reduces distributor
memory usage during ingester crashes.

Which issue(s) this PR fixes or relates to

Fixes #2727

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@56quarters 56quarters force-pushed the 56quarters/shorter-ingester-timeout branch from 13c35d1 to ead3c03 Compare August 15, 2022 20:43
@56quarters 56quarters marked this pull request as ready for review August 15, 2022 20:45
@56quarters 56quarters force-pushed the 56quarters/shorter-ingester-timeout branch from ead3c03 to 3fd10e3 Compare August 15, 2022 20:45
@replay
Copy link
Contributor

replay commented Aug 15, 2022

I think it would make sense to also update the default of -distributor.forwarding.request-timeout to the same value:

f.DurationVar(&c.RequestTimeout, "distributor.forwarding.request-timeout", 10*time.Second, "Timeout for requests to ingestion endpoints to which we forward metrics.")

Otherwise requests that are being forwarded to a slow forwarding target could lead to requests being held in memory for a duration longer than the value of -distributor.remote-timeout.

@pracucci
Copy link
Collaborator

I think we should merge this change only after it has been dogfooded at Grafana Labs (aka: running in all prod envs). We're used to change defaults only after that.

@56quarters
Copy link
Contributor Author

56quarters commented Aug 16, 2022

I think we should merge this change only after it has been dogfooded at Grafana Labs (aka: running in all prod envs). We're used to change defaults only after that.

Sounds good. I'll test in dev/ops and add something to the next release jsonnet so that the change can go out a week after the new metrics for chunk-deduplication #2713.

@56quarters 56quarters marked this pull request as draft August 16, 2022 13:39
@krajorama
Copy link
Contributor

thanks for doing this in Mimir so we don't have to deal with keeping sync in jsonnet + helm!

Reduce the default timeout to 2s from 20s since some resources from the
request to each ingester are held in memory of the distributor until the
ingester responds or the request times out. This reduces distributor
memory usage during ingester crashes.

Fixes #2727

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters force-pushed the 56quarters/shorter-ingester-timeout branch from ab8575b to 4e86cf9 Compare August 30, 2022 17:29
@56quarters 56quarters marked this pull request as ready for review August 31, 2022 21:09
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@56quarters 56quarters merged commit e156099 into main Sep 7, 2022
@56quarters 56quarters deleted the 56quarters/shorter-ingester-timeout branch September 7, 2022 16:24
@56quarters 56quarters self-assigned this Sep 7, 2022
@@ -9,6 +9,7 @@
* [CHANGE] Ingester: removed deprecated `-blocks-storage.tsdb.isolation-enabled` option. TSDB-level isolation is now always disabled in Mimir. #2782
* [CHANGE] Compactor: `-compactor.partial-block-deletion-delay` must either be set to 0 (to disable partial blocks deletion) or a value higher than `4h`. #2787
* [CHANGE] Query-frontend: CLI flag `-query-frontend.align-querier-with-step` has been deprecated. Please use `-query-frontend.align-queries-with-step` instead. #2840
* [CHANGE] Distributor: change the default value of `-distributor.remote-timeout` to `2s` from `20s` to improve distributor resource usage when ingesters crash. #2728
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also mention -distributor.forwarding.request-timeout

56quarters added a commit that referenced this pull request Sep 7, 2022
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this pull request Sep 7, 2022
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use shorter default timeout for distributor writes to ingesters
6 participants