Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distributor inflight request size limit #2413

Merged
merged 10 commits into from
Jul 19, 2022

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Jul 14, 2022

Signed-off-by: Dimitar Dimitrov dimitar.dimitrov@grafana.com

What this PR does

Introduces a limit for the total size of the inflight push requests in the distributor -distributor.instance-limits.max-inflight-push-requests-bytes.

This limit is useful in combination with the existing -distributor.instance-limits.max-inflight-push-requests. The existing limit does not account for large requests that may be below the inflight requests count. The new limit is meant to protect the distributor against multiple large requests.

By default the limit is disabled because it will limit memory utilization on existing Mimir installations.

Which issue(s) this PR fixes or relates to

Fixes #2226

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few small (non-blocking) suggestions / questions.

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/distributor/distributor.go Outdated Show resolved Hide resolved
@pracucci pracucci self-requested a review July 14, 2022 14:49
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dimitar for working on this!

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/distributor/distributor.go Outdated Show resolved Hide resolved
docs/sources/operators-guide/mimir-runbooks/_index.md Outdated Show resolved Hide resolved
docs/sources/operators-guide/mimir-runbooks/_index.md Outdated Show resolved Hide resolved
@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/distributor-inflight-push-requests-size-limit branch 2 times, most recently from 5c99e7f to 12c584b Compare July 15, 2022 09:19
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! 👏

@dimitarvdimitrov
Copy link
Contributor Author

I relized that we only expose the cortex_request_message_bytes histogram, which is the message size before snappy decompression. I think I should add a gauge with the current inflight requests size, so it's trivial to tune the -distributor.instance-limits.max-inflight-push-requests-total-size

@dimitarvdimitrov
Copy link
Contributor Author

i added that ☝️ + a label value for cortex_distributor_instance_limits. PTAL :)

@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/distributor-inflight-push-requests-size-limit branch from 8e122f2 to ee22ea5 Compare July 18, 2022 11:53
maxInflightPushRequestsFlag = "distributor.instance-limits.max-inflight-push-requests"
maxIngestionRateFlag = "distributor.instance-limits.max-ingestion-rate"
maxInflightPushRequestsFlag = "distributor.instance-limits.max-inflight-push-requests"
maxInflightPushRequestsBytesFlag = "distributor.instance-limits.max-inflight-push-requests-total-bytes"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Most other places don't include "total" but the flag does. Should this be changed to be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think I just didn't delete the suffix properly in 0eeed94. I will change

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…l-size

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/distributor-inflight-push-requests-size-limit branch from 11a4a3f to 91dd66d Compare July 18, 2022 16:01
@@ -20,6 +20,7 @@
* [ENHANCEMENT] Store-gateway, listblocks: list of blocks now includes stats from `meta.json` file: number of series, samples and chunks. #2425
* [ENHANCEMENT] Added more buckets to `cortex_ingester_client_request_duration_seconds` histogram metric, to correctly track requests taking longer than 1s (up until 16s). #2445
* [ENHANCEMENT] Azure client: Improve memory usage for large object storage downloads. #2408
* [ENHANCEMENT] Distributor: Add `-distributor.instance-limits.max-inflight-push-requests-bytes`. This limit protects the distributor against multiple large requests that together may cause an OOM, but are only a few, so do not trigger the `max-inflight-push-requests` limit. #2413
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could ask: if we have this new limit available, what is the use of max-inflight-push-requests? But let's leave that for now to not break backwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know if one works better than the other. size limits are probably more related to the actual work a distributor does compared to number of requests, but there may be some overhead in managing the limit (e.g. having to keep it up to date with distributor resources). I expect this to settle with time; then we can delete the other limit.

Copy link
Contributor

@replay replay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, thanks!

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo a nit). Thanks!

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@pracucci pracucci enabled auto-merge (squash) July 19, 2022 10:04
@pracucci pracucci merged commit 204cf0b into main Jul 19, 2022
@pracucci pracucci deleted the dimitar/distributor-inflight-push-requests-size-limit branch July 19, 2022 10:22
@bboreham
Copy link
Contributor

bboreham commented Aug 3, 2022

Empirically this number lags the actual memory consumption. I suspect that computing it after protobuf decoding is part of the problem; the size is known after

body, err := decompressRequest(dst, reader, expectedSize, maxSize, compression, sp)

but you could actually reduce maxSize to the gap between the currently-tracked size and the limit, thus apply the limit before full decompression.

(and then do equivalent things on the OTEL path)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce a limit on "max inflight requests size"
5 participants