Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce sync concurrency in store-gateway by default to reduce disk contention #7136

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

andyasp
Copy link
Contributor

@andyasp andyasp commented Jan 15, 2024

What this PR does

Follow-up to #5025. Internally we reduced the concurrency even when lazy loading was enabled. This upstreams those changes to the defaults.

Which issue(s) this PR fixes or relates to

Fixes N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@andyasp andyasp marked this pull request as ready for review January 15, 2024 21:14
@andyasp andyasp requested review from a team as code owners January 15, 2024 21:14
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super convinced we want to update the Mimir default, rather than changing the default config in Jsonnet and Helm. Reason is that this change makes sense if you run on HDD, but not on SSD. Changing this default we're assuming people run on HDD vs SSD, which I'm not sure it's a fair assumption.

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. LGTM, thanks!

@andyasp andyasp merged commit f299ba7 into main Jan 16, 2024
28 checks passed
@andyasp andyasp deleted the aasp/store-gateway-sync-defaults branch January 16, 2024 14:58
@aallawala
Copy link
Contributor

👋 @andyasp, is the guidance for this to keep this the same if we're running on SSDs? How can we validate whether this will introduce a regression or not?

@dimitarvdimitrov
Copy link
Contributor

is the guidance for this to keep this the same if we're running on SSDs? How can we validate whether this will introduce a regression or not?

The performance of syncing blocks when running on SSDs should be minimal. In general I don't expect that reducing the concurrency there has a negative effect.

One way to verify whether this will have a negative effect is to check how long blocks syncing takes today. Currently measuring sync duration is only possible by looking at logs timestamps from the store-gateway: synchronizing TSDB blocks for all users when syncing starts and successfully synchronized TSDB blocks for all users when syncing finishes.

You want these syncs to not take longer than the sync interval (blocks-storage.bucket-store.sync-interval, default 15m). The reduction in concurrency was 50x in multi-tenant clusters and 20x in single-tenant ones. Depending on the SSD performance/throughput & IOPS limits it might be safe to assume a linear relationship between the concurrency and the time it takes to complete a sync, but you have to verify this in your environment. So roughly speaking if syncs today take less than 18 seconds, it's safe to take this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants