Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Gateway: Add experimental configuration to use MAP_POPULATE for indexheader reads. #2019

Merged
merged 9 commits into from
Jun 7, 2022

Conversation

stevesg
Copy link
Contributor

@stevesg stevesg commented Jun 3, 2022

What this PR does

Introduces a new experimental configuration option (-blocks-storage.bucket-store.index-header.map-populate-enabled).

This enables the use of the MAP_POPULATE flag when mmap-ing index-header files in the store-gateway. What this flag does is advise the kernel to (synchronously) pre-fault all pages in the memory region, loading them into the file system cache.

Why is this a good idea?

  • The initial read process of the index-header files has shown to cause hangups in the store-gateway.
  • By using this option, I/O is done in the mmap() syscall, which the Go scheduler can cope with.
  • We reduce the likelyhood of Goroutines getting stalled in major page faults.
  • The initial read process walks the entire file anyway, so we are not doing any more I/O.
  • It's a very low risk change compared to re-writing the BinaryReader (work in progress).

Why is this not perfect?

  • The Kernel does not guarantee the pages will stay in memory, so we are only reducing the probability of major page faults.

Rationale about the implementation:

  • I have copied the mmap utilities from Prometheus as a temporary measure, for the sake of evaluating this change.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stevesg stevesg marked this pull request as ready for review June 3, 2022 14:38
b []byte
}

func OpenMmapFile(path string, populate bool) (*MmapFile, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I understand this file is basically copied from Prometheus and then modified, right?
would it be worth it to upstream this change into Prometheus?

Copy link
Contributor

@replay replay Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or if we don't want to upstream into upstream prometheus, we could still consider making the change in mimir-prometheus instead of copying the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the size of the files, I don't think it's worth the effort doing either right now.

If the testing goes well, then I'll upstream the changes.

Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks very promising. One small issue that it looks like @replay pointed out as well. I'm excited to see how this helps in a high-traffic cluster

pkg/storegateway/indexheader/fileutil/mmap_windows.go Outdated Show resolved Hide resolved
@stevesg stevesg requested a review from a team June 7, 2022 08:43
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR. 🤞 that it helps!

pkg/storegateway/indexheader/config.go Outdated Show resolved Hide resolved
@@ -3397,6 +3397,13 @@ bucket_store:
# CLI flag: -blocks-storage.bucket-store.posting-offsets-in-mem-sampling
[postings_offsets_in_mem_sampling: <int> | default = 32]

index_header:
# (experimental) If enabled, the store-gateway will attempt to pre-populate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move index_header_lazy_loading_enabled to index_header section at some point, if this experiment survives.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid introducing a breaking change, I would rather do the opposite and use YAML inline for the new index header config struct in the bucket stores config. This would allow us to have a dedicated struct for index header config in the code, but not having to move existing stable flags.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a PR that will touch this area of the code, #2048. I can open a PR to implement this change and update relevant internal (Grafana) configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually nevermind, I'll leave it as is for now and wait to see if our various experiments with index header changes survive.

@stevesg stevesg force-pushed the indexerheader-reader-experimental-populate branch from efd3b09 to 448fe88 Compare June 7, 2022 16:38
@stevesg stevesg merged commit 6b5cc6a into main Jun 7, 2022
@stevesg stevesg deleted the indexerheader-reader-experimental-populate branch June 7, 2022 17:43
stevesg added a commit that referenced this pull request Jun 8, 2022
A previous change (#2019) assumed MAP_POPULATE was available on Darwin. This fixes the build.
56quarters added a commit that referenced this pull request Jul 14, 2022
Superseded by #2019

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this pull request Jul 14, 2022
Superseded by #2019

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants