Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using mmap in the store-gateway #3465

Closed
19 tasks done
pracucci opened this issue Nov 17, 2022 · 3 comments
Closed
19 tasks done

Stop using mmap in the store-gateway #3465

pracucci opened this issue Nov 17, 2022 · 3 comments
Assignees

Comments

@pracucci
Copy link
Collaborator

pracucci commented Nov 17, 2022

We want to get rid of mmap usage in the store-gateway, in order to definitely solve the problem of disk I/O hanging the process (see this and this for reference). This issue tracks the work to stop using mmap in the store-gateway.


Timeline

All dates subject to change, no refunds 😛

Nov 21

  • Take ownership of the code and initial cleanup.
  • Fix v1 index.Symbol lookup bug - ce19084
  • Add tests for encoding.Decbuf - 4f74ccb to 9a60b74

Nov 28

  • Add tests for indexheader.StreamBinaryReader - bda4548
  • Change encoding.Reader interface to return errors - 863cbe0, 31de2b3, bfe4819
  • Add tests for encoding.FileReader - 4e79107
  • Fix any remaining encoding.FileReader issues - 685926f
  • Add Seek() to encoding.Decbuf (use case 1, use case 2) - 90d25b5, d757b12
  • Create feature flag to control use of indexheader.StreamBinaryReader 15b7a58
  • Make indexheader.StreamBinaryReader goroutine safe (each instance of encoding.Decbuf needs a new reader) - fd66a99
  • Switch uses of encoding.BufReader to encoding.FileReader
  • Stream data from disk when computing CRC32 checksums rather than reading the entire segment into memory - dc46d17
  • store-gateway integration test to use indexheader.StreamBinaryReader - 8bb4fdf
  • Cleanup of dead code, error handling, lints.

Dec 5+

@pracucci
Copy link
Collaborator Author

pracucci commented Nov 17, 2022

We had a brief in-person conversation yesterday with Steve. The TL;DR is:

  • The mmap-ed file is opened currently in newFileBinaryReader()
  • The mmap-ed []byte usage entrypoints are:
    • The newFileBinaryReader() itself to pre-populate the in-memory data structures used to map symbols and postings offsets (in both cases we keep 1 entry every 32)
    • BinaryReader.LabelValues()
    • BinaryReader.PostingsOffset() (calling BinaryReader.postingsOffset())

Looking at the actual usage of the mmap-ed []byte, we could change the implementation of the following underlying components to work with file I/O syscalls instead of reading a mmap-ed []byte:

  • newFileBinaryReader(): use syscalls without ever reading the whole index-header in memory
  • encoding.Decbuf: write a new version of it which works with io.ReadSeeker (or similar interface)
  • index.NewSymbols: write a new verson of it which works with io.ReadSeeker (or similar interface)

56quarters added a commit that referenced this issue Nov 21, 2022
In preparation for #3465, import the Prometheus TSDB packages that
will need to be modified to work with a `ReadSeeker`-like interface
instead of a `ByteSlice` interface.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters
Copy link
Contributor

Work on this will live in the following branch until it is ready to be merged to main: https://github.com/grafana/mimir/tree/56quarters/indexheader-mmap-removal

Based on the initial work done by @stevesg here d8e5035

@charleskorn
Copy link
Contributor

Draft PR for this is now up at #3639

56quarters pushed a commit that referenced this issue Dec 9, 2022
… mmap (#3639)

## What this PR does

This adds an alternate implementation of the store gateway's index-header reader that does not use mmap. The motivations for this change and references are covered in #3465.

## Changes

The organization of the changes is described below. Some knowledge of the current architecture of the store-gateway is assumed. All paths are relative to `pkg/storegateway`.

### Encoding components

* `indexheader/encoding/reader.go/FileReader`: this is the lowest level part of this change. This is a buffering wrapper around file operations that provides some convenience methods for seeking, reading, peeking. 
* `indexheader/encoding/encoding.go/Decbuf`: this is a copy / modification of the equivalent Prometheus file. It includes methods for reading integers, strings, and bytes from binary files using a `FileReader` instance. This is the important part: it uses standard go file operations that the goruntime knows how to schedule if they block. Compare this to mmap which causes invisible blocking operations that hang the entire store-gateway.
* `indexheader/encoding/factory.go/DecbufFactory`: this code is responsible for creating new `Decbuf` instances in a variety of ways. It is also responsible for cleaning up after `Decbuf` instances are no longer needed. This is because it pools the underlying `bufio.Reader`s used.

### TSDB index components

* `indexheader/index/symbols.go/Symbols`: this is a copy/modification of the equivalent Prometheus file. It includes a modified `Symbol` reader that works with our `DecbufFactory` (which in turn uses standard go file operations). It also includes a bulk API for reverse lookups since this is required by the `StreamBinaryReader`
* `indexheader/index/positings.go/PostingOffsetTable`: an interface abstracting the differences between how index v1 postings are looked up vs index v2 postings. The existing `BinaryReader` used a bunch of `if` statements to pick how to work on postings. We feel this is cleaner. This file contains the v1 and v2 implementations of this interface using largely the same logic as the existing `BinaryReader`.

### New index-header `Reader`

* `indexheader/stream_binary_reader.go/StreamBinaryReader`: the alternate implementation of the `Reader` interface that uses all the previously described file-based logic for reading index-headers from disk. The interesting part of this class is the logic run when instances are first created: the index-header table-of-contents is loaded, symbols are loaded and validated, postings are loaded and validated.
* `indexheader/reader_pool.go/ReaderPool`: there are some limited changes here to allow the `StreamBinaryReader` to be used when index-header `Reader`s are lazy loaded or eagerly loaded.
 
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
Co-authored-by: Charles Korn <charles.korn@grafana.com>
Co-authored-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue Dec 12, 2022
Add a pool of file handles to avoid the cost of opening and closing files
for each operation that needs access to the index-header file (symbols and
posting offsets).

See #3465

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue Dec 13, 2022
Add a pool of file handles to avoid the cost of opening and closing files
for each operation that needs access to the index-header file (symbols and
posting offsets).

See #3465

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue Dec 13, 2022
Add a pool of file handles to avoid the cost of opening and closing files
for each operation that needs access to the index-header file (symbols and
posting offsets).

See #3465

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
masonmei pushed a commit to udmire/mimir that referenced this issue Dec 16, 2022
… mmap (grafana#3639)

## What this PR does

This adds an alternate implementation of the store gateway's index-header reader that does not use mmap. The motivations for this change and references are covered in grafana#3465.

## Changes

The organization of the changes is described below. Some knowledge of the current architecture of the store-gateway is assumed. All paths are relative to `pkg/storegateway`.

### Encoding components

* `indexheader/encoding/reader.go/FileReader`: this is the lowest level part of this change. This is a buffering wrapper around file operations that provides some convenience methods for seeking, reading, peeking.
* `indexheader/encoding/encoding.go/Decbuf`: this is a copy / modification of the equivalent Prometheus file. It includes methods for reading integers, strings, and bytes from binary files using a `FileReader` instance. This is the important part: it uses standard go file operations that the goruntime knows how to schedule if they block. Compare this to mmap which causes invisible blocking operations that hang the entire store-gateway.
* `indexheader/encoding/factory.go/DecbufFactory`: this code is responsible for creating new `Decbuf` instances in a variety of ways. It is also responsible for cleaning up after `Decbuf` instances are no longer needed. This is because it pools the underlying `bufio.Reader`s used.

### TSDB index components

* `indexheader/index/symbols.go/Symbols`: this is a copy/modification of the equivalent Prometheus file. It includes a modified `Symbol` reader that works with our `DecbufFactory` (which in turn uses standard go file operations). It also includes a bulk API for reverse lookups since this is required by the `StreamBinaryReader`
* `indexheader/index/positings.go/PostingOffsetTable`: an interface abstracting the differences between how index v1 postings are looked up vs index v2 postings. The existing `BinaryReader` used a bunch of `if` statements to pick how to work on postings. We feel this is cleaner. This file contains the v1 and v2 implementations of this interface using largely the same logic as the existing `BinaryReader`.

### New index-header `Reader`

* `indexheader/stream_binary_reader.go/StreamBinaryReader`: the alternate implementation of the `Reader` interface that uses all the previously described file-based logic for reading index-headers from disk. The interesting part of this class is the logic run when instances are first created: the index-header table-of-contents is loaded, symbols are loaded and validated, postings are loaded and validated.
* `indexheader/reader_pool.go/ReaderPool`: there are some limited changes here to allow the `StreamBinaryReader` to be used when index-header `Reader`s are lazy loaded or eagerly loaded.

Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
Co-authored-by: Charles Korn <charles.korn@grafana.com>
Co-authored-by: Nick Pillitteri <nick.pillitteri@grafana.com>
masonmei pushed a commit to udmire/mimir that referenced this issue Dec 16, 2022
Add a pool of file handles to avoid the cost of opening and closing files
for each operation that needs access to the index-header file (symbols and
posting offsets).

See grafana#3465

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants