Stop using mmap in the store-gateway #3465
Comments
We had a brief in-person conversation yesterday with Steve. The TL;DR is:
Looking at the actual usage of the mmap-ed
In preparation for #3465, import the Prometheus TSDB packages that will need to be modified to work with a `ReadSeeker`-like interface instead of a `ByteSlice` interface. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
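To make the difference between the two access styles concrete, here is a minimal sketch. The `ByteSlice` shape follows the Prometheus TSDB interface of that name; `memByteSlice`, `readUint32Slice`, `readUint32Seeker`, `readBoth`, and `sampleData` are hypothetical names used only for illustration and are not taken from the Mimir code.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// ByteSlice has the shape of the Prometheus TSDB interface: random access
// into bytes that are already addressable in memory (for example via mmap).
type ByteSlice interface {
	Len() int
	Range(start, end int) []byte
}

// memByteSlice is a trivial in-memory ByteSlice (hypothetical helper).
type memByteSlice []byte

func (b memByteSlice) Len() int                    { return len(b) }
func (b memByteSlice) Range(start, end int) []byte { return b[start:end] }

// readUint32Slice reads a big-endian uint32 at off. With an mmap-backed
// slice, touching these bytes may fault in a page and block the OS thread.
func readUint32Slice(bs ByteSlice, off int) (uint32, error) {
	if off < 0 || off+4 > bs.Len() {
		return 0, io.ErrUnexpectedEOF
	}
	return binary.BigEndian.Uint32(bs.Range(off, off+4)), nil
}

// readUint32Seeker reads the same field with an explicit seek and read,
// so I/O errors surface as ordinary return values.
func readUint32Seeker(rs io.ReadSeeker, off int64) (uint32, error) {
	if _, err := rs.Seek(off, io.SeekStart); err != nil {
		return 0, err
	}
	var buf [4]byte
	if _, err := io.ReadFull(rs, buf[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(buf[:]), nil
}

// sampleData returns two big-endian uint32 values: 1 and 42.
func sampleData() []byte {
	return []byte{0, 0, 0, 1, 0, 0, 0, 42}
}

// readBoth reads the field at off through both access styles.
func readBoth(data []byte, off int) (uint32, uint32, error) {
	a, err := readUint32Slice(memByteSlice(data), off)
	if err != nil {
		return 0, 0, err
	}
	b, err := readUint32Seeker(bytes.NewReader(data), int64(off))
	if err != nil {
		return 0, 0, err
	}
	return a, b, nil
}

func main() {
	a, b, err := readBoth(sampleData(), 4)
	fmt.Println(a, b, err) // prints "42 42 <nil>"
}
```

The seeker variant needs an error path on every read, which is presumably why the imported packages have to be modified rather than wrapped.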
Work on this will live in the following branch until it is ready to be merged to main: https://github.com/grafana/mimir/tree/56quarters/indexheader-mmap-removal
Draft PR for this is now up at #3639
… mmap (#3639)

## What this PR does

This adds an alternate implementation of the store-gateway's index-header reader that does not use mmap. The motivations for this change and references are covered in #3465.

## Changes

The organization of the changes is described below. Some knowledge of the current architecture of the store-gateway is assumed. All paths are relative to `pkg/storegateway`.

### Encoding components

* `indexheader/encoding/reader.go/FileReader`: this is the lowest-level part of this change. It is a buffering wrapper around file operations that provides convenience methods for seeking, reading, and peeking.
* `indexheader/encoding/encoding.go/Decbuf`: this is a copy / modification of the equivalent Prometheus file. It includes methods for reading integers, strings, and bytes from binary files using a `FileReader` instance. This is the important part: it uses standard Go file operations that the Go runtime knows how to schedule if they block. Compare this to mmap, which causes invisible blocking operations that hang the entire store-gateway.
* `indexheader/encoding/factory.go/DecbufFactory`: this code is responsible for creating new `Decbuf` instances in a variety of ways. It is also responsible for cleaning up after `Decbuf` instances are no longer needed, because it pools the underlying `bufio.Reader`s used.

### TSDB index components

* `indexheader/index/symbols.go/Symbols`: this is a copy / modification of the equivalent Prometheus file. It includes a modified `Symbol` reader that works with our `DecbufFactory` (which in turn uses standard Go file operations). It also includes a bulk API for reverse lookups, since this is required by the `StreamBinaryReader`.
* `indexheader/index/postings.go/PostingOffsetTable`: an interface abstracting the differences between how index v1 postings are looked up vs index v2 postings. The existing `BinaryReader` used a bunch of `if` statements to pick how to work on postings. We feel this is cleaner. This file contains the v1 and v2 implementations of this interface using largely the same logic as the existing `BinaryReader`.

### New index-header `Reader`

* `indexheader/stream_binary_reader.go/StreamBinaryReader`: the alternate implementation of the `Reader` interface that uses all the previously described file-based logic for reading index-headers from disk. The interesting part of this type is the logic run when instances are first created: the index-header table-of-contents is loaded, symbols are loaded and validated, and postings are loaded and validated.
* `indexheader/reader_pool.go/ReaderPool`: there are some limited changes here to allow the `StreamBinaryReader` to be used when index-header `Reader`s are lazy loaded or eagerly loaded.

Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
Co-authored-by: Charles Korn <charles.korn@grafana.com>
Co-authored-by: Nick Pillitteri <nick.pillitteri@grafana.com>
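The encoding components described above can be illustrated with a much-reduced sketch. It assumes a plain `sync.Pool` of `bufio.Reader`s and a decoding buffer with a sticky error; the names `readerPool`, `decbuf`, `newDecbuf`, `be32`, `uvarint`, and `demo` are hypothetical and do not reflect the actual `FileReader`, `Decbuf`, or `DecbufFactory` APIs.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"sync"
)

// readerPool stands in for the pooling attributed to DecbufFactory
// (assumption: a plain sync.Pool of bufio.Readers).
var readerPool = sync.Pool{
	New: func() any { return bufio.NewReader(nil) },
}

// decbuf is a hypothetical, reduced decoding buffer with a sticky error.
type decbuf struct {
	f   *os.File
	r   *bufio.Reader
	err error
}

// newDecbuf opens path, seeks to offset, and attaches a pooled reader.
func newDecbuf(path string, offset int64) (*decbuf, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		f.Close()
		return nil, err
	}
	r := readerPool.Get().(*bufio.Reader)
	r.Reset(f)
	return &decbuf{f: f, r: r}, nil
}

// be32 reads a big-endian uint32 through ordinary file reads, which the
// Go runtime can schedule around if they block.
func (d *decbuf) be32() uint32 {
	if d.err != nil {
		return 0
	}
	var b [4]byte
	if _, d.err = io.ReadFull(d.r, b[:]); d.err != nil {
		return 0
	}
	return binary.BigEndian.Uint32(b[:])
}

// uvarint reads an unsigned varint.
func (d *decbuf) uvarint() uint64 {
	if d.err != nil {
		return 0
	}
	var v uint64
	if v, d.err = binary.ReadUvarint(d.r); d.err != nil {
		return 0
	}
	return v
}

// close detaches the reader, returns it to the pool, and closes the file.
func (d *decbuf) close() error {
	d.r.Reset(nil)
	readerPool.Put(d.r)
	return d.f.Close()
}

// demo writes a tiny binary file (uint32 7, uvarint 300) and decodes it.
func demo() (uint32, uint64, error) {
	tmp, err := os.CreateTemp("", "decbuf-sketch-*")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()
	if _, err := tmp.Write([]byte{0, 0, 0, 7, 0xAC, 0x02}); err != nil {
		return 0, 0, err
	}
	d, err := newDecbuf(tmp.Name(), 0)
	if err != nil {
		return 0, 0, err
	}
	defer d.close()
	n := d.be32()
	v := d.uvarint()
	return n, v, d.err
}

func main() {
	n, v, err := demo()
	fmt.Println(n, v, err) // prints "7 300 <nil>"
}
```

The sticky error keeps call sites short: a caller can issue several reads in a row and check `err` once at the end, and every read is an explicit call that can fail instead of a memory access that can stall.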
Add a pool of file handles to avoid the cost of opening and closing files for each operation that needs access to the index-header file (symbols and posting offsets). See #3465 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
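One way such a pool could look is sketched below, assuming a fixed-capacity free list backed by a buffered channel. The `filePool` type and its methods are hypothetical and are not the implementation from this commit.

```go
package main

import (
	"fmt"
	"os"
)

// filePool is a hypothetical pool of open read-only handles to one file.
type filePool struct {
	path string
	idle chan *os.File
}

func newFilePool(path string, maxIdle int) *filePool {
	return &filePool{path: path, idle: make(chan *os.File, maxIdle)}
}

// get returns an idle handle if one exists, otherwise opens a new one.
func (p *filePool) get() (*os.File, error) {
	select {
	case f := <-p.idle:
		return f, nil
	default:
		return os.Open(p.path)
	}
}

// put returns a handle to the pool, closing it if the pool is full.
func (p *filePool) put(f *os.File) error {
	select {
	case p.idle <- f:
		return nil
	default:
		return f.Close()
	}
}

// close closes all idle handles; checked-out handles stay with the caller.
func (p *filePool) close() error {
	var first error
	for {
		select {
		case f := <-p.idle:
			if err := f.Close(); err != nil && first == nil {
				first = err
			}
		default:
			return first
		}
	}
}

// readAt fills buf from offset off using a pooled handle. ReadAt does not
// depend on the handle's current position, so reuse is safe.
func (p *filePool) readAt(buf []byte, off int64) error {
	f, err := p.get()
	if err != nil {
		return err
	}
	_, err = f.ReadAt(buf, off)
	if putErr := p.put(f); err == nil {
		err = putErr
	}
	return err
}

// demo performs two reads and reports the last bytes read and idle count.
func demo() (string, int, error) {
	tmp, err := os.CreateTemp("", "filepool-sketch-*")
	if err != nil {
		return "", 0, err
	}
	defer os.Remove(tmp.Name())
	_, err = tmp.WriteString("0123456789")
	if cerr := tmp.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return "", 0, err
	}
	p := newFilePool(tmp.Name(), 2)
	defer p.close()
	buf := make([]byte, 3)
	if err := p.readAt(buf, 0); err != nil {
		return "", 0, err
	}
	if err := p.readAt(buf, 4); err != nil {
		return "", 0, err
	}
	return string(buf), len(p.idle), nil
}

func main() {
	s, idle, err := demo()
	fmt.Println(s, idle, err) // prints "456 1 <nil>"
}
```

In this sketch the second read reuses the handle opened by the first, so only one `os.Open` call happens for the two reads.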
We want to get rid of mmap usage in the store-gateway to definitively solve the problem of disk I/O hanging the process (see this and this for reference). This issue tracks the work to stop using mmap in the store-gateway.
Timeline
All dates subject to change, no refunds 😛
Nov 21
* `index.Symbol` lookup bug - ce19084
* `encoding.Decbuf` - 4f74ccb to 9a60b74

Nov 28
* `indexheader.StreamBinaryReader` - bda4548
* `encoding.Reader` interface to return errors - 863cbe0, 31de2b3, bfe4819
* `encoding.FileReader` - 4e79107
* `encoding.FileReader` issues - 685926f
* `Seek()` to `encoding.Decbuf` (use case 1, use case 2) - 90d25b5, d757b12
* `indexheader.StreamBinaryReader` - 15b7a58
* `indexheader.StreamBinaryReader` goroutine safe (each instance of `encoding.Decbuf` needs a new reader) - fd66a99
* `encoding.BufReader` to `encoding.FileReader`
* `indexheader.StreamBinaryReader` - 8bb4fdf

Dec 5+
* `bufio.Reader` not specific to each index-header - Store gateway mmap removal: use a shared pool of buffers across all Decbuf instances #3691
* Apply `[]string` optimization to `readOffsetTable` (link 1, link 2). Not possible with mmap-less implementation.