Full scans of Leveled Compaction Strategy tables can skip some data #3513
Comments
Partitioned sstable set's incremental selector may be the one to blame.
…On Wed, Jun 13, 2018, 9:56 AM Avi Kivity ***@***.***> wrote:
*Installation details*
Scylla version (or git commit hash): 2.1+
A full scan of a table using the Leveled Compaction Strategy can miss a
small fraction of the data. This was observed in decommission, where some
data (<1% in a test) was not streamed.
On 2018-06-13 16:21, Raphael Carvalho wrote:
Partitioned sstable set's incremental selector may be the one to blame.
Do you have a theory? Or is it a candidate because it's in the stack?
BTW combined reader is also a potential candidate.
It's a candidate because https://groups.google.com/d/msg/scylladb-dev/9aixIHQAf2I/vjWOcD5WAQAJ "fixes" the problem.
(It still uses combined_reader, so it's less of a candidate. Of course, it can be that combined_reader doesn't use the information from incremental_selector correctly; I'll let the authors of the two components fight it out.)
Reproduced on master, so the bug wasn't accidentally fixed.
There is a bug in incremental_selector for partitioned_sstable_set, so until it is found, stop using it. This degrades scan performance of Leveled Compaction Strategy tables. Fixes #3513. (as a workaround) Introduced: 2.1 Message-Id: <20180613131547.19084-1-avi@scylladb.com> (cherry picked from commit aeffbb6)
There is a bug in incremental_selector for partitioned_sstable_set, so until it is found, stop using it. This degrades scan performance of Leveled Compaction Strategy tables. Fixes #3513. (as a workaround) Introduced: 2.1 Message-Id: <20180613131547.19084-1-avi@scylladb.com> (cherry picked from commit aeffbb6) (cherry picked from commit 044cfde)
Root cause: the combined reader uses the key of the current partition to ask the incremental selector for new readers. But in some cases the currently open sstables have a huge gap between two consecutive partitions. Sometimes this gap is so wide that it covers some not-yet-selected sstables entirely. In these cases those sstables are left out of the read altogether.
When the combined reader reads The solution is to use the |
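The root cause above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Scylla code: tokens are plain integers, `SSTable`, `select()` and `scan_polling_with_read_position()` are hypothetical names, and the "wide" sstable stands in for an already open sstable whose two partitions are far apart.

```python
import heapq
import math
from typing import NamedTuple


class SSTable(NamedTuple):
    name: str
    first: int    # smallest token covered (toy integer token)
    last: int     # largest token covered
    keys: tuple   # tokens of the partitions stored in the sstable


def select(leveled, pos):
    """Toy stand-in for the incremental selector: return the sstables that
    cover pos, plus the next position at which another sstable starts."""
    hits = [s for s in leveled if s.first <= pos <= s.last]
    next_pos = min((s.first for s in leveled if s.first > pos), default=math.inf)
    return hits, next_pos


def scan_polling_with_read_position(always_open, leveled):
    # Pending partition tokens of all currently open sstables.
    heap = [k for s in always_open for k in s.keys]
    heapq.heapify(heap)
    selected, next_pos, out = set(), -math.inf, []
    while heap:
        read_pos = heap[0]
        if read_pos >= next_pos:
            # Modelled bug: the selector is polled with the current read
            # position, so sstables lying wholly inside a gap are skipped.
            hits, next_pos = select(leveled, read_pos)
            for s in hits:
                if s.name not in selected:
                    selected.add(s.name)
                    for k in s.keys:
                        heapq.heappush(heap, k)
        out.append(heapq.heappop(heap))
    return out


# Hypothetical data: "wide" has a gap between tokens 1 and 100 that fully
# contains sstable C.
WIDE = SSTable("wide", 1, 100, (1, 100))
LEVELED = [
    SSTable("B", 0, 10, (2, 9)),
    SSTable("C", 20, 30, (25,)),
    SSTable("D", 90, 110, (95, 105)),
]

result = scan_polling_with_read_position([WIDE], LEVELED)
print(result)  # [1, 2, 9, 95, 100, 105]: token 25 from sstable C is missing
```

In this model the read position jumps from 9 straight to 100, so `select()` is never asked about any position inside C's range and C is never opened.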
"
This series fixes the "LCS data-loss bug", where full scans (and everything that uses them) would miss some small percentage (> 0.001%) of the keys. This could easily lead to permanent data loss, as compaction and decommission both use full scans.

aeffbb6 worked around this bug by disabling the incremental reader selectors (the class identified as the source of the bug) altogether. This series fixes the underlying issue and reverts aeffbb6.

The root cause of the bug is that the `incremental_reader_selector` uses the current read position to poll for new readers using `sstable_set::incremental_selector::select()`. This means that when the currently open sstables contain no partitions that would intersect with some of the yet unselected sstables, those sstables would be ignored.

Solve the problem by not calling `select()` with the current read position and instead always passing the `next_position` returned by the previous call. This means that the traversal of the sstable-set happens at a pace defined by the sstable-set itself, which guarantees that no sstable will be jumped over. When asked for new readers, the `incremental_reader_selector` will now iteratively call `select()` using the `next_position` from the previous `select()` call until it either receives some new, yet unselected sstables, or `next_position` surpasses the read position (in which case `select()` will be tried again later).

The `sstable_set::incremental_selector` was not suitable in its present state to support calling `select()` with the `next_position` from a previous call, as in some cases it could not make progress due to inclusiveness-related ambiguities. So, in preparation for the above fix, `sstable_set` was updated to work in terms of ring-position instead of tokens. Ring-position can express positions in a much more fine-grained way than token, including positions after/before tokens and keys. This allows for a clear expression of `next_position` such that calling `select()` with it guarantees forward progress in the token-space.

Tests: unit(release, debug)

Refs: #3513
"

* 'leveled-missing-keys/v4' of https://github.com/denesb/scylla:
  tests/mutation_reader_test: combined_mutation_reader_test: use SEASTAR_THREAD_TEST_CASE
  tests/mutation_reader_test: refactor combined_mutation_reader_test
  tests/mutation_reader_test: fix reader_selector related tests
  Revert "database: stop using incremental selectors"
  incremental_reader_selector: don't jump over sstables
  mutation_reader: reader_selector: use ring_position instead of token
  sstables_set::incremental_selector: use ring_position instead of token
  compatible_ring_position: refactor to compatible_ring_position_view
  dht::ring_position_view: use token_bound from ring_position
  i_partitioner: add free function ring-position tri comparator
  mutation_reader_merger::maybe_add_readers(): remove early return
  mutation_reader_merger: get rid of _key
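The selector walk described in this series can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the actual implementation: tokens are plain integers (so the inclusiveness ambiguity that motivated ring-position does not arise), and `LEVELED`, `select()`, `new_readers_for()` and `full_scan()` are hypothetical names.

```python
import heapq
import math

# Hypothetical leveled sstables: name -> (first token, last token, partition tokens).
LEVELED = {
    "B": (0, 10, (2, 9)),
    "C": (20, 30, (25,)),
    "D": (90, 110, (95, 105)),
}


def select(pos):
    """Toy stand-in for the incremental selector: names of the sstables that
    cover pos, plus the next position at which another sstable starts."""
    hits = [n for n, (first, last, _) in LEVELED.items() if first <= pos <= last]
    next_pos = min((f for f, _, _ in LEVELED.values() if f > pos), default=math.inf)
    return hits, next_pos


def new_readers_for(selected, next_pos, read_pos):
    """Walk the selector from next_pos (never from read_pos) until it yields
    unselected sstables or next_pos surpasses the read position."""
    while next_pos <= read_pos:
        hits, next_pos = select(next_pos)
        fresh = [n for n in hits if n not in selected]
        if fresh:
            selected.update(fresh)
            return fresh, next_pos
    return [], next_pos


def full_scan(already_open_keys):
    # Pending partition tokens of the sstables that are already open.
    heap = list(already_open_keys)
    heapq.heapify(heap)
    selected, next_pos, out = set(), -math.inf, []
    while heap:
        # Keep polling while the next token to emit is at or past next_pos.
        while heap[0] >= next_pos:
            fresh, next_pos = new_readers_for(selected, next_pos, heap[0])
            for name in fresh:
                for key in LEVELED[name][2]:
                    heapq.heappush(heap, key)
        out.append(heapq.heappop(heap))
    return out


# An open sstable with a wide gap between tokens 1 and 100, as in the bug report.
print(full_scan((1, 100)))  # [1, 2, 9, 25, 95, 100, 105]
```

Because the walk advances through every `next_pos` the selector reports, sstable C (tokens 20 to 30) is picked up even though no open sstable has a partition in that range.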