Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full scans of Leveled Compaction Strategy tables can skip some data #3513

Closed
avikivity opened this issue Jun 13, 2018 · 6 comments
Closed

Full scans of Leveled Compaction Strategy tables can skip some data #3513

avikivity opened this issue Jun 13, 2018 · 6 comments

Comments

@avikivity
Copy link
Member

Installation details
Scylla version (or git commit hash): 2.1+

A full scan of a table using the Leveled Compaction Strategy can miss a small fraction of the data. This was observed in decommission, where some data (<1% in a test) was not streamed.

@raphaelsc
Copy link
Member

raphaelsc commented Jun 13, 2018 via email

@denesb
Copy link
Contributor

denesb commented Jun 13, 2018 via email

@avikivity
Copy link
Member Author

It's a candidate because https://groups.google.com/d/msg/scylladb-dev/9aixIHQAf2I/vjWOcD5WAQAJ "fixes" the problem.

@avikivity
Copy link
Member Author

(it still uses combined_reader, so it's less of a candidate. of course it can be that combined_reader doesn't use the information from incremental_selector correctly, I'll let the authors of the two components to fight it out)

@avikivity
Copy link
Member Author

Reproduced on master, so the bug wasn't accidentally fixed.

avikivity added a commit that referenced this issue Jun 13, 2018
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6)
avikivity added a commit that referenced this issue Jun 13, 2018
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6)
avikivity added a commit that referenced this issue Jun 14, 2018
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6)
duarten pushed a commit that referenced this issue Jun 14, 2018
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6)
(cherry picked from commit 044cfde)
@denesb
Copy link
Contributor

denesb commented Jun 27, 2018

Root cause: the combined reader uses the key of the current partition to ask the incremental selector for new readers. But in some case the currently open sstables will have a huge gap between two partitions. In some cases this gap is so wide it includes some unselected sstables entirely. In this cases these sstables will be ignored from the read entirely.
Example:

sstable A: [a1                    a1]
sstable B:          [b1 b2]
sstable C:    [c1                           c2 c3 c4]
sstable D:                        [d1 d2 d3]

When the combined reader reads d1 it polls the selector for new readers. At this point it will have readers for A and C. Since d1 is after all keys in B and only intersects with D, B will be ignored (jumped over) and D will be selected.

The solution is to use the next_position provided by the partitioned_sstable_set's incremental selector to poll for new readers. This will guarantee that no intervals are jumped over.

denesb pushed a commit to denesb/scylla that referenced this issue Jul 2, 2018
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes scylladb#3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6)
avikivity added a commit that referenced this issue Jul 5, 2018
"
This series fixes the "LCS data-loss bug" where full scans (and
everything that uses them) would miss some small percentage (> 0.001%)
of the keys. This could easily lead to permanent data-loss as compaction
and decomission both use full scans.
aeffbb6 worked around this bug by disabling the incremental reader
selectors (the class identified as the source of the bug) altogether.
This series fixes the underlying issue and reverts aeffbb6.

The root cause of the bug is that the `incremental_reader_selector` uses
the current read position to poll for new readers using
`sstable_set::incremental_selector::select()`. This means that when the
currently open sstables contain no partitions that would intersect with
some of the yet unselected sstables, those sstables would be ignored.
Solve the problem by not calling `select()` with the current read
position and always pass the `next_position` returned in the previous
call. This means that the traversal of the sstable-set happens at a pace
defined by the sstable-set itself and this guarantees that no sstable
will be jumped over. When asked for new readers the
`incremental_reader_selector` will now iteratively call `select()` using
the `next_position` from the previous `select()` call until it either
receives some new, yet unselected sstables, or `next_position` surpasses
the read position (in which case `select()` will be tried again later).
The `sstable_set::incremental_selector` was not suitable in its present
state to support calling `select()` with the `next_position` from a
previous call as in some cases it could not make progress due to
inclusiveness related ambiguities. So in preparation to the above fix
`sstable_set` was updated to work in terms of ring-position instead of
tokens. Ring-position can express positions in a much more fine-grained
way then token, including positions after/before tokens and keys. This
allows for a clear expression of `next_position` such that calling
`select()` with it guarantees forward progress in the token-space.

Tests: unit(release, debug)

Refs: #3513
"

* 'leveled-missing-keys/v4' of https://github.com/denesb/scylla:
  tests/mutation_reader_test: combined_mutation_reader_test: use SEASTAR_THREAD_TEST_CASE
  tests/mutation_reader_test: refactor combined_mutation_reader_test
  tests/mutation_reader_test: fix reader_selector related tests
  Revert "database: stop using incremental selectors"
  incremental_reader_selector: don't jump over sstables
  mutation_reader: reader_selector: use ring_position instead of token
  sstables_set::incremental_selector: use ring_position instead of token
  compatible_ring_position: refactor to compatible_ring_position_view
  dht::ring_position_view: use token_bound from ring_position
  i_partitioner: add free function ring-position tri comparator
  mutation_reader_merger::maybe_add_readers(): remove early return
  mutation_reader_merger: get rid of _key
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants