Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read may skip over rows when overlapping range deletes exist in different active partition versions #3053

Closed
haaawk opened this Issue Dec 7, 2017 · 1 comment

Comments

Projects
None yet
2 participants
@haaawk
Copy link
Member

haaawk commented Dec 7, 2017

This patch exposes a bug in the cache when reads and update interleave: scylladb/seastar-dev@55aa3d8

It can be simplified further with: scylladb/seastar-dev@e290c75

After this change the test starts to fail on regular bases but nondeterministically.

All that this change does it fills the buffer of a streamed mutation just after it is created.
That usually does not exhaust the streamed mutation so first part of mutation fragments are buffered before update happans and second is buffered after the update.

basically it's the failing case is:

  1. create a range reader, get streamed mutation from it and fill the buffer of this streamed mutation
  2. update cache
  3. create second range reader, get streamed mutation from it and fill its buffer
  4. read the streamed_mutation of first reader entirely
  5. read the streamed_mutation of second reader entirely
    mutation obtained from the second reader is missing some rows that should be there

@tgrabiec tgrabiec self-assigned this Dec 7, 2017

@tgrabiec

This comment has been minimized.

Copy link
Contributor

tgrabiec commented Dec 7, 2017

The problem is that partition_snapshot::range_tombstones() is deoverlapping tombstones coming from different versions, and it may happen that due to range tombstone splitting that function will return a tombstone which starts after the requested range. This breaks assumptions made by the cache reader. It keeps track of the maximum fragment position, and if cache reader will then need to read from sstables due to a miss, it would do so starting from the position marked by that out of range tombstone, possibly skipping over some rows.

There is no race involved here, the test is fully deterministic. Only the data set on which it works is randomized at startup.

@tgrabiec tgrabiec changed the title Races of cache reads and cache updates cause reads to return wrong data Read may skip over rows when overlapping range deletes exist in different active partition versions Dec 7, 2017

@pdziepak pdziepak closed this in 1303320 Dec 8, 2017

pdziepak pushed a commit that referenced this issue Dec 8, 2017

tests: row_cache: Add reproducer for issue #3053
The issue is that partition_snapshot::range_tombstones() is
deoverlapping tombstones coming from different versions, and it may
happen that due to range tombstone splitting that function will return
a tombstone which starts after the requested range. This breaks
assumptions made by the cache reader. It keeps track of the maximum
fragment position, and if cache reader will then need to read from
sstables due to a miss, it would do so starting from the position
marked by that out of range tombstone, possibly skipping over some
rows.

pdziepak added a commit that referenced this issue Dec 8, 2017

Merge "Fix range tombstone emitting which led to skipping over data" …
…from Tomasz

"Fixes cache reader to not skip over data in some cases involving overlapping
range tombstones in different partition versions and discontinuous cache.

Introduced in 2.0

Fixes #3053."

* tag 'tgrabiec/fix-range-tombstone-slicing-v2' of github.com:scylladb/seastar-dev:
  tests: row_cache: Add reproducer for issue #3053
  tests: mvcc: Add test for partition_snapshot::range_tombstones()
  mvcc: Optimize partition_snapshot::range_tombstones() for single version case
  mvcc: Fix partition_snapshot::range_tombstones()
  tests: random_mutation_generator: Do not emit dummy entries at clustering row positions

tgrabiec added a commit that referenced this issue Dec 8, 2017

Merge "Fix range tombstone emitting which led to skipping over data" …
…from Tomasz

"Fixes cache reader to not skip over data in some cases involving overlapping
range tombstones in different partition versions and discontinuous cache.

Introduced in 2.0

Fixes #3053."

* tag 'tgrabiec/fix-range-tombstone-slicing-v2' of github.com:scylladb/seastar-dev:
  tests: row_cache: Add reproducer for issue #3053
  tests: mvcc: Add test for partition_snapshot::range_tombstones()
  mvcc: Optimize partition_snapshot::range_tombstones() for single version case
  mvcc: Fix partition_snapshot::range_tombstones()
  tests: random_mutation_generator: Do not emit dummy entries at clustering row positions

(cherry picked from commit 051cbbc)

tgrabiec added a commit that referenced this issue Dec 8, 2017

Merge "Fix range tombstone emitting which led to skipping over data" …
…from Tomasz

"Fixes cache reader to not skip over data in some cases involving overlapping
range tombstones in different partition versions and discontinuous cache.

Introduced in 2.0

Fixes #3053."

* tag 'tgrabiec/fix-range-tombstone-slicing-v2' of github.com:scylladb/seastar-dev:
  tests: row_cache: Add reproducer for issue #3053
  tests: mvcc: Add test for partition_snapshot::range_tombstones()
  mvcc: Optimize partition_snapshot::range_tombstones() for single version case
  mvcc: Fix partition_snapshot::range_tombstones()
  tests: random_mutation_generator: Do not emit dummy entries at clustering row positions

(cherry picked from commit 051cbbc)
(cherry picked from commit be51273)

[tgrabiec: dropped mvcc_test change, because the file does not exist here]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.