New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
row_cache_test/test_concurrent_reads_and_eviction
fails sometimes.
#12462
Comments
cc @tgrabiec Does this look like some known bug? Or maybe the test is wrong? |
I recall that @bhalevy also observed this in one CI run recently. This is the second time I hear about this. It could be a regression we should investigate. |
@bhalevy does anyone dealing with this? |
|
Increasing priority as we are trying to fight hard with CI-Stability cases. /Cc @michoecho @bhalevy |
@michoecho did you had a chance to explore it ? |
No, not yet. By the way, after this bug was seen, #12048 was merged, which brought great changes to the handling of range tombstones in cache. It's quite likely that the bug was caught up in the rewrite and disappeared as a side effect. (But still is present in versions <=5.2). |
@denesb After a long debugging session, I arrived at the conclusion that this bug is caused by this line:
end_of_range .
If the generator is flushed right on the end bound of a range tombstone, and the next tombstone is adjacent to it, Here's a unit test which proves the problem:
Does this make sense? Please review the unit test and the logic of My other problem is: I don't understand how this situation occurs in practice. I don't know how to write a high level test (i.e. based on memtable/cache operations, rather than calling range_tombstone_change_generator directly) which triggers this bug. Do you have an idea about how to do that? I debugged this issue by determinizing This reproducer was enough to (slowly and painfully) lead me to the range_tombstone_change_generator problem, but it's too complex for me to figure out what kind of sequence of cache reads and updates may lead to the bug being triggered. I'm looking for a simpler reproducer to make sure that the cache didn't get into some bad state in the first place, and that the range_tombstone_change_generator bug wasn't only triggered as an side effect of the bad state. cc @bhalevy @tgrabiec Maybe you too can give me a hint about how to write a simpler reproducer for this bug. |
@DoronArazii This is a 5.2 correctness bug. Since it wasn't seen earlier, it's likely a 5.2 regression. Maybe it should be considered a release blocker. |
Would it be possible to upstream this mode so that we can debug such failures more easily in the future? |
Looking at the contract of range_tombstone_change_generator::flush() with end_of_range = true, the implementation is wrong here:
It should call should_flush() on position(), not end_position(). I don't understand why we need this change of behavior though. cache reader uses it here:
Setting the flag to true is supposed to emit an rtc at position
When I refactored this code, I didn't try to understand the existing logic, I replaced it what I though was correct. |
For reference, the new logic in master looks like this:
|
Actually, I was confused in my analysis. The end_of_range=true is not just supposed to emit positions which are equal to the upper_bound, like the documentation of flush() suggests, it is supposed to emit rtc to null at upper_bound if there is an active range tombstone (that's what cache reader expects). The quoted line is fine, but the logic is incomplete. Here's a row_cache-level test case:
|
This fixes it: diff --git a/range_tombstone_change_generator.hh b/range_tombstone_change_generator.hh
index 6f98be5dceb..e5fd3b8cb84 100644
--- a/range_tombstone_change_generator.hh
+++ b/range_tombstone_change_generator.hh
@@ -114,6 +114,7 @@ class range_tombstone_change_generator {
// It cannot get adjacent later because prev->end_position() < upper_bound,
// so nothing == prev->end_position() can be added after this invocation.
if (prev && (_range_tombstones.empty()
+ || end_of_range
|| (cmp(prev->end_position(), _range_tombstones.begin()->position()) < 0))) {
consumer(range_tombstone_change(prev->end_position(), tombstone())); // [2]
} But I'm not sure it's the best way. |
Thanks, that's what I was looking for. My attempt looked like this:
but when it didn't work, I decided that I probably misunderstand how range tombstones are read and I should ask for help before I spend too much time on debugging. I now see that I was only forgetting to populate the cache after the first apply.
My idea for a fix was: diff --git a/range_tombstone_change_generator.hh b/range_tombstone_change_generator.hh
index 6f98be5dce..48d3fc0043 100644
--- a/range_tombstone_change_generator.hh
+++ b/range_tombstone_change_generator.hh
@@ -83,7 +83,7 @@ class range_tombstone_change_generator {
position_in_partition::tri_compare cmp(_schema);
std::optional<range_tombstone> prev;
- const bool allow_eq = end_of_range || upper_bound.is_after_all_clustered_rows(_schema);
+ const bool allow_eq = upper_bound.is_after_all_clustered_rows(_schema);
const auto should_flush = [&] (position_in_partition_view pos) {
const auto res = cmp(pos, upper_bound);
if (allow_eq) { In this case the range tombstone ending on upper_bound would be handled by the last two
It would, with some effort, but I'm not sure how useful it would be in general. This method is good if preemptions are the only source of nondeterminism in the test (which might be the case for quite many tests, I guess), but as soon as more sources get involved (i.e. timers, smp or IO), it won't help. |
Even if it works only for test_concurrent_reads_and_eviction test, I think it's worth it. This test is notorious for finding hard to reproduce problems. |
… range tombstone Adds a reproducer for scylladb#12462, which doesn't manifest in master any more after f73e2c9. It's still useful to keep the test to avoid regresions. The bug manifests by reader throwing: std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}} The reason is that prior to the rework of the cache reader, range_tombstone_generator::flush() was used with end_of_range=true to produce the closing range_tombstone_change and it did not handle correctly the case when there are two adjacent range tombstones and flush(pos, end_of_range=true) is called such that pos is the boundary between the two.
… range tombstone Adds a reproducer for #12462, which doesn't manifest in master any more after f73e2c9. It's still useful to keep the test to avoid regresions. The bug manifests by reader throwing: std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}} The reason is that prior to the rework of the cache reader, range_tombstone_generator::flush() was used with end_of_range=true to produce the closing range_tombstone_change and it did not handle correctly the case when there are two adjacent range tombstones and flush(pos, end_of_range=true) is called such that pos is the boundary between the two. Closes #13665
@tgrabiec , @bhalevy , @michoecho - I have missed the question above (or more accurately the answer) - is this a 5.2.0 blocker? I sincerely hope we can fix it in 5.2.1 or so... |
@avikivity please have a look and determine if it's a blocker |
… range tombstone Adds a reproducer for #12462, which doesn't manifest in master any more after f73e2c9. It's still useful to keep the test to avoid regresions. The bug manifests by reader throwing: std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}} The reason is that prior to the rework of the cache reader, range_tombstone_generator::flush() was used with end_of_range=true to produce the closing range_tombstone_change and it did not handle correctly the case when there are two adjacent range tombstones and flush(pos, end_of_range=true) is called such that pos is the boundary between the two. Closes #13665
According to #12462 (comment) this was introduced in 5.1 and is therefore not a regression, and therefore not a 5.2 blocker. However, we should fix it promptly. |
… range tombstone Adds a reproducer for #12462, which doesn't manifest in master any more after f73e2c9. It's still useful to keep the test to avoid regresions. The bug manifests by reader throwing: std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}} The reason is that prior to the rework of the cache reader, range_tombstone_generator::flush() was used with end_of_range=true to produce the closing range_tombstone_change and it did not handle correctly the case when there are two adjacent range tombstones and flush(pos, end_of_range=true) is called such that pos is the boundary between the two. Closes #13665
To the extent I was able to understand the original context for 5cc5fd4, I think this is a right way. |
@michoecho - will you be sending a patch to fix this? Anyone else who should be assigned to this? |
Yes. |
range_tombstone_change_generator::flush() mishandles the case when two range tombstones are adjacent and flush(pos, end_of_range=true) is called with pos equal to the end bound of the lesser-position range tombstone. In such case, the start change of the greater-position rtc will be accidentally emitted, and there won't be an end change, which breaks reader assumptions by ending the stream with an unclosed range tombstone, triggering an assertion. This is due to a non-strict inequality used in a place where strict inequality should be used. The modified line was intended to close range tombstones which end exactly on the flush position, but this is unnecessary because such range tombstones are handled by the last `if` in the function anyway. Instead, this line caused range tombstones beginning right after the flush position to be emitted sometimes. Fixes scylladb#12462
… range tombstone Adds a reproducer for scylladb#12462. The bug manifests by reader throwing: std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}} The reason is that prior to the fix range_tombstone_change_generator::flush() was used with end_of_range=true to produce the closing range_tombstone_change and it did not handle correctly the case when there are two adjacent range tombstones and flush(pos, end_of_range=true) is called such that pos is the boundary between the two. Cherry-picked from a717c80.
… from Michał Chojnowski range_tombstone_change_generator::flush() mishandles the case when two range tombstones are adjacent and flush(pos, end_of_range=true) is called with pos equal to the end bound of the lesser-position range tombstone. In such case, the start change of the greater-position rtc will be accidentally emitted, and there won't be an end change, which breaks reader assumptions by ending the stream with an unclosed range tombstone, triggering an assertion. This is due to a non-strict inequality used in a place where strict inequality should be used. The modified line was intended to close range tombstones which end exactly on the flush position, but this is unnecessary because such range tombstones are handled by the last `if` in the function anyway. Instead, this line caused range tombstones beginning right after the flush position to be emitted sometimes. Fixes #12462 Closes #13894 * github.com:scylladb/scylladb: tests: row_cache: Add reproducer for reader producing missing closing range tombstone range_tombstone_change_generator: fix an edge case in flush()
range_tombstone_change_generator::flush() mishandles the case when two range tombstones are adjacent and flush(pos, end_of_range=true) is called with pos equal to the end bound of the lesser-position range tombstone. In such case, the start change of the greater-position rtc will be accidentally emitted, and there won't be an end change, which breaks reader assumptions by ending the stream with an unclosed range tombstone, triggering an assertion. This is due to a non-strict inequality used in a place where strict inequality should be used. The modified line was intended to close range tombstones which end exactly on the flush position, but this is unnecessary because such range tombstones are handled by the last `if` in the function anyway. Instead, this line caused range tombstones beginning right after the flush position to be emitted sometimes. Fixes scylladb#12462
Backport to 5.1 missing apparently. |
Note it still has the |
@scylladb/scylla-maint ping |
range_tombstone_change_generator::flush() mishandles the case when two range tombstones are adjacent and flush(pos, end_of_range=true) is called with pos equal to the end bound of the lesser-position range tombstone. In such case, the start change of the greater-position rtc will be accidentally emitted, and there won't be an end change, which breaks reader assumptions by ending the stream with an unclosed range tombstone, triggering an assertion. This is due to a non-strict inequality used in a place where strict inequality should be used. The modified line was intended to close range tombstones which end exactly on the flush position, but this is unnecessary because such range tombstones are handled by the last `if` in the function anyway. Instead, this line caused range tombstones beginning right after the flush position to be emitted sometimes. Fixes #12462 Closes #13906 (cherry picked from commit 9b0679c)
Backported to 5.1 as f13f895. |
My fault. I initially thought that this bug wasn't present in master (I though the buggy code was deleted), so I posted a PR against 5.2. After it got merged, I realized that it's present in master after all, so I posted a separate PR against master. (So it wasn't a standard backport procedure). |
During a CI run for an unrelated commit,
row_cache_test/test_concurrent_reads_and_eviction
failed with a logic error:This is not a fluke (i.e. a cosmic bitflip), because I could reproduce this failure on my PC — but it only happened once over hundreds of runs (with the same seed), so it's most likely timing-related.
Command used:
Scylla commit:
1d273a98b989a4a3cacc84b2f5d1f38dc207ed45
The text was updated successfully, but these errors were encountered: