test_concurrent_reads_and_eviction: Mutation read doesn't match any expected version #15483
I looked into it and it appears to be a real problem. It seems that reverse reads sometimes read range tombstones incorrectly, but it will take some work to understand the cause.
Cc @denesb
Could this be a regression caused by the v2 internal representation?
@michoecho did you manage to reproduce this? I couldn't reproduce it even with
Yes, I managed to reproduce it, but it's rare. It reproduced about once per ~20 minutes of running in a loop. However, I noticed that when I set
Indeed, I immediately got a failing run once I set
@michoecho if you don't mind, I will take over this issue.
Sure, why not. But when you are done with it, please tell me how you debugged it.
Ok, will do.
Wow, I was super lucky and got a really small mutation reproducer:
Usually the printout is 10K+ lines.

```diff
--- /dev/fd/63	2023-10-09 09:13:21.545162034 -0400
+++ /dev/fd/62	2023-10-09 09:13:21.545162034 -0400
@@ -1,9 +1,9 @@
-test/lib/mutation_assertions.hh(134): fatal error: in "test_concurrent_reads_and_eviction": Mutations differ, expected
+ ...but got:
 {table: 'ks.cf', key: {'pk': ef4b9e22, token: -8892763474812993290}, mutation_partition: {
   tombstone: {tombstone: timestamp=3330002, deletion_time=777000602},
   range_tombstones: {
-    {range_tombstone: start={position: clustered, ckp{00040e8c563e0004e1f8219b}, 1}, end={position: clustered, ckp{00040e8c563e0004f32c13c1}, -1}, {tombstone: timestamp=3330002, deletion_time=777000918}},
-    {range_tombstone: start={position: clustered, ckp{00040e8c563e0004f32c13c1}, -1}, end={position: clustered, ckp{000416eb7874}, -1}, {tombstone: timestamp=3330002, deletion_time=777001113}},
+    {range_tombstone: start={position: clustered, ckp{00040e8c563e0004e1f8219b}, 1}, end={position: clustered, ckp{00040e8c563e}, 1}, {tombstone: timestamp=3330002, deletion_time=777000918}},
+    {range_tombstone: start={position: clustered, ckp{00040e8c563e}, 1}, end={position: clustered, ckp{000416eb7874}, -1}, {tombstone: timestamp=3330002, deletion_time=777001113}},
     {range_tombstone: start={position: clustered, ckp{000416eb7874}, -1}, end={position: clustered, ckp{000416eb7874}, 1}, {tombstone: timestamp=3330002, deletion_time=777001120}},
     {range_tombstone: start={position: clustered, ckp{000416eb7874}, 1}, end={position: clustered, ckp{00047750d3ef}, 1}, {tombstone: timestamp=3330002, deletion_time=777001128}},
     {range_tombstone: start={position: clustered, ckp{00047750d3ef}, 1}, end={position: clustered, ckp{000486dd59e5000416eb7874}, -1}, {tombstone: timestamp=3330002, deletion_time=777001121}},
```
Some observations I've made; they should help:
You weren't super lucky; it happens quite often. There is a fairly high chance that the range tombstones will remove all rows, so that only range tombstones remain in the compacted output.
Thanks for the tips. From what I see, the problem is that the first range tombstone is extended. It should only go as far as
After a lot of staring at different results and annotating the generated mutations with metadata, it seems that the bug is that the read will sometimes mix in range tombstones from a later version. I don't know yet how or why this happens.
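For reference, a toy sketch of the invariant at stake (hypothetical types, not Scylla's actual tombstone API): for any clustering range covered by two MVCC versions, the merge must keep the tombstone with the higher timestamp, so a tombstone overridden by another version must never show up in the read.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>

// Toy tombstone, mirroring the fields seen in the printouts above.
struct tombstone {
    int64_t timestamp;      // write timestamp; higher wins
    int64_t deletion_time;  // tie-breaker
};

inline bool operator<(const tombstone& a, const tombstone& b) {
    return std::tie(a.timestamp, a.deletion_time) <
           std::tie(b.timestamp, b.deletion_time);
}

// Merge rule for a range covered by both versions: the newer (greater)
// tombstone must win; the older one must not "bleed through".
inline tombstone merge(tombstone a, tombstone b) {
    return std::max(a, b);
}
```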
I expected this to be tough, but I got more than I bargained for. I managed to narrow the problem down to prefix keys. I think we are bitten by a broken prefix-key comparison in reverse mode.
After yet more investigation, both of my above comments are wrong. The bug is not limited to prefix keys, and the "wrong" keys do not come from later versions; they "bleed through" from earlier versions. I have manually verified the merging rules in the test, and the expected mutation is correct. Which means that we have a bug in the cache, where it merges range tombstones from two versions incorrectly: sometimes a range tombstone which is overridden by a later version is still included. I still don't understand why we need reverse reads for this to reproduce. The bug itself can happen with forward reads, but only if there are prior reverse reads. Maybe a reverse read mis-populates the cache and a later read then steps on this? I don't know yet. My 3 main suspicions for the source of the bug are:
Debugging is made extra hard by the fact that, despite hard-coding the random seeds, the test has innate randomness due to the timing between the concurrent reads and the concurrent updates/eviction. This means that each time I add more logging, I have to analyze a new case, digging up information from ~50K lines of logs.
If this is a problem, make the test deterministic. This test doesn't do any IO and doesn't depend on real time, so you only have to ensure that need_preempt() yields the same answers on every run.

One way to do it is to run as usual, but write down all the outcomes of need_preempt() and replay them on the next run.

A much easier (but more invasive) way is to make the result of need_preempt() a function of a seeded random engine:

```diff
+
+namespace testing {
+    extern thread_local std::default_random_engine local_random_engine;
+}
+
 bool need_preempt() noexcept {
+    return testing::local_random_engine() % 100 == 0;
 #ifndef SEASTAR_DEBUG
     // prevent compiler from eliminating loads in a loop
     std::atomic_signal_fence(std::memory_order_seq_cst);
```

I just verified that this also works for this test. With this, once you find a failing seed, it will always keep resulting in the same failed run.
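A minimal sketch of the "record, then replay" alternative mentioned above (hypothetical helper names and wiring, not actual Seastar code): on the recording run, every preemption decision is appended to a trace; on the replay run, the recorded answers are fed back so the same interleaving repeats.

```cpp
#include <deque>

// Trace of preemption decisions; would be persisted when a run fails.
static std::deque<bool> g_preempt_trace;
static bool g_replaying = false;

bool need_preempt_traced(bool real_outcome) {
    if (g_replaying) {
        bool v = g_preempt_trace.front(); // replay the recorded decision
        g_preempt_trace.pop_front();
        return v;
    }
    g_preempt_trace.push_back(real_outcome); // record this run's decision
    return real_outcome;
}
```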
This seems to have worked (random-seed for
Yes.
It might be useful sometimes, but it also covers only a subset of a general problem. Non-reproducible tests are evil. In a better world, we would have designed our test frameworks in such a way that all tests can be run with recording, and replayed when they fail. It's probably not even that hard to achieve. I think it would be sufficient to record and replay the trace of:
Seastar is in a very good position for making all tests determinizable, and it's a shame that we don't make use of it. We could be two weeks of work away from eliminating the problem of non-reproducible test failures altogether.
Found another bug. Hope this is the last.
@bhalevy please find time for it
@denesb Don't tease us. What are the bugs? Are they fun?
I still don't know what the second bug is. I just know it exists. As for the first one, it is this:

```diff
diff --git a/cache_flat_mutation_reader.hh b/cache_flat_mutation_reader.hh
index 8f5fc0a412..6de2fb2cd0 100644
--- a/cache_flat_mutation_reader.hh
+++ b/cache_flat_mutation_reader.hh
@@ -513,7 +513,9 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {
         return false;
     }
-    if (cmp(cur.position(), _last_row.position()) != 0) {
+    if (cmp(cur.table_position(), _last_row.position()) != 0) {
         return false;
     }
```
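A toy model of the bug class this fix belongs to (integers standing in for clustering positions; the real position_in_partition machinery works differently): in a reverse read the cursor's position lives in the reversed domain, while _last_row remembers a table-order position, so a naive comparison reports a mismatch for what is actually the same position.

```cpp
#include <cassert>

// Reversed-domain encoding of a clustering position (toy: negation).
int to_reversed(int table_pos) { return -table_pos; }
int to_table(int reversed_pos) { return -reversed_pos; }

int main() {
    int last_row = 5;                  // remembered in table order
    int cur = to_reversed(5);          // cursor position, reversed domain
    assert(cur != last_row);           // naive compare says "different"...
    assert(to_table(cur) == last_row); // ...but it's the same position
}
```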
The second bug is much more elusive. Each time I'm convinced I found it, it turns out the code at hand is correct and the bug happened at an earlier point, creating bad data.
I found at least one logic bug:

```diff
diff --git a/cache_flat_mutation_reader.hh b/cache_flat_mutation_reader.hh
index 30f2f5c503..db93546980 100644
--- a/cache_flat_mutation_reader.hh
+++ b/cache_flat_mutation_reader.hh
@@ -579,12 +579,12 @@ void cache_flat_mutation_reader::maybe_update_continuity() {
         if (insert_result.second) {
             clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, insert_result.first->position());
             _snp->tracker()->insert(*insert_result.first);
+            clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
+                    _last_row.position(), _current_tombstone);
+            insert_result.first->set_continuous(true);
+            insert_result.first->set_range_tombstone(_current_tombstone);
         }
-        clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
-                _last_row.position(), _current_tombstone);
-        insert_result.first->set_continuous(true);
-        insert_result.first->set_range_tombstone(_current_tombstone);
         clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
         e.set_continuous(true);
     });
 } else {
```

Explanation: assume that the cache contains the following row entries for a partition:
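In generic terms, the hazard the proposed fix guards against can be sketched like this (toy code; std::map stands in for the cache's row-entry tree and all names are hypothetical): the dummy entry's attributes may only be set when the insertion actually happened, otherwise the state of an entry that was already present gets clobbered.

```cpp
#include <map>

struct entry {
    bool continuous = false; // toy stand-in for rows_entry continuity
    long tombstone_ts = 0;   // toy stand-in for its range tombstone
};

void upsert_dummy(std::map<int, entry>& cache, int key, long current_ts) {
    auto [it, inserted] = cache.emplace(key, entry{});
    if (inserted) {
        // Fresh dummy: the reader knows exactly what it established here.
        it->second.continuous = true;
        it->second.tombstone_ts = current_ts;
    }
    // If the entry already existed, its continuity and tombstone reflect
    // information we must not overwrite.
}
```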
I'm not sure if the fix proposed above is sound, though.

Also, I found a third, unrelated bug (which results in undefined behavior, rather than a test failure):

```diff
diff --git a/cache_flat_mutation_reader.hh b/cache_flat_mutation_reader.hh
index 3a71326de4..30f2f5c503 100644
--- a/cache_flat_mutation_reader.hh
+++ b/cache_flat_mutation_reader.hh
@@ -341,7 +342,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {
         });
     }
     _state = state::reading_from_underlying;
-    _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema) && !_read_context.is_reversed();
+    _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema) && !_read_context.is_reversed() && !_last_row;
     _underlying_upper_bound = _next_row_in_range ? position_in_partition::before_key(_next_row.position())
             : position_in_partition(_upper_bound);
     if (!_read_context.partition_exists()) {
```

There is a place in the code which assumes that _population_range_starts_before_all_rows implies there is no _last_row. But the test is still failing. I'm trying to find the fourth bug now.
@denesb The rabbit hole goes deep. Other than the bug you've found, I've found 4 bugs so far (I've listed two in the previous comment; the other two are continuity breaks during races in cache population), and it's not over yet. I'm trying to understand the sixth bug now.
After fixing 8 bugs (or 10, depending on how you count), I haven't seen the test fail for 100000 consecutive runs. I'll let it run a million times and then I'll get to properly documenting and submitting the patches.
It points at the thing being way too complicated for our own good. I don't know what to suggest, apart from flattening MVCC to use the timestamp as a discriminant instead of a version object.
@denesb For the record, my debugging strategy was:
In other words: identify the bad rows_entry, then set watchpoints on it. Watchpoints are indispensable for debugging this.
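A hypothetical gdb session illustrating the strategy (the address and member name are made up): once logging pinpoints the bad rows_entry, a location watchpoint on it makes the debugger stop at the exact write that corrupts it.

```
# Made-up address and member name, for illustration only.
(gdb) watch -l ((rows_entry*)0x60700012abc0)->_flags
(gdb) continue
# When the watchpoint fires, `bt` shows exactly which code path
# modified the entry.
```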
Watchpoints are indeed awesome; the hard part is coming up with what is worth watching. Great job!
Possibly reversible debugging can help in these cases.
rr can't support aio or io_uring, so we'll have to add an epoll+threadpool reactor backend that shunts disk I/O through the syscall thread. Shouldn't be hard.
This is a loose collection of fixes to rare row cache bugs flushed out by running test_concurrent_reads_and_eviction several million times. See individual commits for details.

Fixes #15483
Closes #15945

* github.com:scylladb/scylladb:
  - partition_version: fix violation of "older versions are evicted first" during schema upgrades
  - cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
  - cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
  - cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
  - partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
  - cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
  - cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
  - cache_flat_mutation_reader: never make an unlinked last dummy continuous

(cherry picked from commit 6bcf3ac)
Backported to 5.4; 5.2 is not affected.
@michoecho can you double-check this, please? AFAIK the bugs fixed here were introduced by the v2 migration. Did I miss something?
That's hard to say. Most of the affected code has changed greatly between 5.2 and 5.4, so it's hard to see which bugs could have also been present in 5.2. But, for example, I think that 47299d6 also applies to 5.2.
Please create a 5.2 backport PR with the patches you think apply to it.
This was hit in CI with an unrelated change, so it can be attributed to baseline version b87660f
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3706/artifact/testlog/x86_64/release/boost.row_cache_test.test_concurrent_reads_and_eviction.3.log
boost.row_cache_test.test_concurrent_reads_and_eviction.3.log.gz
@michoecho please look into this