
Exception during add_dc_during_mv_update_test dtest: scylla: read_context.hh:108: utils::phased_barrier::phase_type cache::autoupdating_underlying_reader::creation_phase() const: Assertion `_reader' failed. #4236

Closed

bhalevy opened this issue Feb 17, 2019 · 12 comments

@bhalevy (Contributor) commented Feb 17, 2019

Seen in:

- 33/artifact/logs-release.2/1550376490278_materialized_views_test.TestMaterializedViews.add_dc_during_mv_update_test/node6.log
- 33/artifact/logs-release.2/1550378153341_materialized_views_test.TestMaterializedViews.add_dc_during_mv_update_test/node5.log
- 33/artifact/logs-release.2/1550378526267_materialized_views_test.TestMaterializedViews.add_dc_during_mv_update_test/node5.log

scylla: read_context.hh:108: utils::phased_barrier::phase_type cache::autoupdating_underlying_reader::creation_phase() const: Assertion `_reader' failed.
Aborting on shard 0.
Backtrace:
  0x000000000419a5c2
  0x0000000004093235
  0x0000000004093535
  0x00000000040935e3
  0x00007f183c2e802f
  /lib64/libc.so.6+0x000000000003853e
  /lib64/libc.so.6+0x0000000000022894
  /lib64/libc.so.6+0x0000000000022768
  /lib64/libc.so.6+0x00000000000309f5
  0x0000000000ff268a
  0x000000000100d9da
  0x0000000001020fb5
  0x000000000102270c
  0x0000000004090611
  0x000000000409080e
  0x000000000416739d
  0x0000000003ffcb55
  0x00000000009f6451
  /lib64/libc.so.6+0x0000000000024412
  0x0000000000a5569d

seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > seastar::futurize<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >::apply<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::optimized_optional<mutation_fragment> >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&, std::tuple<seastar::optimized_optional<mutation_fragment> >&&) at crtstuff.c:?
seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > seastar::future<seastar::optimized_optional<mutation_fragment> >::then_impl<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&) at crtstuff.c:?
scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}::operator()() const at crtstuff.c:?
seastar::internal::do_until_state<scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#1}, scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}>::run_and_dispose() at crtstuff.c:?

@bhalevy bhalevy added the dtest label Feb 17, 2019

@bhalevy (Contributor, Author) commented Feb 17, 2019

@duarten can this be related to 03531c2 ?

@duarten (Contributor) commented Feb 17, 2019

@bhalevy Unlikely, since the cache doesn't use fragmented buffers afaik.

@bhalevy (Contributor, Author) commented Feb 17, 2019

@tgrabiec can you please look into this?

@tgrabiec tgrabiec self-assigned this Feb 18, 2019

@tgrabiec (Contributor) commented Feb 18, 2019

Found the coredump. But debug information seems bogus:

(gdb) bt
#0  0x00007f183bbec53f in raise () from /lib64/libc.so.6
#1  0x00007f183bbd695e in abort () from /lib64/libc.so.6
#2  0x00007f183bbd6769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f183bbe49f6 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000ff268b in seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > seastar::futurize<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >::apply<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::optimized_optional<mutation_fragment> >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&, std::tuple<seastar::optimized_optional<mutation_fragment> >&&) ()
#5  0x000000000100d9db in seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > seastar::future<seastar::optimized_optional<mutation_fragment> >::then_impl<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&) ()
#6  0x0000000001020fb6 in scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}::operator()() const ()
#7  0x000000000102270d in seastar::internal::do_until_state<scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#1}, scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}>::run_and_dispose() ()
#8  0x0000000004090612 in seastar::reactor::run_tasks (
    this=this@entry=0x600000020000, tq=...) at ../../src/core/reactor.cc:3541
#9  0x000000000409080f in seastar::reactor::run_some_tasks (
    this=this@entry=0x600000020000) at ../../src/core/reactor.cc:3966
#10 0x000000000416739e in seastar::reactor::run_some_tasks (
    this=0x600000020000) at ../../src/core/reactor.cc:4109
#11 seastar::reactor::run (this=0x600000020000)
    at ../../src/core/reactor.cc:4109
#12 0x0000000003ffcb56 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) (this=<optimized out>, ac=<optimized out>, 
    av=<optimized out>, func=...) at ../../include/seastar/core/reactor.hh:814
#13 0x00000000009f6452 in main ()
    at /usr/include/c++/8/bits/std_function.h:257

(gdb) select-frame 8
(gdb) p tsk
$1 = <optimized out>

(gdb) p (scanning_and_populating_reader*) 0x60000291d400
No symbol "scanning_and_populating_reader" in current context.

(gdb) scylla ptr 0x60000291d400
Python Exception <class 'gdb.error'> No symbol "logalloc::segment" in current context.: 
Error occurred in Python command: No symbol "logalloc::segment" in current context.

The scylla binary seems to match:

./scylla: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=3d55441c4c39fadba7dd1d95cf27063e13a0ead0, with debug_info, not stripped

eu-unstrip -n --core ./core.scylla.800.1a8ccffdd2c6463e81f070047bd26c85.17639.1550375889000000 | grep scylla
0x400000+0x4e60000 3d55441c4c39fadba7dd1d95cf27063e13a0ead0@0x4002c8 - - /jenkins/workspace/scylla-master/dtest-release/scylla/.dtest/dtest-QIDwYk/test/node6/bin/scylla

@bhalevy (Contributor, Author) commented Feb 19, 2019

@avikivity looks like your change in scylladb/scylla-pkg@a1e9a66 could be the culprit for the lack of full debug info in scylla.

@tgrabiec (Contributor) commented Feb 19, 2019

Unable to reproduce locally using the same binary.

@slivne (Contributor) commented Feb 21, 2019

@bhalevy - I think Avi made a change in dtest not to force the -oG flag. Are we still hitting this? Is there a new log?

@slivne slivne added this to the 3.1 milestone Feb 21, 2019

@bhalevy (Contributor, Author) commented Feb 21, 2019

@slivne not yet. The test is passing and we're waiting for scylladb/scylla-pkg@6ddca03 to keep all logs.

@nyh (Contributor) commented Mar 6, 2019

FYI, I saw the same error in a manual run with a cassandra-stress loader writing to a table with one materialized view (I don't know whether the materialized view has anything to do with it; I'm just describing what I did). Two out of four nodes crashed on this. See https://groups.google.com/a/scylladb.com/d/msg/mv-dev/RYXV4eCXdNw/RatSGASRBwAJ

I'll see if I can reproduce this and come up with any more information...

@tgrabiec (Contributor) commented Mar 7, 2019

@nyh I can help debugging if you get a core dump

@psarna (Member) commented Mar 11, 2019

@tgrabiec I managed to accidentally reproduce it locally with materialized_views_test.TestMaterializedViews.remove_node_during_mv_insert_4_nodes_test.
gdb backtrace:

#0  0x00007f3c3aa3b53f in raise () from /lib64/libc.so.6
#1  0x00007f3c3aa2595e in abort () from /lib64/libc.so.6
#2  0x00007f3c3aa25769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f3c3aa339f6 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000603ca5 in range_populating_reader::handle_end_of_stream() ()
#5  0x000000000060936c in auto range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}::operator()<seastar::optimized_optional<mutation_fragment> >(seastar::optimized_optional<mutation_fragment>&&) ()
#6  0x000000000060996b in seastar::noncopyable_function<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > (seastar::optimized_optional<mutation_fragment>&&)>::direct_vtable_for<seastar::future<seastar::optimized_optional<mutation_fragment> >::then<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&)::{lambda(seastar::optimized_optional<mutation_fragment>&&)#1}>::call(seastar::noncopyable_function<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > (seastar::optimized_optional<mutation_fragment>&&)> const*, seastar::optimized_optional<mutation_fragment>&&) ()
#7  0x0000000000610137 in seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > seastar::future<seastar::optimized_optional<mutation_fragment> >::then_impl<seastar::noncopyable_function<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > (seastar::optimized_optional<mutation_fragment>&&)>, seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >(seastar::noncopyable_function<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > (seastar::optimized_optional<mutation_fragment>&&)>&&) ()
#8  0x00000000006140ee in scanning_and_populating_reader::read_from_secondary(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) ()
#9  0x0000000000614371 in scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}::operator()() const ()
#10 0x00000000006148f7 in seastar::internal::do_until_state<scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#1}, scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}>::run_and_dispose() ()
#11 0x000000000163bbca in seastar::reactor::run_tasks(seastar::reactor::task_queue&) ()
#12 0x0000000001640654 in seastar::reactor::run_some_tasks() ()
#13 0x000000000166ca05 in seastar::reactor::run() ()
#14 0x000000000160a1e2 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ()
#15 0x0000000000446e0f in main ()

binary+core: http://scratch.scylladb.com/sarna/debug.tar.gz

@bhalevy (Contributor, Author) commented Mar 17, 2019

Seen in dtest-release/59/artifact/logs-release.2/1552737387603_materialized_views_test.TestMaterializedViews.remove_node_during_mv_insert_4_nodes_test/node4.log:

#0  0x00007f7d33e7f53f in raise () from /lib64/libc.so.6
#1  0x00007f7d33e6995e in abort () from /lib64/libc.so.6
#2  0x00007f7d33e69769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f7d33e779f6 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000fd176d in cache::autoupdating_underlying_reader::creation_phase (this=<optimized out>) at dht/i_partitioner.hh:615
#5  range_populating_reader::can_set_continuity (this=<optimized out>) at row_cache.cc:488
#6  range_populating_reader::can_set_continuity (this=<optimized out>) at row_cache.cc:487
#7  range_populating_reader::handle_end_of_stream (this=<optimized out>) at row_cache.cc:491
#8  range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}::operator()<seastar::optimized_optional<mutation_fragment> >(seastar::optimized_optional<mutation_fragment>&&) (
    mfopt=..., __closure=<optimized out>) at row_cache.cc:528
#9  seastar::apply_helper<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, std::tuple<seastar::optimized_optional<mutation_fragment> >&&, std::integer_sequence<unsigned long, 0ul> >::apply({lambda(auto:1&&)#1}&&, std::tuple<seastar::optimized_optional<mutation_fragment> >) (args=..., func=...) at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/apply.hh:35
#10 seastar::apply<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::optimized_optional<mutation_fragment> >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&, std::tuple<seastar::optimized_optional<mutation_fragment> >&&) (args=..., func=...)
    at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/apply.hh:43
#11 seastar::futurize<seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >::apply<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::optimized_optional<mutation_fragment> >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&, std::tuple<seastar::optimized_optional<mutation_fragment> >&&) (func=..., args=...) at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/future.hh:1478
#12 0x0000000000ff033b in seastar::future<seastar::optimized_optional<mutation_fragment> >::then_impl<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&) (
    this=this@entry=0x7f7d333f0d90, func=...) at /usr/include/c++/8/bits/move.h:74
#13 0x0000000001002417 in seastar::future<seastar::optimized_optional<mutation_fragment> >::then<range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader>, seastar::optimized_optional<mutation_fragment> > >(range_populating_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(auto:1&&)#1}&&) (func=..., 
    this=0x7f7d333f0d90) at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/future.hh:983
#14 range_populating_reader::operator() (timeout=..., this=0x6010022030e0) at row_cache.cc:551
#15 scanning_and_populating_reader::read_from_secondary (timeout=..., this=0x601002203000) at row_cache.cc:650
#16 scanning_and_populating_reader::read_next_partition (timeout=..., this=0x601002203000) at row_cache.cc:663
#17 scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}::operator()() const (__closure=<optimized out>) at row_cache.cc:689
#18 0x0000000001002e2d in seastar::internal::do_until_state<scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#1}, scanning_and_populating_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}>::run_and_dispose() (this=0x60100282a5a0) at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/task.hh:33
#19 0x0000000004149712 in seastar::reactor::run_tasks (this=this@entry=0x601000020000, tq=...) at ../../src/core/reactor.cc:3581
#20 0x000000000414990f in seastar::reactor::run_some_tasks (this=this@entry=0x601000020000) at ../../src/core/reactor.cc:4006
#21 0x000000000421ecc6 in seastar::reactor::run_some_tasks (this=0x601000020000) at ../../src/core/reactor.cc:4149
#22 seastar::reactor::run (this=0x601000020000) at ../../src/core/reactor.cc:4149
#23 0x0000000004245cfc in seastar::smp::<lambda()>::operator()(void) const (__closure=<optimized out>) at ../../include/seastar/core/reactor.hh:814
#24 0x00000000041192ee in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/8/bits/std_function.h:682
#25 seastar::posix_thread::start_routine (arg=<optimized out>) at ../../src/core/posix.cc:52
#26 0x00007f7d3456e58e in start_thread () from /lib64/libpthread.so.0
#27 0x00007f7d33f446a3 in clone () from /lib64/libc.so.6

Binary: http://jenkins.cloudius-systems.com:8080/view/master/job/scylla-master/job/dtest-release/59/artifact/logs-release.2/scylla
Core: godzilla:/jenkins/corefiles/dtest-release-59/scylla.6214.core.lz4

scylla version f4f5602
seastar version scylladb/seastar@463d24e

@tgrabiec tgrabiec added the bug label Mar 20, 2019

avikivity added a commit that referenced this issue Mar 21, 2019

row_cache: Fix abort in cache populating read concurrent with memtable flush

When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:

     utils::phased_barrier::phase_type creation_phase() const {
         assert(_reader);
         return _reader_creation_phase;
     }

That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but it finds
the new range to be empty:

         if (!_reader || _reader_creation_phase != phase) {
            if (_last_key) {
                auto cmp = dht::ring_position_comparator(*_cache._schema);
                auto&& new_range = _range.split_after(*_last_key, cmp);
                if (!new_range) {
                    _reader = {};
                    return make_ready_future<mutation_fragment_opt>();
                }

Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.

If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called for this case, it just makes the value more appropriate for the
new semantics.

Tests:

  - unit.row_cache_test (debug)

Fixes #4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>

avikivity added a commit that referenced this issue Mar 21, 2019

row_cache: Fix abort in cache populating read concurrent with memtable flush

duarten added a commit that referenced this issue Mar 22, 2019

row_cache: Fix abort in cache populating read concurrent with memtable flush

(cherry picked from commit 69775c5)

duarten added a commit that referenced this issue Mar 22, 2019

row_cache: Fix abort in cache populating read concurrent with memtable flush

(cherry picked from commit 69775c5)