
Node crashlooping in continuous_data_consumer. #6486

Closed

alienth opened this issue May 20, 2020 · 64 comments

@alienth

alienth commented May 20, 2020

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): 3.3.1, 6f939ff
Cluster size: 9
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu

Encountered the following crash on Scylla. Not sure if it matters, but this ring is primarily serving Thrift requests.

May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]: scylla: sstables/consumer.hh:469: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]: Aborting on shard 5.
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]: Backtrace:
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a5ed82
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a058fb
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a05bf5
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a05c90
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x00007fcc5087fb1f
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x000000000003c624
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x00000000000258d8
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x00000000000257a8
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x0000000000034a65
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000e5502f
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000ea9345
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000e55139
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000eaac7c
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000e5a3e4
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a01ba1
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a01daf
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a42575
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a511cc
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x00000000029dc43d
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libpthread.so.0+0x00000000000094e1
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x0000000000101692

No core dump was produced.

When restarting the node, it crashes again, so guessing there is some sstable triggering this.

Seems like this might have similar symptoms to #4315?

alienth changed the title from "Crash in continuous_data_consumer." to "Node crashlooping in continuous_data_consumer." on May 20, 2020
@alienth
Author

alienth commented May 20, 2020

Just tested: Wiping out the data in the node results in the crashloop ceasing. Going to try and binary search to find the culprit.

@alienth
Author

alienth commented May 20, 2020

Narrowed the culprit CF down via binary search. Very curiously, it's a CF with only one small (1MB) sstable. That sstable dumps just fine with sstabledump, so nothing seems obviously corrupt about it.

Going to copy that single sstable to a one-node ring and see if the breakage follows.

@alienth
Author

alienth commented May 20, 2020

Curiously, scylla logged that it dumped its core, yet /var/lib/scylla/coredump is empty 🤔 .

May 19 23:17:57 s-0cab66fc8c7c3048e systemd-coredump[8849]: Process 8809 (scylla) of user 113 dumped core.

@slivne
Contributor

slivne commented May 20, 2020

@alienth

can you please share cat /proc/sys/kernel/core_pattern

@alienth
Author

alienth commented May 20, 2020

@slivne

$ cat /proc/sys/kernel/core_pattern
|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e

Verified that /var/lib/systemd/coredump is also empty.

@slivne
Contributor

slivne commented May 20, 2020

Let's try to force the setup to follow the Scylla preference:

 sudo scylla_coredump_setup --dump-to-raiddir

and then run

ls -lstra /var/lib/systemd/coredump

@alienth
Author

alienth commented May 20, 2020

So, I copied the CF that was causing the trouble to a one-node ring, and wasn't able to repro there. Going to try and repro again elsewhere. I think the reason for the missing coredump is insufficient space to store it on the root volume.

@slivne
Contributor

slivne commented May 20, 2020

@alienth the setup info for Scylla is what I listed above (#6486 (comment))

@alienth
Author

alienth commented May 20, 2020

I'm able to repro again, but still no coredump (even when running the coredump setup that symlinks). Going to just force the kernel to coredump to non-systemd.

@slivne
Contributor

slivne commented May 20, 2020

So it's clear, once it is set up correctly you should see something like:

shlomi@sl-ubuntu-1:~$ sudo scylla_coredump_setup --dump-to-raiddir
Reading package lists... Done
Building dependency tree       
Reading state information... Done
systemd-coredump is already the newest version (229-4ubuntu21.28).
The following package was automatically installed and is no longer required:
  grub-pc-bin
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.
kernel.core_pattern = |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e
shlomi@sl-ubuntu-1:~$ 
shlomi@sl-ubuntu-1:~$ cat /proc/sys/kernel/core_pattern
|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e
shlomi@sl-ubuntu-1:~$ ls -lstra /lib/systemd/systemd-coredump
116 -rwxr-xr-x 1 root root 117448 Apr 28 13:03 /lib/systemd/systemd-coredump
shlomi@sl-ubuntu-1:~$ kill -ABRT $$
Aborted (core dumped)
shlomi@sl-ubuntu-1:~$ ls -lstra /var//lib/scylla/coredump/
total 4376
   4 drwxr-xr-x  7 scylla scylla    4096 May 20 19:08 ..
2180 -rw-r-----+ 1 root   root   2228224 May 20 19:12 core.bash.1040.134f829743704cc98a9eeb40c6843365.8577.1590001973000000000000
2188 -rw-r-----+ 1 root   root   2236416 May 20 19:26 core.bash.1040.134f829743704cc98a9eeb40c6843365.8459.1590002791000000000000
   4 drwxr-xr-x  2 scylla scylla    4096 May 20 19:26 .

(the last part validates that cores are created correctly)

@alienth
Author

alienth commented May 20, 2020

Yea, everything validates yet I still can't get any cores generated. I also tried

echo '/var/lib/scylla/coredump/core.%e.%p.%h.%t' > /proc/sys/kernel/core_pattern

to no avail. Something is hosed with coredumps on this box, I s'pose? Continuing to dig.

@alienth
Author

alienth commented May 20, 2020

Ah, the default system ulimit on core dumps was hampering me :P. Getting the coredump now.
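
For reference, a small sketch of checking and raising that limit from a process (an illustration only, assuming Linux and Python 3; the scylla service itself would need LimitCORE= in its systemd unit instead):

import resource

# Query the current core-file size limit; a soft limit of 0 means no core
# files are written at all.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("RLIMIT_CORE soft/hard:", soft, hard)

# Raise the soft limit up to whatever the hard limit allows for this process.
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))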

@slivne
Contributor

slivne commented May 20, 2020

@alienth great (one small step ...)

If you can, please also provide the schema from that specific node, and the specific sstables if you can share them.

to upload the info please follow https://docs.scylladb.com/troubleshooting/report_scylla_problem/#send-files-to-scylladb-support

@alienth
Author

alienth commented May 22, 2020

A new node started crashing today on the same CF, so it seems like this is definitely some data corruption bug that is somehow unique to that CF. It's a thrift-built CF, so no special schema. Unfortunately I can't upload the sstables without legal clearance, so I'm going to try to troubleshoot myself.

I've resolved the backtrace, which can be found here: https://gist.github.com/alienth/2f3d0d486255f432ae2778a2cd510028

@slivne
Contributor

slivne commented May 24, 2020

backtrace

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:721
 (inlined by) print_with_backtrace at ./build/release/seastar/./seastar/src/core/reactor.cc:742
seastar::print_with_backtrace(char const*) at ./build/release/seastar/./seastar/src/core/reactor.cc:749
seastar::install_oneshot_signal_handler<6, &seastar::sigabrt_action>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at ./build/release/seastar/./seastar/src/core/reactor.cc:3424
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:3406
 (inlined by) _FUN at ./build/release/seastar/./seastar/src/core/reactor.cc:3402
?? ??:0
/opt/scylladb/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4cedb55f04630f72406978c5711657a2f7bae224, for GNU/Linux 3.2.0, stripped

gsignal at ??:?
abort at ??:?
?? ??:0
__assert_fail at ??:?
data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::fast_forward_to(unsigned long, unsigned long) at ./sstables/consumer.hh:469
 (inlined by) data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::skip_to(unsigned long) at ./sstables/consumer.hh:477
 (inlined by) sstables::data_consume_context<sstables::data_consume_rows_context_m>::skip_to(sstables::indexable_element, unsigned long) at ./sstables/data_consume_context.hh:116
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const at ./sstables/partition.cc:362
 (inlined by) seastar::apply_helper<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple) at ././seastar/include/seastar/core/apply.hh:36
 (inlined by) auto seastar::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at ././seastar/include/seastar/core/apply.hh:44
 (inlined by) seastar::future<> seastar::futurize<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:1645
 (inlined by) seastar::future<> seastar::future<>::then_impl<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1178
 (inlined by) seastar::future<> seastar::future<>::then<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1159
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const at ./sstables/partition.cc:363
 (inlined by) seastar::apply_helper<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple<>) at ././seastar/include/seastar/core/apply.hh:36
 (inlined by) auto seastar::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&, std::tuple<>&&) at ././seastar/include/seastar/core/apply.hh:44
 (inlined by) seastar::future<> seastar::futurize<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:1645
seastar::future<> seastar::future<>::then_impl<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&) at ././seastar/include/seastar/core/future.hh:1178
 (inlined by) seastar::future<> seastar::future<>::then<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&) at ././seastar/include/seastar/core/future.hh:1159
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>) at ./sstables/partition.cc:364
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./sstables/partition.cc:464
flat_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./flat_mutation_reader.hh:473
 (inlined by) operator() at ./mutation_reader.cc:562
 (inlined by) apply<mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)>, mutation_reader_merger::reader_and_last_fragment_kind&> at ././seastar/include/seastar/core/future.hh:1655
 (inlined by) futurize_apply<mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)>, mutation_reader_merger::reader_and_last_fragment_kind&> at ././seastar/include/seastar/core/future.hh:1733
 (inlined by) parallel_for_each<mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)> > at ././seastar/include/seastar/core/future-util.hh:156
 (inlined by) parallel_for_each<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4>&, mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)> > at ././seastar/include/seastar/core/future-util.hh:196
 (inlined by) mutation_reader_merger::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:563
mutation_fragment_merger<mutation_reader_merger>::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:132
 (inlined by) combined_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:623
flat_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./flat_mutation_reader.hh:473
 (inlined by) restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}::operator()(flat_mutation_reader&) at ./mutation_reader.cc:756
 (inlined by) decltype(auto) restricting_mutation_reader::with_reader<restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}>(restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:702
 (inlined by) restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:757
flat_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./flat_mutation_reader.hh:473
 (inlined by) cache::cache_flat_mutation_reader::do_fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./cache_flat_mutation_reader.hh:270
cache::cache_flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const at ./cache_flat_mutation_reader.hh:237
 (inlined by) seastar::internal::do_until_state<cache::cache_flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}, cache::cache_flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}>::run_and_dispose() at ././seastar/include/seastar/core/future-util.hh:463
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2118
seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2541
seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2524
 (inlined by) seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2686
seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::{lambda()#3}::operator()() const at ./build/release/seastar/./seastar/src/core/reactor.cc:3826
std::function<void ()>::operator()() const at /usr/include/c++/9/bits/std_function.h:690
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:52
/opt/scylladb/libreloc/libpthread.so.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=524a075152f231533a5006f17539a78aab4cea0d, for GNU/Linux 3.2.0, stripped

__libpthread_freeres at ??:?
clone at ??:?

@slivne
Contributor

slivne commented May 24, 2020

@tgrabiec can this be related to your fix for #6040 ?

@avikivity
Member

@alienth please check if you have any files named *Index.db that are 4GB in size or larger.

@tgrabiec
Contributor

tgrabiec commented May 25, 2020

The user says he gets this with a small (1MB) sstable, so it's not #6040.

@alienth What kind of read query are you issuing which triggers this?

@alienth
Author

alienth commented May 25, 2020

@tgrabiec The node crashes as soon as it starts and I think before it's accepting read queries (haven't confirmed that, though).

The queries that hit this CF are just standard thrift get calls fetching down a single partition. In fact, the entire CF is only made up of one partition (an anti-pattern, I know).

@alienth
Author

alienth commented May 26, 2020

A few more details:

The CF contains a single partition with about 80k columns, all of which have TTLs. As such, the row is constantly rotating out old columns and having new ones written.

The single partition is read a couple hundred times a second, with each get specifying up to 100 columns.

Row cache is enabled.

The CF was working fine for a few days before the crashing began. Removing the sstables on the crashing node and restarting it fixed it. A day later another node started doing the same. When that occurred I just failed back to C*, which has the same CF in place.

@tgrabiec
Contributor

@alienth I understood that the crash is during a read from one of your app's tables. Scylla doesn't read from user tables on its own during boot so it must be some external query triggering this.

What's the schema of the CF involved in the crash?

To get to the root cause we may need the core dump.

@alienth
Author

alienth commented May 26, 2020

Hi @tgrabiec ,

It's just a thrift-built table, so the schema is:

CREATE TABLE ks."ActiveForFrontPage" (
    key ascii,
    column1 text,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';

To get to the root cause we may need the core dump.

Yeah, kinda what I suspected. I'm trying to pry open the core dump myself but hitting a separate tooling issue. Hoping once I can get that resolved I can pick the core dump apart to find the trouble.

(gdb) source /opt/scylladb/scripts/scylla-gdb.py
Traceback (most recent call last):
  File "/opt/scylladb/scripts/scylla-gdb.py", line 43, in <module>
    class intrusive_list:
  File "/opt/scylladb/scripts/scylla-gdb.py", line 44, in intrusive_list
    size_t = gdb.lookup_type('size_t')
gdb.error: No type named size_t.

Unfortunately I can't share the core dump without getting legal clearance.

@tgrabiec
Contributor

tgrabiec commented May 26, 2020 via email

@alienth
Author

alienth commented May 26, 2020

This usually means that gdb didn't load debug info properly. Maybe you
don't have the debug info package installed?

Yeah, it's definitely installed. Likely something mucked up in my environment elsewhere. I'll update once I figure it out.

@tgrabiec
Contributor

@alienth Note that relocatable binaries may need special treatment to debug: https://github.com/scylladb/scylla/blob/master/docs/debugging.md#relocatable-binaries

@alienth
Author

alienth commented May 27, 2020

I think it might be related to the binutils mess on Ubuntu:

unable to initialize decompress status for section .debug_info

Nearly have it figured out, I think.

@denesb
Contributor

denesb commented May 27, 2020

Yeah, kinda what I suspected. I'm trying to pry open the core dump myself but hitting a separate tooling issue. Hoping once I can get that resolved I can pick the core dump apart to find the trouble.

(gdb) source /opt/scylladb/scripts/scylla-gdb.py
Traceback (most recent call last):
  File "/opt/scylladb/scripts/scylla-gdb.py", line 43, in <module>
    class intrusive_list:
  File "/opt/scylladb/scripts/scylla-gdb.py", line 44, in intrusive_list
    size_t = gdb.lookup_type('size_t')
gdb.error: No type named size_t.

Unfortunately I can't share the core dump without getting legal clearance.

This could be https://github.com/scylladb/scylla/blob/master/docs/debugging.md#namespace-issues.

@avikivity
Member

@tgrabiec any idea?

This is the first (and only) partition, so it can't be related to mis-clearing when moving from one partition to the next.

@tgrabiec
Contributor

@avikivity promoted index length is varint so 82 7b is actually 0x27b. It includes promoted index header so I think it's correct. The index file has length 0x281 and promoted index header starts at offset 6.
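
For reference, a minimal sketch of that unsigned-vint decoding (an illustration of the scheme as described: the count of leading 1-bits in the first byte gives the number of extra bytes, and the remaining bits of the first byte supply the most significant bits of the value):

def decode_unsigned_vint(data):
    # Leading 1-bits of the first byte say how many extra bytes follow.
    first = data[0]
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1
    # The low bits left in the first byte are the high bits of the value.
    value = first & (0xFF >> extra)
    for b in data[1:1 + extra]:
        value = (value << 8) | b
    return value

# 82 7b: one extra byte, high bits 0x02, so the encoded length is 0x27b.
print(hex(decode_unsigned_vint(bytes([0x82, 0x7b]))))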

@avikivity
Member

Ah, I thought varint was little endian.

@alienth
Author

alienth commented Jul 23, 2020

Mm, so index is looking fine, then?

If that's the case, any recommended path I can take on the GDB side of things with the coredump?

@avikivity
Member

You can instrument the code to print end and _stream_position, and we can work backwards from there.

@avikivity
Member

That is if you're comfortable with modifying and building the code. With gdb+core you can walk up the frame and look for variables the optimizer forgot to destroy.

@alienth
Author

alienth commented Aug 1, 2020

I was able to repro this again after completely truncating all data and rebuilding it. Going to modify the code to try and diagnose the issue.

@alienth
Author

alienth commented Aug 3, 2020

Got some print output. end is far before begin:

scylla[4255]: end is before stream position! begin: 589975 end: 131117

@alienth
Author

alienth commented Aug 3, 2020

Tracing logs from the shard which crashed:

[shard 5] sstable - mp_row_consumer_m 0x6050087f2088: consume_row_marker_and_tombstone({row_marker: }, {tombstone: none}, {tombstone: none}), key={position: clustered,ckp{000974335f693070366d39},0}
[shard 5] sstable - mp_row_consumer_m 0x6050087f2088: consume_column(id={0}, path=, value=5b313539363438323836312e305d, ts=1596482861355781, ttl=0, del_time=9223372036854775807, deleted=0)
[shard 5] sstable - index 0x605000b79900: index_reader for /var/lib/scylla/data/test/test-ba318ff0953f11eaaafe000000000000/mc-4429-big-Data.db
[shard 5] sstable - index 0x605000b79900 bound 0x605000b79950: advance_to({8213365047359667313, pk{000131}, w=0}), _previous_summary_idx=0, _current_summary_idx=0
[shard 5] sstable - index 0x605000b79900: first entry
[shard 5] sstable - index 0x605000b5f680: index_reader for /var/lib/scylla/data/test/test-ba318ff0953f11eaaafe000000000000/mc-4437-big-Data.db
[shard 5] sstable - index 0x605000b5f680 bound 0x605000b5f6d0: advance_to({8213365047359667313, pk{000131}, w=0}), _previous_summary_idx=0, _current_summary_idx=0
[shard 5] sstable - index 0x605000b5f680: first entry
[shard 5] sstable - index 0x605000b5f400: index_reader for /var/lib/scylla/data/test/test-ba318ff0953f11eaaafe000000000000/mc-4445-big-Data.db
[shard 5] sstable - index 0x605000b5f400 bound 0x605000b5f450: advance_to({8213365047359667313, pk{000131}, w=0}), _previous_summary_idx=0, _current_summary_idx=0
[shard 5] sstable - index 0x605000b5f400: first entry
[shard 5] sstable - index 0x605000b79900: advance_to_page(0), bound 0x605000b79950
[shard 5] sstable - index 0x605000b5f680: advance_to_page(0), bound 0x605000b5f6d0
[shard 5] sstable - index 0x605000b5f400: advance_to_page(0), bound 0x605000b5f450
[shard 5] sstable - index 0x605000b5f400 bound 0x605000b5f450: page:
[shard 5] sstable -   {key: pk{000131}, token:8213365047359667313} -> 0
[shard 5] sstable - index 0x605000b79900 bound 0x605000b79950: page:
[shard 5] sstable -   {key: pk{000131}, token:8213365047359667313} -> 0
[shard 5] sstable - index 0x605000b79900: advance_upper_past({position: clustered,ckp{0006323261627a70},1})
[shard 5] sstable - index 0x605000b79900: advance_upper_past({position: clustered,ckp{0006323261627a70},1})
[shard 5] sstable - index 0x605000b79900 upper bound: skipped to cell, _current_pi_idx=2, _data_file_position=131117
[shard 5] sstable - reader 0x60500968a000: reading partition
[shard 5] sstable - reader 0x60500968a000: read from index
[shard 5] sstable - mp_row_consumer_m 0x60500968a088: setup_for_partition(pk{000131})
[shard 5] sstable - index 0x605000b79900: advance_to({position: clustered,ckp{00053272703330},-1}), current data_file_pos=0
[shard 5] sstable - index 0x605000b79900: lower bound skipped to cell, _current_pi_idx=10, _data_file_position=589975
[shard 5] sstable - data_consume_rows_context 0x605009139200: skip_to(0 -> 589975, el=1)
[shard 5] sstable - mp_row_consumer_m 0x60500968a088: reset(1)
end is before stream position! begin: 589975 end: 131117
scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
Aborting on shard 5.
Backtrace:
  0x00000000019cde14
  0x0000000001993ffb
  0x0000000001994147
  0x00000000019941d4
  0x00007f4301debb1f
  /opt/scylladb/libreloc/libc.so.6+0x000000000003c624
  /opt/scylladb/libreloc/libc.so.6+0x00000000000258d8
  /opt/scylladb/libreloc/libc.so.6+0x00000000000257a8
  /opt/scylladb/libreloc/libc.so.6+0x0000000000034a65
  0x0000000000884730
  0x00000000008abcdd
  0x00000000004f1712
  0x00000000008a450d
  0x00000000004f1712
  0x00000000008a884f
  0x00000000008aace7
  0x00000000008aafab
  0x000000000087fc3c
  0x00000000004e839f
  0x0000000001991369
  0x000000000199d40b
  0x00000000019bc206
  0x00000000019c45bf
  0x00000000019c46c3
  0x000000000197ece1
  /opt/scylladb/libreloc/libpthread.so.0+0x00000000000094e1
  /opt/scylladb/libreloc/libc.so.6+0x0000000000101692

@alienth
Author

alienth commented Aug 4, 2020

Interestingly, on a subsequent restart the sstable got compacted down, and it crashed again on the newly compacted sstable. Clustering position is different, too:

scylla:  [shard 5] sstable - index 0x605009152a00 bound 0x605009152a50: page:
scylla:  [shard 5] sstable -   {key: pk{000131}, token:8213365047359667313} -> 0
scylla:  [shard 5] sstable - index 0x605009152a00: advance_upper_past({position: clustered,ckp{00053276657a65},1})
scylla:  [shard 5] sstable - index 0x605009152a00: advance_upper_past({position: clustered,ckp{00053276657a65},1})
scylla:  [shard 5] sstable - index 0x605009152a00 upper bound: skipped to cell, _current_pi_idx=15, _data_file_position=983252
scylla:  [shard 5] sstable - reader 0x605009486c00: reading partition
scylla:  [shard 5] sstable - reader 0x605009486c00: read from index
scylla:  [shard 5] sstable - mp_row_consumer_m 0x605009486c88: setup_for_partition(pk{000131})
scylla:  [shard 5] sstable - index 0x605009152a00: advance_to({position: clustered,ckp{0005743464306d},-1}), current data_file_pos=0
scylla:  [shard 5] sstable - index 0x605009152a00: lower bound skipped to cell, _current_pi_idx=28, _data_file_position=1769865
scylla:  [shard 5] sstable - data_consume_rows_context 0x605009136a00: skip_to(0 -> 1769865, el=1)
scylla[9151]: end is before stream position! begin: 1769865 end: 983252
scylla[9151]: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
scylla[9151]: Aborting on shard 5.

@alienth
Author

alienth commented Aug 4, 2020

I tcpdump'd and flipped on reads to this ring from a single source, and was able to repro the crash. However, after restarting I was able to repeat the query which triggered the crash without a problem... so it may be some bug which only arises after some number of reads?

The read itself is uninteresting - a get_slice for the only partition with a list of 85 columns, of which ~60 are still alive.

@alienth
Author

alienth commented Aug 4, 2020

Based on the trace I'm thinking the promoted index is pointing to a location in the file that doesn't exist? Going to try parsing the index and data file more carefully to see if that's the case.

@slivne
Contributor

slivne commented Aug 4, 2020

@alienth - if the sstable doesn't have a lot of data, can you try to overwrite the data with fake data, check that the compacted sstable still crashes, and share the sstable with the faked data with us?

@alienth
Author

alienth commented Aug 4, 2020

Attached:

sstable.tar.gz

I'm still hacking away at the files to see if I can find the issue. Are these sstables binary-compatible with the same mc files that apache cassandra provides? If so, I might pick em apart using the deserializers in that code, which I'm far more familiar with.

Edit: Also, I can confirm that scylla does crash with that sstable.

scylla[7741]: end is before stream position! begin: 1507548 end: 1376461
scylla[7741]: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>
::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.

@avikivity
Member

Yes, those sstables are fully compatible. Scylla adds a -Scylla.db component, but Cassandra ignores it.

@avikivity
Member

/cc @tgrabiec please take a look

@tgrabiec
Contributor

tgrabiec commented Aug 4, 2020

The assert fails because the position in the data file for the upper bound of the read is smaller than the position for the lower bound of the read.

This is because the read uses lower bound which is larger than the upper bound. In one of the traces the column name range is ["t4d0m", "2veze"].

Are you maybe passing the column names to get_slice in unsorted order?

Looking at slice_pred_to_read_cmd(), we don't sort ranges in the thrift server, which could explain what you see.

It doesn't explain why you don't get the same crash for the same query after restarting the server.

@alienth
Author

alienth commented Aug 4, 2020

@tgrabiec

Are you maybe passing the column names to get_slice in unsorted order?

AH, yeah, we don't sort columns on the call, and the library we're using also does not:

https://github.com/pycassa/pycassa/blob/b314d5fa4e6ba1219850f50d767aa0be5ed5ca5f/pycassa/columnfamily.py#L389

Knowing this I'll try to intentionally request unsorted columns via thrift to see if I can get it to crash.

Something interesting about this is that when blowing away the data and restarting, things work fine for a while before the crashes resume. Unsure why that'd be 🤔 .
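
A minimal sketch of such an unsorted request against the schema above (an illustration only, assuming the pycassa thrift client; the server address is a placeholder, and the key and column names are taken from the trace):

import pycassa

pool = pycassa.ConnectionPool('ks', server_list=['127.0.0.1:9160'])
cf = pycassa.ColumnFamily(pool, 'ActiveForFrontPage')

# pycassa forwards this list as-is into SlicePredicate.column_names, so the
# clustering ranges derived on the server arrive out of order
# ('t4d0m' sorts after '2veze').
print(cf.get('1', columns=['t4d0m', '2veze']))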

@alienth
Author

alienth commented Aug 4, 2020

It took several tries for whatever reason, but I was able to repro the crash by issuing a get_slice with unsorted columns.

Edit: I'm doing a CL_ONE call on a ring with an RF of 3 - looks like it only crashes when the query happens to go to one specific node. That seems rather odd - I'm unclear why the other nodes would be able to handle this query successfully?

@tgrabiec
Contributor

tgrabiec commented Aug 5, 2020

@alienth Can you paste the log with a trace-level enabled on the sstable logger for a query which passes?

One hypothesis I have is that the nodes which can handle the read have the data fully in cache, which is not as prone to out of order ranges.

avikivity pushed a commit that referenced this issue Aug 8, 2020
The column names in SlicePredicate can be passed in arbitrary order.
We converted them to clustering ranges in read_command preserving the
original order. As a result, the clustering ranges in read command may
appear out of order. This violates the storage engine's assumptions and
leads to undefined behavior.

It was seen manifesting as a SIGSEGV or an abort in sstable reader
when executing a get_slice() thrift verb:

scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.

Fixes #6486.

Tests:

   - added a new dtest to thrift_tests.py which reproduces the problem

Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit bfd129c)
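
Until a release carrying the fix is deployed, a possible client-side mitigation (untested here, and assuming ASCII column names so that client-side string order matches the clustering order of column1) is to sort the requested column names before issuing the read:

import pycassa

pool = pycassa.ConnectionPool('ks', server_list=['127.0.0.1:9160'])
cf = pycassa.ColumnFamily(pool, 'ActiveForFrontPage')

# Sorting the column names keeps the derived clustering ranges in order,
# avoiding the out-of-order ranges that trigger the assert.
wanted = ['t4d0m', '2veze']
print(cf.get('1', columns=sorted(wanted)))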
@avikivity
Member

Backported to 4.0, 4.1, 4.2.

tzach added this to the 4.2 milestone on Aug 20, 2020