
Node crashlooping in continuous_data_consumer. #6486

Closed

alienth opened this issue May 20, 2020 · 64 comments

@alienth

alienth commented May 20, 2020

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): 3.3.1, 6f939ff
Cluster size: 9
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu

Encountered the following crash on Scylla. Not sure if it matters, but this ring is primarily serving Thrift requests.

May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]: scylla: sstables/consumer.hh:469: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]: Aborting on shard 5.
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]: Backtrace:
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a5ed82
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a058fb
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a05bf5
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a05c90
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x00007fcc5087fb1f
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x000000000003c624
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x00000000000258d8
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x00000000000257a8
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x0000000000034a65
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000e5502f
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000ea9345
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000e55139
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000eaac7c
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000000e5a3e4
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a01ba1
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a01daf
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a42575
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x0000000002a511cc
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   0x00000000029dc43d
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libpthread.so.0+0x00000000000094e1
May 20 10:04:10 s-0cab66fc8c7c3048e scylla[13838]:   /opt/scylladb/libreloc/libc.so.6+0x0000000000101692

No core dump was produced.

When restarting the node, it crashes again, so guessing there is some sstable triggering this.

Seems like this might have similar symptoms to #4315?

alienth changed the title from "Crash in continuous_data_consumer." to "Node crashlooping in continuous_data_consumer." on May 20, 2020
@alienth
Author

alienth commented May 20, 2020

Just tested: Wiping out the data in the node results in the crashloop ceasing. Going to try and binary search to find the culprit.

@alienth
Author

alienth commented May 20, 2020

Narrowed the culprit CF down via binary search. Very curiously, it's a CF with only one small (1MB) sstable. That sstable dumps just fine with sstabledump, so nothing seems obviously corrupt about it.

Going to copy that single sstable to a one-node ring and see if the breakage follows.

@alienth
Author

alienth commented May 20, 2020

Curiously, scylla logged that it dumped its core, yet /var/lib/scylla/coredump is empty 🤔 .

May 19 23:17:57 s-0cab66fc8c7c3048e systemd-coredump[8849]: Process 8809 (scylla) of user 113 dumped core.

@slivne
Contributor

slivne commented May 20, 2020

@alienth

can you please share cat /proc/sys/kernel/core_pattern

@alienth
Author

alienth commented May 20, 2020

@slivne

$ cat /proc/sys/kernel/core_pattern
|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e

Verified that /var/lib/systemd/coredump is also empty.

@slivne
Contributor

slivne commented May 20, 2020

Let's try to force the setup to follow the Scylla preference:

 sudo scylla_coredump_setup --dump-to-raiddir

and then run

ls -lstra /var/lib/systemd/coredump

@alienth
Author

alienth commented May 20, 2020

So, I copied the CF that was causing the trouble to a one-node ring, and wasn't able to repro there. Going to try and repro again elsewhere. I think the reason for the missing coredump is insufficient space to store it on the root volume.

@slivne
Contributor

slivne commented May 20, 2020

@alienth the setup info for Scylla is what I listed above (#6486 (comment))

@alienth
Author

alienth commented May 20, 2020

I'm able to repro again, but still no coredump (even when running the coredump setup that symlinks). Going to just force the kernel to coredump to non-systemd.

@slivne
Contributor

slivne commented May 20, 2020

So it's clear, once it is set up correctly you should see something like:

shlomi@sl-ubuntu-1:~$ sudo scylla_coredump_setup --dump-to-raiddir
Reading package lists... Done
Building dependency tree       
Reading state information... Done
systemd-coredump is already the newest version (229-4ubuntu21.28).
The following package was automatically installed and is no longer required:
  grub-pc-bin
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.
kernel.core_pattern = |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e
shlomi@sl-ubuntu-1:~$ 
shlomi@sl-ubuntu-1:~$ cat /proc/sys/kernel/core_pattern
|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e
shlomi@sl-ubuntu-1:~$ ls -lstra /lib/systemd/systemd-coredump
116 -rwxr-xr-x 1 root root 117448 Apr 28 13:03 /lib/systemd/systemd-coredump
shlomi@sl-ubuntu-1:~$ kill -ABRT $$
Aborted (core dumped)
shlomi@sl-ubuntu-1:~$ ls -lstra /var//lib/scylla/coredump/
total 4376
   4 drwxr-xr-x  7 scylla scylla    4096 May 20 19:08 ..
2180 -rw-r-----+ 1 root   root   2228224 May 20 19:12 core.bash.1040.134f829743704cc98a9eeb40c6843365.8577.1590001973000000000000
2188 -rw-r-----+ 1 root   root   2236416 May 20 19:26 core.bash.1040.134f829743704cc98a9eeb40c6843365.8459.1590002791000000000000
   4 drwxr-xr-x  2 scylla scylla    4096 May 20 19:26 .

(the last part validates that cores are created correctly)

@alienth
Author

alienth commented May 20, 2020

Yea, everything validates yet I still can't get any cores generated. I also tried

echo '/var/lib/scylla/coredump/core.%e.%p.%h.%t' > /proc/sys/kernel/core_pattern

to no avail. Something is hosed with coredumps on this box, I s'pose? Continuing to dig.

@alienth
Author

alienth commented May 20, 2020

Ah, the default system ulimit on core dumps was hampering me :P. Getting the coredump now.
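
For reference, a small sketch of checking and raising that limit from a process (an illustration only, assuming Linux and Python 3; the scylla service itself would need LimitCORE= in its systemd unit instead):

import resource

# Query the current core-file size limit; a soft limit of 0 means no core
# files are written at all.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("RLIMIT_CORE soft/hard:", soft, hard)

# Raise the soft limit up to whatever the hard limit allows for this process.
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))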

@slivne
Contributor

slivne commented May 20, 2020

@alienth great (one small step ...)

If you can, please also provide the schema from that specific node, and the specific sstables if you can share them.

to upload the info please follow https://docs.scylladb.com/troubleshooting/report_scylla_problem/#send-files-to-scylladb-support

@alienth
Author

alienth commented May 22, 2020

A new node started crashing today on the same CF, so it seems like this is definitely some data corruption bug that is somehow unique to that CF. It's a thrift-built CF, so no special schema. Unfortunately I can't upload the sstables without legal clearance, so I'm going to try to troubleshoot myself.

I've resolved the backtrace, which can be found here: https://gist.github.com/alienth/2f3d0d486255f432ae2778a2cd510028

@slivne
Contributor

slivne commented May 24, 2020

backtrace

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:721
 (inlined by) print_with_backtrace at ./build/release/seastar/./seastar/src/core/reactor.cc:742
seastar::print_with_backtrace(char const*) at ./build/release/seastar/./seastar/src/core/reactor.cc:749
seastar::install_oneshot_signal_handler<6, &seastar::sigabrt_action>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at ./build/release/seastar/./seastar/src/core/reactor.cc:3424
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:3406
 (inlined by) _FUN at ./build/release/seastar/./seastar/src/core/reactor.cc:3402
?? ??:0
/opt/scylladb/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4cedb55f04630f72406978c5711657a2f7bae224, for GNU/Linux 3.2.0, stripped

gsignal at ??:?
abort at ??:?
?? ??:0
__assert_fail at ??:?
data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::fast_forward_to(unsigned long, unsigned long) at ./sstables/consumer.hh:469
 (inlined by) data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::skip_to(unsigned long) at ./sstables/consumer.hh:477
 (inlined by) sstables::data_consume_context<sstables::data_consume_rows_context_m>::skip_to(sstables::indexable_element, unsigned long) at ./sstables/data_consume_context.hh:116
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const at ./sstables/partition.cc:362
 (inlined by) seastar::apply_helper<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple) at ././seastar/include/seastar/core/apply.hh:36
 (inlined by) auto seastar::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at ././seastar/include/seastar/core/apply.hh:44
 (inlined by) seastar::future<> seastar::futurize<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:1645
 (inlined by) seastar::future<> seastar::future<>::then_impl<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1178
 (inlined by) seastar::future<> seastar::future<>::then<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1159
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}::operator()() const at ./sstables/partition.cc:363
 (inlined by) seastar::apply_helper<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple<>) at ././seastar/include/seastar/core/apply.hh:36
 (inlined by) auto seastar::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&, std::tuple<>&&) at ././seastar/include/seastar/core/apply.hh:44
 (inlined by) seastar::future<> seastar::futurize<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:1645
seastar::future<> seastar::future<>::then_impl<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&) at ././seastar/include/seastar/core/future.hh:1178
 (inlined by) seastar::future<> seastar::future<>::then<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>)::{lambda()#2}&&) at ././seastar/include/seastar/core/future.hh:1159
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::advance_context(std::optional<position_in_partition_view>) at ./sstables/partition.cc:364
 (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./sstables/partition.cc:464
flat_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./flat_mutation_reader.hh:473
 (inlined by) operator() at ./mutation_reader.cc:562
 (inlined by) apply<mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)>, mutation_reader_merger::reader_and_last_fragment_kind&> at ././seastar/include/seastar/core/future.hh:1655
 (inlined by) futurize_apply<mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)>, mutation_reader_merger::reader_and_last_fragment_kind&> at ././seastar/include/seastar/core/future.hh:1733
 (inlined by) parallel_for_each<mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)> > at ././seastar/include/seastar/core/future-util.hh:156
 (inlined by) parallel_for_each<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4>&, mutation_reader_merger::fast_forward_to(position_range, seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)> > at ././seastar/include/seastar/core/future-util.hh:196
 (inlined by) mutation_reader_merger::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:563
mutation_fragment_merger<mutation_reader_merger>::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:132
 (inlined by) combined_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:623
flat_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./flat_mutation_reader.hh:473
 (inlined by) restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}::operator()(flat_mutation_reader&) at ./mutation_reader.cc:756
 (inlined by) decltype(auto) restricting_mutation_reader::with_reader<restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}>(restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:702
 (inlined by) restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./mutation_reader.cc:757
flat_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./flat_mutation_reader.hh:473
 (inlined by) cache::cache_flat_mutation_reader::do_fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ./cache_flat_mutation_reader.hh:270
cache::cache_flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const at ./cache_flat_mutation_reader.hh:237
 (inlined by) seastar::internal::do_until_state<cache::cache_flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}, cache::cache_flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}>::run_and_dispose() at ././seastar/include/seastar/core/future-util.hh:463
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2118
seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2541
seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2524
 (inlined by) seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2686
seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::{lambda()#3}::operator()() const at ./build/release/seastar/./seastar/src/core/reactor.cc:3826
std::function<void ()>::operator()() const at /usr/include/c++/9/bits/std_function.h:690
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:52
/opt/scylladb/libreloc/libpthread.so.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=524a075152f231533a5006f17539a78aab4cea0d, for GNU/Linux 3.2.0, stripped

__libpthread_freeres at ??:?
clone at ??:?

@slivne
Contributor

slivne commented May 24, 2020

@tgrabiec can this be related to your fix for #6040 ?

@avikivity
Member

@alienth please check if you have any files named *Index.db that are 4GB in size or larger.

@tgrabiec
Contributor

tgrabiec commented May 25, 2020

The user says he gets this with a small (1MB) sstable, so it's not #6040.

@alienth What kind of read query are you issuing which triggers this?

@alienth
Author

alienth commented May 25, 2020

@tgrabiec The node crashes as soon as it starts and I think before it's accepting read queries (haven't confirmed that, though).

The queries that hit this CF are just standard thrift get calls fetching down a single partition. In fact, the entire CF is only made up of one partition (an anti-pattern, I know).

@alienth
Author

alienth commented May 26, 2020

A few more details:

The CF contains a single partition with about 80k columns, all of which have TTLs. As such, the row is constantly rotating out old columns and having new ones written.

The single partition is read a couple hundred times a second, with each get specifying up to 100 columns.

Row cache is enabled.

The CF was working fine for a few days before the crashing began. Removing the sstables on the crashing node and restarting it fixed it. A day later another node started doing the same. When that occurred I just failed back to C*, which has the same CF in place.

@tgrabiec
Contributor

@alienth I understood that the crash is during a read from one of your app's tables. Scylla doesn't read from user tables on its own during boot so it must be some external query triggering this.

What's the schema of the CF involved in the crash?

To get to the root cause we may need the core dump.

@alienth
Author

alienth commented May 26, 2020

Hi @tgrabiec ,

It's just a thrift-built table, so the schema is:

CREATE TABLE ks."ActiveForFrontPage" (
    key ascii,
    column1 text,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';

To get to the root cause we may need the core dump.

Yeah, kinda what I suspected. I'm trying to pry open the core dump myself but hitting a separate tooling issue. Hoping once I can get that resolved I can pick the core dump apart to find the trouble.

(gdb) source /opt/scylladb/scripts/scylla-gdb.py
Traceback (most recent call last):
  File "/opt/scylladb/scripts/scylla-gdb.py", line 43, in <module>
    class intrusive_list:
  File "/opt/scylladb/scripts/scylla-gdb.py", line 44, in intrusive_list
    size_t = gdb.lookup_type('size_t')
gdb.error: No type named size_t.

Unfortunately I can't share the core dump without getting legal clearance.

@tgrabiec
Contributor

tgrabiec commented May 26, 2020 via email

@alienth
Author

alienth commented May 26, 2020

This usually means that gdb didn't load debug info properly. Maybe you
don't have the debug info package installed?

Yeah, it's definitely installed. Likely something mucked up in my environment elsewhere. I'll update once I figure it out.

@tgrabiec
Contributor

@alienth Note that relocatable binaries may need special treatment to debug: https://github.com/scylladb/scylla/blob/master/docs/debugging.md#relocatable-binaries

@alienth
Author

alienth commented May 27, 2020

I think it might be related to the binutils mess on Ubuntu:

unable to initialize decompress status for section .debug_info

Nearly have it figured out, I think.

@denesb
Contributor

denesb commented May 27, 2020

Yeah, kinda what I suspected. I'm trying to pry open the core dump myself but hitting a separate tooling issue. Hoping once I can get that resolved I can pick the core dump apart to find the trouble.

(gdb) source /opt/scylladb/scripts/scylla-gdb.py
Traceback (most recent call last):
  File "/opt/scylladb/scripts/scylla-gdb.py", line 43, in <module>
    class intrusive_list:
  File "/opt/scylladb/scripts/scylla-gdb.py", line 44, in intrusive_list
    size_t = gdb.lookup_type('size_t')
gdb.error: No type named size_t.

Unfortunately I can't share the core dump without getting legal clearance.

This could be https://github.com/scylladb/scylla/blob/master/docs/debugging.md#namespace-issues.

@avikivity
Member

@tgrabiec any idea?

This is the first (and only) partition, so it can't be related to mis-clearing when moving from one partition to the next.

@tgrabiec
Contributor

@avikivity promoted index length is varint so 82 7b is actually 0x27b. It includes promoted index header so I think it's correct. The index file has length 0x281 and promoted index header starts at offset 6.
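
For reference, a minimal sketch of that unsigned-vint decoding (an illustration of the scheme as described: the count of leading 1-bits in the first byte gives the number of extra bytes, and the remaining bits of the first byte supply the most significant bits of the value):

def decode_unsigned_vint(data):
    # Leading 1-bits of the first byte say how many extra bytes follow.
    first = data[0]
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1
    # The low bits left in the first byte are the high bits of the value.
    value = first & (0xFF >> extra)
    for b in data[1:1 + extra]:
        value = (value << 8) | b
    return value

# 82 7b: one extra byte, high bits 0x02, so the encoded length is 0x27b.
print(hex(decode_unsigned_vint(bytes([0x82, 0x7b]))))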

@avikivity
Member

Ah, I thought varint was little endian.

@alienth
Author

alienth commented Jul 23, 2020

Mm, so index is looking fine, then?

If that's the case, any recommended path I can take on the GDB side of things with the coredump?

@avikivity
Member

You can instrument the code to print end and _stream_position, and we can work backwards from there.

@avikivity
Member

That is if you're comfortable with modifying and building the code. With gdb+core you can walk up the frame and look for variables the optimizer forgot to destroy.

@alienth
Author

alienth commented Aug 1, 2020

I was able to repro this again after completely truncating all data and rebuilding it. Going to modify the code to try and diagnose the issue.

@alienth
Author

alienth commented Aug 3, 2020

Got some print output. end is far before begin:

scylla[4255]: end is before stream position! begin: 589975 end: 131117

@alienth
Author

alienth commented Aug 3, 2020

Tracing logs from the shard which crashed:

[shard 5] sstable - mp_row_consumer_m 0x6050087f2088: consume_row_marker_and_tombstone({row_marker: }, {tombstone: none}, {tombstone: none}), key={position: clustered,ckp{000974335f693070366d39},0}
[shard 5] sstable - mp_row_consumer_m 0x6050087f2088: consume_column(id={0}, path=, value=5b313539363438323836312e305d, ts=1596482861355781, ttl=0, del_time=9223372036854775807, deleted=0)
[shard 5] sstable - index 0x605000b79900: index_reader for /var/lib/scylla/data/test/test-ba318ff0953f11eaaafe000000000000/mc-4429-big-Data.db
[shard 5] sstable - index 0x605000b79900 bound 0x605000b79950: advance_to({8213365047359667313, pk{000131}, w=0}), _previous_summary_idx=0, _current_summary_idx=0
[shard 5] sstable - index 0x605000b79900: first entry
[shard 5] sstable - index 0x605000b5f680: index_reader for /var/lib/scylla/data/test/test-ba318ff0953f11eaaafe000000000000/mc-4437-big-Data.db
[shard 5] sstable - index 0x605000b5f680 bound 0x605000b5f6d0: advance_to({8213365047359667313, pk{000131}, w=0}), _previous_summary_idx=0, _current_summary_idx=0
[shard 5] sstable - index 0x605000b5f680: first entry
[shard 5] sstable - index 0x605000b5f400: index_reader for /var/lib/scylla/data/test/test-ba318ff0953f11eaaafe000000000000/mc-4445-big-Data.db
[shard 5] sstable - index 0x605000b5f400 bound 0x605000b5f450: advance_to({8213365047359667313, pk{000131}, w=0}), _previous_summary_idx=0, _current_summary_idx=0
[shard 5] sstable - index 0x605000b5f400: first entry
[shard 5] sstable - index 0x605000b79900: advance_to_page(0), bound 0x605000b79950
[shard 5] sstable - index 0x605000b5f680: advance_to_page(0), bound 0x605000b5f6d0
[shard 5] sstable - index 0x605000b5f400: advance_to_page(0), bound 0x605000b5f450
[shard 5] sstable - index 0x605000b5f400 bound 0x605000b5f450: page:
[shard 5] sstable -   {key: pk{000131}, token:8213365047359667313} -> 0
[shard 5] sstable - index 0x605000b79900 bound 0x605000b79950: page:
[shard 5] sstable -   {key: pk{000131}, token:8213365047359667313} -> 0
[shard 5] sstable - index 0x605000b79900: advance_upper_past({position: clustered,ckp{0006323261627a70},1})
[shard 5] sstable - index 0x605000b79900: advance_upper_past({position: clustered,ckp{0006323261627a70},1})
[shard 5] sstable - index 0x605000b79900 upper bound: skipped to cell, _current_pi_idx=2, _data_file_position=131117
[shard 5] sstable - reader 0x60500968a000: reading partition
[shard 5] sstable - reader 0x60500968a000: read from index
[shard 5] sstable - mp_row_consumer_m 0x60500968a088: setup_for_partition(pk{000131})
[shard 5] sstable - index 0x605000b79900: advance_to({position: clustered,ckp{00053272703330},-1}), current data_file_pos=0
[shard 5] sstable - index 0x605000b79900: lower bound skipped to cell, _current_pi_idx=10, _data_file_position=589975
[shard 5] sstable - data_consume_rows_context 0x605009139200: skip_to(0 -> 589975, el=1)
[shard 5] sstable - mp_row_consumer_m 0x60500968a088: reset(1)
end is before stream position! begin: 589975 end: 131117
scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
Aborting on shard 5.
Backtrace:
  0x00000000019cde14
  0x0000000001993ffb
  0x0000000001994147
  0x00000000019941d4
  0x00007f4301debb1f
  /opt/scylladb/libreloc/libc.so.6+0x000000000003c624
  /opt/scylladb/libreloc/libc.so.6+0x00000000000258d8
  /opt/scylladb/libreloc/libc.so.6+0x00000000000257a8
  /opt/scylladb/libreloc/libc.so.6+0x0000000000034a65
  0x0000000000884730
  0x00000000008abcdd
  0x00000000004f1712
  0x00000000008a450d
  0x00000000004f1712
  0x00000000008a884f
  0x00000000008aace7
  0x00000000008aafab
  0x000000000087fc3c
  0x00000000004e839f
  0x0000000001991369
  0x000000000199d40b
  0x00000000019bc206
  0x00000000019c45bf
  0x00000000019c46c3
  0x000000000197ece1
  /opt/scylladb/libreloc/libpthread.so.0+0x00000000000094e1
  /opt/scylladb/libreloc/libc.so.6+0x0000000000101692

@alienth
Author

alienth commented Aug 4, 2020

Interestingly, on a subsequent restart the sstable got compacted down, and it crashed again on the newly compacted sstable. Clustering position is different, too:

scylla:  [shard 5] sstable - index 0x605009152a00 bound 0x605009152a50: page:
scylla:  [shard 5] sstable -   {key: pk{000131}, token:8213365047359667313} -> 0
scylla:  [shard 5] sstable - index 0x605009152a00: advance_upper_past({position: clustered,ckp{00053276657a65},1})
scylla:  [shard 5] sstable - index 0x605009152a00: advance_upper_past({position: clustered,ckp{00053276657a65},1})
scylla:  [shard 5] sstable - index 0x605009152a00 upper bound: skipped to cell, _current_pi_idx=15, _data_file_position=983252
scylla:  [shard 5] sstable - reader 0x605009486c00: reading partition
scylla:  [shard 5] sstable - reader 0x605009486c00: read from index
scylla:  [shard 5] sstable - mp_row_consumer_m 0x605009486c88: setup_for_partition(pk{000131})
scylla:  [shard 5] sstable - index 0x605009152a00: advance_to({position: clustered,ckp{0005743464306d},-1}), current data_file_pos=0
scylla:  [shard 5] sstable - index 0x605009152a00: lower bound skipped to cell, _current_pi_idx=28, _data_file_position=1769865
scylla:  [shard 5] sstable - data_consume_rows_context 0x605009136a00: skip_to(0 -> 1769865, el=1)
scylla[9151]: end is before stream position! begin: 1769865 end: 983252
scylla[9151]: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
scylla[9151]: Aborting on shard 5.

@alienth
Author

alienth commented Aug 4, 2020

I tcpdump'd and flipped on reads to this ring from a single source, and was able to repro the crash. However, after restarting I was able to repeat the query which triggered the crash without a problem... so it may be some bug which only arises after some number of reads?

The read itself is uninteresting - a get_slice for the only partition with a list of 85 columns, of which ~60 are still alive.

@alienth
Author

alienth commented Aug 4, 2020

Based on the trace I'm thinking the promoted index is pointing to a location in the file that doesn't exist? Going to try parsing the index and data file more carefully to see if that's the case.

@slivne
Contributor

slivne commented Aug 4, 2020

@alienth - if the sstable doesn't have a lot of data, can you try to overwrite the data with fake data, check that the compacted sstable still crashes, and share the sstable with the faked data with us?

@alienth
Author

alienth commented Aug 4, 2020

Attached:

sstable.tar.gz

I'm still hacking away at the files to see if I can find the issue. Are these sstables binary-compatible with the same mc files that apache cassandra provides? If so, I might pick em apart using the deserializers in that code, which I'm far more familiar with.

Edit: Also, I can confirm that scylla does crash with that sstable.

scylla[7741]: end is before stream position! begin: 1507548 end: 1376461
scylla[7741]: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>
::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.

@avikivity
Member

Yes, those sstables are fully compatible. Scylla adds a -Scylla.db component, but Cassandra ignores it.

@avikivity
Member

/cc @tgrabiec please take a look

@tgrabiec
Contributor

tgrabiec commented Aug 4, 2020

The assert fails because the position in the data file for the upper bound of the read is smaller than the position for the lower bound of the read.

This is because the read uses lower bound which is larger than the upper bound. In one of the traces the column name range is ["t4d0m", "2veze"].

Are you maybe passing the column names to get_slice in unsorted order?

Looking at slice_pred_to_read_cmd(), we don't sort ranges in the thrift server, which could explain what you see.

It doesn't explain why you don't get the same crash for the same query after restarting the server.

@alienth
Author

alienth commented Aug 4, 2020

@tgrabiec

Are you maybe passing the column names to get_slice in unsorted order?

AH, yeah, we don't sort columns on the call, and the library we're using also does not:

https://github.com/pycassa/pycassa/blob/b314d5fa4e6ba1219850f50d767aa0be5ed5ca5f/pycassa/columnfamily.py#L389

Knowing this I'll try to intentionally request unsorted columns via thrift to see if I can get it to crash.

Something interesting about this is that when blowing away the data and restarting, things work fine for a while before the crashes resume. Unsure why that'd be 🤔 .
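
A minimal sketch of such an unsorted request against the schema above (an illustration only, assuming the pycassa thrift client; the server address is a placeholder, and the key and column names are taken from the trace):

import pycassa

pool = pycassa.ConnectionPool('ks', server_list=['127.0.0.1:9160'])
cf = pycassa.ColumnFamily(pool, 'ActiveForFrontPage')

# pycassa forwards this list as-is into SlicePredicate.column_names, so the
# clustering ranges derived on the server arrive out of order
# ('t4d0m' sorts after '2veze').
print(cf.get('1', columns=['t4d0m', '2veze']))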

@alienth
Author

alienth commented Aug 4, 2020

It took several tries for whatever reason, but I was able to repro the crash by issuing a get_slice with unsorted columns.

Edit: I'm doing a CL_ONE call on a ring with an RF of 3 - looks like it only crashes when the query happens to go to one specific node. That seems rather odd - I'm unclear why the other nodes would be able to handle this query successfully?

@tgrabiec
Contributor

tgrabiec commented Aug 5, 2020

@alienth Can you paste the log with a trace-level enabled on the sstable logger for a query which passes?

One hypothesis I have is that the nodes which can handle the read have the data fully in cache, which is not as prone to out of order ranges.

avikivity pushed a commit that referenced this issue Aug 8, 2020
The column names in SlicePredicate can be passed in arbitrary order.
We converted them to clustering ranges in read_command preserving the
original order. As a result, the clustering ranges in read command may
appear out of order. This violates the storage engine's assumptions and
leads to undefined behavior.

It was seen manifesting as a SIGSEGV or an abort in sstable reader
when executing a get_slice() thrift verb:

scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.

Fixes #6486.

Tests:

   - added a new dtest to thrift_tests.py which reproduces the problem

Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit bfd129c)
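
Until a release carrying the fix is deployed, a possible client-side mitigation (untested here, and assuming ASCII column names so that client-side string order matches the clustering order of column1) is to sort the requested column names before issuing the read:

import pycassa

pool = pycassa.ConnectionPool('ks', server_list=['127.0.0.1:9160'])
cf = pycassa.ColumnFamily(pool, 'ActiveForFrontPage')

# Sorting the column names keeps the derived clustering ranges in order,
# avoiding the out-of-order ranges that trigger the assert.
wanted = ['t4d0m', '2veze']
print(cf.get('1', columns=sorted(wanted)))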
@avikivity
Member

Backported to 4.0, 4.1, 4.2.

tzach added this to the 4.2 milestone on Aug 20, 2020