Coredump during read-only workload #3830

glommer · 2018-10-08T15:10:58Z

I am running the 3.0 branch with patches from avi on top (per-user SLA).
However, it doesn't seem to me that this coredump has anything to do with per-user SLA; it seems to be happening because of the 3.0 code.

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/sylla/scylla/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at /home/sylla/scylla/seastar/core/reactor.cc:410
 (inlined by) print_with_backtrace at /home/sylla/scylla/seastar/core/reactor.cc:431
seastar::print_with_backtrace(char const*) at /home/sylla/scylla/seastar/core/reactor.cc:438
sigabrt_action at /home/sylla/scylla/seastar/core/reactor.cc:4023
 (inlined by) operator() at /home/sylla/scylla/seastar/core/reactor.cc:4005
 (inlined by) _FUN at /home/sylla/scylla/seastar/core/reactor.cc:4001
_L_unlock_13 at funlockfile.c:?
__GI_raise at :?
__GI_abort at :?
__assert_fail_base at :?
__GI___assert_fail at :?
schedule<seastar::future<T>::then_wrapped(Func&&) [with Func = read_context::stop()::<lambda(seastar::shard_id, read_context::dismantling_state)>::<lambda(seastar::future<stopped_foreign_reader>&&)>; Result = seastar::future<>; T = {stopped_foreign_reader}]::<lambda(auto:2&&)> > at /home/sylla/scylla/./seastar/core/future.hh:765
 (inlined by) then_wrapped<read_context::stop()::<lambda(seastar::shard_id, read_context::dismantling_state)>::<lambda(seastar::future<stopped_foreign_reader>&&)> > at /home/sylla/scylla/./seastar/core/future.hh:1003
 (inlined by) operator() at /home/sylla/scylla/multishard_mutation_query.cc:368
 (inlined by) read_context::stop() at /home/sylla/scylla/multishard_mutation_query.cc:380
operator() at /home/sylla/scylla/multishard_mutation_query.cc:686
 (inlined by) apply<do_query_mutations(seastar::distributed<database>&, schema_ptr, const query::read_command&, const partition_range_vector&, tracing::trace_state_ptr, seastar::lowres_clock::time_point, query::result_memory_accounter&&)::<lambda(std::unique_ptr<read_context, std::default_delete<read_context> >&)> mutable::<lambda()> mutable::<lambda()>&> at /home/sylla/scylla/./seastar/core/future.hh:1399
 (inlined by) operator() at /home/sylla/scylla/./seastar/core/future.hh:1082
apply<seastar::future<reconcilable_result>::finally_body<do_query_mutations(seastar::distributed<database>&, schema_ptr, const query::read_command&, const partition_range_vector&, tracing::trace_state_ptr, seastar::lowres_clock::time_point, query::result_memory_accounter&&)::<lambda(std::unique_ptr<read_context, std::default_delete<read_context> >&)> mutable::<lambda()> mutable::<lambda()>, true>, seastar::future<reconcilable_result> > at /home/sylla/scylla/./seastar/core/future.hh:1399
 (inlined by) operator()<seastar::future_state<reconcilable_result> > at /home/sylla/scylla/./seastar/core/future.hh:1004
 (inlined by) run_and_dispose at /home/sylla/scylla/./seastar/core/future.hh:414
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/sylla/scylla/seastar/core/reactor.cc:2694
seastar::reactor::run_some_tasks() at /home/sylla/scylla/seastar/core/reactor.cc:3117
seastar::reactor::run_some_tasks() at /home/sylla/scylla/seastar/util/log.hh:313
 (inlined by) seastar::reactor::run() at /home/sylla/scylla/seastar/core/reactor.cc:3264
seastar::smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()() const at /home/sylla/scylla/seastar/core/reactor.cc:4333
std::function<void ()>::operator()() const at /opt/scylladb/include/c++/7/bits/std_function.h:706
 (inlined by) seastar::posix_thread::start_routine(void*) at /home/sylla/scylla/seastar/core/posix.cc:52
start_thread at pthread_create.c:?

The coredump happens when I am running a cassandra-stress command (from the 3.0 scylla-tools branch, which supports fixed mode; so far it doesn't seem to reproduce without it).

The command is:

$STRESS read duration=3h -mode cql3 native user=cassandra password=cassandra \
	 -rate threads=400 fixed=300000/s  -pop 'dist=uniform(1..393216000)' \
	 -col n='FIXED(1)' size='FIXED(4096)' -node 40.40.40.1

In parallel to that, I am doing a full table scan with high parallelism. The parallelism is high enough that some of the cassandra-stress queries time out.

After a couple of them time out, Scylla crashes.

I have the coredump and access to the box if anyone wants to take a look.


duarten · 2018-10-08T17:01:09Z

Seems like an issue in multishard_mutation_query. cc @denesb

denesb · 2018-10-09T07:16:31Z

I'm looking at it.

denesb · 2018-10-09T07:19:56Z

@glommer are scans timing out? Is the scylla_database_multishard_query_failed_reader_stops non-zero?

denesb · 2018-10-09T07:24:27Z

One possible explanation: when stopping a shard reader fails, it is left in dismantling_state. In read_context::stop() (called from a finally block after the read has finished) we attempt to clean up this reader and try to wait on the reader_fut future, which was already waited on (and has failed).
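The double wait described here can be illustrated with a toy model. This is only a minimal sketch, not Scylla or Seastar code: `one_shot_future` and all of its members are hypothetical names, and the second wait throws an exception where the backtrace above shows an assertion firing.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical one-shot future: holds a value or an error and can be
// consumed exactly once. Illustrative only; not Seastar's future.
class one_shot_future {
    std::optional<int> _value;
    std::optional<std::string> _error;
    bool _consumed = false;
public:
    static one_shot_future ready(int v) {
        one_shot_future f;
        f._value = v;
        return f;
    }
    static one_shot_future failed(std::string msg) {
        one_shot_future f;
        f._error = std::move(msg);
        return f;
    }

    // True until the result has been taken out.
    bool valid() const { return !_consumed; }

    // Consumes the result. The future counts as consumed even when the
    // result is an error, so a second call is a logic error.
    int get() {
        if (_consumed) {
            throw std::logic_error("future already waited on");
        }
        _consumed = true;
        if (_error) {
            throw std::runtime_error(*_error);
        }
        assert(_value.has_value());
        return *_value;
    }
};
```

In this model, a reader whose stop future failed with a timeout has already been consumed by the first `get()`; cleanup code that calls `get()` on it again hits the "already waited on" path, which is why checking `valid()` (or dropping the stale state) is needed before waiting.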

denesb · 2018-10-09T07:30:26Z

Regardless of whether this is the underlying cause, I'll send a patch for this, as it could well cause an assertion failure like this one.

Currently, when stopping a reader fails, it simply won't be attempted to be saved, and it will be left in the `_readers` array as-is. This can lead to an assertion failure as the reader state will contain futures that were already waited upon, and that the cleanup code will attempt to wait on again. To prevent this, when stopping a reader fails, reset it to nonexistent state, so that the cleanup code doesn't attempt to do anything with it.

Refs: #3830
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>
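The effect of resetting a failed reader to the nonexistent state can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: `reader_state`, `reader_slot`, `save_readers`, `double_waits_in_cleanup` and the `reset_on_failure` switch are hypothetical names introduced for illustration, and a plain boolean stands in for the already-waited-on future.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified reader states; the real read_context has more.
enum class reader_state { nonexistent, dismantling, saved };

struct reader_slot {
    reader_state state = reader_state::nonexistent;
    bool future_consumed = false;  // stands in for an already-waited-on future
};

// Runs when the read finishes: waits on each dismantling reader's stop
// future and tries to save the reader. stop_ok[i] is an assumed input
// saying whether stopping reader i succeeded.
void save_readers(std::vector<reader_slot>& readers,
                  const std::vector<bool>& stop_ok,
                  bool reset_on_failure) {
    assert(readers.size() == stop_ok.size());
    for (std::size_t i = 0; i < readers.size(); ++i) {
        reader_slot& r = readers[i];
        if (r.state != reader_state::dismantling) {
            continue;
        }
        r.future_consumed = true;  // the stop future has now been waited on
        if (stop_ok[i]) {
            r.state = reader_state::saved;
        } else if (reset_on_failure) {
            r.state = reader_state::nonexistent;  // cleanup will skip it
        }
        // Without the reset, a failed reader stays in `dismantling`.
    }
}

// Cleanup pass: counts readers it would wait on a second time.
std::size_t double_waits_in_cleanup(const std::vector<reader_slot>& readers) {
    std::size_t n = 0;
    for (const reader_slot& r : readers) {
        if (r.state == reader_state::dismantling && r.future_consumed) {
            ++n;
        }
    }
    return n;
}
```

With `reset_on_failure` set to false, a reader whose stop failed is still in `dismantling` with a consumed future when cleanup runs; with it set to true, cleanup finds nothing to wait on.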

denesb · 2018-10-09T13:39:21Z

After a discussion with @glommer it seems that indeed this is the case. scylla_database_multishard_query_failed_reader_stops is ~2.5k on some shards.
Scans are also timing out. I think what is happening is that in some cases the scan can finish just in time, but while it is waiting on pending read-ahead in order to save the readers, some of these readers time out. This will not fail the read itself, but the timed-out reader won't be saved.

I did not expect to see values this high for these counters, but I guess we will see numbers like this on severely overloaded nodes.

Lesson learned: badness counters are good! :)

denesb · 2018-10-09T13:41:46Z

I sent a patch that should solve this; once it's in, we can close this issue.

denesb · 2018-10-10T06:56:27Z

The fix was committed as d467b51. This can be closed now.
(I didn't add a "Fixes" tag to the commit message as I wasn't sure at the time that it indeed fixes the root cause of this).

Currently, when stopping a reader fails, it simply won't be attempted to be saved, and it will be left in the `_readers` array as-is. This can lead to an assertion failure as the reader state will contain futures that were already waited upon, and that the cleanup code will attempt to wait on again. To prevent this, when stopping a reader fails, reset it to nonexistent state, so that the cleanup code doesn't attempt to do anything with it.

Refs: #3830
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>
(cherry picked from commit d467b51)

duarten assigned denesb Oct 8, 2018

slivne added type/bug showstopper labels Oct 8, 2018

slivne added this to the 3.0 milestone Oct 8, 2018

duarten closed this as completed Oct 10, 2018
