
Exceptional future ignored: seastar::gate_closed_exception in sstable_compaction_test.simple_backlog_controller_test #15211

Closed
bhalevy opened this issue Aug 29, 2023 · 12 comments · Fixed by #15213
Labels: area/compaction, P1 Urgent, symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework)
Milestone: 5.4


@bhalevy
Member

bhalevy commented Aug 29, 2023

Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/6449/artifact/testlog/x86_64/release/boost.sstable_compaction_test.simple_backlog_controller_test.1.log

INFO  2023-08-29 11:03:59,235 [shard 0] compaction_manager - Asked to stop
INFO  2023-08-29 11:03:59,235 [shard 0] compaction_manager - Stopping 2 tasks for 1 ongoing compactions due to shutdown
INFO  2023-08-29 11:03:59,235 [shard 0] task_manager - Stopping module compaction
INFO  2023-08-29 11:03:59,235 [shard 0] task_manager - Unregistered module compaction
WARN  2023-08-29 11:03:59,236 [shard 0] sstable - Unable to delete /jenkins/workspace/scylla-master/next/scylla/testlog/x86_64/release/scylla-3e5a0b0d-d05b-4da7-8711-eb6f72b602b6/me-669-big-TOC.txt because it doesn't exist.
INFO  2023-08-29 11:03:59,236 [shard 0] compaction - [Compact ks.9be1b3b2-4642-11ee-aea2-9a60db666a28 9be58440-4642-11ee-aea2-9a60db666a28] Compacting [/jenkins/workspace/scylla-master/next/scylla/testlog/x86_64/release/scylla-3e5a0b0d-d05b-4da7-8711-eb6f72b602b6/me-659-big-Data.db:level=1:origin=]
WARN  2023-08-29 11:03:59,236 [shard 0] seastar - Exceptional future ignored: seastar::gate_closed_exception (gate closed), backtrace: 0x4aff2be 0x4aff880 0x4affb58 0x46ff87b 0x14e0b17 0x14e15c5 0x473728f 0x47384c7 0x4737839 0x4bbb9b7 0x4bbab6c 0x4bb6ce2 0x471c2ca /lib64/libc.so.6+0x8c906 /lib64/libc.so.6+0x11286f
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}, seastar::future<void>::then_wrapped_nrvo<void, seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}>(seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
...
test/boost/sstable_compaction_test.cc(4441): Leaving test case "simple_backlog_controller_test"; testing time: 849725us
test/boost/sstable_compaction_test.cc(4559): Test case "test_compaction_strategy_cleanup_method" is skipped because disabled
test/boost/sstable_compaction_test.cc(4648): Test case "test_large_partition_splitting_on_compaction" is skipped because disabled
test/boost/sstable_compaction_test.cc(4782): Test case "check_table_sstable_set_includes_maintenance_sstables" is skipped because disabled
test/boost/sstable_compaction_test.cc(4803): Test case "compaction_manager_stop_and_drain_race_test" is skipped because disabled
test/boost/sstable_compaction_test.cc(4821): Test case "test_print_shared_sstables_vector" is skipped because disabled
test/boost/sstable_compaction_test.cc(4846): Test case "tombstone_gc_disabled_test" is skipped because disabled
test/boost/sstable_compaction_test.cc(4940): Test case "compaction_optimization_to_avoid_bloom_filter_checks" is skipped because disabled
test/boost/sstable_compaction_test.cc(4988): Test case "cleanup_incremental_compaction_test" is skipped because disabled
test/boost/sstable_compaction_test.cc(5086): Test case "cleanup_during_offstrategy_incremental_compaction_test" is skipped because disabled
test/boost/sstable_compaction_test.cc(5181): Test case "test_sstables_excluding_staging_correctness" is skipped because disabled
Leaving test module "Master Test Suite"; testing time: 850063us
*** 1 abandoned failed future(s) detected
Failing the test because fail was requested by --fail-on-abandoned-failed-futures

This is reproducible locally for me with a high repeat count.

Decoded:

[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&) at ./build/dev/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::current_backtrace_tasklocal() at ./build/dev/seastar/./seastar/src/util/backtrace.cc:98
seastar::current_tasktrace() at ./build/dev/seastar/./seastar/src/util/backtrace.cc:149
seastar::current_backtrace() at ./build/dev/seastar/./seastar/src/util/backtrace.cc:182
seastar::report_failed_future(std::__exception_ptr::exception_ptr const&) at ./build/dev/seastar/./seastar/src/core/future.cc:216
void seastar::internal::promise_base::set_exception_impl<std::__exception_ptr::exception_ptr>(std::__exception_ptr::exception_ptr&&) at ././seastar/include/seastar/core/future.hh:806
 (inlined by) seastar::internal::promise_base::set_exception(std::__exception_ptr::exception_ptr&&) at ././seastar/include/seastar/core/future.hh:815
 (inlined by) seastar::internal::promise_base::set_exception(std::__exception_ptr::exception_ptr const&) at ././seastar/include/seastar/core/future.hh:819
seastar::promise<void>::set_exception(std::__exception_ptr::exception_ptr const&) at ././seastar/include/seastar/core/future.hh:982
 (inlined by) seastar::shared_future<>::shared_state::resolve(seastar::future<void>&&) at ././seastar/include/seastar/core/shared_future.hh:154
operator() at ././seastar/include/seastar/core/shared_future.hh:190
 (inlined by) seastar::future<void> seastar::futurize<void>::invoke<seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}&, seastar::future<void> >(seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}&, seastar::future<void>&&) at ././seastar/include/seastar/core/future.hh:2003
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1523
 (inlined by) seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>::direct_vtable_for<seastar::future<void>::then_wrapped_maybe_erase<false, void, seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}>(seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::{lambda(seastar::future<void>&&)#1}&&)::{lambda(seastar::future<void>&&)#1}>::call(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> const*, seastar::future<void>&&) at ././seastar/include/seastar/util/noncopyable_function.hh:129
seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>::operator()(seastar::future<void>&&) const at ././seastar/include/seastar/util/noncopyable_function.hh:215
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1539
 (inlined by) void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1991
operator() at ././seastar/include/seastar/core/future.hh:1538
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ././seastar/include/seastar/core/future.hh:741
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/dev/seastar/./seastar/src/core/reactor.cc:2597
 (inlined by) seastar::reactor::run_some_tasks() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/dev/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/dev/seastar/./seastar/src/core/app-template.cc:167
operator() at ./build/dev/seastar/./seastar/src/testing/test_runner.cc:75
 (inlined by) void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char**)::$_0>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/dev/seastar/./seastar/src/core/posix.cc:90
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=245240a31888ad5c11bbc55b18e02d87388f59a9, for GNU/Linux 3.2.0, not stripped

start_thread at ??:?
__clone3 at :?

I suspect this might have to do with the use of shared_future for compaction tasks.
Cc @Deexie
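To make the shared_future suspicion concrete: the decoded backtrace runs from `shared_future<>::shared_state::resolve()` through `promise<void>::set_exception()` into `report_failed_future()`. Below is a deliberately simplified standard-C++ model of that path, not Seastar code. `toy_shared_state`, `toy_waiter` and `abandoned_failed_futures` are hypothetical names, and the model assumes that an exception set on a waiter whose future was already dropped can only be reported, never observed.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical model, not the Seastar API.
// Counter loosely mirroring the tally of abandoned failed futures
// that the test runner checks at exit.
inline int abandoned_failed_futures = 0;

// One waiter registered on the shared state. future_alive models whether
// the consumer still holds the future returned by get_future().
struct toy_waiter {
    bool future_alive = true;
    std::exception_ptr delivered;  // set only when a live future can observe it
};

// Simplified shared state: a single underlying result fanned out to many waiters.
struct toy_shared_state {
    std::vector<std::shared_ptr<toy_waiter>> waiters;

    std::shared_ptr<toy_waiter> get_future() {
        auto w = std::make_shared<toy_waiter>();
        waiters.push_back(w);
        return w;
    }

    // Deliver the same exception to every waiter. A waiter whose future was
    // already dropped cannot observe it, so the failure is only counted:
    // the analogue of set_exception() ending in report_failed_future().
    void resolve(std::exception_ptr ex) {
        for (auto& w : waiters) {
            if (w->future_alive) {
                w->delivered = ex;
            } else {
                ++abandoned_failed_futures;
            }
        }
        waiters.clear();
    }
};
```

In this model one dropped waiter is enough to produce a "1 abandoned failed future(s) detected" style result, even though other waiters handled the same exception.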

bhalevy added the area/compaction and symptom/ci stability labels Aug 29, 2023
@raphaelsc
Member

dup of #15162?

@bhalevy
Member Author

bhalevy commented Aug 29, 2023

Yes, though I like the information in this report better :)

@bhalevy
Member Author

bhalevy commented Aug 29, 2023

Pick the best one from your perspective.

@raphaelsc
Member

> Yes, though I like the information in this report better :)

Indeed, the original is not always better :-) Will close that one.

@bhalevy
Member Author

bhalevy commented Aug 29, 2023

I produced a coredump by making Seastar treat an abandoned failed future as a fatal internal error instead of a warning:

diff --git a/src/core/future.cc b/src/core/future.cc
index c0373654..1d19285b 100644
--- a/src/core/future.cc
+++ b/src/core/future.cc
@@ -213,7 +213,7 @@ void future_state_base::rethrow_exception() const& {
 
 void report_failed_future(const std::exception_ptr& eptr) noexcept {
     ++engine()._abandoned_failed_futures;
-    seastar_logger.warn("Exceptional future ignored: {}, backtrace: {}", eptr, current_backtrace());
+    on_fatal_internal_error(seastar_logger, format("Exceptional future ignored: {}", eptr));
 }
 
 void report_failed_future(const future_state_base& state) noexcept {

bhalevy added the P1 Urgent label Aug 29, 2023
@bhalevy
Member Author

bhalevy commented Aug 29, 2023

This change seems to be required and sufficient:

diff --git a/test/boost/sstable_compaction_test.cc b/test/boost/sstable_compaction_test.cc
index eb8911fbba..64f62b6edf 100644
--- a/test/boost/sstable_compaction_test.cc
+++ b/test/boost/sstable_compaction_test.cc
@@ -4454,12 +4454,14 @@ SEASTAR_TEST_CASE(simple_backlog_controller_test) {
         auto as = abort_source();
 
         auto task_manager = tasks::task_manager({}, as);
+        auto stop_task_manager = deferred_stop(task_manager);
         compaction_manager::config cfg = {
             .compaction_sched_group = { default_scheduling_group() },
             .maintenance_sched_group = { default_scheduling_group() },
             .available_memory = available_memory,
         };
         auto manager = compaction_manager(std::move(cfg), as, task_manager);
+        auto stop_manager = deferred_stop(manager);
 
         auto add_sstable = [&env] (table_for_tests& t, uint64_t data_size, int level) {
             auto sst = env.make_sstable(t.schema());
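The diff above relies on `deferred_stop`, an RAII guard that stops an object when the guard goes out of scope. Here is a minimal standard-C++ sketch of that idea, assuming a synchronous `stop()`; the actual Seastar helper deals with a `stop()` that returns a future, which this sketch ignores. `toy_deferred_stop` and `toy_service` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard (not the Seastar helper): calls s.stop() when the
// guard is destroyed, so the service is stopped even if the test body throws.
template <typename Service>
class toy_deferred_stop {
public:
    explicit toy_deferred_stop(Service& s) noexcept : _s(s) {}
    toy_deferred_stop(const toy_deferred_stop&) = delete;
    toy_deferred_stop& operator=(const toy_deferred_stop&) = delete;
    ~toy_deferred_stop() { _s.stop(); }

private:
    Service& _s;
};

// Hypothetical service that records the order in which services are stopped.
struct toy_service {
    std::string name;
    std::vector<std::string>* log;
    void stop() { log->push_back(name); }
};
```

Guards are destroyed in reverse declaration order, so in this layout compaction_manager is stopped before the task_manager it was constructed with. On its own this change was not sufficient here: the failure still reproduced.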

@bhalevy
Member Author

bhalevy commented Aug 29, 2023

Nope, still happens.

@bhalevy
Member Author

bhalevy commented Aug 29, 2023

Looks like the gate_closed_exception originates in task_manager::task::start,
when it calls

(void)with_gate(_impl->_module->async_gate(), [f = done(), module = _impl->_module, id = id()] () mutable {

and the catch clause then calls

_impl->finish_failed(std::current_exception());
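A minimal sketch of that control flow, assuming (for illustration only) that the `(void)with_gate(...)` lambda is the sole holder of the `done()` future. `toy_gate`, `toy_gate_closed` and `toy_task` are hypothetical stand-ins, not the task_manager or Seastar types. Because the gate rejects entry before the lambda runs, the captured future is destroyed, and `finish_failed()` then fails a promise that nobody can observe.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for seastar::gate_closed_exception.
struct toy_gate_closed : std::runtime_error {
    toy_gate_closed() : std::runtime_error("gate closed") {}
};

// Hypothetical gate: enter() throws once the gate has been closed.
struct toy_gate {
    bool closed = false;
    void enter() const {
        if (closed) {
            throw toy_gate_closed();
        }
    }
};

// Hypothetical task modeling the pre-fix start() path.
struct toy_task {
    bool done_future_held = false;  // does anyone still hold the done() future?
    int abandoned = 0;              // failures that nobody could observe

    void finish_failed(std::exception_ptr) {
        // With no holder left, the failure can only be reported:
        // the analogue of report_failed_future().
        if (!done_future_held) {
            ++abandoned;
        }
    }

    void start(const toy_gate& g) {
        done_future_held = true;  // the lambda captures f = done()
        try {
            g.enter();            // throws: the module gate is closed during shutdown
            // In the real code the task body would now run in the background,
            // with the lambda still holding f.
        } catch (...) {
            done_future_held = false;  // the lambda, and f with it, is destroyed unrun
            finish_failed(std::current_exception());
        }
    }
};
```

Note that `start()` swallows the error: the caller never sees it, which is why it only surfaces as an abandoned failed future.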

bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 29, 2023
Passing the gate_closed_exception to the task promise
ends up with abandoned exception since no-one is waiting
for it.

Instead, let the exception be thrown from start().

Fixes scylladb#15211

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 30, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 30, 2023
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes scylladb#15211

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
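The approach in this commit message can be sketched as follows: enter the gate while creating the task, so a closed gate fails creation synchronously and the caller receives the exception instead of an orphaned promise. This is a hypothetical model: `toy_gate::hold()`, `toy_task` and this `make_task()` are illustrative names loosely inspired by an RAII gate holder, and the actual change in the Scylla tree may differ in detail.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for seastar::gate_closed_exception.
struct toy_gate_closed : std::runtime_error {
    toy_gate_closed() : std::runtime_error("gate closed") {}
};

// Hypothetical gate with an RAII holder.
struct toy_gate {
    bool closed = false;
    int count = 0;  // number of live holders

    // Keeps the gate entered until destroyed.
    struct holder {
        toy_gate* g;
        explicit holder(toy_gate& gate) : g(&gate) { ++g->count; }
        holder(const holder&) = delete;
        holder& operator=(const holder&) = delete;
        ~holder() { --g->count; }
    };

    // Throws instead of handing out a holder once the gate is closed.
    std::unique_ptr<holder> hold() {
        if (closed) {
            throw toy_gate_closed();
        }
        return std::make_unique<holder>(*this);
    }
};

// Hypothetical task that owns its gate holder from creation onward.
struct toy_task {
    std::unique_ptr<toy_gate::holder> gate_holder;
    bool started = false;
    void start() { started = true; }  // no gate entry left to fail here
};

// Entering the gate is part of creation, so a closed gate throws
// directly to the caller of make_task().
inline std::unique_ptr<toy_task> make_task(toy_gate& g) {
    auto t = std::make_unique<toy_task>();
    t->gate_holder = g.hold();
    return t;
}
```

The design choice is to move the only failure point to a place where someone is already waiting for the result (the caller), so there is no promise left to fail later.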
bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 31, 2023
DoronArazii added this to the 5.4 milestone Sep 3, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Sep 4, 2023
nyh added a commit that referenced this issue Sep 6, 2023
…reated' from Benny Halevy

Passing the gate_closed_exception to the task promise
ends up with abandoned exception since no-one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes #15211

In addition, this series adds a private abort_source for each task_manager module
(chained to the main task_manager::abort_source), and abort is requested on task_manager::module::stop().

Gate holding in compaction_manager is hardened, and the sstable_compaction_test cases
now make sure to stop compaction_manager and task_manager.

Closes #15213

* github.com:scylladb/scylladb:
  compaction_manager: stop: close compaction_state:s gates
  compaction_manager: gracefully handle gate close
  task_manager: task: start: fixup indentation
  task_manager: module: make_task: enter gate when the task is created
  task_manager: module: stop: request abort
  task_manager: task::impl: subscribe to module abort_source
  test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
  test: simple_backlog_controller_test: stop compaction and task managers
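The per-module abort_source described in this merge can be illustrated with a small model: each module owns a private abort source chained to the parent one, so a parent abort propagates to every module, while `stop()` aborts only that module and the tasks subscribed to it. `toy_abort_source` and `toy_module` are hypothetical; the real `seastar::abort_source` manages subscriptions differently, and this sketch approximates safe unsubscription with a `weak_ptr`.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal abort source: callbacks fire once on request_abort().
class toy_abort_source {
public:
    void subscribe(std::function<void()> cb) {
        if (_aborted) {
            cb();  // late subscribers are notified immediately
            return;
        }
        _subs.push_back(std::move(cb));
    }
    void request_abort() {
        if (_aborted) {
            return;  // abort is idempotent
        }
        _aborted = true;
        auto subs = std::move(_subs);
        _subs.clear();
        for (auto& cb : subs) {
            cb();
        }
    }
    bool abort_requested() const { return _aborted; }

private:
    bool _aborted = false;
    std::vector<std::function<void()>> _subs;
};

// Hypothetical module owning a private abort source chained to the parent.
class toy_module {
public:
    explicit toy_module(toy_abort_source& parent)
        : _as(std::make_shared<toy_abort_source>()) {
        // A weak_ptr makes a parent abort after the module is gone a no-op.
        std::weak_ptr<toy_abort_source> weak = _as;
        parent.subscribe([weak] {
            if (auto as = weak.lock()) {
                as->request_abort();
            }
        });
    }
    void stop() { _as->request_abort(); }  // aborts only this module's tasks
    toy_abort_source& abort_source() { return *_as; }

private:
    std::shared_ptr<toy_abort_source> _as;
};
```

Stopping one module this way leaves sibling modules and the parent untouched, which matches the intent of requesting abort in `task_manager::module::stop()`.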
@bhalevy
Member Author

bhalevy commented Oct 19, 2023

@raphaelsc / @Deexie: which branches require a backport of the fix?

@Deexie
Contributor

Deexie commented Oct 19, 2023

> @raphaelsc / @Deexie: which branches require backport of the fix?

All branches containing task manager, that is, from 5.2 onward. (I don't know whether these should be considered, but the tags containing the PR that introduced the bug are: scylla-5.3.0-rc0 scylla-5.2.9 scylla-5.2.8 scylla-5.2.7 scylla-5.2.6 scylla-5.2.5 scylla-5.2.4 scylla-5.2.3 scylla-5.2.2 scylla-5.2.1 scylla-5.2.0 scylla-5.2.0-rc5 scylla-5.2.0-rc4 scylla-5.2.0-rc3 scylla-5.2.0-rc2 scylla-5.2.0-rc1 scylla-5.2.0-rc0)

@avikivity
Member

@Deexie it's only necessary to name the branch (5.2), not patch releases on that branch.

avikivity pushed a commit that referenced this issue Nov 30, 2023
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes #15211

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit f9a7635)
@avikivity
Member

Backported to 5.2.
