667592 ms stall in ~data_sync_repair_task_impl #15095

bhalevy · 2023-08-20T12:18:23Z

Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/6417/artifact/dtest/logs-full.release.011/1692525345967_nodetool_additional_test.py%3A%3ATestNodetool%3A%3Atest_concurrent_drain/node2.log

Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 with build-id ccf18ab6bae7cde0ba71696466ec93f6d1ab8a43 starting ...

INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - InetAddress 127.0.48.1 is now DOWN, status = NORMAL
INFO  2023-08-20 09:55:29,080 [shard 0] gossip - InetAddress 127.0.48.1 is now UP, status = NORMAL
ERROR 2023-08-20 09:55:39,025 [shard 0] service_level_controller - update_from_distributed_data: failed to update configuration for more than  1350 seconds : exceptions::read_timeout_exception (Operation timed out for system_distributed.service_levels - received only 0 responses from 1 CL=ONE.)
INFO  2023-08-20 09:55:42,952 [shard 0] compaction_manager - Asked to stop

Decoded:

Reactor stalled for 667592 ms on shard 0. Backtrace:
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:832
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:851
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1421
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1158
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1178
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1458
?? ??:0
repair::data_sync_repair_task_impl::is_abortable() const at repair.cc:?
~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:570
 (inlined by) ~task at ././tasks/task_manager.hh:75
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(tasks::task_manager::task*) at ././seastar/include/seastar/core/shared_ptr.hh:212
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(seastar::lw_shared_ptr_counter_base*) at ././seastar/include/seastar/core/shared_ptr.hh:209
 (inlined by) ~lw_shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:325
 (inlined by) seastar::lw_shared_ptr<tasks::task_manager::task>::operator=(seastar::lw_shared_ptr<tasks::task_manager::task>&&) at ././seastar/include/seastar/core/shared_ptr.hh:338
 (inlined by) seastar::lw_shared_ptr<tasks::task_manager::task>::operator=(decltype(nullptr)) at ././seastar/include/seastar/core/shared_ptr.hh:344
 (inlined by) seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >::destroy_on(seastar::lw_shared_ptr<tasks::task_manager::task>, unsigned int) at ././seastar/include/seastar/core/sharded.hh:875
 (inlined by) seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >::destroy(seastar::lw_shared_ptr<tasks::task_manager::task>, unsigned int) at ././seastar/include/seastar/core/sharded.hh:859
 (inlined by) ~foreign_ptr at ././seastar/include/seastar/core/sharded.hh:903
 (inlined by) void std::destroy_at<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:88
 (inlined by) void std::_Destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:149
 (inlined by) void std::_Destroy_aux<false>::__destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*>(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:163
 (inlined by) void std::_Destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*>(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:195
 (inlined by) void std::_Destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, std::allocator<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:947
 (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_vector.h:732
 (inlined by) ~impl at ././tasks/task_manager.hh:118
~data_sync_repair_task_impl at ././repair/task_manager_module.hh:70
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:570
 (inlined by) ~task at ./tasks/task_manager.hh:75
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(tasks::task_manager::task*) at ././seastar/include/seastar/core/shared_ptr.hh:212
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(seastar::lw_shared_ptr_counter_base*) at ././seastar/include/seastar/core/shared_ptr.hh:209
 (inlined by) ~lw_shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:325
 (inlined by) ~pair at /usr/include/boost/container/detail/std_fwd.hpp:39
 (inlined by) void std::destroy_at<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >(std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:88
 (inlined by) void std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false> > >::destroy<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >(std::allocator<std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false> >&, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:559
 (inlined by) std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false>*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable_policy.h:2020
 (inlined by) std::_Hashtable<utils::tagged_uuid<tasks::task_id_tag>, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false>*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable.h:2338
 (inlined by) std::_Hashtable<utils::tagged_uuid<tasks::task_id_tag>, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(std::integral_constant<bool, true>, utils::tagged_uuid<tasks::task_id_tag> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable.h:2381
 (inlined by) std::_Hashtable<utils::tagged_uuid<tasks::task_id_tag>, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::erase(utils::tagged_uuid<tasks::task_id_tag> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable.h:984
 (inlined by) std::unordered_map<utils::tagged_uuid<tasks::task_id_tag>, seastar::lw_shared_ptr<tasks::task_manager::task>, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > > >::erase(utils::tagged_uuid<tasks::task_id_tag> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unordered_map.h:770
 (inlined by) tasks::task_manager::unregister_task(utils::tagged_uuid<tasks::task_id_tag>) at ./tasks/task_manager.cc:411
 (inlined by) tasks::task_manager::module::unregister_task(utils::tagged_uuid<tasks::task_id_tag>) at ./tasks/task_manager.cc:292
 (inlined by) operator()<seastar::future<void> > at ./tasks/task_manager.cc:191
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1539
 (inlined by) void seastar::futurize<void>::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<void, tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}>(tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(auto:1)#1}&, seastar::future_state<auto:2&&::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>, seastar::internal::promise_base_with_type<void>&&, seastar::future_state<tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&::monostate>) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>, seastar::future<void>::then_wrapped_nrvo<void, tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}>(tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(auto:1)#1}&, seastar::future_state<auto:2&&::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>, seastar::internal::promise_base_with_type<void>&&, seastar::future_state<tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&::monostate>) const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1988
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1538
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}, seastar::future<void>::then_wrapped_nrvo<void, {lambda(auto:1)#1}>({lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(auto:1)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ././seastar/include/seastar/core/future.hh:741
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2597
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:1990
?? ??:0
?? ??:0
_start at ??:?

The text was updated successfully, but these errors were encountered:

bhalevy · 2023-08-20T12:21:15Z

For some reason there was a another long (but not THAT long) stall from a completely different code path on node1 at the same time. Could they be both relate to networking somehow?

On https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/6417/artifact/dtest/logs-full.release.011/1692525345967_nodetool_additional_test.py%3A%3ATestNodetool%3A%3Atest_concurrent_drain/node1.log

INFO  2023-08-20 09:32:27,050 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
INFO  2023-08-20 09:32:27,051 [shard 0] standard_role_manager - Created default superuser role 'cassandra'.
INFO  2023-08-20 09:32:35,661 [shard 0] compaction - [Compact system.scylla_local 7f03d7d0-3f3c-11ee-bf9c-16e6881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qib_3wlq82wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qi2_5pbz52wwcm8pyh2i4t-big-Data.db:level=0:origin=compaction]
INFO  2023-08-20 09:32:35,661 [shard 0] gossip - InetAddress 127.0.48.2 is now UP, status = UNKNOWN
INFO  2023-08-20 09:32:35,668 [shard 0] compaction - [Compact system.scylla_local 7f03d7d0-3f3c-11ee-bf9c-16e6881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qib_3xob52wwcm8pyh2i4t-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 5ms = 2MB/s. ~256 total partitions merged to 1.
INFO  2023-08-20 09:32:36,007 [shard 0] storage_service - bootstrap[67505c38-1320-4da0-a8a4-c19d4f529da3]: Added node=127.0.48.2 as bootstrap, coordinator=127.0.48.2
INFO  2023-08-20 09:32:36,662 [shard 0] compaction - [Compact system.scylla_local 7f9c9560-3f3c-11ee-bf9c-16e6881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qic_3wtg02wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qib_3xob52wwcm8pyh2i4t-big-Data.db:level=0:origin=compaction]
INFO  2023-08-20 09:32:36,670 [shard 0] cdc - Starting to use generation (2023/08/20 09:33:01, 74600efd-c5c4-4392-92c4-20dbe79a4fa5)
INFO  2023-08-20 09:32:36,674 [shard 0] compaction - [Compact system.scylla_local 7f9c9560-3f3c-11ee-bf9c-16e6881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qic_3xw0x2wwcm8pyh2i4t-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 8ms = 1MB/s. ~256 total partitions merged to 1.
INFO  2023-08-20 09:32:36,674 [shard 0] compaction - [Compact system.cdc_local 7f9e6a20-3f3c-11ee-bf9c-16e6881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g8o_0qic_3ybgg2wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g8o_0qi2_5qek02wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable]
INFO  2023-08-20 09:32:36,675 [shard 0] cdc - CDC description table successfully updated with generation (2023/08/20 09:33:01, 74600efd-c5c4-4392-92c4-20dbe79a4fa5).
INFO  2023-08-20 09:32:36,684 [shard 0] compaction - [Compact system.cdc_local 7f9e6a20-3f3c-11ee-bf9c-16e6881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g8o_0qic_40gm92wwcm8pyh2i4t-big-Data.db:level=0]. 68kB to 63kB (~92% of original) in 8ms = 8MB/s. ~256 total partitions merged to 2.
INFO  2023-08-20 09:32:39,388 [shard 0] storage_service - bootstrap[67505c38-1320-4da0-a8a4-c19d4f529da3]: Marked ops done from coordinator=127.0.48.2
INFO  2023-08-20 09:32:39,682 [shard 0] storage_service - Set host_id=7083fcf8-0d71-4ec2-b7dc-df9c3a6ec256 to be owned by node=127.0.48.2
INFO  2023-08-20 09:32:39,693 [shard 1] compaction - [Compact system.peers 816b13d0-3f3c-11ee-8390-16e7881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8o_0qi2_5tu0g200wvtzejyczx-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8o_0qif_4317k200wvtzejyczx-big-Data.db:level=0:origin=memtable]
INFO  2023-08-20 09:32:39,703 [shard 1] compaction - [Compact system.peers 816b13d0-3f3c-11ee-8390-16e7881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8o_0qif_44qxs200wvtzejyczx-big-Data.db:level=0]. 17kB to 11kB (~67% of original) in 8ms = 2MB/s. ~256 total partitions merged to 1.
Reactor stalled for 13356 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x5c738e1 0x5c751a3 0x1568615 0x1465d08 0x3c35ee3 0x12f706a 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
WARN  2023-08-20 09:55:29,071 [shard 0] gossip - failure_detector_loop: Send echo to node 127.0.48.2, status = failed: seastar::rpc::timeout_error (rpc call timed out)
INFO  2023-08-20 09:55:29,071 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.2 as DOWN
INFO  2023-08-20 09:55:29,073 [shard 0] gossip - InetAddress 127.0.48.2 is now DOWN, status = NORMAL
INFO  2023-08-20 09:55:29,080 [shard 0] gossip - InetAddress 127.0.48.2 is now UP, status = NORMAL
INFO  2023-08-20 09:55:42,952 [shard 0] compaction_manager - Asked to stop

Decoded:

Reactor stalled for 13356 ms on shard 0. Backtrace:
[Backtrace #1]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:832
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:851
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1421
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1158
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1178
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1458
?? ??:0
noncopyable_function at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:202
 (inlined by) seastar::noncopyable_function<void ()>::operator=(seastar::noncopyable_function<void ()>&&) at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:208
 (inlined by) seastar::timer<seastar::lowres_clock>::set_callback(seastar::scheduling_group, seastar::noncopyable_function<void ()>&&) at ./build/release/seastar/./seastar/include/seastar/core/timer.hh:141
 (inlined by) seastar::timer<seastar::lowres_clock>::set_callback(seastar::noncopyable_function<void ()>&&) at ./build/release/seastar/./seastar/include/seastar/core/timer.hh:147
 (inlined by) seastar::rpc::connection::send(seastar::rpc::snd_buf, std::optional<std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >, seastar::rpc::cancellable*) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:281
seastar::rpc::client::request(unsigned long, long, seastar::rpc::snd_buf, std::optional<std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >, seastar::rpc::cancellable*) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:655
send at ././seastar/include/seastar/rpc/rpc_impl.hh:478
operator() at ././seastar/include/seastar/rpc/rpc_impl.hh:490
 (inlined by) auto netw::send_message_timeout<void, std::chrono::duration<long, std::ratio<1l, 1000l> >, long&>(netw::messaging_service*, netw::messaging_verb, netw::msg_addr, std::chrono::duration<long, std::ratio<1l, 1000l> >, long&) at ././message/rpc_protocol_impl.hh:157
 (inlined by) netw::messaging_service::send_gossip_echo(netw::msg_addr, long, std::chrono::duration<long, std::ratio<1l, 1000l> >) at ./message/messaging_service.cc:1143
gms::gossiper::failure_detector_loop_for_node(gms::inet_address, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, unsigned long) at ./gms/gossiper.cc:839
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
 (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:125
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2597
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:1990
?? ??:0
?? ??:0
_start at ??:?

bhalevy · 2023-08-20T12:21:31Z

@xemul are there known issues in that area?

avikivity · 2023-08-20T14:40:21Z

Only 667592 ms?

bhalevy · 2023-08-20T15:28:42Z

Only 667592 ms?

We need a contest!

bhalevy · 2023-08-20T15:30:37Z

@avikivity shouldn't we observe exponentially arger stalls as they are detected and the threshold is doubled each time?
Did something change on this area recently?

avikivity · 2023-08-20T15:34:50Z

667 seconds? that's 11 minutes.

It looks like an infinite loop, not a stall.

bhalevy · 2023-08-21T06:07:38Z

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log
has 16_077_758 printouts of the following. Is it related?

INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation

bhalevy · 2023-08-21T06:09:47Z

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/testReport/compaction_additional_test/TestLCSSSTablePromotion/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split047___test_lcs_sstable_promotion/

ccmlib.node.TimeoutError: Waiting for compactions timed out after 300 seconds with 24 pending tasks remaining.

could be related too

michoecho · 2023-08-21T14:54:25Z

The stall is also visible in other tests running on the same machine at the same time:

$ rg 'stalled for ...... ms' -C1

logs-full.release.011/1692525345967_nodetool_additional_test.py::TestNodetool::test_concurrent_drain/node2.log
470-INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
471:Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
472-INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN

logs-full.release.011/1692525334254_materialized_views_test.py::TestInterruptBuildProcess::test_interrupt_build_process_with_resharding_low_to_half_test/node3.log
1319-INFO  2023-08-20 09:32:46,362 [shard  0] view - Building view ks.t_by_v2, starting at token minimum token
1320:Reactor stalled for 115836 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x10c76 0x130bd 0x40557c9 0x3bf1c45 0x3b6a97a 0x3c00e33 0x3bfeb35 0x5b807c6
1321-INFO  2023-08-20 09:55:28,981 [shard  1] gossip - failure_detector_loop: Mark node 127.0.84.2 as DOWN

logs-full.release.011/1692525339776_materialized_views_test.py::TestMaterializedViews::test_mv_create_with_synchronous_updates/node3.log
884-INFO  2023-08-20 09:32:46,345 [shard 1] compaction - [Compact ks.cf 8552fc60-3f3c-11ee-98f5-f00dc960ae11] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-yqu_9hd8/test/node3/data/ks/cf-4bcb82003f3c11ee9c4af00cc960ae11/me-3g8o_0qim_1gxv42bqmtop5aapld-big-Data.db:level=0]. 238kB to 120kB (~50% of original) in 95ms = 2MB/s. ~768 total partitions merged to 365.
885:Reactor stalled for 372352 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4e8c9 0x4055765 0x3bf1c45 0x3b6a97a 0x3c00e33 0x3bfeb35 0x5b807c6
886-INFO  2023-08-20 09:55:28,975 [shard 1] gossip - failure_detector_loop: Mark node 127.0.42.2 as DOWN

logs-full.release.011/1692525334254_materialized_views_test.py::TestInterruptBuildProcess::test_interrupt_build_process_with_resharding_low_to_half_test/node1.log
1253-INFO  2023-08-20 09:32:45,836 [shard  3] compaction - [Compact system.scylla_local 8512e620-3f3c-11ee-82a6-ed72dd346128] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-0c_i57nb/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qil_4x9681ziyo7dvtggmg-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 7ms = 1MB/s. ~256 total partitions merged to 1.
1254:Reactor stalled for 775504 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x437749f 0x42f8331 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
1255-WARN  2023-08-20 09:55:28,977 [shard  1] gossip - failure_detector_loop: Send echo to node 127.0.84.3, status = failed: seastar::rpc::timeout_error (rpc call timed out)

So obviously it's a OS problem, not a Scylla problem.

avikivity · 2023-08-21T15:05:40Z

Maybe related to the infamous three-digit stalls?

avikivity · 2023-08-21T15:09:29Z

INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN

The two timestamps are 1370 seconds apart. That's a little over 2X the duration of the stall.

avikivity · 2023-08-21T15:10:41Z

I think it rules out games with CLOCK_THREAD_CPUTIME_ID, since realtime agrees something funny happened.

avikivity · 2023-08-21T15:13:12Z

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation

Yes. Log spam -> syslog stalled -> everything blocked.

Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.

avikivity · 2023-08-21T15:13:55Z

@avikivity shouldn't we observe exponentially arger stalls as they are detected and the threshold is doubled each time? Did something change on this area recently?

Probably, everything is stalled so the stall detector itself isn't getting CPU time.

michoecho · 2023-08-21T15:15:56Z

INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN

The two timestamps are 1370 seconds apart. That's a little over 2X the duration of the stall.

That, and also the lengths of the stalls vary wildly across samples, even though the realtime "before" and "after" agrees in all cases. The stalls are: 667 s, 13 s, 115 s, 372 s, 775 s. (Each in a different process.)

michoecho · 2023-08-21T15:16:26Z

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation
Yes. Log spam -> syslog stalled -> everything blocked.

Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.

But the log spam happens at a different time. The stall happens 1 hour after the log spam ends.

avikivity · 2023-08-21T15:18:10Z

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation
Yes. Log spam -> syslog stalled -> everything blocked.
Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.
But the log spam happens at a different time. The stall happens 1 hour after the log spam ends.

Isn't there a rate limit on the number of confusing things in a single issue?

avikivity · 2023-08-21T15:20:17Z

Another option is a swapstorm more or less killing the node.

michoecho · 2023-08-21T15:28:34Z

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation
Yes. Log spam -> syslog stalled -> everything blocked.
Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.
But the log spam happens at a different time. The stall happens 1 hour after the log spam ends.
Isn't there a rate limit on the number of confusing things in a single issue?

This fact doesn't seem very confusing — a natural explanation is that the log spam is unrelated to the stall.
It probably didn't even happen on the same machine. I don't know how dtests are set up, but note that the logs spam is in a different jenkins job (dtest-debug vs dtest) than than the stalls.

But 13 million INFO logs in a test is insane in its own right. This is obviously some bug that deserves its own fix.

avikivity · 2023-08-21T15:36:19Z

Yes please file one and assign to @Deexie

Deexie · 2023-08-21T15:45:43Z

How can I checkout to the exact version used in Jenkins?

The logs look pretty similar to #14966, I'm not sure yet if they are related anyhow.

michoecho · 2023-08-21T15:49:50Z

How can I checkout to the exact version used in Jenkins?

The logs look pretty similar to #14966, I'm not sure yet if they are related anyhow.

Scylla logs its git hash and build id when it starts. The version used in this run is d57a951, which predates the fix for #14966 (e9d9489). So this run could have been affected.

Deexie · 2023-08-22T10:39:47Z

How can I checkout to the exact version used in Jenkins?
The logs look pretty similar to #14966, I'm not sure yet if they are related anyhow.

Scylla logs its git hash and build id when it starts. The version used in this run is d57a951, which predates the fix for #14966 (e9d9489). So this run could have been affected.

Thanks! I checked it on current master and the problem does not appear

avikivity · 2023-08-22T10:48:52Z

A swapstorm wouldn't register on CLOCK_CPU_THREAD_ID. Though sometimes the kernel enlists a faulting thread to work to free memory.

Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.

michoecho · 2023-08-22T11:06:15Z

A swapstorm wouldn't register on CLOCK_CPU_THREAD_ID. Though sometimes the kernel enlists a faulting thread to work to free memory.

Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.

I'm out of my depth here. Is it really possible for a swapstorm to stall the process for 12 minutes? I understand that it can slow down everything to a crawl, but can it slow the process so much that it can't even handle a signal and look at the clock for minutes?

--overprovisioned (which should really be renamed --overcommitted or --underprovisioned) means that Seastar should sacrifice performance in favor of playing well with other applications on the same system. One such sacrifice can be avoiding transparent hugepages. Transparent hugepages can waste memory if only part of a hugepage is used, waste kernel CPU time trying to coalesce a huge page, and make it harder to swap. Be more friendly by avoiding transparent hugepages when --overprovisioned is specified. This is a little complicated since we allocate memory before we know whether --overprovisioned is specified or not (even before main() runs). Since it's only a small amount of memory, conservatively (from the performance point of view) assume we want transparent hugepages, and back down and undo if it later turns out we don't. I verified with strace that madvise() calls happen as expected. Ref scylladb/scylladb#15095.

avikivity · 2023-08-22T11:24:37Z

A swapstorm wouldn't register on CLOCK_CPU_THREAD_ID. Though sometimes the kernel enlists a faulting thread to work to free memory.
Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.

I'm out of my depth here. Is it really possible for a swapstorm to stall the process for 12 minutes? I understand that it can slow down everything to a crawl, but can it slow the process so much that it can't even handle a signal and look at the clock for minutes?

It's possible but I admit it's unlikely. In the days of spinning disks it could easily happen, since 12 minutes = 72,000 seeks, so if there is that much activity before the code involved gets paged in then nothing happens. In those days reaching for the reset button was sometimes the only option.

In these days of sub-millisecond flash it's very rare.

--overprovisioned (which should really be renamed --overcommitted or --underprovisioned) means that Seastar should sacrifice performance in favor of playing well with other applications on the same system. One such sacrifice can be avoiding transparent hugepages. Transparent hugepages can waste memory if only part of a hugepage is used, waste kernel CPU time trying to coalesce a huge page, and make it harder to swap. Be more friendly by avoiding transparent hugepages when --overprovisioned is specified. This is a little complicated since we allocate memory before we know whether --overprovisioned is specified or not (even before main() runs). Since it's only a small amount of memory, conservatively (from the performance point of view) assume we want transparent hugepages, and back down and undo if it later turns out we don't. I verified with strace that madvise() calls happen as expected. Ref scylladb/scylladb#15095.

mykaul · 2023-08-22T13:51:26Z

A swapstorm wouldn't register on CLOCK_CPU_THREAD_ID. Though sometimes the kernel enlists a faulting thread to work to free memory.
Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.

I'm out of my depth here. Is it really possible for a swapstorm to stall the process for 12 minutes? I understand that it can slow down everything to a crawl, but can it slow the process so much that it can't even handle a signal and look at the clock for minutes?

It's possible but I admit it's unlikely. In the days of spinning disks it could easily happen, since 12 minutes = 72,000 seeks, so if there is that much activity before the code involved gets paged in then nothing happens. In those days reaching for the reset button was sometimes the only option.

Even with rotating disks, rest their soul in peace, you had NCQ which would re-order those seeks to complete faster.

In these days of sub-millisecond flash it's very rare.

--overprovisioned (which should really be renamed --overcommitted or --underprovisioned) means that Seastar should sacrifice performance in favor of playing well with other applications on the same system. One such sacrifice can be avoiding transparent hugepages. Transparent hugepages can waste memory if only part of a hugepage is used, waste kernel CPU time trying to coalesce a huge page, and make it harder to swap. Be more friendly by avoiding transparent hugepages when --overprovisioned is specified. This is a little complicated since we allocate memory before we know whether --overprovisioned is specified or not (even before main() runs). Since it's only a small amount of memory, conservatively (from the performance point of view) assume we want transparent hugepages, and back down and undo if it later turns out we don't. I verified with strace that madvise() calls happen as expected. Ref scylladb/scylladb#15095. Closes #1796

* seastar 0784da876...6e80e84a3 (29): > Revert "shared_token_bucket: Make duration->tokens conversion more solid" > Merge 'chunked_fifo: let incremetal operator return iterator not basic_iterator' from Kefu Chai > memory: diable transparent hugepages if --overprovisioned is specified Ref scylladb#15095 > http/exception: s/<TAB>/ / > install-dependencies.sh: re-add protobuf > Merge 'Keep capacity on fair_queue_entry' from Pavel Emelyanov > Merge 'Fix server-side RPC stream shutdown' from Pavel Emelyanov Fixes scylladb#13100 > smp: make service management semaphore thread local > tls_test: abort_accept() after getting server socket > Merge 'Print more IO info with ioinfo app' from Pavel Emelyanov > rpc: Fix client-side stream registration race Ref scylladb#13100 > tests: perf: shard_token_bucket: avoid capturing unused variables in lambdas > build: pass -DBoost_NO_CXX98_FUNCTION_BASE to C++ compiler > reactor: Drop some dangling friend declarations > fair_queue: Do not re-evaluate request capacity twice > build: do not use serial number file when signing a cert > shared_token_bucket: Make duration->tokens conversion more solid > tests: Add perf test for shard_token_bucket > Merge 'Make make_file_impl() less yielding' from Pavel Emelyanov > fair_queue: Remove individual requests counting > reactor, linux-aio: print value of aio-max-nr on error > Merge 'build, net: disable implicit fallthough' from Kefu Chai > shared_token_bucket: Fix duration_for() underflow > rpc: Generalize get_stats_internal() method > doc/building-dpdk.md: fix invalid file path of README-DPDK.md > install-dependencies: add centos9 > Merge 'log: report scheduling group along with shard id' from Kefu Chai > dns: handle exception in do_sendv for udp > Merge 'Add a stall detector histogram' from Amnon Heiman

* seastar 0784da87...6e80e84a (29): > Revert "shared_token_bucket: Make duration->tokens conversion more solid" > Merge 'chunked_fifo: let incremetal operator return iterator not basic_iterator' from Kefu Chai > memory: diable transparent hugepages if --overprovisioned is specified Ref scylladb#15095 > http/exception: s/<TAB>/ / > install-dependencies.sh: re-add protobuf > Merge 'Keep capacity on fair_queue_entry' from Pavel Emelyanov > Merge 'Fix server-side RPC stream shutdown' from Pavel Emelyanov Fixes scylladb#13100 > smp: make service management semaphore thread local > tls_test: abort_accept() after getting server socket > Merge 'Print more IO info with ioinfo app' from Pavel Emelyanov > rpc: Fix client-side stream registration race Ref scylladb#13100 > tests: perf: shard_token_bucket: avoid capturing unused variables in lambdas > build: pass -DBoost_NO_CXX98_FUNCTION_BASE to C++ compiler > reactor: Drop some dangling friend declarations > fair_queue: Do not re-evaluate request capacity twice > build: do not use serial number file when signing a cert > shared_token_bucket: Make duration->tokens conversion more solid > tests: Add perf test for shard_token_bucket > Merge 'Make make_file_impl() less yielding' from Pavel Emelyanov > fair_queue: Remove individual requests counting > reactor, linux-aio: print value of aio-max-nr on error > Merge 'build, net: disable implicit fallthough' from Kefu Chai > shared_token_bucket: Fix duration_for() underflow > rpc: Generalize get_stats_internal() method > doc/building-dpdk.md: fix invalid file path of README-DPDK.md > install-dependencies: add centos9 > Merge 'log: report scheduling group along with shard id' from Kefu Chai > dns: handle exception in do_sendv for udp > Merge 'Add a stall detector histogram' from Amnon Heiman Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

* seastar 0784da87...6e80e84a (29): > Revert "shared_token_bucket: Make duration->tokens conversion more solid" > Merge 'chunked_fifo: let incremetal operator return iterator not basic_iterator' from Kefu Chai > memory: diable transparent hugepages if --overprovisioned is specified Ref #15095 > http/exception: s/<TAB>/ / > install-dependencies.sh: re-add protobuf > Merge 'Keep capacity on fair_queue_entry' from Pavel Emelyanov > Merge 'Fix server-side RPC stream shutdown' from Pavel Emelyanov Fixes #13100 > smp: make service management semaphore thread local > tls_test: abort_accept() after getting server socket > Merge 'Print more IO info with ioinfo app' from Pavel Emelyanov > rpc: Fix client-side stream registration race Ref #13100 > tests: perf: shard_token_bucket: avoid capturing unused variables in lambdas > build: pass -DBoost_NO_CXX98_FUNCTION_BASE to C++ compiler > reactor: Drop some dangling friend declarations > fair_queue: Do not re-evaluate request capacity twice > build: do not use serial number file when signing a cert > shared_token_bucket: Make duration->tokens conversion more solid > tests: Add perf test for shard_token_bucket > Merge 'Make make_file_impl() less yielding' from Pavel Emelyanov > fair_queue: Remove individual requests counting > reactor, linux-aio: print value of aio-max-nr on error > Merge 'build, net: disable implicit fallthough' from Kefu Chai > shared_token_bucket: Fix duration_for() underflow > rpc: Generalize get_stats_internal() method > doc/building-dpdk.md: fix invalid file path of README-DPDK.md > install-dependencies: add centos9 > Merge 'log: report scheduling group along with shard id' from Kefu Chai > dns: handle exception in do_sendv for udp > Merge 'Add a stall detector histogram' from Amnon Heiman Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #15218

avikivity · 2023-09-03T11:27:46Z

91cc544 should reduce the likelihood of a swapstorm

--overprovisioned (which should really be renamed --overcommitted or --underprovisioned) means that Seastar should sacrifice performance in favor of playing well with other applications on the same system. One such sacrifice can be avoiding transparent hugepages. Transparent hugepages can waste memory if only part of a hugepage is used, waste kernel CPU time trying to coalesce a huge page, and make it harder to swap. Be more friendly by avoiding transparent hugepages when --overprovisioned is specified. This is a little complicated since we allocate memory before we know whether --overprovisioned is specified or not (even before main() runs). Since it's only a small amount of memory, conservatively (from the performance point of view) assume we want transparent hugepages, and back down and undo if it later turns out we don't. I verified with strace that madvise() calls happen as expected. Ref scylladb/scylladb#15095. Closes scylladb#1796

bhalevy · 2024-03-03T06:19:45Z

This hasn't reproduced.
Closing until we have further evidence.

--overprovisioned (which should really be renamed --overcommitted or --underprovisioned) means that Seastar should sacrifice performance in favor of playing well with other applications on the same system. One such sacrifice can be avoiding transparent hugepages. Transparent hugepages can waste memory if only part of a hugepage is used, waste kernel CPU time trying to coalesce a huge page, and make it harder to swap. Be more friendly by avoiding transparent hugepages when --overprovisioned is specified. This is a little complicated since we allocate memory before we know whether --overprovisioned is specified or not (even before main() runs). Since it's only a small amount of memory, conservatively (from the performance point of view) assume we want transparent hugepages, and back down and undo if it later turns out we don't. I verified with strace that madvise() calls happen as expected. Ref scylladb/scylladb#15095. Closes scylladb#1796

bhalevy assigned Deexie Aug 20, 2023

avikivity mentioned this issue Aug 22, 2023

memory: diable transparent hugepages if --overprovisioned is specified scylladb/seastar#1796

Closed

xemul mentioned this issue Aug 30, 2023

Update seastar submodule #15218

Closed

bhalevy closed this as not planned Won't fix, can't repro, duplicate, stale Mar 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

667592 ms stall in ~data_sync_repair_task_impl #15095

667592 ms stall in ~data_sync_repair_task_impl #15095

bhalevy commented Aug 20, 2023

bhalevy commented Aug 20, 2023

bhalevy commented Aug 20, 2023

avikivity commented Aug 20, 2023

bhalevy commented Aug 20, 2023

bhalevy commented Aug 20, 2023 •

edited

avikivity commented Aug 20, 2023

bhalevy commented Aug 21, 2023

bhalevy commented Aug 21, 2023

michoecho commented Aug 21, 2023 •

edited

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

michoecho commented Aug 21, 2023 •

edited

michoecho commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

michoecho commented Aug 21, 2023 •

edited

avikivity commented Aug 21, 2023

Deexie commented Aug 21, 2023

michoecho commented Aug 21, 2023

Deexie commented Aug 22, 2023

avikivity commented Aug 22, 2023

michoecho commented Aug 22, 2023

avikivity commented Aug 22, 2023

mykaul commented Aug 22, 2023

avikivity commented Sep 3, 2023

bhalevy commented Mar 3, 2024

667592 ms stall in ~data_sync_repair_task_impl #15095

667592 ms stall in ~data_sync_repair_task_impl #15095

Comments

bhalevy commented Aug 20, 2023

bhalevy commented Aug 20, 2023

bhalevy commented Aug 20, 2023

avikivity commented Aug 20, 2023

bhalevy commented Aug 20, 2023

bhalevy commented Aug 20, 2023 • edited

avikivity commented Aug 20, 2023

bhalevy commented Aug 21, 2023

bhalevy commented Aug 21, 2023

michoecho commented Aug 21, 2023 • edited

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

michoecho commented Aug 21, 2023 • edited

michoecho commented Aug 21, 2023

avikivity commented Aug 21, 2023

avikivity commented Aug 21, 2023

michoecho commented Aug 21, 2023 • edited

avikivity commented Aug 21, 2023

Deexie commented Aug 21, 2023

michoecho commented Aug 21, 2023

Deexie commented Aug 22, 2023

avikivity commented Aug 22, 2023

michoecho commented Aug 22, 2023

avikivity commented Aug 22, 2023

mykaul commented Aug 22, 2023

avikivity commented Sep 3, 2023

bhalevy commented Mar 3, 2024

bhalevy commented Aug 20, 2023 •

edited

michoecho commented Aug 21, 2023 •

edited

michoecho commented Aug 21, 2023 •

edited

michoecho commented Aug 21, 2023 •

edited