
667592 ms stall in ~data_sync_repair_task_impl #15095

Closed

bhalevy opened this issue Aug 20, 2023 · 29 comments

@bhalevy (Member) commented Aug 20, 2023

Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/6417/artifact/dtest/logs-full.release.011/1692525345967_nodetool_additional_test.py%3A%3ATestNodetool%3A%3Atest_concurrent_drain/node2.log

Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 with build-id ccf18ab6bae7cde0ba71696466ec93f6d1ab8a43 starting ...

INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - InetAddress 127.0.48.1 is now DOWN, status = NORMAL
INFO  2023-08-20 09:55:29,080 [shard 0] gossip - InetAddress 127.0.48.1 is now UP, status = NORMAL
ERROR 2023-08-20 09:55:39,025 [shard 0] service_level_controller - update_from_distributed_data: failed to update configuration for more than  1350 seconds : exceptions::read_timeout_exception (Operation timed out for system_distributed.service_levels - received only 0 responses from 1 CL=ONE.)
INFO  2023-08-20 09:55:42,952 [shard 0] compaction_manager - Asked to stop

Decoded:

Reactor stalled for 667592 ms on shard 0. Backtrace:
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:832
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:851
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1421
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1158
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1178
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1458
?? ??:0
repair::data_sync_repair_task_impl::is_abortable() const at repair.cc:?
~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:570
 (inlined by) ~task at ././tasks/task_manager.hh:75
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(tasks::task_manager::task*) at ././seastar/include/seastar/core/shared_ptr.hh:212
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(seastar::lw_shared_ptr_counter_base*) at ././seastar/include/seastar/core/shared_ptr.hh:209
 (inlined by) ~lw_shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:325
 (inlined by) seastar::lw_shared_ptr<tasks::task_manager::task>::operator=(seastar::lw_shared_ptr<tasks::task_manager::task>&&) at ././seastar/include/seastar/core/shared_ptr.hh:338
 (inlined by) seastar::lw_shared_ptr<tasks::task_manager::task>::operator=(decltype(nullptr)) at ././seastar/include/seastar/core/shared_ptr.hh:344
 (inlined by) seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >::destroy_on(seastar::lw_shared_ptr<tasks::task_manager::task>, unsigned int) at ././seastar/include/seastar/core/sharded.hh:875
 (inlined by) seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >::destroy(seastar::lw_shared_ptr<tasks::task_manager::task>, unsigned int) at ././seastar/include/seastar/core/sharded.hh:859
 (inlined by) ~foreign_ptr at ././seastar/include/seastar/core/sharded.hh:903
 (inlined by) void std::destroy_at<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:88
 (inlined by) void std::_Destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:149
 (inlined by) void std::_Destroy_aux<false>::__destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*>(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:163
 (inlined by) void std::_Destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*>(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:195
 (inlined by) void std::_Destroy<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >(seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> >*, std::allocator<seastar::foreign_ptr<seastar::lw_shared_ptr<tasks::task_manager::task> > >&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:947
 (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_vector.h:732
 (inlined by) ~impl at ././tasks/task_manager.hh:118
~data_sync_repair_task_impl at ././repair/task_manager_module.hh:70
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:570
 (inlined by) ~task at ./tasks/task_manager.hh:75
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(tasks::task_manager::task*) at ././seastar/include/seastar/core/shared_ptr.hh:212
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<tasks::task_manager::task>::dispose(seastar::lw_shared_ptr_counter_base*) at ././seastar/include/seastar/core/shared_ptr.hh:209
 (inlined by) ~lw_shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:325
 (inlined by) ~pair at /usr/include/boost/container/detail/std_fwd.hpp:39
 (inlined by) void std::destroy_at<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >(std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_construct.h:88
 (inlined by) void std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false> > >::destroy<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >(std::allocator<std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false> >&, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:559
 (inlined by) std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false>*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable_policy.h:2020
 (inlined by) std::_Hashtable<utils::tagged_uuid<tasks::task_id_tag>, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, false>*) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable.h:2338
 (inlined by) std::_Hashtable<utils::tagged_uuid<tasks::task_id_tag>, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(std::integral_constant<bool, true>, utils::tagged_uuid<tasks::task_id_tag> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable.h:2381
 (inlined by) std::_Hashtable<utils::tagged_uuid<tasks::task_id_tag>, std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > >, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::erase(utils::tagged_uuid<tasks::task_id_tag> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/hashtable.h:984
 (inlined by) std::unordered_map<utils::tagged_uuid<tasks::task_id_tag>, seastar::lw_shared_ptr<tasks::task_manager::task>, std::hash<utils::tagged_uuid<tasks::task_id_tag> >, std::equal_to<utils::tagged_uuid<tasks::task_id_tag> >, std::allocator<std::pair<utils::tagged_uuid<tasks::task_id_tag> const, seastar::lw_shared_ptr<tasks::task_manager::task> > > >::erase(utils::tagged_uuid<tasks::task_id_tag> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unordered_map.h:770
 (inlined by) tasks::task_manager::unregister_task(utils::tagged_uuid<tasks::task_id_tag>) at ./tasks/task_manager.cc:411
 (inlined by) tasks::task_manager::module::unregister_task(utils::tagged_uuid<tasks::task_id_tag>) at ./tasks/task_manager.cc:292
 (inlined by) operator()<seastar::future<void> > at ./tasks/task_manager.cc:191
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1539
 (inlined by) void seastar::futurize<void>::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<void, tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}>(tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(auto:1)#1}&, seastar::future_state<auto:2&&::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>, seastar::internal::promise_base_with_type<void>&&, seastar::future_state<tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&::monostate>) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>, seastar::future<void>::then_wrapped_nrvo<void, tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}>(tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(auto:1)#1}&, seastar::future_state<auto:2&&::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>, seastar::internal::promise_base_with_type<void>&&, seastar::future_state<tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}&&::monostate>) const::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1988
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1538
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, tasks::task_manager::task::start()::$_0::operator()()::{lambda(auto:1)#1}, seastar::future<void>::then_wrapped_nrvo<void, {lambda(auto:1)#1}>({lambda(auto:1)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(auto:1)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ././seastar/include/seastar/core/future.hh:741
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2597
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:1990
?? ??:0
?? ??:0
_start at ??:?
@bhalevy (Member, Author) commented Aug 20, 2023

For some reason there was another long (but not THAT long) stall from a completely different code path on node1 at the same time. Could they both be related to networking somehow?

On https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/6417/artifact/dtest/logs-full.release.011/1692525345967_nodetool_additional_test.py%3A%3ATestNodetool%3A%3Atest_concurrent_drain/node1.log

INFO  2023-08-20 09:32:27,050 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
INFO  2023-08-20 09:32:27,051 [shard 0] standard_role_manager - Created default superuser role 'cassandra'.
INFO  2023-08-20 09:32:35,661 [shard 0] compaction - [Compact system.scylla_local 7f03d7d0-3f3c-11ee-bf9c-16e6881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qib_3wlq82wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qi2_5pbz52wwcm8pyh2i4t-big-Data.db:level=0:origin=compaction]
INFO  2023-08-20 09:32:35,661 [shard 0] gossip - InetAddress 127.0.48.2 is now UP, status = UNKNOWN
INFO  2023-08-20 09:32:35,668 [shard 0] compaction - [Compact system.scylla_local 7f03d7d0-3f3c-11ee-bf9c-16e6881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qib_3xob52wwcm8pyh2i4t-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 5ms = 2MB/s. ~256 total partitions merged to 1.
INFO  2023-08-20 09:32:36,007 [shard 0] storage_service - bootstrap[67505c38-1320-4da0-a8a4-c19d4f529da3]: Added node=127.0.48.2 as bootstrap, coordinator=127.0.48.2
INFO  2023-08-20 09:32:36,662 [shard 0] compaction - [Compact system.scylla_local 7f9c9560-3f3c-11ee-bf9c-16e6881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qic_3wtg02wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qib_3xob52wwcm8pyh2i4t-big-Data.db:level=0:origin=compaction]
INFO  2023-08-20 09:32:36,670 [shard 0] cdc - Starting to use generation (2023/08/20 09:33:01, 74600efd-c5c4-4392-92c4-20dbe79a4fa5)
INFO  2023-08-20 09:32:36,674 [shard 0] compaction - [Compact system.scylla_local 7f9c9560-3f3c-11ee-bf9c-16e6881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qic_3xw0x2wwcm8pyh2i4t-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 8ms = 1MB/s. ~256 total partitions merged to 1.
INFO  2023-08-20 09:32:36,674 [shard 0] compaction - [Compact system.cdc_local 7f9e6a20-3f3c-11ee-bf9c-16e6881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g8o_0qic_3ybgg2wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g8o_0qi2_5qek02wwcm8pyh2i4t-big-Data.db:level=0:origin=memtable]
INFO  2023-08-20 09:32:36,675 [shard 0] cdc - CDC description table successfully updated with generation (2023/08/20 09:33:01, 74600efd-c5c4-4392-92c4-20dbe79a4fa5).
INFO  2023-08-20 09:32:36,684 [shard 0] compaction - [Compact system.cdc_local 7f9e6a20-3f3c-11ee-bf9c-16e6881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g8o_0qic_40gm92wwcm8pyh2i4t-big-Data.db:level=0]. 68kB to 63kB (~92% of original) in 8ms = 8MB/s. ~256 total partitions merged to 2.
INFO  2023-08-20 09:32:39,388 [shard 0] storage_service - bootstrap[67505c38-1320-4da0-a8a4-c19d4f529da3]: Marked ops done from coordinator=127.0.48.2
INFO  2023-08-20 09:32:39,682 [shard 0] storage_service - Set host_id=7083fcf8-0d71-4ec2-b7dc-df9c3a6ec256 to be owned by node=127.0.48.2
INFO  2023-08-20 09:32:39,693 [shard 1] compaction - [Compact system.peers 816b13d0-3f3c-11ee-8390-16e7881fd14d] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8o_0qi2_5tu0g200wvtzejyczx-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8o_0qif_4317k200wvtzejyczx-big-Data.db:level=0:origin=memtable]
INFO  2023-08-20 09:32:39,703 [shard 1] compaction - [Compact system.peers 816b13d0-3f3c-11ee-8390-16e7881fd14d] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-ig9l2czq/test/node1/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8o_0qif_44qxs200wvtzejyczx-big-Data.db:level=0]. 17kB to 11kB (~67% of original) in 8ms = 2MB/s. ~256 total partitions merged to 1.
Reactor stalled for 13356 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x5c738e1 0x5c751a3 0x1568615 0x1465d08 0x3c35ee3 0x12f706a 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
WARN  2023-08-20 09:55:29,071 [shard 0] gossip - failure_detector_loop: Send echo to node 127.0.48.2, status = failed: seastar::rpc::timeout_error (rpc call timed out)
INFO  2023-08-20 09:55:29,071 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.2 as DOWN
INFO  2023-08-20 09:55:29,073 [shard 0] gossip - InetAddress 127.0.48.2 is now DOWN, status = NORMAL
INFO  2023-08-20 09:55:29,080 [shard 0] gossip - InetAddress 127.0.48.2 is now UP, status = NORMAL
INFO  2023-08-20 09:55:42,952 [shard 0] compaction_manager - Asked to stop

Decoded:

Reactor stalled for 13356 ms on shard 0. Backtrace:
[Backtrace #1]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:832
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:851
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1421
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1158
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1178
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1458
?? ??:0
noncopyable_function at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:202
 (inlined by) seastar::noncopyable_function<void ()>::operator=(seastar::noncopyable_function<void ()>&&) at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:208
 (inlined by) seastar::timer<seastar::lowres_clock>::set_callback(seastar::scheduling_group, seastar::noncopyable_function<void ()>&&) at ./build/release/seastar/./seastar/include/seastar/core/timer.hh:141
 (inlined by) seastar::timer<seastar::lowres_clock>::set_callback(seastar::noncopyable_function<void ()>&&) at ./build/release/seastar/./seastar/include/seastar/core/timer.hh:147
 (inlined by) seastar::rpc::connection::send(seastar::rpc::snd_buf, std::optional<std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >, seastar::rpc::cancellable*) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:281
seastar::rpc::client::request(unsigned long, long, seastar::rpc::snd_buf, std::optional<std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >, seastar::rpc::cancellable*) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:655
send at ././seastar/include/seastar/rpc/rpc_impl.hh:478
operator() at ././seastar/include/seastar/rpc/rpc_impl.hh:490
 (inlined by) auto netw::send_message_timeout<void, std::chrono::duration<long, std::ratio<1l, 1000l> >, long&>(netw::messaging_service*, netw::messaging_verb, netw::msg_addr, std::chrono::duration<long, std::ratio<1l, 1000l> >, long&) at ././message/rpc_protocol_impl.hh:157
 (inlined by) netw::messaging_service::send_gossip_echo(netw::msg_addr, long, std::chrono::duration<long, std::ratio<1l, 1000l> >) at ./message/messaging_service.cc:1143
gms::gossiper::failure_detector_loop_for_node(gms::inet_address, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, unsigned long) at ./gms/gossiper.cc:839
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
 (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:125
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2597
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:1990
?? ??:0
?? ??:0
_start at ??:?

@bhalevy (Member, Author) commented Aug 20, 2023

@xemul are there known issues in that area?

@avikivity (Member) commented

Only 667592 ms?

@bhalevy (Member, Author) commented Aug 20, 2023

> Only 667592 ms?

We need a contest!

@bhalevy (Member, Author) commented Aug 20, 2023

@avikivity shouldn't we observe exponentially larger stalls as they are detected and the threshold is doubled each time?
Did something change in this area recently?

@avikivity (Member) commented

667 seconds? That's 11 minutes.

It looks like an infinite loop, not a stall.

@bhalevy (Member, Author) commented Aug 21, 2023

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log
has 16_077_758 printouts of the following. Is it related?

INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation

@bhalevy (Member, Author) commented Aug 21, 2023

@michoecho (Contributor) commented Aug 21, 2023

The stall is also visible in other tests running on the same machine at the same time:

$ rg 'stalled for ...... ms' -C1

logs-full.release.011/1692525345967_nodetool_additional_test.py::TestNodetool::test_concurrent_drain/node2.log
470-INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
471:Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
472-INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN

logs-full.release.011/1692525334254_materialized_views_test.py::TestInterruptBuildProcess::test_interrupt_build_process_with_resharding_low_to_half_test/node3.log
1319-INFO  2023-08-20 09:32:46,362 [shard  0] view - Building view ks.t_by_v2, starting at token minimum token
1320:Reactor stalled for 115836 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x10c76 0x130bd 0x40557c9 0x3bf1c45 0x3b6a97a 0x3c00e33 0x3bfeb35 0x5b807c6
1321-INFO  2023-08-20 09:55:28,981 [shard  1] gossip - failure_detector_loop: Mark node 127.0.84.2 as DOWN

logs-full.release.011/1692525339776_materialized_views_test.py::TestMaterializedViews::test_mv_create_with_synchronous_updates/node3.log
884-INFO  2023-08-20 09:32:46,345 [shard 1] compaction - [Compact ks.cf 8552fc60-3f3c-11ee-98f5-f00dc960ae11] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-yqu_9hd8/test/node3/data/ks/cf-4bcb82003f3c11ee9c4af00cc960ae11/me-3g8o_0qim_1gxv42bqmtop5aapld-big-Data.db:level=0]. 238kB to 120kB (~50% of original) in 95ms = 2MB/s. ~768 total partitions merged to 365.
885:Reactor stalled for 372352 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4e8c9 0x4055765 0x3bf1c45 0x3b6a97a 0x3c00e33 0x3bfeb35 0x5b807c6
886-INFO  2023-08-20 09:55:28,975 [shard 1] gossip - failure_detector_loop: Mark node 127.0.42.2 as DOWN

logs-full.release.011/1692525334254_materialized_views_test.py::TestInterruptBuildProcess::test_interrupt_build_process_with_resharding_low_to_half_test/node1.log
1253-INFO  2023-08-20 09:32:45,836 [shard  3] compaction - [Compact system.scylla_local 8512e620-3f3c-11ee-82a6-ed72dd346128] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-0c_i57nb/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8o_0qil_4x9681ziyo7dvtggmg-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 7ms = 1MB/s. ~256 total partitions merged to 1.
1254:Reactor stalled for 775504 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x437749f 0x42f8331 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
1255-WARN  2023-08-20 09:55:28,977 [shard  1] gossip - failure_detector_loop: Send echo to node 127.0.84.3, status = failed: seastar::rpc::timeout_error (rpc call timed out)

So obviously it's an OS problem, not a Scylla problem.

@avikivity (Member) commented

Maybe related to the infamous three-digit stalls?

@avikivity (Member) commented

INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN

The two timestamps are 1370 seconds apart. That's a little over 2X the duration of the stall.
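(09:55:29 − 09:32:39 = 22 min 50 s = 1370 s, while 2 × 667.592 s ≈ 1335 s.)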

@avikivity (Member) commented

I think it rules out games with CLOCK_THREAD_CPUTIME_ID, since realtime agrees something funny happened.
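(A minimal illustration of the distinction, using only POSIX clock_gettime and assuming nothing about Seastar's internals: CLOCK_THREAD_CPUTIME_ID advances only while the thread is on-CPU, while CLOCK_MONOTONIC keeps advancing when the thread is descheduled, so comparing the two deltas separates busy loops from descheduling.)

```cpp
// Illustrative sketch, not Seastar code: a large wall-clock delta combined
// with a small thread-CPU-time delta means the thread was descheduled
// (swapping, scheduling pressure), not spinning in a loop.
#include <ctime>
#include <cstdio>

static double delta_s(const timespec& a, const timespec& b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main() {
    timespec cpu0, wall0, cpu1, wall1;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu0);
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    // ... the suspect stretch of work goes here ...
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);
    std::printf("cpu: %.3f s, wall: %.3f s\n",
                delta_s(cpu0, cpu1), delta_s(wall0, wall1));
}
```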

@avikivity (Member) commented

> https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
>
> INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation

Yes. Log spam -> syslog stalled -> everything blocked.

Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.

@avikivity (Member) commented

> @avikivity shouldn't we observe exponentially larger stalls as they are detected and the threshold is doubled each time? Did something change in this area recently?

Probably, everything is stalled so the stall detector itself isn't getting CPU time.
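(A minimal sketch of the doubling being discussed, with a hypothetical base threshold; Seastar's actual detector lives in seastar/src/core/reactor.cc and is more involved. Once a stall is reported, the bar for the next report roughly doubles, so one long stall normally yields reports at ~t, ~2t, ~4t, ... while the thread gets CPU time to run the detector at all.)

```cpp
// Illustrative sketch of report-threshold doubling (not Seastar's code).
#include <chrono>
#include <cstdio>

struct stall_reporter {
    std::chrono::milliseconds base{500};       // hypothetical base threshold
    std::chrono::milliseconds threshold{base};

    void maybe_report(std::chrono::milliseconds blocked_so_far) {
        if (blocked_so_far >= threshold) {
            std::printf("Reactor stalled for %lld ms\n",
                        (long long)blocked_so_far.count());
            threshold *= 2;  // raise the bar for the next report
        }
    }
    void task_finished() { threshold = base; }  // reset between tasks
};

int main() {
    stall_reporter r;
    using ms = std::chrono::milliseconds;
    for (auto t : {ms(500), ms(1000), ms(2000), ms(4000)}) {
        r.maybe_report(t);  // reports at 500, 1000, 2000, 4000 ms
    }
}
```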

@michoecho (Contributor) commented Aug 21, 2023

> INFO  2023-08-20 09:32:39,430 [shard 0] init - Scylla version 5.4.0~dev-0.20230820.bd8bb5d4b157 initialization completed.
> Reactor stalled for 667592 ms on shard 0. Backtrace: 0x58be0fa 0x58bd505 0x58be8af 0x3db6f 0x4177f4f 0x419c513 0x1bb7af3 0x41a3f15 0x46aca38 0x58cf59f 0x58d07d7 0x58cfb49 0x5873097 0x587224c 0x127cf3f 0x127e9b0 0x127b4ce 0x27b49 0x27c0a 0x1279264
> INFO  2023-08-20 09:55:29,072 [shard 0] gossip - failure_detector_loop: Mark node 127.0.48.1 as DOWN
>
> The two timestamps are 1370 seconds apart. That's a little over 2X the duration of the stall.

That, and also the lengths of the stalls vary wildly across samples, even though the realtime "before" and "after" agree in all cases. The stalls are: 667 s, 13 s, 115 s, 372 s, 775 s. (Each in a different process.)

@michoecho (Contributor) commented

>> https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
>>
>> INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation
>
> Yes. Log spam -> syslog stalled -> everything blocked.
>
> Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.

But the log spam happens at a different time. The stall happens 1 hour after the log spam ends.

@avikivity (Member) commented

>>> https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
>>>
>>> INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation
>>
>> Yes. Log spam -> syslog stalled -> everything blocked.
>> Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.
>
> But the log spam happens at a different time. The stall happens 1 hour after the log spam ends.

Isn't there a rate limit on the number of confusing things in a single issue?

@avikivity (Member) commented

Another option is a swapstorm more or less killing the node.

@michoecho (Contributor) commented Aug 21, 2023

>>>> https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/249/artifact/logs-full.debug.041/1692521802039_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log has 16_077_758 printouts of the following. Is it related?
>>>>
>>>> INFO  2023-08-20 06:45:48,995 [shard 0] compaction_manager - Stopping 1 tasks for 0 ongoing compactions for table ks.cf compaction_group=0/1 due to user-triggered operation
>>>
>>> Yes. Log spam -> syslog stalled -> everything blocked.
>>> Does dtest/ccm run with scylladb logs directed to syslog? It shouldn't.
>>
>> But the log spam happens at a different time. The stall happens 1 hour after the log spam ends.
>
> Isn't there a rate limit on the number of confusing things in a single issue?

This fact doesn't seem very confusing: a natural explanation is that the log spam is unrelated to the stall.
It probably didn't even happen on the same machine. I don't know how dtests are set up, but note that the log spam is in a different Jenkins job (dtest-debug vs. dtest) than the stalls.

But 16 million INFO logs in a test is insane in its own right. This is obviously some bug that deserves its own fix.

@avikivity (Member) commented

Yes, please file one and assign it to @Deexie.

@Deexie (Contributor) commented Aug 21, 2023

How can I check out the exact version used in Jenkins?

The logs look pretty similar to #14966; I'm not sure yet whether they are related.

@michoecho (Contributor) commented

> How can I check out the exact version used in Jenkins?
>
> The logs look pretty similar to #14966; I'm not sure yet whether they are related.

Scylla logs its git hash and build id when it starts. The version used in this run is d57a951, which predates the fix for #14966 (e9d9489). So this run could have been affected.

@Deexie (Contributor) commented Aug 22, 2023

>> How can I check out the exact version used in Jenkins?
>> The logs look pretty similar to #14966; I'm not sure yet whether they are related.
>
> Scylla logs its git hash and build id when it starts. The version used in this run is d57a951, which predates the fix for #14966 (e9d9489). So this run could have been affected.

Thanks! I checked on current master and the problem does not appear.

@avikivity (Member) commented

A swapstorm wouldn't register on CLOCK_THREAD_CPUTIME_ID. Though sometimes the kernel enlists a faulting thread to help free memory.

Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.
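(A minimal sketch of the mechanism presumably involved, assuming the patch uses madvise as the commit message below describes; the mapping size and flow here are illustrative only.)

```cpp
// Illustrative sketch (not the actual Seastar patch): opt a mapping into
// transparent hugepages, then back out of them, mirroring the "assume we
// want hugepages, undo later" approach the commit message describes.
#include <sys/mman.h>
#include <cstdio>

int main() {
    size_t len = 64 << 20;  // 64 MiB, hypothetical reservation size
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    madvise(p, len, MADV_HUGEPAGE);    // conservatively prefer hugepages
    // ... later, --overprovisioned turns out to be set: back down ...
    if (madvise(p, len, MADV_NOHUGEPAGE) != 0) perror("madvise");
    munmap(p, len);
}
```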

@michoecho (Contributor) commented

> A swapstorm wouldn't register on CLOCK_THREAD_CPUTIME_ID. Though sometimes the kernel enlists a faulting thread to help free memory.
>
> Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.

I'm out of my depth here. Is it really possible for a swapstorm to stall the process for 12 minutes? I understand that it can slow down everything to a crawl, but can it slow the process so much that it can't even handle a signal and look at the clock for minutes?

avikivity added a commit to avikivity/seastar that referenced this issue Aug 22, 2023
--overprovisioned (which should really be renamed --overcommitted or
--underprovisioned) means that Seastar should sacrifice performance
in favor of playing well with other applications on the same system.
One such sacrifice can be avoiding transparent hugepages.

Transparent hugepages can waste memory if only part of a hugepage is
used, waste kernel CPU time trying to coalesce a huge page, and
make it harder to swap.

Be more friendly by avoiding transparent hugepages when --overprovisioned
is specified. This is a little complicated since we allocate memory before
we know whether --overprovisioned is specified or not (even before main()
runs). Since it's only a small amount of memory, conservatively (from
the performance point of view) assume we want transparent hugepages,
and back down and undo if it later turns out we don't.

I verified with strace that madvise() calls happen as expected.

Ref scylladb/scylladb#15095.
@avikivity (Member) commented

>> A swapstorm wouldn't register on CLOCK_THREAD_CPUTIME_ID. Though sometimes the kernel enlists a faulting thread to help free memory.
>> Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.
>
> I'm out of my depth here. Is it really possible for a swapstorm to stall the process for 12 minutes? I understand that it can slow down everything to a crawl, but can it slow the process so much that it can't even handle a signal and look at the clock for minutes?

It's possible, but I admit it's unlikely. In the days of spinning disks it could easily happen, since 12 minutes = 720 s ≈ 72,000 seeks at roughly 10 ms each, so if there is that much paging activity before the code involved gets paged in, nothing happens. In those days reaching for the reset button was sometimes the only option.

In these days of sub-millisecond flash it's very rare.

@mykaul (Contributor) commented Aug 22, 2023

>>> A swapstorm wouldn't register on CLOCK_THREAD_CPUTIME_ID. Though sometimes the kernel enlists a faulting thread to help free memory.
>>> Idea: make --overprovisioned (which should really be named --underprovisioned, or --overcommitted) disable transparent hugepages. I'll post a patch.
>>
>> I'm out of my depth here. Is it really possible for a swapstorm to stall the process for 12 minutes? I understand that it can slow down everything to a crawl, but can it slow the process so much that it can't even handle a signal and look at the clock for minutes?
>
> It's possible, but I admit it's unlikely. In the days of spinning disks it could easily happen, since 12 minutes = 72,000 seeks, so if there is that much activity before the code involved gets paged in, nothing happens. In those days reaching for the reset button was sometimes the only option.

Even with rotating disks, rest their souls in peace, you had NCQ, which would re-order those seeks to complete faster.

> In these days of sub-millisecond flash it's very rare.

nyh pushed a commit to scylladb/seastar that referenced this issue Aug 24, 2023

Closes #1796
avikivity added a commit to avikivity/scylladb that referenced this issue Aug 28, 2023
* seastar 0784da876...6e80e84a3 (29):
  > Revert "shared_token_bucket: Make duration->tokens conversion more solid"
  > Merge 'chunked_fifo: let incremetal operator return iterator not basic_iterator' from Kefu Chai
  > memory: diable transparent hugepages if --overprovisioned is specified
Ref scylladb#15095
  > http/exception: s/<TAB>/    /
  > install-dependencies.sh: re-add protobuf
  > Merge 'Keep capacity on fair_queue_entry' from Pavel Emelyanov
  > Merge 'Fix server-side RPC stream shutdown' from Pavel Emelyanov
Fixes scylladb#13100
  > smp: make service management semaphore thread local
  > tls_test: abort_accept() after getting server socket
  > Merge 'Print more IO info with ioinfo app' from Pavel Emelyanov
  > rpc: Fix client-side stream registration race
Ref scylladb#13100
  > tests: perf: shard_token_bucket: avoid capturing unused variables in lambdas
  > build: pass -DBoost_NO_CXX98_FUNCTION_BASE to C++ compiler
  > reactor: Drop some dangling friend declarations
  > fair_queue: Do not re-evaluate request capacity twice
  > build: do not use serial number file when signing a cert
  > shared_token_bucket: Make duration->tokens conversion more solid
  > tests: Add perf test for shard_token_bucket
  > Merge 'Make make_file_impl() less yielding' from Pavel Emelyanov
  > fair_queue: Remove individual requests counting
  > reactor, linux-aio: print value of aio-max-nr on error
  > Merge 'build, net: disable implicit fallthough' from Kefu Chai
  > shared_token_bucket: Fix duration_for() underflow
  > rpc: Generalize get_stats_internal() method
  > doc/building-dpdk.md: fix invalid file path of README-DPDK.md
  > install-dependencies: add centos9
  > Merge 'log: report scheduling group along with shard id' from Kefu Chai
  > dns: handle exception in do_sendv for udp
  > Merge 'Add a stall detector histogram' from Amnon Heiman
xemul added a commit to xemul/scylla that referenced this issue Aug 30, 2023

* seastar 0784da87...6e80e84a (29)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
xemul added a commit that referenced this issue Aug 31, 2023

* seastar 0784da87...6e80e84a (29)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15218
denesb pushed a commit that referenced this issue Sep 1, 2023

* seastar 0784da87...6e80e84a (29)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15218
@avikivity (Member) commented

91cc544 should reduce the likelihood of a swapstorm.

rishvin pushed a commit to rishvin/seastar that referenced this issue Sep 25, 2023

Closes scylladb#1796
@bhalevy (Member, Author) commented Mar 3, 2024

This hasn't reproduced.
Closing until we have further evidence.

bhalevy closed this as not planned Mar 3, 2024
graphcareful pushed a commit to graphcareful/seastar that referenced this issue Mar 20, 2024

Closes scylladb#1796