Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reactor stalled for 689 ms in time schema preparing before upgrade #15350

Closed
1 of 2 tasks
juliayakovlev opened this issue Sep 11, 2023 · 4 comments
Closed
1 of 2 tasks

Comments

@juliayakovlev
Copy link

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Reactor stalled for 689 ms in time schema preparing before upgrade (when Scylla version is 5.2.7-20230821)

Sep 06 12:46:32 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] database - Truncating keyspace_fill_db_data.tuple_notation_test with auto-snapshot
Sep 06 12:46:32 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 3] database - Truncating keyspace_fill_db_data.in_order_by_without_selecting_test with auto-snapshot
Sep 06 12:47:02 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_traces tables={}
Sep 06 12:47:02 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_schema tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_distributed_everywhere tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_auth tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=ks_no_range_ghost_test tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=keyspace_fill_db_data tables={}
Sep 06 12:47:05 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=scylla_bench tables={}
Sep 06 12:47:10 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] raft_group_registry - marking Raft server bd8ef773-4fed-4ef2-bd64-6e3d6ca43381 as dead for raft groups
Sep 06 12:47:10 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] raft_group_registry - marking Raft server bd8ef773-4fed-4ef2-bd64-6e3d6ca43381 as alive for raft groups
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: Reactor stalled for 689 ms on shard 5. Backtrace: 0x4fe5fa3 0x4fe5390 0x4fe6770 0x3cb1f 0x1c4c4a8 0x1c4c76f 0x1c50971 0x1b6da09 0x1b6cbe9 0x1c02248 0x1c00c06 0x526f0c6
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack:
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:15 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]: kernel callstack: 0xffffffffffffff80 0xffffffff8ebab4b7 0xffffffff8ea45e81 0xffffffff8ea024be 0xffffffff8efbf7d3 0xffffffff8efbfb05 0xffffffff8efbb7b8
Sep 06 12:47:22 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_distributed tables={}
Sep 06 12:47:43 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_traces tables={}
Sep 06 12:46:32 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] database - Truncating keyspace_fill_db_data.tuple_notation_test with auto-snapshot
Sep 06 12:46:32 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 3] database - Truncating keyspace_fill_db_data.in_order_by_without_selecting_test with auto-snapshot
Sep 06 12:47:02 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_traces tables={}
Sep 06 12:47:02 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_schema tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_distributed_everywhere tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=system_auth tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=ks_no_range_ghost_test tables={}
Sep 06 12:47:03 rolling-upgrade--centos-db-node-e5180d05-0-3 scylla[4722]:  [shard 0] api - perform_keyspace_flush: keyspace=keyspace_fill_db_data tables={}

Decoded:

2023-09-06T12:47:15+00:00 rolling-upgrade--centos-db-node-e5180d05-0-3     !INFO | scylla[4722]: Reactor stalled for 689 ms on shard 5. Backtrace: 0x4fe5fa3 0x4fe5390 0x4fe6770 0x3cb1f 0x1c4c4a8 0x1c4c76f 0x1c50971 0x1b6da09 0x1b6cbe9 0x1c02248 0x1c00c06 0x526f0c6
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
utils::uleb64_express_encode_impl(char*&, unsigned long, unsigned long) at ././utils/vle.hh:73
 (inlined by) void utils::uleb64_express_encode<void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)>(char*&, unsigned int, unsigned long, unsigned long, void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)) at ././utils/vle.hh:82
 (inlined by) logalloc::region_impl::object_descriptor::encode(char*&, unsigned long, unsigned long) const at ./utils/logalloc.cc:1620
 (inlined by) logalloc::region_impl::alloc_small(logalloc::region_impl::object_descriptor const&, unsigned int, unsigned long) at ./utils/logalloc.cc:1705
logalloc::region_impl::alloc_small(logalloc::region_impl::object_descriptor const&, unsigned int, unsigned long) at ./utils/logalloc.cc:1699
logalloc::region_impl::alloc(migrate_fn_type const*, unsigned long, unsigned long) at ./utils/logalloc.cc:2066
rows_entry* allocation_strategy::construct<rows_entry, schema const&, position_in_partition_view&, seastar::bool_class<dummy_tag>, seastar::bool_class<continuous_tag> >(schema const&, position_in_partition_view&, seastar::bool_class<dummy_tag>&&, seastar::bool_class<continuous_tag>&&) at ././utils/allocation_strategy.hh:153
 (inlined by) partition_snapshot_row_cursor::ensure_entry_if_complete(position_in_partition_view) at ././partition_snapshot_row_cursor.hh:606
operator() at ./partition_version.cc:471
 (inlined by) _ZN8logalloc18allocating_section24with_reclaiming_disabledIRZZN15partition_entry19apply_to_incompleteERK6schemaOS2_R16mutation_cleanerRS0_RNS_6regionER13cache_trackermR27real_dirty_memory_accounterEN4$_12clEvEUlvE_EEDcSB_OT_ at ././utils/logalloc.hh:499
 (inlined by) operator() at ././utils/logalloc.hh:521
 (inlined by) _ZN8logalloc18allocating_section12with_reserveIZNS0_clIZZN15partition_entry19apply_to_incompleteERK6schemaOS3_R16mutation_cleanerRS0_RNS_6regionER13cache_trackermR27real_dirty_memory_accounterEN4$_12clEvEUlvE_EEDcSC_OT_EUlvE_EEDcSC_SK_ at ././utils/logalloc.hh:470
 (inlined by) _ZN8logalloc18allocating_sectionclIZZN15partition_entry19apply_to_incompleteERK6schemaOS2_R16mutation_cleanerRS0_RNS_6regionER13cache_trackermR27real_dirty_memory_accounterEN4$_12clEvEUlvE_EEDcSB_OT_ at ././utils/logalloc.hh:520
 (inlined by) operator() at ./partition_version.cc:428
 (inlined by) seastar::noncopyable_function<seastar::bool_class<seastar::stop_iteration_tag> ()>::indirect_vtable_for<partition_entry::apply_to_incomplete(schema const&, partition_entry&&, mutation_cleaner&, logalloc::allocating_section&, logalloc::region&, cache_tracker&, unsigned long, real_dirty_memory_accounter&)::$_12>::call(seastar::noncopyable_function<seastar::bool_class<seastar::stop_iteration_tag> ()> const*) at ././seastar/include/seastar/util/noncopyable_function.hh:153
seastar::noncopyable_function<seastar::bool_class<seastar::stop_iteration_tag> ()>::operator()() const at ././seastar/include/seastar/util/noncopyable_function.hh:210
 (inlined by) utils::coroutine::run() at ././utils/coroutine.hh:40
 (inlined by) operator() at ./row_cache.cc:975
 (inlined by) _Z14with_allocatorIZZZN9row_cache9do_updateIZNS0_6updateENS0_16external_updaterERN7replica8memtableEE4$_21EEN7seastar6futureIvEES2_S5_T_ENKUlvE_clEvENUlvE0_clEvEUlvE1_EDcR19allocation_strategyOSA_ at ././utils/allocation_strategy.hh:313
 (inlined by) operator() at ./row_cache.cc:954
 (inlined by) void std::__invoke_impl<void, seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(std::__invoke_other, seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::__invoke_result<seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>::type std::__invoke<seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&, (seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
 (inlined by) _ZSt12__apply_implIZZN9row_cache9do_updateIZNS0_6updateENS0_16external_updaterERN7replica8memtableEE4$_21EEN7seastar6futureIvEES2_S5_T_ENKUlvE_clEvEUlvE0_St5tupleIJEEJEEDcOSA_OT0_St16integer_sequenceImJXspT1_EEE at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1852
 (inlined by) _ZSt5applyIZZN9row_cache9do_updateIZNS0_6updateENS0_16external_updaterERN7replica8memtableEE4$_21EEN7seastar6futureIvEES2_S5_T_ENKUlvE_clEvEUlvE0_St5tupleIJEEEDcOSA_OT0_ at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1863
 (inlined by) seastar::future<void> seastar::futurize<void>::apply<seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:2111
operator() at ././seastar/include/seastar/core/thread.hh:258
 (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::futurize<std::invoke_result<seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>::type>::type seastar::async<seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, seastar::future<void> row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&, (seastar::futurize&&)...)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) at ././seastar/include/seastar/util/noncopyable_function.hh:124
seastar::noncopyable_function<void ()>::operator()() const at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:210
 (inlined by) seastar::thread_context::main() at ./build/release/seastar/./seastar/src/core/thread.cc:299
?? ??:0

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Kernel Version: 3.10.0-1160.95.1.el7.x86_64
Scylla version (or git commit hash): 5.2.7-20230821.e0ebc95025d1 with build-id 0dff18311edc6ec24a66cfcfa280b6a150ab9fc5
Target Scylla version (or git commit hash): 2023.1.1-20230906.f4633ec973b0 with build-id b454e7a22f80cf71a33b2f39e47127225e8fbc13

Cluster size: 4 nodes (n2-highmem-8)

Scylla Nodes used in this run:

  • rolling-upgrade--centos-db-node-e5180d05-0-4 (34.138.32.203 | 10.142.1.20) (shards: 7)
  • rolling-upgrade--centos-db-node-e5180d05-0-3 (34.23.237.122 | 10.142.1.10) (shards: 7)
  • rolling-upgrade--centos-db-node-e5180d05-0-2 (35.237.132.16 | 10.142.1.2) (shards: 7)
  • rolling-upgrade--centos-db-node-e5180d05-0-1 (34.23.18.53 | 10.142.0.250) (shards: 7)

OS / Image: https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/family/centos-7 (gce: us-east1)

Test: rolling-upgrade-centos7-test
Test id: e5180d05-229d-45b7-aedc-9825ab0e2af3
Test name: enterprise-2023.1/rolling-upgrade/rolling-upgrade-centos7-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor e5180d05-229d-45b7-aedc-9825ab0e2af3
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs e5180d05-229d-45b7-aedc-9825ab0e2af3

Logs:

Jenkins job URL
Argus

@avikivity
Copy link
Member

Please decode the kernel traces.

@avikivity
Copy link
Member

Looks like #8828

@juliayakovlev
Copy link
Author

Please decode the kernel traces.

@avikivity
Decoded kernel traces for rolling-upgrade--centos-db-node-e5180d05-0-3:
results.log

@avikivity
Copy link
Member

Closing as duplicate of #8828.

@avikivity avikivity closed this as not planned Won't fix, can't repro, duplicate, stale Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants