A lot of lsa-timing log messages during node replace cause c-s stucked and aborted #13753

aleksbykov · 2023-05-03T07:36:33Z

Installation details
Scylla version (or git commit hash): 5.2.0~rc5-0.20230425.a89867d8c230 with build-id 15657fcbd406469aa63748f0b8aab3330c137b4e
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0859cdda6c3ecfd84(eu-west-1)
instance type: i3en.2xlarge
test-id: db29e3d2-3903-44e5-bc8b-e4c3c74bc4b0
job: https://jenkins.scylladb.com/job/Perf-Regression/job/scylla-release-perf-regression-latency-650gb-with-nemesis/2/

One of operation during latency_with_nemesis write_test was Terminate and replace. Node 3 was terminated and in 5 minutes node was added and start bootstrapping. the boot strapping took about 3 hours. Also during boot strap process in node4 log there are a lot of next messages:

2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - segments to release: 30
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - processed 12 regions: reclaimed from 1, compacted 11
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - evicted memory: 3.127 MiB
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - compacted segments: 72
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - compacted memory: 5.512 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.750 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 1 regions: reclaimed from 1, compacted 0
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 3.877 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 78
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 6.068 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 16000 us, trying to release 3.500 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 2 regions: reclaimed from 1, compacted 1
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 3.500 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 74
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 5.844 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 2.625 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 3 regions: reclaimed from 1, compacted 2
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 2.626 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 57
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 4.524 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 2.750 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 2 regions: reclaimed from 1, compacted 1
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 2.750 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 60
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 4.825 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.625 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 7 regions: reclaimed from 1, compacted 6
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 3.501 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 80
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 6.385 MiB
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.000 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x1a2ad54 0x1a2a5f4 0x1a29ebb 0x205cb14 0x51ced86
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#2}, seastar::future<void>::then_impl_nrvo<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#2}, seastar::future<void> >(sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, auto:1&&)::{lambda()#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#3}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#3}, false> >(seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#3}, false>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, auto:1&&)::{lambda()#3}, false>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false> >(seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6&&)::{lambda(auto:1&&)#1}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6&&)::{lambda(auto:1&&)#1}>(seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6&&)::{lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(auto:1&&)::{lambda(auto:1&&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}&&)::{lambda(auto:1&&)#1}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}&&)::{lambda(auto:1&&)#1}>(seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}&&)::{lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(auto:1) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(auto:1&&)::{lambda(auto:1&&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 13 regions: reclaimed from 1, compacted 12
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 2.626 MiB
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 68
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 5.479 MiB
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.375 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>

2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: Reactor stalled for 6 ms on shard 5. Backtrace: 0x4f45c63 0x4f45050 0x4f46430 0x3cb4f 0x3c2f547 0x18d8ac6 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa 0x8b14c 0x10c9ff
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb1d1df8f 0xffffffffb1cc0ec7 0xffffffffb24de6bd 0xffffffffb24e0866 0xffffffffb25bde13 0xffffffffb25bec89 0xffffffffb25fd16c 0xffffffffb24ce431 0xffffffffb24d0885 0xffffffffb24d0994 0xffffffffb27b118c 0xffffffffb2800099
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb20a47ce 0xffffffffb24e5579 0xffffffffb24e572b 0xffffffffb24e58e8 0xffffffffb25bd957 0xffffffffb25bec89 0xffffffffb25fd16c 0xffffffffb24ce431 0xffffffffb24d0885 0xffffffffb24d0994 0xffffffffb27b118c 0xffffffffb2800099
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb2a0006f 0xffffffffb1ac5834 0xffffffffb27b46f3 0xffffffffb2800eeb
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb2a0006f 0xffffffffb1ac5834 0xffffffffb27b4663 0xffffffffb2800f0b

Right after Scylla was started, next error appeared in node4 log :

2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] init - Scylla version 5.2.0~rc5-0.20230425.a89867d8c230 initialization completed.
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:26.011Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"0ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:26.013Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"0ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:26.014Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"0ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:32.006Z","N":"http","M":"GET /storage_service/host_id","from":"10.4.1.17:52708","status":200,"bytes":210,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:32.018Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13563]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13563]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13563]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:35.005Z","N":"http","M":"GET /storage_service/host_id","from":"10.4.1.17:52708","status":200,"bytes":210,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:35.010Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:35.016Z","N":"http","M":"GET /storage_service/scylla_release_version","from":"10.4.1.17:52708","status":200,"bytes":35,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 5] compaction - [Compact keyspace1.standard1 e3eec3c0-e619-11ed-858b-8cd8ac24a596] Compacting [/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7334-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7285-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7348-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7355-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7341-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7320-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7313-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7306-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7299-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7327-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7292-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7278-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7264-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7271-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7201-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7257-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7250-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7243-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7236-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7215-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7229-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7180-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7222-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7152-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7194-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7131-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7117-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7040-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7103-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7012-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7075-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6963-big-Data.db:level=0:origin=memtable]
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:41.010Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:41.013Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:41.015Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:47.014Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] compaction - [Compact keyspace1.standard1 edeac3b0-e619-11ed-bad7-8cd4ac24a596] Compacting [/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7199-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7227-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7241-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7234-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7248-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7255-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7220-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7213-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7262-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7206-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7178-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7185-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7171-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7192-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7164-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7157-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7094-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7150-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7269-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7143-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7136-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7087-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7129-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7122-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7115-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7108-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7045-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7017-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7073-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6996-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6947-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6982-big-Data.db:level=0:origin=memtable]
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:56.011Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:56.021Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"5ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13577]: scyllaadm : PWD=/home/scyllaadm ; USER=scylla ; COMMAND=/usr/bin/touch /var/lib/scylla-housekeeping/housekeeping.uuid.marked
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13577]: pam_unix(sudo:session): session opened for user scylla(uid=112) by (uid=1000)
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13577]: pam_unix(sudo:session): session closed for user scylla
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] compaction - [Compact keyspace1.standard1 efeb75b0-e619-11ed-9d4a-8cd7ac24a596] Compacting [/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7249-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7277-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7291-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7284-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7298-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7305-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7312-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7270-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7263-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7256-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7228-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7235-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7221-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7242-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7214-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7207-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7200-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7319-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7193-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7123-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7186-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7179-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7172-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7165-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7158-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7151-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7144-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7137-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7116-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7067-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7102-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7046-big-Data.db:level=0:origin=memtable]
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:02.014Z","N":"http","M":"GET /storage_service/host_id","from":"10.4.1.17:52708","status":200,"bytes":210,"duration":"3ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | syslog-ng[4697]: Syslog connection established; fd='16', server='AF_INET(10.4.3.72:32768)', local='AF_INET(0.0.0.0:0)'
2023-04-28T23:11:04+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13653]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-04-28T23:11:04+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13653]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:04+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13653]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:10+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13665]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/sed -i -e 's/^replace_node_first_boot:/# replace_node_first_boot:/g' /etc/scylla/scylla.yaml
2023-04-28T23:11:10+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13665]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:10+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13665]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13667]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/grep ^SCYLLA_ARGS= /etc/default/scylla-server
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13667]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13667]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] gossip - 60000 ms elapsed, 10.4.2.143 gossip quarantine over
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:11.010Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"0ms"}
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:11.013Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"0ms"}
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:11.015Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - Set host_id=9c6c8303-17d2-4f27-8314-e36da32806f6 to be owned by node=10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -8542634246853768106. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token 165501024429053374. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -7446035036376763312. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -4339697659981632653. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token 2828690592628119438. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token 8130484774468784335. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -9089163918726289559. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -6630159402286116649. Ignoring 10.4.2.143

once handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -6630159402286116649. Ignoring 10.4.2.143 start appearing in log, c-s start reporting next errors:

2023-04-28 23:17:07.964: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=be5165af-25c8-4b03-a19b-ea6713f6d4ae during_nemesis=RunUniqueSequence: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=4619 node=Node perf-latency-nemesis-ubuntu-loader-node-db29e3d2-2 [54.170.153.56 | 10.4.1.75] (seed: False)
java.io.IOException: Operation x10 on key(s) [384d4d334f3335363530]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
2023-04-28 23:17:07.964: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=87c9a43f-f1f2-4871-89c3-40553ae3842a during_nemesis=RunUniqueSequence: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=4571 node=Node perf-latency-nemesis-ubuntu-loader-node-db29e3d2-3 [52.212.240.95 | 10.4.1.10] (seed: False)
java.io.IOException: Operation x10 on key(s) [364d3937344f4e4e3431]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
2023-04-28 23:17:07.965: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=87c9a43f-f1f2-4871-89c3-40553ae3842a during_nemesis=RunUniqueSequence: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=4587 node=Node perf-latency-nemesis-ubuntu-loader-node-db29e3d2-3 [52.212.240.95 | 10.4.1.10] (seed: False)
java.io.IOException: Operation x10 on key(s) [4c32344c334b4d504e30]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)

and test was terminated by crirical issue

db_logs: https://cloudius-jenkins-test.s3.amazonaws.com/db29e3d2-3903-44e5-bc8b-e4c3c74bc4b0/20230428_231723/db-cluster-db29e3d2.tar.gz

monitoring:
http://3.235.167.195:3000/d/Pnn1R-s4k/scylla-release-perf-regression-latency-650gb-with-nemesis-scylla-per-server-metrics-nemesis-master?from=1682676687923&to=1682726481734&var-by=dc&var-cluster=&var-dc=All&var-node=All&var-shard=All&var-mount_point=&var-func=sum&var-dash_version=5-2&var-all_scyllas_versions=All&var-count_dc=All&var-scylla_version=5.2&var-monitoring_version=4.2.1&orgId=1
http://3.235.167.195:3000/d/advanced-5-2/advanced?orgId=1&refresh=30s&from=1682676687923&to=1682726481734

Similar happened also during test_read when new node was bootstrapped with replace and during bootstrapping new nodes
db_logs: https://cloudius-jenkins-test.s3.amazonaws.com/cc78a106-e5b6-4b9a-bfca-44c1dce36118/20230429_030853/db-cluster-cc78a106.tar.gz

The text was updated successfully, but these errors were encountered:

aleksbykov · 2023-05-03T07:42:47Z

Green lines on graf is bootstrapped with replace new node

mykaul · 2023-05-03T08:18:08Z

@aleksbykov - can you describe the topology and the workload? Specifically, how many nodes and what was the workload? If I'm seeing correctly, you only had 3 nodes and you've replaced one of them? If the load is too high, I reckon the other 2 'live' nodes could not sustain it (just a theory, looking at the above - haven't looked at the logs or monitor yet!)

aleksbykov · 2023-05-03T08:20:49Z

topology is 3 nodes. 1 node was terminated and was replaced with new one.
next command are running from 4 loaders for write test:

stress_cmd_w: "cassandra-stress write no-warmup cl=QUORUM duration=800m -schema 'replication(factor=3)' -mode cql3 native -rate 'threads=250 fixed=20332/s' -col 'size=FIXED(128) n=FIXED(8)' -pop 'dist=gauss(1..650000000,325000000,9750000)' "

DoronArazii · 2023-05-03T11:09:52Z

@avikivity / @tgrabiec can you please give some directions

avikivity · 2023-05-03T11:11:30Z

Suspecting #8828.

avikivity · 2023-05-03T11:42:21Z

I see many tcp related kernel stalls, but also #8828 stalls.

[Backtrace #19]
0xffffffffffffff80 (1065761408 bytes after last symbol)
clear_page_erms+0x7
get_page_from_freelist+0x353
__alloc_pages+0x17e
alloc_pages_vma+0x29e
do_huge_pmd_anonymous_page+0x13c
__handle_mm_fault+0x6e7
handle_mm_fault+0xd8
do_user_addr_fault+0x1c9
exc_page_fault+0x77
asm_exc_page_fault+0x27
copy_user_enhanced_fast_string+0xe
simple_copy_to_iter+0x39
__skb_datagram_iter+0x78
skb_copy_datagram_iter+0x38
tcp_recvmsg_locked+0x2a7
tcp_recvmsg+0x79
inet_recvmsg+0x5c
sock_recvmsg+0x71
__sys_recvfrom+0xe5
__x64_sys_recvfrom+0x24
do_syscall_64+0x5c
entry_SYSCALL_64_after_hwframe+0x61

It's hard to be sure because the release predates seastar/seastar@d41af8b59237a949620a9e0b79541b934e0a5375. Still, it's a good indication.

avikivity · 2023-05-03T12:18:31Z

High shares in background_reclaim indicate very high memory pressure, but low cpu activity.

avikivity · 2023-05-03T12:23:33Z

Apr 28 19:03:30 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4 scylla[5319]: Reactor stalled for 8 ms on shard 1. Backtrace:
[Backtrace #1]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
unsigned int utils::uleb64_decode_forwards<void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)>(char const*&, void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)) at ././utils/vle.hh:105
 (inlined by) logalloc::region_impl::object_descriptor::decode_forwards(char const*&) at ./utils/logalloc.cc:1624
 (inlined by) void logalloc::region_impl::for_each_live<logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}>(logalloc::segment*, logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}&&) at ./utils/logalloc.cc:1722
 (inlined by) logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&) at ./utils/logalloc.cc:1878
logalloc::region_impl::compact() at ./utils/logalloc.cc:2218
 (inlined by) logalloc::tracker::impl::compact_and_evict_locked(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2699
logalloc::tracker::impl::compact_and_evict(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2623
 (inlined by) logalloc::segment_pool::allocate_segment(unsigned long) at ./utils/logalloc.cc:1284
 (inlined by) logalloc::segment_pool::refill_emergency_reserve() at ./utils/logalloc.cc:1297
 (inlined by) logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2876
decltype(auto) logalloc::allocating_section::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(logalloc::region&, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:469
 (inlined by) decltype(auto) partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::lsa_partition_reader::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././partition_snapshot_reader.hh:239
 (inlined by) operator() at ././partition_snapshot_reader.hh:490
 (inlined by) seastar::internal::do_until_state<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#1}, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}>::run_and_dispose() at ././seastar/include/seastar/core/loop.hh:297
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4326
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0

That's not a true #8828, the stall is during decode, which means it's just struggling to reclaim memory.

avikivity · 2023-05-03T15:37:16Z

Memory stress around 21:09, will cross reference with the logs:

avikivity · 2023-05-03T15:39:00Z

Also seen with cache memory usage. Certainly something strange is going on.

avikivity · 2023-05-03T15:43:38Z

Free memory. 21:09 anomaly is visible, but actually more memory is free for the bad shard.

avikivity · 2023-05-03T15:47:12Z

A #8828 stall (but I don't believe that's the main issue):

Apr 28 19:04:23 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4 scylla[5319]: Reactor stalled for 9 ms on shard 2. Backtrace:
[Backtrace #35]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
utils::uleb64_express_encode_impl(char*&, unsigned long, unsigned long) at ././utils/vle.hh:73
 (inlined by) void utils::uleb64_express_encode<void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)>(char*&, unsigned int, unsigned long, unsigned long, void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)) at ././utils/vle.hh:82
 (inlined by) logalloc::region_impl::object_descriptor::encode(char*&, unsigned long, unsigned long) const at ./utils/logalloc.cc:1620
 (inlined by) logalloc::region_impl::alloc_small(logalloc::region_impl::object_descriptor const&, unsigned int, unsigned long) at ./utils/logalloc.cc:1705
operator() at ./utils/logalloc.cc:1879
 (inlined by) void logalloc::region_impl::for_each_live<logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}>(logalloc::segment*, logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}&&) at ./utils/logalloc.cc:1725
 (inlined by) logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&) at ./utils/logalloc.cc:1878
logalloc::region_impl::compact() at ./utils/logalloc.cc:2218
 (inlined by) logalloc::reclaim_from_evictable(logalloc::region_impl&, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2537
logalloc::tracker::impl::compact_and_evict_locked(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2696
logalloc::tracker::impl::compact_and_evict(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2623
 (inlined by) logalloc::segment_pool::allocate_segment(unsigned long) at ./utils/logalloc.cc:1284
 (inlined by) logalloc::segment_pool::refill_emergency_reserve() at ./utils/logalloc.cc:1297
 (inlined by) logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2876
decltype(auto) logalloc::allocating_section::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(logalloc::region&, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:469
 (inlined by) decltype(auto) partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::lsa_partition_reader::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././partition_snapshot_reader.hh:239
 (inlined by) operator() at ././partition_snapshot_reader.hh:490
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::future<void> seastar::do_until<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#1}, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}) at ././seastar/include/seastar/core/loop.hh:343
 (inlined by) partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer() at ././partition_snapshot_reader.hh:489
operator() at ././readers/flat_mutation_reader_v2.hh:209
 (inlined by) seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&>(flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&>(flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::internal::repeater<flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}>::run_and_dispose() at ././seastar/include/seastar/core/loop.hh:73
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4326
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0
?? ??:0

avikivity · 2023-05-03T15:51:20Z

Looking for the memory consumer. Execution stages are clean.

avikivity · 2023-05-03T15:52:26Z

coordinator background reads/writes are clean.

avikivity · 2023-05-03T15:53:47Z

No anomaly in memtables:

avikivity · 2023-05-03T15:56:19Z

Active reads on the bad shard go to zero 21:09.

avikivity · 2023-05-03T16:00:58Z

Something strange. On all nodes, the system reader concurrency semaphore, shard 6, reports 176MB of memory in use by active reads.

avikivity · 2023-05-03T16:01:22Z

/cc @denesb is this some leak?

avikivity · 2023-05-03T16:02:49Z

The shard that suffers the memory anomaly at 21:09 is also shard 6. Coincidence? I think not.

avikivity · 2023-05-03T16:08:28Z

QUORUM queries on all nodes all shard. Shard 6 of the bad node shows 0 for most of the time (even before the anomaly). I think it is cursed.

avikivity · 2023-05-03T16:09:11Z

At 23:10 it somehow recovers.

avikivity · 2023-05-03T16:10:27Z

Nothing special in reactor tasks pending.

avikivity · 2023-05-03T16:11:57Z

Repair rx byte rate dropped to zero when the anomaly started:

avikivity · 2023-05-03T16:12:26Z

(only on shard 6 of course)

avikivity · 2023-05-03T16:14:41Z

The logs don't show anything special at 21:09, and the trouble (in the logs) started long before that.

avikivity · 2023-05-03T16:16:53Z

21:09 is partially explained as that's when the data transfer part of repair completed.

avikivity · 2023-05-03T16:20:54Z

CPU steal time looks broken.

avikivity · 2023-05-03T16:22:27Z

But what happened at 21:09 shard 6? Repair completed, so it started offstrategy reshape. Is reshape so bad it consumed gigabytes of memory? Sounds unrealistic. /cc @raphaelsc

avikivity · 2023-05-03T16:25:33Z

Mr. 6 sees a large spike in aio writes.

So:

shard 6 repair completed early, starts reshape
not much competition, so @xemul gives it more disk bandwidth
does that cause compaction to consume more memory? unlikely

michoecho · 2023-06-15T11:31:27Z

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Yes, and this is a good idea regardless of the problems it fixes. 1 is not a sane default number of shares for anything.

denesb · 2023-06-15T12:15:45Z

Why didn't we see this before?

avikivity · 2023-06-15T12:45:32Z

Maybe we did, and the reports were buried under other problems.

michoecho · 2023-06-15T13:06:57Z

Why didn't we see this before?

This test is new. Maybe the problem was there in older tests but the delays induced by it didn't happen to be quite large enough to last through all of cassandra-stress's retries. And we only noticed it when it finally caused a test to have a hard failure.

xemul · 2023-06-15T14:08:00Z

Why didn't we see this before?

We did, pretty much every customer-side "large IO delays" metrics I looked at had default class terrible latencies

mykaul · 2023-06-18T09:25:52Z

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Is there a specific bug about this request (raise the default priority class shares to 200) ?

michoecho · 2023-06-18T09:45:24Z

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Is there a specific bug about this request (raise the default priority class shares to 200) ?

No.

mykaul · 2023-06-18T09:53:44Z

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Is there a specific bug about this request (raise the default priority class shares to 200) ?

No.

#14281

avikivity · 2023-06-18T11:50:26Z

Fixed by 66e4391 (indirectly)

The number of shares is supposed to be set in the 1-1000 range, and user-defined IO classes have shares on the order of hundreds in practice. Therefore, 1 share is not a sane default, because the default class can very easily be starved by accident. We have seen in practice for example in scylladb/scylladb#13753.

This allows us to start backporting Seastar patches. Ref #13753.

* seastar f94b1bb9cb...e45cef9ce8 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753.

The number of shares is supposed to be set in the 1-1000 range, and user-defined IO classes have shares on the order of hundreds in practice. Therefore, 1 share is not a sane default, because the default class can very easily be starved by accident. We have seen in practice for example in scylladb/scylladb#13753. (cherry picked from commit e45cef9)

* seastar 32ab15cda6...29a0e64513 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753. In 5.3: 37e6e65

The number of shares is supposed to be set in the 1-1000 range, and user-defined IO classes have shares on the order of hundreds in practice. Therefore, 1 share is not a sane default, because the default class can very easily be starved by accident. We have seen in practice for example in scylladb/scylladb#13753. (cherry picked from commit e45cef9)

* seastar 8d7cc3129d...5c27348333 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753. In 5.3: 37e6e65

avikivity · 2023-06-21T18:25:33Z

Alternative fix queued for 5.1, 5.2, 5.3.

fgelcer added the triage/oss label May 3, 2023

mykaul modified the milestones: 5.3, 5.4 Jun 18, 2023

mykaul mentioned this issue Jun 18, 2023

Increase the default priority class shares to 200 (on stable branches) #14281

Closed

avikivity closed this as completed Jun 18, 2023

DoronArazii added the Backport candidate label Jun 18, 2023

michoecho mentioned this issue Jun 19, 2023

reactor: change shares for default IO class from 1 to 200 scylladb/seastar#1706

Closed

michoecho mentioned this issue Jun 21, 2023

reactor: change shares for default IO class from 1 to 200 scylladb/scylla-seastar#3

Merged

avikivity added a commit that referenced this issue Jun 21, 2023

Switch seastar submodule to scylla-seastar.git

994645c

This allows us to start backporting Seastar patches. Ref #13753.

avikivity added a commit that referenced this issue Jun 21, 2023

Update seastar submodule (default priority class shares)

37e6e65

* seastar f94b1bb9cb...e45cef9ce8 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753.

avikivity added a commit that referenced this issue Jun 21, 2023

Update seastar submodule (default priority class shares)

73b8669

* seastar 32ab15cda6...29a0e64513 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753. In 5.3: 37e6e65

avikivity added a commit that referenced this issue Jun 21, 2023

Update seastar submodule (default priority class shares)

342d13e

* seastar 8d7cc3129d...5c27348333 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753. In 5.3: 37e6e65

avikivity removed the Backport candidate label Jun 21, 2023

DoronArazii removed the backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed label Jun 21, 2023

xemul mentioned this issue Jul 26, 2023

cqlsh fails with "Operation timed out for system.peers" (after nodetool refresh used) #11016

Closed

A lot of lsa-timing log messages during node replace cause c-s stucked and aborted #13753

A lot of lsa-timing log messages during node replace cause c-s stucked and aborted #13753

Comments

aleksbykov commented May 3, 2023

aleksbykov commented May 3, 2023

mykaul commented May 3, 2023

aleksbykov commented May 3, 2023

DoronArazii commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

avikivity commented May 3, 2023

michoecho commented Jun 15, 2023 • edited

denesb commented Jun 15, 2023

avikivity commented Jun 15, 2023

michoecho commented Jun 15, 2023 • edited

xemul commented Jun 15, 2023

mykaul commented Jun 18, 2023 • edited

michoecho commented Jun 18, 2023

mykaul commented Jun 18, 2023

avikivity commented Jun 18, 2023

avikivity commented Jun 21, 2023

michoecho commented Jun 15, 2023 •

edited

michoecho commented Jun 15, 2023 •

edited

mykaul commented Jun 18, 2023 •

edited