
A lot of lsa-timing log messages during node replace cause c-s to get stuck and abort #13753

Closed
aleksbykov opened this issue May 3, 2023 · 106 comments
Labels: P2 High Priority

@aleksbykov
Contributor

Installation details
Scylla version (or git commit hash): 5.2.0~rc5-0.20230425.a89867d8c230 with build-id 15657fcbd406469aa63748f0b8aab3330c137b4e
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0859cdda6c3ecfd84(eu-west-1)
instance type: i3en.2xlarge
test-id: db29e3d2-3903-44e5-bc8b-e4c3c74bc4b0
job: https://jenkins.scylladb.com/job/Perf-Regression/job/scylla-release-perf-regression-latency-650gb-with-nemesis/2/

One of the operations during the latency_with_nemesis write_test was terminate-and-replace. Node 3 was terminated and, 5 minutes later, a new node was added and started bootstrapping. The bootstrapping took about 3 hours. During the bootstrap, the node4 log contained a large number of the following messages:

2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - segments to release: 30
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - processed 12 regions: reclaimed from 1, compacted 11
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - evicted memory: 3.127 MiB
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - compacted segments: 72
2023-04-28T20:17:36+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] lsa-timing - - compacted memory: 5.512 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.750 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 1 regions: reclaimed from 1, compacted 0
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 3.877 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 78
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 6.068 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 16000 us, trying to release 3.500 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 2 regions: reclaimed from 1, compacted 1
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 3.500 MiB
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 74
2023-04-28T20:17:41+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 5.844 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 2.625 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 3 regions: reclaimed from 1, compacted 2
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 2.626 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 57
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 4.524 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 2.750 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 2 regions: reclaimed from 1, compacted 1
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 2.750 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 60
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 4.825 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.625 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 7 regions: reclaimed from 1, compacted 6
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 3.501 MiB
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 80
2023-04-28T20:17:42+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 6.385 MiB
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.000 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x1a2ad54 0x1a2a5f4 0x1a29ebb 0x205cb14 0x51ced86
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#2}, seastar::future<void>::then_impl_nrvo<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#2}, seastar::future<void> >(sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, auto:1&&)::{lambda()#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#3}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#3}, false> >(seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33&&)::{lambda()#3}, false>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<seastar::async<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_33>(seastar::thread_attributes, auto:1&&)::{lambda()#3}, false>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false> >(seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<sstables::sstable::write_components(flat_mutation_reader_v2, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::$_34, false>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6&&)::{lambda(auto:1&&)#1}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6&&)::{lambda(auto:1&&)#1}>(seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6&&)::{lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_one(replica::memtable_list&, replica::flush_permit&&)::$_6>(auto:1&&)::{lambda(auto:1&&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}&&)::{lambda(auto:1&&)#1}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}&&)::{lambda(auto:1&&)#1}>(seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(replica::flush_permit) const::{lambda(std::__exception_ptr::exception_ptr)#1}&&)::{lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::handle_exception<replica::dirty_memory_manager::flush_when_needed()::$_2::operator()() const::{lambda()#2}::operator()() const::{lambda(auto:1)#1}::operator()<replica::flush_permit>(auto:1) const::{lambda(std::__exception_ptr::exception_ptr)#1}>(auto:1&&)::{lambda(auto:1&&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - segments to release: 30
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - processed 13 regions: reclaimed from 1, compacted 12
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - evicted memory: 2.626 MiB
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted segments: 68
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] lsa-timing - - compacted memory: 5.479 MiB
2023-04-28T20:17:44+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4  !WARNING | scylla[5319]:  [shard 4] lsa-timing - compact took 12000 us, trying to release 3.375 MiB non-preemptibly, reserve: {goal: 0, max: 30}, at 0x52e2e5e 0x52e3410 0x52e36f8 0x1bfb55d 0x1c00f7c 0x1c08600 0x19db713 0x18ac6e3 0x174fbe9 0x17e771f 0x1807509 0x18070dd 0x4f07d33 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa /opt/scylladb/libreloc/libc.so.6+0x8b14c /opt/scylladb/libreloc/libc.so.6+0x10c9ff
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::$_1>

2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: Reactor stalled for 6 ms on shard 5. Backtrace: 0x4f45c63 0x4f45050 0x4f46430 0x3cb4f 0x3c2f547 0x18d8ac6 0x4f56544 0x4f577c7 0x4f785b1 0x4f288aa 0x8b14c 0x10c9ff
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb1d1df8f 0xffffffffb1cc0ec7 0xffffffffb24de6bd 0xffffffffb24e0866 0xffffffffb25bde13 0xffffffffb25bec89 0xffffffffb25fd16c 0xffffffffb24ce431 0xffffffffb24d0885 0xffffffffb24d0994 0xffffffffb27b118c 0xffffffffb2800099
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb20a47ce 0xffffffffb24e5579 0xffffffffb24e572b 0xffffffffb24e58e8 0xffffffffb25bd957 0xffffffffb25bec89 0xffffffffb25fd16c 0xffffffffb24ce431 0xffffffffb24d0885 0xffffffffb24d0994 0xffffffffb27b118c 0xffffffffb2800099
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb2a0006f 0xffffffffb1ac5834 0xffffffffb27b46f3 0xffffffffb2800eeb
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack:
2023-04-28T20:19:43+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]: kernel callstack: 0xffffffffffffff80 0xffffffffb2a0006f 0xffffffffb1ac5834 0xffffffffb27b4663 0xffffffffb2800f0b

Right after Scylla started, the following errors appeared in the node4 log:

2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] init - Scylla version 5.2.0~rc5-0.20230425.a89867d8c230 initialization completed.
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:26.011Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"0ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:26.013Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"0ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:26.014Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"0ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:32.006Z","N":"http","M":"GET /storage_service/host_id","from":"10.4.1.17:52708","status":200,"bytes":210,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:32.018Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13563]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13563]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13563]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:35.005Z","N":"http","M":"GET /storage_service/host_id","from":"10.4.1.17:52708","status":200,"bytes":210,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:35.010Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:35.016Z","N":"http","M":"GET /storage_service/scylla_release_version","from":"10.4.1.17:52708","status":200,"bytes":35,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 5] compaction - [Compact keyspace1.standard1 e3eec3c0-e619-11ed-858b-8cd8ac24a596] Compacting [/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7334-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7285-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7348-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7355-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7341-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7320-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7313-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7306-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7299-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7327-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7292-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7278-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7264-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7271-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7201-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7257-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7250-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7243-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7236-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7215-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7229-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7180-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7222-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7152-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7194-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7131-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7117-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7040-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240
e5b111ed97fd55e9de3cc62e/me-7103-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7012-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7075-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6963-big-Data.db:level=0:origin=memtable]
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:41.010Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:41.013Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:41.015Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:47.014Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 3] compaction - [Compact keyspace1.standard1 edeac3b0-e619-11ed-bad7-8cd4ac24a596] Compacting [/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7199-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7227-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7241-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7234-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7248-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7255-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7220-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7213-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7262-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7206-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7178-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7185-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7171-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7192-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7164-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7157-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7094-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7150-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7269-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7143-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7136-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7087-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7129-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7122-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7115-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7108-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7045-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7017-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240
e5b111ed97fd55e9de3cc62e/me-7073-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6996-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6947-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-6982-big-Data.db:level=0:origin=memtable]
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:56.011Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"1ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:10:56.021Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"5ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13577]: scyllaadm : PWD=/home/scyllaadm ; USER=scylla ; COMMAND=/usr/bin/touch /var/lib/scylla-housekeeping/housekeeping.uuid.marked
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13577]: pam_unix(sudo:session): session opened for user scylla(uid=112) by (uid=1000)
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13577]: pam_unix(sudo:session): session closed for user scylla
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 4] compaction - [Compact keyspace1.standard1 efeb75b0-e619-11ed-9d4a-8cd7ac24a596] Compacting [/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7249-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7277-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7291-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7284-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7298-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7305-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7312-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7270-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7263-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7256-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7228-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7235-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7221-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7242-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7214-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7207-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7200-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7319-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7193-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7123-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7186-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7179-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7172-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7165-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7158-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7151-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7144-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7137-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240
e5b111ed97fd55e9de3cc62e/me-7116-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7067-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7102-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/keyspace1/standard1-4ddaf240e5b111ed97fd55e9de3cc62e/me-7046-big-Data.db:level=0:origin=memtable]
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:02.014Z","N":"http","M":"GET /storage_service/host_id","from":"10.4.1.17:52708","status":200,"bytes":210,"duration":"3ms"}
2023-04-28T23:11:03+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | syslog-ng[4697]: Syslog connection established; fd='16', server='AF_INET(10.4.3.72:32768)', local='AF_INET(0.0.0.0:0)'
2023-04-28T23:11:04+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13653]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-04-28T23:11:04+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13653]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:04+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13653]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:10+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13665]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/sed -i -e 's/^replace_node_first_boot:/# replace_node_first_boot:/g' /etc/scylla/scylla.yaml
2023-04-28T23:11:10+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13665]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:10+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13665]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4   !NOTICE | sudo[13667]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/grep ^SCYLLA_ARGS= /etc/default/scylla-server
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13667]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | sudo[13667]: pam_unix(sudo:session): session closed for user root
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] gossip - 60000 ms elapsed, 10.4.2.143 gossip quarantine over
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:11.010Z","N":"http","M":"GET /gossiper/endpoint/live/","from":"10.4.1.17:52708","status":200,"bytes":36,"duration":"0ms"}
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:11.013Z","N":"http","M":"GET /storage_service/nodes/joining","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"0ms"}
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla-manager-agent[6513]: {"L":"INFO","T":"2023-04-28T23:11:11.015Z","N":"http","M":"GET /storage_service/nodes/leaving","from":"10.4.1.17:52708","status":200,"bytes":2,"duration":"1ms"}
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - Set host_id=9c6c8303-17d2-4f27-8314-e36da32806f6 to be owned by node=10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -8542634246853768106. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token 165501024429053374. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -7446035036376763312. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -4339697659981632653. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token 2828690592628119438. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token 8130484774468784335. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -9089163918726289559. Ignoring 10.4.2.143
2023-04-28T23:11:11+00:00 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4     !INFO | scylla[5319]:  [shard 0] storage_service - handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -6630159402286116649. Ignoring 10.4.2.143

Once the "handle_state_normal: Nodes 10.4.2.143 and 10.4.0.27 have the same token -6630159402286116649. Ignoring 10.4.2.143" messages started appearing in the log, c-s started reporting the following errors:

2023-04-28 23:17:07.964: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=be5165af-25c8-4b03-a19b-ea6713f6d4ae during_nemesis=RunUniqueSequence: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=4619 node=Node perf-latency-nemesis-ubuntu-loader-node-db29e3d2-2 [54.170.153.56 | 10.4.1.75] (seed: False)
java.io.IOException: Operation x10 on key(s) [384d4d334f3335363530]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
2023-04-28 23:17:07.964: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=87c9a43f-f1f2-4871-89c3-40553ae3842a during_nemesis=RunUniqueSequence: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=4571 node=Node perf-latency-nemesis-ubuntu-loader-node-db29e3d2-3 [52.212.240.95 | 10.4.1.10] (seed: False)
java.io.IOException: Operation x10 on key(s) [364d3937344f4e4e3431]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
2023-04-28 23:17:07.965: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=87c9a43f-f1f2-4871-89c3-40553ae3842a during_nemesis=RunUniqueSequence: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=4587 node=Node perf-latency-nemesis-ubuntu-loader-node-db29e3d2-3 [52.212.240.95 | 10.4.1.10] (seed: False)
java.io.IOException: Operation x10 on key(s) [4c32344c334b4d504e30]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)

and the test was terminated by this critical issue.
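For context on the timeouts above, here is a minimal sketch of the quorum arithmetic (my own illustration, not part of the test code; it assumes RF=3, as in the replication(factor=3) setting of the stress command posted later in the thread): a QUORUM write needs floor(RF/2)+1 = 2 acknowledgements, so with the replaced node still streaming and another replica stalling on memory reclaim, only one replica can answer in time.

# Minimal sketch (not from the test code): why the CL=QUORUM writes time out here.
def quorum(replication_factor: int) -> int:
    """Acks required for a QUORUM write: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

rf = 3               # replication(factor=3), per the stress command below
responsive = 1       # one replica replaced/streaming, one stalled on reclaim
needed = quorum(rf)  # -> 2, matching "2 replica were required"

status = "ok" if responsive >= needed else "WriteTimeoutException"
print(f"need {needed} acks, got {responsive}: {status}")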

db_logs: https://cloudius-jenkins-test.s3.amazonaws.com/db29e3d2-3903-44e5-bc8b-e4c3c74bc4b0/20230428_231723/db-cluster-db29e3d2.tar.gz

monitoring:
http://3.235.167.195:3000/d/Pnn1R-s4k/scylla-release-perf-regression-latency-650gb-with-nemesis-scylla-per-server-metrics-nemesis-master?from=1682676687923&to=1682726481734&var-by=dc&var-cluster=&var-dc=All&var-node=All&var-shard=All&var-mount_point=&var-func=sum&var-dash_version=5-2&var-all_scyllas_versions=All&var-count_dc=All&var-scylla_version=5.2&var-monitoring_version=4.2.1&orgId=1
http://3.235.167.195:3000/d/advanced-5-2/advanced?orgId=1&refresh=30s&from=1682676687923&to=1682726481734

The same behavior also occurred during test_read, when a new node was bootstrapped via replace, and while bootstrapping new nodes.
db_logs: https://cloudius-jenkins-test.s3.amazonaws.com/cc78a106-e5b6-4b9a-bfca-44c1dce36118/20230429_030853/db-cluster-cc78a106.tar.gz

@aleksbykov
Contributor Author

Screenshot from 2023-05-03 14-42-10

The green lines on the graph are the new node bootstrapped via replace.

@mykaul
Contributor

mykaul commented May 3, 2023

@aleksbykov - can you describe the topology and the workload? Specifically, how many nodes and what was the workload? If I'm seeing correctly, you only had 3 nodes and you've replaced one of them? If the load is too high, I reckon the other 2 'live' nodes could not sustain it (just a theory, looking at the above - haven't looked at the logs or monitor yet!)

@aleksbykov
Contributor Author

The topology is 3 nodes. 1 node was terminated and replaced with a new one.
The following command is run from 4 loaders for the write test:

stress_cmd_w: "cassandra-stress write no-warmup cl=QUORUM duration=800m -schema 'replication(factor=3)' -mode cql3 native -rate 'threads=250 fixed=20332/s' -col 'size=FIXED(128) n=FIXED(8)' -pop 'dist=gauss(1..650000000,325000000,9750000)' "
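A rough back-of-the-envelope reading of this command (my own sketch; it assumes fixed=20332/s is the total rate per cassandra-stress process, not per thread): the four loaders together push roughly 81k writes/s, and with replication(factor=3) on a 3-node cluster every node is a replica of every write, so the two surviving nodes plus the streaming replacement have to absorb the full rate.

# Rough load arithmetic for the workload above (illustrative; assumes the
# 'fixed=20332/s' rate applies per loader process).
loaders = 4
rate_per_loader = 20332                     # writes/s per loader
replication_factor = 3
nodes = 3

client_writes = loaders * rate_per_loader   # ~81,328 writes/s in total
per_node = client_writes * replication_factor / nodes

print(f"total client writes/s: {client_writes}")
print(f"replica writes/s per node (RF=3 on 3 nodes): {per_node:.0f}")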

@DoronArazii

@avikivity / @tgrabiec, can you please give some direction?

@avikivity
Member

Suspecting #8828.

@avikivity
Member

I see many TCP-related kernel stalls, but also #8828 stalls.

[Backtrace #19]
0xffffffffffffff80 (1065761408 bytes after last symbol)
clear_page_erms+0x7
get_page_from_freelist+0x353
__alloc_pages+0x17e
alloc_pages_vma+0x29e
do_huge_pmd_anonymous_page+0x13c
__handle_mm_fault+0x6e7
handle_mm_fault+0xd8
do_user_addr_fault+0x1c9
exc_page_fault+0x77
asm_exc_page_fault+0x27
copy_user_enhanced_fast_string+0xe
simple_copy_to_iter+0x39
__skb_datagram_iter+0x78
skb_copy_datagram_iter+0x38
tcp_recvmsg_locked+0x2a7
tcp_recvmsg+0x79
inet_recvmsg+0x5c
sock_recvmsg+0x71
__sys_recvfrom+0xe5
__x64_sys_recvfrom+0x24
do_syscall_64+0x5c
entry_SYSCALL_64_after_hwframe+0x61

It's hard to be sure because the release predates seastar/seastar@d41af8b59237a949620a9e0b79541b934e0a5375. Still, it's a good indication.

@avikivity
Member

High shares in background_reclaim indicate very high memory pressure, but low cpu activity.

image

@avikivity
Member

Apr 28 19:03:30 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4 scylla[5319]: Reactor stalled for 8 ms on shard 1. Backtrace:
[Backtrace #1]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
unsigned int utils::uleb64_decode_forwards<void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)>(char const*&, void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)) at ././utils/vle.hh:105
 (inlined by) logalloc::region_impl::object_descriptor::decode_forwards(char const*&) at ./utils/logalloc.cc:1624
 (inlined by) void logalloc::region_impl::for_each_live<logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}>(logalloc::segment*, logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}&&) at ./utils/logalloc.cc:1722
 (inlined by) logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&) at ./utils/logalloc.cc:1878
logalloc::region_impl::compact() at ./utils/logalloc.cc:2218
 (inlined by) logalloc::tracker::impl::compact_and_evict_locked(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2699
logalloc::tracker::impl::compact_and_evict(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2623
 (inlined by) logalloc::segment_pool::allocate_segment(unsigned long) at ./utils/logalloc.cc:1284
 (inlined by) logalloc::segment_pool::refill_emergency_reserve() at ./utils/logalloc.cc:1297
 (inlined by) logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2876
decltype(auto) logalloc::allocating_section::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(logalloc::region&, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:469
 (inlined by) decltype(auto) partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::lsa_partition_reader::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././partition_snapshot_reader.hh:239
 (inlined by) operator() at ././partition_snapshot_reader.hh:490
 (inlined by) seastar::internal::do_until_state<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#1}, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}>::run_and_dispose() at ././seastar/include/seastar/core/loop.hh:297
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4326
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0

That's not a true #8828; the stall is during decode, which means it's just struggling to reclaim memory.
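To make that concrete, here is a conceptual sketch of what LSA segment compaction has to do (my own illustration; the actual implementation is the C++ compact_segment_locked() path in utils/logalloc.cc shown in the backtrace): each live object in a segment is decoded and relocated into a fresh segment, so under heavy memory pressure the reactor spends milliseconds of uninterrupted CPU just moving data around.

# Conceptual sketch of LSA segment compaction (illustrative only; the real
# implementation lives in Scylla's utils/logalloc.cc).
def compact_segment(live_objects, alloc_in_new_segment):
    """Relocate every live object of a segment into a fresh segment."""
    reclaimed = 0
    for payload in live_objects:       # for_each_live(): decode each descriptor
        alloc_in_new_segment(payload)  # alloc_small() + copy into the new segment
        reclaimed += len(payload)
    return reclaimed                   # the old segment can then be freed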

@avikivity
Member

Memory stress around 21:09; will cross-reference with the logs:

image

@avikivity
Member

Also seen with cache memory usage. Certainly something strange is going on.

image

@avikivity
Member

Free memory. The 21:09 anomaly is visible, but actually more memory is free on the bad shard.

image

@avikivity
Member

A #8828 stall (but I don't believe that's the main issue):

Apr 28 19:04:23 perf-latency-nemesis-ubuntu-db-node-db29e3d2-4 scylla[5319]: Reactor stalled for 9 ms on shard 2. Backtrace:
[Backtrace #35]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
utils::uleb64_express_encode_impl(char*&, unsigned long, unsigned long) at ././utils/vle.hh:73
 (inlined by) void utils::uleb64_express_encode<void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)>(char*&, unsigned int, unsigned long, unsigned long, void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)) at ././utils/vle.hh:82
 (inlined by) logalloc::region_impl::object_descriptor::encode(char*&, unsigned long, unsigned long) const at ./utils/logalloc.cc:1620
 (inlined by) logalloc::region_impl::alloc_small(logalloc::region_impl::object_descriptor const&, unsigned int, unsigned long) at ./utils/logalloc.cc:1705
operator() at ./utils/logalloc.cc:1879
 (inlined by) void logalloc::region_impl::for_each_live<logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}>(logalloc::segment*, logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&)::{lambda(logalloc::region_impl::object_descriptor const*, void*, unsigned long)#1}&&) at ./utils/logalloc.cc:1725
 (inlined by) logalloc::region_impl::compact_segment_locked(logalloc::segment*, logalloc::segment_descriptor&) at ./utils/logalloc.cc:1878
logalloc::region_impl::compact() at ./utils/logalloc.cc:2218
 (inlined by) logalloc::reclaim_from_evictable(logalloc::region_impl&, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2537
logalloc::tracker::impl::compact_and_evict_locked(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2696
logalloc::tracker::impl::compact_and_evict(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2623
 (inlined by) logalloc::segment_pool::allocate_segment(unsigned long) at ./utils/logalloc.cc:1284
 (inlined by) logalloc::segment_pool::refill_emergency_reserve() at ./utils/logalloc.cc:1297
 (inlined by) logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2876
decltype(auto) logalloc::allocating_section::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(logalloc::region&, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:469
 (inlined by) decltype(auto) partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::lsa_partition_reader::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././partition_snapshot_reader.hh:239
 (inlined by) operator() at ././partition_snapshot_reader.hh:490
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::future<void> seastar::do_until<partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#1}, partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer()::{lambda()#2}) at ././seastar/include/seastar/core/loop.hh:343
 (inlined by) partition_snapshot_flat_reader<false, replica::partition_snapshot_flush_accounter>::fill_buffer() at ././partition_snapshot_reader.hh:489
operator() at ././readers/flat_mutation_reader_v2.hh:209
 (inlined by) seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&>(flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&>(flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::internal::repeater<flat_mutation_reader_v2::impl::consume_pausable<replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1}>(replica::flush_reader::fill_buffer()::{lambda()#2}::operator()() const::{lambda(mutation_fragment_v2)#1})::{lambda()#1}>::run_and_dispose() at ././seastar/include/seastar/core/loop.hh:73
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4326
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0
?? ??:0

@avikivity (Member)

Looking for the memory consumer. Execution stages are clean.

@avikivity (Member)

Coordinator background reads/writes are clean.

@avikivity (Member)

No anomaly in memtables:

[screenshot]

@avikivity (Member)

Active reads on the bad shard go to zero at 21:09.

[screenshot]

@avikivity (Member)

Something strange: on all nodes, the system reader concurrency semaphore on shard 6 reports 176 MB of memory in use by active reads.

[screenshot]
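The number reported there is, roughly, memory that admitted reads have charged to the semaphore and not yet given back, so a value that never drops points at permits that are never released. A minimal sketch of that accounting idea, with hypothetical names and none of the real reader_concurrency_semaphore machinery:

    #include <cstdint>
    #include <cstdio>
    #include <memory>

    // Hypothetical permit-style accounting: each admitted read charges the
    // memory it is expected to need and gives it back when its permit dies.
    // A permit that is never destroyed keeps its memory counted as "in use".
    struct reader_semaphore {
        int64_t memory_in_use = 0;

        struct permit {
            reader_semaphore& sem;
            int64_t mem;
            permit(reader_semaphore& s, int64_t m) : sem(s), mem(m) {
                sem.memory_in_use += m;
            }
            permit(const permit&) = delete;
            ~permit() {
                sem.memory_in_use -= mem;
            }
        };

        std::unique_ptr<permit> admit(int64_t mem) {
            return std::make_unique<permit>(*this, mem);
        }
    };

    int main() {
        reader_semaphore sem;
        auto leaked = sem.admit(176ll << 20);  // a read that never finishes
        {
            auto ok = sem.admit(1ll << 20);    // a normal read
        }                                      // its memory is returned here
        std::printf("memory in use: %lld MiB\n",
                    static_cast<long long>(sem.memory_in_use >> 20));
        // Prints 176: the leaked permit's memory stays accounted indefinitely.
    }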

@avikivity (Member)

/cc @denesb: is this some leak?

@avikivity (Member)

The shard that suffers the memory anomaly at 21:09 is also shard 6. Coincidence? I think not.

@avikivity (Member)

QUORUM queries on all nodes, all shards. Shard 6 of the bad node shows 0 for most of the time (even before the anomaly). I think it is cursed.

[screenshot]

@avikivity (Member)

At 23:10 it somehow recovers.

@avikivity (Member)

Nothing special in reactor tasks pending.

@avikivity (Member)

Repair rx byte rate dropped to zero when the anomaly started:

[screenshot]

@avikivity (Member)

(only on shard 6, of course)

@avikivity (Member)

The logs don't show anything special at 21:09, and the trouble (in the logs) started long before that.

@avikivity (Member)

21:09 is partially explained: that's when the data-transfer part of the repair completed.

[screenshot]

@avikivity (Member)

CPU steal time looks broken.

[screenshot]

@avikivity (Member)

But what happened at 21:09 on shard 6? Repair completed, so it started off-strategy reshape. Is reshape so bad that it consumed gigabytes of memory? Sounds unrealistic. /cc @raphaelsc

@avikivity (Member)

Mr. 6 sees a large spike in AIO writes.

So:

  1. shard 6's repair completed early, so it starts reshape
  2. there is not much competition, so @xemul's I/O scheduler gives it more disk bandwidth
  3. does that cause compaction to consume more memory? unlikely

@michoecho (Contributor)

michoecho commented Jun 15, 2023

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Yes, and this is a good idea regardless of the problems it fixes. 1 is not a sane default number of shares for anything.
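For a sense of scale: under contention, a class's slice of disk bandwidth is roughly its shares divided by the total shares of the busy classes, so a 1-share default class sitting next to user classes with hundreds or thousands of shares gets almost nothing. A minimal sketch of that arithmetic, using hypothetical class names and share values rather than the real Seastar scheduler:

    #include <cstdio>

    // Rough proportional-share arithmetic: under contention a class gets
    // shares / sum(shares of busy classes) of the disk bandwidth.
    // The class names and share values below are made up for illustration.
    int main() {
        struct { const char* name; double shares; } classes[] = {
            {"compaction", 1000},
            {"query",       200},
            {"default",       1},  // the old default; the fix raises it to 200
        };
        double total = 0;
        for (auto& c : classes) {
            total += c.shares;
        }
        for (auto& c : classes) {
            std::printf("%-10s %6.2f%% of bandwidth\n",
                        c.name, 100.0 * c.shares / total);
        }
        // With 1 share the default class gets under 0.1% of the disk here;
        // raising it to 200 would give it roughly 14% (200 / 1400).
    }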

@denesb (Contributor)

denesb commented Jun 15, 2023

Why didn't we see this before?

@avikivity (Member)

Maybe we did, and the reports were buried under other problems.

@michoecho (Contributor)

michoecho commented Jun 15, 2023

Why didn't we see this before?

This test is new. Maybe the problem was there in older tests, but the delays it induced were never quite large enough to outlast all of cassandra-stress's retries, and we only noticed it when it finally caused a hard test failure.

@xemul (Contributor)

xemul commented Jun 15, 2023

Why didn't we see this before?

We did; pretty much every customer-side "large IO delays" metric I looked at showed terrible latencies for the default class.

@mykaul modified the milestones: 5.3, 5.4 Jun 18, 2023
@mykaul (Contributor)

mykaul commented Jun 18, 2023

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Is there a specific bug about this request (raise the default priority class shares to 200)?

@michoecho (Contributor)

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Is there a specific bug about this request (raise the default priority class shares to 200)?

No.

@mykaul (Contributor)

mykaul commented Jun 18, 2023

So 66e4391 is a fix. But for older branches we need something else. I propose increasing the default priority class shares to 200.

Is there a specific bug about this request (raise the default priority class shares to 200)?

No.

#14281

@avikivity (Member)

Fixed by 66e4391 (indirectly)

michoecho added a commit to michoecho/seastar that referenced this issue Jun 19, 2023
The number of shares is supposed to be set in the 1-1000 range,
and user-defined IO classes have shares on the order of hundreds
in practice.

Therefore, 1 share is not a sane default, because the default class
can very easily be starved by accident.
We have seen in practice for example in scylladb/scylladb#13753.
michoecho added a commit to michoecho/scylla-seastar that referenced this issue Jun 21, 2023
(same commit message as above)
michoecho added a commit to michoecho/scylla-seastar that referenced this issue Jun 21, 2023
(same commit message as above)
avikivity pushed a commit to scylladb/scylla-seastar that referenced this issue Jun 21, 2023
(same commit message as above)
avikivity added a commit that referenced this issue Jun 21, 2023
This allows us to start backporting Seastar patches.

Ref #13753.
avikivity added a commit that referenced this issue Jun 21, 2023
* seastar f94b1bb9cb...e45cef9ce8 (1):
  > reactor: change shares for default IO class from 1 to 200

Fixes #13753.
avikivity pushed a commit to scylladb/scylla-seastar that referenced this issue Jun 21, 2023
(same commit message as above)

(cherry picked from commit e45cef9)
avikivity added a commit that referenced this issue Jun 21, 2023
* seastar 32ab15cda6...29a0e64513 (1):
  > reactor: change shares for default IO class from 1 to 200

Fixes #13753.

In 5.3: 37e6e65
avikivity pushed a commit to scylladb/scylla-seastar that referenced this issue Jun 21, 2023
(same commit message as above)

(cherry picked from commit e45cef9)
avikivity added a commit that referenced this issue Jun 21, 2023
* seastar 8d7cc3129d...5c27348333 (1):
  > reactor: change shares for default IO class from 1 to 200

Fixes #13753.

In 5.3: 37e6e65
@avikivity (Member)

Alternative fix queued for 5.1, 5.2, 5.3.

@DoronArazii removed the backport/5.2 label Jun 21, 2023