[dtest-raft] test_single_column_blob_max_size_with_cdc_preimage_full_postimage fails with [lsa - Aborting due to allocation failure] #15278

Closed
fruch opened this issue Sep 5, 2023 · 27 comments · Fixed by #15581
Labels: area/cdc, symptom/ci stability, tests/dtest

Comments

@fruch
Contributor

fruch commented Sep 5, 2023

test_single_column_blob_max_size_with_cdc_preimage_full_postimage fails from time to time, with the following error:

failed on teardown with "AssertionError: Critical errors found: [('node1', ['ERROR 2023-09-04 20:30:12,699 [shard 0:stat] lsa - Aborting due to allocation failure', 'Aborting on shard 0.'])]
Other errors: [('node1', ['ERROR 2023-09-04 20:30:12,699 [shard 0:stat] lsa - Aborting due to allocation failure'])]"

It's with raft enabled (the other cases can't be verified, since they happened more than 2 weeks ago)

INFO  2023-09-04 20:29:45,224 [shard 1:stat] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 2/100 count and 36878503/9688842 memory resources: kill limit triggered, dumping permit diagnostics:
permits	count	memory	table/operation/state
1	1	28M	ks.cf_scylla_cdc_log/shard-reader/active/need_cpu
1	0	0B	ks.cf_scylla_cdc_log/multishard-mutation-query/active

2	1	28M	total

Stats:
permit_based_evictions: 4
time_based_evictions: 0
inactive_reads: 1
total_successful_reads: 11
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 1
reads_admitted: 41
reads_enqueued_for_admission: 0
reads_enqueued_for_memory: 0
reads_admitted_immediately: 41
reads_queued_because_ready_list: 0
reads_queued_because_need_cpu_permits: 0
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 50
current_permits: 3
need_cpu_permits: 1
awaits_permits: 0
disk_reads: 1
sstables_read: 2
ERROR 2023-09-04 20:29:45,225 [shard 1:stat] storage_proxy - Exception when communicating with 127.0.37.1, to read from ks.cf_scylla_cdc_log: std::bad_alloc (std::bad_alloc)
ERROR 2023-09-04 20:29:45,748 [shard 1:stat] storage_proxy - Exception when communicating with 127.0.37.1, to read from ks.cf_scylla_cdc_log: std::bad_alloc (std::bad_alloc)
ERROR 2023-09-04 20:29:46,273 [shard 0:stat] storage_proxy - Exception when communicating with 127.0.37.1, to read from ks.cf_scylla_cdc_log: std::bad_alloc (std::bad_alloc)

...

ERROR 2023-09-04 20:30:12,699 [shard 0:stat] lsa - Aborting due to allocation failure
Aborting on shard 0.
Backtrace:
  0x58f0598
  0x59265b2
  /jenkins/workspace/scylla-staging/dtest-pytest-gating/scylla/.ccm/scylla-repository/9a3d57256a60b3ca9e920b4e8a62e8c52555dd82/libreloc/libc.so.6+0x3db6f
  /jenkins/workspace/scylla-staging/dtest-pytest-gating/scylla/.ccm/scylla-repository/9a3d57256a60b3ca9e920b4e8a62e8c52555dd82/libreloc/libc.so.6+0x8e843
  /jenkins/workspace/scylla-staging/dtest-pytest-gating/scylla/.ccm/scylla-repository/9a3d57256a60b3ca9e920b4e8a62e8c52555dd82/libreloc/libc.so.6+0x3dabd
  /jenkins/workspace/scylla-staging/dtest-pytest-gating/scylla/.ccm/scylla-repository/9a3d57256a60b3ca9e920b4e8a62e8c52555dd82/libreloc/libc.so.6+0x2687e
  0x1fc0eaa
  0x1c17b06
  0x1c16a51
  0x1c162fe
  0x202f4e2
  0x20303af
  0x203d25a
  0x203f658
  0x5901e4f
  0x5903127
  0x5902499
  0x58a4d77
  0x58a3f2c
  0x1281d7f
  0x12837f0
  0x128030e
  /jenkins/workspace/scylla-staging/dtest-pytest-gating/scylla/.ccm/scylla-repository/9a3d57256a60b3ca9e920b4e8a62e8c52555dd82/libreloc/libc.so.6+0x27b49
  /jenkins/workspace/scylla-staging/dtest-pytest-gating/scylla/.ccm/scylla-repository/9a3d57256a60b3ca9e920b4e8a62e8c52555dd82/libreloc/libc.so.6+0x27c0a
  0x127e0a4

decoded backtrace:

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:821
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:851
seastar::print_with_backtrace(char const*, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:863
 (inlined by) seastar::sigabrt_action() at ./build/release/seastar/./seastar/src/core/reactor.cc:4026
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4002
 (inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:3998
/data/scylla-s3-reloc.cache/by-build-id/4c60ca8c4afe862855242a77b77b35a377e373dc/extracted/scylla/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=245240a31888ad5c11bbc55b18e02d87388f59a9, for GNU/Linux 3.2.0, not stripped

__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
logalloc::allocating_section::reserve(logalloc::tracker::impl&) at ./utils/logalloc.cc:2887
 (inlined by) logalloc::allocating_section::on_alloc_failure(logalloc::region&) at ./utils/logalloc.cc:2902
decltype(auto) logalloc::allocating_section::with_reclaiming_disabled<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::in_alloc_section<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&)::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&)::{lambda()#1}&&)::{lambda()#1}>(logalloc::region&, partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::in_alloc_section<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&)::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&)::{lambda()#1}&&)::{lambda()#1}) at ././utils/logalloc.hh:501
 (inlined by) decltype(auto) partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::in_alloc_section<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&)::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&)::{lambda()#1}&&) at ././partition_snapshot_reader.hh:72
 (inlined by) partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::next_interval(nonwrapping_interval<clustering_key_prefix> const&) at ././partition_snapshot_reader.hh:116
 (inlined by) partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::emit_next_interval() at ././partition_snapshot_reader.hh:188
partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::do_fill_buffer() at ././partition_snapshot_reader.hh:240
 (inlined by) operator() at ././partition_snapshot_reader.hh:273
 (inlined by) decltype(auto) logalloc::allocating_section::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(logalloc::region&, partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././utils/logalloc.hh:470
decltype(auto) partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::lsa_partition_reader::with_reserve<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) at ././partition_snapshot_reader.hh:100
 (inlined by) operator() at ././partition_snapshot_reader.hh:267
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}&>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) auto seastar::futurize_invoke<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}&>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2037
 (inlined by) seastar::future<void> seastar::do_until<partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}, partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#1}>(partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#1}, partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer()::{lambda()#2}) at ././seastar/include/seastar/core/loop.hh:349
 (inlined by) partition_snapshot_flat_reader<false, replica::partition_snapshot_read_accounter>::fill_buffer() at ././partition_snapshot_reader.hh:266
flat_mutation_reader_v2::impl::operator()() at ././readers/flat_mutation_reader_v2.hh:194
 (inlined by) flat_mutation_reader_v2::operator()() at ././readers/flat_mutation_reader_v2.hh:410
 (inlined by) mutation_reader_merger::prepare_one(mutation_reader_merger::reader_and_last_fragment_kind, seastar::bool_class<mutation_reader_merger::reader_galloping_tag>) at ./readers/combined.cc:384
 (inlined by) operator() at ./readers/combined.cc:375
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<mutation_reader_merger::prepare_next()::$_1, mutation_reader_merger::reader_and_last_fragment_kind&>(mutation_reader_merger::prepare_next()::$_1&&, mutation_reader_merger::reader_and_last_fragment_kind&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) auto seastar::futurize_invoke<mutation_reader_merger::prepare_next()::$_1, mutation_reader_merger::reader_and_last_fragment_kind&>(mutation_reader_merger::prepare_next()::$_1&&, mutation_reader_merger::reader_and_last_fragment_kind&) at ././seastar/include/seastar/core/future.hh:2037
 (inlined by) seastar::future<void> seastar::parallel_for_each<mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::prepare_next()::$_1>(mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::prepare_next()::$_1&&) at ././seastar/include/seastar/core/loop.hh:575
 (inlined by) seastar::future<void> seastar::internal::parallel_for_each_impl<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1>(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&) at ././seastar/include/seastar/core/loop.hh:628
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1>(seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) auto seastar::futurize_invoke<seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1>(seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&) at ././seastar/include/seastar/core/future.hh:2037
 (inlined by) seastar::future<void> seastar::parallel_for_each<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1>(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_1&&) at ././seastar/include/seastar/core/loop.hh:643
 (inlined by) mutation_reader_merger::prepare_next() at ./readers/combined.cc:374
mutation_reader_merger::maybe_produce_batch() at ./readers/combined.cc:511
operator() at ./readers/combined.cc:474
 (inlined by) seastar::future<std::optional<boost::iterator_range<mutation_fragment_and_stream_id*> > > seastar::futurize<seastar::future<std::optional<boost::iterator_range<mutation_fragment_and_stream_id*> > > >::invoke<mutation_reader_merger::operator()()::$_0&>(mutation_reader_merger::operator()()::$_0&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) seastar::repeat_until_value_type_helper<seastar::futurize<std::invoke_result<mutation_reader_merger::operator()()::$_0>::type>::type>::future_type seastar::repeat_until_value<mutation_reader_merger::operator()()::$_0>(mutation_reader_merger::operator()()::$_0) at ././seastar/include/seastar/core/loop.hh:244
 (inlined by) mutation_reader_merger::operator()() at ./readers/combined.cc:474
 (inlined by) operator() at ./readers/combined.cc:97
 (inlined by) seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}&>(mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) seastar::future<void> seastar::repeat<mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}>(mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}&&) at ././seastar/include/seastar/core/loop.hh:126
mutation_fragment_merger<mutation_reader_merger>::operator()() at ./readers/combined.cc:96
 (inlined by) operator() at ./readers/combined.cc:625
 (inlined by) seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&>(merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) auto seastar::futurize_invoke<merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&>(merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2037
 (inlined by) seastar::internal::repeater<merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}>::run_and_dispose() at ././seastar/include/seastar/core/loop.hh:79
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2647
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3110
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3279
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3162
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:2007
__libc_start_call_main at ??:?
__libc_start_main_alias_2 at :?
_start at ??:?

https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/57/testReport/junit/test_cdc_large_values/TestLargeColumnsWithCDC/FullDtest___full_split000___test_single_column_blob_max_size_with_cdc_preimage_full_postimage_prepared_statements__2/

Logs

https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/57/artifact/logs-full.release.000/1693859448888_test_cdc_large_values.py%3A%3ATestLargeColumnsWithCDC%3A%3Atest_single_column_blob_max_size_with_cdc_preimage_full_postimage%5Bprepared_statements%5D/

@fruch added the bad_alloc, tests/dtest, area/cdc, triage/master, and symptom/ci stability labels Sep 5, 2023
@mykaul
Contributor

mykaul commented Sep 7, 2023

@bhalevy - does that belong to your team?

@mykaul
Contributor

mykaul commented Sep 13, 2023

ping @bhalevy

@bhalevy
Member

bhalevy commented Sep 13, 2023

> @bhalevy - does that belong to your team?

I don't know. We need to understand the root cause first.
Since this is with raft (and flavored with cdc on top), I suggest that @kostja / @kbr-scylla look into it first to determine what exactly caused the allocation failure.

@kbr-scylla
Contributor

It's an OOM during table scan query, the table has some columns with large blobs (the test is test_single_column_blob_max_size_with_cdc_preimage_full_postimage after all). Doesn't seem related to Raft at all.

The table is a CDC log table, but I'm not sure if it matters, the table just contains large data blobs which could happen with regular tables too, and then we're fetching those blobs during table scans.

It could be some regression in the readers (not sure if there were any changes there recently?) or perhaps something elsewhere is creating large memory pressure while the readers are running.

@fruch do we know when this first started appearing?

> It's with raft enabled (the other cases can't be verified, since they happened more than 2 weeks ago)

Not sure I understand -- so it might have happened without Raft, but we don't know?

@kbr-scylla
Contributor

20:29:43,231 790     cassandra.policies             INFO     policies.py         :289  | test_single_column_blob_max_size_with_cdc_preimage_full_postimage[prepared_statements]: Using datacenter 'datacenter1' for DCAwareRoundRobinPolicy (via host '127.0.37.1:9042'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
20:29:43,270 790     test_cdc_large_values          DEBUG    test_cdc_large_values.py:199  | test_single_column_blob_max_size_with_cdc_preimage_full_postimage[prepared_statements]: Insert <PreparedStatement query="INSERT INTO ks.cf (pk, ck, v) VALUES (?, ?, ?)", consistency=Not Set>
20:29:45,225 790     test_cdc_large_values          ERROR    test_cdc_large_values.py:260  | test_single_column_blob_max_size_with_cdc_preimage_full_postimage[prepared_statements]: Error: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for ks.cf_scylla_cdc_log - received 0 responses and 1 failures from 1 CL=ONE." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1}
        create_table_stmt = "CREATE TABLE ks.cf (pk bigint, ck bigint, v blob, PRIMARY KEY (pk, ck)) \
                             WITH cdc={'enabled': true, 'preimage': 'full', 'postimage': true}"
        session.execute(create_table_stmt)
    
        insert_value = bytes("1".encode()) * 7 * MB
        if prepare_statements:
            insert_statement = session.prepare("INSERT INTO ks.cf (pk, ck, v) VALUES (?, ?, ?)")
        else:
            insert_statement = SimpleStatement("INSERT INTO ks.cf (pk, ck, v) VALUES (%(pk)s, %(ck)s, %(v)s)")
        insert_parameters = [{"pk": i, "ck": j, "v": insert_value} for i in range(4) for j in range(3)]
    
        update_value = bytes("2".encode()) * 4 * MB
        if prepare_statements:
            update_statement = session.prepare("UPDATE ks.cf set v = ? where pk = ? and ck = ?")
        else:
            update_statement = SimpleStatement("UPDATE ks.cf set v = %(v)s where pk = %(pk)s and ck=%(ck)s")
        update_parameters = [{"pk": i, "ck": j, "v": update_value} for i in range(4) for j in range(3)]
    
>       self.execute_case(node, session,
                          insert_data={
                              "insert_statement": insert_statement,
                              "insert_parameters": insert_parameters
                          },
                          update_data={
                              "update_statement": update_statement,
                              "update_parameters": update_parameters
                          },
                          check_stalls=prepare_statements)

execute_case seems to be executing updates and selects concurrently.

The failing scan query is

        cdc_select_statement = SimpleStatement("SELECT * FROM ks.cf_scylla_cdc_log LIMIT 10")

@fruch
Contributor Author

fruch commented Sep 13, 2023

> Not sure I understand -- so it might have happened without Raft, but we don't know?

Yes, since the older jobs were cleaned up by Jenkins, we don't have any logs.

@kbr-scylla
Contributor

> It could be some regression in the readers (not sure if there were any changes there recently?)

Or with semaphores -- which should be preventing this large memory consumption, IIUC.
@denesb were there any significant semaphore or reader related changes in the last ~month?

@kbr-scylla
Contributor

We're also fetching rows from the base table during the test, but those are single partition and even single row queries, compared to the CDC log table queries which are scans:

        select_statement = SimpleStatement("SELECT * FROM ks.cf WHERE pk = %(pk)s and ck = %(ck)s")
        select_parameters = [{"pk": i, "ck": j} for i in range(4) for j in range(3)]

        cdc_select_statement = SimpleStatement("SELECT * FROM ks.cf_scylla_cdc_log LIMIT 10")

@denesb
Contributor

denesb commented Sep 13, 2023

It looks like the semaphore's kill limit was triggered. This was part of the OOM protection PR which is not new. The LSA alloc failure is unrelated, the semaphore cannot poison LSA allocations.

@denesb
Contributor

denesb commented Sep 13, 2023

I couldn't work out the likely cause from the logs.
I guess someone should look at the core.

@mykaul
Contributor

mykaul commented Sep 13, 2023

> It looks like the semaphore's kill limit was triggered. This was part of the OOM protection PR which is not new. The LSA alloc failure is unrelated, the semaphore cannot poison LSA allocations.

Do we have a log when we cross the soft limit? We can search the cloud to see how common that happens.

@denesb
Contributor

denesb commented Sep 13, 2023

> > It looks like the semaphore's kill limit was triggered. This was part of the OOM protection PR which is not new. The LSA alloc failure is unrelated, the semaphore cannot poison LSA allocations.
>
> Do we have a log when we cross the soft limit? We can search the cloud to see how common that happens.

No, reads are just throttled when the soft limit is reached (but again, this requires cooperation from the reads, and in practice only applies to reads that go to disk).
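To make the soft limit versus kill limit distinction concrete, here is a minimal sketch of a two-threshold check. The multipliers of 2 and 4 are assumptions chosen for illustration (the thread does not state the configured values), and `admission_state` is a hypothetical helper, not a ScyllaDB function. The limit and consumption figures are the ones from the semaphore dump in the original report.

```python
def admission_state(consumed, limit, serialize_multiplier=2, kill_multiplier=4):
    """Classify memory consumption against two thresholds.

    Both multipliers are assumed values for illustration only.
    """
    if consumed > limit * kill_multiplier:
        return "kill"       # the read fails with an allocation error
    if consumed > limit * serialize_multiplier:
        return "throttle"   # reads are slowed down, nothing is logged
    return "admit"


limit = 9688842      # memory limit from the semaphore dump
consumed = 36878503  # memory consumed at the time of the dump

kill_threshold = limit * 4           # under the assumed multiplier
headroom = kill_threshold - consumed # bytes left before the assumed kill limit

print(admission_state(consumed, limit))
print(kill_threshold, headroom)
```

Under these assumed multipliers the dumped consumption sits between the two thresholds, with under 2 MB of headroom, so a single further multi-megabyte request would be enough to cross the kill limit.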

@denesb
Contributor

denesb commented Sep 14, 2023

Another occurrence: #15371.

@mykaul
Contributor

mykaul commented Sep 26, 2023

@denesb - who should look at the coredump?

@denesb
Contributor

denesb commented Sep 26, 2023

> @denesb - who should look at the coredump?

I will have a look.

@denesb
Contributor

denesb commented Sep 27, 2023

Happened again on next on Sep 22:

https://jenkins.scylladb.com/job/scylla-master/job/next/6567/testReport/test_cdc_large_values/TestLargeColumnsWithCDC/Tests___Sanity_Tests_RAFT___test_large_blob_in_map_delta_preimage_full_unprepared_statements__2/

AssertionError: ERROR 2023-09-22 18:28:25,531 [shard 0:stat] lsa - Aborting due to allocation failure: failed to reclaim 191348736 bytes of memory, while attempting to ensure an std reserve of 536870912

A reserve of 536M seems excessive. There could be a very large allocation somewhere.

@denesb
Contributor

denesb commented Sep 27, 2023

We are asking for a reserve larger than all of memory.

@denesb
Contributor

denesb commented Sep 27, 2023

Attempting to increase the std reserve means that we have an allocation larger than a segment (128K). These are passed through to the std allocator.
According to scylla memory [1], the largest available span is 134M. I think we are trying to allocate even more and fail due to fragmentation.

[1]

(gdb) scylla memory
Used memory:     138919936
Free memory:     345522176
Total memory:    484442112

LSA:
  allocated:      62652416
  used:           62521344
  free:             131072

Cache:
  total:            655360
  used:             158112
  free:             497248

Memtables:
 total:            61997056
 Regular:
  real dirty:      58851328
  unspooled:       58851328
 System:
  real dirty:       2883584
  unspooled:        2883584

Coordinator:
  bg write bytes:             0 B
  hints:                      0 B
  view hints:                 0 B
  06 "statement"
    fg writes:              0
    bg writes:              0
    fg reads:               1
    bg reads:               0

Replica:
  Read Concurrency Semaphores:
    read:              2/100,      34852306/      9688842, queued: 0
    streaming:         0/ 10,             0/      9688842, queued: 0
    system:            0/ 10,             0/      9688842, queued: 0
  Execution Stages:
    apply stage:
         Total                            0
  Tables - Ongoing Operations:
    pending writes phaser (top 10):
              0 Total (all)
    pending reads phaser (top 10):
              1 ks.cf_scylla_cdc_log, ks.cf
              1 Total (all)
    pending streams phaser (top 10):
              0 Total (all)

Small pools:
objsz spansz    usedobj       memory       unused  wst%
    8   4096        929         8192          760   9.3
   10   4096          9         8192         8102  98.8
   12   4096        173         8192         6116  74.6
   14   4096          4         8192         8136  99.1
   16   4096       2365        40960         3120   7.6
   32   4096       3266       106496         1984   1.9
   32   4096       7591       245760         2848   1.2
   32   4096       2601        90112         6880   7.6
   32   4096       3696       122880         4608   3.8
   48   4096       3116       151552         1984   0.9
   48   4096       1567        77824         2608   3.0
   64   4096       5416       352256         5632   1.6
   64   4096      17101      1097728         3264   0.3
   80   4096       2436       200704         5824   2.5
   96   4096       3865       380928         9888   1.0
  112   4096        787        94208         6064   4.9
  128   4096       1156       155648         7680   4.9
  160   4096       1102       188416        12096   4.1
  192   4096        996       204800        13568   5.1
  224   4096       1800       425984        22784   3.8
  256   4096        282        90112        17920  19.9
  320   8192        393       147456        21696  12.4
  384   8192        232        98304         9216   7.8
  448   4096        198       110592        21888  18.2
  512   4096        345       188416        11776   6.2
  640  16384        221       196608        55168  25.7
  768  16384        423       344064        19200   4.0
  896   8192        154       180224        42240  21.9
 1024   4096        180       208896        24576  11.8
 1280  32768        104       196608        63488  29.9
 1536  32768          7       131072       120320  90.2
 1792  16384          6       147456       136704  91.1
 2048   8192        452       933888         8192   0.9
 2560  65536         93       393216       155136  37.1
 3072  65536          7       262144       240640  90.2
 3584  32768          2       294912       287744  96.0
 4096  16384       3248     13516800       212992   1.6
 5120 131072         18       393216       301056  74.2
 6144 131072          3       524288       505856  94.9
 7168  65536          2       589824       575488  96.0
 8192  32768        350      3538944       671744  19.0
10240  65536          2       851968       831488  91.3
12288  65536         12       983040       835584  78.8
14336 131072          1      1179648      1165312  97.2
16384  65536         18      1245184       950272  76.3
Small allocations: 30715904 [B]
Page spans:
index      size [B]      free [B]     large [B] [spans]
    0          4096             0             0       0
    1          8192         16384             0       0
    2         16384             0             0       0
    3         32768             0       2818048      86
    4         65536             0       1048576      16
    5        131072        262144      91226112     696
    6        262144        786432        524288       2
    7        524288        524288             0       0
    8       1048576      10485760             0       0
    9       2097152      10485760             0       0
   10       4194304      12582912      12582912       3
   11       8388608       8388608             0       0
   12      16777216      33554432             0       0
   13      33554432      67108864             0       0
   14      67108864      67108864             0       0
   15     134217728     134217728             0       0
   16     268435456             0             0       0
   17     536870912             0             0       0
   18    1073741824             0             0       0
   19    2147483648             0             0       0
   20    4294967296             0             0       0
   21    8589934592             0             0       0
   22   17179869184             0             0       0
   23   34359738368             0             0       0
   24   68719476736             0             0       0
   25  137438953472             0             0       0
   26  274877906944             0             0       0
   27  549755813888             0             0       0
   28 1099511627776             0             0       0
   29 2199023255552             0             0       0
   30 4398046511104             0             0       0
   31 8796093022208             0             0       0
Large allocations: 108199936 [B]

@denesb
Contributor

denesb commented Sep 27, 2023

The test uses 1MB blobs, although the data type is a map<text,blob>.
There are 4 partitions with 3 rows each inserted at first, each row with one map entry. Each map is then updated with 5 new keys, so it should end up holding 6 entries, about 6MB in total.
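As a rough sanity check of those numbers, here is a minimal sketch of the arithmetic. It assumes "1MB" means 1 MiB and ignores map keys and per-cell overhead; all names are hypothetical and not taken from the test. The total across all rows is my own extrapolation from the counts above, not something the test reports.

```cpp
#include <cstdint>

// Hypothetical constants taken from the description above.
// Assumption: "1MB" means 1 MiB; key sizes and storage overhead are ignored.
constexpr std::uint64_t blob_size = 1024 * 1024;
constexpr std::uint64_t initial_entries = 1;     // one map entry per row at insert time
constexpr std::uint64_t added_entries = 5;       // five new keys written by the update
constexpr std::uint64_t partitions = 4;
constexpr std::uint64_t rows_per_partition = 3;

// Payload of a single map<text,blob> cell after the update.
constexpr std::uint64_t map_size() {
    return (initial_entries + added_entries) * blob_size;
}

// Payload of all maps in the table after the update.
constexpr std::uint64_t total_size() {
    return partitions * rows_per_partition * map_size();
}

static_assert(map_size() == 6 * 1024 * 1024, "each map holds about 6 MiB of blobs");
```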

@denesb
Contributor

denesb commented Sep 27, 2023

The failing read is a data query (single partition read) into ks.cf.

@denesb
Copy link
Contributor

denesb commented Sep 27, 2023

It looks like the semaphore's kill limit was triggered. The kill limit was introduced by the OOM protection PR, which is not new. The LSA allocation failure is unrelated: the semaphore cannot poison LSA allocations.

I was wrong. The OOM kill can poison LSA allocations. The site of the LSA crash is:

return _lsa_manager.run_in_read_section([this] {
    auto next_valid = _next_row.iterators_valid();
    clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}, rt={}", fmt::ptr(this), _lower_bound,
        _upper_bound, _next_row.position(), next_valid, _current_tombstone);
    // We assume that if there was eviction, and thus the range may
    // no longer be continuous, the cursor was invalidated.
    if (!next_valid) {
        auto adjacent = _next_row.advance_to(_lower_bound);
        _next_row_in_range = !after_current_range(_next_row.position());
        if (!adjacent && !_next_row.continuous()) {
            _last_row = nullptr; // We could insert a dummy here, but this path is unlikely.
            start_reading_from_underlying();
            return make_ready_future<>();
        }
    }
    _next_row.maybe_refresh();
    clogger.trace("csm {}: next={}", fmt::ptr(this), _next_row);
    while (_state == state::reading_from_cache) {
        copy_from_cache_to_buffer();
        if (need_preempt() || is_buffer_full()) {
            break;
        }
    }
    return make_ready_future<>();
});

In this scope we call copy_from_cache_to_buffer(), which copies mutation fragment objects. Copying them consumes memory from the semaphore, which can trip the kill limit and throw an std::bad_alloc. The read section handles this by bumping its reserves and trying again. This of course does nothing to reduce the semaphore's memory pressure, so the next iteration throws as well, and this repeats until ensuring the reserve itself fails, which is the LSA allocation failure we see.
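To make the failure mode concrete, here is a toy model of that retry loop. It is only a sketch: toy_semaphore and toy_allocating_section are hypothetical stand-ins, not the real reader_concurrency_semaphore or logalloc::allocating_section, and the doubling reserve policy and the 64 MiB cap are assumptions made for illustration. The consumption and kill limit values are the ones from the gdb session in this comment.

```cpp
#include <cstddef>
#include <new>

// Toy stand-in for the semaphore: consuming past the kill limit throws
// std::bad_alloc and leaves the consumed amount unchanged.
struct toy_semaphore {
    std::size_t consumed;
    std::size_t kill_limit;

    void consume(std::size_t bytes) {
        if (consumed + bytes > kill_limit) {
            throw std::bad_alloc();
        }
        consumed += bytes;
    }
};

// Toy stand-in for an allocating section: on std::bad_alloc it grows its
// reserve and retries. Assumption: the reserve doubles on each retry.
struct toy_allocating_section {
    std::size_t reserve;
    std::size_t max_reserve;  // memory available to back the reserve
    unsigned attempts;

    // Returns true if fn completed, false once the reserve cannot be grown
    // any further (the point where the real code aborts).
    template <typename Func>
    bool run(Func&& fn) {
        for (;;) {
            ++attempts;
            try {
                fn();
                return true;
            } catch (const std::bad_alloc&) {
                if (reserve * 2 > max_reserve) {
                    return false;
                }
                reserve *= 2;  // does not free anything on the semaphore side
            }
        }
    }
};

// A read that needs 6 MiB while the semaphore sits close to its kill limit.
bool simulate_poisoned_read(unsigned& attempts_out) {
    toy_semaphore sem{34852306, 38755368};
    toy_allocating_section section{1 << 20, 64 << 20, 0};
    bool ok = section.run([&] { sem.consume(6 * 1024 * 1024); });
    attempts_out = section.attempts;
    return ok;
}
```

Every retry fails for the same reason, because growing the reserve never changes what the semaphore has accounted, so the loop can only end by exhausting the reserve.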

Looking at the semaphore:

(gdb) p $11._kill_limit_multiplier
$12 = {
  <utils::updateable_value_base> = {
    _source = 0xd975218
  }, 
  members of utils::updateable_value<unsigned int>:
  _value = 4
}
(gdb) p $11._initial_resources.memory
$13 = 9688842
(gdb) p $13 * 4
$14 = 38755368
(gdb) p $11._initial_resources.memory - $11._resources.memory
$15 = 34852306

The current consumption is just under 4M (3903062 bytes) below the kill limit of 38755368 bytes, so the limit can easily be tripped with the large blobs this test uses.
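The same arithmetic as a small sketch, using the values printed by gdb above. The helper names are hypothetical, and whether the real check uses a strict or non-strict comparison is an assumption here.

```cpp
#include <cstdint>

// Values copied from the gdb session above.
constexpr std::int64_t initial_memory = 9688842;    // _initial_resources.memory ($13)
constexpr std::int64_t kill_limit_multiplier = 4;   // _kill_limit_multiplier ($12)
constexpr std::int64_t consumed = 34852306;         // initial - current resources ($15)

constexpr std::int64_t kill_limit() {
    return initial_memory * kill_limit_multiplier;
}

// Bytes that can still be consumed before the kill limit is reached.
constexpr std::int64_t headroom() {
    return kill_limit() - consumed;
}

// Assumption: the limit trips when consumption would exceed it.
constexpr bool would_trip(std::int64_t extra_bytes) {
    return consumed + extra_bytes > kill_limit();
}

static_assert(kill_limit() == 38755368, "matches $14");
static_assert(headroom() == 3903062, "just under 4M of headroom");
```

With under 4M of headroom, a single 6MB map is enough to cross the limit, while a lone 1MB blob would still fit.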

@denesb
Contributor

denesb commented Sep 28, 2023

While working on a unit-test reproducer for this, I stepped on yet another bug: #15578.

@mykaul
Contributor

mykaul commented Nov 5, 2023

@denesb - can you confirm if that's the same issue? From https://jenkins.scylladb.com/job/scylla-5.4/job/longevity/job/longevity-lwt-3h-test/2/consoleFull#-1083344827fcc21424-66d2-4bd8-8e0d-9746405e5b16 :

Found module scylla with build-id: a8346d5963227cbfa11ba5a55df9009ba5b89141
 Stack trace of thread 6301:
 #0  0x00007f75e5ef2884 __pthread_kill_implementation (libc.so.6 + 0x8e884)
 #1  0x00007f75e5ea1afe raise (libc.so.6 + 0x3dafe)
 #2  0x00007f75e5e8a87f abort (libc.so.6 + 0x2687f)
 #3  0x0000000005a6b8a8 _ZN7seastar17on_internal_errorERNS_6loggerESt17basic_string_viewIcSt11char_traitsIcEE (scylla + 0x586b8a8)
 #4  0x000000000456c302 _ZN5query13querier_cache14lookup_querierINS_22shard_mutation_querierEEESt8optionalIT_ERSt18unordered_multimapIN5utils11tagged_uuidI12query_id_tagEESt10unique_ptrINS_12querier_baseESt14default_deleteISC_EESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SF_EEESA_RK6schemaN3dht21partition_ranges_viewERKNS_15partition_sliceER28reader_concurrency_semaphoreN7tracing15trace_state_ptrENSt6chrono10time_pointIN7seastar12lowres_clockENS12_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x436c302)
 #5  0x000000000456a9c3 _ZN5query13querier_cache29lookup_shard_mutation_querierEN5utils11tagged_uuidI12query_id_tagEERK6schemaRKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISC_EERKNS_15partition_sliceER28reader_concurrency_semaphoreN7tracing15trace_state_ptrENSt6chrono10time_pointIN7seastar12lowres_clockENSO_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x436a9c3)
 #6  0x0000000004599d1f _ZZN12read_context14lookup_readersENSt6chrono10time_pointIN7seastar12lowres_clockENS0_8durationIlSt5ratioILl1ELl1000000000EEEEEEEN3$_0clERN7replica8databaseE (scylla + 0x4399d1f)
 #7  0x00000000045995ec _ZNSt17_Function_handlerIFN7seastar6futureIvEERN7replica8databaseEEZNS0_7shardedIS4_E13invoke_on_allIZN12read_context14lookup_readersENSt6chrono10time_pointINS0_12lowres_clockENSB_8durationIlSt5ratioILl1ELl1000000000EEEEEEE3$_0JEEES2_NS0_21smp_submit_to_optionsET_DpT0_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ (scylla + 0x43995ec)
 #8  0x00000000013e9116 _ZNSt17_Function_handlerIFN7seastar6futureIvEEjEZNS0_7shardedIN7replica8databaseEE13invoke_on_allENS0_21smp_submit_to_optionsESt8functionIFS2_RS6_EEEUljE_E9_M_invokeERKSt9_Any_dataOj (scylla + 0x11e9116)
 #9  0x0000000005e68ae2 _ZN7seastar17parallel_for_eachIN5boost12range_detail16integer_iteratorIjEES4_St8functionIFNS_6futureIvEEjEEEES7_T_T0_OT1_ (scylla + 0x5c68ae2)
 #10 0x0000000005e68a70 _ZN7seastar8internal25sharded_parallel_for_eachEjSt8functionIFNS_6futureIvEEjEE (scylla + 0x5c68a70)
 #11 0x000000000458c0aa _ZN12read_context14lookup_readersENSt6chrono10time_pointIN7seastar12lowres_clockENS0_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x438c0aa)
 #12 0x00000000045b4e91 _Z8do_queryIN12_GLOBAL__N_125data_query_result_builderEEN7seastar6futureINT_11result_typeEEERNS2_7shardedIN7replica8databaseEEENS2_13lw_shared_ptrIK6schemaEERKN5query12read_commandERKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISO_EEN7tracing15trace_state_ptrENSt6chrono10time_pointINS2_12lowres_clockENSV_8durationIlSt5ratioILl1ELl1000000000EEEEEENS2_20noncopyable_functionIFS4_RK22compact_mutation_stateIL20compact_for_sstables0EEEEE (scylla + 0x43b4e91)
 #13 0x0000000004591fcd _ZL22do_query_on_all_shardsIN12_GLOBAL__N_125data_query_result_builderEEN7seastar6futureISt5tupleIJNS2_11foreign_ptrINS2_13lw_shared_ptrINT_11result_typeEEEEE17cache_temperatureEEEERNS2_7shardedIN7replica8databaseEEENS6_IK6schemaEERKN5query12read_commandERKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISU_EEN7tracing15trace_state_ptrENSt6chrono10time_pointINS2_12lowres_clockENS11_8durationIlSt5ratioILl1ELl1000000000EEEEEESt8functionIFS7_ONSM_23result_memory_accounterERK22compact_mutation_stateIL20compact_for_sstables0EEEE (scylla + 0x4391fcd)
 #14 0x0000000004590aba _Z24query_data_on_all_shardsRN7seastar7shardedIN7replica8databaseEEENS_13lw_shared_ptrIK6schemaEERKN5query12read_commandERKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISH_EENS9_14result_optionsEN7tracing15trace_state_ptrENSt6chrono10time_pointINS_12lowres_clockENSP_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x4390aba)
 #15 0x000000000313170c _ZN7service13storage_proxy30query_nonsingular_data_locallyEN7seastar13lw_shared_ptrIK6schemaEENS2_IN5query12read_commandEEEOKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISD_EENS6_14result_optionsEN7tracing15trace_state_ptrENSt6chrono10time_pointINS1_12lowres_clockENSL_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x2f3170c)
 #16 0x000000000312eb90 _ZN7service13storage_proxy18query_result_localEN7seastar10shared_ptrIKN7locator25effective_replication_mapEEENS1_13lw_shared_ptrIK6schemaEENS7_IN5query12read_commandEEERK20nonwrapping_intervalIN3dht13ring_positionEENSB_14result_optionsEN7tracing15trace_state_ptrENSt6chrono10time_pointINS1_12lowres_clockENSN_8durationIlSt5ratioILl1ELl1000000000EEEEEESt7variantIJSt9monostateN2db24per_partition_rate_limit12account_onlyENSY_19account_and_enforceEEE (scylla + 0x2f2eb90)
 #17 0x0000000003229ce8 _ZN7service22abstract_read_executor17make_data_requestEN3gms12inet_addressENSt6chrono10time_pointIN7seastar12lowres_clockENS3_8durationIlSt5ratioILl1ELl1000000000EEEEEEb (scylla + 0x3029ce8)
 #18 0x0000000003228edc _ZN7service22abstract_read_executor18make_data_requestsEN7seastar10shared_ptrINS_20digest_read_resolverEEEPN3gms12inet_addressES7_NSt6chrono10time_pointINS1_12lowres_clockENS8_8durationIlSt5ratioILl1ELl1000000000EEEEEEb (scylla + 0x3028edc)
 #19 0x000000000322855e _ZN7service22abstract_read_executor13make_requestsEN7seastar10shared_ptrINS_20digest_read_resolverEEENSt6chrono10time_pointINS1_12lowres_clockENS5_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x302855e)
 #20 0x0000000003138e15 _ZN7service22abstract_read_executor7executeENSt6chrono10time_pointIN7seastar12lowres_clockENS1_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x2f38e15)
 #21 0x00000000032c5d95 _ZN7service13storage_proxy36query_partition_key_range_concurrentENSt6chrono10time_pointIN7seastar12lowres_clockENS1_8durationIlSt5ratioILl1ELl1000000000EEEEEENS3_10shared_ptrIKN7locator25effective_replication_mapEEENS3_13lw_shared_ptrIN5query12read_commandEEEN2db17consistency_levelE32query_ranges_to_vnodes_generatoriN7tracing15trace_state_ptrEmjSt13unordered_mapI20nonwrapping_intervalIN3dht5tokenEESt6vectorIN5utils11tagged_uuidINSB_11host_id_tagEEESaISX_EESt4hashISS_ESt8equal_toISS_ESaISt4pairIKSS_SZ_EEE14service_permit.resume (scylla + 0x30c5d95)
 #22 0x00000000031b86bb _ZN7seastar8internal21coroutine_traits_baseIN5boost10outcome_v212basic_resultIN7service43query_partition_key_range_concurrent_resultEN5utils19exception_containerIJN10exceptions32mutation_write_timeout_exceptionENS9_22read_timeout_exceptionENS9_22read_failure_exceptionENS9_20rate_limit_exceptionEEEENS7_32exception_container_throw_policyEEEE12promise_type15run_and_disposeEv (scylla + 0x2fb86bb)
 #23 0x0000000005a9c930 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x589c930)
 #24 0x0000000005a9dc08 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc08)
 #25 0x0000000005ac1654 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58c1654)
 #26 0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #27 0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #28 0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6309:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6311:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6308:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6310:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6305:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6300:
 #0  0x0000000005f7fb30 _ZZZZZZN7seastar3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvENUlvE0_clEvENKUlSt5tupleIJSt8optionalImEmlS7_INS0_7rcv_bufEEEEE_clESB_ENUlvE_clEv (scylla + 0x5d7fb30)
 #1  0x0000000005f7f801 _ZZZZZN7seastar3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvENUlvE0_clEvENKUlSt5tupleIJSt8optionalImEmlS7_INS0_7rcv_bufEEEEE_clESB_ (scylla + 0x5d7f801)
 #2  0x0000000005f7f424 _ZZZZN7seastar3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvENUlvE0_clEv (scylla + 0x5d7f424)
 #3  0x0000000005f80506 _ZN7seastar8internal14do_until_stateIZZZNS_3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvEUlvE_ZZZNS4_7processEvENS5_clEvENKS6_clEvEUlvE0_E15run_and_disposeEv (scylla + 0x5d80506)
 #4  0x0000000005a9c930 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x589c930)
 #5  0x0000000005a9dc08 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc08)
 #6  0x0000000005ac1654 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58c1654)
 #7  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #8  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #9  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6302:
 #0  0x0000000001cd1350 _ZN29partition_snapshot_row_cursor7advanceEb (scylla + 0x1ad1350)
 #1  0x0000000001cd9cd8 _ZZN30partition_snapshot_flat_readerILb0EN7replica33partition_snapshot_read_accounterEE20lsa_partition_reader13next_intervalERK20nonwrapping_intervalI21clustering_key_prefixEENKUlvE_clEv (scylla + 0x1ad9cd8)
 #2  0x0000000001cd7bc3 _ZN30partition_snapshot_flat_readerILb0EN7replica33partition_snapshot_read_accounterEE18emit_next_intervalEv (scylla + 0x1ad7bc3)
 #3  0x0000000001cd6b12 _ZN8logalloc18allocating_section12with_reserveIZZN30partition_snapshot_flat_readerILb0EN7replica33partition_snapshot_read_accounterEE11fill_bufferEvENKUlvE0_clEvEUlvE_EEDcRNS_6regionEOT_ (scylla + 0x1ad6b12)
 #4  0x0000000001cd63af _ZN30partition_snapshot_flat_readerILb0EN7replica33partition_snapshot_read_accounterEE11fill_bufferEv (scylla + 0x1ad63af)
 #5  0x00000000020f2cf3 _ZN22mutation_reader_merger12prepare_nextEv (scylla + 0x1ef2cf3)
 #6  0x00000000020f3bc0 _ZN22mutation_reader_merger19maybe_produce_batchEv (scylla + 0x1ef3bc0)
 #7  0x0000000002100a6b _ZN7seastar6repeatIZN24mutation_fragment_mergerI22mutation_reader_mergerEclEvEUlvE_EENS_6futureIvEEOT_ (scylla + 0x1f00a6b)
 #8  0x0000000002100180 _ZN7seastar6repeatIZN14merging_readerI22mutation_reader_mergerE11fill_bufferEvEUlvE_EENS_6futureIvEEOT_ (scylla + 0x1f00180)
 #9  0x00000000020fef97 _ZN14merging_readerI22mutation_reader_mergerE11fill_bufferEv (scylla + 0x1efef97)
 #10 0x0000000001bbc7e8 _ZN5query12consume_pageI20query_result_builderEEDaR23flat_mutation_reader_v2N7seastar13lw_shared_ptrI22compact_mutation_stateIL20compact_for_sstables0EEEERKNS_15partition_sliceEOT_mjNSt6chrono10time_pointI8gc_clockNSF_8durationIlSt5ratioILl1ELl1EEEEEE (scylla + 0x19bc7e8)
 #11 0x0000000001b85860 _ZN5query7querier12consume_pageI20query_result_builderEEDaOT_mjNSt6chrono10time_pointI8gc_clockNS5_8durationIlSt5ratioILl1ELl1EEEEEEN7tracing15trace_state_ptrE (scylla + 0x1985860)
 #12 0x0000000001b803e4 _ZN7replica5table5queryEN7seastar13lw_shared_ptrIK6schemaEE13reader_permitRKN5query12read_commandENS7_14result_optionsERKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISG_EEN7tracing15trace_state_ptrERNS7_21result_memory_limiterENSt6chrono10time_pointINS1_12lowres_clockENSP_8durationIlSt5ratioILl1ELl1000000000EEEEEEPSt8optionalINS7_7querierEE (scylla + 0x19803e4)
 #13 0x0000000001aaacd9 _ZN7seastar20noncopyable_functionIFNS_6futureIvEE13reader_permitEE19indirect_vtable_forIZN7replica8database5queryENS_13lw_shared_ptrIK6schemaEERKN5query12read_commandENSD_14result_optionsERKSt6vectorI20nonwrapping_intervalIN3dht13ring_positionEESaISM_EEN7tracing15trace_state_ptrENSt6chrono10time_pointINS_12lowres_clockENST_8durationIlSt5ratioILl1ELl1000000000EEEEEESt7variantIJSt9monostateN2db24per_partition_rate_limit12account_onlyENS14_19account_and_enforceEEEE3$_0E4callEPKS5_S3_ (scylla + 0x18aacd9)
 #14 0x00000000045fa550 _ZN28reader_concurrency_semaphore14execution_loopEv.resume (scylla + 0x43fa550)
 #15 0x000000000138d81b _ZN7seastar8internal21coroutine_traits_baseIvE12promise_type15run_and_disposeEv (scylla + 0x118d81b)
 #16 0x0000000005a9c930 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x589c930)
 #17 0x0000000005a9dc08 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc08)
 #18 0x0000000005ac1654 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58c1654)
 #19 0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #20 0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #21 0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6303:
 #0  0x000000000383663e _ZN2db9commitlog7segment8allocateERNS0_12entry_writerERN7seastar15semaphore_unitsINS0_15segment_manager44request_controller_timeout_exception_factoryENS4_12lowres_clockEEENSt6chrono10time_pointIS8_NSB_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x363663e)
 #1  0x00000000037ffcb5 _ZN2db9commitlog15segment_manager22allocate_when_possibleIZNS0_9add_entryERKN5utils11tagged_uuidI12table_id_tagEERK22commitlog_entry_writerNSt6chrono10time_pointIN7seastar12lowres_clockENSC_8durationIlSt5ratioILl1ELl1000000000EEEEEEE15cl_entry_writerNS_9rp_handleEEENSE_6futureIT0_EET_SK_ (scylla + 0x35ffcb5)
 #2  0x00000000037fec3e _ZN2db9commitlog9add_entryERKN5utils11tagged_uuidI12table_id_tagEERK22commitlog_entry_writerNSt6chrono10time_pointIN7seastar12lowres_clockENSA_8durationIlSt5ratioILl1ELl1000000000EEEEEE (scylla + 0x35fec3e)
 #3  0x00000000019e21d7 _ZN7replica8database8do_applyEN7seastar13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS1_12lowres_clockENSB_8durationIlSt5ratioILl1ELl1000000000EEEEEENS1_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSK_24per_partition_rate_limit12account_onlyENSP_19account_and_enforceEEE (scylla + 0x17e21d7)
 #4  0x0000000001a91642 _ZN7seastar20noncopyable_functionIFNS_6futureIvEEPN7replica8databaseENS_13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS_12lowres_clockENSF_8durationIlSt5ratioILl1ELl1000000000EEEEEENS_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSO_24per_partition_rate_limit12account_onlyENST_19account_and_enforceEEEEE17direct_vtable_forISt7_Mem_fnIMS4_FS2_S9_SC_SE_SM_SQ_SW_EEE4callEPKSY_S5_S9_SC_SE_SM_SQ_SW_ (scylla + 0x1891642)
 #5  0x0000000001ab39dc _ZN7seastar20noncopyable_functionIFNS_6futureIvEEPN7replica8databaseENS_13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS_12lowres_clockENSF_8durationIlSt5ratioILl1ELl1000000000EEEEEENS_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSO_24per_partition_rate_limit12account_onlyENST_19account_and_enforceEEEEE17direct_vtable_forIZNS_35inheriting_concrete_execution_stageIS2_JS5_S9_SC_SE_SM_SQ_SW_EE20make_stage_for_groupENS_16scheduling_groupEEUlS5_S9_SC_SE_SM_SQ_SW_E_E4callEPKSY_S5_S9_SC_SE_SM_SQ_SW_ (scylla + 0x18b39dc)
 #6  0x0000000001ab35a9 _ZN7seastar24concrete_execution_stageINS_6futureIvEEJPN7replica8databaseENS_13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS_12lowres_clockENSF_8durationIlSt5ratioILl1ELl1000000000EEEEEENS_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSO_24per_partition_rate_limit12account_onlyENST_19account_and_enforceEEEEE8do_flushEv (scylla + 0x18b35a9)
 #7  0x0000000005a477d3 _ZN7seastar11lambda_taskIZNS_15execution_stage5flushEvE3$_0E15run_and_disposeEv (scylla + 0x58477d3)
 #8  0x0000000005a9c930 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x589c930)
 #9  0x0000000005a9dc08 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc08)
 #10 0x0000000005ac1654 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58c1654)
 #11 0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #12 0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #13 0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6297:
 #0  0x0000000003260404 _ZZN7service13storage_proxy6remote12handle_writeIN5utils11tagged_uuidI24table_schema_version_tagEENS_5paxos8proposalEZNS1_18handle_paxos_learnERKN7seastar3rpc11client_infoENSA_14opt_time_pointES8_NS3_12small_vectorIN3gms12inet_addressELm3EEESH_jmSt8optionalIN7tracing10trace_infoEEEUlRNS9_10shared_ptrIS0_EENSK_15trace_state_ptrENS9_13lw_shared_ptrIK6schemaEERKS8_NSt6chrono10time_pointINS9_12lowres_clockENSX_8durationIlSt5ratioILl1ELl1000000000EEEEEENS_13fencing_tokenEE_ZNS1_18handle_paxos_learnESD_SE_S8_SI_SH_jmSM_EUlSP_N4netw8msg_addrES14_SW_SH_jmRKSM_S15_E_EENS9_6futureINSA_12no_wait_typeEEES18_SE_T_T0_RKSI_SH_jmS1A_S15_OT1_OT2_ENKUlvE0_clEv (scylla + 0x3060404)
 #1  0x000000000325f2b6 _ZN7seastar9coroutine3allIJNS_6futureIvEES3_EEC2IJZN7service13storage_proxy6remote12handle_writeIN5utils11tagged_uuidI24table_schema_version_tagEENS6_5paxos8proposalEZNS8_18handle_paxos_learnERKNS_3rpc11client_infoENSG_14opt_time_pointESF_NSA_12small_vectorIN3gms12inet_addressELm3EEESN_jmSt8optionalIN7tracing10trace_infoEEEUlRNS_10shared_ptrIS7_EENSQ_15trace_state_ptrENS_13lw_shared_ptrIK6schemaEERKSF_NSt6chrono10time_pointINS_12lowres_clockENS13_8durationIlSt5ratioILl1ELl1000000000EEEEEENS6_13fencing_tokenEE_ZNS8_18handle_paxos_learnESJ_SK_SF_SO_SN_jmSS_EUlSV_N4netw8msg_addrES1A_S12_SN_jmRKSS_S1B_E_EENS2_INSG_12no_wait_typeEEES1E_SK_T_T0_RKSO_SN_jmS1G_S1B_OT1_OT2_EUlvE0_ZNS9_ISD_SF_S1C_S1H_EES1J_S1E_SK_S1K_S1L_S1N_SN_jmS1G_S1B_S1P_S1R_EUlvE1_EEEDpOT_ (scylla + 0x305f2b6)
 #2  0x000000000325e316 _ZN7service13storage_proxy6remote12handle_writeIN5utils11tagged_uuidI24table_schema_version_tagEENS_5paxos8proposalEZNS1_18handle_paxos_learnERKN7seastar3rpc11client_infoENSA_14opt_time_pointES8_NS3_12small_vectorIN3gms12inet_addressELm3EEESH_jmSt8optionalIN7tracing10trace_infoEEEUlRNS9_10shared_ptrIS0_EENSK_15trace_state_ptrENS9_13lw_shared_ptrIK6schemaEERKS8_NSt6chrono10time_pointINS9_12lowres_clockENSX_8durationIlSt5ratioILl1ELl1000000000EEEEEENS_13fencing_tokenEE_ZNS1_18handle_paxos_learnESD_SE_S8_SI_SH_jmSM_EUlSP_N4netw8msg_addrES14_SW_SH_jmRKSM_S15_E_EENS9_6futureINSA_12no_wait_typeEEES18_SE_T_T0_RKSI_SH_jmS1A_S15_OT1_OT2_ (scylla + 0x305e316)
 #3  0x000000000324fc0b _ZN7service13storage_proxy6remote18handle_paxos_learnERKN7seastar3rpc11client_infoENS3_14opt_time_pointENS_5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEESD_jmSt8optionalIN7tracing10trace_infoEE (scylla + 0x304fc0b)
 #4  0x0000000003261866 _ZNSt17_Function_handlerIFN7seastar6futureINS0_3rpc12no_wait_typeEEERKNS2_11client_infoENS2_14opt_time_pointEN7service5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEESF_jmSt8optionalIN7tracing10trace_infoEEESt11_Bind_frontIMNS9_13storage_proxy6remoteEFS4_S7_S8_SB_SG_SF_jmSK_EJPSO_EEE9_M_invokeERKSt9_Any_dataS7_OS8_OSB_OSG_OSF_OjOmOSK_ (scylla + 0x3061866)
 #5  0x00000000017c6a60 _ZZZZN7seastar3rpc11recv_helperIN4netw10serializerESt8functionIFNS_6futureINS0_12no_wait_typeEEERKNS0_11client_infoENS0_14opt_time_pointEN7service5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEESI_jmSt8optionalIN7tracing10trace_infoEEEES7_JSE_SJ_SI_jmSN_ENS0_19do_want_client_infoENS0_18do_want_time_pointEEEDaNS0_9signatureIFT1_DpT2_EEEOT0_T3_T4_ENUlNS_10shared_ptrINS0_6server10connectionEEESK_INSt6chrono10time_pointINS_12lowres_clockENS16_8durationIlSt5ratioILl1ELl1000000000EEEEEEElNS0_7rcv_bufEE_clES15_S1E_lS1F_ENUlT_E_clINS_15semaphore_unitsINS_35semaphore_default_exception_factoryES18_EEEEDaS1H_ENUlvE_clEv (scylla + 0x15c6a60)
 #6  0x00000000017c56ca _ZZZN7seastar3rpc11recv_helperIN4netw10serializerESt8functionIFNS_6futureINS0_12no_wait_typeEEERKNS0_11client_infoENS0_14opt_time_pointEN7service5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEESI_jmSt8optionalIN7tracing10trace_infoEEEES7_JSE_SJ_SI_jmSN_ENS0_19do_want_client_infoENS0_18do_want_time_pointEEEDaNS0_9signatureIFT1_DpT2_EEEOT0_T3_T4_ENUlNS_10shared_ptrINS0_6server10connectionEEESK_INSt6chrono10time_pointINS_12lowres_clockENS16_8durationIlSt5ratioILl1ELl1000000000EEEEEEElNS0_7rcv_bufEE_clES15_S1E_lS1F_ENUlT_E_clINS_15semaphore_unitsINS_35semaphore_default_exception_factoryES18_EEEEDaS1H_ (scylla + 0x15c56ca)
 #7  0x00000000017c4d66 _ZN7seastar6futureINS_15semaphore_unitsINS_35semaphore_default_exception_factoryENS_12lowres_clockEEEE9then_implIZZNS_3rpc11recv_helperIN4netw10serializerESt8functionIFNS0_INS7_12no_wait_typeEEERKNS7_11client_infoENS7_14opt_time_pointEN7service5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEESO_jmSt8optionalIN7tracing10trace_infoEEEESD_JSK_SP_SO_jmST_ENS7_19do_want_client_infoENS7_18do_want_time_pointEEEDaNS7_9signatureIFT1_DpT2_EEEOT0_T3_T4_ENUlNS_10shared_ptrINS7_6server10connectionEEESQ_INSt6chrono10time_pointIS3_NS1C_8durationIlSt5ratioILl1ELl1000000000EEEEEEElNS7_7rcv_bufEE_clES1B_S1J_lS1K_EUlT_E_NS0_IvEEEES14_OS1M_ (scylla + 0x15c4d66)
 #8  0x00000000017c2e22 _ZZN7seastar3rpc11recv_helperIN4netw10serializerESt8functionIFNS_6futureINS0_12no_wait_typeEEERKNS0_11client_infoENS0_14opt_time_pointEN7service5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEESI_jmSt8optionalIN7tracing10trace_infoEEEES7_JSE_SJ_SI_jmSN_ENS0_19do_want_client_infoENS0_18do_want_time_pointEEEDaNS0_9signatureIFT1_DpT2_EEEOT0_T3_T4_ENUlNS_10shared_ptrINS0_6server10connectionEEESK_INSt6chrono10time_pointINS_12lowres_clockENS16_8durationIlSt5ratioILl1ELl1000000000EEEEEEElNS0_7rcv_bufEE_clES15_S1E_lS1F_ (scylla + 0x15c2e22)
 #9  0x00000000017c219f _ZNSt17_Function_handlerIFN7seastar6futureIvEENS0_10shared_ptrINS0_3rpc6server10connectionEEESt8optionalINSt6chrono10time_pointINS0_12lowres_clockENS9_8durationIlSt5ratioILl1ELl1000000000EEEEEEElNS4_7rcv_bufEEZNS4_11recv_helperIN4netw10serializerESt8functionIFNS1_INS4_12no_wait_typeEEERKNS4_11client_infoENS4_14opt_time_pointEN7service5paxos8proposalEN5utils12small_vectorIN3gms12inet_addressELm3EEES10_jmS8_IN7tracing10trace_infoEEEESP_JSW_S11_S10_jmS14_ENS4_19do_want_client_infoENS4_18do_want_time_pointEEEDaNS4_9signatureIFT1_DpT2_EEEOT0_T3_T4_EUlS7_SH_lSI_E_E9_M_invokeERKSt9_Any_dataOS7_OSH_OlOSI_ (scylla + 0x15c219f)
 #10 0x0000000005f7fc3a _ZZZZZZN7seastar3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvENUlvE0_clEvENKUlSt5tupleIJSt8optionalImEmlS7_INS0_7rcv_bufEEEEE_clESB_ENUlvE_clEv (scylla + 0x5d7fc3a)
 #11 0x0000000005f7f801 _ZZZZZN7seastar3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvENUlvE0_clEvENKUlSt5tupleIJSt8optionalImEmlS7_INS0_7rcv_bufEEEEE_clESB_ (scylla + 0x5d7f801)
 #12 0x0000000005f802aa _ZN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZZZZNS_3rpc6server10connection7processEvEN3$_0clEvENKUlvE_clEvENUlvE0_clEvEUlSt5tupleIJSt8optionalImEmlSB_INS4_7rcv_bufEEEEE_ZNS_6futureISF_E14then_impl_nrvoISG_NSH_IvEEEET0_OT_EUlOS3_RSG_ONS_12future_stateISF_EEE_SF_E15run_and_disposeEv (scylla + 0x5d802aa)
 #13 0x0000000005a9c930 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x589c930)
 #14 0x0000000005a9dc08 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc08)
 #15 0x0000000005a9cf7a _ZN7seastar7reactor3runEv (scylla + 0x589cf7a)
 #16 0x0000000005a3f858 _ZN7seastar12app_template14run_deprecatedEiPPcOSt8functionIFvvEE (scylla + 0x583f858)
 #17 0x0000000005a3ea0d _ZN7seastar12app_template3runEiPPcOSt8functionIFNS_6futureIiEEvEE (scylla + 0x583ea0d)
 #18 0x000000000131114f _ZL11scylla_mainiPPc (scylla + 0x111114f)
 #19 0x0000000001312bb1 _ZNKSt8functionIFiiPPcEEclEiS1_ (scylla + 0x1112bb1)
 #20 0x000000000130f6bd main (scylla + 0x110f6bd)
 #21 0x00007f75e5e8bb8a __libc_start_call_main (libc.so.6 + 0x27b8a)
 #22 0x00007f75e5e8bc4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)
 #23 0x000000000130d0e5 _start (scylla + 0x110d0e5)
 Stack trace of thread 6306:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6307:
 #0  0x00007f75e5f650fa read (libc.so.6 + 0x1010fa)
 #1  0x0000000005ae4865 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x58e4865)
 #2  0x0000000005ae4b73 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58e4b73)
 #3  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #4  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #5  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6304:
 #0  0x00007f75e5f6eb5d syscall (libc.so.6 + 0x10ab5d)
 #1  0x0000000005ae55b1 _ZN7seastar19aio_storage_context11submit_workEv (scylla + 0x58e55b1)
 #2  0x0000000005ae69c0 _ZN7seastar19reactor_backend_aio18kernel_submit_workEv (scylla + 0x58e69c0)
 #3  0x0000000005ac0659 _ZNSt17_Function_handlerIFbvEZN7seastar7reactor6do_runEvE3$_5E9_M_invokeERKSt9_Any_data (scylla + 0x58c0659)
 #4  0x0000000005a9dc46 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc46)
 #5  0x0000000005ac1654 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58c1654)
 #6  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #7  0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #8  0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 Stack trace of thread 6298:
 #0  0x00000000034227c9 _ZNK4cql312restrictions22statement_restrictions24get_partition_key_rangesERKNS_13query_optionsE (scylla + 0x32227c9)
 #1  0x0000000002c0f7cf _ZNK4cql310statements16select_statement10do_executeERNS_15query_processorERN7service11query_stateERKNS_13query_optionsE (scylla + 0x2a0f7cf)
 #2  0x0000000002c85f98 _ZN7seastar20noncopyable_functionIFNS_6futureINS_10shared_ptrIN13cql_transport8messages14result_messageEEEEEPKN4cql310statements16select_statementERNS8_15query_processorERN7service11query_stateERKNS8_13query_optionsEEE17direct_vtable_forISt7_Mem_fnIMSA_KFS7_SE_SH_SK_EEE4callEPKSM_SC_SE_SH_SK_ (scylla + 0x2a85f98)
 #3  0x0000000002c8650d _ZN7seastar20noncopyable_functionIFNS_6futureINS_10shared_ptrIN13cql_transport8messages14result_messageEEEEEPKN4cql310statements16select_statementERNS8_15query_processorERN7service11query_stateERKNS8_13query_optionsEEE17direct_vtable_forIZNS_35inheriting_concrete_execution_stageIS7_JSC_SE_SH_SK_EE20make_stage_for_groupENS_16scheduling_groupEEUlSC_SE_SH_SK_E_E4callEPKSM_SC_SE_SH_SK_ (scylla + 0x2a8650d)
 #4  0x0000000002c861a6 _ZN7seastar24concrete_execution_stageINS_6futureINS_10shared_ptrIN13cql_transport8messages14result_messageEEEEEJPKN4cql310statements16select_statementERNS8_15query_processorERN7service11query_stateERKNS8_13query_optionsEEE8do_flushEv (scylla + 0x2a861a6)
 #5  0x0000000005a477d3 _ZN7seastar11lambda_taskIZNS_15execution_stage5flushEvE3$_0E15run_and_disposeEv (scylla + 0x58477d3)
 #6  0x0000000005a9c930 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x589c930)
 #7  0x0000000005a9dc08 _ZN7seastar7reactor6do_runEv (scylla + 0x589dc08)
 #8  0x0000000005ac1654 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x58c1654)
 #9  0x0000000005a6c56b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x586c56b)
 #10 0x00007f75e5ef0947 start_thread (libc.so.6 + 0x8c947)
 #11 0x00007f75e5f76870 __clone3 (libc.so.6 + 0x112870)
 download_instructions=gsutil cp gs://upload.scylladb.com/core.scylla.112.e6ec3a11e2e449ad9d15d93da1da4aae.6297.1697764301000000/core.scylla.112.e6ec3a11e2e449ad9d15d93da1da4aae.6297.1697764301000000.gz .
 gunzip /var/lib/systemd/coredump/core.scylla.112.e6ec3a11e2e449ad9d15d93da1da4aae.6297.1697764301000000.gz
 ----- LAST WARNING EVENT -----------------------------------------------------
 2023-10-20 01:58:26.491: (DataValidatorEvent Severity.WARNING) period_type=one-time event_id=ee15da0d-5d50-4578-83ee-3f1cbdb57a41: type=UpdatedRowsValidator message=View blogposts_update_2_columns_lwt_indicator. Actual dataset length 619948 more then expected dataset length: 619947. Issue #6181
 ----- LAST NORMAL EVENT ------------------------------------------------------
 2023-10-20 02:01:13.064: (InfoEvent Severity.NORMAL) period_type=not-set event_id=c6dc0648-0447-4bc6-a003-02f023588a4a: message=TEST_END
 ----- LAST DEBUG EVENT -------------------------------------------------------
 2023-10-19 23:54:08.666 <2023-10-19 23:53:46.293>: (DatabaseLogEvent Severity.DEBUG) period_type=one-time event_id=57b22170-7807-470c-9cd3-8fb7d384053b: type=REACTOR_STALLED regex=Reactor stalled line_number=109518 node=longevity-lwt-3h-5-4-db-node-82136da5-2
 2023-10-19T23:53:46.293+00:00 longevity-lwt-3h-5-4-db-node-82136da5-2     !INFO | scylla[680]: Reactor stalled for 42 ms on shard 3. Backtrace: 0x5a8b00a 0x5a8a415 0x5a8b7bf 0x3dbaf 0x24fd024 0x24d9e77 0x24d9396 0x24de192 0x24ace9f 0x24af8cc 0x20f2cf2 0x20f3bbf 0x2100a6a 0x210017f 0x20fef96 0x25d530e 0x25c7010 0x25c5db5 0x5e7a556
 0x7fa8b7aee380

Just before that we see:

2023-10-20 01:11:41.408 <2023-10-20 01:11:41.394>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2e9120a0-a636-40c1-b7da-138af72f50c9: type=DATABASE_ERROR regex=(^ERROR|!\s*?ERR).*\[shard.*\] line_number=4327 node=longevity-lwt-3h-5-4-db-node-82136da5-1
2023-10-20T01:11:41.394+00:00 longevity-lwt-3h-5-4-db-node-82136da5-1      !ERR | scylla[6297]:  [shard 3:stre] querier_cache - semaphore mismatch detected, dropping reader cqlstress_lwt_example.blogposts_update_one_column_lwt_indicator:shard-reader: reader belongs to _read_concurrency_sem (0x604003f1e628) but the query class appropriate is _streaming_concurrency_sem (0x604003f1e838), at: 0x5f995ce 0x5f99b90 0x5f99e68 0x5a6b827 0x456c301 0x456a9c2 0x4599d1e 0x45995eb 0x13e9115 0x5e68ae1 0x5e68a6f 0x458c0a9 0x45b4e90 0x4591fcc 0x4590ab9 0x313170b 0x312eb8f 0x3229ce7 0x3228edb 0x322855d 0x3138e14 0x32c5d94 0x31b86ba 0x5a9c92f 0x5a9dc07 0x5ac1653 0x5a6c56a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11286f

@denesb
Contributor

denesb commented Nov 6, 2023

@avikivity
Member

The oom killer was added in 5.4, so no backport is required.
