
permit ks.mv_cf_view_13:shard-reader detected a leak of {count=0, memory=32768} resources #13539

Closed
bhalevy opened this issue Apr 17, 2023 · 10 comments

bhalevy commented Apr 17, 2023

Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/232/artifact/logs-heavy.release.001/1681710153096_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_mv_alter_with_synchronous_updates/node2.log

```
Scylla version 5.3.0~dev-0.20230417.c501163f955e with build-id 07cfea1bc38ee632328349d10316fc5d8cdaa35a starting ...

ERROR 2023-04-17 05:36:44,786 [shard 1] reader_concurrency_semaphore - reader_permit::impl::~impl(): permit ks.mv_cf_view_13:shard-reader detected a leak of {count=0, memory=32768} resources, at: 0x580588e 0x5805e40 0x5806128 0x543dda3 0x40b503d 0x40b5c68 0x4057c17 0x409a1f6 0x1bca2ba 0x546e434 0x546f6b7 0x5491ca1 0x543e88a /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x8b12c /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x10cbbf
   --------
   seastar::internal::coroutine_traits_base<query::result>::promise_type
   --------
   seastar::internal::coroutine_traits_base<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::promise_type
   --------
   seastar::internal::coroutine_traits_base<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::storage_proxy::query_result_local(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<query::read_command>, nonwrapping_interval<dht::ring_position> const&, query::result_options, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)::$_31, seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::then_impl_nrvo<service::storage_proxy::query_result_local(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<query::read_command>, nonwrapping_interval<dht::ring_position> const&, query::result_options, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)::$_31, seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> > >(service::storage_proxy::query_result_local(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<query::read_command>, nonwrapping_interval<dht::ring_position> const&, query::result_options, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)::$_31&&)::{lambda(seastar::internal::promise_base_with_type<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&, service::storage_proxy::query_result_local(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<query::read_command>, nonwrapping_interval<dht::ring_position> const&, query::result_options, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)::$_31&, seastar::future_state<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}, seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, service::abstract_read_executor::make_data_requests(seastar::shared_ptr<service::digest_read_resolver>, gms::inet_address*, gms::inet_address*, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, bool)::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >)#1}, seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::then_wrapped_nrvo<void, service::abstract_read_executor::make_data_requests(seastar::shared_ptr<service::digest_read_resolver>, gms::inet_address*, gms::inet_address*, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, bool)::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >)#1}>(service::abstract_read_executor::make_data_requests(seastar::shared_ptr<service::digest_read_resolver>, gms::inet_address*, gms::inet_address*, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, bool)::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, service::abstract_read_executor::make_data_requests(seastar::shared_ptr<service::digest_read_resolver>, gms::inet_address*, gms::inet_address*, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, bool)::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >)#1}&, seastar::future_state<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}, seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >
Aborting on shard 1.
Backtrace:
  0x545d318
  0x54917f2
  /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x3cb1f
  /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x8ce5b
  /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x3ca75
  /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x267fb
  0x543de22
  0x40b503d
  0x40b5c68
  0x4057c17
  0x409a1f6
  0x1bca2ba
  0x546e434
  0x546f6b7
  0x5491ca1
  0x543e88a
  /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x8b12c
  /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.ccm/scylla-repository/c501163f955e8f665211fb1456f92fcedf07dd3d/libreloc/libc.so.6+0x10cbbf
```

Decoded:

```
[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_3>(seastar::current_backtrace_tasklocal()::$_3&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:60
 (inlined by) seastar::current_backtrace_tasklocal() at ./build/release/seastar/./seastar/src/util/backtrace.cc:86
seastar::current_tasktrace() at ./build/release/seastar/./seastar/src/util/backtrace.cc:137
seastar::current_backtrace() at ./build/release/seastar/./seastar/src/util/backtrace.cc:170
void log_error_and_backtrace<std::basic_string_view<char, std::char_traits<char> > >(seastar::logger&, std::basic_string_view<char, std::char_traits<char> > const&) at ./build/release/seastar/./seastar/src/core/on_internal_error.cc:38
 (inlined by) seastar::on_internal_error_noexcept(seastar::logger&, std::basic_string_view<char, std::char_traits<char> >) at ./build/release/seastar/./seastar/src/core/on_internal_error.cc:60
~impl at ./reader_concurrency_semaphore.cc:246
~impl at ./reader_concurrency_semaphore.cc:240
~remote_parts at ./multishard_mutation_query.cc:93
 (inlined by) std::default_delete<read_context::reader_meta::remote_parts>::operator()(read_context::reader_meta::remote_parts*) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/unique_ptr.h:95
 (inlined by) ~unique_ptr at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/unique_ptr.h:396
 (inlined by) ~ at ./multishard_mutation_query.cc:392
 (inlined by) operator() at ./multishard_mutation_query.cc:392
 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, replica::database&>(std::__invoke_other, read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}&&, replica::database&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::__invoke_result<read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, replica::database&>::type std::__invoke<read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, replica::database&>(std::__invoke_result&&, (read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}&&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
 (inlined by) decltype(auto) std::__apply_impl<read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, std::tuple<replica::database&>, 0ul>(read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}&&, std::tuple<replica::database&>&&, std::integer_sequence<unsigned long, 0ul>) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1852
 (inlined by) decltype(auto) std::apply<read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, std::tuple<replica::database&> >(read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}&&, std::tuple<replica::database&>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1863
 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:458
 (inlined by) _ZN7seastar8futurizeINS_6futureIvEEE6invokeIRZNS_7shardedIN7replica8databaseEE9invoke_onIZZN12read_context4stopEvENK3$_0clEjEUlRS7_E_JES2_EET1_jNS_21smp_submit_to_optionsEOT_DpOT0_EUlvE_JEEES2_SH_SK_ at ././seastar/include/seastar/core/future.hh:1999
_ZN7seastar3smp9submit_toIZNS_7shardedIN7replica8databaseEE9invoke_onIZZN12read_context4stopEvENK3$_0clEjEUlRS4_E_JENS_6futureIvEEEET1_jNS_21smp_submit_to_optionsEOT_DpOT0_EUlvE_EENS_8futurizeINSt13invoke_resultISF_JEE4typeEE4typeEjSE_SG_ at ././seastar/include/seastar/core/smp.hh:358
 (inlined by) seastar::future<void> seastar::sharded<replica::database>::invoke_on<read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, , seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}&&) at ././seastar/include/seastar/core/sharded.hh:456
 (inlined by) seastar::future<void> seastar::sharded<replica::database>::invoke_on<read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}, , seastar::future<void> >(unsigned int, read_context::stop()::$_0::operator()(unsigned int) const::{lambda(replica::database&)#1}&&) at ././seastar/include/seastar/core/sharded.hh:474
 (inlined by) operator() at ./multishard_mutation_query.cc:387
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<read_context::stop()::$_0, unsigned int>(read_context::stop()::$_0&&, unsigned int&&) at ././seastar/include/seastar/core/future.hh:1999
 (inlined by) auto seastar::futurize_invoke<read_context::stop()::$_0, unsigned int>(read_context::stop()::$_0&&, unsigned int&&) at ././seastar/include/seastar/core/future.hh:2030
 (inlined by) seastar::future<void> seastar::parallel_for_each<boost::range_detail::integer_iterator<unsigned int>, boost::range_detail::integer_iterator<unsigned int>, read_context::stop()::$_0>(boost::range_detail::integer_iterator<unsigned int>, boost::range_detail::integer_iterator<unsigned int>, read_context::stop()::$_0&&) at ././seastar/include/seastar/core/loop.hh:571
 (inlined by) seastar::future<void> seastar::internal::parallel_for_each_impl<boost::integer_range<unsigned int>, read_context::stop()::$_0>(boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&) at ././seastar/include/seastar/core/loop.hh:624
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::future<void> (*&)(boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&), boost::integer_range<unsigned int>, read_context::stop()::$_0>(seastar::future<void> (*&)(boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&), boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&) at ././seastar/include/seastar/core/future.hh:1999
 (inlined by) auto seastar::futurize_invoke<seastar::future<void> (*&)(boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&), boost::integer_range<unsigned int>, read_context::stop()::$_0>(seastar::future<void> (*&)(boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&), boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&) at ././seastar/include/seastar/core/future.hh:2030
 (inlined by) seastar::future<void> seastar::parallel_for_each<boost::integer_range<unsigned int>, read_context::stop()::$_0>(boost::integer_range<unsigned int>&&, read_context::stop()::$_0&&) at ././seastar/include/seastar/core/loop.hh:639
 (inlined by) read_context::stop() at ./multishard_mutation_query.cc:385
 (inlined by) seastar::future<(anonymous namespace)::data_query_result_builder::result_type> do_query<(anonymous namespace)::data_query_result_builder>(seastar::sharded<replica::database>&, seastar::lw_shared_ptr<schema const>, query::read_command const&, std::vector<nonwrapping_interval<dht::ring_position>, std::allocator<nonwrapping_interval<dht::ring_position> > > const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::noncopyable_function<(anonymous namespace)::data_query_result_builder (compact_mutation_state<(compact_for_sstables)0> const&)>) at ./multishard_mutation_query.cc:769
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<query::result>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/coroutine:244
 (inlined by) seastar::internal::coroutine_traits_base<query::result>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:78
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2557
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3020
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3189
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4393
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_97&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_97&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_97&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_97&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_97&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_97>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
```

Coredump is here: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/232/artifact/logs-heavy.release.001/1681710153096_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_mv_alter_with_synchronous_updates/node2-reactor-1.1233.1681709804.core.gz
Reloc package: http://downloads.scylladb.com/unstable/scylla/master/relocatable/2023-04-17T03:05:10Z/scylla-unstripped-5.3.0~dev-0.20230417.c501163f955e.x86_64.tar.gz

@bhalevy bhalevy added the triage/master Looking for assignee label Apr 17, 2023
@DoronArazii DoronArazii added this to the 5.3 milestone Apr 17, 2023

denesb commented Apr 20, 2023

All resource consumption happens via RAII objects that also have a reader_permit onboard.
The fact that the permit is destroyed in stop() here means that there were no other copies around; this was the last one. Consequently, no RAII resource objects were alive at that point.

There is only one object that doesn't completely conform to RAII: tracking_allocator. It uses the raw consume()/signal() API and relies on the container using it to release all memory through it. That said, this allocator also has a permit instance onboard, and if it was destroyed, all memory should have been released, unless in some cases circular_buffer<> can report a different size in deallocate() than in the original allocate() call.
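
For illustration, here is a minimal sketch (with made-up names, not the actual Scylla types) of the RAII pattern described above: resources are consumed on construction and signalled back on destruction, so the consumed and released amounts cannot diverge on any exit path.

```cpp
#include <cstddef>
#include <cstdio>
#include <utility>

struct semaphore {
    size_t consumed = 0;
    void consume(size_t memory) { consumed += memory; }
    void signal(size_t memory) { consumed -= memory; }
};

class resource_units {
    semaphore* _sem;
    size_t _memory;
public:
    resource_units(semaphore& sem, size_t memory)
            : _sem(&sem), _memory(memory) {
        _sem->consume(_memory);
    }
    resource_units(resource_units&& o) noexcept
            : _sem(std::exchange(o._sem, nullptr))
            , _memory(std::exchange(o._memory, 0)) {}
    ~resource_units() {
        if (_sem) {
            _sem->signal(_memory); // release exactly what was consumed
        }
    }
};

int main() {
    semaphore sem;
    {
        resource_units units(sem, 32768);
    } // units destroyed here, memory released
    std::printf("leaked: %zu\n", sem.consumed); // prints 0
}
```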

denesb added a commit to denesb/scylla that referenced this issue Apr 20, 2023
This API is dangerous, all resource consumption should happen via RAII
objects that guarantee that all consumed resources are appropriately
released.
At this point, said API is just a low-level building block for
higher-level, RAII objects. To ensure nobody thinks of using it for
other purposes, make it private and make external users friends instead.

Refs: scylladb#13539
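
A sketch of the access-control change this commit describes (illustrative declarations, not the exact Scylla code): the raw building blocks become private, and only the blessed RAII users get access via friendship.

```cpp
#include <cstddef>

class reader_permit {
    // Dangerous low-level API: every consume() must be paired with a
    // matching signal(). Private, so it cannot be called directly.
    void consume(size_t memory);
    void signal(size_t memory);

    // The RAII wrappers allowed to build on the raw API
    // (hypothetical names, for illustration only).
    friend class resource_units;
    friend class tracking_allocator;
public:
    // ... the public, RAII-based interface stays available to everyone.
};
```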

denesb commented Apr 20, 2023

I see a lot of std::bad_alloc in the logs. Maybe some code paths are not exception-safe.
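
To illustrate the concern (hypothetical code, not taken from Scylla): with the raw consume()/signal() API, an exception on the path between the two calls leaks the consumed amount, which is exactly what the RAII wrappers prevent.

```cpp
#include <cstddef>

struct semaphore {
    void consume(size_t memory); // accounts memory against the permit
    void signal(size_t memory);  // gives it back
};

void do_work(); // may throw, e.g. std::bad_alloc under memory pressure

// Not exception-safe: if do_work() throws, signal() is skipped and the
// 4096 bytes stay on the permit's books forever.
void process(semaphore& sem) {
    sem.consume(4096);
    do_work();
    sem.signal(4096);
}
```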


denesb commented Apr 20, 2023

```
INFO  2023-04-17 05:36:39,749 [shard 1] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 100/100 count and 38437565/9688842 memory resources: kill limit triggered, dumping permit diagnostics:
permits	count	memory	table/description/state
92	92	35M	ks.mv_cf_view_13/shard-reader/active/await
4	4	1414K	ks.mv_cf_view_13/shard-reader/waiting_for_memory
4	4	435K	ks.mv_cf_view_13/shard-reader/active/need_cpu
115	0	12K	ks.mv_cf_view_13/multishard-mutation-query/active
14	0	0B	ks.mv_cf_view_13/shard-reader/waiting_for_admission
1	0	0B	ks.mv_cf_view_13/shard-reader/evicted

230	100	37M	total

Stats:
permit_based_evictions: 4238
time_based_evictions: 1
inactive_reads: 0
total_successful_reads: 18
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 1
reads_admitted: 33219
reads_enqueued_for_admission: 23579
reads_enqueued_for_memory: 1621
reads_admitted_immediately: 9654
reads_queued_because_ready_list: 0
reads_queued_because_used_permits: 23175
reads_queued_because_memory_resources: 291
reads_queued_because_count_resources: 113
reads_queued_with_eviction: 17
total_permits: 65126
current_permits: 230
used_permits: 96
blocked_permits: 92
disk_reads: 100
sstables_read: 500
```

After the kill limit is triggered, all consume() calls throw until memory consumption drops back below the limit.
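
A minimal model of that behaviour, assuming semantics inferred from the description above (not the actual semaphore code):

```cpp
#include <cstddef>
#include <new>

class mem_semaphore {
    size_t _consumed = 0;
    size_t _kill_limit; // e.g. several times the nominal memory limit
public:
    explicit mem_semaphore(size_t kill_limit) : _kill_limit(kill_limit) {}

    void consume(size_t memory) {
        if (_consumed >= _kill_limit) {
            // Kill limit active: every consume() fails until enough
            // memory is signalled back.
            throw std::bad_alloc();
        }
        _consumed += memory;
    }

    void signal(size_t memory) {
        _consumed -= memory; // dropping below the limit re-enables consume()
    }
};
```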

DoronArazii commented Apr 20, 2023

Picked up by dev, removing "Master/Triage".

@DoronArazii DoronArazii removed the triage/master Looking for assignee label Apr 20, 2023

denesb commented Apr 24, 2023

I found this in at least one other run: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/231/

So it is not a one-off fluke. Interestingly, the other failure also leaks the exact same amount. Not sure if this has any significance.

I still haven't managed to zero in on the bug. I haven't managed to find any evidence that this is related to the throwing consume(), although that is still my primary suspect.

Looking at the core, I found inbound references to the reader permit that is being destroyed (and has 0 references according to shared-pointer accounting). I have not managed to identify any of these references; they could be coincidence (a random blob that looks like a pointer).

@mykaul mykaul added the symptom/ci stability Issues that failed in ScyllaDB CI - tests and framework label Apr 24, 2023

denesb commented Apr 25, 2023

Looks like the throwing consume() was a red herring. After digging in the core some more, I found out that the read finished successfully. This is also confirmed by reader_permit::impl::_oom_kills being 0 (this counter is bumped each time the semaphore throws in a consume() called from the permit).


denesb commented Apr 26, 2023

I think I found the problem. As I learned not too long ago, a permit in the waiting_for_memory state can be marked inactive (#12700). This is possible if the permit was put into waiting_for_memory by a background read-ahead at the same time as the high-level read is becoming inactive. After #12777, all memory requests are killed when a permit becomes inactive. Since the memory requests were made on behalf of background read-ahead, the read itself can survive this and go on. This causes a problem, because the code assumes that a killed read is dead; as a consequence, the _requested_memory field in the permit, which tracks how much memory was requested, is not cleared. If a subsequent blocking memory request is eventually granted, the consumed amount will include the previous, failed amount as well, but the released amount will cover only the new request. The mismatch remains on the permit's books as a leak, as seen in this issue.
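
Walking through that accounting with a hypothetical model (field and function names mirror the description above, not the exact Scylla code) reproduces the reported 32768-byte leak:

```cpp
#include <cstddef>
#include <cstdio>

struct permit {
    size_t requested_memory = 0; // pending, not yet granted
    size_t consumed = 0;         // granted by the semaphore
    size_t released = 0;         // signalled back via resource_units

    void request_memory(size_t n) { requested_memory += n; }

    // Pending requests are killed when the permit is marked inactive
    // while waiting for memory. The bug: requested_memory stays set.
    void kill_pending_requests() { /* requested_memory = 0; <-- missing */ }

    // A later request is granted: the semaphore grants the whole
    // accumulated requested_memory, but the caller only holds (and
    // later releases) units covering its own request.
    size_t grant(size_t own_request) {
        consumed += requested_memory;
        requested_memory = 0;
        return own_request; // resource_units cover only this much
    }
};

int main() {
    permit p;
    p.request_memory(32768);     // background read-ahead, blocks
    p.kill_pending_requests();   // permit marked inactive; read survives
    p.request_memory(4096);      // later request by the surviving read
    p.released += p.grant(4096); // granted; units release only 4096
    std::printf("leak: %zu\n", p.consumed - p.released); // 32768
}
```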

denesb added a commit to denesb/scylla that referenced this issue Apr 26, 2023
This API is dangerous, all resource consumption should happen via RAII
objects that guarantee that all consumed resources are appropriately
released.
At this point, said API is just a low-level building block for
higher-level, RAII objects. To ensure nobody thinks of using it for
other purposes, make it private and make external users friends instead.

Refs: scylladb#13539
@denesb
Copy link
Contributor

denesb commented Apr 26, 2023

Successfully reproduced with a unit test.

denesb added a commit to denesb/scylla that referenced this issue Apr 26, 2023
…mory requests

When requesting memory via `reader_permit::request_memory()`, the
requested amount is added to the `_requested_memory` member of the permit
impl. This is because multiple concurrent requests may be blocked and
waiting at the same time. When the requests are fulfilled, the entire
amount is consumed and individual requests track their requested amount
with `resource_units` to release later.
There is a corner case related to this: if a reader permit is registered
as inactive while it is waiting for memory, its active requests are
killed with `std::bad_alloc`, but the `_requested_memory` field is not
cleared. If the read survives because the killed requests were part of
a non-vital background read-ahead, a later memory request will also
include the amount from the failed requests. This extra amount will not
be released and hence will cause a resource leak when the permit is
destroyed.
Fix by detecting this corner case and clearing the `_requested_memory`
field. Modify the existing unit test for the scenario of a permit being
registered as inactive while waiting on memory, so that it also covers
this corner case, reproducing the bug.

Fixes: scylladb#13539
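
Continuing the hypothetical model from the earlier comment, the fix amounts to resetting the accumulated bookkeeping when pending requests are killed, so a later grant cannot inherit the failed amount (a sketch of the shape of the fix, not the actual patch; see #13679):

```cpp
// In the model above, the buggy no-op becomes:
void permit::kill_pending_requests() {
    // ... fail all blocked memory requests with std::bad_alloc ...
    requested_memory = 0; // cleared, so consumed and released match again
}
```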

denesb commented Apr 26, 2023

Fix here: #13679


denesb commented Dec 18, 2023

No branches affected, no backports needed.
