Asserts and segmentation faults in topic creation under load #5558

travisdowns · 2022-07-21T18:59:44Z

Version & Environment

Redpanda version: 3c02b03

What went wrong?

During a load test, the following assert was observed:

ERROR 2022-07-19 22:03:06,831751 [shard 27 seq 259] assert - Assert failure: (../../../src/v/cluster/scheduling/allocation_node.cc:68) '_allocated_partitions > allocation_capacity{0} && _weights[core] > 0' unable to deallocate partition from core 23 at node {node: 27, max_partitions_per_core: 7000, state: active, partition_capacity: 223998, weights: [(3)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(0)(0)(0)(0)(0)(1)(1)(1)(1)(1)(1)(1)(1)(1)]}
ERROR 2022-07-19 22:03:06,831841 [shard 27 seq 260] assert - Backtrace below:
0x48ac094 0x3bdafec 0x3bc7e20 0x378424d 0x465c78f 0x4660467 0x46a38f5 0x45fd31f /opt/redpanda/lib/libpthread.so.0+0x8608 /opt/redpanda/lib/libc.so.6+0x11f132
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>

The relevant backtrace is:

cluster::allocation_node::deallocate(unsigned int) at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/allocation_node.cc:64
cluster::allocation_units::~allocation_units() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/types.cc:61
cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13::~$_13() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/topics_frontend.cc:488
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::~continuation() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:757
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::run_and_dispose() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:778
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2356
 (inlined by) seastar::reactor::run_some_tasks() at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2769
seastar::reactor::do_run() at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2938
seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94::operator()() const at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4163

Similarly, a segmentation fault on shard 24 occurred in another run of the same load test with the following backtrace:

cluster::allocation_node::deallocate(unsigned int) at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/allocation_node.cc:59
cluster::allocation_units::~allocation_units() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/types.cc:61
cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13::~$_13() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/topics_frontend.cc:488
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::~continuation() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:757
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::run_and_dispose() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:778

What should have happened instead?

No asserts or segmentation faults.

How to reproduce the issue?

This issue is difficult to reproduce deterministically, but the problem can be seen by inspection at

redpanda/src/v/cluster/topics_frontend.cc

Lines 460 to 469 in a80aca9

    
           auto units = co_await _allocator.invoke_on( 
        
             partition_allocator::shard, [assignable_config](partition_allocator& al) { 
        
                 return al.allocate(make_allocation_request(assignable_config)); 
        
             }); 
        
           if (!units) { 
        
               co_return make_error_result(assignable_config.cfg.tp_ns, units.error()); 
        
           } 
        
           co_return co_await replicate_create_topic( 
        
             std::move(assignable_config.cfg), std::move(units.value()), timeout);

.

This uses some cluster::allocation_units on "this" shard which were originally created on shard 0 (the invoke_on part). Internally units have a _state pointer which points back to an internal structure of the partition_allocator on shard 0 (indeed, the design of partition_allocator is that it only exists on shard 0).

These units are manipulated and destroyed (e.g., if there is an error in the highlighted section above, or in topics_frontend::replicate_create_topic otherwise) on this shard, which will call back into the allocator on shard 0 via the _state pointer, e.g. here.

This is a cross-shard race condition: multiple shards may manipulate this state on shard 0 in parallel.

The text was updated successfully, but these errors were encountered:

dotnwat · 2022-07-21T19:05:16Z

Potential duplicate? #3335

travisdowns · 2022-07-21T20:32:44Z

@dotnwat - oh interesting.

It's hard to say. It's the same assert but there's no decoded backtrace so it's difficult to assess whether allocation_units destructor is involved. Do you know why it was re-opened back in May?

In addition to my fix I'm adding oncore checks in debug so it should be caught more quickly and deterministically in the future.

travisdowns · 2022-07-21T20:36:39Z

Note that this code was refactored in the last month or so to co-routinize it, though the bug existed in the prior version too, but perhaps this change affected the timing or some other factor making it more likely now? Earlier load tests of a similar nature didn't trigger that as far as I noticed.

Return the allocatoin_units wrapped in a foreign pointer, since they need to be freed on the same core on which they were allocated. Fixes redpanda-data#5558.

dotnwat · 2022-07-22T19:00:46Z

Do you know why it was re-opened back in May?

Looks like I reopened it based on the last comment by John which seemed to cast some doubt on whether it was fixed.

Return the allocatoin_units wrapped in a foreign pointer, since they need to be freed on the same core on which they were allocated. Fixes redpanda-data#5558.

Const member prevents copy and move assignment, but copy and move assignment seem to have reasonable semantics (the destination shard id is replaced by the source) and we need it for allocation_units oncore tracking. Issue redpanda-data#5558.

Return the allocation_units wrapped in a foreign pointer, since they need to be freed on the same core on which they were allocated. Fixes redpanda-data#5558.

Const member prevents copy and move assignment, but copy and move assignment seem to have reasonable semantics (the destination shard id is replaced by the source) and we need it for allocation_units oncore tracking. Issue redpanda-data#5558.

Return the allocation_units wrapped in a foreign pointer, since they need to be freed on the same core on which they were allocated. Fixes redpanda-data#5558.

Const member prevents copy and move assignment, but copy and move assignment seem to have reasonable semantics (the destination shard id is replaced by the source) and we need it for allocation_units oncore tracking. Issue redpanda-data#5558. (cherry picked from commit c64fdbb)

Return the allocation_units wrapped in a foreign pointer, since they need to be freed on the same core on which they were allocated. Fixes redpanda-data#5558. (cherry picked from commit 413cbe3)

Const member prevents copy and move assignment, but copy and move assignment seem to have reasonable semantics (the destination shard id is replaced by the source) and we need it for allocation_units oncore tracking. Issue redpanda-data#5558. (cherry picked from commit c64fdbb)

Return the allocation_units wrapped in a foreign pointer, since they need to be freed on the same core on which they were allocated. Fixes redpanda-data#5558. (cherry picked from commit 413cbe3)

travisdowns added kind/bug Something isn't working area/controller DW labels Jul 21, 2022

travisdowns self-assigned this Jul 21, 2022

travisdowns mentioned this issue Jul 22, 2022

cluster: assertion deallocating partition from core #3335

Open

travisdowns mentioned this issue Nov 17, 2022

Fix cross-shard allocator manipulation #7351

Merged

6 tasks

piyushredpanda closed this as completed in #7351 Nov 21, 2022

vbotbuildovich mentioned this issue Nov 21, 2022

[v22.3.x] Asserts and segmentation faults in topic creation under load #7414

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Asserts and segmentation faults in topic creation under load #5558

Asserts and segmentation faults in topic creation under load #5558

travisdowns commented Jul 21, 2022

dotnwat commented Jul 21, 2022

travisdowns commented Jul 21, 2022 •

edited

travisdowns commented Jul 21, 2022

dotnwat commented Jul 22, 2022

Asserts and segmentation faults in topic creation under load #5558

Asserts and segmentation faults in topic creation under load #5558

Comments

travisdowns commented Jul 21, 2022

Version & Environment

What went wrong?

What should have happened instead?

How to reproduce the issue?

dotnwat commented Jul 21, 2022

travisdowns commented Jul 21, 2022 • edited

travisdowns commented Jul 21, 2022

dotnwat commented Jul 22, 2022

travisdowns commented Jul 21, 2022 •

edited