Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asserts and segmentation faults in topic creation under load #5558

Closed
travisdowns opened this issue Jul 21, 2022 · 4 comments · Fixed by #7351
Closed

Asserts and segmentation faults in topic creation under load #5558

travisdowns opened this issue Jul 21, 2022 · 4 comments · Fixed by #7351
Assignees
Labels
area/controller DW kind/bug Something isn't working

Comments

@travisdowns
Copy link
Member

Version & Environment

Redpanda version: 3c02b03

What went wrong?

During a load test, the following assert was observed:

ERROR 2022-07-19 22:03:06,831751 [shard 27 seq 259] assert - Assert failure: (../../../src/v/cluster/scheduling/allocation_node.cc:68) '_allocated_partitions > allocation_capacity{0} && _weights[core] > 0' unable to deallocate partition from core 23 at node {node: 27, max_partitions_per_core: 7000, state: active, partition_capacity: 223998, weights: [(3)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(0)(0)(0)(0)(0)(1)(1)(1)(1)(1)(1)(1)(1)(1)]}
ERROR 2022-07-19 22:03:06,831841 [shard 27 seq 260] assert - Backtrace below:
0x48ac094 0x3bdafec 0x3bc7e20 0x378424d 0x465c78f 0x4660467 0x46a38f5 0x45fd31f /opt/redpanda/lib/libpthread.so.0+0x8608 /opt/redpanda/lib/libc.so.6+0x11f132
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>

The relevant backtrace is:

cluster::allocation_node::deallocate(unsigned int) at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/allocation_node.cc:64
cluster::allocation_units::~allocation_units() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/types.cc:61
cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13::~$_13() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/topics_frontend.cc:488
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::~continuation() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:757
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::run_and_dispose() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:778
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2356
 (inlined by) seastar::reactor::run_some_tasks() at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2769
seastar::reactor::do_run() at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2938
seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94::operator()() const at /home/ubuntu/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4163

Similarly, a segmentation fault on shard 24 occurred in another run of the same load test with the following backtrace:

cluster::allocation_node::deallocate(unsigned int) at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/allocation_node.cc:59
cluster::allocation_units::~allocation_units() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/scheduling/types.cc:61
cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13::~$_13() at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/topics_frontend.cc:488
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::~continuation() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:757
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13, seastar::futurize<seastar::future<cluster::topic_result> >::type seastar::future<std::__1::error_code>::then_wrapped_nrvo<seastar::future<cluster::topic_result>, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13>(cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&&)::'lambda'(seastar::internal::promise_base_with_type<cluster::topic_result>&&, cluster::topics_frontend::replicate_create_topic(cluster::topic_configuration, cluster::allocation_units, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::$_13&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>::run_and_dispose() at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:778

What should have happened instead?

No asserts or segmentation faults.

How to reproduce the issue?

This issue is difficult to reproduce deterministically, but the problem can be seen by inspection at

auto units = co_await _allocator.invoke_on(
partition_allocator::shard, [assignable_config](partition_allocator& al) {
return al.allocate(make_allocation_request(assignable_config));
});
if (!units) {
co_return make_error_result(assignable_config.cfg.tp_ns, units.error());
}
co_return co_await replicate_create_topic(
std::move(assignable_config.cfg), std::move(units.value()), timeout);
.

This uses some cluster::allocation_units on "this" shard which were originally created on shard 0 (the invoke_on part). Internally units have a _state pointer which points back to an internal structure of the partition_allocator on shard 0 (indeed, the design of partition_allocator is that it only exists on shard 0).

These units are manipulated and destroyed (e.g., if there is an error in the highlighted section above, or in topics_frontend::replicate_create_topic otherwise) on this shard, which will call back into the allocator on shard 0 via the _state pointer, e.g. here.

This is a cross-shard race condition: multiple shards may manipulate this state on shard 0 in parallel.

@travisdowns travisdowns added kind/bug Something isn't working area/controller DW labels Jul 21, 2022
@travisdowns travisdowns self-assigned this Jul 21, 2022
@dotnwat
Copy link
Member

dotnwat commented Jul 21, 2022

Potential duplicate? #3335

@travisdowns
Copy link
Member Author

travisdowns commented Jul 21, 2022

@dotnwat - oh interesting.

It's hard to say. It's the same assert but there's no decoded backtrace so it's difficult to assess whether allocation_units destructor is involved. Do you know why it was re-opened back in May?

In addition to my fix I'm adding oncore checks in debug so it should be caught more quickly and deterministically in the future.

@travisdowns
Copy link
Member Author

Note that this code was refactored in the last month or so to co-routinize it, though the bug existed in the prior version too, but perhaps this change affected the timing or some other factor making it more likely now? Earlier load tests of a similar nature didn't trigger that as far as I noticed.

travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 21, 2022
Return the allocatoin_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 22, 2022
Return the allocatoin_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 22, 2022
Return the allocatoin_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
@dotnwat
Copy link
Member

dotnwat commented Jul 22, 2022

Do you know why it was re-opened back in May?

Looks like I reopened it based on the last comment by John which seemed to cast some doubt on whether it was fixed.

travisdowns added a commit to travisdowns/redpanda that referenced this issue Aug 11, 2022
Return the allocatoin_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Nov 17, 2022
Const member prevents copy and move assignment, but copy and move
assignment seem to have reasonable semantics (the destination
shard id is replaced by the source) and we need it for allocation_units
oncore tracking.

Issue redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Nov 17, 2022
Return the allocation_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Nov 17, 2022
Const member prevents copy and move assignment, but copy and move
assignment seem to have reasonable semantics (the destination
shard id is replaced by the source) and we need it for allocation_units
oncore tracking.

Issue redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Nov 17, 2022
Return the allocation_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Nov 17, 2022
Return the allocation_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Nov 21, 2022
Const member prevents copy and move assignment, but copy and move
assignment seem to have reasonable semantics (the destination
shard id is replaced by the source) and we need it for allocation_units
oncore tracking.

Issue redpanda-data#5558.

(cherry picked from commit c64fdbb)
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Nov 21, 2022
Return the allocation_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.

(cherry picked from commit 413cbe3)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Nov 21, 2022
Const member prevents copy and move assignment, but copy and move
assignment seem to have reasonable semantics (the destination
shard id is replaced by the source) and we need it for allocation_units
oncore tracking.

Issue redpanda-data#5558.

(cherry picked from commit c64fdbb)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Nov 21, 2022
Return the allocation_units wrapped in a foreign pointer, since they
need to be freed on the same core on which they were allocated.

Fixes redpanda-data#5558.

(cherry picked from commit 413cbe3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller DW kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants