Segfault in storage::segment_reader_handle::close() from tiered storage #18213

Closed
andrwng opened this issue May 2, 2024 · 0 comments · Fixed by #18229
Labels
area/cloud-storage Shadow indexing subsystem kind/bug Something isn't working sev/high loss of availability, pathological performance degradation, recoverable corruption

Comments

andrwng (Contributor) commented May 2, 2024

Version & Environment

Redpanda version (`rpk version`): v23.3.12 (arm)

We saw several crashes in one cluster, all of which appear to have a similar root cause.

23.3.12-arm$ ~/Repos/redpanda/vbuild/debug/clang/v_deps_build/seastar-prefix/src/seastar/scripts/seastar-addr2line  -e ~/xfs/23.3.12-arm/libexec/redpanda 0x7b61a43   0x7ba66f3   0x6dd5b6b   0x635ab43   0x2c6645f   0x7b5c36b   0x7b5ea1f   0x7b5cbf3   0x7a81e43   0x7a80997   0x2b56ca7   0x7df9163   0x2b4fd2f
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:824
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:854
 (inlined by) seastar::print_with_backtrace(char const*, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:866
seastar::sigsegv_action() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4150
 (inlined by) seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4131
 (inlined by) seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4127
storage::segment_reader_handle::~segment_reader_handle() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/storage/segment_reader.cc:208
storage::continuous_batch_parser::~continuous_batch_parser() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/storage/parser.h:108
 (inlined by) std::__1::default_delete<storage::continuous_batch_parser>::operator()[abi:v160004](storage::continuous_batch_parser*) const at /vectorized/llvm/bin/../include/c++/v1/__memory/unique_ptr.h:65
 (inlined by) std::__1::unique_ptr<storage::continuous_batch_parser, std::__1::default_delete<storage::continuous_batch_parser> >::reset[abi:v160004](storage::continuous_batch_parser*) at /vectorized/llvm/bin/../include/c++/v1/__memory/unique_ptr.h:297
 (inlined by) cloud_storage::remote_segment_batch_reader::stop() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/cloud_storage/remote_segment.cc:1484
23.3.12-arm$ ~/Repos/redpanda/vbuild/debug/clang/v_deps_build/seastar-prefix/src/seastar/scripts/seastar-addr2line  -e ~/xfs/23.3.12-arm/libexec/redpanda 0x7b61a43   0x7ba66f3   linux-vdso.so.1+0x84f   0x6dda8bf   0x2c6645f   0x7b5c36b   0x7b5ea1f   0x7ba7d83   0x7afca1b   /opt/redpanda/lib/libc.so.6+0x843b7   /opt/redpanda/lib/libc.so.6+0xef2db
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:824
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:854
 (inlined by) seastar::print_with_backtrace(char const*, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:866
seastar::sigsegv_action() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4150
 (inlined by) seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4131
 (inlined by) seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4127
addr2line: 'linux-vdso.so.1': No such file
linux-vdso.so.1 0x84f
boost::intrusive::list_node_traits<void*>::get_previous(boost::intrusive::list_node<void*>*) at /vectorized/include/boost/intrusive/detail/list_node.hpp:54
 (inlined by) boost::intrusive::circular_list_algorithms<boost::intrusive::list_node_traits<void*> >::unlink(boost::intrusive::list_node<void*>*) at /vectorized/include/boost/intrusive/circular_list_algorithms.hpp:153
 (inlined by) boost::intrusive::generic_hook<(boost::intrusive::algo_types)0, boost::intrusive::list_node_traits<void*>, boost::intrusive::member_tag, (boost::intrusive::link_mode_type)2, (boost::intrusive::base_hook_type)0>::unlink() at /vectorized/include/boost/intrusive/detail/generic_hook.hpp:214
 (inlined by) storage::segment_reader_handle::close() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/storage/segment_reader.cc:218
23.3.12-arm$ ~/Repos/redpanda/vbuild/debug/clang/v_deps_build/seastar-prefix/src/seastar/scripts/seastar-addr2line  -e ~/xfs/23.3.12-arm/libexec/redpanda 0x6dd385b   0x635aacf   0x2c6645f   0x7b5c36b   0x7b5ea1f   0x7b5cbf3   0x7a81e43   0x7a80997   0x2b56ca7   0x7df9163
[Backtrace #0]
boost::intrusive::list_node_traits<void*>::get_previous(boost::intrusive::list_node<void*>*) at /vectorized/include/boost/intrusive/detail/list_node.hpp:54
 (inlined by) boost::intrusive::circular_list_algorithms<boost::intrusive::list_node_traits<void*> >::unlink(boost::intrusive::list_node<void*>*) at /vectorized/include/boost/intrusive/circular_list_algorithms.hpp:153
 (inlined by) boost::intrusive::generic_hook<(boost::intrusive::algo_types)0, boost::intrusive::list_node_traits<void*>, boost::intrusive::member_tag, (boost::intrusive::link_mode_type)2, (boost::intrusive::base_hook_type)0>::unlink() at /vectorized/include/boost/intrusive/detail/generic_hook.hpp:214
 (inlined by) storage::segment_reader_handle::close() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/storage/segment_reader.cc:218
storage::continuous_batch_parser::close() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/storage/parser.h:114
 (inlined by) cloud_storage::remote_segment_batch_reader::stop() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0831432e5e658e55f-1/redpanda/redpanda/src/v/cloud_storage/remote_segment.cc:1483
23.3.12-arm$ ~/Repos/redpanda/vbuild/debug/clang/v_deps_build/seastar-prefix/src/seastar/scripts/seastar-addr2line  -e ~/xfs/23.3.12-arm/libexec/redpanda 0x7b61a43 0x7ba66f3 0x7e776ef
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:824
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:854
 (inlined by) seastar::print_with_backtrace(char const*, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:866
seastar::sigsegv_action() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4150
 (inlined by) seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4131
 (inlined by) seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4127
$d.1462 at fetch.cc:?

From these, we can see:

  • the crash is coming from a fetch handler (not timequery, compaction, the uploader, or any other user of segment_readers)
  • it appears to be caused by unsound lifecycle management of remote_segment_batch_reader
  • sometimes the crash happens in the destructor ~segment_reader_handle() rather than in close(), further pointing to a lifecycle bug

One suspicious piece of code, from remote_segment_batch_reader::stop():

```
co_await _parser->close();
```

It looks like stop() is not thread-safe: multiple concurrent calls to stop() may call _parser->close() on the same parser and then reset it to null. It's unclear if this is the exact failure path, given that reader eviction happens from a single fiber, and set_end_of_stream() appears to be called at most once in do_load_slice().
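
As an illustration only, here is a minimal, compilable C++ sketch of that hazard. This is not the actual Redpanda code: the Seastar coroutine is reduced to a plain function with the suspension point marked in a comment, and the types are hypothetical stand-ins named after the discussion above.

```
#include <memory>

// Stand-in for storage::continuous_batch_parser (hypothetical simplification).
struct parser {
    void close() {
        // Asynchronous in the real code: closes IO streams and unlinks the
        // segment_reader_handle, which is where the reported segfault fires.
    }
};

// Stand-in for cloud_storage::remote_segment_batch_reader.
struct reader {
    std::unique_ptr<parser> _parser;

    // Simplified stop(). In the real coroutine, the fiber suspends at the
    // marked line; a second concurrent stop() entering during that suspension
    // sees the same non-null _parser, closes it again, and the two callers
    // then race on resetting/destroying the same object.
    void stop() {
        if (_parser) {
            _parser->close(); // <-- suspension point (co_await) in the real code
            _parser.reset();  // a concurrent caller may already have done this
        }
    }
};

int main() {
    reader r;
    r._parser = std::make_unique<parser>();
    r.stop(); // the single-caller case is fine; the hazard is two overlapping calls
}
```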

Another suspicious piece of code, from the abort source subscription callback:

```
if (_seg_reader) {
    _partition->evict_segment_reader(std::move(_seg_reader));
}
```

It looks like _seg_reader may be moved away at any point. In most places we check whether _seg_reader is a nullptr before using it, but it seems possible that, e.g. here:

```
ss::future<> set_end_of_stream() {
    co_await _seg_reader->stop();
    _seg_reader = {};
}
```

the scheduling point in the coroutine allows execution to continue after the abort has wiped away _seg_reader.
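
A similar compilable sketch of this second suspicion, with the abort-source subscription callback modeled as a plain function call (again a hypothetical simplification, not the actual Redpanda code): if the callback runs while the read fiber is suspended inside set_end_of_stream(), the reader is moved to the eviction list and the eviction loop can tear it down before the suspended fiber resumes.

```
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the remote segment reader owned by the partition reader.
struct segment_reader {
    void stop() {
        // Asynchronous in the real code; the calling fiber suspends here.
    }
};

// Stand-in for cloud_storage::remote_partition and its eviction list.
struct partition {
    std::vector<std::unique_ptr<segment_reader>> eviction_list;
    void evict_segment_reader(std::unique_ptr<segment_reader> r) {
        // The real eviction loop later calls stop() on this reader and
        // destroys it, including the underlying parser.
        eviction_list.push_back(std::move(r));
    }
};

// Stand-in for the partition reader that owns _seg_reader.
struct partition_reader {
    std::unique_ptr<segment_reader> _seg_reader;
    partition* _partition = nullptr;

    // Simplified set_end_of_stream(): while the real fiber is suspended in
    // _seg_reader->stop(), on_client_disconnect() below may run and move
    // _seg_reader away, so the resumed fiber races with the eviction loop's
    // teardown of the same reader/parser.
    void set_end_of_stream() {
        _seg_reader->stop(); // <-- suspension point (co_await) in the real code
        _seg_reader = {};
    }

    // Abort-source subscription callback, fired on client disconnect.
    void on_client_disconnect() {
        if (_seg_reader) {
            _partition->evict_segment_reader(std::move(_seg_reader));
        }
    }
};

int main() {
    partition p;
    partition_reader r;
    r._partition = &p;
    r._seg_reader = std::make_unique<segment_reader>();
    r.set_end_of_stream(); // the hazard arises only if the disconnect callback
                           // interleaves with the suspension inside stop()
}
```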

JIRA Link: CORE-2750

@andrwng andrwng added kind/bug Something isn't working area/cloud-storage Shadow indexing subsystem sev/high loss of availability, pathological performance degradation, recoverable corruption labels May 2, 2024
andrwng added a commit to andrwng/redpanda that referenced this issue May 2, 2024
It's possible that when a client disconnects, the abort source
subscription associated with the client connection ends up evicting the
current segment reader.

1. Remote partition reader is in the process of reading a segment
2. The read eventually scans past the LSO, and stops. This stop() is
   asynchronous, closing IO streams, etc.
3. The Kafka client disconnects, triggering the abort source
   subscription callback to call evict_segment_reader(), moving the
   segment reader to the remote partition's eviction list
4. The remote partition's eviction loop sees this reader and runs
   stop(), and destructs the underlying parser
5. The read fiber, still stopping, resumes control, but the underlying
   parser is destructed

This is roughly the sequence of events seen below.

```
DEBUG 2024-05-02 14:55:06,231 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::raft_configuration
DEBUG 2024-05-02 14:55:06,231 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1398
DEBUG 2024-05-02 14:55:06,231 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1288 - added offset translation gap [1398-1398], current state: {{base offset/delta: {1397}/1221, map size: 1, last delta: 1221}}
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::archival_metadata
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1399
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1288 - added offset translation gap [1399-1400], current state: {{base offset/delta: {1397}/1221, map size: 2, last delta: 1222}}
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::archival_metadata
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1401
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1288 - added offset translation gap [1401-1401], current state: {{base offset/delta: {1397}/1221, map size: 3, last delta: 1224}}
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::tx_fence
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1402
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1185 - [{{0.0.0.0:0}}] accept_batch_start stop parser because 1403 > 177(kafka offset)
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber41664 kafka/topic/0] - remote_partition.cc:435 - No results, current rp offset: 1403, current kafka offset: 178, max rp offset: 177
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1476 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1482 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop - parser-close
DEBUG 2024-05-02 14:55:06,234 [shard 2:main] cloud_storage - [fiber41664 kafka/topic/0] - remote_partition.cc:247 - abort requested via config.abort_source
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber41665 defa47c0/kafka/topic/0_18762/1398-1405-846-134-v1.log.137] - segment_chunk_data_source.cc:48 - chunk data source destroyed
DEBUG 2024-05-02 14:55:06,235 [shard 2:main] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1476 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop
DEBUG 2024-05-02 14:55:06,235 [shard 2:main] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1482 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop - parser-close
ERROR 2024-05-02 14:55:06,235 [shard 2:fetc] assert - Assert failure: (/var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-009787d74859cb582-1/redpanda/redpanda/src/v/storage/segment_reader.cc:210) '_parent == nullptr' Must close before destroying
```

Note that the tx_fence is assigned Kafka offset 178, but this isn't reflected
in the LSO-1=177 calculation used as an upper bound for the log reader
config, hence the early return.

This commit fixes the issue by ensuring there is only one concurrent caller
of the segment reader's stop().

This also adds a test that simulates a failure where a client disconnect racing with
the natural stopping of a segment reader led to a crash. Without the fix, and
adding some sleeps in remote_segment_batch_reader::stop(), this would crash
fairly consistently, albeit with slightly different backtraces than reported in
the issue.

Fixes redpanda-data#18213
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue May 3, 2024
(cherry picked from commit 226c43e)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue May 3, 2024
(cherry picked from commit 226c43e)