Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stack-use-after-scope in topology_change_coordinator_fiber #13522

Closed
tgrabiec opened this issue Apr 14, 2023 · 7 comments
Closed

stack-use-after-scope in topology_change_coordinator_fiber #13522

tgrabiec opened this issue Apr 14, 2023 · 7 comments
Assignees
Milestone

Comments

@tgrabiec
Copy link
Contributor

Seen in https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/607/artifact/testlog/x86_64/debug/boost.tablets_test.test_tablet_metadata_persistence.1.log

INFO  2023-04-13 16:42:01,887 [shard 0] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped)
=================================================================
==13235==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fd75e75bdc8 at pc 0x00000db39ae4 bp 0x7fd7633d1690 sp 0x7fd7633d1688
READ of size 8 at 0x7fd75e75bdc8 thread T1
    #0 0xdb39ae3 in service::raft_group0::client() ./service/raft/raft_group0.hh:221:16
    #1 0xdb39ae3 in service::storage_service::topology_change_coordinator_fiber(raft::server&, raft::internal::tagged_uint64<raft::term_tag>, seastar::sharded<db::system_distributed_keyspace>&, seastar::abort_source&)::$_3::operator()(seastar::abort_source&) const service/storage_service.cc:823:40
    #2 0xdc5a16d in service::storage_service::topology_change_coordinator_fiber(raft::server&, raft::internal::tagged_uint64<raft::term_tag>, seastar::sharded<db::system_distributed_keyspace>&, seastar::abort_source&) (.resume) service/storage_service.cc:889:34
    #3 0xc11d0e9 in std::__n4861::coroutine_handle<void>::resume() const /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/coroutine:135:29
    #4 0xc11d0e9 in seastar::condition_variable::awaiter::run_and_dispose() ./seastar/include/seastar/core/condition-variable.hh:119:25
    #5 0xc11d0e9 in non-virtual thunk to seastar::condition_variable::awaiter::run_and_dispose() ./seastar/include/seastar/core/condition-variable.hh
    #6 0x7fd76ea1a108 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) build/debug/seastar/./seastar/src/core/reactor.cc:2529:14
    #7 0x7fd76ea2159e in seastar::reactor::run_some_tasks() build/debug/seastar/./seastar/src/core/reactor.cc:2966:9
    #8 0x7fd76ea255e0 in seastar::reactor::do_run() build/debug/seastar/./seastar/src/core/reactor.cc:3135:9
    #9 0x7fd76ea23206 in seastar::reactor::run() build/debug/seastar/./seastar/src/core/reactor.cc:3018:16
    #10 0x7fd76e5a9e00 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) build/debug/seastar/./seastar/src/core/app-template.cc:265:31
    #11 0x7fd76e5a769a in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) build/debug/seastar/./seastar/src/core/app-template.cc:156:12
    #12 0x7fd769288dfe in seastar::testing::test_runner::start_thread(int, char**)::$_0::operator()() build/debug/seastar/./seastar/src/testing/test_runner.cc:75:26
    #13 0x7fd769288710 in void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char**)::$_0&) /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61:14
    #14 0x7fd769288600 in std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(seastar::testing::test_runner::start_thread(int, char**)::$_0&) /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111:2
    #15 0x7fd76928806c in std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char**)::$_0>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290:9
    #16 0x7fd76e7f2b48 in seastar::posix_thread::start_routine(void*) build/debug/seastar/./seastar/src/core/posix.cc:73:5
    #17 0x7fd7686da12c in start_thread (/lib64/libc.so.6+0x8b12c) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136)
    #18 0x7fd76875bbbf in __GI___clone3 (/lib64/libc.so.6+0x10cbbf) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136)

Address 0x7fd75e75bdc8 is a wild pointer inside of access range of size 0x000000000008.
SUMMARY: AddressSanitizer: stack-use-after-scope ./service/raft/raft_group0.hh:221:16 in service::raft_group0::client()
Shadow bytes around the buggy address:
  0x0ffb6bce3760: f8 f2 f2 f2 f2 f2 f8 f8 f2 f2 f8 f2 f2 f2 f8 f2
  0x0ffb6bce3770: f2 f2 f8 f2 f2 f2 f8 f2 f2 f2 f8 f2 f2 f2 f8 f2
  0x0ffb6bce3780: f2 f2 f8 f2 f2 f2 f8 f2 f2 f2 f8 f8 f2 f2 f8 f8
  0x0ffb6bce3790: f8 f2 f2 f2 f2 f2 f8 f8 f2 f2 f8 f2 f2 f2 f8 f8
  0x0ffb6bce37a0: f2 f2 f8 f2 f2 f2 f8 f8 f2 f2 f8 f8 f8 f8 f8 f8
=>0x0ffb6bce37b0: f8 f8 f8 f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f8 f8 f8
  0x0ffb6bce37c0: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x0ffb6bce37d0: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x0ffb6bce37e0: f8 f8 f8 f2 f2 f2 f2 f2 f2 f2 f2 f2 f8 f8 f2 f2
  0x0ffb6bce37f0: f8 f8 f2 f2 f8 f8 f2 f2 00 00 00 00 00 f2 f2 f2
  0x0ffb6bce3800: f2 f2 f8 f8 f8 f2 f2 f2 f2 f2 f8 f2 f2 f2 f8 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
Thread T1 created by T0 here:
    #0 0xa8ea956 in pthread_create (/jenkins/workspace/scylla-master/scylla-ci/scylla/build/debug/test/boost/tablets_test+0xa8ea956) (BuildId: 5913761ec2ce9cab686fb6f1103b4255fe36ac52)
    #1 0x7fd76e7f322d in seastar::posix_thread::posix_thread(seastar::posix_thread::attr, std::function<void ()>) build/debug/seastar/./seastar/src/core/posix.cc:108:9
    #2 0x7fd76e7f2d2e in seastar::posix_thread::posix_thread(std::function<void ()>) build/debug/seastar/./seastar/src/core/posix.cc:78:7
    #3 0x7fd7692870a4 in std::__detail::_MakeUniq<seastar::posix_thread>::__single_object std::make_unique<seastar::posix_thread, seastar::testing::test_runner::start_thread(int, char**)::$_0>(seastar::testing::test_runner::start_thread(int, char**)::$_0&&) /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/unique_ptr.h:1065:34
    #4 0x7fd769286c0f in seastar::testing::test_runner::start_thread(int, char**) build/debug/seastar/./seastar/src/testing/test_runner.cc:68:15
    #5 0x7fd76928760d in seastar::testing::test_runner::run_sync(std::function<seastar::future<void> ()>) build/debug/seastar/./seastar/src/testing/test_runner.cc:122:9
    #6 0x7fd76926e9fa in seastar::testing::seastar_test::run() build/debug/seastar/./seastar/src/testing/seastar_test.cc:50:26
    #7 0x7fd769272b6f in seastar::testing::seastar_test::seastar_test(char const*, char const*, int, boost::unit_test::decorator::collector_t&)::$_3::operator()() const build/debug/seastar/./seastar/src/testing/seastar_test.cc:61:59
    #8 0x7fd769271b68 in boost::detail::function::void_function_obj_invoker0<seastar::testing::seastar_test::seastar_test(char const*, char const*, int, boost::unit_test::decorator::collector_t&)::$_3, void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:158:11
    #9 0x7fd7693b36d1  (/lib64/libboost_unit_test_framework.so.1.78.0+0x236d1) (BuildId: a36d99071d10c86578feab1480b17b8842ebf327)

==13235==ABORTING
@gleb-cloudius
Copy link
Contributor

@tgrabiec this is not regular Scylla process. This is a single process cql_env test environment. Right? Where is the code?

@avikivity
Copy link
Member

It's tablets_test from #13387.

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Apr 17, 2023

The code looks something like that:

sharded<service::storage_service> ss;
ss.start()
auto stop_storage_service = defer([&ss] { ss.stop().get(); });

service::raft_group0 group0_service{ss.local()}

group0_service.start().get();
auto stop_group0_service = defer([&group0_service] {
      group0_service.abort().get();
 });

ss.local().join_cluster(group0_service).get();

So storage_service uses group0_service, but it is stopped after group0_service is already destroyed. And since there is cyclic dependency (group0_service depends on the storage_service) we cannot move group0_service creation above storage_service. The cycle has to be broken somehow.

@kbr-scylla
Copy link
Contributor

And since there is cyclic dependency (group0_service depends on the storage_service)

Maybe we should move the topology coordinator stuff out of storage_service.

@gleb-cloudius
Copy link
Contributor

And since there is cyclic dependency (group0_service depends on the storage_service)

Maybe we should move the topology coordinator stuff out of storage_service.

Not sure how it will help. They have a lot of dependencies between them.

@avikivity
Copy link
Member

@gleb-cloudius does this need to be backported? if so please prepare a backport.

@gleb-cloudius
Copy link
Contributor

@gleb-cloudius does this need to be backported? if so please prepare a backport.

There is no released version with the topology change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants