~timer crashed during shutdown in simple_add_new_node_while_schema_changes_test: Assertion `!hook.is_linked()' failed. #5999
Comments
Destroying a timer that was not unlinked from the reactor's timer lists.
Aborted start: join_token_ring -> bootstrap threw an exception after set_mode(mode::JOINING, "Starting to bootstrap...", true), and then the unwinding began. From the logs:
The timer is the one in expiring_fifo. The latter is used by reader_concurrency_semaphore and logalloc::region_group in Scylla, and by shared_future in Seastar.
The crash is in shared_future::resolve -> expiring_fifo::pop_front -> unique_ptr::reset -> ... -> timer::~timer
#7 seastar::timer<seastar::lowres_clock>::~timer (this=0x6000000c83a8, __in_chrg=) at ./seastar/src/core/reactor.cc:589
#17 seastar::future<>::then_wrapped_common<false, void, seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(seastar::future<>&&)#1}>(seastar::shared_future<>::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(seastar::future<>&&)#1}&&)::{lambda()#1}::operator()() const::{lambda(seastar::future_state<>&&)#1}::operator()(seastar::future_state<>&&) (state=..., this=0x6000019719b8) at ./seastar/include/seastar/core/future.hh:1270
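The assertion in the title fires when a timer is destroyed while its intrusive list hook still reports being linked, i.e. while the timer would still be reachable from a list that outlives it. The sketch below is a heavily simplified, self-contained model of that invariant. All types in it (list_hook, timer_list, timer, entry, expiring_fifo) are hypothetical stand-ins, not Seastar's code: it only shows the ordering an expiring_fifo-style container relies on (unlink the entry's timer, then free the entry in pop_front()); it does not reproduce this crash.

```cpp
#include <cassert>
#include <deque>
#include <memory>

// Hypothetical minimal intrusive hook (NOT boost::intrusive or Seastar code).
struct list_hook {
    list_hook* prev = nullptr;
    list_hook* next = nullptr;
    bool is_linked() const { return next != nullptr; }
};

// Circular doubly linked list with a sentinel node, standing in for the
// reactor's list of armed timers.
struct timer_list {
    list_hook head;
    timer_list() { head.prev = head.next = &head; }
    timer_list(const timer_list&) = delete;
    timer_list& operator=(const timer_list&) = delete;
    bool empty() const { return head.next == &head; }
    void push_back(list_hook& h) {
        h.prev = head.prev;
        h.next = &head;
        head.prev->next = &h;
        head.prev = &h;
    }
    static void unlink(list_hook& h) {
        h.prev->next = h.next;
        h.next->prev = h.prev;
        h.prev = h.next = nullptr;
    }
};

struct timer {
    list_hook hook;
    void arm(timer_list& armed) { armed.push_back(hook); }
    void cancel() { if (hook.is_linked()) timer_list::unlink(hook); }
    // Mirrors the assertion from the issue title: a timer must be off
    // every list before its memory goes away.
    ~timer() { assert(!hook.is_linked()); }
};

// Hypothetical FIFO entry that carries its own expiry timer.
struct entry {
    int payload = 0;
    timer tmr;
};

struct expiring_fifo {
    timer_list& armed;
    std::deque<std::unique_ptr<entry>> q;

    explicit expiring_fifo(timer_list& t) : armed(t) {}
    void push_back(int v) {
        auto e = std::make_unique<entry>();
        e->payload = v;
        e->tmr.arm(armed);
        q.push_back(std::move(e));
    }
    int pop_front() {
        entry& e = *q.front();
        e.tmr.cancel();   // unlink first; skipping this trips the assert
        int v = e.payload;
        q.pop_front();    // unique_ptr deletes the entry, so ~timer runs here
        return v;
    }
    ~expiring_fifo() { while (!q.empty()) pop_front(); }
};
```

If pop_front() skipped cancel(), the unique_ptr reset would destroy a timer whose hook is still on the armed list, which is exactly the `!hook.is_linked()` failure; the same assertion can also fire if the hook's memory no longer reflects reality, which is what the analysis below looks into.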
The entry being deleted is the container of the timer:
the timer is at 0x6000000c83a8. Its offset in expiring_fifo::entry is
so the entry is

The timer's list is
This is why it thinks it is still linked (is_linked() returns true).
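Recovering the entry from the timer's address is the classic container_of idiom: subtract the member's offsetof from the member's address. A minimal sketch, assuming a hypothetical fake_entry/fake_timer layout (the real expiring_fifo::entry layout and the offset observed in the core are not reproduced here):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts; the real seastar timer and expiring_fifo::entry differ.
struct fake_timer {
    void* prev;
    void* next;
    long expiry;
};

struct fake_entry {
    long payload;     // stands in for whatever precedes the timer in the entry
    fake_timer tmr;   // the member whose address the backtrace shows
};

// container_of: enclosing object address = member address - member offset.
// Valid here because fake_entry is a standard-layout type.
inline fake_entry* entry_from_timer(fake_timer* t) {
    char* raw = reinterpret_cast<char*>(t);
    return reinterpret_cast<fake_entry*>(raw - offsetof(fake_entry, tmr));
}
```

In a debugger the same arithmetic applies: take the timer's `this`, subtract the member offset, and cast the result to the entry type to inspect its fields.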
mm notifier is already stopped, but
[PATCH v2] migration_manager: Run background schema merge in gate
In some cases merge_schema_from is called in the background and thus is neither aborted nor waited for on shutdown. This may result in use-after-free bugs, one of which is merge_schema_from -> read_schema_for_keyspace -> db::system_keyspace::query -> storage_proxy::query -> query_partition_key_range_concurrent: the latter function accesses proxy._token_metadata, while the respective object may already be freed (unlike the storage_proxy itself, which is still leaked on shutdown).

Related bug: #5903, #5999 (cannot reproduce though)

Tests: unit(dev), manual start-stop
dtest(consistency.TestConsistency, dev)
dtest(schema_management, dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by: Pekka Enberg <penberg@scylladb.com>
Message-Id: <20200316150348.31118-1-xemul@scylladb.com>
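Running the merge "in gate" means the background fiber enters a gate before it starts and leaves it when it finishes, while shutdown closes the gate (refusing new work) and waits for everything that already entered. In Seastar this is seastar::gate together with seastar::with_gate(); the sketch below is a hypothetical single-threaded toy (toy_reactor, toy_gate and spawn_in_gate are illustrative names, not the patch's code) that models the behaviour without depending on Seastar:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <stdexcept>
#include <utility>

// Toy cooperative "reactor": a queue of pending tasks run one at a time.
struct toy_reactor {
    std::deque<std::function<void()>> tasks;
    void schedule(std::function<void()> f) { tasks.push_back(std::move(f)); }
    bool run_one() {
        if (tasks.empty()) return false;
        auto f = std::move(tasks.front());
        tasks.pop_front();
        f();
        return true;
    }
};

// Hypothetical gate, loosely modelled on seastar::gate: counts in-flight
// background work and refuses new work once closed.
struct toy_gate {
    int count = 0;
    bool closed = false;
    void enter() {
        if (closed) throw std::runtime_error("gate closed");
        ++count;
    }
    void leave() { --count; }
    // Shutdown path: close the gate, then drive the reactor until every
    // task that entered before the close has left.
    void close(toy_reactor& r) {
        closed = true;
        while (count > 0 && r.run_one()) {
        }
    }
};

// Run `work` in the background, but only inside the gate (like with_gate()).
inline void spawn_in_gate(toy_reactor& r, toy_gate& g, std::function<void()> work) {
    g.enter();  // throws if shutdown has already closed the gate
    r.schedule([&g, work = std::move(work)] {
        // Leave the gate even if work() throws.
        struct leave_on_exit {
            toy_gate& gate;
            ~leave_on_exit() { gate.leave(); }
        } guard{g};
        work();
    });
}
```

The design point is that close() both rejects new entries and drains the existing ones, so a background merge cannot still be running when shutdown goes on to tear down the objects it reads, such as the token metadata mentioned in the commit message.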
14de126 is on master.
…erge
The definitions_update() verb captures a shared_ptr to storage_proxy to keep it alive while the background task executes. This was introduced (in 2016!) by

commit 1429213
Author: Pekka Enberg <penberg@scylladb.com>
Date: Mon Mar 14 17:57:08 2016 +0200

main: Defer migration manager RPC verb registration after commitlog replay

Defer registering migration manager RPC verbs after commitlog has been replayed so that our own schema is fully loaded before other nodes start querying it or sending schema updates.

Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>

when moving this code from storage_proxy.cc. Later, better protection with a gate was added:

commit 14de126
Author: Pavel Emelyanov <xemul@scylladb.com>
Date: Mon Mar 16 18:03:48 2020 +0300

migration_manager: Run background schema merge in gate

Since the task execution is now protected by the gate, and therefore by the migration_manager lifetime (which is contained within that of storage_proxy, as migration_manager is constructed afterwards), capturing the shared_ptr is no longer needed. We therefore remove it, as it uses the deprecated global storage_proxy accessors.
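What the removed capture did can be shown in isolation: copying a shared_ptr into the task's lambda keeps the pointee alive after its owner drops its reference, until the task itself is destroyed. In the sketch, proxy and make_task_with_capture are hypothetical names standing in for storage_proxy and the definitions_update() handler; this is not the actual Scylla code.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for storage_proxy; records when it is destroyed.
struct proxy {
    bool* destroyed;
    explicit proxy(bool* d) : destroyed(d) {}
    ~proxy() { *destroyed = true; }
};

// The old approach: the background task holds its own shared_ptr copy,
// so the proxy cannot be freed before the task object is gone.
inline std::function<int()> make_task_with_capture(std::shared_ptr<proxy> p) {
    // Returns the reference count seen from inside the task.
    return [p] { return static_cast<int>(p.use_count()); };
}
```

Once the gate guarantees the task finishes before migration_manager stops, and migration_manager's lifetime is nested inside storage_proxy's, this extra reference adds nothing, which is the reasoning the commit message gives for dropping it.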
Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/417/artifact/logs-release.2/1583810078409_update_cluster_layout_tests.TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_test/node4.log
Decoded backtrace:
Core dump here: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/417/artifact/logs-release.2/1583810078409_update_cluster_layout_tests.TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_test/node4-scylla.45929.1583809469.core.gz
Reloc package here: https://jenkins.scylladb.com/view/master/job/scylla-master/job/build/501/artifact/scylla/build/release/scylla-package.tar.gz
And on S3: http://downloads.scylladb.com/relocatable/unstable/master/2020-03-09T21:59:53Z/scylla-package.tar.gz