Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The messaging_service::_clients list is not cleared on shutdown #14624

Closed
xemul opened this issue Jul 11, 2023 · 1 comment
Closed

The messaging_service::_clients list is not cleared on shutdown #14624

xemul opened this issue Jul 11, 2023 · 1 comment
Milestone

Comments

@xemul
Copy link
Contributor

xemul commented Jul 11, 2023

template <typename Fn>
requires std::is_invocable_r_v<bool, Fn, const messaging_service::shard_info&>
void messaging_service::find_and_remove_client(clients_map& clients, msg_addr id, Fn&& filter) {
    if (_shutting_down) {
        // if messaging service is in a processed of been stopped no need to
        // stop and remove connection here since they are being stopped already
        // and we'll just interfere
        return;
    }
    ...

this is to prevent rpc::client::stop() from being called twice (357c91a), but as a result we cannot trust scylla netw of scylla-gdb.py after nodetool drain :( Not nice

@mykaul mykaul added this to the 5.4 milestone Jul 11, 2023
xemul added a commit to xemul/scylla that referenced this issue Jul 11, 2023
When messaging_service shuts down it first sets _shutting_down to true
and proceeds with stopping clients and servers. Stopping clients, in
turn, is calling client.stop() on each.

Setting _shutting_down is used in two places.

First, when a client is stopped it may happen that it's in the middle of
some operation, which may result in call to remove_error_rpc_client()
and not to call .stop() for the second time it just does nothing if the
shutdown flag is set (see 357c91a).

Second, get_rpc_client() asserts that this flag is not set, so once
shutdown started it can make sure that it will call .stop() on _all_
clients and no new ones would appear in parallel.

However, after shutdown() is complete the _clients vector of maps
remains intact even though all clients from it are stopped. This is not
very debugging-friendly, the clients are better be removed on shutdown.

fixes: scylladb#14624

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
xemul added a commit to xemul/scylla that referenced this issue Jul 25, 2023
When messaging_service shuts down it first sets _shutting_down to true
and proceeds with stopping clients and servers. Stopping clients, in
turn, is calling client.stop() on each.

Setting _shutting_down is used in two places.

First, when a client is stopped it may happen that it's in the middle of
some operation, which may result in call to remove_error_rpc_client()
and not to call .stop() for the second time it just does nothing if the
shutdown flag is set (see 357c91a).

Second, get_rpc_client() asserts that this flag is not set, so once
shutdown started it can make sure that it will call .stop() on _all_
clients and no new ones would appear in parallel.

However, after shutdown() is complete the _clients vector of maps
remains intact even though all clients from it are stopped. This is not
very debugging-friendly, the clients are better be removed on shutdown.

fixes: scylladb#14624

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
patjed41 pushed a commit to patjed41/scylladb that referenced this issue Jul 27, 2023
When messaging_service shuts down it first sets _shutting_down to true
and proceeds with stopping clients and servers. Stopping clients, in
turn, is calling client.stop() on each.

Setting _shutting_down is used in two places.

First, when a client is stopped it may happen that it's in the middle of
some operation, which may result in call to remove_error_rpc_client()
and not to call .stop() for the second time it just does nothing if the
shutdown flag is set (see 357c91a).

Second, get_rpc_client() asserts that this flag is not set, so once
shutdown started it can make sure that it will call .stop() on _all_
clients and no new ones would appear in parallel.

However, after shutdown() is complete the _clients vector of maps
remains intact even though all clients from it are stopped. This is not
very debugging-friendly, the clients are better be removed on shutdown.

fixes: scylladb#14624

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb#14632
@avikivity
Copy link
Member

No user impact, not backporting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants