Split rpc::server stop into two parts #1671
Conversation
src/rpc/rpc.cc
Outdated
        _servers.erase(*_options.streaming_domain);
    }
    return when_all(_ss_stopped.get_future(),
    return
But you still wait for connections to stop here. From the commit message I was expecting to see a void shutdown() that only kicks the process, with all the waiting happening in stop().
I thought the same, but then changed my mind. The main goal here is to make messaging_service::shutdown() stop doing networking communications; just shutting down all sockets (with a syscall) doesn't look sufficient, it should also wait for packets to stop traveling, and that's what connection::stop() does. The server::stop() is to destroy the server completely.
I admit that some parts of connection::stop() are better not waited for in shutdown(). If there are such parts then we'd need a future<> connection::shutdown() too, but, again, so far I don't see how it would differ from connection::stop().
Does this make sense to you?
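For illustration, a minimal sketch of that distinction, with assumed member names (`_fd` for the socket, `_gate` guarding the send/receive fibers -- not the actual rpc.cc fields):

    // Sketch only: aborting the socket is just syscalls, but stop() also
    // waits for in-flight processing to settle before it resolves.
    future<> connection::stop() {
        _fd.shutdown_input();       // kick the connection at the socket level
        _fd.shutdown_output();
        co_await _gate.close();     // wait for packets to stop traveling
    }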
I need to understand what guarantees you need from shutdown(). If you want to make sure that no rpc handlers issued on this connection are running any longer, connection->stop() is not enough. What it guarantees is that no new rpcs will be admitted (though there may be some already running) and no replies will be sent for ones already admitted. Is this what you want? What is the use case?
> If you want to make sure that no rpc handlers issued on this connection are running any longer, connection->stop() is not enough.

No, that's not the goal here.

> What it guarantees is that no new rpcs will be admitted (though there may be some already running) and no replies will be sent for ones already admitted. Is this what you want?

Yup, that's what I need.
> What it guarantees is that no new rpcs will be admitted (though there may be some already running) and no replies will be sent for ones already admitted. Is this what you want?
>
> Yup, that's what I need.

Then I can confirm that this is what you get with this implementation :) But what is the use case, may I ask?
> Then I can confirm that this is what you get with this implementation :)

🍾

> But what is the use case, may I ask?
Sure! The use-case is described in scylladb/scylladb#14031. The thing is that storage_service::do_drain() eventually calls messaging_service::shutdown(), which is a bit heavyweight -- among other things, it waits for all RPC handlers to stop running with the help of _reply_gate (I added debugging for it in #1619, you're welcome to review and comment on it too). I want to relax messaging_service::shutdown() to only isolate the service from the network -- i.e. call this PR's rpc::server::shutdown() -- and do the full stopping later, with rpc::server::stop(). It's going to look like this -- https://github.com/xemul/scylla/tree/br-messaging-service-split-drain-and-stop (no PR for this branch yet)
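Sketched caller side of that plan (the method split and member names here are assumptions for illustration, not the branch's actual code):

    // Hypothetical split: isolate early, tear down later.
    future<> messaging_service::isolate() {
        co_await _rpc_server->shutdown();   // stop networking, no full teardown
    }

    future<> messaging_service::stop() {
        co_await _rpc_server->stop();       // wait for RPC activity to wrap up
    }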
But should isolation also make sure that no connections are accepted any longer? I think there is a bug in the current implementation that spilled into this patch as well. Before we stop all connections we need to wait for the accept fiber to complete, otherwise it may add more connections later. The current code waits for accept in parallel with connection closing, and after your patch it does it much later.
Hm... The accept looks like

    _ss.accept().then([] (auto sk) {
        ... // no co_await-s here
        _conns.emplace(sk);
    });

so if accept() wasn't aborted and resolved, there's no way shutdown could start killing connections before this new one gets emplaced into _conns. What am I missing?
Ah, it can smp::submit_to connections between shards...
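That is exactly what the commit message below fixes: wait for the accept fiber before walking the connection list. A sketch of the resulting order (`_conns` and `_ss_stopped` are from the patch; the exact body is illustrative):

    future<> server::shutdown() {
        _ss.abort_accept();                  // no new accepts will start
        co_await _ss_stopped.get_future();   // accept fiber exited: no more
                                             // cross-shard emplaces into _conns
        co_await parallel_for_each(_conns, [] (auto& c) {
            return c.second->stop();         // the list is now stable to walk
        });
    }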
    server(protocol_base* proto, server_options opts, server_socket, resource_limits memory_limit = resource_limits());
    void accept();
    future<> shutdown();
    future<> stop();
Please add documentation. Yes, rpc is undocumented and the policy is to only update documented interfaces, but stop() and shutdown() are too similar and must be distinguished by an explanation.
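A possible shape for those doc comments, based on the guarantees discussed above (a sketch of the wording, not the merged text):

    /// Stops accepting new connections and aborts the established ones.
    /// Once resolved, no new RPCs are admitted and no replies are sent
    /// for RPCs already admitted (their handlers may still be running);
    /// resources are not yet released.
    future<> shutdown();

    /// Fully stops the server and releases its resources. Calls
    /// shutdown() itself unless the user has already awaited it; the
    /// two must not run concurrently.
    future<> stop();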
    future<> server::stop() {
        return when_all(_ss_stopped.get_future(),
                        shutdown(),
Can we not have two copies of shutdown() running? Or is the user required to wait for shutdown first (sensible)?
The latter -- the user is required to wait for shutdown first. I'll mention it in the doc comment.
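One way to keep the second shutdown() call (the one stop() issues) harmless is to cache the result in a seastar::shared_future -- a sketch of the idea, not necessarily what the PR does:

    // Assumed member: std::optional<shared_future<>> _shutdown_done;
    future<> server::shutdown() {
        if (!_shutdown_done) {
            _shutdown_done = shared_future<>(do_shutdown());  // do_shutdown(): hypothetical worker
        }
        return _shutdown_done->get_future();  // may be obtained repeatedly
    }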
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After _ss.abort_accept() is called, two futures are spawned in parallel -- waiting for the accepting fiber to exit and stopping the connections. It looks racy: while iterating over the connections list, the accepting socket may still have a chance to push more stuff into it through its still-active continuations. The fix is to wait for the accepting fiber _before_ stopping the connections. Formatting is deliberately left broken for ease of review.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Split rpc::server::stop() into two phases. First, shutdown(), which closes the accepting socket and stops all the active connections. Second, stop(), which waits for all of the above to wrap up. The shutdown() stage can be called first; if it is not, stop() will call it on its own. Formatting is deliberately left broken for ease of review.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Force-pushed 1a3b58d to b4b3331
upd: This is to give users the ability to isolate the node from the network without waiting for the internal RPC activity to wrap up. When shutdown()-ed, the rpc::server aborts all the connections and the accepting socket, and returns.
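Caller-side usage under this design might look like the following (a sketch inside a coroutine):

    co_await srv.shutdown();   // isolate from the network early; new RPCs
                               // are no longer admitted
    // ... continue draining other subsystems ...
    co_await srv.stop();       // later: wait for everything to wrap up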