-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
close request receiving channel on errors immediately #416
close request receiving channel on errors immediately #416
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
I signed the CLA but don't know how to rerun the CLA job. |
Clicking "rescan" gives me a "400. That’s an error. That’s all we know." |
tarpc/src/client.rs
Outdated
@@ -352,6 +352,7 @@ where | |||
let _entered = span.enter(); | |||
tracing::info!("ReceiveError"); | |||
} | |||
self.pending_requests_mut().close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing that's confusing me is, pending requests should close once the request dispatch is dropped. Can you show me your code that polls the request dispatch? What does the code do after RequestDispatch::poll
returns an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the documentation of tokio
it is possible to have messages in the channel after dropping the Receiver
. This does not happen on each run but maybe once in 10 or 100 times. My test currently produces the following log.
2024-01-23T22:14:29.394867Z INFO tarpc::client: dropping request dispatch
2024-01-23T22:14:29.394871Z INFO RPC{rpc.deadline=2024-01-23T22:14:39.394769825Z otel.kind="client" otel.name="TarpcCspVault.idkg_gen_dealing_encryption_key_pair" rpc.trace_id=00}: tarpc::client: sending request to dispatch
2024-01-23T22:14:29.394973Z TRACE tarpc::server: Expired requests: Closed, Inbound: Closed
2024-01-23T22:14:29.395005Z TRACE tarpc::server: poll_flush
2024-01-23T22:14:29.395019Z TRACE tokio_util::codec::framed_impl: flushing framed transport
2024-01-23T22:14:29.395032Z TRACE tokio_util::codec::framed_impl: framed transport flushed
2024-01-23T22:14:29.395025Z WARN tarpc::client: Connection broken: could not read from the transport
Caused by:
frame size too big
2024-01-23T22:14:29.395086Z INFO RPC{rpc.deadline=2024-01-23T22:14:39.394769825Z otel.kind="client" otel.name="TarpcCspVault.idkg_gen_dealing_encryption_key_pair" rpc.trace_id=00}: tarpc::client: sending request to dispatch done; waiting for response
test rpc_connection::should_unfortunately_be_dead_after_response_from_server_cannot_be_received_by_client_because_too_large has been running for over 60 seconds
Our code makes use of the tarpc
Client
as a black box by calling its generated methods using tokio
's block_on
. Here's an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Will review your code and get back to you.
(It shouldn't be a problem if there are messages in the channel when the channel is dropped — the messages in the channel contain the sender side of oneshot channels used to send individual RPC responses. When the channel is dropped, the senders are also dropped, and that should cause the receiver side to receive an error.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I'm not sure if the channel is actually dropped. The only thing that is dropped (IIUC) is the receiver part of the dispatch channel and some messages from the channel. But the channel itself, again IIUC, is just a synchronized queue that is shared by both sender and receiver. The sender part is not dropped until we drop the client, right, it only changes to an errored state to prevent sending further messages to the channel? My hypothesis so far was that:
- A dispatch request (which contains the sender part of the response channel) is sent to the request dispatcher.
- The caller waits on the receiver end of the response channel.
- The dispatcher receiver is being dropped but the channel is not properly cleared, so the dispatch request along with the response sender handle is not dropped.
- Client waits indefinitely.
WDYT @tikue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's really interesting and makes a lot of sense, thanks!
tarpc/src/client.rs
Outdated
@@ -352,6 +352,7 @@ where | |||
let _entered = span.enter(); | |||
tracing::info!("ReceiveError"); | |||
} | |||
self.pending_requests_mut().close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there could be pending requests after closing the receiver. Maybe those requests should be completed as well, similar to what's done with in_flight_requests (immediately above)? Something like this: https://github.com/google/tarpc/blob/876fd4724b4ff051d8631be8bbc892023bd98e71/tarpc/src/client/in_flight_requests.rs#L102C19-L102C69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some commits, PTAL.
// channel is open or a spurious failure | ||
Poll::Pending => continue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, acually, the channel returns Pending when it's empty, I think. https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Receiver.html#method.poll_recv
So this loop can break on Poll::Pending | Poll::Ready(None)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see now why you did it this way — basically, the receiver is never supposed to return Poll::Pending
after closure. Interesting — I guess this is good then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think there's some subtlety here: reading the docs for close, it looks like outstanding Permits are still allowed to send into the channel. But there's no guarantee when the send will happen, so this loop is actually a blocking call. Since this code path is nonblocking, I think that's a problem we have to solve.
What about spawning a task that drains the channel?
On a separate note, I think there are other errors that could occur besides the error in pump_read
, and I hadn't made any attempt to drain the requests for those paths. And likewise, errors in other areas (like in pump_write) will not result in the receiver being drained. What if all of this draining was moved into a drop impl for RequestDispatch?
(BTW, if this is more than you want to sign up for, I'm happy to take a look at fixing this sometime in the next couple of weeks)
Obsoleted by #423. Thanks so much for identifying this problem as well as the solution! |
Fixes the problem described in #415.