Connections cannot be re-established after network loss/recovery #3266
Comments
I'm not an expert on these issues, but how gRPC Java's own KeepAliveManager works is that it shuts down the connection if the server does not respond to the keep-alive ping within the configured time limit. I believe you would want to do something similar in your case when you get deadline exceeded. Otherwise, your outgoing RPCs will continue to use the broken connection. Until you see the broken pipe exception, the socket used by Netty/gRPC remains in a usable state as far as your side of the connection is aware. You'll eventually see the broken pipe exception, but this can take a while and your best bet is to tear down the connection as soon as you see the deadline exceeded error (or consider using gRPC Java's built-in keep-alives, if they would satisfy your use case) and attempt to reconnect. Reconnecting will fail if the network is still down, and you can either set wait-for-ready on the stub or call options (e.g.,
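To illustrate the "tear down and reconnect on deadline exceeded" idea from the comment above, here is a minimal, transport-agnostic sketch in plain Java. It is not grpc-java code: `ReconnectingClient`, `Call`, `connectionFactory`, and `closer` are hypothetical names, and `TimeoutException` stands in for the deadline-exceeded failure. With grpc-java, the factory would presumably build a new `ManagedChannel`, the closer would call `shutdownNow()` on the old one, and the failure would surface as a `StatusRuntimeException` whose status code is `DEADLINE_EXCEEDED`.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of grpc-java): holds one connection and
 * replaces it with a fresh one whenever a call on it times out.
 */
final class ReconnectingClient<C> {

    /** A call made against the current connection; may time out. */
    interface Call<C, R> {
        R apply(C connection) throws TimeoutException;
    }

    private final Supplier<C> connectionFactory; // creates a new connection
    private final Consumer<C> closer;            // tears down an old connection
    private C connection;

    ReconnectingClient(Supplier<C> connectionFactory, Consumer<C> closer) {
        this.connectionFactory = connectionFactory;
        this.closer = closer;
        this.connection = connectionFactory.get();
    }

    /**
     * Runs the call on the current connection. On a timeout, the connection is
     * closed and replaced before the exception is rethrown, so the next call
     * starts on a fresh connection instead of reusing a possibly dead socket.
     * The lock is held for the whole call, which keeps the sketch simple but
     * serializes callers.
     */
    synchronized <R> R call(Call<C, R> call) throws TimeoutException {
        try {
            return call.apply(connection);
        } catch (TimeoutException e) {
            // The local socket can look usable long after the network dropped,
            // so do not wait for a broken pipe: discard the connection now.
            closer.accept(connection);
            connection = connectionFactory.get();
            throw e;
        }
    }
}
```

The caller still sees the timeout and decides whether to retry; the wrapper only guarantees that a retry does not reuse the connection that just timed out. Creating the replacement can itself fail while the network is still down, which a real implementation would have to handle.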
Thanks @ericgribkoff! You're right, making a new channel each time works. I learned a few things from poking around there:
So, I might have to keep my hacky manual reconnect code until #2292 is fixed, but it will be great to get rid of it when I can. Thanks again!
FWIW, I started using KeepAliveManager, but I am seeing pings sent too often and so consistently get "more calm" errors. I filed more details as #3274.
Please answer these questions before submitting your issue.
What version of gRPC are you using?
1.5.0
What JVM are you using (java -version)?
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
What did you do?
Ran a grpc-java client program with netty that uses application-level pings to a grpc-java server. The client runs in a loop: ping, sleep, ping, sleep. If I disconnect the network, I get deadline exceeded (good), but after I reconnect the network, I continue to get deadline exceeded messages.
If possible, provide a recipe for reproducing the error.
What did you expect to see?
For new connections to work successfully after the network was restored.
FWIW, while debugging the issue, I paused the ClientCalls thread and poked around for a while, e.g. ~5-10 minutes. I didn't really find anything, but when I hit "resume", I saw a broken pipe exception (which I don't usually see; usually it's just the deadline exceeded), and then the connection started working. I don't want to lead you astray, but it seems like the connection was not fully restarted until this pipe was broken.
Understood this may not be a grpc-java issue, but some underlying netty or even inherent TCP issue that I just don't understand.