Ruby client deadline greatly exceeded with unavailable hosts #14685
Comments
Hey guys, I might be able to shed some light on this issue! In late June I opened issue #15889, which, similarly to this issue, involved rpc calls not respecting the set deadlines and blocking for long periods of time. All the information about the bug and its eventual solution can be found in the now-closed issue linked. The result of all this was proposal A18. Initially, seeing my original reproduction case become non-reproducible with the nightly builds after PR #16419 was merged was great. We have, however, since uncovered another edge case whereby passed rpc call deadlines are not respected, which I believe is the exact same issue the OP is experiencing here. What we're currently experiencing is that while connectivity is broken (the iptables rule from the original repro case on #15889 is active) the rpc calls time out as expected, but if you wait for the first re-connection (approx. What we've discovered is that, in our case, this blocking consistently lasts exactly 127s, at which point a DEADLINE_EXCEEDED exception is finally raised. We thought this was quite suspicious and we rather quickly suspected that the max
So we re-tested our repro case with varying values for Lo and behold, the deadline on rpc calls while grpc was attempting a reconnect directly correlated to the max expected connection timeout determined by the current value of
All of these timings were consistent between multiple iterations of the test. Our solution to this was to set the |
@AspirinSJL Do you know off the top of your head why the deadline on the RPC might be getting delayed? |
For the subchannel part, I can't think of anything that can delay the RPC deadline. At first glance, I thought #17010 might be related. |
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 180 days. It will be closed automatically if no further update occurs in 1 day. Thank you for your contributions! |
@yashykt Do you think this issue can be solved by something similar to https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md as analyzed and suggested in #14685 (comment)? |
Before providing any such option, I would like more information as to why gRPC is not canceling the RPC immediately. I believe we have timers on a connect call and we should not be waiting that long. |
What version of gRPC and what language are you using?
grpc 1.10/ ruby
What operating system (Linux, Windows, …) and version?
Linux
Debian 3.16.39-1+deb8u1~bpo70+1 (2017-02-24)
What runtime / compiler are you using (e.g. python version or version of gcc)
Ruby 2.4/
gcc version 4.7.2
What did you do?
In an application, we noticed that requests to unavailable hosts would sometimes far exceed the specified deadline, taking roughly 127 seconds instead of 1. This matters for predictable performance when gRPC services are unavailable. The condition was relatively repeatable, so I used `gdb` to attach to the process during one of these extended "hangs"; on inspecting the contents, I found that it was always in `gpr_cv_wait`, similar to the issue described here.
What did you expect to see?
That the grpc client either fails with `Unavailable` when a host/port is not bound or at least always respects a 200ms deadline.
What did you see instead?
Wanting to see if this was our application code, I looked at the Ruby examples and modified the simple file there as this gist. Strangely enough, this same 'hang' would happen every 100th iteration, and every 200th iteration if the `deadline` was cut in half. It also seems not to happen if the stub is created each time, and everything fails almost immediately with a more expected `GRPC::Unavailable` if `localhost` is used instead of a remote host.
I tried setting `GRPC_TRACE=all` and `GRPC_VERBOSITY=DEBUG` to get more information, and in an additional gist I have attached that output.
Anything else we should know about your project / environment?
I tried running this same script on my El Capitan MacBook Pro and was unable to reproduce the behavior, so I am wondering if this is an issue with the settings under which the extension is compiled when the gem is installed.
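For anyone who wants to check for the same pattern, the repro loop described under "What did you see instead?" can be approximated with a small timing harness. This is only a sketch, not the code from the gist: `find_overruns`, `slack_s` and `fake_call` are hypothetical names, and the stand-in lambda replaces the real RPC so the snippet runs without the grpc gem. In a real run the block would presumably call a stub created once and reused, passing something like `deadline: Time.now + 0.2`.

```ruby
# Hypothetical helper (not part of the grpc gem): runs the given block
# `iterations` times and returns [index, elapsed_seconds] for every call
# that took longer than the deadline plus a small slack.
def find_overruns(iterations:, deadline_s:, slack_s: 0.05)
  overruns = []
  iterations.times do |i|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      yield i
    rescue StandardError
      # Errors are expected against an unavailable host; only the elapsed
      # time matters here.
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    overruns << [i, elapsed] if elapsed > deadline_s + slack_s
  end
  overruns
end

# Stand-in for the RPC (assumption): fails immediately on most iterations
# and blocks past the deadline on iteration 3 to simulate the hang.
fake_call = lambda do |i|
  sleep(0.3) if i == 3
  raise 'unavailable'
end

overruns = find_overruns(iterations: 5, deadline_s: 0.2) { |i| fake_call.call(i) }
overruns.each { |i, elapsed| puts format('iteration %d took %.3fs', i, elapsed) }
```

Using the monotonic clock keeps the measurement independent of wall-clock adjustments, and recording the iteration index makes it easy to see whether the overruns land on a fixed period such as every 100th call.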