load balancing: support creating additional connections when MAX_CONCURRENT_STREAM is reached #21386
FYI @ola-rozenfeld @buchgr - In relation to #11704, I suppose if you run Bazel on GCE (where it doesn't connect to IPv6 addresses), you will also see remote execution parallelism constrained to around ~200 concurrent requests.
This would have to be a feature provided by all languages, not just Go. cc @ejona @markdroth
I expect #7957 is really the appropriate tracking issue for this. Generally when you want multiple connections there is a proxy involved. We've considered this some in the past, but there are some complexities. In particular, when MAX_CONCURRENT_STREAMS is used because a backend is overloaded (like C core can do), it is unclear what actions are appropriate. Creating more connections in that case is clearly a bad idea. Sending traffic to other backends when using round_robin may be fine, but there's also risk there. For a case going through a proxy, using multiple connections is good and proper, but the client isn't aware of the server's topology. There should also clearly be a limit, though: it is very normal for clients to have bursty workloads, creating 1000s or 10s of thousands of RPCs all at once.
Oh, and to be clear, the solution to date from my perspective is "create multiple Channels." Although I also understand that is much easier to do in Java than many other languages, since you can hide the behavior behind the Channel interface and use stubs like normal (instead of needing to round-robin across multiple stubs).
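The "create multiple Channels" workaround described above can be sketched in Go roughly as follows. This is not a gRPC API; the `connPool` type and its `pick` method are hypothetical names, and the pool entries stand in for `*grpc.ClientConn` values all dialed to the same target. Each call to `pick` selects the next connection round-robin, so the stream load spreads across connections and each one stays under the server's MAX_CONCURRENT_STREAMS limit.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// connPool round-robins RPCs across a fixed set of connections. In a real
// client the entries would be *grpc.ClientConn values; plain ints keep this
// sketch self-contained and runnable.
type connPool struct {
	conns []int // stand-ins for *grpc.ClientConn
	next  uint64
}

// pick returns the next connection in round-robin order. The atomic counter
// makes it safe to call from many goroutines at once.
func (p *connPool) pick() int {
	n := atomic.AddUint64(&p.next, 1)
	return p.conns[(n-1)%uint64(len(p.conns))]
}

func main() {
	p := &connPool{conns: []int{0, 1, 2}}
	for i := 0; i < 6; i++ {
		fmt.Print(p.pick(), " ")
	}
	fmt.Println()
	// Prints: 0 1 2 0 1 2
}
```

As the comment above notes, in Java this selection logic can be hidden behind the Channel interface so stubs work unchanged; in Go the application has to route stub calls through the pool itself.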
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!
@ejona86 Do you know of any public examples of someone doing this with the Java API? I'd also love to hear about examples in the other languages if you know of any. Edit:
@ajmath, there's also https://github.com/googleapis/gax-java/blob/main/gax-grpc/src/main/java/com/google/api/gax/grpc/ChannelPool.java (ignore the "refreshing" piece of it). Note that if you choose to implement just Channel instead of ManagedChannel, then it becomes even easier.
@patrickfreed We do have a design proposal that was mostly agreed upon internally, but the effort to implement it was deprioritized since our primary use case no longer required it. Long-term it is still something we'd like to do. @ejona86 @markdroth what do you think about the priority of this vs. our other projects in the next few months? Should I write up a gRFC for the design even if we don't have resources assigned to implement it?
gRFC without implementation resources doesn't seem all that helpful. Maybe just externalize the internal doc (copy the contents to your personal account and share it)? That lets people determine if the solution would help them, gives them an idea of the work involved, and would be a guide for any would-be contributors that want to step up.
Didn't we say that the internal design we had been evaluating wasn't going to work in the xDS case? If so, I don't think we'd actually want to pursue that anyway.
Either one would be super helpful and much appreciated.
I think this feature is still very useful for the L4 load balancer case (#7957).
I do understand that, but we don't want to have to support two mechanisms for the same thing. Ultimately, we will need a mechanism that solves this problem for the xDS case, so if we introduce this mechanism now for the L4 case and then need to introduce a separate mechanism later for the xDS case, then we're stuck supporting two mechanisms for the same thing. I would prefer not to introduce a mechanism for this until we know that it will solve all the use cases that we care about.
I think it would still be valuable if the internal doc were released publicly, since it's possible the community could help brainstorm ideas for a solution that accommodates the xDS case too.
Any updates on this? |
We're seeing the same issue in Python as well.
bump for visibility - really interested in this and found this issue from the docs page
What version of gRPC and what language are you using?
1.24.3
What operating system (Linux, Windows,...) and version?
Linux - Ubuntu 14.04 (GCE instance)
What runtime / compiler are you using (e.g. python version or version of gcc)
Go - 1.13
What did you do?
If possible, provide a recipe for reproducing the error. Try being specific and include code snippets if helpful.
What did you expect to see?
All 500 RPCs execute in parallel.
What did you see instead?
Only 100 RPCs run concurrently; the remaining 400 block until the first 100 finish running.
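The behavior reported above is consistent with the server advertising a MAX_CONCURRENT_STREAMS of 100 on the single HTTP/2 connection: the 101st RPC blocks until an earlier stream completes. The following is a minimal model of that mechanism, not gRPC code; the buffered channel plays the role of the per-connection stream quota, and the numbers 500 and 100 are taken from this report.

```go
package main

import (
	"fmt"
	"sync"
)

// run launches totalRPCs goroutines that must each acquire a stream slot
// before doing work, mimicking one HTTP/2 connection whose server advertised
// MAX_CONCURRENT_STREAMS = maxStreams. It returns the peak number of RPCs
// that were ever in flight simultaneously.
func run(totalRPCs, maxStreams int) int {
	quota := make(chan struct{}, maxStreams) // stream slots on one connection
	var mu sync.Mutex
	inFlight, peak := 0, 0

	var wg sync.WaitGroup
	for i := 0; i < totalRPCs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			quota <- struct{}{} // blocks once maxStreams streams are open
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			// ... the RPC itself would execute here ...
			mu.Lock()
			inFlight--
			mu.Unlock()
			<-quota // stream closes, freeing a slot for a blocked RPC
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// With 500 RPCs and a 100-stream cap, concurrency never exceeds 100.
	fmt.Println("peak concurrent streams:", run(500, 100))
}
```

This is why the multiple-Channels workaround discussed earlier in the thread helps: each additional connection brings its own independent stream quota.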
Anything else we should know about your project / environment?
We get an `Immediate connect fail for 2a00:1450:400c:c06::93: Network is unreachable` error when we try to connect to IPv6 addresses. Also, this issue is blocking our Go client (which we are writing for our API) from achieving high parallelism, which makes the client less useful to our customers.