
load balancing: support creating additional connections when MAX_CONCURRENT_STREAM is reached #21386

Open
gkousik opened this issue Dec 5, 2019 · 17 comments


gkousik commented Dec 5, 2019

What version of gRPC and what language are you using?

1.24.3

What operating system (Linux, Windows,...) and version?

Linux - Ubuntu 14.04 (GCE instance)

What runtime / compiler are you using (e.g. python version or version of gcc)

Go - 1.13

What did you do?

If possible, provide a recipe for reproducing the error. Try being specific and include code snippets if helpful.

  1. Connect to the remote build execution endpoint (remotebuildexecution.googleapis.com:443) using github.com/bazelbuild/remote-apis-sdks (https://github.com/bazelbuild/remote-apis-sdks/blob/master/go/pkg/client/client.go#L185).
  2. Make ~500 concurrent requests to the Execute() streaming API with a "sleep 45" command over a single connection (as created by the Dial() code linked in step 1); a minimal sketch follows. API proto: https://github.com/bazelbuild/remote-apis/blob/master/build/bazel/remote/execution/v2/remote_execution.proto#L106
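
For illustration, a minimal, self-contained sketch of the pattern (not our actual client: the method path comes from the proto linked above, the TLS config is an assumption, and request/response handling is elided):

```go
package main

import (
	"context"
	"crypto/tls"
	"log"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// A single connection; all 500 RPCs below share it, so the server's
	// MAX_CONCURRENT_STREAMS setting (100 in our case) caps real parallelism.
	conn, err := grpc.Dial("remotebuildexecution.googleapis.com:443",
		grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Execute is a server-streaming method.
	desc := &grpc.StreamDesc{ServerStreams: true}
	var wg sync.WaitGroup
	for i := 0; i < 500; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// NewStream blocks once 100 streams are already in flight.
			stream, err := conn.NewStream(context.Background(), desc,
				"/build.bazel.remote.execution.v2.Execution/Execute")
			if err != nil {
				log.Printf("NewStream: %v", err)
				return
			}
			_ = stream // the real client sends an ExecuteRequest and drains responses
		}()
	}
	wg.Wait()
}
```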

What did you expect to see?

All 500 RPCs execute in parallel.

What did you see instead?

Only 100 RPCs run concurrently; the remaining 400 block until the first 100 finish running.

Anything else we should know about your project / environment?

  1. We previously reported a similar issue for the Java client in Use multiple connections to avoid the server's SETTINGS_MAX_CONCURRENT_STREAMS limit #11704. The workaround suggested there does not work for us now, since DNS resolution returns IPv6 addresses, which are not reachable from GCE instances (where our client runs): we get an "Immediate connect fail for 2a00:1450:400c:c06::93: Network is unreachable" error when we try to connect to them. (A Go sketch of that workaround follows this list.)
  2. The grpc-go client has a similar issue reported in its repository: Control MAX_CONCURRENT_STREAMS server-side and account for it on client-side grpc-go#2412. It also has a comment about the work this feature would involve in the Go client.
  3. Ideally we would like to avoid having to manage multiple connections (a connection pool) in our client.
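
For reference, the workaround from #11704 corresponds in grpc-go to enabling the round_robin policy, so the client dials every address the resolver returns. A rough sketch (assuming grpc-go's WithDefaultServiceConfig option; error handling kept minimal):

```go
package client

import (
	"crypto/tls"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialRoundRobin sketches the #11704-style workaround: with round_robin the
// client opens one connection per resolved address, so total stream capacity
// scales with the number of addresses DNS returns. This is exactly what
// breaks for us when DNS returns only IPv6 addresses unreachable from GCE.
func dialRoundRobin(target string) *grpc.ClientConn {
	conn, err := grpc.Dial(target,
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
		grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	return conn
}
```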

This issue also blocks the Go client we are writing for our API from achieving high parallelism, which makes it less useful to our customers.


gkousik commented Dec 11, 2019

FYI @ola-rozenfeld @buchgr - In relation to #11704, I suppose that if you run Bazel on GCE (where it doesn't connect to IPv6 addresses), you will also see remote execution parallelism constrained to around 200 concurrent requests.


dfawley commented Dec 11, 2019

This would have to be a feature provided by all languages, not just Go.

cc @ejona @markdroth

dfawley changed the title from "Grpc-go client - support creating additional connections when MAX_CONCURRENT_STREAM is reached" to "load balancing: support creating additional connections when MAX_CONCURRENT_STREAM is reached" on Dec 11, 2019

ejona86 commented Dec 11, 2019

I expect #7957 is really the appropriate tracking issue for this. Generally when you want multiple connections there is a proxy involved.

We've considered this in the past, but there are some complexities. In particular, when MAX_CONCURRENT_STREAMS is used to signal that a backend is overloaded (as C core can do), it is unclear what actions are appropriate. Creating more connections in that case is clearly a bad idea. Sending traffic to other backends when using round_robin may be fine, but there's also risk there.

For a case going through a proxy, using multiple connections is good and proper, but the client isn't aware of the server's topology. Even then, there should clearly be a limit: it is very normal for clients to have bursty workloads that create thousands or tens of thousands of RPCs all at once.


ejona86 commented Dec 11, 2019

Oh, and to be clear, the solution to date from my perspective is "create multiple Channels." Although I also understand that is much easier to do in Java than in many other languages, since you can hide the behavior behind the Channel interface and use stubs like normal (instead of needing to round-robin across multiple stubs).
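
In Go the nearest analog (a rough sketch, not an official API: recent grpc-go versions let generated stubs accept any grpc.ClientConnInterface, and dialing and pool sizing are left to the caller) might look like:

```go
package pool

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc"
)

// connPool spreads RPCs across several underlying ClientConns. Because it
// implements grpc.ClientConnInterface, it can be handed directly to generated
// stub constructors, hiding the fan-out much like a custom Java Channel would.
type connPool struct {
	conns []*grpc.ClientConn
	next  uint64
}

// pick returns the next connection in round-robin order.
func (p *connPool) pick() *grpc.ClientConn {
	n := atomic.AddUint64(&p.next, 1)
	return p.conns[n%uint64(len(p.conns))]
}

func (p *connPool) Invoke(ctx context.Context, method string, args, reply interface{}, opts ...grpc.CallOption) error {
	return p.pick().Invoke(ctx, method, args, reply, opts...)
}

func (p *connPool) NewStream(ctx context.Context, desc *grpc.StreamDesc, method string, opts ...grpc.CallOption) (grpc.ClientStream, error) {
	return p.pick().NewStream(ctx, desc, method, opts...)
}
```

A stub built over the pool (e.g. NewExecutionClient(&connPool{conns: conns}), names hypothetical) then round-robins transparently; note this picks a connection per RPC rather than per available stream, so it only statistically stays under the limit.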


stale bot commented May 6, 2020

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc.) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!


ajmath commented Oct 21, 2021

Oh, and to be clear, the solution to date from my perspective is "create multiple Channels." Although I also understand that is much easier to do in Java than in many other languages, since you can hide the behavior behind the Channel interface and use stubs like normal (instead of needing to round-robin across multiple stubs).

@ejona86 Do you know of any public examples of someone doing this with the Java API? I'd also love to hear about examples in the other languages if you know of any.

Edit:
Found this example in bazel.


ejona86 commented Oct 21, 2021

@ajmath, there's also https://github.com/googleapis/gax-java/blob/main/gax-grpc/src/main/java/com/google/api/gax/grpc/ChannelPool.java (ignore the "refreshing" piece of it). Note that if you choose to implement just Channel instead of ManagedChannel, it becomes even easier.

@patrickfreed

@dfawley mentioned there was an effort to create a cross-language design for this functionality; are there any updates on that?


dfawley commented Nov 1, 2022

@patrickfreed We do have a design proposal that was mostly agreed upon internally, but the effort to implement it was deprioritized since our primary use case no longer required it. Long-term it is still something we'd like to do. @ejona86 @markdroth what do you think about the priority of this vs. our other projects in the next few months? Should I write up a gRFC for the design even if we don't have resources assigned to implement it?


ejona86 commented Nov 3, 2022

gRFC without implementation resources doesn't seem all that helpful. Maybe just externalize the internal doc (copy the contents to your personal account and share it)? That lets people determine whether the solution would help them, gives them an idea of the work involved, and serves as a guide for any would-be contributors who want to step up.

@markdroth

Didn't we say that the internal design we had been evaluating wasn't going to work in the xDS case? If so, I don't think we'd actually want to pursue that anyway.

@patrickfreed

gRFC without implementation resources doesn't seem all that helpful. Maybe just externalize the internal doc (copy the contents to your personal account and share it)? That lets people determine whether the solution would help them, gives them an idea of the work involved, and serves as a guide for any would-be contributors who want to step up.

Either one would be super helpful and much appreciated.

Didn't we say that the internal design we had been evaluating wasn't going to work in the xDS case? If so, I don't think we'd actually want to pursue that anyway.

I think this feature is still very useful for the L4 load balancer case (#7957).

@markdroth

I think this feature is still very useful for the L4 load balancer case (#7957).

I do understand that, but we don't want to have to support two mechanisms for the same thing. Ultimately, we will need a mechanism that solves this problem for the xDS case, so if we introduce this mechanism now for the L4 case and then need to introduce a separate mechanism later for the xDS case, then we're stuck supporting two mechanisms for the same thing. I would prefer not to introduce a mechanism for this until we know that it will solve all the use cases that we care about.

@patrickfreed

I think it would still be valuable if the internal doc were released publicly, since it's possible the community could help brainstorm ideas for a solution that accommodates the xDS case too.

@bogdan-patraucean

Any updates on this?

@vincentyl

We're seeing the same issue in Python as well.


bgdnvk commented Apr 8, 2024

Bump for visibility: I'm really interested in this and found this issue from the docs page.

From the gRPC performance notes:

Side note: The gRPC team has plans to add a feature to fix these performance issues (see grpc/grpc#21386 for more info), so any solution involving creating multiple channels is a temporary workaround that should eventually not be needed.
