
Use multiple connections to avoid the server's SETTINGS_MAX_CONCURRENT_STREAMS limit #11704

Closed
ola-rozenfeld opened this issue Jul 6, 2017 · 6 comments



@ola-rozenfeld ola-rozenfeld commented Jul 6, 2017

Please answer these questions before submitting your issue.

Should this be an issue in the gRPC issue tracker?

Yes

What version of gRPC and what language are you using?

I am using grpc 1.3.0 with Java.

What operating system (Linux, Windows, …) and version?

Linux Ubuntu 14.04 LTS

What runtime / compiler are you using (e.g. python version or version of gcc)

Java 8

What did you do?

I did not use any client-side load balancer.

What did you expect to see?

I expected to see the default load balancing creating multiple connections and spreading load over these.

What did you see instead?

Instead it repeatedly took out our server-side balancer.

Anything else we should know about your project / environment?

Our code is actually open source (Bazel), see e.g. https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/GrpcRemoteCache.java#L130

Thank you!


@carl-mastrangelo carl-mastrangelo commented Jul 6, 2017

cc: @markdroth @dgquintas

This is the issue we talked about recently. @ola-rozenfeld is using Java, but the issue applies to all languages.

@markdroth markdroth changed the title Default load balancer only creates one connection Use multiple connections to avoid the server's SETTINGS_MAX_CONCURRENT_STREAMS limit Jul 7, 2017

@markdroth markdroth commented Jul 7, 2017

Based on our discussion, I think the issue here is that the client is hitting the server's HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS limit for a single connection. So the request here is to create multiple connections to the backend server and balance the number of open streams across those connections.

Note that, in principle, this is somewhat orthogonal to our existing load-balancing mechanism, since this is not about balancing load across backends; it's about balancing open streams across connections. The two kinds of balancing are independent in principle, although we might wind up conflating them, depending on how we choose to implement this.

We have talked about supporting this in the past, but it will require some careful design work and then implementation in all languages. Right now, our plate is full for Q3, but if this is hurting a lot of people, we can consider prioritizing it for Q4.
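To make the request concrete, here is a minimal sketch (plain Java, not a gRPC API; the class and method names are invented for illustration) of the behavior being asked for: hold several connections to the *same* backend and route each new stream to the connection with the fewest open streams, so no single connection piles up against the server's SETTINGS_MAX_CONCURRENT_STREAMS limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the requested behavior: several connections to the same
// backend, with each new stream routed to the connection that currently
// has the fewest open streams. "Conn" is a stand-in for a real
// HTTP/2 connection or channel type.
final class StreamBalancer {
    static final class Conn {
        final AtomicInteger openStreams = new AtomicInteger();
    }

    private final List<Conn> conns = new ArrayList<>();

    StreamBalancer(int numConnections) {
        for (int i = 0; i < numConnections; i++) {
            conns.add(new Conn());
        }
    }

    // Route a new stream to the least-loaded connection.
    Conn startStream() {
        Conn best = conns.get(0);
        for (Conn c : conns) {
            if (c.openStreams.get() < best.openStreams.get()) {
                best = c;
            }
        }
        best.openStreams.incrementAndGet();
        return best;
    }

    // Release a stream slot when the RPC completes.
    void finishStream(Conn c) {
        c.openStreams.decrementAndGet();
    }
}
```

A real implementation would also have to cap streams per connection at the advertised limit and open new connections on demand; this sketch only shows the routing decision.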


@ola-rozenfeld ola-rozenfeld commented Jul 7, 2017

So if I understand correctly, even using a load balancer such as RoundRobinLoadBalancerFactory might not eliminate our problem, although it will likely help: with more connections, we are less likely to hit the SETTINGS_MAX_CONCURRENT_STREAMS limit on any single one. Right?


@markdroth markdroth commented Jul 7, 2017

Yes, but only if your server name resolves to multiple IP addresses. If your server name only resolves to one IP address, then the client will only use a single connection, regardless of which load-balancing policy you choose.

gRPC's load-balancing mechanism is intended for balancing load across multiple backend servers. The basic idea is that we treat each IP address returned by the name resolver as a separate backend, and then we apply a particular LB policy to balance the load across those backends. The workflow is described here:

https://github.com/grpc/grpc/blob/master/doc/load-balancing.md#workflow

The default LB policy is pick_first, which iterates through the list of addresses returned by the resolver and picks the first one to which it can establish a connection. It then sends all RPCs across that single connection. So each client will wind up sending all of its RPCs to a single backend server, but if the name resolver returns the IP addresses in randomized order (which most DNS servers do), then different clients will pick different backend addresses. If you have a large enough number of clients, and if each client sends about the same amount of load, this may be good enough to balance load across the backends, but if either of those conditions is not met, then it probably won't do a very good job.
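The pick_first selection logic can be modeled roughly like this (illustrative plain Java, not the actual gRPC implementation; the `canConnect` predicate stands in for a real connection attempt):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Rough model of pick_first: walk the resolver's address list in order
// and settle on the first address we can connect to; every RPC then
// goes over that single connection.
final class PickFirst {
    static Optional<String> pick(List<String> addresses,
                                 Predicate<String> canConnect) {
        for (String addr : addresses) {
            if (canConnect.test(addr)) {
                return Optional.of(addr); // all RPCs use this one address
            }
        }
        return Optional.empty(); // no address was reachable
    }
}
```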

The round_robin LB policy is better, although still somewhat naive. It will attempt to establish connections to all addresses returned by the name resolver. It will then send each subsequent RPC to each successive address in the list, wrapping back around to the beginning when it hits the end. This does a better job than pick_first, since it can spread out the load when there is a small number of clients or when different clients send RPCs at different rates. However, the reason I said that this is a naive algorithm is that it assumes that every RPC imposes the same amount of load on every server. So if you have some RPCs that are a lot more expensive for the server to process than others, or if some of your servers are faster than others, then this is still not really good enough.
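The round_robin rotation is just modular arithmetic over the address list; a minimal model (again illustrative, not the real implementation) looks like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model of round_robin: send each successive RPC to the
// next address in the resolver's list, wrapping around at the end.
final class RoundRobin {
    private final List<String> addresses;
    private final AtomicLong counter = new AtomicLong();

    RoundRobin(List<String> addresses) {
        this.addresses = new ArrayList<>(addresses);
    }

    // Pick the address for the next RPC.
    String pickNext() {
        int i = (int) (counter.getAndIncrement() % addresses.size());
        return addresses.get(i);
    }
}
```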

The grpclb LB policy is the one that we intend to be the most extensible, although it's not yet fully implemented in open-source. The idea is that there is a separate tier of look-aside load balancers that can receive load reports from the backend servers and intelligently decide where to send the load in a more dynamic way. The gRPC client will contact the load balancer to find out where to send each RPC.

Anyway, as you can see, all of the above is designed around the idea of balancing load across multiple backends. However, from what @carl-mastrangelo described to me yesterday, it seems that your problem here is not actually about balancing the load on the backend servers themselves, but rather balancing the number of open streams across multiple connections. We don't support a mechanism for doing this yet. But in the interim, if you do have multiple server addresses, then using the round_robin LB policy will probably alleviate some of the problems you're seeing by spreading the load across multiple connections.
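For reference, switching a grpc-java channel of this era (1.x) from the default pick_first to round_robin is a one-line configuration change; this is a sketch only (the target name is a placeholder, and `loadBalancerFactory` has since been deprecated in favor of `defaultLoadBalancingPolicy("round_robin")` in newer releases), so it won't compile without the grpc-java and grpc-netty dependencies:

```java
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.util.RoundRobinLoadBalancerFactory;

// Configuration sketch: use the round_robin LB policy instead of the
// default pick_first. This only helps if the target name resolves to
// multiple addresses. "myserver.example.com" is a placeholder.
final class ChannelConfig {
    static ManagedChannel roundRobinChannel() {
        return NettyChannelBuilder
                .forTarget("dns:///myserver.example.com:443")
                .loadBalancerFactory(RoundRobinLoadBalancerFactory.getInstance())
                .build();
    }
}
```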

I hope this explanation is helpful (and I apologize if I wound up explaining things that you already understood). Please let me know if you have any other questions.


@ola-rozenfeld ola-rozenfeld commented Jul 7, 2017

Yes, this is very helpful, thank you!

bazel-io pushed a commit to bazelbuild/bazel that referenced this issue Jul 17, 2017
…undRobin client side load balancer instead of the default PickFirst.

A RoundRobin rotates between all addresses returned by the resolver, while PickFirst sends all traffic to the first address. This might help resolve some of the load problems we encountered (see grpc/grpc#11704 for more details).

TESTED=remote worker
PiperOrigin-RevId: 161960008
@dgquintas dgquintas added the lang/Java label Jun 9, 2018

@dgquintas dgquintas commented Jun 9, 2018

This issue seems resolved. Closing.

@dgquintas dgquintas closed this Jun 9, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Sep 29, 2018