
How should we implement subsetting in gRPC? #6370

Closed

s-matyukevich opened this issue Jun 13, 2023 · 8 comments

Comments

@s-matyukevich
Contributor

s-matyukevich commented Jun 13, 2023

In our organization, most teams use round_robin with client-side health checks to establish reliable connections to the servers and achieve even load distribution. This works great, but at some point it stops scaling well.

The main problem with this setup is that the overall number of connections grows exponentially as the number of clients increases. This problem is especially bad in grpc-go, because grpc-go has issues managing per-connection memory overhead. And it's not only memory that is wasted: other resources are also consumed processing health checks and maintaining connections at the OS level.

There are a few options that we have tried or are considering:

  • Put Envoy as a sidecar in front of every gRPC server. Some teams did this, and it helps because Envoy handles per-connection overhead much more efficiently than grpc-go, but it doesn't solve the root cause of the problem.
  • Use a load balancer. Some teams also did this, but it adds a lot of complexity and increases our cloud-provider costs, whether we use a cloud load balancer or an internal Envoy cluster as the load balancer.
  • Use pick_first instead of round_robin. Users reported that this results in a load imbalance on the servers (since we don't directly control which server a client picks). This problem is especially bad during server rollouts, because DNS has some propagation delay and clients end up not picking the most recently rotated server pod(s) at all.
  • Use some form of subsetting on the client. In theory this should be the best option, but we don't have consensus on the best way to implement it.

Here are the subsetting implementation options that we have tried or considered:

  • Implement a custom load balancer that randomly picks N addresses from a DNS response.
  • Implement a custom load balancer that uses a ring hash and the client's index in the ring to deterministically pick a subset of servers for a given client.
  • Use xDS and implement subsetting in the management server. The management server in this case will have to return different EDS response per client.

We tried the first two options and they work OK, but we still see some imbalance on the server side and we have to maintain the custom LB. The code is fairly generic - if this is the recommended way of doing subsetting, we can create a PR and donate our subsetting LB to grpc-go. We could also combine it with some ORCA load reporting functionality to deal with the server-side imbalance and provide a generic, reusable solution.
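For reference, the core of the first option is roughly the following - a minimal sketch in Go (not our actual LB code) of picking a random subset of the resolved addresses before handing them to a child round_robin policy; the function name and parameters are just for illustration:

```go
package subsetting

import (
	"math/rand"

	"google.golang.org/grpc/resolver"
)

// pickRandomSubset returns up to n addresses chosen uniformly at random.
// In the real balancer this would run on every resolver update, and the
// result would be passed down to a child round_robin policy.
func pickRandomSubset(addrs []resolver.Address, n int) []resolver.Address {
	if len(addrs) <= n {
		return addrs
	}
	shuffled := make([]resolver.Address, len(addrs))
	copy(shuffled, addrs)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:n]
}
```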

We didn't try the xDS option yet, but it will definitely increase the overhead of managing EDS keys on our management server. Ideally we would like to reuse the same EDS response for every client and make the client to pick a subset based on some parameters we send from xDS control plane, Is that something feasible? A related envoy PR was closed.

Are there any other options that we didn't consider? Any guidance here will be greatly appreciated.

cc @markdroth

@easwars
Contributor

easwars commented Jun 27, 2023

We recently added a configuration knob to pick_first which enables it to randomly shuffle the list of received addresses. This will ensure that each of your clients will connect to different servers. Do you think this might help you?

@s-matyukevich

Please see: https://github.com/grpc/proposal/blob/master/A62-pick-first.md
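For reference, enabling that knob is just a service config change. A minimal sketch in Go (assuming a grpc-go release that includes A62; the JSON field name follows the gRFC):

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func dial(target string) (*grpc.ClientConn, error) {
	// Use pick_first with address shuffling (gRFC A62) so that each client
	// shuffles the resolved address list before connecting, spreading
	// connections across servers.
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{
			"loadBalancingConfig": [
				{"pick_first": {"shuffleAddressList": true}}
			]
		}`),
	)
}
```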

@markdroth
Member

We've had a bunch of internal discussions about subsetting. At the moment, we're using an approach that involves doing the subsetting on the management server (i.e., sending a different EDS resource to each client). That does cause some problems for xDS resource cacheability, but since we don't (yet?) have any case where we need to cache the EDS resources, it's a workable approach.

In the past, we have talked about building an LB policy to do subsetting on the client side, but we haven't pursued it. The thing that generally seems hard about subsetting is making it stable as the set of endpoints changes (e.g., when k8s pods are created or destroyed via auto-scaling), so that you don't cause a bunch of unnecessary connection churn on the clients. In the existing xDS ecosystem, that tends to be a bit harder on the client side, since there is no unique identifier for each endpoint instance that you can use in the subsetting algorithm to avoid this kind of churn.

@s-matyukevich
Contributor Author

We recently added a configuration knob to pick_first which enables it to randomly shuffle the list of received addresses. This will ensure that each of your clients will connect to different servers. Do you think this might help you?

This won't help, as we already have randomized DNS responses per client. Even with pick_first, some of our users reported a big imbalance on the server side, especially during server rollouts. We could make it better by setting a low MaxConnectionAge, but that generates unnecessary connection churn and doesn't fix the issue entirely.
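(For context, the MaxConnectionAge workaround would look roughly like this on the server side - just a sketch with arbitrary example values:)

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func newServer() *grpc.Server {
	// Periodically close connections so that pick_first clients re-resolve
	// and reconnect; the durations here are arbitrary examples.
	return grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      5 * time.Minute,
		MaxConnectionAgeGrace: 30 * time.Second,
	}))
}
```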

@s-matyukevich
Contributor Author

Thanks @markdroth, this makes sense.

However, I still don't fully understand this part: "In the existing xDS ecosystem, that tends to be a bit harder on the client side, since there is no unique identifier for each endpoint instance that you can use in the subsetting algorithm to avoid this kind of churn." Doesn't something like Twitter's Aperture achieve both perfect load distribution on the server and low connection churn on the client? As far as I understand, it needs only one additional parameter that is not currently present in xDS: the client's index in the ring hash. Is this the missing parameter you are referring to? It looks like envoyproxy/envoy#22991 was closed because there wasn't consensus on how exactly to add this parameter to the xDS API.

Something like Twitter's Aperture is exactly what we need, and I think I understand what you want from the xDS API here. If we come up with an alternative Envoy PR that implements your suggestions and modifies only the xDS API, plus a gRFC for an Aperture implementation, plus an actual implementation in Go, would that be something you could potentially accept?

Alternatively, we could follow your suggestion and implement the subsetting logic on the management server, but in my opinion the complexity of doing it on the server is about the same as doing it on the client. If we do it on the client, we can benefit from community support, and we can also combine it with ORCA to achieve even better load distribution.

@markdroth
Member

What I mean by a unique identifier is a name for a given endpoint that remains constant even if that endpoint moves to a different address (e.g., if a machine in a k8s cluster fails and the pods that were running on that machine get moved to a different machine). Internally, we have some subsetting algorithms that use that kind of unique identifier to avoid unnecessary connection churn in cases like that, because the endpoint changing addresses will not result in changing which clients are assigned to that endpoint. These algorithms can also handle dynamic adjustment of the subset when some of the initially chosen endpoints are down, without breaking load distribution. I don't think you can get those properties with Twitter aperture subsetting -- although to be fair, you can't get some of them with subsetting on the management server either.

In any case, I have no objection to supporting something like Twitter aperture subsetting in xDS or in gRPC. In principle, I think it's totally reasonable to have a "parent" LB policy that performs subsetting, and I think people can experiment with a variety of such policies that provide different subsetting algorithms, some of which may be better in some situations than others.

I think it's a shame that the Envoy PR you mentioned didn't move forward, but I wasn't able to get the contributor to understand what I was asking for, which was just a clean separation of the subsetting piece from the actual load balancing piece. I just don't think it's a good idea to hard-code the two pieces together by providing them both in a single LB policy, because that's unnecessarily inflexible. In principle, there can be a variety of subsetting policies and a variety of load balancing policies, and it should be possible to mix and match them in any way that may be desirable (e.g., use aperture subsetting with WRR load balancing, or use some other subsetting algorithm with P2C load balancing).
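To make the "mix and match" idea concrete, the configuration might end up looking something like this (purely illustrative; the subsetting policy name and its fields are hypothetical and not part of any existing API):

```go
package main

// Hypothetical service config: a "parent" subsetting policy that only picks
// the subset of endpoints, and delegates the actual load balancing among
// them to an arbitrary child policy. The "aperture_subsetting" name and its
// fields are made up for illustration.
const exampleServiceConfig = `{
	"loadBalancingConfig": [{
		"aperture_subsetting": {
			"subsetSize": 10,
			"childPolicy": [{"weighted_round_robin": {}}]
		}
	}]
}`
```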

So yes, if you would like to put forward a proposal for a "parent" LB policy that supports aperture subsetting and/or a "leaf" policy that supports P2C load balancing, I'd be happy to review. I'd suggest starting with a gRFC describing both the gRPC functionality and the proposed xDS protos. Once we have consensus on that, you can put together a PR for the xDS proto changes, which I can help review; the gRFC will probably be useful context for getting the xDS proto change through.

I hope this info is helpful. Please let me know if you have any questions.

@easwars
Contributor

easwars commented Jun 30, 2023

@s-matyukevich : Are you satisfied with the answers to your questions here? Can we close this? Thanks.

@s-matyukevich
Contributor Author

Yes, thanks for the answers! We are still discussing internally whether we are going to try to implement aperture subsetting in gRPC or proceed with some other solution.

Closing this.

@s-matyukevich
Contributor Author

I created a gRFC (grpc/proposal#383) and a POC in Go (#6488).
We decided to use the Google subsetting algorithm instead of Aperture, for the reasons described in the gRFC.
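For anyone curious, the core of that algorithm (deterministic subsetting, as described in the Google SRE book) is roughly the following; this is a simplified sketch, not the code from the POC:

```go
package subsetting

import "math/rand"

// deterministicSubset is a simplified version of the deterministic
// subsetting scheme from the SRE book: clients are grouped into rounds,
// each round shuffles the backend list using the round number as the seed,
// and each client takes a disjoint slice of that shuffled list. Backends
// left over when len(backends) is not a multiple of subsetSize are ignored
// here for simplicity.
func deterministicSubset(backends []string, clientID, subsetSize int) []string {
	if subsetSize >= len(backends) {
		return backends
	}
	subsetCount := len(backends) / subsetSize
	round := clientID / subsetCount
	shuffled := make([]string, len(backends))
	copy(shuffled, backends)
	r := rand.New(rand.NewSource(int64(round)))
	r.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	subsetID := clientID % subsetCount
	start := subsetID * subsetSize
	return shuffled[start : start+subsetSize]
}
```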

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 28, 2024