
roundrobin: randomize starting address when rebuilding the picker #2579

Merged (2 commits) Jan 17, 2019

Conversation

@dfawley (Member) commented Jan 16, 2019

No description provided.

@dfawley (Member, Author) commented Jan 16, 2019

cc @ejona86 @markdroth FYI

@markdroth (Member) commented:

The more I think about this, the less convinced I am that randomization is the right behavior.

As we've discussed, there are two cases here:

  1. A new picker is created because one of the subchannels changed its connectivity state into or out of READY.
  2. A new picker is created because the resolver gave us a new list of addresses.

In case 2, I think starting at zero is fine. Starting at a random index could be harmful in cases where there is a small rate of very expensive RPCs and a resolver or balancer gave us addresses in a particular order and expected us to follow that order. Admittedly, that's probably a rare case, so we could probably get away with using a random index here, but since starting at zero is probably easier to implement than randomizing, I still think it's the better choice.

Case 1 is the more interesting case. Starting at a random index is strictly better than starting over at zero, for the reasons you note in this PR. However, I still think the right behavior is that we should keep going from the index we were at previously. Otherwise, this means that any time an individual backend is flapping, we will be constantly skipping randomly throughout the list. Even worse, all clients will be doing that random skipping, which means that round_robin winds up degenerating into pick_random.
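
For concreteness, here is a minimal sketch of the picker logic being discussed (field names and the exact signature are assumptions for illustration, not code lifted from the repo): each picker built by the balancer holds the list of ready SubConns plus a cursor, and "keep going from the index we were at previously" means carrying that cursor across rebuilds instead of resetting or re-randomizing it.

import (
	"context"
	"sync"

	"google.golang.org/grpc/balancer"
)

// rrPicker sketches a round-robin picker: it owns a fixed slice of ready
// SubConns and rotates through them one pick at a time.
type rrPicker struct {
	subConns []balancer.SubConn

	mu   sync.Mutex
	next int // index of the SubConn to hand out on the next Pick
}

func (p *rrPicker) Pick(ctx context.Context, opts balancer.PickOptions) (balancer.SubConn, func(balancer.DoneInfo), error) {
	p.mu.Lock()
	sc := p.subConns[p.next]
	p.next = (p.next + 1) % len(p.subConns) // advance the cursor; wraps around the list
	p.mu.Unlock()
	return sc, nil, nil
}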

@ejona86 (Member) commented Jan 17, 2019

That doesn't sound that bad, as a degenerate case. Being the literal worst case, degenerating to pick_random... meh. Like, I have trouble caring.

To change that behavior would require a larger change in how subchannels are passed to the picker. So it sounds like this PR should still be merged, even if it is expected to be replaced later.

@@ -47,12 +48,19 @@ type rrPickerBuilder struct{}

func (*rrPickerBuilder) Build(readySCs map[resolver.Address]balancer.SubConn) balancer.Picker {
	grpclog.Infof("roundrobinPicker: newPicker called with readySCs: %v", readySCs)
	if len(readySCs) == 0 {
		return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
	}
	var scs []balancer.SubConn
	for _, sc := range readySCs {
Review comment from a Member on the loop above:

So Go is iterating through the channels in a shuffled order?

@dfawley (Member, Author) replied:

Effectively, yes. They are ordered by how they happen to be in the readySCs map, keyed by the address (as a string, plus a "type" and a pointer to optional user metadata). This is also not right, but I do think randomizing here is better than not randomizing, so we should merge this PR for now and file an issue to improve further.

Reply from a Member:

Yes, I wasn't wanting to block this PR.
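
For context on the hunk above, the remainder of Build plausibly looks something like the following sketch (reusing the rrPicker type from the earlier sketch, not the PR's exact code; math/rand stands in for whatever random source the PR actually uses): it collects the ready SubConns into a slice, in whatever order Go's randomized map iteration yields them, and then seeds the starting index randomly rather than always at 0.

import (
	"math/rand"

	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/balancer/base"
	"google.golang.org/grpc/resolver"
)

func (*rrPickerBuilder) Build(readySCs map[resolver.Address]balancer.SubConn) balancer.Picker {
	if len(readySCs) == 0 {
		return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
	}
	var scs []balancer.SubConn
	for _, sc := range readySCs { // map iteration order is randomized by the Go runtime
		scs = append(scs, sc)
	}
	return &rrPicker{
		subConns: scs,
		// Start at a random index: this picker is rebuilt every time a SubConn
		// changes state, and always restarting at 0 would concentrate load on
		// the first backend after each rebuild.
		next: rand.Intn(len(scs)), // assumes the global source is seeded; a dedicated seeded source would also work
	}
}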

@markdroth (Member) commented:

> That doesn't sound that bad, as a degenerate case. Being the literal worst case, degenerating to pick_random... meh. Like, I have trouble caring.

A task flapping is not that unusual of a case. I think this is common enough that it's something we should handle gracefully.

> To change that behavior would require a larger change in how subchannels are passed to the picker.

I'm not sure exactly how Go (or Java) passes the subchannels to the picker, but it's not obvious to me why that has to change. Presumably, the picker stores the index into its list. So couldn't we have the client channel let the new picker get some state from the old picker before swapping the new picker in? That would give us the ability to retain the last index used in the previous picker.

> So it sounds like this PR should still be merged, even if it is expected to be replaced later.

If this PR only adds randomization for case 1, not case 2, then I agree that it can go forward, since it's a strict improvement over resetting to 0 every time a subchannel's state changes. But we should follow up with a better fix in a separate PR.

@ejona86 (Member) commented Jan 17, 2019

> Presumably, the picker stores the index into its list. So couldn't we have the client channel let the new picker get some state from the old picker before swapping the new picker in?

The problem is that the lists will be of different sizes. Go and Java pass a list of size equal to the number of ready subchannels.

In addition, because there is no locking around uses of the Picker, the old and new pickers are used simultaneously (or rather, old calls to the old picker may still be in progress when new calls to the new picker begin). So "get state from the old picker" is probably not a solution like it is in C. (Instead, we would probably share data between the two; since we have a GC, that doesn't cause lifetime issues.)

> If this PR only adds randomization for case 1, not case 2, then I agree that it can go forward, since it's a strict improvement over resetting to 0 every time a subchannel's state changes.

It would randomize in both cases. Case 2 is rare; while I can agree it'd be good to start at the beginning, I don't think it practically matters. The current implementation for case 1 is dire; it degrades to all RPCs being sent to one backend. It seems weird to me to block an obvious "this could cause an outage" fix over an "occasionally we will do things in a slightly different order, but nobody will probably even notice" concern.

@dfawley (Member, Author) commented Jan 17, 2019

So "get state from the old picker" is probably not a solution like it is in C.

Correct. If maintaining the position in the list is a requirement, probably what I'd do is make it so there actually is only one picker, permanently, and update its state atomically/with a lock when a change needs to be made (roughly along the lines of the sketch below).

Either way, I think this is a real improvement, and the full fix is going to be lower priority for us. I will file an issue for the follow-up changes. In that issue, it would be nice if we could define the required/desired/acceptable behaviors for an RR balancer in gRPC, in case we can find a user who would like to contribute a fix.

Also note that in case 2 we aren't even starting at the top today - as @ejona86 observed, we are shuffling our list of addresses due to the map from addresses to subchannels.
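
To illustrate the "one permanent picker" idea mentioned above, a rough sketch (all names hypothetical; nothing here exists in grpc-go): a single long-lived picker whose SubConn list is swapped under a lock, so the cursor survives rebuilds instead of being reset or re-randomized. A real fix would also need to decide how to map the old cursor onto the new list, since, as noted above, the lists differ in length.

import (
	"context"
	"sync"

	"google.golang.org/grpc/balancer"
)

// persistentRRPicker is a hypothetical single, long-lived picker.
type persistentRRPicker struct {
	mu       sync.Mutex
	subConns []balancer.SubConn
	next     int
}

// updateSubConns swaps in the new ready set while keeping the cursor, so one
// flapping backend does not restart the rotation for every client.
func (p *persistentRRPicker) updateSubConns(scs []balancer.SubConn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.subConns = scs
	if len(scs) > 0 {
		p.next %= len(scs) // clamp the carried-over cursor to the new list length
	} else {
		p.next = 0
	}
}

func (p *persistentRRPicker) Pick(ctx context.Context, opts balancer.PickOptions) (balancer.SubConn, func(balancer.DoneInfo), error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.subConns) == 0 {
		return nil, nil, balancer.ErrNoSubConnAvailable
	}
	sc := p.subConns[p.next]
	p.next = (p.next + 1) % len(p.subConns)
	return sc, nil, nil
}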

@dfawley merged commit efaac52 into grpc:master Jan 17, 2019
@dfawley (Member, Author) commented Jan 17, 2019

> Also note that in case 2 we aren't even starting at the top today - as @ejona86 observed, we are shuffling our list of addresses due to the map from addresses to subchannels.

And per my observation in #2580, starting at the top doesn't even seem like a reliable thing that can be tested or relied upon by users, given that the first RPC will go to the first-connected backend and not unconditionally to the first address in the list.

@dfawley deleted the rrrand branch January 17, 2019 17:18
@dfawley added this to the 1.19 Release milestone Jan 17, 2019
lock bot locked as resolved and limited conversation to collaborators Jul 16, 2019