Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leastrequest: fix data race in leastrequest picker #6587

Merged
merged 3 commits into from
Aug 31, 2023

Conversation

huangchong94
Copy link
Contributor

@huangchong94 huangchong94 commented Aug 26, 2023

This PR fixes a data race in least request picker which will make race detector happy by using atomic type.

Note: Even after this PR merged, there is still some raciness in this implementation, but as section Outstanding request counter raciness in A48 points out, it's on purpose for performance reason.

RELEASE NOTES: none

@easwars
Copy link
Contributor

easwars commented Aug 29, 2023

@zasweq : Could you please take a look?

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines 157 to 158
// There is some degree of raciness in Pick, this is allowed: "it may be argued that
// it's better to accept some degree of raciness in the picker instead." - A48
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this comment, I don't think you need to state it. If you do, post the full quote and wrap to 60 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60 characters

It's irrelevant now, but note that we wrap at 80 columns, not 60.

@zasweq zasweq assigned dfawley and unassigned zasweq Aug 30, 2023
@zasweq
Copy link
Contributor

zasweq commented Aug 30, 2023

I need to add a test for concurrent RPCs (i.e. a bunch of RPCs happening in parallel on a channel with the least request lb present), as that would have caught this problem. Thanks for the contribution.

@zasweq zasweq requested a review from dfawley August 30, 2023 17:58
@@ -162,18 +162,18 @@ func (p *picker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
pickedSC = &sc
continue
}
if *sc.numRPCs < *pickedSC.numRPCs {
if sc.numRPCs.Load() < pickedSC.numRPCs.Load() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q for @zasweq: should we cache pickedSC.numRPCs, or reload it on every iteration as we are currently doing?

for ... {
	index := ...
	sc := ...
	scRPCs := sc.numRPCs.Load()
	if pickedSC == nil || scRPCs < pickedSCRPCs{
		pickedSC = &sc
		pickedSCRPCs = scRPCs
	}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying. I think both caching it and the way it is fall within the language described here: https://github.com/grpc/proposal/blob/master/A48-xds-least-request-lb-policy.md#outstanding-request-counter-raciness. Caching it before the iteration would get rid of the race where other threads and picks can add or subtract between iterations. "While reading the outstanding request counters of samples in the picker, previously read values may become outdated" - specifically, this language I think encompasses a. scs that have already been checked and b. the picked sc. As Eric mentioned offline, I think eventually this algorithm goes towards consistency even if the checks aren't perfect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, just talked to Eric about this and he agrees with your comment. a. loads are expensive, particularly if you get a subsequent write in another Pick call, since it requires exclusive access similar to a rw lock. Thus, it is more performant to perform the operation once. b. Wrt atomics, Eric has the philosophy of only Loading once, as it makes the code easier to reason about and also prevents concurrent operations on that data breaking assumptions and logic about the code. After this is merged I'll change it and write a test for concurrent RPC's.

Comment on lines 157 to 158
// There is some degree of raciness in Pick, this is allowed: "it may be argued that
// it's better to accept some degree of raciness in the picker instead." - A48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60 characters

It's irrelevant now, but note that we wrap at 80 columns, not 60.

@dfawley dfawley assigned zasweq and unassigned dfawley Aug 31, 2023
@dfawley dfawley added this to the 1.58 Release milestone Aug 31, 2023
@dfawley
Copy link
Member

dfawley commented Aug 31, 2023

Also, this will need to be backported to the v1.58.x branch.

@zasweq zasweq changed the title leastrequest: fix data race in leastrequest picker using atomic type leastrequest: fix data race in leastrequest picker Aug 31, 2023
@zasweq zasweq merged commit e498bbc into grpc:master Aug 31, 2023
11 checks passed
arvindbr8 pushed a commit to arvindbr8/grpc-go that referenced this pull request Sep 5, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants