Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: use least-requests loadbalancer for workers #6004

Merged
merged 1 commit into from
Apr 11, 2023

Conversation

SaveTheRbtz
Copy link
Contributor

@SaveTheRbtz SaveTheRbtz commented Feb 2, 2023

This is a proposal to change load-balancing mechanism from round robin to least-requests by switching an array of channels to a single channel. The downside of the existing code is that round-robin may pick a worker that is already busy while there are plenty of workers free.

In the future we could also add a customizable queue discipline for the cases where all workers are busy so that it can act as a cheap circuit breaker / load-shedding mechanism.

While here, I'm also switching wg.Done() to be consistently ran in a defer.

Ref: #3204

RELEASE NOTES:

  • server: improve stream handler goroutine worker allocation when NumStreamWorkers is used

@SaveTheRbtz
Copy link
Contributor Author

cc: @adtac @menghanl @dfawley

@zasweq
Copy link
Contributor

zasweq commented Feb 3, 2023

This is marked as draft. I'll wait until it gets unmarked as draft until I take a look.

@SaveTheRbtz SaveTheRbtz marked this pull request as ready for review February 3, 2023 09:04
@dfawley dfawley self-requested a review February 7, 2023 18:50
@dfawley dfawley self-assigned this Feb 7, 2023
@dfawley dfawley added the Type: Performance Performance improvements (CPU, network, memory, etc) label Feb 7, 2023
@dfawley dfawley added this to the 1.54 Release milestone Feb 7, 2023
@dfawley
Copy link
Member

dfawley commented Feb 17, 2023

Thank you for the PR. This seems like the obvious way to do this, which makes me wonder why it was done the way it was before.

Is it possible for you to run the benchmarks and see what happens to performance when this is enabled before/after this change?

@github-actions
Copy link

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Feb 23, 2023
@SaveTheRbtz
Copy link
Contributor Author

Sorry for the slow reply!

None of the existing benchmarks show any difference, mostly because they do not exercise the affected codepath. I've created a very dirty benchmark specifically for this[1] but even it does not show much difference in go1.20: the only difference i was able to spot is that now requests are more likely to be handled by workers instead of one-off goroutines.

This is likely do to the fact that go right now automatically adjust stack size based on historic data, which means even if request is not processed in worker, it won't be pathologically slow.

[1] https://gist.github.com/SaveTheRbtz/cae6e8abb9311e2016acbab6bc63752a

@github-actions github-actions bot removed the stale label Mar 1, 2023
@github-actions
Copy link

github-actions bot commented Mar 7, 2023

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@dfawley dfawley assigned easwars and unassigned dfawley Apr 6, 2023
// connections to reduce the time spent overall on runtime.morestack.
func (s *Server) initServerWorkers() {
s.serverWorkerChannels = make([]chan *serverWorkerData, s.opts.numServerWorkers)
s.serverWorkerChannel = make(chan *serverWorkerData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it at all make sense to create this channel with a buffer of size numServerWorkers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would increase latencies, since then up to numServerWorkers requests can be indefinitely queued until we free up a worker. Without buffering such a request would instead be handled by a spawned goroutine immediately.

@SaveTheRbtz SaveTheRbtz removed their assignment Apr 11, 2023
@easwars easwars closed this Apr 11, 2023
@easwars easwars reopened this Apr 11, 2023
@easwars easwars merged commit 6eabd7e into grpc:master Apr 11, 2023
1 check passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Performance Performance improvements (CPU, network, memory, etc)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants