Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
GitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
runtime: sema: many many goroutines queueing up on many many distinct addresses -> slow #38420
Hello up there. libcsp claims to be 10x faster on a benchmark involving
Go commit 45c6f59 says
It seems the above particular scenario is being hit in this benchmark.
What version of Go are you using (
If there are many goroutines contending for two different locks and both locks hash to the same semaRoot, the scans to find the goroutines for a particular lock can end up being O(n), making n lock acquisitions quadratic. As long as only one actively-used lock hashes to each semaRoot there's no problem, since the list operations in that case are O(1). But when the second actively-used lock hits the same semaRoot, then scans for entries with for a given lock have to scan over the entries for the other lock. Fix this problem by changing the semaRoot to hold only one sudog per unique address. In the running example, this drops the length of that list from O(n) to 2. Then attach other goroutines waiting on the same address to a separate list headed by the sudog in the semaRoot list. Those "same address list" operations are still O(1), so now the example from above works much better. There is still an assumption here that in real programs you don't have many many goroutines queueing up on many many distinct addresses. If we end up with that problem, we can replace the top-level list with a treap. Fixes #17953. Change-Id: I78c5b1a5053845275ab31686038aa4f6db5720b2 Reviewed-on: https://go-review.googlesource.com/36792 Run-TryBot: Russ Cox <email@example.com> TryBot-Result: Gobot Gobot <firstname.lastname@example.org> Reviewed-by: Ian Lance Taylor <email@example.com>
Storing part of the data on the side mapped using a global container is unnecessary overhead and contention. sync.Cond removed all contention and O(n) behavior with sync.syncSema in a20b4a6a9d (nowadays sync.notifyList). We could do the same for all other sync primitives.