testing: provide a way to work around "race: limit on 8128 simultaneously alive goroutines is exceeded" error #47056
Comments
To expand a little: with the race detector enforcing a limit of 8k (active?) goroutines, presumably the testing package could refuse to start new parallel tests or sub-tests if we are currently over 50% of that limit of (active?) goroutines. That would still leave the "limit exceeded" error as a possibility, but it would presumably make them far less likely, and should only make hugely parallel tests slightly slower. And, best of all, it should allow
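The heuristic described above could be approximated in user code today. What follows is only a sketch under stated assumptions: the helper names underBudget and waitForHeadroom are hypothetical, the testing package does nothing like this itself, and runtime.NumGoroutine is used as a rough stand-in for the race detector's own count of alive goroutines.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// raceGoroutineLimit is the figure quoted in the error message in this thread.
const raceGoroutineLimit = 8128

// underBudget reports whether n live goroutines is strictly below the given
// fraction of limit. (Hypothetical helper, not part of any Go package.)
func underBudget(n, limit int, fraction float64) bool {
	return float64(n) < float64(limit)*fraction
}

// waitForHeadroom polls runtime.NumGoroutine until the process is under
// budget or the timeout elapses. It returns true if headroom was found.
// (Hypothetical helper, not part of any Go package.)
func waitForHeadroom(limit int, fraction float64, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for !underBudget(runtime.NumGoroutine(), limit, fraction) {
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(10 * time.Millisecond)
	}
	return true
}

func main() {
	// In a test, this call would sit just before t.Parallel().
	fmt.Println(waitForHeadroom(raceGoroutineLimit, 0.5, time.Second))
}
```

As the replies below point out, a gate like this cannot help when a single test spawns thousands of goroutines on its own, so it only reduces the likelihood of hitting the limit.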
@mvdan That kind of heuristic could be effective in the general case, but could end up more confusing in cases where tests themselves spawn many goroutines. I'm not really sure what to do here besides trying to make the race detector more robust (I think we haven't kept up with TSAN...? Don't quote me on that.)
I believe this is a limitation of the TSAN library itself. It has a max thread limit. We do pull TSAN somewhat periodically; the last pull was 2020-10-20. All that being said, there's no way to fix this in the Go project directly. Someone would have to redesign TSAN somehow to handle more threads. It's probably a big lift, but @dvyukov might know more about why that limit exists and how we might increase it.
(The error comes from the
I just happen to have a redesigned tsan runtime that supports an unlimited number of goroutines (it is also twice as fast and consumes less memory). However, there is still a long road to upstreaming it.
Getting rid of the limit entirely would be a far better solution, of course :) My "limit parallelism" suggestion was more of a temporary workaround until that happens.
I just used your experimental CL to help debug a very tricky memory corruption bug at work, where the software spawns tens of thousands of goroutines :) Really, really looking forward to this new tsan runtime being upstreamed, so Go can ship with it. Is there any way I can help make that happen?
Just got back from a long vacation.
That CL's HEAD did end up failing after a few days, for what it's worth :)
Nothing immediately comes to mind. I would need a reproducer to debug.
The code is open source, but a large daemon that ran for days, so not a reproducer. Not worth pursuing for now, I think - wanted to bring it up in case it was helpful on its own.
I seem to be hitting a thing which isn't this, but might be vaguely related, where sometimes Nothing even close to a viable reproducer yet, unfortunately. Originally saw this with 1.16.9; with 1.17.2 I get similar things but the tracebacks can have 3000 lines of
Change https://golang.org/cl/333529 mentions this issue:
I've uploaded a new version at https://go-review.googlesource.com/c/go/+/333529. A number of bugs were fixed. Testing is welcome.
$ git clone https://github.com/go-faster/ch.git && cd ch && git checkout bde7621d
$ go test -race ./proto
race: limit on 8128 simultaneously alive goroutines is exceeded, dying
FAIL github.com/go-faster/ch/proto 0.427s
FAIL
UPD: I'm not sure about the cause, still investigating, probably
$ go test -trace trace.out .
Oh, now I see that N=19735.
The zstd.NewReader function spawns multiple goroutines. Ref: golang/go#47056
The linux/amd64 version of the race runtime has been submitted.
Issue #38184 seems to be a reasonably hard limit in the race detector library, but it's not always easy to work around it.
In tests, particularly, this issue more commonly arises in higher level tests that run more slowly and hence use testing.T.Parallel to speed themselves up. But this means that if a large instance is used to run tests (for example in continuous integration), we might see more simultaneously alive goroutines because the value of GOMAXPROCS is greater, so more tests will be run in parallel, leading to this issue arising.
There are a few possibilities for working around this at present, although none of them feel great:
- Avoid calling testing.T.Parallel when the race detector is enabled, but this disables parallelism completely, which feels like overkill.
- Set the -test.parallel flag, but this means that the top level CI script needs to be changed (you'll no longer be able to run go test -race ./...), which has maintainability issues.
- Change -test.parallel within a given package by using flag.Lookup and setting the flag directly, but this is a nasty hack.
Ideally, we'd be able to configure the maximum number of active goroutines allowed, but failing that, another possibility might be to make the default value of testing.parallel come from runtime.GOMAXPROCS after TestMain has run rather than before, allowing a test package to adjust the amount of test parallelism itself. Another idea (courtesy of @mvdan) is that the testing package could avoid starting new parallel tests when the number of goroutines grows large.
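For reference, the flag.Lookup workaround mentioned above might look roughly like the sketch below. The helper names capParallel and demoFlagSet are hypothetical, and the cap of 4 is an arbitrary illustration. A stand-in flag set is used so the example runs outside go test; in a real TestMain one would presumably pass flag.CommandLine after flag.Parse() and before m.Run().

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// capParallel lowers the named integer flag in fs to max when its current
// value is higher, and leaves it alone otherwise.
// (Hypothetical helper, not part of the testing or flag packages.)
func capParallel(fs *flag.FlagSet, name string, max int) error {
	f := fs.Lookup(name)
	if f == nil {
		return fmt.Errorf("flag %q is not registered", name)
	}
	cur, err := strconv.Atoi(f.Value.String())
	if err != nil {
		return fmt.Errorf("flag %q: %w", name, err)
	}
	if cur <= max {
		return nil
	}
	return fs.Set(name, strconv.Itoa(max))
}

// demoFlagSet builds a stand-in flag set with a "test.parallel" flag, which
// mimics the flag that the testing package registers under go test.
func demoFlagSet(initial int) *flag.FlagSet {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.Int("test.parallel", initial, "stand-in for the testing package's flag")
	return fs
}

func main() {
	fs := demoFlagSet(64)
	if err := capParallel(fs, "test.parallel", 4); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fs.Lookup("test.parallel").Value.String())
}
```

This keeps go test -race ./... working unchanged at the top level, at the cost of reaching into a flag that belongs to the testing package, which is why the issue text calls it a hack.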