runtime: lock ordering problem: gscan, profBlock #66004
@mknyszek, is this the same problem described in #64706 (comment)? That was tabled in December because (1) it was test-only and (2) it was too late in the release cycle to change Gscan users to also note their non-preemptibility in
I continue to see these on the dashboard at a low rate. Marking release-blocker. It sounds like we could remove runtime locks from the mutex profile at the very least.
Looks like more reports are accumulating on #58277.
Profiling of runtime-internal locks checks gp.m.locks to see if it's safe to add a new record to the profile, but direct use of acquireLockRank can change the list of the M's active lock ranks without updating gp.m.locks to match. The runtime's internal rwmutex implementation makes a point of calling acquirem/releasem when manipulating the lock rank list, but the other user of acquireLockRank (the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks. Codify the rwmutex approach by having acquireLockRank (now called acquireLockRankAndM) include a call to acquirem. Do the same for release. For golang#64706 For golang#66004
Change https://go.dev/cl/571056 mentions this issue:
Profiling of runtime-internal locks checks gp.m.locks to see if it's safe to add a new record to the profile, but direct use of acquireLockRank can change the list of the M's active lock ranks without updating gp.m.locks to match. The runtime's internal rwmutex implementation makes a point of calling acquirem/releasem when manipulating the lock rank list, but the other user of acquireLockRank (the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks. Codify the rwmutex approach by adding a variant of acquireLockRank, acquireLockRankAndM, that includes a call to acquirem. Do the same for release. Leave runtime/time.go's use of the old variants intact for the moment. For golang#64706 For golang#66004 Change-Id: Id18e4d8de1036de743d2937fad002c6feebe2faf
Is there any status update to report on this issue?
Profiling of runtime-internal locks checks gp.m.locks to see if it's safe to add a new record to the profile, but direct use of acquireLockRank can change the list of the M's active lock ranks without updating gp.m.locks to match. The runtime's internal rwmutex implementation makes a point of calling acquirem/releasem when manipulating the lock rank list, but the other user of acquireLockRank (the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks. Codify the rwmutex approach by renaming acquireLockRank to acquireLockRankAndM and having it include a call to acquirem. Do the same for release. For golang#64706 For golang#66004 Change-Id: Id18e4d8de1036de743d2937fad002c6feebe2faf
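The mismatch described in the CL can be modeled outside the runtime. This is a minimal sketch, not runtime code: the `m` type, `heldRanks`, and `safeToProfile` are hypothetical stand-ins, and it assumes the profiler treats `m.locks == 0` as "safe to record" (the real check in the runtime differs in detail). It shows why an M that holds the gscan rank through plain `acquireLockRank` looks idle to the profiler, and how pairing the rank update with an `acquirem`-style increment closes that gap.

```go
package main

import "fmt"

// lockRank stands in for the runtime's lock ranks. Only the ranks from this
// issue are modeled; the numeric values are illustrative.
type lockRank int

const (
	lockRankProfBlock lockRank = iota
	lockRankGscan
)

// m models just two pieces of per-M state (hypothetical): the locks counter
// (gp.m.locks) and the list of lock ranks the M currently holds.
type m struct {
	locks     int
	heldRanks []lockRank
}

// acquireLockRank records a rank without touching m.locks (the old behavior).
func (mp *m) acquireLockRank(r lockRank) {
	mp.heldRanks = append(mp.heldRanks, r)
}

// releaseLockRank drops the most recent entry for r.
func (mp *m) releaseLockRank(r lockRank) {
	for i := len(mp.heldRanks) - 1; i >= 0; i-- {
		if mp.heldRanks[i] == r {
			mp.heldRanks = append(mp.heldRanks[:i], mp.heldRanks[i+1:]...)
			return
		}
	}
}

// acquireLockRankAndM pairs the rank update with an acquirem-style
// increment, so m.locks reflects the held rank.
func (mp *m) acquireLockRankAndM(r lockRank) {
	mp.locks++ // models acquirem
	mp.acquireLockRank(r)
}

// releaseLockRankAndM undoes acquireLockRankAndM in reverse order.
func (mp *m) releaseLockRankAndM(r lockRank) {
	mp.releaseLockRank(r)
	mp.locks-- // models releasem
}

// safeToProfile is a simplified profiler check (assumption): it trusts
// m.locks and never looks at heldRanks.
func (mp *m) safeToProfile() bool {
	return mp.locks == 0
}

func main() {
	old := &m{}
	old.acquireLockRank(lockRankGscan)
	// Prints true: the profiler would go on to take profBlock while gscan is held.
	fmt.Println("old path safe to profile:", old.safeToProfile())

	fixed := &m{}
	fixed.acquireLockRankAndM(lockRankGscan)
	fmt.Println("fixed path safe to profile:", fixed.safeToProfile())
	fixed.releaseLockRankAndM(lockRankGscan)
	fmt.Println("after release:", fixed.safeToProfile())
}
```

The design choice mirrors the rwmutex approach from the commit message: rather than teaching the profiler to inspect the rank list, every rank change is made visible through the same counter the profiler already checks.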
I had to step away from https://go.dev/cl/571056 for a few weeks, but it's once again ready for review.
https://build.golang.org/log/b20065c3dd85e387e63949a5dcd73cf49f3ab7a5 shows a staticlockranking failure, reproduced below.
This looks real: gscan is held while profBlock is acquired. But the ranking order seems to say profBlock has to be less than STACKGROW, which has to be less than gscan, so profBlock must be acquired before gscan. Allowing the reverse would put a cycle in the order, so it does not appear that adjusting the policy will fix the problem.
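The cycle argument can be checked mechanically. This is a hedged sketch, not the runtime's lock-ranking code: it models only the two orderings cited above (profBlock < STACKGROW and STACKGROW < gscan) as a hypothetical `before` map, and `canAllow` is an illustrative helper. Permitting "acquire profBlock while holding gscan" would require gscan < profBlock, which is only consistent if profBlock does not already reach gscan.

```go
package main

import "fmt"

// before[a] lists ranks b with a < b, i.e. b may be acquired while a is held.
// Only the two edges cited in the comment are modeled (a simplification).
var before = map[string][]string{
	"profBlock": {"STACKGROW"},
	"STACKGROW": {"gscan"},
}

// reachable reports whether a chain from -> ... -> to exists in the order.
func reachable(order map[string][]string, from, to string) bool {
	seen := map[string]bool{}
	var visit func(string) bool
	visit = func(n string) bool {
		if n == to {
			return true
		}
		if seen[n] {
			return false
		}
		seen[n] = true
		for _, next := range order[n] {
			if visit(next) {
				return true
			}
		}
		return false
	}
	return visit(from)
}

// canAllow reports whether adding "held < next" keeps the order acyclic:
// it does unless next already precedes held.
func canAllow(order map[string][]string, held, next string) bool {
	return !reachable(order, next, held)
}

func main() {
	// Acquiring profBlock while holding gscan needs gscan < profBlock.
	fmt.Println(canAllow(before, "gscan", "profBlock")) // false
}
```

Because profBlock already reaches gscan through STACKGROW, adding the reverse edge closes a cycle, which is why the fix has to keep the profiler from taking profBlock while gscan is held rather than loosening the rank policy.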
/cc @aclements