runtime: staticlockranking builders failing on release branches on LUCI #64722
Comments
Change https://go.dev/cl/549536 mentions this issue:
@gopherbot please backport to 1.20 and 1.21. This test-only problem is causing failures on the LUCI release branches.
Backport issue(s) opened: #64760 (for 1.20), #64761 (for 1.21). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
Currently, lock ranking doesn't really try to model rwmutex. It records the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates write locks of all rwmutexes as rwmutexW, rather than allowing different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when there is contention in rlock. As a result, the reader lock path is almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the internal locks, which have their own ordering. The new init method is passed the ranks of the higher level lock that this represents, just like lockInit for mutex.

execW ordered before MALLOC captures the case from #64722. i.e., there can be allocation between BeforeFork and AfterFork.

For #64722.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
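The modeling described in this commit message can be illustrated with a toy rank checker. This is a minimal sketch, not the runtime's code: the `rank` type, the `after` table, the `checker` type, and the `unlisted` rank are all hypothetical names made up for illustration. Only the idea that `init` receives the higher-level read and write ranks, and that `execW` is ordered before malloc, comes from the text above.

```go
package main

import "fmt"

// rank is a toy stand-in for a runtime lock rank (hypothetical type).
type rank string

// after[a] lists the ranks that may be acquired while a is held.
// Only execW < malloc is taken from the commit message; the table
// itself is an illustrative assumption.
var after = map[rank][]rank{
	"execW": {"malloc"},
}

// checker tracks the ranks held by one hypothetical thread.
type checker struct{ held []rank }

// acquire records r, or reports a violation if some held rank does
// not list r as a permitted successor.
func (c *checker) acquire(r rank) error {
	for _, h := range c.held {
		ok := false
		for _, a := range after[h] {
			if a == r {
				ok = true
				break
			}
		}
		if !ok {
			return fmt.Errorf("lock ordering problem: %s held, acquiring %s", h, r)
		}
	}
	c.held = append(c.held, r)
	return nil
}

// rwmutex here models only the semantic ranks, not the real lock.
type rwmutex struct{ readRank, writeRank rank }

// init is passed the ranks of the higher-level lock this rwmutex represents.
func (rw *rwmutex) init(readRank, writeRank rank) {
	rw.readRank, rw.writeRank = readRank, writeRank
}

// rlock records readRank on every read lock, contended or not.
func (rw *rwmutex) rlock(c *checker) error { return c.acquire(rw.readRank) }

// lock records writeRank, which stays held until unlock.
func (rw *rwmutex) lock(c *checker) error { return c.acquire(rw.writeRank) }

func main() {
	var execLock rwmutex
	execLock.init("execR", "execW")

	c := &checker{}
	fmt.Println(execLock.lock(c))      // write lock taken, nothing held before it
	fmt.Println(c.acquire("malloc"))   // permitted: execW < malloc
	fmt.Println(c.acquire("unlisted")) // not in the table, so a violation is reported
}
```

Because each rwmutex carries its own ranks, two different rwmutexes can be given different positions in the ordering instead of sharing one rank for all write locks.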
Is this a regression, or has lock ranking never worked on the LUCI builders for the release branches?
It has never worked on the LUCI builders.
Change https://go.dev/cl/554976 mentions this issue:
Change https://go.dev/cl/554995 mentions this issue:
Change https://go.dev/cl/555055 mentions this issue:
CL 549536 intended to decouple the internal implementation of rwmutex from the semantic meaning of an rwmutex read/write lock in the static lock ranking.

Unfortunately, it was not thought through well enough. The internals were represented with the rwmutexR and rwmutexW lock ranks. The idea was that the internal lock ranks need not model the higher-level ordering, since those have separate rankings. That is incorrect; rwmutexW is held for the duration of a write lock, so it must be ranked before any lock taken while any write lock is held, which is precisely what we were trying to avoid.

This is visible in violations like:

0 : execW 11 0x0
1 : rwmutexW 51 0x111d9c8
2 : fin 30 0x111d3a0
fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across different types of rwmutex. Instead require users to define an additional "internal" lock rank to represent the implementation details of rwmutex.rLock. We can avoid an additional "internal" lock rank for rwmutex.wLock because the existing writeRank has the same semantics for semantic and internal locking. i.e., writeRank is held for the duration of a write lock, which is exactly how rwmutex.wLock is used, so we can use writeRank directly on wLock.

For #64722.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: Ia572de188a46ba8fe054ae28537648beaa16b12c
Reviewed-on: https://go-review.googlesource.com/c/go/+/555055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
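The violation and its fix can be reproduced in a toy model. This is a sketch under stated assumptions, not the runtime's implementation: the `after` table, the `checker` type, and the `lockShared` and `lockPerMutex` helpers are hypothetical. The rank names and the fact that execW < fin is modeled while rwmutexW < fin is not are taken from the commit message.

```go
package main

import "fmt"

// rank is a toy stand-in for a runtime lock rank (hypothetical type).
type rank string

// after[a] lists the ranks that may be acquired while a is held.
// execW < fin is modeled; rwmutexW has no edge to fin, as in the report.
var after = map[rank][]rank{
	"execW": {"rwmutexW", "fin"},
}

// checker tracks the ranks held by one hypothetical thread.
type checker struct{ held []rank }

// acquire records r, or reports a violation if some held rank does
// not list r as a permitted successor.
func (c *checker) acquire(r rank) error {
	for _, h := range c.held {
		ok := false
		for _, a := range after[h] {
			if a == r {
				ok = true
				break
			}
		}
		if !ok {
			return fmt.Errorf("lock ordering problem: %s held, acquiring %s", h, r)
		}
	}
	c.held = append(c.held, r)
	return nil
}

// lockShared mimics the first attempt: the semantic writeRank plus a
// shared internal rank rwmutexW, both held for the whole write lock.
func lockShared(c *checker, writeRank rank) error {
	if err := c.acquire(writeRank); err != nil {
		return err
	}
	return c.acquire("rwmutexW")
}

// lockPerMutex mimics the fix: wLock carries writeRank itself, so the
// write lock contributes exactly one rank.
func lockPerMutex(c *checker, writeRank rank) error {
	return c.acquire(writeRank)
}

func main() {
	old := &checker{}
	fmt.Println(lockShared(old, "execW"))
	fmt.Println(old.acquire("fin")) // violation: rwmutexW < fin is missing

	fixed := &checker{}
	fmt.Println(lockPerMutex(fixed, "execW"))
	fmt.Println(fixed.acquire("fin")) // permitted: execW < fin is modeled
}
```

With a shared internal rank, every ordering edge declared for a write rank would have to be repeated for rwmutexW, which is the coupling the first change was meant to remove.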
(This cherry-pick combines CL 549536 and the follow-up fix CL 555055.)

Currently, lock ranking doesn't really try to model rwmutex. It records the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates write locks of all rwmutexes as rwmutexW, rather than allowing different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when there is contention in rlock. As a result, the reader lock path is almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the internal locks, which have their own ordering. The new init method is passed the ranks of the higher level lock that this represents, just like lockInit for mutex.

execW ordered before MALLOC captures the case from #64722. i.e., there can be allocation between BeforeFork and AfterFork.

For #64722.
Fixes #64760.

------

runtime: replace rwmutexR/W with per-rwmutex lock rank

CL 549536 intended to decouple the internal implementation of rwmutex from the semantic meaning of an rwmutex read/write lock in the static lock ranking.

Unfortunately, it was not thought through well enough. The internals were represented with the rwmutexR and rwmutexW lock ranks. The idea was that the internal lock ranks need not model the higher-level ordering, since those have separate rankings. That is incorrect; rwmutexW is held for the duration of a write lock, so it must be ranked before any lock taken while any write lock is held, which is precisely what we were trying to avoid.

This is visible in violations like:

0 : execW 11 0x0
1 : rwmutexW 51 0x111d9c8
2 : fin 30 0x111d3a0
fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across different types of rwmutex. Instead require users to define an additional "internal" lock rank to represent the implementation details of rwmutex.rLock. We can avoid an additional "internal" lock rank for rwmutex.wLock because the existing writeRank has the same semantics for semantic and internal locking. i.e., writeRank is held for the duration of a write lock, which is exactly how rwmutex.wLock is used, so we can use writeRank directly on wLock.

For #64722.

Cq-Include-Trybots: luci.golang.try:go1.20-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 9b4b3e5)
(cherry picked from commit dcbe772)
Reviewed-on: https://go-review.googlesource.com/c/go/+/554995
(This cherry-pick combines CL 549536 and the follow-up fix CL 555055.)

Currently, lock ranking doesn't really try to model rwmutex. It records the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates write locks of all rwmutexes as rwmutexW, rather than allowing different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when there is contention in rlock. As a result, the reader lock path is almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the internal locks, which have their own ordering. The new init method is passed the ranks of the higher level lock that this represents, just like lockInit for mutex.

execW ordered before MALLOC captures the case from #64722. i.e., there can be allocation between BeforeFork and AfterFork.

For #64722.
Fixes #64761.

------

runtime: replace rwmutexR/W with per-rwmutex lock rank

CL 549536 intended to decouple the internal implementation of rwmutex from the semantic meaning of an rwmutex read/write lock in the static lock ranking.

Unfortunately, it was not thought through well enough. The internals were represented with the rwmutexR and rwmutexW lock ranks. The idea was that the internal lock ranks need not model the higher-level ordering, since those have separate rankings. That is incorrect; rwmutexW is held for the duration of a write lock, so it must be ranked before any lock taken while any write lock is held, which is precisely what we were trying to avoid.

This is visible in violations like:

0 : execW 11 0x0
1 : rwmutexW 51 0x111d9c8
2 : fin 30 0x111d3a0
fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across different types of rwmutex. Instead require users to define an additional "internal" lock rank to represent the implementation details of rwmutex.rLock. We can avoid an additional "internal" lock rank for rwmutex.wLock because the existing writeRank has the same semantics for semantic and internal locking. i.e., writeRank is held for the duration of a write lock, which is exactly how rwmutex.wLock is used, so we can use writeRank directly on wLock.

For #64722.

Cq-Include-Trybots: luci.golang.try:go1.21-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 9b4b3e5)
(cherry picked from commit dcbe772)
Reviewed-on: https://go-review.googlesource.com/c/go/+/554976
Example failure:
https://ci.chromium.org/ui/p/golang/builders/try/go1.21-linux-amd64-staticlockranking/b8762252922810888305/test-results?sortby=&groupby=
This specific ordering violation is not problematic, though it is unclear to me why this is only failing on LUCI, and even there only on the release branches. More importantly, digging into this reveals fundamental problems with the way we model rwmutex.

Every rwmutex is modeled the same. That is, they all use rwmutexR and rwmutexW even though they are semantically different locks. This is technically OK, but it reduces precision in the lock ranking and makes it more difficult to understand.

rwmutexR is not actually held across read locks; it is just an internal implementation detail held temporarily when there is contention. As a result the read lock rank is not consistently modeled since the lock is so rarely taken. We should have a rank that is always acquired on read lock.

cc @mknyszek
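The read-path problem can be seen in a toy model. This is only a sketch: the `checker` type, the `after` table, the helpers `rlockOld` and `rlockNew`, and the `late` rank are hypothetical names, and the real rwmutex does far more than this. It assumes only what the report states: the internal read rank is taken briefly and only under contention, whereas a dedicated read rank would be acquired on every read lock.

```go
package main

import "fmt"

// rank is a toy stand-in for a runtime lock rank (hypothetical type).
type rank string

// after[a] lists the ranks that may be acquired while a is held.
// The hypothetical rank "late" permits nothing after it.
var after = map[rank][]rank{}

// checker tracks the ranks held by one hypothetical thread.
type checker struct{ held []rank }

// acquire records r, or reports a violation if some held rank does
// not list r as a permitted successor.
func (c *checker) acquire(r rank) error {
	for _, h := range c.held {
		ok := false
		for _, a := range after[h] {
			if a == r {
				ok = true
				break
			}
		}
		if !ok {
			return fmt.Errorf("lock ordering problem: %s held, acquiring %s", h, r)
		}
	}
	c.held = append(c.held, r)
	return nil
}

// release drops the most recent occurrence of r from the held set.
func (c *checker) release(r rank) {
	for i := len(c.held) - 1; i >= 0; i-- {
		if c.held[i] == r {
			c.held = append(c.held[:i], c.held[i+1:]...)
			return
		}
	}
}

// rlockOld mimics the modeling criticized above: the internal rank is
// seen only under contention, and only temporarily.
func rlockOld(c *checker, contended bool) error {
	if !contended {
		return nil // fast path: the rank checker observes nothing
	}
	if err := c.acquire("rwmutexR"); err != nil {
		return err
	}
	c.release("rwmutexR") // not held across the read lock
	return nil
}

// rlockNew records a read rank on every read lock.
func rlockNew(c *checker, readRank rank) error { return c.acquire(readRank) }

func main() {
	// A thread already holding "late" takes a read lock.
	c := &checker{held: []rank{"late"}}
	fmt.Println("old, uncontended:", rlockOld(c, false)) // bad ordering goes unnoticed
	fmt.Println("old, contended:", rlockOld(c, true))    // caught only under contention
	fmt.Println("always recorded:", rlockNew(c, "execR")) // caught on every read lock
}
```

In the old model the same bad ordering is reported or missed depending on whether another thread happened to contend, which is why the read path was almost never checked in practice.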