New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/crypto/acme/autocert: renewal test failures due to connection errors on windows/arm64 #51080
Comments
Given the timing of these failures, I suspect that the This is a release-blocker via #11811. (The tests can be fixed, or can have skips added. Given the recency of that CL, we might consider reverting it if that is reasonably straightforward.) |
Change https://go.dev/cl/384594 mentions this issue: |
TestRenewFromCache and TestRenewFromCacheAlreadyRenewed had several races and API misuses: 1. They called t.Fatalf from a goroutine other than the one invoking the Test function, which is explicitly disallowed (see https://pkg.go.dev/testing#T). 2. The test did not stop the renewal timers prior to restoring test-hook functions, and the process of stopping the renewal timers itself did not wait for in-flight calls to complete. That could cause data races if one of the renewals failed and triggered a retry with a short-enough randomized backoff. (One such race was observed in https://build.golang.org/log/1a19e22ad826bedeb5a939c6130f368f9979208a.) 3. The testDidRenewLoop hooks accessed the Manager.renewal field without locking the Mutex guarding that field. 4. TestGetCertificate_failedAttempt set a testDidRemoveState hook, but didn't wait for the timers referring to that hook to complete before restoring it, causing races with other timers. I tried pulling on that thread a bit, but couldn't untangle the numerous untracked goroutines in the package. Instead, I have made a smaller and more local change to copy the value of testDidRemoveState into a local variable in the timer's closure. Given the number of untracked goroutines in this package, it is likely that races and/or deadlocks remain. Notably, so far I have been unable to spot the actual cause of golang/go#51080. For golang/go#51080 Change-Id: I7797f6ac34ef3c272f16ca805251dac3aa7f0009 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/384594 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
Still failing (as expected) after the above CL, with the same failure modes (plus once instance of #51019).
2022-02-11T04:50:35-f4118a5-9fdcfb7/windows-arm64-11 |
Change https://go.dev/cl/385675 mentions this issue: |
For golang/go#51080 Change-Id: Icf4414ab58bdea44b793a66770b4c05f9faf5387 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/385675 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Jeremy Faller <jeremy@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
TestRenewFromCache and TestRenewFromCacheAlreadyRenewed had several races and API misuses: 1. They called t.Fatalf from a goroutine other than the one invoking the Test function, which is explicitly disallowed (see https://pkg.go.dev/testing#T). 2. The test did not stop the renewal timers prior to restoring test-hook functions, and the process of stopping the renewal timers itself did not wait for in-flight calls to complete. That could cause data races if one of the renewals failed and triggered a retry with a short-enough randomized backoff. (One such race was observed in https://build.golang.org/log/1a19e22ad826bedeb5a939c6130f368f9979208a.) 3. The testDidRenewLoop hooks accessed the Manager.renewal field without locking the Mutex guarding that field. 4. TestGetCertificate_failedAttempt set a testDidRemoveState hook, but didn't wait for the timers referring to that hook to complete before restoring it, causing races with other timers. I tried pulling on that thread a bit, but couldn't untangle the numerous untracked goroutines in the package. Instead, I have made a smaller and more local change to copy the value of testDidRemoveState into a local variable in the timer's closure. Given the number of untracked goroutines in this package, it is likely that races and/or deadlocks remain. Notably, so far I have been unable to spot the actual cause of golang/go#51080. For golang/go#51080 Change-Id: I7797f6ac34ef3c272f16ca805251dac3aa7f0009 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/384594 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
For golang/go#51080 Change-Id: Icf4414ab58bdea44b793a66770b4c05f9faf5387 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/385675 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Jeremy Faller <jeremy@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
Change https://go.dev/cl/433815 mentions this issue: |
Removes the skips from TestRenewFromCache and TestRenewFromCacheAlreadyRenewed, which were added due to flakes which may have been fixed by the renewal timer change. Updates golang/go#51080 Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815 Run-TryBot: Roland Shoemaker <roland@golang.org> Auto-Submit: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com>
Removes the skips from TestRenewFromCache and TestRenewFromCacheAlreadyRenewed, which were added due to flakes which may have been fixed by the renewal timer change. Updates golang/go#51080 Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815 Run-TryBot: Roland Shoemaker <roland@golang.org> Auto-Submit: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com>
Removes the skips from TestRenewFromCache and TestRenewFromCacheAlreadyRenewed, which were added due to flakes which may have been fixed by the renewal timer change. Updates golang/go#51080 Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815 Run-TryBot: Roland Shoemaker <roland@golang.org> Auto-Submit: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com>
TestRenewFromCache and TestRenewFromCacheAlreadyRenewed had several races and API misuses: 1. They called t.Fatalf from a goroutine other than the one invoking the Test function, which is explicitly disallowed (see https://pkg.go.dev/testing#T). 2. The test did not stop the renewal timers prior to restoring test-hook functions, and the process of stopping the renewal timers itself did not wait for in-flight calls to complete. That could cause data races if one of the renewals failed and triggered a retry with a short-enough randomized backoff. (One such race was observed in https://build.golang.org/log/1a19e22ad826bedeb5a939c6130f368f9979208a.) 3. The testDidRenewLoop hooks accessed the Manager.renewal field without locking the Mutex guarding that field. 4. TestGetCertificate_failedAttempt set a testDidRemoveState hook, but didn't wait for the timers referring to that hook to complete before restoring it, causing races with other timers. I tried pulling on that thread a bit, but couldn't untangle the numerous untracked goroutines in the package. Instead, I have made a smaller and more local change to copy the value of testDidRemoveState into a local variable in the timer's closure. Given the number of untracked goroutines in this package, it is likely that races and/or deadlocks remain. Notably, so far I have been unable to spot the actual cause of golang/go#51080. For golang/go#51080 Change-Id: I7797f6ac34ef3c272f16ca805251dac3aa7f0009 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/384594 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
For golang/go#51080 Change-Id: Icf4414ab58bdea44b793a66770b4c05f9faf5387 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/385675 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Jeremy Faller <jeremy@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
Removes the skips from TestRenewFromCache and TestRenewFromCacheAlreadyRenewed, which were added due to flakes which may have been fixed by the renewal timer change. Updates golang/go#51080 Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815 Run-TryBot: Roland Shoemaker <roland@golang.org> Auto-Submit: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com>
greplogs --dashboard -md -l -e FAIL\\s+golang\\.org/x/crypto/acme/autocert --since=2021-01-01 --omit=.\*-n2d
2022-02-08T08:41:09-20e1d8d-c856fbf/windows-arm64-10
2022-02-07T20:21:30-30dcbda-49030c8/windows-arm64-11
2022-02-07T18:43:23-30dcbda-fbcc30a/windows-arm64-10
(attn @bradfitz @rolandshoemaker @FiloSottile per
dev.golang.org/owners
)The text was updated successfully, but these errors were encountered: