Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/acme/autocert: renewal test failures due to connection errors on windows/arm64 #51080

Open
bcmills opened this issue Feb 8, 2022 · 5 comments
Labels
arch-arm64 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@bcmills
Copy link
Member

bcmills commented Feb 8, 2022

greplogs --dashboard -md -l -e FAIL\\s+golang\\.org/x/crypto/acme/autocert --since=2021-01-01 --omit=.\*-n2d

2022-02-08T08:41:09-20e1d8d-c856fbf/windows-arm64-10

panic: close of closed channel

goroutine 476 [running]:
runtime.Goexit()
	C:/workdir/go/src/runtime/panic.go:482 +0x190
testing.(*common).FailNow(0x4000234820)
	C:/workdir/go/src/testing/testing.go:864 +0x48
testing.(*common).Fatalf(0x4000234820, {0x7ff6a835e0c5?, 0x0?}, {0x400033be80?, 0xeda0b3426?, 0x0?})
	C:/workdir/go/src/testing/testing.go:948 +0x64
golang.org/x/crypto/acme/autocert.TestRenewFromCacheAlreadyRenewed.func2(0x2f3c74aa43a?, {0x7ff6a83f1b28?, 0x40001e1740?})
	C:/workdir/gopath/src/golang.org/x/crypto/acme/autocert/renewal_test.go:208 +0x634
golang.org/x/crypto/acme/autocert.(*domainRenewal).renew(0x400011a680)
	C:/workdir/gopath/src/golang.org/x/crypto/acme/autocert/renewal.go:71 +0x1e0
created by time.goFunc
	C:/workdir/go/src/time/sleep.go:176 +0x3c
FAIL	golang.org/x/crypto/acme/autocert	7.099s

2022-02-07T20:21:30-30dcbda-49030c8/windows-arm64-11

--- FAIL: TestRenewFromCache (2.35s)
    ca.go:253: GET /
    ca.go:253: POST /new-reg
    ca.go:253: POST /new-order
    ca.go:253: POST /authz/0
    ca.go:253: POST /challenge/tls-alpn-01/0
    ca.go:569: validated "tls-alpn-01" for "example.org", err: <nil>
    ca.go:570: authz 0 is now valid
    ca.go:599: order 0 is now ready
    ca.go:253: POST /authz/0
    ca.go:253: POST /orders/0
    ca.go:253: POST /new-cert/0
    renewal_test.go:70: testDidRenewLoop: Get "http://127.0.0.1:50039": dial tcp 127.0.0.1:50039: connectex: No connection could be made because the target machine actively refused it.
    renewal_test.go:77: testDidRenewLoop: next = 36m2.864378606s; want >= 2112h0m0s
    ca.go:253: HEAD /new-nonce
    renewal_test.go:87: cache leaf.NotAfter = 2022-02-08 04:25:30 +0000 UTC; want > 2022-05-06 21:24:30.835158 -0700 PDT m=+7603206.680959301
    renewal_test.go:102: state leaf.NotAfter = 2022-02-08 04:25:30 +0000 UTC; want > 2022-05-06 21:24:30.835158 -0700 PDT m=+7603206.680959301
FAIL
FAIL	golang.org/x/crypto/acme/autocert	9.467s

2022-02-07T18:43:23-30dcbda-fbcc30a/windows-arm64-10

--- FAIL: TestRenewFromCache (2.47s)
    ca.go:253: GET /
    renewal_test.go:70: testDidRenewLoop: Get "http://127.0.0.1:49913": dial tcp 127.0.0.1:49913: connectex: No connection could be made because the target machine actively refused it.
    renewal_test.go:77: testDidRenewLoop: next = 48m56.057939107s; want >= 2112h0m0s
    ca.go:253: POST /new-reg
    ca.go:253: POST /new-order
    ca.go:253: POST /authz/0
    renewal_test.go:87: cache leaf.NotAfter = 2022-02-08 02:12:57 +0000 UTC; want > 2022-05-06 19:11:57.5448975 -0700 PDT m=+7603206.549318601
    renewal_test.go:102: state leaf.NotAfter = 2022-02-08 02:12:57 +0000 UTC; want > 2022-05-06 19:11:57.5448975 -0700 PDT m=+7603206.549318601
    ca.go:253: POST /challenge/tls-alpn-01/0
    ca.go:569: validated "tls-alpn-01" for "example.org", err: <nil>
    ca.go:570: authz 0 is now valid
    ca.go:599: order 0 is now ready
FAIL
FAIL	golang.org/x/crypto/acme/autocert	9.367s

(attn @bradfitz @rolandshoemaker @FiloSottile per dev.golang.org/owners)

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 8, 2022
@gopherbot gopherbot added this to the Unreleased milestone Feb 8, 2022
@bcmills
Copy link
Member Author

bcmills commented Feb 9, 2022

Given the timing of these failures, I suspect that the connectex: No connection could be made failure mode may stem from CL 381715.

This is a release-blocker via #11811. (The tests can be fixed, or can have skips added. Given the recency of that CL, we might consider reverting it if that is reasonably straightforward.)

@bcmills bcmills modified the milestones: Unreleased, Go1.18 Feb 9, 2022
@gopherbot
Copy link

Change https://go.dev/cl/384594 mentions this issue: acme/autocert: fix races in renewal tests

gopherbot pushed a commit to golang/crypto that referenced this issue Feb 10, 2022
TestRenewFromCache and TestRenewFromCacheAlreadyRenewed had several
races and API misuses:

1. They called t.Fatalf from a goroutine other than the one invoking
   the Test function, which is explicitly disallowed (see
   https://pkg.go.dev/testing#T).

2. The test did not stop the renewal timers prior to restoring
   test-hook functions, and the process of stopping the renewal timers
   itself did not wait for in-flight calls to complete. That could
   cause data races if one of the renewals failed and triggered a
   retry with a short-enough randomized backoff.
   (One such race was observed in
   https://build.golang.org/log/1a19e22ad826bedeb5a939c6130f368f9979208a.)

3. The testDidRenewLoop hooks accessed the Manager.renewal field
   without locking the Mutex guarding that field.

4. TestGetCertificate_failedAttempt set a testDidRemoveState hook, but
   didn't wait for the timers referring to that hook to complete
   before restoring it, causing races with other timers. I tried
   pulling on that thread a bit, but couldn't untangle the numerous
   untracked goroutines in the package. Instead, I have made a smaller
   and more local change to copy the value of testDidRemoveState into
   a local variable in the timer's closure.

Given the number of untracked goroutines in this package, it is likely
that races and/or deadlocks remain. Notably, so far I have been unable
to spot the actual cause of golang/go#51080.

For golang/go#51080

Change-Id: I7797f6ac34ef3c272f16ca805251dac3aa7f0009
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/384594
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@bcmills
Copy link
Member Author

bcmills commented Feb 11, 2022

@gopherbot
Copy link

Change https://go.dev/cl/385675 mentions this issue: acme/autocert: skip renewal tests broken on windows/arm64

gopherbot pushed a commit to golang/crypto that referenced this issue Feb 14, 2022
For golang/go#51080

Change-Id: Icf4414ab58bdea44b793a66770b4c05f9faf5387
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/385675
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@bcmills bcmills modified the milestones: Go1.18, Backlog Feb 14, 2022
owenthereal pushed a commit to owenthereal/upterm.crypto that referenced this issue Mar 5, 2022
TestRenewFromCache and TestRenewFromCacheAlreadyRenewed had several
races and API misuses:

1. They called t.Fatalf from a goroutine other than the one invoking
   the Test function, which is explicitly disallowed (see
   https://pkg.go.dev/testing#T).

2. The test did not stop the renewal timers prior to restoring
   test-hook functions, and the process of stopping the renewal timers
   itself did not wait for in-flight calls to complete. That could
   cause data races if one of the renewals failed and triggered a
   retry with a short-enough randomized backoff.
   (One such race was observed in
   https://build.golang.org/log/1a19e22ad826bedeb5a939c6130f368f9979208a.)

3. The testDidRenewLoop hooks accessed the Manager.renewal field
   without locking the Mutex guarding that field.

4. TestGetCertificate_failedAttempt set a testDidRemoveState hook, but
   didn't wait for the timers referring to that hook to complete
   before restoring it, causing races with other timers. I tried
   pulling on that thread a bit, but couldn't untangle the numerous
   untracked goroutines in the package. Instead, I have made a smaller
   and more local change to copy the value of testDidRemoveState into
   a local variable in the timer's closure.

Given the number of untracked goroutines in this package, it is likely
that races and/or deadlocks remain. Notably, so far I have been unable
to spot the actual cause of golang/go#51080.

For golang/go#51080

Change-Id: I7797f6ac34ef3c272f16ca805251dac3aa7f0009
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/384594
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
owenthereal pushed a commit to owenthereal/upterm.crypto that referenced this issue Mar 5, 2022
For golang/go#51080

Change-Id: Icf4414ab58bdea44b793a66770b4c05f9faf5387
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/385675
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/433815 mentions this issue: acme/autocert: remove TestRenewFromCache skips

gopherbot pushed a commit to golang/crypto that referenced this issue Sep 26, 2022
Removes the skips from TestRenewFromCache and
TestRenewFromCacheAlreadyRenewed, which were added due to flakes which
may have been fixed by the renewal timer change.

Updates golang/go#51080

Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815
Run-TryBot: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
maisem pushed a commit to tailscale/golang-x-crypto that referenced this issue Nov 15, 2022
Removes the skips from TestRenewFromCache and
TestRenewFromCacheAlreadyRenewed, which were added due to flakes which
may have been fixed by the renewal timer change.

Updates golang/go#51080

Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815
Run-TryBot: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
maisem pushed a commit to tailscale/golang-x-crypto that referenced this issue Nov 15, 2022
Removes the skips from TestRenewFromCache and
TestRenewFromCacheAlreadyRenewed, which were added due to flakes which
may have been fixed by the renewal timer change.

Updates golang/go#51080

Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815
Run-TryBot: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
LewiGoddard pushed a commit to LewiGoddard/crypto that referenced this issue Feb 16, 2023
TestRenewFromCache and TestRenewFromCacheAlreadyRenewed had several
races and API misuses:

1. They called t.Fatalf from a goroutine other than the one invoking
   the Test function, which is explicitly disallowed (see
   https://pkg.go.dev/testing#T).

2. The test did not stop the renewal timers prior to restoring
   test-hook functions, and the process of stopping the renewal timers
   itself did not wait for in-flight calls to complete. That could
   cause data races if one of the renewals failed and triggered a
   retry with a short-enough randomized backoff.
   (One such race was observed in
   https://build.golang.org/log/1a19e22ad826bedeb5a939c6130f368f9979208a.)

3. The testDidRenewLoop hooks accessed the Manager.renewal field
   without locking the Mutex guarding that field.

4. TestGetCertificate_failedAttempt set a testDidRemoveState hook, but
   didn't wait for the timers referring to that hook to complete
   before restoring it, causing races with other timers. I tried
   pulling on that thread a bit, but couldn't untangle the numerous
   untracked goroutines in the package. Instead, I have made a smaller
   and more local change to copy the value of testDidRemoveState into
   a local variable in the timer's closure.

Given the number of untracked goroutines in this package, it is likely
that races and/or deadlocks remain. Notably, so far I have been unable
to spot the actual cause of golang/go#51080.

For golang/go#51080

Change-Id: I7797f6ac34ef3c272f16ca805251dac3aa7f0009
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/384594
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
LewiGoddard pushed a commit to LewiGoddard/crypto that referenced this issue Feb 16, 2023
For golang/go#51080

Change-Id: Icf4414ab58bdea44b793a66770b4c05f9faf5387
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/385675
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
LewiGoddard added a commit to LewiGoddard/crypto that referenced this issue Feb 16, 2023
Removes the skips from TestRenewFromCache and
TestRenewFromCacheAlreadyRenewed, which were added due to flakes which
may have been fixed by the renewal timer change.

Updates golang/go#51080

Change-Id: Ib953a24e610e89dfbbea450a4c257c105055ce7e
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/433815
Run-TryBot: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

2 participants