crypto/tls: panic in TransportTLS handshake in Windows crashes app with panic #21376
Comments
This looks like an exception being raised in the system function |
@bburket can you easily reproduce this? What versions of Windows do you run? Can you reproduce it on different Windows computer? Thank you. Alex |
Not easily reproducible, but it seems to happen somewhat frequently (I've seen it happen anywhere between 10 minutes and 4 hours of execution). This is on Windows 10: Microsoft Windows [Version 10.0.15063]. One interesting thing to note is that our corporate environment uses some hardware-based man-in-the-middle type SSL interception for all traffic (Palo Alto Networks hardware). I would not be surprised if this is related to that, but it's speculation at this point. However, it would make sense that the lib should return some error up the stack, rather than panicking a background goroutine. |
I was thinking about that possibility - that you might be running some non-standard software somewhere as part of your OS - as soon as I saw this report. Windows APIs don't raise exceptions.
Returning error messages is the norm. But you forget about the possibility of a bug in the software you are running. Maybe you could ask your admin - perhaps there are some logs somewhere on the system about what happened. It is also worth asking that lib's developers for help, if at all possible. You also did not say whether you could reproduce the problem on a different computer, one that is not affected by your "SSL interceptor". Alex |
@alexbrainman Go’s TLS package should not cause an access violation, ever, unless there is a data race. That error message means that Go passed garbage to a Windows API function, which should not occur. |
I've seen this issue raised somewhere in 2016 with no solution. The 3rd argument to the library function is a null pointer. If it is, it's not supposed to be since
|
I am not sure what you are trying to say here. And how is that helpful?
Again that could be one of many reasons, but it does not have to be the only one.
Good suggestion @as. But I don't see pPolicyPara is set to 0 on this stack trace. I think pPolicyPara is 0xc042b9f3d0 (3rd parameter) - pszPolicyOID is 0x4, pChainContext is 0x328e9d0 and pPolicyPara is 0xc042b9f3d0. Am I wrong, @as? Thank you. Alex |
@alexbrainman I think 0x4 may be the number of arguments to the call. The underlying function is syscall6 with four arguments. pszPolicyOID is Microsoft's Hungarian prefix notation (pointer to zero-terminated string), so pointing to 0x4 sounds like an access violation. Here was my train of thought on the whole thing; please let me know if I overlooked something.
|
There are two function calls listed on the stack trace.
0x4 is pszPolicyOID (we pass syscall.CERT_CHAIN_POLICY_SSL to this parameter and syscall.CERT_CHAIN_POLICY_SSL is indeed equal to 4);
0x7ffb1ef3ca40 is trap (it is pointer to CertVerifyCertificateChainPolicy); I still don't see any problem. Sorry. Alex |
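The disagreement about what `0x4` means can be checked by lining up the words printed for `syscall.Syscall6` with its Windows signature, `Syscall6(trap, nargs, a1, a2, ...)`. The sketch below is only an illustration: `words`, `names`, and `label` are hypothetical helpers, the values are copied from the reported trace, and the parameter names are those of the Windows `CertVerifyCertificateChainPolicy` function. Under that reading `0x4` appears twice, once as the argument count and once as `pszPolicyOID`.

```go
package main

import "fmt"

// words are the first six values printed for syscall.Syscall6 in the
// reported trace (copied verbatim; uint64 so this also builds on 32-bit).
var words = []uint64{0x7ffb1ef3ca40, 0x4, 0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0}

// names follows Syscall6(trap, nargs, a1, a2, a3, a4): the function
// address, the argument count, then the four arguments of
// CertVerifyCertificateChainPolicy in order.
var names = []string{"trap", "nargs", "pszPolicyOID", "pChainContext", "pPolicyPara", "pPolicyStatus"}

// label pairs each traced word with the name of the slot it occupies.
func label(ws []uint64) map[string]uint64 {
	out := make(map[string]uint64, len(names))
	for i, n := range names {
		if i < len(ws) {
			out[n] = ws[i]
		}
	}
	return out
}

func main() {
	m := label(words)
	for _, n := range names {
		fmt.Printf("%-14s %#x\n", n, m[n])
	}
}
```

Read this way, the third API parameter (`pPolicyPara`) is `0xc042b9f3d0`, not a null pointer.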
@alexbrainman I've been running a few Windows machines trying to reproduce this issue in the background. After around 200 hours Windows 7, 8, 2008R2, 2012R2 and 10 can't hit it dialing out to an internal web server. |
I ship a golang binary to customers. One of my customers has received an identical panic:
Running on windows (bitness unknown), binary compiled with go1.8.3. |
Here is the full panic of my client application, including other goroutines (redacting only my internal pure-go no-unsafe application code). Some of them are in CGO syscalls, some in other net/http functions, some in other Windows IOCP routines. I don't know if this information is material, but I researched and found the "Exception 0xc0000005" issue also reported in some other threads. For instance, #9356 was found to be related to the linker. My binary was compiled with x86_64-w64-mingw32-gcc 5.4.0. The client application had previously made a successful SSL connection to the same server during its same lifespan (but not necessarily with the same http.Client object). The server it is connecting to is also written in Go, and is using a Let's Encrypt certificate (if that tells you anything about the intermediaries, algorithm etc). I do not know if any SSL interception proxy exists in the middle. The stack trace I have is dated 2017-08-26. If it was caused by a third-party program on the machine (e.g. AV / firewall / web-safety program) then I suppose it might resolve itself if the third-party program is updated. |
@aclements is it possible that the compiler is getting smarter about keeping variables on the stack? The syscall.CertVerifyCertificateChainPolicy call, I suspect, sometimes takes a long time, because it needs to verify certificates and those could be stored on disk, and maybe even downloaded from another computer. Perhaps the Go garbage collector moves the stack of that goroutine while it waits in syscall.CertVerifyCertificateChainPolicy. The syscall.CertVerifyCertificateChainPolicy parameters 2, 3 and 4 are all pointers to Go memory. I run:
and I can see
Does that mean any of syscall.CertVerifyCertificateChainPolicy parameters are kept on stack? Alex |
More identical issue reports and stack traces: |
These eventually lead up to some Docker bug where the cause was confirmed as some kind of "WebCompanion" software running on the user's machine. |
It does. In fact, the However, I think this is okay. |
No, Windows itself does not call into Go. Thank you for explaining. Alex |
I've run into this too (1.9.2 on Windows 10 amd64). Application was making a lot of HTTPS requests over the course of ~4-5 hours when the crash happened. Stack trace below in case it helps.
|
Experienced this again today on go1.9.2, same stack trace. There's a few more folks at git-lfs/git-lfs#1786 with the same problem.
No, I think that's different. That's the SetFileCompletionNotificationModes panic (a.k.a. #22149) which is fixed since go1.9.2, but, this problem is still occurring. (That also leads down a rabbithole with 0x20000 flags to identify your non-IFS LSPs/BSPs, but, I have seen this issue occur on a machine with no non-IFS LSPs/BSPs.) Is there any reason to think the problems are related? |
func checkChainSSLServerPolicy(c *Certificate, chainCtx *syscall.CertChainContext, opts *VerifyOptions) error {
    servernamep, err := syscall.UTF16PtrFromString(opts.DNSName)
    if err != nil {
        return err
    }
    sslPara := &syscall.SSLExtraCertChainPolicyPara{
        AuthType:   syscall.AUTHTYPE_SERVER,
        ServerName: servernamep,
    }
    sslPara.Size = uint32(unsafe.Sizeof(*sslPara))
    para := &syscall.CertChainPolicyPara{
        ExtraPolicyPara: uintptr(unsafe.Pointer(sslPara)),
    }
    para.Size = uint32(unsafe.Sizeof(*para))
    status := syscall.CertChainPolicyStatus{}
    err = syscall.CertVerifyCertificateChainPolicy(syscall.CERT_CHAIN_POLICY_SSL, chainCtx, para, &status)
What if GC happened before |
@zhangyoufu, I think you're on to something. Above, I'd only considered the direct arguments to
@alexbrainman, @mkrautz, there seem to be several uintptr-typed fields in |
That would be a problem, but as far as I can tell it doesn't apply to any of the |
Change https://golang.org/cl/106275 mentions this issue: |
I don't see any in syscall and internal/syscall/windows/*
I hope I interpreted your idea correctly https://go-review.googlesource.com/#/c/go/+/106275
I am fine with fixing the syscall package. But we could also copy and adjust the syscall code somewhere else - this would leave syscall users with broken code. Whatever we decide, we should also fix the golang.org/x/sys/windows package.
That is what I see too. Alex |
Is this point-release material? From a glance, it seems any Windows Go program using TLS is vulnerable to rare but unavoidable crashes. |
@eliasnaur, yes, we should fix this in a point release. Thanks for the reminder. We can't port CL 106275 to a point release because it breaks a public API, but I can think of a few ways to fix this without breaking the (admittedly broken) API:
I'd be inclined to go with solution 1. Solution 3 is sort of intriguing in general. While we're breaking things, we could even move all of the crypto APIs from |
Option 4: fix the broken API, but use compiler magic to hide the change
from user programs.
…On Wed, Apr 18, 2018, 12:23 PM Austin Clements ***@***.***> wrote:
@eliasnaur <https://github.com/eliasnaur>, yes, we should fix this in a
point release. Thanks for the reminder.
We can't port CL 106275 to a point release because it breaks a public API,
but I can think of a few ways to fix this without breaking the (admittedly
broken) API:
1. Copy the definition of syscall.CertChainPolicyPara into crypto/x509
with the corrected field type, use that in checkChainSSLServerPolicy,
and unsafe cast it for the call to
syscall.CertVerifyCertificateChainPolicy.
2. Force the syscall.CertChainPolicyPara to the heap and
runtime.KeepAlive it across the syscall.
3. Copy the corrected APIs into internal/syscall/windows and modify
crypto/x509 to use that.
I'd be inclined to go with solution 1.
Solution 3 is sort of intriguing in general. While we're breaking things,
we could even *move* all of the crypto APIs from syscall to
internal/syscall/windows and tell people to use x/sys/windows instead
(which we also need to fix).
|
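Option 2 from the list above (force the referenced object to the heap and keep it alive across the call) might look roughly like the sketch below. Everything here is a stand-in: `extra`, `para`, `fakeAPI`, `heapSink`, and `verify` are hypothetical names, not the real syscall types, and `fakeAPI` only inspects the struct instead of calling into Windows. It relies on the current gc runtime not moving heap objects, whereas goroutine stacks can be moved.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// extra stands in for the structure that is referenced only through a uintptr.
type extra struct{ AuthType uint32 }

// para stands in for a struct whose pointer field is typed uintptr, so the
// garbage collector does not see it as a reference.
type para struct {
	Size  uint32
	Extra uintptr
}

// heapSink exists only to make values escape to the heap. Assumption: this
// sketch is not safe for concurrent callers; a real fix would need another
// way to force the escape.
var heapSink *extra

// fakeAPI stands in for the system call; it only checks that the fields
// were filled in.
func fakeAPI(p *para) bool {
	return p.Extra != 0 && p.Size == uint32(unsafe.Sizeof(*p))
}

// verify builds the parameters, calls the API, and keeps the hidden object
// alive until the call has returned.
func verify(authType uint32) bool {
	e := &extra{AuthType: authType}
	heapSink = e // storing in a global makes e escape to the heap
	p := &para{Extra: uintptr(unsafe.Pointer(e))}
	p.Size = uint32(unsafe.Sizeof(*p))
	ok := fakeAPI(p)
	runtime.KeepAlive(e) // e must stay reachable until after the call
	heapSink = nil
	return ok
}

func main() {
	fmt.Println(verify(1))
}
```

The design cost is visible even in the sketch: correctness depends on an escape-analysis trick and on remembering the `runtime.KeepAlive` call at every call site, which may be one reason the thread leans toward fixing the field type instead.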
@gopherbot please file this for backport. See @aclements's comment #21376 (comment) about having to develop a different fix from CL 106275 in order not to break a public API. |
Backport issue(s) opened: #25033 (for 1.10), #25034 (for 1.9). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Change https://golang.org/cl/111715 mentions this issue: |
…tChainPolicyPara As discussed in issue #21376, it is unsafe to have syscall.CertChainPolicyPara.ExtraPolicyPara uintptr - it has to be a pointer type. So copy syscall.CertChainPolicyPara into crypto/tls package, make ExtraPolicyPara unsafe.Pointer, and use new struct instead of syscall.CertChainPolicyPara. Fixes #25033 Change-Id: If914af056cbbb0c4d93ffaa915b3d2cb5ecad0cd Reviewed-on: https://go-review.googlesource.com/111715 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>
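The approach in this commit message (a local copy of the struct with the field typed as `unsafe.Pointer`, then an unsafe cast for the old API) can be sketched as follows. The types `legacyPara` and `fixedPara` and the helpers `sameLayout` and `asLegacy` are hypothetical stand-ins written for this illustration, not the code from the CL; the field names simply echo the ones mentioned in the thread.

```go
package main

import (
	"fmt"
	"unsafe"
)

// legacyPara mirrors the problematic shape: the extra parameter is a
// uintptr, which the garbage collector treats as a plain integer.
type legacyPara struct {
	Size            uint32
	Flags           uint32
	ExtraPolicyPara uintptr
}

// fixedPara is the local copy with the field typed as unsafe.Pointer, so
// the referenced object stays reachable while fixedPara is.
type fixedPara struct {
	Size            uint32
	Flags           uint32
	ExtraPolicyPara unsafe.Pointer
}

// sameLayout reports whether the two structs have the same size and the
// same offset for the pointer field, which the cast below relies on.
func sameLayout() bool {
	var l legacyPara
	var f fixedPara
	return unsafe.Sizeof(l) == unsafe.Sizeof(f) &&
		unsafe.Offsetof(l.ExtraPolicyPara) == unsafe.Offsetof(f.ExtraPolicyPara)
}

// asLegacy reinterprets a *fixedPara as a *legacyPara for an API that only
// accepts the old type.
func asLegacy(p *fixedPara) *legacyPara {
	return (*legacyPara)(unsafe.Pointer(p))
}

func main() {
	extra := new([4]uint64) // stand-in for the extra policy structure
	p := &fixedPara{ExtraPolicyPara: unsafe.Pointer(extra)}
	p.Size = uint32(unsafe.Sizeof(*p))
	old := asLegacy(p)
	fmt.Println(sameLayout(), old.Size == p.Size)
}
```

Because only the static type of the field changes, the bytes handed to the system call are the same; what changes is that the Go runtime can now see the reference.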
Change https://golang.org/cl/112095 mentions this issue: |
Change https://golang.org/cl/112179 mentions this issue: |
…tChainPolicyPara As discussed in issue #21376, it is unsafe to have syscall.CertChainPolicyPara.ExtraPolicyPara uintptr - it has to be a pointer type. So copy syscall.CertChainPolicyPara into crypto/tls package, make ExtraPolicyPara unsafe.Pointer, and use new struct instead of syscall.CertChainPolicyPara. Fixes #25034 Change-Id: If914af056cbbb0c4d93ffaa915b3d2cb5ecad0cd Reviewed-on: https://go-review.googlesource.com/111715 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> Reviewed-on: https://go-review.googlesource.com/112179 Reviewed-by: Filippo Valsorda <filippo@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
@aclements unfortunately yes. As time permits, can you please take a look at the google/certificate-transparency-go#284 issue (and the proposed fix at google/certificate-transparency-go#285) and let us know if this could maybe be solved in some better way? |
go version go1.8.1 windows/amd64
C:\Users\someuser>go env
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\users\someuser\code\go
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set PKG_CONFIG=pkg-config
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
I am running an application that makes multiple HTTPS calls to different endpoints (it does not listen to any incoming connections). At some point (after 309 minutes in this case) the application will eventually panic. It appears that this occurs because of this chunk of code in transport.go:
Since this is a goroutine, I do not have the opportunity to recover from it, so I'm not sure what I can do about this. The tlsConn.Handshake() call is what eventually raises the panic.
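As background to the point about recovery: in Go, `recover` only takes effect in a deferred function running on the goroutine that panicked, so a caller cannot catch a panic raised in a goroutine started by library code. The sketch below shows the general pattern with a hypothetical helper, `safeGo`; it is not something that can be applied to the goroutine inside net/http, and a report that begins with `Exception 0xc0000005` rather than `panic:` may not be a recoverable Go panic in the first place.

```go
package main

import (
	"errors"
	"fmt"
)

// safeGo runs fn in a new goroutine and reports a panic as an error on the
// returned channel. The deferred recover must live inside that goroutine.
func safeGo(fn func()) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

func main() {
	err := <-safeGo(func() { panic(errors.New("handshake failed")) })
	fmt.Println(err)
}
```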
Here is the panic text (note that this is only the first panic. There were some 200 go routines running so I have not provided the dump of them all):
Exception 0xc0000005 0x0 0x609 0x7ffb1ef5c61f
PC=0x7ffb1ef5c61f
syscall.Syscall6(0x7ffb1ef3ca40, 0x4, 0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0x0, 0xc04257cc80, 0x26c8f20, ...)
C:/Go/src/runtime/syscall_windows.go:174 +0x6b
syscall.CertVerifyCertificateChainPolicy(0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0xc042e633d8)
C:/Go/src/syscall/zsyscall_windows.go:1208 +0xc1
crypto/x509.checkChainSSLServerPolicy(0xc04211fb00, 0x328e9d0, 0xc042b9f808, 0x34d5110, 0xc042e63548)
C:/Go/src/crypto/x509/root_windows.go:117 +0xfd
crypto/x509.(*Certificate).systemVerify(0xc04211fb00, 0xc042e63808, 0x0, 0x0, 0x0, 0x0, 0x0)
C:/Go/src/crypto/x509/root_windows.go:212 +0x484
crypto/x509.(*Certificate).Verify(0xc04211fb00, 0xc0421dc4c0, 0x1b, 0xc042345350, 0x0, 0xed11c8f28, 0x2622224, 0x955620, 0x0, 0x0, ...)
C:/Go/src/crypto/x509/verify.go:279 +0x86c
crypto/tls.(*clientHandshakeState).doFullHandshake(0xc042b9fe50, 0xc0424aa700, 0x66)
C:/Go/src/crypto/tls/handshake_client.go:300 +0x4c0
crypto/tls.(*Conn).clientHandshake(0xc0424ae380, 0x7b2d40, 0xc0424ae4a0)
C:/Go/src/crypto/tls/handshake_client.go:228 +0xf97
crypto/tls.(*Conn).Handshake(0xc0424ae380, 0x0, 0x0)
C:/Go/src/crypto/tls/conn.go:1307 +0x1aa
net/http.(*Transport).dialConn.func3(0x0, 0xc0424ae380, 0xc042feecc0, 0xc042f8d260)
C:/Go/src/net/http/transport.go:1082 +0x49
created by net/http.(*Transport).dialConn
C:/Go/src/net/http/transport.go:1087 +0xff3
Edit: I should also mention that I was running at ~200 calls / second at the time of failure