Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: panic in TransportTLS handshake in Windows crashes app with panic #21376

Closed
bburket opened this issue Aug 9, 2017 · 39 comments
Closed

Comments

@bburket
Copy link

@bburket bburket commented Aug 9, 2017

go version go1.8.1 windows/amd64

C:\Users\someuser>go env
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\users\someuser\code\go
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set PKG_CONFIG=pkg-config
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2

I am running an application that makes multiple HTTPS calls to different endpoints (it does not listen to any incoming connections). At some point (after 309 minutes in this case) the application will eventually panic . It appears that this occurs because of this chunk of code in transport.go:

	go func() {
		if trace != nil && trace.TLSHandshakeStart != nil {
			trace.TLSHandshakeStart()
		}
		err := tlsConn.Handshake()
		if timer != nil {
			timer.Stop()
		}
		errc <- err
	}()

Since this is a go routine I do not have the opportunity to recover from I'm not sure what I can do about this. tlsConn.Handshake() call is what eventually raises the panic.

Here is the panic text (note that this is only the first panic. There were some 200 go routines running so I have not provided the dump of them all):

Exception 0xc0000005 0x0 0x609 0x7ffb1ef5c61f
PC=0x7ffb1ef5c61f

syscall.Syscall6(0x7ffb1ef3ca40, 0x4, 0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0x0, 0xc04257cc80, 0x26c8f20, ...)
C:/Go/src/runtime/syscall_windows.go:174 +0x6b
syscall.CertVerifyCertificateChainPolicy(0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0xc042e633d8)
C:/Go/src/syscall/zsyscall_windows.go:1208 +0xc1
crypto/x509.checkChainSSLServerPolicy(0xc04211fb00, 0x328e9d0, 0xc042b9f808, 0x34d5110, 0xc042e63548)
C:/Go/src/crypto/x509/root_windows.go:117 +0xfd
crypto/x509.(*Certificate).systemVerify(0xc04211fb00, 0xc042e63808, 0x0, 0x0, 0x0, 0x0, 0x0)
C:/Go/src/crypto/x509/root_windows.go:212 +0x484
crypto/x509.(*Certificate).Verify(0xc04211fb00, 0xc0421dc4c0, 0x1b, 0xc042345350, 0x0, 0xed11c8f28, 0x2622224, 0x955620, 0x0, 0x0, ...)
C:/Go/src/crypto/x509/verify.go:279 +0x86c
crypto/tls.(*clientHandshakeState).doFullHandshake(0xc042b9fe50, 0xc0424aa700, 0x66)
C:/Go/src/crypto/tls/handshake_client.go:300 +0x4c0
crypto/tls.(*Conn).clientHandshake(0xc0424ae380, 0x7b2d40, 0xc0424ae4a0)
C:/Go/src/crypto/tls/handshake_client.go:228 +0xf97
crypto/tls.(*Conn).Handshake(0xc0424ae380, 0x0, 0x0)
C:/Go/src/crypto/tls/conn.go:1307 +0x1aa
net/http.(*Transport).dialConn.func3(0x0, 0xc0424ae380, 0xc042feecc0, 0xc042f8d260)
C:/Go/src/net/http/transport.go:1082 +0x49
created by net/http.(*Transport).dialConn
C:/Go/src/net/http/transport.go:1087 +0xff3

Edit: I should also mention that I was running at ~200 calls / second at the time of failure

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 9, 2017

This looks like an exception being raised in the system function CertVerifyCertificateChainPolicy, not in Go code.

@ianlancetaylor ianlancetaylor changed the title Panic in TransportTLS handshake in Windows crashes app with panic crypto/tls: panic in TransportTLS handshake in Windows crashes app with panic Aug 9, 2017
@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Aug 9, 2017
@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Aug 11, 2017

@bburket can you easily reproduce this? What versions of Windows do you run? Can you reproduce it on different Windows computer?

Thank you.

Alex

@bburket

This comment has been minimized.

Copy link
Author

@bburket bburket commented Aug 11, 2017

Not easily reproducible - but seems to happen somewhat frequently (I've seen it happen anywhere between 10 minutes and 4 hours of execution).

This is on Windows 10,. Microsoft Windows [Version 10.0.15063]

One interesting thing to note is that our corporate environment uses some hardware-based main-in-the-middle type SSL interception for all traffic (Palo Alto Networks hardware). Would not be surprised if this is related to that - but its speculation at this point.

However it would make sense that the lib should return some error up the stack, rather than panicking a background goroutine

@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Aug 12, 2017

One interesting thing to note is that our corporate environment uses some hardware-based main-in-the-middle type SSL interception for all traffic (Palo Alto Networks hardware). Would not be surprised if this is related to that - but its speculation at this point.

I was thinking about that possibility - that you might be running some non-standard software somewhere as part of your OS - as soon as I saw this report. Windows APIs don't raise exceptions.

However it would make sense that the lib should return some error up the stack, rather than panicking a background goroutine

Returning error messages is the norm. But you forget about possibility of a bug in the software you are running. Maybe you could ask your admin - perhaps there are some logs about what happened to you somewhere on the system. It is also worth asking for help from that lib developers, if it is at all possible.

You also did not say if you could reproduce the problem on different computer - computer that is not affected by your "SSL interceptor".

Alex

@DemiMarie

This comment has been minimized.

Copy link

@DemiMarie DemiMarie commented Aug 12, 2017

@alexbrainman Go’s TLS package should not cause an access violation, ever, unless there is a data race. That error message means that Go passed garbage to a Windows API function, which should not occur.

@as

This comment has been minimized.

Copy link
Contributor

@as as commented Aug 13, 2017

I've seen this issue raised somewhere in 2016 with no solution. The 3rd argument to the library function is a null pointer. If it is, it's not supposed to be since pPolicyPara is not optional.

syscall.CertVerifyCertificateChainPolicy(0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0xc042e633d8)

BOOL WINAPI CertVerifyCertificateChainPolicy(
  _In_    LPCSTR                    pszPolicyOID,
  _In_    PCCERT_CHAIN_CONTEXT      pChainContext,
  _In_    PCERT_CHAIN_POLICY_PARA   pPolicyPara,
  _Inout_ PCERT_CHAIN_POLICY_STATUS pPolicyStatus
);
@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Aug 21, 2017

@alexbrainman Go’s TLS package should not cause an access violation, ever, unless there is a data race.

I am not sure what are you trying to say here. And how is that helpful?
Yes, data race could be the reason for this crash. But there might be other reasons. For example, it could be buggy software installed on @bburket computer.

That error message means that Go passed garbage to a Windows API function, which should not occur.

Again that could be one of many reasons, but it does not have to be the only one.

I've seen this issue raised somewhere in 2016 with no solution. The 3rd argument to the library function is a null pointer. If it is, it's not supposed to be since pPolicyPara is not optional.

syscall.CertVerifyCertificateChainPolicy(0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0xc042e633d8)

Good suggestion @as. But I don't see pPolicyPara is set to 0 on this stack trace. I think pPolicyPara is 0xc042b9f3d0 (3rd parameter) - pszPolicyOID is 0x4, pChainContext is 0x328e9d0 and pPolicyPara is 0xc042b9f3d0. Am I wrong, @as?

Thank you.

Alex

@as

This comment has been minimized.

Copy link
Contributor

@as as commented Aug 23, 2017

@alexbrainman I think 0x4 may be the number of arguments to the call. The underlying function is syscall6 with four arguments. pszPolicyOID is Microsoft's hungarianized prefix notation (pointer to string zero terminated), so pointing to 0x4 sounds like an access violation. Here's was my train of thought on the whole thing, please let me know if I overlooked something.

//
// Call site
//

err = syscall.CertVerifyCertificateChainPolicy(syscall.CERT_CHAIN_POLICY_SSL, chainCtx, para, &status)

//
// Win32 API
//

BOOL WINAPI CertVerifyCertificateChainPolicy(
  _In_    LPCSTR                    pszPolicyOID,
  _In_    PCCERT_CHAIN_CONTEXT      pChainContext,
  _In_    PCERT_CHAIN_POLICY_PARA   pPolicyPara,
  _Inout_ PCERT_CHAIN_POLICY_STATUS pPolicyStatus
);

//
// Trace
//

syscall.CertVerifyCertificateChainPolicy(
0x4,	      // Number of arguments
0x328e9d0,	  // Address of proc
0xc042b9f3d0, // Arg1 (pszPolicyOID)
0xc042b9f3e0, // Arg2 (pChainContext)
0x0,          // Arg3 (pPolicyPara)
0xc042e633d8  // Arg4 (pPolicyStatus)
)


//
// Mksyscall
//

C:\go\src\syscall\syscall_windows.go:/Policy/
//sys  CertVerifyCertificateChainPolicy(policyOID uintptr, chain *CertChainContext,para *CertChainPolicyPara, status *CertChainPolicyStatus) (err error) = crypt32.CertVerifyCertificateChainPolicy

//
// Generated syscall6
//

func CertVerifyCertificateChainPolicy(policyOID uintptr, chain *CertChainContext, para *CertChainPolicyPara, status *CertChainPolicyStatus) (err error) {
	r1, _, e1 := Syscall6(procCertVerifyCertificateChainPolicy.Addr(), 4, uintptr(policyOID), uintptr(unsafe.Pointer(chain)), uintptr(unsafe.Pointer(para)), uintptr(unsafe.Pointer(status)), 0, 0)
	if r1 == 0 {
		if e1 != 0 {
			err = errnoErr(e1)
		} else {
			err = EINVAL
		}
	}
	return
}
@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Aug 23, 2017

@as

I think 0x4 may be the number of arguments to the call. The underlying function is syscall6 with four arguments. pszPolicyOID is Microsoft's hungarianized prefix notation (pointer to string zero terminated), so pointing to 0x4 sounds like an access violation.

There are two function calls listed on the stack trace.

syscall.CertVerifyCertificateChainPolicy(0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0xc042e633d8)
C:/Go/src/syscall/zsyscall_windows.go:1208 +0xc1

0x4 is pszPolicyOID (we pass syscall.CERT_CHAIN_POLICY_SSL to this parameter and syscall.CERT_CHAIN_POLICY_SSL is indeed equal to 4);
0x328e9d0 is pChainContext (looks fine, it is just a pointer);
0xc042b9f3d0 is pPolicyPara (is another pointer);
0xc042b9f3e0 is pPolicyStatus (one more pointer).

syscall.Syscall6(0x7ffb1ef3ca40, 0x4, 0x4, 0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0, 0x0, 0x0, 0xc04257cc80, 0x26c8f20, ...)
C:/Go/src/runtime/syscall_windows.go:174 +0x6b

0x7ffb1ef3ca40 is trap (it is pointer to CertVerifyCertificateChainPolicy);
0x4 is nargs (it is number of CertVerifyCertificateChainPolicy parameters - 4 looks correct to me);
0x4 is first parameter of call into CertVerifyCertificateChainPolicy (looks same as above);
0x328e9d0, 0xc042b9f3d0, 0xc042b9f3e0 are parameters 2, 3 and 4 into CertVerifyCertificateChainPolicy (similar to first parameter all look OK to me).

I still don't see any problem. Sorry.

Alex

@as

This comment has been minimized.

Copy link
Contributor

@as as commented Aug 23, 2017

@alexbrainman
No worries, I was clearly looking at the wrong line the entire time. My bad entirely!

I've been running a few Windows machines trying to reproduce this issue in the background. After around 200 hours Windows 7, 8, 2008R2, 2012R2 and 10 can't hit it dialing out to an internal web server.

@mappu

This comment has been minimized.

Copy link

@mappu mappu commented Sep 5, 2017

I ship a golang binary to customers. One of my customers has received an identical panic:

Exception 0xc0000005 0x0 0xffffffffffffffff 0x7ff8f434c61f
PC=0x7ff8f434c61f
signal arrived during external code execution
 
syscall.Syscall6(0x7ff8f432ca40, 0x4, 0x4, 0x12f9ca0,   0xc0438473d0, 0xc0438473e0, 0x0, 0x0, 0xc043569ba0, 0x300000002, ...)
C:/go/src/runtime/syscall_windows.go:174 +0x6b
syscall.CertVerifyCertificateChainPolicy(0x4, 0x12f9ca0,   0xc0438473d0, 0xc0438473e0, 0x0, 0xc0427653d8)
C:/go/src/syscall/zsyscall_windows.go:1208 +0xc1
crypto/x509.checkChainSSLServerPolicy(0xc04274f680,   0x12f9ca0, 0xc043847808, 0x12a6bc0, 0xc042765548)
C:/go/src/crypto/x509/root_windows.go:117 +0xfd
crypto/x509.(*Certificate).systemVerify(0xc04274f680,   0xc042765808, 0x0, 0x0, 0x0, 0x0, 0x0)
C:/go/src/crypto/x509/root_windows.go:212 +0x484
crypto/x509.(*Certificate).Verify(0xc04274f680,   0xc043a1f650, 0x1e, 0xc0428df380, 0x0, 0xed133cf03, 0x2a606060, 0xee1100,   0x0, 0x0, ...)
C:/go/src/crypto/x509/verify.go:279 +0x86c
crypto/tls.(*clientHandshakeState).doFullHandshake(0xc043847e50,   0xc042009e80, 0x31)
C:/go/src/crypto/tls/handshake_client.go:300 +0x4c0
crypto/tls.(*Conn).clientHandshake(0xc042525180, 0xb755f8,   0xc0425252a0)
C:/go/src/crypto/tls/handshake_client.go:228 +0xf97
crypto/tls.(*Conn).Handshake(0xc042525180, 0x0, 0x0)
C:/go/src/crypto/tls/conn.go:1307 +0x1aa
net/http.(*Transport).dialConn.func3(0x0, 0xc042525180,   0xc042009b00, 0xc0437bdf80)
C:/go/src/net/http/transport.go:1082 +0x49
created by net/http.(*Transport).dialConn
C:/go/src/net/http/transport.go:1087 +0xff3

Running on windows (bitness unknown), binary compiled with go1.8.3.

@mappu

This comment has been minimized.

Copy link

@mappu mappu commented Sep 6, 2017

Here is the full panic of my client application, including other goroutines (redacting only my internal pure-go no-unsafe application code). Some of them are in CGO syscalls, some in other net/http functions, some in other Windows IOCP routines.

I don't know if this is information is material, but I researched to find the "Exception 0xc0000005" issue also reported in some other threads. For instance, #9356 was found to be related to the linker. My binary was compiled with x86_64-w64-mingw32-gcc 5.4.0, CGO_ENABLED=1, and -ldflags "-s -w". It was not compiled with -linkmode external but i understand that is implicit for cgo on windows.

The client application had previously made a successful SSL connection to the same server during its same lifespan (but not necessarily with the same http.Client object).

The server it is connecting to is also written in Go, and is using a Let's Encrypt certificate (if that tells you anything about the intermediaries, algorithm etc). I do not know if any SSL interception proxy exists in the middle.

The stack trace i have is dated 2017-08-26. If it was caused by a third-party program on the machine (e.g. AV / firewall / web-safety program) then i suppose it might resolve itself if the third-party program is updated.

@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Sep 6, 2017

@aclements is it possible that compiler is getting smarter about keeping variables on the stack?

The syscall.CertVerifyCertificateChainPolicy call, I suspect, takes long time sometimes, because it needs to verify certificates and those could be stored on disk, and maybe even downloaded from another computer. Perhaps Go garbage collector moves stack of that gorouting while it waits in syscall.CertVerifyCertificateChainPolicy. The syscall.CertVerifyCertificateChainPolicy parameters 2, 3 and 4 are all pointers to Go memory. I run:

GOOS=windows go build -v -o /dev/null -gcflags '-m' crypto/x509

and I can see

./root_windows.go:100:105: checkChainSSLServerPolicy chainCtx does not escape
./root_windows.go:107:13: checkChainSSLServerPolicy &syscall.SSLExtraCertChainPolicyPara literal does not escape
./root_windows.go:112:18: checkChainSSLServerPolicy &syscall.CertChainPolicyPara literal does not escape
./root_windows.go:117:96: checkChainSSLServerPolicy &status does not escape

Does that mean any of syscall.CertVerifyCertificateChainPolicy parameters are kept on stack?

Alex

@mappu

This comment has been minimized.

Copy link

@mappu mappu commented Sep 8, 2017

More identical issue reports and stack traces:

@slrz

This comment has been minimized.

Copy link

@slrz slrz commented Sep 8, 2017

These eventually lead up to some Docker bug where the cause was confirmed as some kind of "WebCompanion" software running on the user's machine.

moby/moby#27423 (comment)

@bradfitz bradfitz modified the milestones: Go1.10, Unplanned Nov 16, 2017
@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Jan 31, 2018

Does that mean any of syscall.CertVerifyCertificateChainPolicy parameters are kept on stack?

It does. In fact, the para structure is allocated right in checkChainSSLServerPolicy stack frame.

However, I think this is okay. syscall.CertVerifyCertificateChainPolicy itself could grow the stack, but everything still has type information at that point. Once that calls Syscall6, though, everything is nosplit down to the eventual cgocall, which does an entersyscall. Once a goroutine is in syscall state, we don't shrink its stack (for exactly this reason). As long as that syscall doesn't hold on to the pointers after it returns and doesn't call back in to Go code (does it?), it should be fine.

@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Jan 31, 2018

As long as that syscall doesn't hold on to the pointers after it returns and doesn't call back in to Go code (does it?), it should be fine.

No Windows itself does not call into Go.

Thank you for explaining.

Alex

@nvx

This comment has been minimized.

Copy link

@nvx nvx commented Feb 18, 2018

I've run into this too (1.9.2 on Windows 10 amd64). Application was making a lot of HTTPS requests over the course of ~4-5 hours when the crash happened.

Stack trace below in case it helps.

Exception 0xc0000005 0x0 0xffffffffffffffff 0x7ffc211ecc9f
PC=0x7ffc211ecc9f

syscall.Syscall6(0x7ffc211ef940, 0x4, 0x4, 0x67ee280, 0xc042445380, 0xc042445390, 0x0, 0x0, 0x11dd091a53ec7a6f, 0x6a938809f593b9c8, ...)
	C:/Go/src/runtime/syscall_windows.go:174 +0x69
syscall.CertVerifyCertificateChainPolicy(0x4, 0x67ee280, 0xc042445380, 0xc042445390, 0x0, 0xc0424b7388)
	C:/Go/src/syscall/zsyscall_windows.go:1208 +0xb6
crypto/x509.checkChainSSLServerPolicy(0xc042851400, 0x67ee280, 0xc0424457f8, 0x6868960, 0xc0424b7528)
	C:/Go/src/crypto/x509/root_windows.go:117 +0xf5
crypto/x509.(*Certificate).systemVerify(0xc042851400, 0xc0424b77f8, 0x0, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/crypto/x509/root_windows.go:212 +0x4c8
crypto/x509.(*Certificate).Verify(0xc042851400, 0xc04248e5b0, 0x4, 0xc0428bd6e0, 0x0, 0xbe9a75a2b0089690, 0xe636f35a699, 0x995c60, 0x0, 0x0, ...)
	C:/Go/src/crypto/x509/verify.go:289 +0x81f
crypto/tls.(*clientHandshakeState).doFullHandshake(0xc042445e68, 0xc042736bd0, 0x6a)
	C:/Go/src/crypto/tls/handshake_client.go:300 +0x5bc
crypto/tls.(*Conn).clientHandshake(0xc04283fc00, 0x7d0d00, 0xc04283fd20)
	C:/Go/src/crypto/tls/handshake_client.go:228 +0xfa2
crypto/tls.(*Conn).Handshake(0xc04283fc00, 0x0, 0x0)
	C:/Go/src/crypto/tls/conn.go:1307 +0x196
net/http.(*Transport).dialConn.func3(0x0, 0xc04283fc00, 0x0, 0xc042b6a960)
	C:/Go/src/net/http/transport.go:1151 +0x49
created by net/http.(*Transport).dialConn
	C:/Go/src/net/http/transport.go:1147 +0xdc5
@mappu

This comment has been minimized.

Copy link

@mappu mappu commented Mar 9, 2018

Experienced this again today on go1.9.2, same stack trace.

There's a few more folks at git-lfs/git-lfs#1786 with the same problem.

@slrz commented on 9 Sep 2017:

These eventually lead up to some Docker bug where the cause was confirmed as some kind of "WebCompanion" software running on the user's machine.

moby/moby#27423 (comment)

No, I think that's different. That's the SetFileCompletionNotificationModes panic (a.k.a. #22149) which is fixed since go1.9.2, but, this problem is still occurring. (That also leads down a rabbithole with 0x20000 flags to identify your non-IFS LSPs/BSPs, but, I have seen this issue occur on a machine with no non-IFS LSPs/BSPs.) Is there any reason to think the problems are related?

@zhangyoufu

This comment has been minimized.

Copy link

@zhangyoufu zhangyoufu commented Apr 9, 2018

func checkChainSSLServerPolicy(c *Certificate, chainCtx *syscall.CertChainContext, opts *VerifyOptions) error {
	servernamep, err := syscall.UTF16PtrFromString(opts.DNSName)
	if err != nil {
		return err
	}
	sslPara := &syscall.SSLExtraCertChainPolicyPara{
		AuthType:   syscall.AUTHTYPE_SERVER,
		ServerName: servernamep,
	}
	sslPara.Size = uint32(unsafe.Sizeof(*sslPara))

	para := &syscall.CertChainPolicyPara{
		ExtraPolicyPara: uintptr(unsafe.Pointer(sslPara)),
	}
	para.Size = uint32(unsafe.Sizeof(*para))

	status := syscall.CertChainPolicyStatus{}
	err = syscall.CertVerifyCertificateChainPolicy(syscall.CERT_CHAIN_POLICY_SSL, chainCtx, para, &status)

What if GC happened before syscall.CertVerifyCertificateChainPolicy is called? The stack might be moved, and uintptr(unsafe.Pointer(sslPara)) seems unsafe for me.

@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Apr 9, 2018

@zhangyoufu, I think you're on to something. Above, I'd only considered the direct arguments to syscall.CertVerifyCertificateChainPolicy, but I'd missed para.ExtraPolicyPara.

syscall.CertVerifyCertificateChainPolicy can move the stack on entry. If it does, we'll update the para argument since it's correctly typed as a pointer and points to the stack. But when scanning the para object in checkChainSSLServerPolicy's frame, we won't update the ExtraPolicyPara field because it's scalar-typed, even though it contains a stack "pointer" to the sslPara object. Then when we do the syscall and Windows tries to follow para.ExtraPolicyPara, it could wind up in memory that's been re-used for something else.

@alexbrainman, @mkrautz, there seem to be several uintptr-typed fields in syscall/types_windows.go that are actually pointers, particularly in the crypto APIs, but also in some other types. All of these could be susceptible to this sort of bug. Is there a reason these are uintptrs? We might have to change this in a non-backwards compatible way.

@aclements aclements modified the milestones: Unplanned, Go1.11 Apr 9, 2018
@alexbrainman

This comment has been minimized.

Copy link
Member

@alexbrainman alexbrainman commented Apr 11, 2018

We're not OK if a struct field can be either a pointer or scalar. Are there any cases like that?

I don't see any in syscall and internal/syscall/windows/*

I'd be inclined to declare something like type Pointer *struct{} in the syscall package and use that for 1 and 2.

I hope I interpreted your idea correctly https://go-review.googlesource.com/#/c/go/+/106275

I believe we should fix this. I also believe we can fix it because the syscall package is explicitly opted out of the Go 1 compatibility promise and because changing these types would fix a class of GC bugs that can't otherwise be worked around.

I am fine fixing syscall package. But we could also copy and adjust syscall code somewhere else - these would leave syscall users with broken code.

Whatever we decide, we should also fix golang.org/x/sys/windows package.

What about fields that may or may not be pointers, depending on the value of some other field?

That would be a problem, but as far as I can tell it doesn't apply to any of the uintptr fields above.

That is what I see too.

Alex

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Apr 17, 2018

Is this point-release material? From a glance, it seems any Windows Go program using TLS is vulnerable to rare but unavoidable crashes.

@gopherbot gopherbot closed this in 4869ec0 Apr 18, 2018
@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Apr 18, 2018

@eliasnaur, yes, we should fix this in a point release. Thanks for the reminder.

We can't port CL 106275 to a point release because it breaks a public API, but I can think of a few ways to fix this without breaking the (admittedly broken) API:

  1. Copy the definition of syscall.CertChainPolicyPara into crypto/x509 with the corrected field type, use that in checkChainSSLServerPolicy, and unsafe cast it for the call to syscall.CertVerifyCertificateChainPolicy.
  2. Force the syscall.CertChainPolicyPara to the heap and runtime.KeepAlive it across the syscall.
  3. Copy the corrected APIs into internal/syscall/windows and modify crypto/x509 to use that.

I'd be inclined to go with solution 1.

Solution 3 is sort of intriguing in general. While we're breaking things, we could even move all of the crypto APIs from syscall to internal/syscall/windows and tell people to use x/sys/windows instead (which we also need to fix).

@aclements aclements reopened this Apr 18, 2018
@aclements aclements modified the milestones: Go1.11, Go1.10.2 Apr 18, 2018
@DemiMarie

This comment has been minimized.

Copy link

@DemiMarie DemiMarie commented Apr 18, 2018

@FiloSottile

This comment has been minimized.

Copy link
Member

@FiloSottile FiloSottile commented Apr 24, 2018

@gopherbot please file this for backport.

See @aclements's comment #21376 (comment) about having to develop a different fix from CL 106275 in order not to break a public API.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Apr 24, 2018

Backport issue(s) opened: #25033 (for 1.10), #25034 (for 1.9).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

HalCanary pushed a commit to HalCanary/skia-hc that referenced this issue Apr 26, 2018
They're flaky due to golang/go#21376

Bug: skia:
Change-Id: Ic629c84b72c492b35d393328aa2178762ce3c7d8
Reviewed-on: https://skia-review.googlesource.com/123932
Reviewed-by: Ravi Mistry <rmistry@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Eric Boren <borenet@google.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented May 5, 2018

Change https://golang.org/cl/111715 mentions this issue: [release-branch.go1.10] crypto/tls: copy and use adjusted syscall.CertChainPolicyPara

gopherbot pushed a commit that referenced this issue May 7, 2018
…tChainPolicyPara

As discussed in issue #21376, it is unsafe to have
syscall.CertChainPolicyPara.ExtraPolicyPara uintptr -
it has to be a pointer type. So copy syscall.CertChainPolicyPara
into crypto/tls package, make ExtraPolicyPara unsafe.Pointer,
and use new struct instead of syscall.CertChainPolicyPara.

Fixes #25033

Change-Id: If914af056cbbb0c4d93ffaa915b3d2cb5ecad0cd
Reviewed-on: https://go-review.googlesource.com/111715
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented May 8, 2018

Change https://golang.org/cl/112095 mentions this issue: [release-branch.go1.9] crypto/x509: copy and use adjusted syscall.CertChainPolicyPara

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented May 9, 2018

Change https://golang.org/cl/112179 mentions this issue: [release-branch.go1.9] crypto/x509: copy and use adjusted syscall.CertChainPolicyPara

gopherbot pushed a commit that referenced this issue May 9, 2018
…tChainPolicyPara

As discussed in issue #21376, it is unsafe to have
syscall.CertChainPolicyPara.ExtraPolicyPara uintptr -
it has to be a pointer type. So copy syscall.CertChainPolicyPara
into crypto/tls package, make ExtraPolicyPara unsafe.Pointer,
and use new struct instead of syscall.CertChainPolicyPara.

Fixes #25034

Change-Id: If914af056cbbb0c4d93ffaa915b3d2cb5ecad0cd
Reviewed-on: https://go-review.googlesource.com/111715
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-on: https://go-review.googlesource.com/112179
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
HalCanary pushed a commit to HalCanary/skia-hc that referenced this issue Jun 21, 2018
This reverts commit c2a8870.

Reason for revert: Issue causing flakiness has been resolved.

Original change's description:
> [infra] Enable retries for Windows compiles
> 
> They're flaky due to golang/go#21376
> 
> Bug: skia:
> Change-Id: Ic629c84b72c492b35d393328aa2178762ce3c7d8
> Reviewed-on: https://skia-review.googlesource.com/123932
> Reviewed-by: Ravi Mistry <rmistry@google.com>
> Reviewed-by: Mike Klein <mtklein@google.com>
> Commit-Queue: Eric Boren <borenet@google.com>

TBR=borenet@google.com,mtklein@google.com,rmistry@google.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: skia:
Change-Id: Ieaae0be9796e61ca8f4265198687e847afa853aa
Reviewed-on: https://skia-review.googlesource.com/136633
Reviewed-by: Eric Boren <borenet@google.com>
Reviewed-by: Ben Wagner <benjaminwagner@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
@kolyshkin

This comment has been minimized.

Copy link
Contributor

@kolyshkin kolyshkin commented Jul 4, 2018

While we're breaking things <...>

@aclements unfortunately yes. As time permits, can you please take a look at google/certificate-transparency-go#284 issue (and the proposed fix at google/certificate-transparency-go#285) and let us know if this could maybe solved in some better way?

@golang golang locked and limited conversation to collaborators Jul 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.