Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall: panic calling Windows system call in 1.7, worked in 1.6 #17908

Closed
robert-chiniquy opened this issue Nov 14, 2016 · 10 comments
Closed

syscall: panic calling Windows system call in 1.7, worked in 1.6 #17908

robert-chiniquy opened this issue Nov 14, 2016 · 10 comments

Comments

@robert-chiniquy
Copy link

@robert-chiniquy robert-chiniquy commented Nov 14, 2016

Calling syscalls on Windows can cause a panic. This does not occur in Go 1.6 with the same code.

I believe this code follows the rules listed at: https://golang.org/pkg/unsafe/#Pointer, most specifically:

If a pointer argument must be converted to uintptr for use as an argument, that conversion must appear in the call expression itself

It appears to me that this code is triggering a garbage collector or compiler bug by defining resume_handle in the scope of the function resumeHandlePtrCrashes and passing it to the syscall called in callingNetLocalGroupGetMembers.

What version of Go are you using (go version)?

go version go1.7.3 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=Z:\sudo.wtf
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1

What did you do?

A small test program:

package main

import (
	"fmt"
	"golang.org/x/sys/windows"
	"syscall"
	"unsafe"
)

var modNetapi32 = windows.NewLazySystemDLL("Netapi32.dll")

var procNetLocalGroupGetMembers = modNetapi32.NewProc("NetLocalGroupGetMembers")

type NET_API_STATUS uint32

const MAX_PREFERRED_LENGTH int32 = -1

func main() {
	resumeHandlePtrCrashes()
}

func resumeHandlePtrCrashes() {
	var resume_handle uint32
	callingNetLocalGroupGetMembers("Administrators", &resume_handle)
}

func callingNetLocalGroupGetMembers(groupName string, resume_handle *uint32) {
	var bufptr *byte
	var entriesread uint32
	var totalentries uint32
	adminGroupName, _ := windows.UTF16PtrFromString("Administrators")
	ex := NetLocalGroupGetMembers(nil, adminGroupName, 1, &bufptr, MAX_PREFERRED_LENGTH, &entriesread, &totalentries, resume_handle)
	fmt.Printf("NetLocalGroupGetMembers = %d\n", ex)
}

func NetLocalGroupGetMembers(servername *uint16,
	localgroupname *uint16,
	level uint32, bufptr **byte,
	prefmaxlen int32, entriesread *uint32, totalentries *uint32, resumehandle *uint32) (status NET_API_STATUS) {
	r0, _, _ := syscall.Syscall9(procNetLocalGroupGetMembers.Addr(), 8,
		uintptr(unsafe.Pointer(servername)),
		uintptr(unsafe.Pointer(localgroupname)),
		uintptr(level), uintptr(unsafe.Pointer(bufptr)),
		uintptr(prefmaxlen), uintptr(unsafe.Pointer(entriesread)),
		uintptr(unsafe.Pointer(totalentries)), uintptr(unsafe.Pointer(resumehandle)), 0)
	status = NET_API_STATUS(r0)
	return
}

go run test.go

What did you expect to see?

Running under Go 1.6, this code would produce: NetLocalGroupGetMembers = 0

What did you see instead?

An error like this:

Exception 0xc0000005 0x0 0xffffffffffffffff 0x7ff8ae692573
PC=0x7ff8ae692573

syscall.Syscall9(0x7ff8ae692050, 0x8, 0x0, 0xc0420086e0, 0x1, 0xc04202ded8, 0xffffffffffffffff, 0xc04202ded0, 0xc04202de
cc, 0xc04202df1c, ...)
        C:/Go/src/runtime/syscall_windows.go:185 +0x6b
main.NetLocalGroupGetMembers(0x0, 0xc0420086e0, 0xc000000001, 0xc04202ded8, 0xffffffff, 0xc04202ded0, 0xc04202decc, 0xc0
4202df1c, 0x0)
        Z:/Go/src/[...]/test.go:52 +0x125
main.callingNetLocalGroupGetMembers(0x4b9acf, 0xe, 0xc04202df1c)
        Z:/Go/src/[...]/test.go:28 +0xac
main.resumeHandlePtrCrashes()
        Z:/Go/src/[...]/test.go +0x4f
main.main()
        Z:/Go/src/[...]/test.go:20 +0x1b
rax     0x4202df8000000000
rbx     0x4202df8000000000
rcx     0x0
rdi     0xc04202ded0
rsi     0x0
rbp     0x7fd90
rsp     0x7fc90
r8      0x1
r9      0xc04202ded8
r10     0x0
r11     0x4
r12     0x0
r13     0x1
r14     0xc04202df1c
r15     0xc04202ded8
rip     0x7ff8ae692573
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
exit status 2

Background

The above code is generated by $GOROOT/src/syscall/mksyscall_windows.go, from this:

//sys NetLocalGroupGetMembers(servername *uint16, localgroupname *uint16, level uint32, bufptr **byte, prefmaxlen int32, entriesread *uint32, totalentries *uint32, resumehandle *uint32) (status NET_API_STATUS) = Netapi32.NetLocalGroupGetMembers
/*
NET_API_STATUS NetLocalGroupGetMembers(
  _In_    LPCWSTR    servername,
  _In_    LPCWSTR    localgroupname,
  _In_    DWORD      level,
  _Out_   LPBYTE     *bufptr,
  _In_    DWORD      prefmaxlen,
  _Out_   LPDWORD    entriesread,
  _Out_   LPDWORD    totalentries,
  _Inout_ PDWORD_PTR resumehandle
);
*/
@ianlancetaylor ianlancetaylor changed the title Panic calling Windows System Call in 1.7, worked in 1.6 syscall: panic calling Windows system call in 1.7, worked in 1.6 Nov 14, 2016
@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Nov 14, 2016
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 14, 2016

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 24, 2016

It appears to me that this code is triggering a garbage collector or compiler bug by defining resume_handle in the scope of the function resumeHandlePtrCrashes and passing it to the syscall called in callingNetLocalGroupGetMembers.

Yes. The problem here is confusion between garbage collector and your Windows. In your example resume_handle is stored on stack (resume_handle is not listed as escaped in the following output):

c:\dev\src\issues\issue17908>go run -gcflags="-N -m" main.go 2>&1 | sed "/# command-line-arguments/,$!d"
# command-line-arguments
.\main.go:39: NetLocalGroupGetMembers servername does not escape
.\main.go:39: NetLocalGroupGetMembers localgroupname does not escape
.\main.go:39: NetLocalGroupGetMembers bufptr does not escape
.\main.go:39: NetLocalGroupGetMembers entriesread does not escape
.\main.go:39: NetLocalGroupGetMembers totalentries does not escape
.\main.go:39: NetLocalGroupGetMembers resumehandle does not escape
.\main.go:33: ex escapes to heap
.\main.go:27: callingNetLocalGroupGetMembers groupName does not escape
.\main.go:27: callingNetLocalGroupGetMembers resume_handle does not escape
.\main.go:32: callingNetLocalGroupGetMembers &bufptr does not escape
.\main.go:32: callingNetLocalGroupGetMembers &entriesread does not escape
.\main.go:32: callingNetLocalGroupGetMembers &totalentries does not escape
.\main.go:33: callingNetLocalGroupGetMembers ... argument does not escape
.\main.go:24: resumeHandlePtrCrashes &resume_handle does not escape
Exception 0xc0000005 0x0 0xffffffffffffffff 0x7fefb892434
PC=0x7fefb892434

syscall.Syscall9(0x7fefb892210, 0x8, 0x0, 0xc0420087a0, 0x1, 0xc042029ea0, 0xffffffffffffffff, 0xc042029e98, 0xc042029e90, 0xc042029f1c, ...)
        C:/dev/go/src/runtime/syscall_windows.go:185 +0x6b
main.NetLocalGroupGetMembers(0x0, 0xc0420087a0, 0xc000000001, 0xc042029ea0, 0xffffffff, 0xc042029e98, 0xc042029e90, 0xc042029f1c, 0x0)
        c:/dev/src/issues/issue17908/main.go:45 +0x14d
main.callingNetLocalGroupGetMembers(0x4babaf, 0xe, 0xc042029f1c)
        c:/dev/src/issues/issue17908/main.go:32 +0xc4
main.resumeHandlePtrCrashes()
        c:/dev/src/issues/issue17908/main.go:24 +0x4f
main.main()
        c:/dev/src/issues/issue17908/main.go:19 +0x1b
rax     0x0
rbx     0x0
rcx     0xc042029e98
rdi     0x42029f8000000000
rsi     0xc042029f1c
rbp     0xc042029e90
rsp     0x6fd20
r8      0x1
r9      0xc042029ea0
r10     0x0
r11     0x4
r12     0xc042029ea0
r13     0x8
r14     0x1
r15     0x0
rip     0x7fefb892434
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
exit status 2

c:\dev\src\issues\issue17908>

I suspect NetLocalGroupGetMembers is slow to return (it, probably, searches network), and garbage collector moves goroutine stack before NetLocalGroupGetMembers returns. When NetLocalGroupGetMembers returns, Windows writes at resume_handle old address, and god only knows what is there at that time. Therefore the crash.

If I change your program just a little, it starts working again:

c:\dev\src\issues\issue17908>hg diff
diff -r 095448b519cc main.go
--- a/main.go   Thu Nov 24 15:24:46 2016 +1100
+++ b/main.go   Thu Nov 24 16:19:44 2016 +1100
@@ -22,6 +22,7 @@
 func resumeHandlePtrCrashes() {
        var resume_handle uint32
        callingNetLocalGroupGetMembers("Administrators", &resume_handle)
+       fmt.Println(resume_handle)
 }

 func callingNetLocalGroupGetMembers(groupName string, resume_handle *uint32) {

c:\dev\src\issues\issue17908>go run -gcflags="-N -m" main.go 2>&1 | sed "/# command-line-arguments/,$!d"
# command-line-arguments
.\main.go:40: NetLocalGroupGetMembers servername does not escape
.\main.go:40: NetLocalGroupGetMembers localgroupname does not escape
.\main.go:40: NetLocalGroupGetMembers bufptr does not escape
.\main.go:40: NetLocalGroupGetMembers entriesread does not escape
.\main.go:40: NetLocalGroupGetMembers totalentries does not escape
.\main.go:40: NetLocalGroupGetMembers resumehandle does not escape
.\main.go:34: ex escapes to heap
.\main.go:28: callingNetLocalGroupGetMembers groupName does not escape
.\main.go:28: callingNetLocalGroupGetMembers resume_handle does not escape
.\main.go:33: callingNetLocalGroupGetMembers &bufptr does not escape
.\main.go:33: callingNetLocalGroupGetMembers &entriesread does not escape
.\main.go:33: callingNetLocalGroupGetMembers &totalentries does not escape
.\main.go:34: callingNetLocalGroupGetMembers ... argument does not escape
.\main.go:25: resume_handle escapes to heap
.\main.go:24: resumeHandlePtrCrashes &resume_handle does not escape
.\main.go:25: resumeHandlePtrCrashes ... argument does not escape
NetLocalGroupGetMembers = 0
0

c:\dev\src\issues\issue17908>

See .\main.go:25: resume_handle escapes to heap line - this time resume_handle is on the heap, so moving stack does not affect your program.

I do not know why your program worked in the past. Perhaps garbage collector behaved differently. Maybe compiler generates more efficient code now (by using stack instead of heap).

I will let others tell you what to do to prevent this from happening. From what I know about current Go runtime, you need to keep long living variables (that are also accessible by Windows) on the heap.

Alex

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Nov 24, 2016

This is what https://golang.org/pkg/runtime/#KeepAlive is for.

But syscall.Syscall calls should be treated as if KeepAlive were used immediately after. Maybe they're not on Windows?

/cc @ianlancetaylor @rsc @randall77

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 24, 2016

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 25, 2016

Based on @alexbrainman 's comment, this is not a runtime.KeepAlive issue. runtime.KeepAlive can be used to ensure that a heap pointer is not released. But the pointer in this case is on the stack, not on the heap.

But I don't agree with the full analysis. The garbage collector should not move the stack of a goroutine that is waiting for a system call to complete. The shrinkstack function in the runtime is careful to avoid doing that. So I don't think the GC moving the stack is the problem.

And I don't see a way that the stack could grow in the syscall.Syscall9 sequence.

If I am reading the exception information correctly, the error is an attempt to read from the address 0xffffffffffffffff. I'm not sure how to tie that back in the resume_handle variable, or to the fact that this identical program appears to work with Go 1.6.

A few things to try to get more information.

Set GODEBUG=gcshrinkstackoff=1 in the environment and see if the problem recurs. If it does not recur, then I am wrong, and stack shrinking is the problem.

Set GOTRACEBACK=system in the environment and run the program. That will show us the runtime code run at the point of the crash, which may suggest something.

@pquerna
Copy link

@pquerna pquerna commented Nov 25, 2016

GODEBUG=gcshrinkstackoff=1 had no changes:

PS C:\Users\pquerna\go\src\github.com\ScaleFT\issue17908> $env:GODEBUG="gcshrinkstackoff=1"
PS C:\Users\pquerna\go\src\github.com\ScaleFT\issue17908> go run .\main.go
Exception 0xc0000005 0x0 0xffffffffffffffff 0x7ffbcfc72eed
PC=0x7ffbcfc72eed

syscall.Syscall9(0x7ffbcfc729a0, 0x8, 0x0, 0xc042008700, 0x1, 0xc04202bed8, 0xffffffffffffffff, 0xc04202bed0, 0xc04202be
cc, 0xc04202bf1c, ...)
        C:/Go/src/runtime/syscall_windows.go:185 +0x6b
main.NetLocalGroupGetMembers(0x0, 0xc042008700, 0xc000000001, 0xc04202bed8, 0xffffffff, 0xc04202bed0, 0xc04202becc, 0xc0
4202bf1c, 0x0)
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:45 +0x125
main.callingNetLocalGroupGetMembers(0x4b9aaf, 0xe, 0xc04202bf1c)
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:32 +0xac
main.resumeHandlePtrCrashes()
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:24 +0x4f
main.main()
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:19 +0x1b
rax     0x4202bf8000000000
rbx     0x4202bf8000000000
rcx     0x0
rdi     0xc04202bed0
rsi     0x0
rbp     0x8fd90
rsp     0x8fc90
r8      0x1
r9      0xc04202bed8
r10     0x0
r11     0x4
r12     0x1
r13     0x0
r14     0xc04202bf1c
r15     0xc04202bed8
rip     0x7ffbcfc72eed
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
exit status 2

Output from GOTRACEBACK=system:

PS C:\Users\pquerna\go\src\github.com\ScaleFT\issue17908> $env:GOTRACEBACK="system"
PS C:\Users\pquerna\go\src\github.com\ScaleFT\issue17908> go run .\main.go
Exception 0xc0000005 0x0 0xffffffffffffffff 0x7ffbcfc72eed
PC=0x7ffbcfc72eed

runtime.cgocall(0x452f20, 0x514208, 0x0)
        C:/Go/src/runtime/cgocall.go:131 +0x116 fp=0xc04202bda0 sp=0xc04202bd60
syscall.Syscall9(0x7ffbcfc729a0, 0x8, 0x0, 0xc042008700, 0x1, 0xc04202bed8, 0xffffffffffffffff, 0xc04202bed0, 0xc04202be
cc, 0xc04202bf1c, ...)
        C:/Go/src/runtime/syscall_windows.go:185 +0x6b fp=0xc04202bdd0 sp=0xc04202bda0
main.NetLocalGroupGetMembers(0x0, 0xc042008700, 0xc000000001, 0xc04202bed8, 0xffffffff, 0xc04202bed0, 0xc04202becc, 0xc0
4202bf1c, 0x0)
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:45 +0x125 fp=0xc04202be80 sp=0xc04202bdd0
main.callingNetLocalGroupGetMembers(0x4b9aaf, 0xe, 0xc04202bf1c)
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:32 +0xac fp=0xc04202bf00 sp=0xc04202be80
main.resumeHandlePtrCrashes()
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:24 +0x4f fp=0xc04202bf30 sp=0xc04202bf00
main.main()
        C:/Users/pquerna/go/src/github.com/ScaleFT/issue17908/main.go:19 +0x1b fp=0xc04202bf38 sp=0xc04202bf30
runtime.main()
        C:/Go/src/runtime/proc.go:183 +0x1f0 fp=0xc04202bf90 sp=0xc04202bf38
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc04202bf98 sp=0xc04202bf90

goroutine 2 [force gc (idle)]:
runtime.gopark(0x4c4bc8, 0x513600, 0x4b9ed0, 0xf, 0x4c4a14, 0x1)
        C:/Go/src/runtime/proc.go:259 +0x14f fp=0xc04201df28 sp=0xc04201def8
runtime.goparkunlock(0x513600, 0x4b9ed0, 0xf, 0xc042016014, 0x1)
        C:/Go/src/runtime/proc.go:265 +0x65 fp=0xc04201df68 sp=0xc04201df28
runtime.forcegchelper()
        C:/Go/src/runtime/proc.go:224 +0xb6 fp=0xc04201dfa0 sp=0xc04201df68
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc04201dfa8 sp=0xc04201dfa0
created by runtime.init.3
        C:/Go/src/runtime/proc.go:213 +0x3c

goroutine 3 [GC sweep wait]:
runtime.gopark(0x4c4bc8, 0x513780, 0x4b98d3, 0xd, 0x41bc14, 0x1)
        C:/Go/src/runtime/proc.go:259 +0x14f fp=0xc04201ff18 sp=0xc04201fee8
runtime.goparkunlock(0x513780, 0x4b98d3, 0xd, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:265 +0x65 fp=0xc04201ff58 sp=0xc04201ff18
runtime.bgsweep(0xc04202c000)
        C:/Go/src/runtime/mgcsweep.go:63 +0xc4 fp=0xc04201ff98 sp=0xc04201ff58
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc04201ffa0 sp=0xc04201ff98
created by runtime.gcenable
        C:/Go/src/runtime/mgc.go:195 +0x68

goroutine 4 [finalizer wait]:
runtime.gopark(0x4c4bc8, 0x52dbb0, 0x4b9c37, 0xe, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:259 +0x14f fp=0xc042019ee8 sp=0xc042019eb8
runtime.goparkunlock(0x52dbb0, 0x4b9c37, 0xe, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:265 +0x65 fp=0xc042019f28 sp=0xc042019ee8
runtime.runfinq()
        C:/Go/src/runtime/mfinal.go:158 +0xc1 fp=0xc042019fa0 sp=0xc042019f28
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc042019fa8 sp=0xc042019fa0
created by runtime.createfing
        C:/Go/src/runtime/mfinal.go:139 +0x7a
rax     0x4202bf8000000000
rbx     0x4202bf8000000000
rcx     0x0
rdi     0xc04202bed0
rsi     0x0
rbp     0x8fd90
rsp     0x8fc90
r8      0x1
r9      0xc04202bed8
r10     0x0
r11     0x4
r12     0x1
r13     0x0
r14     0xc04202bf1c
r15     0xc04202bed8
rip     0x7ffbcfc72eed
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
exit status 2
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 25, 2016

Thanks. Unfortunately I don't see anything useful there. I don't know what the problem is.

Does it make any difference if you move the call to procNetLocalGroupGetMembers.Addr() out of the call to syscall.Syscall9, by storing it in a local variable before the call and passing the local variable to syscall.Syscall9?

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 28, 2016

If it is not garbage collector, perhaps @pquerna you made mistake in your program. The NetLocalGroupGetMembers resumehandle type is PDWORD_PTR, which is pointer to pointer. But you pass pointer to uint32 there. But pointers are 64 bits long on windows/amd64. Perhaps Windows writes uint64 where you only provided uint32. Try this:

diff -r 095448b519cc main.go
--- a/main.go	Thu Nov 24 15:24:46 2016 +1100
+++ b/main.go	Mon Nov 28 17:03:27 2016 +1100
@@ -20,11 +20,11 @@
 }
 
 func resumeHandlePtrCrashes() {
-	var resume_handle uint32
+	var resume_handle uintptr
 	callingNetLocalGroupGetMembers("Administrators", &resume_handle)
 }
 
-func callingNetLocalGroupGetMembers(groupName string, resume_handle *uint32) {
+func callingNetLocalGroupGetMembers(groupName string, resume_handle *uintptr) {
 	var bufptr *byte
 	var entriesread uint32
 	var totalentries uint32
@@ -36,7 +36,7 @@
 func NetLocalGroupGetMembers(servername *uint16,
 	localgroupname *uint16,
 	level uint32, bufptr **byte,
-	prefmaxlen int32, entriesread *uint32, totalentries *uint32, resumehandle *uint32) (status NET_API_STATUS) {
+	prefmaxlen int32, entriesread *uint32, totalentries *uint32, resumehandle *uintptr) (status NET_API_STATUS) {
 	r0, _, _ := syscall.Syscall9(procNetLocalGroupGetMembers.Addr(), 8,
 		uintptr(unsafe.Pointer(servername)),
 		uintptr(unsafe.Pointer(localgroupname)),

It works for me.

Alex

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Nov 30, 2016

@pquerna, can you confirm? We're trying to close Go 1.8 bugs.

@robert-chiniquy
Copy link
Author

@robert-chiniquy robert-chiniquy commented Dec 1, 2016

@bradfitz Confirmed. Thanks, this resolves the issue for us.

@golang golang locked and limited conversation to collaborators Dec 1, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.