Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile, runtime: investigate Windows stack overflows calling into system C libraries #20975

Closed
kjk opened this issue Jul 10, 2017 · 60 comments

Comments

Projects
None yet
9 participants
@kjk
Copy link

commented Jul 10, 2017

This came up with 1.8.3 when working on GUI Windows programs using lxn\walk library.

Relevant issues:

The problem is that Go linker sets very small stack size (I think 128 kB) in PE executables. The standard on Windows is more like 1-2 MB.

This is fine for code that only lightly uses system C libraries but when writing code that talks to win32 UI APIs, it's very likely to hit the stack limit and silently crash.

I encountered it because I tried to use webview (mshtml.dll) control and it crashed on 64-bit when rendering my (not very complicated) website. Other people seen such crashes as well.

There are work-arounds: one can edit PE header after the exe is built using e.g. editbin.exe , but such tool is not necessarily available to the programmer (it's part of Visual Studio).

It would be much easier on Windows developers if there was a linker flag to set custom stack size so that it could be done directly with go build. A library like lxn\walk could then document the need to increase stack size for Go Windows programs and recommend a

I don't know if such option would be relevant/needed for other OSes/exe formats.

@bradfitz bradfitz added this to the Go1.10 milestone Jul 10, 2017

@bradfitz bradfitz changed the title Add linker option for setting stack size in PE executables. cmd/compile, runtime: investigate Windows stack overflows calling into system C libraries Jul 10, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 10, 2017

Stack sizes isn't a knob programmers should think about in Go. That's why Go stacks dynamically split or adjust in size as needed.

When making a call to code not managed by Go's dynamic stack size checks, Go should be switching to a conservatively-sized stack. Maybe there's a bug on how that works on Windows.

/cc @aclements @alexbrainman

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 10, 2017

For what it's worth, here's a callstack of the crash. It's on main thread:

.. and 0xe4 more stack frames likes this
e5 00000000`00088dd0 00007ffd`5eaacfcb MSHTML!CFormatInfo::FindFormattingParent+0x43a
e6 00000000`00088e20 00007ffd`5e99a90f MSHTML!CElement::ComputeFormatsVirtual+0x12b
e7 00000000`000897c0 00007ffd`5eab0f0a MSHTML!CElement::ComputeFormats+0x14f
e8 00000000`000898f0 00007ffd`5eaacfcb MSHTML!CFormatInfo::FindFormattingParent+0x43a
e9 00000000`00089940 00007ffd`5e99a90f MSHTML!CElement::ComputeFormatsVirtual+0x12b
ea 00000000`0008a2e0 00007ffd`5e999b61 MSHTML!CElement::ComputeFormats+0x14f
eb 00000000`0008a410 00007ffd`5e999765 MSHTML!CTreeNode::ComputeFormats+0x81
ec 00000000`0008a450 00007ffd`5e9ccde2 MSHTML!CTreeNode::ComputeFormatsHelper+0x35
ed 00000000`0008b200 00007ffd`5e9cde27 MSHTML!CTreeNode::GetCharFormatHelper+0xe
ee 00000000`0008b230 00007ffd`5ed6f105 MSHTML!CTreeNode::GetCharFormat+0x77
ef 00000000`0008b260 00007ffd`5ecd5226 MSHTML!CRecalcLinePtr::CalcAfterSpace+0x135
f0 00000000`0008b310 00007ffd`5ecc4c0e MSHTML!CRecalcLinePtr::MeasureLine+0x382
f1 00000000`0008b480 00007ffd`5ecc3375 MSHTML!CDisplay::RecalcLinesWithMeasurer+0x2ce
f2 00000000`0008b5e0 00007ffd`5ecda3d0 MSHTML!CDisplay::RecalcLines+0x65
f3 00000000`0008b820 00007ffd`5ecd7182 MSHTML!CDisplay::RecalcView+0x54
f4 00000000`0008b860 00007ffd`5ecd8196 MSHTML!CFlowLayout::CalcTextSize+0x302
f5 00000000`0008b9d0 00007ffd`5eddf0b5 MSHTML!CFlowLayout::CalcSizeCoreCSS1Strict+0xcc6
f6 00000000`0008bdc0 00007ffd`5ed4c2b4 MSHTML!CBodyLayout::CalcSizeCore+0x75
f7 00000000`0008be50 00007ffd`5ecc8af2 MSHTML!CFlowLayout::CalcSizeVirtual+0x84
f8 00000000`0008bee0 00007ffd`5ed48f67 MSHTML!CLayout::CalcSize+0x1da
f9 00000000`0008c080 00007ffd`5ecc8af2 MSHTML!CHtmlLayout::CalcSizeVirtual+0x9e7
fa 00000000`0008c3f0 00007ffd`5edf3efe MSHTML!CLayout::CalcSize+0x1da
fb 00000000`0008c590 00007ffd`5eae760f MSHTML!CLayout::CalcTopLayoutSize+0x5e
fc 00000000`0008c640 00007ffd`5ea3214d MSHTML!CView::EnsureSize+0xe7
fd 00000000`0008c680 00007ffd`5eae9e97 MSHTML!CView::EnsureView+0x43d
fe 00000000`0008c7a0 00007ffd`5ec60a6f MSHTML!CDoc::RunningToInPlace+0x1b7
ff 00000000`0008c8c0 00007ffd`5ec60754 MSHTML!CServer::TransitionTo+0xeb
100 00000000`0008c900 00007ffd`6d886ada MSHTML!CServer::Show+0x64
101 00000000`0008c930 00007ffd`6d884ce8 ieframe!CDocObjectHost::_ShowMsoView+0xba
102 00000000`0008c960 00007ffd`5eaebd4e ieframe!CDocObjectHost::ActivateMe+0x38
103 00000000`0008c990 00007ffd`5eaebcc7 MSHTML!CServer::ActivateView+0x72
104 00000000`0008c9c0 00007ffd`5ec6068f MSHTML!CServer::DoUIActivate+0x27
105 00000000`0008ca20 00007ffd`5ed682b5 MSHTML!CServer::DoVerb+0xaf
106 00000000`0008ca80 00007ffd`6d88681d MSHTML!CMarkup::Navigate+0x61
107 00000000`0008caf0 00007ffd`6d886dc0 ieframe!CDocObjectHost::_ActivateMsoView+0x91
108 00000000`0008cb90 00007ffd`6d888912 ieframe!CDocObjectHost::UIActivate+0x78
109 00000000`0008cbc0 00007ffd`6d8b0ad6 ieframe!CDocObjectView::UIActivate+0x22
10a 00000000`0008cbf0 00007ffd`6d8b21c1 ieframe!CBaseBrowser2::_UIActivateView+0xca
10b 00000000`0008cc20 00007ffd`6daa60c9 ieframe!CBaseBrowser2::v_ActivatePendingView+0x291
10c 00000000`0008ed70 00007ffd`6d9b82cb ieframe!CWebBrowserSB::v_ActivatePendingView+0x19
10d 00000000`0008eda0 00007ffd`6d8b2666 ieframe!`Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::Create'::`2'::`dynamic atexit destructor for 'moduleSingleton''+0x26e0b
10e 00000000`0008ee10 00007ffd`6daa1cf3 ieframe!CBaseBrowser2::Exec+0xd6
10f 00000000`0008ee70 00007ffd`6d9a6217 ieframe!CWebBrowserSB::Exec+0xd3
110 00000000`0008eef0 00007ffd`6d88273a ieframe!`Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::Create'::`2'::`dynamic atexit destructor for 'moduleSingleton''+0x14d570
111 00000000`0008ef60 00007ffd`6d8825a5 ieframe!CDocObjectHost::_OnReadyState+0x152
112 00000000`0008f1f0 00007ffd`6d8824c7 ieframe!CDocObjectHost::_OnChangedReadyState+0xd1
113 00000000`0008f2c0 00007ffd`5eba97d6 ieframe!CDocObjectHost::OnChanged+0x17
114 00000000`0008f2f0 00007ffd`5eba916d MSHTML!CBase::FirePropertyNotify+0x2f6
115 00000000`0008f390 00007ffd`5ead3cdc MSHTML!CMarkup::SetReadyState+0xfd
116 00000000`0008f3e0 00007ffd`5e960009 MSHTML!CMarkup::SetInteractiveInternal+0x36c
117 00000000`0008f720 00007ffd`5ead169d MSHTML!CMarkup::RequestReadystateInteractive+0xb5
118 00000000`0008f780 00007ffd`5ebbf9cc MSHTML!CMarkup::BlockScriptExecutionHelper+0x129
119 00000000`0008f7d0 00007ffd`5ead1abe MSHTML!CHtmPost::Exec+0x5dc
11a 00000000`0008f9d0 00007ffd`5ead199f MSHTML!CHtmPost::Run+0x32
11b 00000000`0008fa00 00007ffd`5ead185d MSHTML!PostManExecute+0x63
11c 00000000`0008fa40 00007ffd`5eb86780 MSHTML!PostManResume+0xa1
11d 00000000`0008fa80 00007ffd`5eb7042c MSHTML!CHtmPost::OnDwnChanCallback+0x40
11e 00000000`0008fad0 00007ffd`5ea97b71 MSHTML!CDwnChan::OnMethodCall+0x1c
11f 00000000`0008fb00 00007ffd`5ea90b39 MSHTML!GlobalWndOnMethodCall+0x251
120 00000000`0008fbb0 00007ffd`9a11bc50 MSHTML!GlobalWndProc+0xf9
121 00000000`0008fc40 00007ffd`9a11b5cf USER32!UserCallWinProcCheckWow+0x280
122 00000000`0008fda0 00000000`0045163e USER32!DispatchMessageWorker+0x19f
123 00000000`0008fe20 000000c0`42098d58 FindFilesGo+0x5163e
124 00000000`0008fe28 00000000`00790ca8 0x000000c0`42098d58
125 00000000`0008fe30 000000c0`42098cf0 FindFilesGo+0x390ca8
126 00000000`0008fe38 000000c0`42098cf0 0x000000c0`42098cf0
127 00000000`0008fe40 00000000`009d5be8 0x000000c0`42098cf0
128 00000000`0008fe48 00000000`00000000 0x9d5be8

Go code doing the windows message pump https://github.com/lxn/walk/blob/master/form.go#L336

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 10, 2017

For easy reproduction, I've created https://github.com/kjk/go20975

@aclements

This comment has been minimized.

Copy link
Member

commented Jul 10, 2017

Stack sizes isn't a knob programmers should think about in Go.

The problem isn't in Go code.

The problem is that Go linker sets very small stack size (I think 128 kB) in PE executables. The standard on Windows is more like 1-2 MB.

Based on my reading of the code, the stack should be set to 2MB in cgo binaries. It's only 128KB in pure Go binaries. Could you double check what the PE header says?

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 10, 2017

0:000> !dh 400000 -f

File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
    8664 machine (X64)
       C number of sections
       0 time date stamp
  4D4400 file pointer to symbol table
    2126 number of symbols
      F0 size of optional header
     223 characteristics
            Relocations stripped
            Executable
            App can handle >2gb addresses
            Debug information stripped

OPTIONAL HEADER VALUES
     20B magic #
    3.00 linker version
  32FE00 size of code
   1B600 size of initialized data
       0 size of uninitialized data
   51440 address of entry point
    1000 base of code
         ----- new -----
0000000000400000 image base
    1000 section alignment
     200 file alignment
       3 subsystem (Windows CUI)
    4.00 operating system version
    1.00 image version
    4.00 subsystem version
  56B000 size of image
     600 size of headers
       0 checksum
0000000000020000 size of stack reserve
0000000000001000 size of stack commit
0000000000100000 size of heap reserve
0000000000001000 size of heap commit

0x20000 == 131,072 bytes.

The thing is, this code doesn't use cgo. All calls to win libraries are via syscall.

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 10, 2017

The discussion in lxn/walk#261 claims that adding a dummy import _ "runtime/cgo" to the program would force the large stack but it doesn't seem to be the case in my testing. Maybe that changed sometime before 1.8.3.

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 10, 2017

Scratch that, it probably would have worked if I had gcc installed:

C:\Go\pkg\tool\windows_amd64\link.exe: running gcc failed: exec: "gcc": executable file not found in %PATH%
@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 11, 2017

I had quick look at how we set reserved stack size.

For Cgo program, it is 1M for 386 and 2M for amd64.
Search for oh64.SizeOfStackReserve and oh.SizeOfStackReserve in cmd/link/internal/ld/pe.go for first thread. All other threads start with

_beginthread(..., 0, ...)

in runtime/cgo/gcc_windows_386.c and runtime/cgo/gcc_windows_amd64.c, and _beginthread defaults to what first thread does.

For non-Cgo program, it is 128K.
Search for oh64.SizeOfStackReserve and oh.SizeOfStackReserve in cmd/link/internal/ld/pe.go for first thread. All other threads start with

stdcall6(_CreateThread, ..., 0x20000, ..., _STACK_SIZE_PARAM_IS_A_RESERVATION, ...)

So Cgo programs run with standard (by Windows standards) stacks. But non-Cgo programs run with small stacks. CL 2237 doubled the stack size (from 64K to 128K) already. @kjk can you, please, try make it 256K for non-Cgo - see if it fixes your problem. Maybe we can increase stack size again. I don't think it matters on amd64, and we have less and less of 386 computers.

Thank you

Alex

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 11, 2017

I tried 256kb, 512kb, 1mb, 2mb. My websites stops crashing at 512mb. Other websites need more. My vote would be for 2MB on win/64bit.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 11, 2017

My websites stops crashing at 512mb.

I was hoping it would work with 256K.

My vote would be for 2MB on win/64bit.

What about 386? Should we make stacks different on 386 and amd64?

Alex

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 11, 2017

For 386 the standard seems to be 1 MB.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 11, 2017

For 386 the standard seems to be 1 MB.

This will only allow Go program to start about 1000 threads on 386. See my comment https://go-review.googlesource.com/#/c/2237/2//COMMIT_MSG@18 Is that acceptable?

Alex

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 11, 2017

1000 real OS threads on 32 bit windows should be more than enough for anybody. Any program with that many threads would trash the CPU anyway - there's really no point launching more than ~NUMCPU os threads.

It's really mostly about main GUI thread, which is the thread pumping the message loop which ends up calling wndproc of windows and the code called from there.

That's the main thread that is created outside of control of Go runtime and uses the stack size from PE header.

I imagine other threads created by the Go runtime can have a stack size independent of PE value (given to CreateThread) and there's argument that those threads are not expected to have such heavy nesting as main GUI thread so could get away with less stack.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 11, 2017

1000 real OS threads on 32 bit windows should be more than enough for anybody.

It is plenty for me. But other people might think differently.

Any program with that many threads would trash the CPU anyway - there's really no point launching more than ~NUMCPU os threads.

I really do not know. I will let @aclements decide.

It's really mostly about main GUI thread, which is the thread pumping the message loop which ends up calling wndproc of windows and the code called from there.

That's the main thread that is created outside of control of Go runtime and uses the stack size from PE header.

GUI does not have to run on the main thread. You can even run multiple GUI threads.

I imagine other threads created by the Go runtime can have a stack size independent of PE value (given to CreateThread) and there's argument that those threads are not expected to have such heavy nesting as main GUI thread so could get away with less stack.

I think these different thread sizes for different environments are already complicated enough. I do not want to make things even more complicated.

Alex

@aclements

This comment has been minimized.

Copy link
Member

commented Jul 11, 2017

Can you explain how a "system call" wound up deep in the MSHTML library? (Every time I think I'm starting to get a feel for Windows there's something even more mystifying...) Skimming over lxn, it looks like the Go syscall package has provided enough rope to actually call into arbitrary Windows libraries without going through cgo, but I'm still not sure how that wound up in MSHTML...

Maybe a call to syscall.LoadLibrary (or syscall.GetProcAddress?) from outside std should trigger larger stacks?

Any program with that many threads would trash the CPU anyway - there's really no point launching more than ~NUMCPU os threads.

I wish this were true. :) There's (basically) no point in launching more than NUMCPU compute threads, but threads are also the unit of blocking for many operations, including many system calls and cgo calls. We generally do a good job of mapping blocking IO on to asynchronous system calls, but other things consume extra threads. So it's really NUMCPU + the number of things that can be blocked or in cgo simultaneously.

I imagine other threads created by the Go runtime can have a stack size independent of PE value (given to CreateThread) and there's argument that those threads are not expected to have such heavy nesting as main GUI thread so could get away with less stack.

Unless you're calling runtime.LockOSThread from init (maybe you are), Go makes no distinction between the "main" thread and other threads created later. It's just a big thread pool to the runtime.

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 11, 2017

Can you explain how a "system call" wound up deep in the MSHTML library?

Please see the callstack in #20975 (comment)

The structure of Window GUI app is:

  • you create your windows
  • you run the messsage pump

Message pump is:

	for fb.hWnd != 0 {
		switch win.GetMessage(&msg, 0, 0, 0) {
		case 0:
			return int(msg.WParam)

		case -1:
			return -1
		}

		if !win.IsDialogMessage(fb.hWnd, &msg) {
			win.TranslateMessage(&msg)
			win.DispatchMessage(&msg)
		}
	}

DispatchMessage is a syscall:

func DispatchMessage(msg *MSG) uintptr {
	ret, _, _ := syscall.Syscall(dispatchMessage, 1,
		uintptr(unsafe.Pointer(msg)),
		0,
		0)

	return ret
}

This triggers arbitrarily deep processing inside windows code. In this case mshtml control was told to display a website, so, after downloading html, it started to parse and render it and that happened on main GUI thread, as part of processing a message that was triggered with DispatchMessage.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 12, 2017

Maybe a call to syscall.LoadLibrary (or syscall.GetProcAddress?) from outside std should trigger larger stacks?

LoadLibrary loads DLL from a file.
GetProcAddress finds a function (by name) inside of DLL that you just loaded into memory.
Once you have access to the function (address of it), and you know what parameters it takes, you can call it.
All these functions, obviously, assume you use standard Windows stack. Simple functions use very little stack, but complicated might use a lot.
Microsoft provide a lot of DLLs packed with different functions that do different things for you.
I have not used MSHTML DLL myself, but, from what I understand, it can display full browser inside of your GUI application. Not surprising it would use a lot of stack.
But I would say MSHTML DLL is probably extreme case here.

Alex

@aclements

This comment has been minimized.

Copy link
Member

commented Jul 12, 2017

@kjk, thanks for the explanation. It's interesting that message loops haven't changed since I last wrote Windows GUI code 15+ years ago. :) I take it some other part of your application set up the MSHTML control, again without using any cgo?

Once you have access to the function (address of it), and you know what parameters it takes, you can call it.

Right. My point is that we're now well outside the realm of what's really considered a "system call". I realize the distinction is blurry on Windows, but I assert that all of the "system calls" exposed by the std syscall package on Windows use very little stack and then enter the kernel (or use clever tricks to stack in user space, but still use very little stack). The stack size is set based on this assumption.

The lxn package may be using syscall.Syscall, but these really aren't system calls in the traditional sense, and they clearly violate the stack assumption. This was intended to be the realm of cgo calls, but the syscall package gives you enough power to make these calls without cgo on Windows.

That suggests that we need to either always use 2MB stacks on Windows, or we need to do something cleverer to detect if you're escaping into the deep C world, such as observing LoadLibrary calls from outside std.

/cc @randall77

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jul 12, 2017

I note that the linker already changes its behavior based on whether reflect.Value.Call is linked into the program. We could similarly detect syscall.LoadDLL.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 13, 2017

I take it some other part of your application set up the MSHTML control, again without using any cgo?

Correct. The MSHTML control is just one of many used to build GUI windows. So you create a window, then you add controls to it (buttons, edit controls, list and others) - MSHTML control is just another control. You do all that by calling Windows APIs. I can give you some links, if you are interested, but these are just functions that are loaded from DLLs. There are some some complications, like for example, all GUI code must run on single thread (syscall.LockThread) and it uses callbacks (syscall.NewCallback), but Go program is no different from others that use theses DLLs.

but I assert that all of the "system calls" exposed by the std syscall package on Windows use very little stack and then enter the kernel (or use clever tricks to stack in user space, but still use very little stack). The stack size is set based on this assumption

On Windows every thread (including first) starts with standard Windows stack. All Windows syscalls switch to that stack before calling DLL functions. These DLL functions are called in user space. Some of them switch to kernel space, but some (for example if you build C DLL yourself) do not switch to kernel space.

they clearly violate the stack assumption

I do not see what is violated.

to either always use 2MB stacks on Windows

We can do that. @kjk program stops crashing at 512kb, so we can go with that instead.

Alex

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 13, 2017

I don't think 512 kb is enough. It's where the simplest program stopped crashing on the simplest website. I haven't done much testing beyond that but it stands to reason other websites (or other programs) might require more than 512 kb.

I think the value should be what the standard for windows linker is i.e. 1 MB (https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx), which seems to be the same for 32bit and 64bit (https://blogs.technet.microsoft.com/markrussinovich/2009/07/05/pushing-the-limits-of-windows-processes-and-threads/).

Less than that and Go code will run into this crash more often than C code.

And I would still like the linker option to bump it to larger value if I determine that my particular application needs that, especially on 64bit where larger reserved stack has essentially no penalty because address space is aplenty.

I get the desire that make it "just work" but you can't in this particular case.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jul 13, 2017

I don't think 512 kb is enough. It's where the simplest program stopped crashing on the simplest website. I haven't done much testing beyond that but it stands to reason other websites (or other programs) might require more than 512 kb.

Please, see if you can break Go with 512kb stack.

Thank you.

Alex

@kjk

This comment has been minimized.

Copy link
Author

commented Jul 13, 2017

There is no doubt that I can. I'll just allocate more than 512 kb on the stack.

I broke 256 kb on literally one of the first real programs I wrote that used mshtml. There's infinite number of websites and infinite number of programs that use other potentially stack hungry libraries.

If I'm intentional about it then I'll break any limit. If I'm not intentional and just writing random windows programs then I have infinite search space.

The question is: what is a reasonably safe value.

And the answer is: what everyone else is using i.e. 1 MB.

Those are the constraints under which Window OS developers and other WIndows developers operate. If they write code that uses more than 1 MB of stack, their programs will crash, so they have incentive to stay under that limit.

The lower the limit the higher probability that Go programs will crash because they'll hit the limit.

I'll also note that there might more bad going on than just stack size limit. When I use the repro program I linked to and visit https://vox.com, it crashes even if I set very large limit, like 2 MB. Sometimes it prints "fatal: more gc" sometimes it doesn't and it crashes it a very bad way (not an orderly panic that prints a crashing callstack but a vanishing act where process is wiped out without a trace).

Unfortunately delve doesn't debug this kind of code at all and windbg doesn't understand Go symbols and I know nothing about debugging runtime so that's as deep as I can go on this.

But I do have a consistent repro of a bad crash.

@aclements

This comment has been minimized.

Copy link
Member

commented Jul 13, 2017

I note that the linker already changes its behavior based on whether reflect.Value.Call is linked into the program. We could similarly detect syscall.LoadDLL.

@ianlancetaylor, I believe LoadDLL will always be linked into Windows programs because it's used by the syscall package itself. The linker would have to distinguish uses in syscall from uses outside the package. Alternatively, we could refactor syscall a bit so it uses some internal function. That's slightly tricky because it's a few layers deep, but might be easier than figuring out what calls it in the linker.

(syscall.LoadLibrary doesn't have this problem since we already have the exported/internal split.)

they clearly violate the stack assumption

I do not see what is violated.

@alexbrainman, I think we're talking past each other, since the violation I'm talking about is what this whole issue is about. The runtime assumes anything called a "syscall" will not consume more than 128kB of user-space stack. I believe that is true of everything exported by the syscall package itself. @kjk's application makes a "syscall" that overflows the 128kB stack, hence violating the runtime's assumption about stack usage.

And the answer is: what everyone else is using i.e. 1 MB.

Those are the constraints under which Window OS developers and other WIndows developers operate. If they write code that uses more than 1 MB of stack, their programs will crash, so they have incentive to stay under that limit.

@kjk, I think this is an excellent reason to raise the size to the default when there's any possibility of a call to regular Windows code (which we clearly need to be more conservative about). Maybe we should just always do this on 64-bit. But it also suggests that a linker flag to set it manually may be overkill? Unless it's common to build Windows applications with larger stacks?

I'll also note that there might more bad going on than just stack size limit.

What if you try setting it to something significantly larger? Say, 32MB?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2017

@havoc-io Sure, TLS may be involved. I don't know.

But what I was trying to explain above is that when a program is built without cgo, simply setting the stack size does absolutely nothing. When a program is built with cgo, then setting the stack size does change the stack size. Or so it seems to me.

@kjk

This comment has been minimized.

Copy link
Author

commented Aug 9, 2017

I might be wrong but _beginthread, when given 0 as stack size as in gcc_windows_amd64.c, uses the stack size provided in PE, so editbin would change it:

https://msdn.microsoft.com/en-us/library/kdzttdcb.aspx

The operating system handles the allocation of the stack when either _beginthread or _beginthreadex is called; you don't have to pass the address of the thread stack to either of these functions. In addition, the stack_size argument can be 0, in which case the operating system uses the same value as the stack that's specified for the main thread.

Also, the thread that crashes is main thread, which, I believe, is created by the OS.

Also, looking at the callstack of the crash it's pretty clear that the first bug to investigate and fix is the fact that Go runtime has infinite loop:

00 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
01 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
02 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
03 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822

I haven't succeeded in getting symbols for the stack but it seems the crash happens because A calls B which calls A etc. and eventually that will run out of stack regardless of how much stack there is.

@havoc-io

This comment has been minimized.

Copy link

commented Aug 9, 2017

@ianlancetaylor Ah, so your point was that setting the stack size via editbin.exe would have no effect on the stack size for threads created via newosproc, which is the thread creation mechanism for the non-CGO case, so that perhaps playing with editbin.exe to set stack size wasn't really doing anything when we hadn't compiled with CGO. Is newosproc also used for creating the main thread, or would that thread's size be taken directly from the PE headers? I ask because the crash seems to be happening on the main thread (which is locked at program start in order to invoke GUI code). Also, what about threads created inside a DLL - would they be re-routed through newosproc or is that mechanism only really used for creating new M threads?

In any case, the stack size should be sufficiently large even in the non-CGO case in the Go 1.9 RCs (it's at least as big as the stacks for C/C++ binaries that use IE via COM/OLE and don't crash), but since the reproducer seems to crash with the Go 1.9 RCs, it definitely seems like there's something beyond stack size at play here.

@havoc-io

This comment has been minimized.

Copy link

commented Aug 9, 2017

@kjk You're right about the behavior of _beginthread, but I think his point was that the code in gcc_windows_amd64.c won't even be called in the non-CGO case. But I think you're right (and that's why I've asked) that the main thread will inherit the stack size from the PE headers since it's created by the OS, so that even in the non-CGO case, at least that thread should have a decent stack size if set by editbin.exe, so it wouldn't make sense for it to be running out.

But again, in any case, Go 1.9 RCs should be giving every thread loads of stack, and something is still clearly broken.

@kjk

This comment has been minimized.

Copy link
Author

commented Aug 10, 2017

After stepping through the code in the debugger, I don't think it's related to stack size.

The issue is that we have:

0:000> kb
 # RetAddr           : Args to Child                                                           : Call Site
00 00000000`0045104a : 00000000`004519ee 00000000`004304f0 00000000`00b9fef0 00000000`00000000 : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
01 00000000`004519ee : 00000000`004304f0 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 : go20975!runtime.exitsyscallfast.func1+0xaa [C:\Go\src\runtime\proc.go @ 2717] 
02 00000000`004304f0 : 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 00000000`00455804 : go20975!runtime.systemstack+0x7e [C:\Go\src\runtime\asm_amd64.s @ 347] 
03 00000000`00b9fef0 : 00000000`00000000 00007ffa`59b6e618 00000000`00455804 00000000`006307d8 : go20975!runtime.mstart [C:\Go\src\runtime\proc.go @ 1125] 
04 00000000`00000000 : 00007ffa`59b6e618 00000000`00455804 00000000`006307d8 00000000`00b8be40 : 0xb9fef0

We are in:

TEXT runtime·morestack(SB),NOSPLIT,$0-0
	// Cannot grow scheduler stack (m->g0).
	get_tls(CX)
	MOVQ	g(CX), BX
	MOVQ	g_m(BX), BX
	MOVQ	m_g0(BX), SI
	CMPQ	g(CX), SI
	JNE	3(PC)
	CALL	runtime·badmorestackg0(SB)
	INT	$3

The source of the problem, it seems, is that morestack was called on scheduler stack i.e. g.m.g0 == g.

Best I can tell, this is called from exitsyscallfast:

	if sched.pidle != 0 {
		var ok bool
		systemstack(func() {
			ok = exitsyscallfast_pidle()
			if ok && trace.enabled {
				if oldp != nil {
					// Wait till traceGoSysBlock event is emitted.
					// This ensures consistency of the trace (the goroutine is started after it is blocked).
					for oldp.syscalltick == _g_.m.syscalltick {
						osyield()
					}
				}
				traceGoSysExit(0)
			}
		})
		if ok {
			return true
		}
	}
	return false

It seems like systemstack is supposed to ensure that the function called is not on scheduler stack but sometimes it fails.

Then the called function has unconditional call to morecallstack which triggers call to badmorestack0 which seems to be "should never happen" thing.

Unfortunately systemstack purposefully cuts off callstack, so I can't tell which function calls it but this is correlated with mshtml showing a modal dialog, so it most likely has to do with nested Go => C => Go transitions.

@kjk

This comment has been minimized.

Copy link
Author

commented Aug 10, 2017

Narrowing it down further, it seems like it's always a result of calling back from C to Go:

00 00000000`00b975c0 : go20975!runtime.exitsyscallfast+0xcc [C:\Go\src\runtime\proc.go @ 2717] 
01 00000000`00000028 : 0xb975c0
02 00000000`00450fa0 : 0x28
03 000000c0`42069357 : go20975!runtime.exitsyscallfast.func1 [C:\Go\src\runtime\proc.go @ 2717] 
04 000000c0`42019300 : 0x000000c0`42069357
05 000000c0`4201e000 : 0x000000c0`42019300
06 000000c0`420693a8 : 0x000000c0`4201e000
07 00000000`00434777 : 0x000000c0`420693a8
08 00000003`00000002 : go20975!runtime.exitsyscall+0x67 [C:\Go\src\runtime\proc.go @ 2630] 
09 000000c0`4201e000 : 0x00000003`00000002
0a 000000c0`4201e000 : 0x000000c0`4201e000
0b 000000c0`42019300 : 0x000000c0`4201e000
0c 000000c0`42069410 : 0x000000c0`42019300
0d 00000000`004026bc : 0x000000c0`42069410
0e 00000000`00000000 : go20975!runtime.cgocallbackg+0x7c [C:\Go\src\runtime\cgocall.go @ 184] 

I captured it by using runtime.exitsyscallfast+0xcc "~. kb; g" which , in windbg, breaks in exitsyscallfast right before calling systemstack, prints callstack (kb) and continues (g).

systemstack will then call lamda, which will call morestack which will crash with cut off callstack.

The last callstack is the one which triggered systemstack. It's still not the full callstack.

@kjk

This comment has been minimized.

Copy link
Author

commented Aug 10, 2017

This seems to be limited to amd64. I spent a while trying to repro this on 386 build and couldn't.

@havoc-io

This comment has been minimized.

Copy link

commented Aug 10, 2017

Just to be clear - this debugging is on non-CGO binaries (i.e. those without a blank runtime/cgo import)? I noticed the runtime.cgocallbackg frame in your stack trace, but I guess maybe this CGO functionality is re-used in the Windows syscall implementation?

@kjk

This comment has been minimized.

Copy link
Author

commented Aug 10, 2017

Yes, non-CGO, latest tests are with go 1.9 rc2

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Aug 11, 2017

Is newosproc also used for creating the main thread, or would that thread's size be taken directly from the PE headers?

newosproc is not used to create main thread. main thread is created by Windows process loader. The loader uses pe file parameters to configure main thread stack.

Also, what about threads created inside a DLL - would they be re-routed through newosproc or is that mechanism only really used for creating new M threads?

If any non Go code calls Windows CreateThread API, the call cannot magically "be re-routed through newosproc". In fact this scenario is broken at this moment (see issue #6751).

I would not be surprised if your problem is a dup of #6751. Given how much external code you use, can you be certain that none of that code calls Go code on a thread that has not been created by Go?

Alex

@kjk

This comment has been minimized.

Copy link
Author

commented Aug 11, 2017

I would not be surprised if your problem is a dup of #6751. Given how much external code you use, can you be certain that none of that code calls Go code on a thread that has not been created by Go?

The crash always happen on the main thread.

If you read earlier comments I think I narrowed it down to morestack crashing because it detects we're on scheduler stack even though it's being called via systemstack() whose purpose seems to be ensuring that doesn't happen.

So my best guess is that systemstack on amd64 doesn't always do the job (the issue doesn't happen on i386 which is not unexpected because it's different assembly code).

gopherbot pushed a commit that referenced this issue Jan 3, 2018

runtime: always use 1MB stacks on 32-bit Windows
Commit c2c07c7 (CL 49331) changed the linker and runtime to always
use 2MB stacks on 64-bit Windows. This is the corresponding change to
make 32-bit Windows always use large (1MB) stacks because it's
difficult to detect when Windows applications will call into arbitrary
C code that may expect a large stack.

This is done as a separate change because it's possible this will
cause too much address space pressure for a 32-bit address space. On
the other hand, cgo binaries on Windows already use 1MB stacks and
there haven't been complaints.

Updates #20975.

Change-Id: I8ce583f07cb52254fb4bd47250f1ef2b789bc490
Reviewed-on: https://go-review.googlesource.com/49610
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>

@ianlancetaylor ianlancetaylor modified the milestones: Go1.10, Go1.11 Jan 3, 2018

@xgfone

This comment has been minimized.

Copy link

commented May 17, 2018

Hi @kjk, I have tested your app.

test.manifest is from https://github.com/lxn/walk#create-manifest-testmanifest .

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
    <assemblyIdentity version="1.0.0.0" processorArchitecture="*" name="SomeFunkyNameHere" type="win32"/>
    <dependency>
        <dependentAssembly>
            <assemblyIdentity type="win32" name="Microsoft.Windows.Common-Controls" version="6.0.0.0" processorArchitecture="*" publicKeyToken="6595b64144ccf1df" language="*"/>
        </dependentAssembly>
    </dependency>
    <asmv3:application>
        <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
            <dpiAware>true</dpiAware>
        </asmv3:windowsSettings>
    </asmv3:application>
</assembly>
$ go get -u github.com/lxn/win
$ go get -u github.com/lxn/walk
$ go get -u github.com/akavel/rsrc
$ rsrc -manifest test.manifest -o rsrc.syso
$ go build -ldflags="-H windowsgui"
$ ./testwalk.exe   # It can run as expected, not be crashed!

My Windows OS is Windows 10 64 Bit. Go version is as follow.

$ go version
go version go1.10 windows/amd6

$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\xgf\AppData\Local\go-build
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=D:\gopath
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\xgf\AppData\Local\Temp\go-build028597650=/tmp/go-build -gno-record-gcc-switches

/all, FYI!


Updated:

2018517103740

201851710389

@kjk

This comment has been minimized.

Copy link
Author

commented May 17, 2018

@xgfone my website (https://blog.kowalczyk.info) is no longer a good test because I fixed a bug on my website generator that created mis-formatted, deeply nested html that triggered this problem easily.

Try https://vox.com or some other complicated website and make sure to click on links etc. It doesn't necessarily just crash on first render.

@xgfone

This comment has been minimized.

Copy link

commented May 17, 2018

@kjk Yes, I have also tested https://vox.com.

It works after starting, and the program does not crash. Then I clicked and opened some links. When going on clicking a link, it crashed as follow.

# ...
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
Segmentation fault

The number of the buffer line of my console is 10000, and the output has exceeded the whole console buffer.

FYI.

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 19, 2018

If you read earlier comments I think I narrowed it down to morestack crashing because it detects we're on scheduler stack even though it's being called via systemstack() whose purpose seems to be ensuring that doesn't happen.

Based on your debugging, both systemstack and morestack appear to be doing what they're supposed to. systemstack switches to the scheduler (aka system aka g0) stack, and morestack detects that we ran out of space on the scheduler stack. systemstack isn't supposed to ensure morestack doesn't crash.

The question is why we ran out of space on the scheduler stack. Are you able to print the value of RSP and g0.stack.lo and g0.stack.hi at the crash from windbg?

When going on clicking a link, it crashed as follow.

fatal: morestack on g0
fatal: morestack on g0

I suspect I know why it's repeating until it crashes hard. After printing this message, it does a runtime.abort, which raises an INT $3, which I suspect is going into the runtime's exception handler, which eventually tries to call runtime.exceptionhandler on the g0 stack. That detects that we're out of g0 stack space and aborts again. @alexbrainman, does this sound plausible? Is there a way we can abort the process without recursively invoking the exception handler? Or maybe runtime.sigtramp should detect debug exceptions and return without calling runtime.exceptionhandler?

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 20, 2018

Actually, the fact that it was able to print >10000 "morestack on g0" errors before segfaulting is telling. I assume the OS stack has a guard page, which means there must be a lot more space on the OS stack than Go thinks there is.

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 20, 2018

Oh, I'm pretty sure I know what's going on. While we allocate large stacks for both the main thread and new threads, both rt0_go (in the non-cgo case) and tstart_stdcall (only called in the non-cgo case) set the g0 stack bounds assuming the stack is only 64k. Hence, if C code calls back into Go code, it's entirely possible that we have plenty of stack space left, but we've gone past the 64k the runtime thinks it has.

(Edited to add: in the cgo case it looks like we get the stack bounds right for all of the threads. x_cgo_init in runtime/cgo/gcc_windows_amd64.c updates the bounds for the main thread and threadentry updates them for new threads.)

@kjk

This comment has been minimized.

Copy link
Author

commented Jun 20, 2018

@aclements That makes sense.

I'm not sure I'll be capable of testing unreleased Go version but if you have a fix, it should be easy to verify using https://github.com/kjk/go20975 test program and going e.g. to vox.com and clicking around.

It should crash 100% with current Go and not crash after the fix.

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 20, 2018

@kjk, thanks. It would be easy if I had a graphical Windows machine/VM. :)

@alexbrainman, can you think of an easy automated test for this? I think I need a C function in a DLL that uses lots of stack and then calls back into Go, but all in a non-cgo binary.

@gopherbot

This comment has been minimized.

Copy link

commented Jun 20, 2018

Change https://golang.org/cl/120195 mentions this issue: runtime: initialize g0 stack bounds on Windows to full stack

@gopherbot

This comment has been minimized.

Copy link

commented Jun 21, 2018

Change https://golang.org/cl/120336 mentions this issue: runtime: query thread stack size from OS on Windows

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jun 23, 2018

I suspect I know why it's repeating until it crashes hard. After printing this message, it does a runtime.abort, which raises an INT $3, which I suspect is going into the runtime's exception handler, which eventually tries to call runtime.exceptionhandler on the g0 stack. That detects that we're out of g0 stack space and aborts again. @alexbrainman, does this sound plausible?

It does sounds reasonable (you assume that runtime.morestack is called on runtime.exceptionhandler entry).

Is there a way we can abort the process without recursively invoking the exception handler?

I suppose we could call runtime.exit instead of executing INT 3.

Or maybe runtime.sigtramp should detect debug exceptions and return without calling runtime.exceptionhandler?

We do handle debug exceptions (search for _EXCEPTION_BREAKPOINT), but it is too late for this scenario. So, sure, we could modify runtime.sigtramp to handle this situation. Maybe we could mark runtime.exceptionhandler or at least some part of it as "no to grow stack" instead?

I think I need a C function in a DLL that uses lots of stack and then calls back into Go, but all in a non-cgo binary.

We have quite a few tests in runtime package that builds C dlls on the fly and use them to test runtime code. See TestStdcallAndCDeclCallbacks, TestReturnAfterStackGrowInCallback, TestFloatArgs and TestDLLPreloadMitigation.

Alex

@gopherbot gopherbot closed this in 52e782a Jul 2, 2018

gopherbot pushed a commit that referenced this issue Jul 2, 2018

runtime: query thread stack size from OS on Windows
Currently, on Windows, the thread stack size is set or assumed in many
different places. In non-cgo binaries, both the Go linker and the
runtime have a copy of the stack size, the Go linker sets the size of
the main thread stack, and the runtime sets the size of other thread
stacks. In cgo binaries, the external linker sets the main thread
stack size, the runtime assumes the size of the main thread stack will
be the same as used by the Go linker, and the cgo entry code assumes
the same.

Furthermore, users can change the main thread stack size using
editbin, so the runtime doesn't even really know what size it is, and
user C code can create threads with unknown thread stack sizes, which
we also assume have the same default stack size.

This is all a mess.

Fix the corner cases of this and the duplication of knowledge between
the linker and the runtime by querying the OS for the stack bounds
during thread setup. Furthermore, we unify all of this into just
runtime.minit for both cgo and non-cgo binaries and for the main
thread, other runtime-created threads, and C-created threads.

Updates #20975.

Change-Id: I45dbee2b5ea2ae721a85a27680737ff046f9d464
Reviewed-on: https://go-review.googlesource.com/120336
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.