New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestWindowsStackMemoryCgo is flaky #22575

Open
alexbrainman opened this Issue Nov 4, 2017 · 10 comments

Comments

Projects
None yet
7 participants
@alexbrainman
Member

alexbrainman commented Nov 4, 2017

CL 74490 has added TestWindowsStackMemoryCgo that has been flaky. It fails with on windows-386-2008 builder

https://build.golang.org/log/6eac250b23e6d93f36a6824e260c69c67bd639be
https://build.golang.org/log/8528fd237adab902144de46f4a1814a162ed1514
https://build.golang.org/log/790eb5b35c79bb3154775987f5294c66715ed85a

--- FAIL: TestWindowsStackMemoryCgo (0.03s)
	crash_cgo_test.go:460: Failed to read stack usage: strconv.Atoi: parsing "59678\r\nThis application has requested the Runtime to terminate it in an unusual way.\nPlease contact the application's support team for more information.\r\nruntime: failed to create new OS thread (12)\r\n": invalid syntax
FAIL
FAIL	runtime	19.962s

on windows-amd64-2008 builder

https://build.golang.org/log/48659fdd5ce7d5fa4b5ac88009a1bb58a3ca3989
https://build.golang.org/log/b90c3a5bac8098da6dbbd0f1de78b349885d064f

and on windows-amd64-race builder

https://build.golang.org/log/c15b87b42bd461acb0066ca98b8f22aa982b5fc2

The error 12 is (from https://docs.microsoft.com/en-us/cpp/c-runtime-library/errno-doserrno-sys-errlist-and-sys-nerr ) ENOMEM Not enough memory.

We also had trybots failed with different message

https://storage.googleapis.com/go-build-log/7228f1ad/windows-386-2008_ede19d12.log

--- FAIL: TestWindowsStackMemoryCgo (0.03s)
	crash_cgo_test.go:460: Failed to read stack usage: strconv.Atoi: parsing "69550\r\nThis application has requested the Runtime to terminate it in an unusual way.\nPlease contact the application's support team for more information.\r\nruntime: failed to create new OS thread (13)\r\n": invalid syntax
FAIL
FAIL	runtime	21.491s

13 is EACCES Permission denied

Alex

@bradfitz

This comment has been minimized.

Member

bradfitz commented Nov 4, 2017

Copying my note from the other recent Windows memory bug: these VMs have 3.6 GB of memory.

@alexbrainman

This comment has been minimized.

Member

alexbrainman commented Nov 4, 2017

Also not that windows-amd64-2012 and windows-amd64-2016 are not affected (yet).
And I could not reproduce this failure here even with 500 (instead of 100) threads on both my Windows 10 and XP.

Alex

@gopherbot

This comment has been minimized.

gopherbot commented Nov 4, 2017

Change https://golang.org/cl/76030 mentions this issue: runtime: skip flaky TestWindowsStackMemoryCgo

gopherbot pushed a commit that referenced this issue Nov 4, 2017

runtime: skip flaky TestWindowsStackMemoryCgo
Updates #22575

Change-Id: I1f848768934b7024d2ef01db13b9003e9ca608a0
Reviewed-on: https://go-review.googlesource.com/76030
Reviewed-by: Russ Cox <rsc@golang.org>
@as

This comment has been minimized.

Contributor

as commented Nov 4, 2017

Title: s,falky,flaky,

@alexbrainman alexbrainman changed the title from runtime: TestWindowsStackMemoryCgo is falky to runtime: TestWindowsStackMemoryCgo is flaky Nov 4, 2017

@alexbrainman

This comment has been minimized.

Member

alexbrainman commented Nov 4, 2017

Title: s,falky,flaky,

Done. Thank you.

Alex

@aclements

This comment has been minimized.

Member

aclements commented Nov 6, 2017

Copying/updating some discussion from CL 74490.

The order of the failure output strikes me as incredibly odd. First is the final output of the program just before it (in theory) exits, then the output from abort(), which you'd think would kill the program, then the output from the fprintf just before the abort (maybe stderr is buffered, though that would be really annoying).

I'm wondering if the problem is actually that we're creating a thread at program teardown and it has nothing to do with memory. After all, according to the program's own output it should only be using ~10MB of memory.

We've had similar problems in the past with races between ExitProcess and CreateThread causing CreateThread to fail. We solved this by adding a runtime.exiting flag that inhibits complaining about CreateThread failures. But we don't have the equivalent check in cgo binaries. Maybe we just need to check this flag in _cgo_sys_thread_start?

@rsc

This comment has been minimized.

Contributor

rsc commented Nov 6, 2017

I was going to suggest that it might be an exit vs thread-create as well but I remembered that as being a Linux problem. If there's no check in _cgo_sys_thread_start it certainly seems like there should be. There would probably always be a race window, but at least we can minimize it.

@alexbrainman

This comment has been minimized.

Member

alexbrainman commented Nov 7, 2017

We solved this by adding a runtime.exiting flag that inhibits complaining about CreateThread failures. But we don't have the equivalent check in cgo binaries. Maybe we just need to check this flag in _cgo_sys_thread_start?

How do I check runtime.exiting from _cgo_sys_thread_start? There is no Go code in runtime/cgo, and we need runtime/internal/atomic.Load to do the checking.

Also I still cannot reproduce the failure here. It would be nice to be able to reproduce the failure to make sure we actually fixed it.

Alex

@aclements

This comment has been minimized.

Member

aclements commented Nov 7, 2017

I was going to suggest that it might be an exit vs thread-create as well but I remembered that as being a Linux problem.

I think that particular combination is actually safe (it's exec and thread create on Linux that isn't, or wasn't). Exiting a process on Windows is a complex and decidedly non-atomic process that can race in unfortunate ways with thread creation.

There would probably always be a race window, but at least we can minimize it.

There's actually no race window since we only use this information after CreateThread fails. I suppose technically there's a race the other way: we could incorrectly suppress a true failure when exiting, but I'm okay with that. :)

How do I check runtime.exiting from _cgo_sys_thread_start?

I would suggest giving runtime.exiting a go:linkname so you can access it as an extern variable from C and then just using __sync_fetch_and_add to read it. That should be compatible with atomic.Load.

@alexbrainman

This comment has been minimized.

Member

alexbrainman commented Nov 7, 2017

I would suggest giving runtime.exiting a go:linkname so you can access it as an extern variable from C and then just using __sync_fetch_and_add to read it. That should be compatible with atomic.Load.

Thank you.I will try that. When I have some time.

Alex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment