Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

Open
josharian opened this issue May 24, 2019 · 6 comments

Comments

@josharian
Copy link
Contributor

commented May 24, 2019

At b84e0bc, on darwin/amd64:

$ go test -run=TestDWARF/testprog -count=10 cmd/link
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (2.03s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.50s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.78s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.52s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.96s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.83s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.98s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#02 (1.99s)
        dwarf_test.go:96: exit status 255
FAIL
FAIL	cmd/link	47.577s
FAIL

I've also seen

--- FAIL: TestDWARF (0.22s)
    --- FAIL: TestDWARF/testprog (3.00s)
        dwarf_test.go:96: exit status 255
FAIL

The failures are intermittent.

cc @dr2chase

@cherrymui

This comment has been minimized.

Copy link
Contributor

commented May 24, 2019

Happened for me as well, with exactly same error. I commented on CL https://go-review.googlesource.com/c/go/+/170377

@odeke-em

This comment has been minimized.

Copy link
Member

commented Jun 23, 2019

Running on c11f6c4

go version devel +c11f6c4929 Fri Jun 21 21:24:47 2019 +0000 darwin/amd64

uname

uname -a
Darwin Emmanuels-MacBook-Pro-2.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

On the first try I get a failure but this time spectacularly with a 1.2MB log whose beginning is something like

--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprogcgo#01 (2.33s)
        dwarf_test.go:96: symbols testprogcgo.exe: exit status 255: /var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [arm64, 0.004386 seconds]:
                A829102B-EE24-329C-874B-9BAAE2337C79 /private/var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [AOUT, PIE, FaultedFromDisk]
                    0x0000000000000000 (0x100000000) __PAGEZERO SEGMENT
                    0x0000000100000000 ( 0xbc000) __TEXT SEGMENT
                        0x0000000100000000 (  0x4c90) MACH_HEADER
                        0x0000000100004c90 ( 0xb67a4) __TEXT __text
                            0x0000000100004c90 (    0x68) runtime.text [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004cf8 (    0x40) internal/cpu.Initialize [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004d38 (   0x620) internal/cpu.processOptions [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005358 (    0x50) internal/cpu.indexByte [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x00000001000053a8 (   0x680) internal/cpu.doinit [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a28 (    0x10) type..hash.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a38 (    0x10) type..eq.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a48 (    0x50) type..hash.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a98 (    0x60) type..eq.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005af8 (    0x70) type..hash.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005b68 (    0xa0) type..eq.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c08 (    0x50) type..hash.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c58 (    0x50) type..eq.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005ca8 (    0x70) type..hash.[24]internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
....
<FRAMES_INTENTIONALLY_ELIDED_BY_MYSELF_SINCE_IT_IS_A_LARGE_FILE>
                            0x00000001001e3d20 (    0x10) syscall.envOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d30 (    0x10) time.localOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d40 (    0x10) runtime.writeBarrier [NameNList, MangledNameNList, NList]
                            0x00000001001e3d50 (    0x20) syscall.ForkLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3d70 (    0x30) syscall.envLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3da0 (    0x20) runtime.hashkey [NameNList, MangledNameNList, NList]
                            0x00000001001e3dc0 (    0x40) runtime.newmHandoff [NameNList, MangledNameNList, NList]
                            0x00000001001e3e00 (    0x40) runtime.sig [NameNList, MangledNameNList, NList]
                            0x00000001001e3e40 (    0x40) runtime.execLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3e80 (    0x40) runtime.debug [NameNList, MangledNameNList, NList]
                            0x00000001001e3ec0 (    0x40) runtime.finptrmask [NameNList, MangledNameNList, NList]
                            0x00000001001e3f00 (    0x80) runtime.aeskeysched [NameNList, MangledNameNList, NList]
                            0x00000001001e3f80 (    0x80) runtime.handlingSig [NameNList, MangledNameNList, NList]
                            0x00000001001e4000 (    0xa0) internal/cpu.X86 [NameNList, MangledNameNList, NList]
                            0x00000001001e40a0 (    0xa0) internal/cpu.ARM64 [NameNList, MangledNameNList, NList]
                            0x00000001001e4140 (    0xa0) runtime.gcController [NameNList, MangledNameNList, NList]
                            0x00000001001e41e0 (   0x100) runtime.fwdSig [NameNList, MangledNameNList, NList]
                            0x00000001001e42e0 (   0x200) runtime.printBacklog [NameNList, MangledNameNList, NList]
                            0x00000001001e44e0 (   0x400) runtime.zeroVal [NameNList, MangledNameNList, NList]
                            0x00000001001e48e0 (  0x1758) runtime.memstats [NameNList, MangledNameNList, NList]
                    0x00000001001e8000 ( 0x7c000) __LINKEDIT SEGMENT
            symbols[65885]: [fatal]  child process status could not be determined; exit code unavailable.
FAIL
exit status 1
FAIL    cmd/link        47.568s

with the output at https://gist.github.com/odeke-em/85bb51ded80d9239a97c435f875b04c5

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

Some notes:

As far as I can tell, this has never happened on the dashboard based on greplogs -E 'dwarf_test.*(exit status 255|symbols)' -dashboard.

CL 171819 added better logging for this (previously we didn't print the symbols output). That's why the original report just has "exit status 255", but @odeke-em's report has much more information.

Unfortunately, it seems like the symbols tool may just be flaky. Just searching around for that error reveals other people seeing it occasionally in various macOS development tools outside the context of Go.

Could someone with a Mac try just running symbols (or maybe the test) in a loop to see how often it flakes?

@dr2chase

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

It may require Darwin 18.5, which is one back from current.
I've tried running it in a loop with luck at seeing the bug, will try some more.
(Maybe a busy Mac helps?)

@dr2chase

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

About 2000 repetitions on Darwin 18.6 (aka Mojave, macOS 10.14.5), with no failure, on a busy-ish machine. Still running.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

In the output above the symbols program completes, but then reports an error symbols[65885]: [fatal] child process status could not be determined; exit code unavailable. I can see the error string if I run strings on /usr/bin/symbols, so it's definitely coming from there. I can't find any reports of this error on the Interwebs (the top search result for me is in fact this issue). I don't know why the symbols program would start a child process at all. I don't know why it would be unable to determine the status of that child process. I can't find the source code of the symbols program; I don't know if it is available at all.

Moving to 1.14 but if we can't figure out a reproducer, or get the source code, I doubt there is much that we can do.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.13, Go1.14 Aug 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.