Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

Open
josharian opened this issue May 24, 2019 · 9 comments
Open

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

josharian opened this issue May 24, 2019 · 9 comments

Comments

@josharian
Copy link
Contributor

@josharian josharian commented May 24, 2019

At b84e0bc, on darwin/amd64:

$ go test -run=TestDWARF/testprog -count=10 cmd/link
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (2.03s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.50s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.78s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.52s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.96s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.83s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.98s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#02 (1.99s)
        dwarf_test.go:96: exit status 255
FAIL
FAIL	cmd/link	47.577s
FAIL

I've also seen

--- FAIL: TestDWARF (0.22s)
    --- FAIL: TestDWARF/testprog (3.00s)
        dwarf_test.go:96: exit status 255
FAIL

The failures are intermittent.

cc @dr2chase

@cherrymui

This comment has been minimized.

Copy link
Contributor

@cherrymui cherrymui commented May 24, 2019

Happened for me as well, with exactly same error. I commented on CL https://go-review.googlesource.com/c/go/+/170377

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 23, 2019

Running on c11f6c4

go version devel +c11f6c4929 Fri Jun 21 21:24:47 2019 +0000 darwin/amd64

uname

uname -a
Darwin Emmanuels-MacBook-Pro-2.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

On the first try I get a failure but this time spectacularly with a 1.2MB log whose beginning is something like

--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprogcgo#01 (2.33s)
        dwarf_test.go:96: symbols testprogcgo.exe: exit status 255: /var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [arm64, 0.004386 seconds]:
                A829102B-EE24-329C-874B-9BAAE2337C79 /private/var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [AOUT, PIE, FaultedFromDisk]
                    0x0000000000000000 (0x100000000) __PAGEZERO SEGMENT
                    0x0000000100000000 ( 0xbc000) __TEXT SEGMENT
                        0x0000000100000000 (  0x4c90) MACH_HEADER
                        0x0000000100004c90 ( 0xb67a4) __TEXT __text
                            0x0000000100004c90 (    0x68) runtime.text [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004cf8 (    0x40) internal/cpu.Initialize [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004d38 (   0x620) internal/cpu.processOptions [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005358 (    0x50) internal/cpu.indexByte [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x00000001000053a8 (   0x680) internal/cpu.doinit [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a28 (    0x10) type..hash.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a38 (    0x10) type..eq.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a48 (    0x50) type..hash.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a98 (    0x60) type..eq.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005af8 (    0x70) type..hash.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005b68 (    0xa0) type..eq.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c08 (    0x50) type..hash.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c58 (    0x50) type..eq.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005ca8 (    0x70) type..hash.[24]internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
....
<FRAMES_INTENTIONALLY_ELIDED_BY_MYSELF_SINCE_IT_IS_A_LARGE_FILE>
                            0x00000001001e3d20 (    0x10) syscall.envOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d30 (    0x10) time.localOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d40 (    0x10) runtime.writeBarrier [NameNList, MangledNameNList, NList]
                            0x00000001001e3d50 (    0x20) syscall.ForkLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3d70 (    0x30) syscall.envLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3da0 (    0x20) runtime.hashkey [NameNList, MangledNameNList, NList]
                            0x00000001001e3dc0 (    0x40) runtime.newmHandoff [NameNList, MangledNameNList, NList]
                            0x00000001001e3e00 (    0x40) runtime.sig [NameNList, MangledNameNList, NList]
                            0x00000001001e3e40 (    0x40) runtime.execLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3e80 (    0x40) runtime.debug [NameNList, MangledNameNList, NList]
                            0x00000001001e3ec0 (    0x40) runtime.finptrmask [NameNList, MangledNameNList, NList]
                            0x00000001001e3f00 (    0x80) runtime.aeskeysched [NameNList, MangledNameNList, NList]
                            0x00000001001e3f80 (    0x80) runtime.handlingSig [NameNList, MangledNameNList, NList]
                            0x00000001001e4000 (    0xa0) internal/cpu.X86 [NameNList, MangledNameNList, NList]
                            0x00000001001e40a0 (    0xa0) internal/cpu.ARM64 [NameNList, MangledNameNList, NList]
                            0x00000001001e4140 (    0xa0) runtime.gcController [NameNList, MangledNameNList, NList]
                            0x00000001001e41e0 (   0x100) runtime.fwdSig [NameNList, MangledNameNList, NList]
                            0x00000001001e42e0 (   0x200) runtime.printBacklog [NameNList, MangledNameNList, NList]
                            0x00000001001e44e0 (   0x400) runtime.zeroVal [NameNList, MangledNameNList, NList]
                            0x00000001001e48e0 (  0x1758) runtime.memstats [NameNList, MangledNameNList, NList]
                    0x00000001001e8000 ( 0x7c000) __LINKEDIT SEGMENT
            symbols[65885]: [fatal]  child process status could not be determined; exit code unavailable.
FAIL
exit status 1
FAIL    cmd/link        47.568s

with the output at https://gist.github.com/odeke-em/85bb51ded80d9239a97c435f875b04c5

@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Jun 27, 2019

Some notes:

As far as I can tell, this has never happened on the dashboard based on greplogs -E 'dwarf_test.*(exit status 255|symbols)' -dashboard.

CL 171819 added better logging for this (previously we didn't print the symbols output). That's why the original report just has "exit status 255", but @odeke-em's report has much more information.

Unfortunately, it seems like the symbols tool may just be flaky. Just searching around for that error reveals other people seeing it occasionally in various macOS development tools outside the context of Go.

Could someone with a Mac try just running symbols (or maybe the test) in a loop to see how often it flakes?

@dr2chase

This comment has been minimized.

Copy link
Contributor

@dr2chase dr2chase commented Jun 27, 2019

It may require Darwin 18.5, which is one back from current.
I've tried running it in a loop with luck at seeing the bug, will try some more.
(Maybe a busy Mac helps?)

@dr2chase

This comment has been minimized.

Copy link
Contributor

@dr2chase dr2chase commented Jun 27, 2019

About 2000 repetitions on Darwin 18.6 (aka Mojave, macOS 10.14.5), with no failure, on a busy-ish machine. Still running.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 2, 2019

In the output above the symbols program completes, but then reports an error symbols[65885]: [fatal] child process status could not be determined; exit code unavailable. I can see the error string if I run strings on /usr/bin/symbols, so it's definitely coming from there. I can't find any reports of this error on the Interwebs (the top search result for me is in fact this issue). I don't know why the symbols program would start a child process at all. I don't know why it would be unable to determine the status of that child process. I can't find the source code of the symbols program; I don't know if it is available at all.

Moving to 1.14 but if we can't figure out a reproducer, or get the source code, I doubt there is much that we can do.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 15, 2019

Is this still happening for anyone?

@cherrymui

This comment has been minimized.

Copy link
Contributor

@cherrymui cherrymui commented Nov 15, 2019

I haven't seen this for quite a while, since upgraded to macOS 10.14. I only saw it on macOS 10.13 in the past. My guess is that there may be some problem with the symbols command in earlier version.

@9nut

This comment has been minimized.

Copy link

@9nut 9nut commented Jan 13, 2020

FYI, this might be an issue with xcode requiring an update. Here's why I think that:

When I tried to build 1.13.6 this morning I got the following:

--- FAIL: TestDWARF (0.19s)
    --- FAIL: TestDWARF/testprog (1.17s)
        dwarf_test.go:96: symbols testprog.exe: exit status 255: /var/folders/j0/pr17cml548j19x25h2h0dv9h0000gp/T/go-link-TestDWARF355434644/testprog.exe [x86_64, 0.001705 seconds]:
                null-uuid                            /private/var/folders/j0/pr17cml548j19x25h2h0dv9h0000gp/T/go-link-TestDWARF355434644/testprog.exe [AOUT, FaultedFromDisk]  
....

I updated my machine to macOS Catalina (10.15.2) last week, and had not started xcode in a while. I guessed I might need to update xcode. After starting xcode -- which required an update -- Go 1.13.6 was able to build without any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.