cmd/link: empty __debug_ranges section on darwin #21945

Closed
aarzilli opened this issue Sep 20, 2017 · 14 comments

@aarzilli
Contributor

commented Sep 20, 2017

This issue was initially discussed here: go-delve/delve#964

When compiling the tests of https://github.com/deasmi/terraform-provider-libvirt/tree/graphicsandvnc with optimizations disabled, the resulting binary has a 4 KB __debug_ranges section that consists almost entirely of zeroes, except for a few bytes at the end of the section.
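A minimal sketch of how the symptom described above could be checked, assuming the binary is a Mach-O file whose range section is named __debug_ranges (uncompressed, as in this report). The helper names `countNonZero` and `inspect` and the program name `rangescheck` are hypothetical.

```go
package main

import (
	"debug/macho"
	"fmt"
	"os"
)

// countNonZero reports how many bytes of data are not zero.
func countNonZero(data []byte) int {
	n := 0
	for _, b := range data {
		if b != 0 {
			n++
		}
	}
	return n
}

// inspect opens the Mach-O file at path and summarizes its
// __debug_ranges section (hypothetical helper for illustration).
func inspect(path string) error {
	f, err := macho.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	sec := f.Section("__debug_ranges")
	if sec == nil {
		return fmt.Errorf("%s: no __debug_ranges section", path)
	}
	data, err := sec.Data()
	if err != nil {
		return err
	}
	fmt.Printf("__debug_ranges: %d bytes, %d non-zero\n", len(data), countNonZero(data))
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rangescheck <mach-o binary>")
		return
	}
	if err := inspect(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A section that is several kilobytes long with only a handful of non-zero bytes would match the report.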

Output of go test -x here: https://www.dropbox.com/s/kmn2g8ibsd5q9po/gotest.txt?dl=0

What version of Go are you using (go version)?

go version go1.9 darwin/amd64

@aarzilli

Contributor Author

commented Sep 20, 2017

Managed to reproduce this locally. It looks like go.o has a reasonable-looking debug_ranges section, but it doesn't survive the external linker (though there could still be a subtle problem with debug_ranges itself). I couldn't find any way to convince the system linker to emit diagnostic output.

@deasmi

commented Sep 20, 2017

go env as requested

[deasmi@kaboom:~] $ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/deasmi/Projects/goworkspace"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/s5/l2k67rp1479c2dvh9__pmhj54pc_8r/T/go-build540667432=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
[deasmi@kaboom:~] $ go version
go version go1.9 darwin/amd64
[deasmi@kaboom:~] $

@ianlancetaylor

Contributor

commented Sep 20, 2017

@heschik

Contributor

commented Sep 20, 2017

Huh, and here I was thinking that dsymutil crashing was a bad failure mode. At least that gives you a hint; I don't really know how to debug it when it's just mangling the result. I don't have a Mac handy so I can offer suggestions but not much more at the moment.

We know that this works at least some of the time, so there must be something about this code. dsymutil seems quite sensitive to malformed DWARF, so perhaps there's a particular function in here somewhere that's triggering a compiler bug? I'd start by building a program with the package and seeing if that worked. If so, hopefully there's a particular test that's triggering the problem and it can just be bisected from there.

@deasmi

commented Sep 20, 2017

I'm happy to recreate this in a VM and provide SSH access if that's of any help.

@aarzilli

Contributor Author

commented Sep 20, 2017

> dsymutil seems quite sensitive to malformed DWARF

Where is dsymutil called in this case? I don't see it in the output of go test -x, nor when I call link with -v.

@heschik

Contributor

commented Sep 20, 2017

Yeah, I don't think it's logged. -x is a flag to the go command, but dsymutil is called by the linker, and it just doesn't have any logging even when -v is enabled.

    if !*FlagS && !*FlagW && !debug_s && Headtype == objabi.Hdarwin {
        // Skip combining dwarf on arm.
        if !SysArch.InFamily(sys.ARM, sys.ARM64) {
            dsym := filepath.Join(*flagTmpdir, "go.dwarf")
            if out, err := exec.Command("dsymutil", "-f", *flagOutfile, "-o", dsym).CombinedOutput(); err != nil {
                Exitf("%s: running dsymutil failed: %v\n%s", os.Args[0], err, out)
            }
            // Skip combining if `dsymutil` didn't generate a file. See #11994.
            if _, err := os.Stat(dsym); os.IsNotExist(err) {
                return
            }
            // For os.Rename to work reliably, must be in same directory as outfile.
            combinedOutput := *flagOutfile + "~"
            if err := machoCombineDwarf(*flagOutfile, dsym, combinedOutput); err != nil {
                Exitf("%s: combining dwarf failed: %v", os.Args[0], err)
            }
            os.Remove(*flagOutfile)
            if err := os.Rename(combinedOutput, *flagOutfile); err != nil {
                Exitf("%s: %v", os.Args[0], err)
            }
        }
    }

I imagine you can confirm with truss or something. I guess I should say that I don't have any particular evidence that dsymutil is the one messing things up, but that's where all of my OSX-specific DWARF issues have been, so I'm just assuming it's the problem. It shouldn't change a divide-and-conquer debugging approach much though.

@deasmi, I can get access to a Mac around here if @aarzilli needs a second pair of eyes. I appreciate the offer though.

@heschik changed the title from "cmd/link: empty __debug_ranges section on darwin" to "[Debugging] cmd/link: empty __debug_ranges section on darwin" on Sep 20, 2017

@aarzilli

Contributor Author

commented Sep 21, 2017

Dug more into this. If you call dsymutil with -verbose, some useful output is emitted. The offending function seems to be patchRangesForUnit. Note in particular the warning output in emitRangesEntries, which is never reached because of the logic inside patchRangesForUnit.

It seems like they have decided not to support base selection entries. In fact, this problem can be reproduced with trivial programs as long as -ldflags '-linkmode external' is passed to go build.
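A rough sketch of re-running dsymutil by hand to capture that output, since the linker does not log it. It reuses the -f and -o arguments from the linker snippet quoted earlier and adds -verbose. That these flags combine cleanly on every dsymutil version is an assumption, and `dsymutilArgs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// dsymutilArgs builds the argument list: the linker's own "-f <binary> -o <dsym>"
// plus -verbose. (Assumption: the installed dsymutil accepts this combination.)
func dsymutilArgs(binary, dsym string) []string {
	return []string{"-verbose", "-f", binary, "-o", dsym}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: rundsym <binary> <output dsym>")
		return
	}
	// Warnings may go to stderr, so capture both streams together.
	out, err := exec.Command("dsymutil", dsymutilArgs(os.Args[1], os.Args[2])...).CombinedOutput()
	os.Stdout.Write(out)
	if err != nil {
		fmt.Fprintln(os.Stderr, "dsymutil failed:", err)
		os.Exit(1)
	}
}
```

The input would be an externally linked binary, for example one built with -ldflags '-linkmode external' as described above.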

@aarzilli

Contributor Author

commented Sep 21, 2017

PS. I don't have a solution for this. Without a base selection entry the other entries are relative to the low pc of the compile unit; can the compiler emit a relocation like that? Also, I was debugging dsymutil and, besides the base selection problem, I'm not convinced that there aren't other bugs: for some reason, the very first Offset it tries to emit is past the end of the debug_ranges section.
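For readers unfamiliar with the encoding being discussed, here is a minimal sketch of how a DWARF 2-4 range list is decoded, assuming 8-byte little-endian addresses. An entry whose first word is all ones is a base selection entry, (0, 0) ends the list, and every other entry is an offset pair relative to the current base, which starts at the compile unit's low pc. The names `pcRange`, `encodeEntries`, and `decodeRanges` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcRange is a half-open address range [Low, High).
type pcRange struct{ Low, High uint64 }

// baseSelector marks a base address selection entry (largest address value).
const baseSelector = ^uint64(0)

// encodeEntries packs 64-bit words little-endian; used only to build sample data.
func encodeEntries(words ...uint64) []byte {
	buf := make([]byte, 8*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint64(buf[8*i:], w)
	}
	return buf
}

// decodeRanges decodes the range list that starts at byte offset off.
// cuLowPC is the initial base address (the compile unit's low pc).
func decodeRanges(data []byte, off int, cuLowPC uint64) ([]pcRange, error) {
	base := cuLowPC
	var out []pcRange
	for {
		if off < 0 || off+16 > len(data) {
			return nil, fmt.Errorf("range list entry at offset %d is outside the section (%d bytes)", off, len(data))
		}
		begin := binary.LittleEndian.Uint64(data[off:])
		end := binary.LittleEndian.Uint64(data[off+8:])
		off += 16
		switch {
		case begin == 0 && end == 0:
			return out, nil // end-of-list entry
		case begin == baseSelector:
			base = end // base selection entry: later offsets are relative to end
		default:
			out = append(out, pcRange{base + begin, base + end})
		}
	}
}

func main() {
	// Hypothetical list: select base 0x1000, then one range at offsets 0x10..0x20.
	data := encodeEntries(baseSelector, 0x1000, 0x10, 0x20, 0, 0)
	ranges, err := decodeRanges(data, 0, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range ranges {
		fmt.Printf("[%#x, %#x)\n", r.Low, r.High)
	}
}
```

A consumer that skips the base selection case would resolve the same offsets against the compile unit's low pc instead, which is the mismatch described above.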

@heschik

Contributor

commented Sep 21, 2017

Ugh, that's a shame.

> I don't have a solution for this. Without a base selection entry the other entries are relative to the low pc of the compile unit; can the compiler emit a relocation like that?

Huh. Location lists have the exact same wording and I didn't notice. But at one point, before I switched to DWARF 4, I wasn't using base selection entries, and GDB was still able to understand my location lists. So either I don't understand what CU-relative addressing means, or this is a common enough mistake that GDB can deal with it anyway. It might be worth just putting absolute addresses in there and see what happens.

This is kind of like an R_DWARFREF, but since there's only one .text and there could be multiple CUs, it can't be that easy. If there's a way to do this I don't know what it is.

@aarzilli

Contributor Author

commented Sep 22, 2017

There was something nagging me about the verbose output of dsymutil so I took another look:

while processing /tmp/testdir/go.o:
warning: could not find referenced DIE
    in DIE:

0x00306e5f:       DW_TAG_array_type [14] *
                    DW_AT_name [DW_FORM_string] ("[62]testing.InternalTest")
                    DW_AT_type [DW_FORM_ref_addr]       (0x00000000003061b0)
                    DW_AT_byte_size [DW_FORM_udata]     (1488)
                    DW_AT_Unknown_2900 [DW_FORM_data1]  (0x11)

In hindsight I should have been more worried. Sure enough, this DIE is mangled in the output:

 <1><18af46>: Abbrev Number: 27 (DW_TAG_array_type)
    <18af47>   DW_AT_name        : (indirect string, offset: 0x195b72): [62]testing.InternalTest
    <18af4b>   DW_AT_byte_size   : 1488
    <18af4d>   Unknown AT value: 2900: 17

despite the fact that the DW_AT_type does get copied in the output.

This happens 86 times in the libvirt test build.

I think I've finally found the cause of this old bug of mine that I had to unjustly close because I couldn't reproduce it on 1.8.

@gopherbot

commented Sep 28, 2017

Change https://golang.org/cl/66850 mentions this issue: dwarf: break debug_ranges

@heschik

Contributor

commented Oct 12, 2017

https://golang.org/cl/69973 gives us a CU per package, so the compiler is in a slightly better position to do CU-relative addressing -- the addresses can be expressed as a relocation vs. the first function in the package. But that'll get messed up when the linker eliminates a dead function, so I still don't have a good solution. Maybe leave the elimination of text symbols to the external linker? Pretty iffy.
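To make the target encoding concrete, here is a sketch of a range list written with CU-relative offsets and no base selection entry, assuming 8-byte little-endian addresses. It shows only the arithmetic on already-known addresses. In the real toolchain the addresses are not known until link time, which is why a relocation would be needed, and nothing here reflects how cmd/compile or cmd/link implement it. `pcRange` and `encodeCURelative` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcRange is a half-open absolute address range [Low, High).
type pcRange struct{ Low, High uint64 }

// encodeCURelative writes each range as an offset pair relative to cuLowPC,
// followed by the (0, 0) end-of-list entry. No base selection entry is emitted.
func encodeCURelative(cuLowPC uint64, ranges []pcRange) ([]byte, error) {
	var buf []byte
	var entry [16]byte
	for _, r := range ranges {
		if r.Low < cuLowPC || r.High < r.Low {
			return nil, fmt.Errorf("range [%#x, %#x) is not valid relative to low pc %#x", r.Low, r.High, cuLowPC)
		}
		if r.Low == r.High {
			// Empty ranges carry no information, and one starting at
			// cuLowPC would be misread as the end-of-list entry.
			continue
		}
		binary.LittleEndian.PutUint64(entry[0:8], r.Low-cuLowPC)
		binary.LittleEndian.PutUint64(entry[8:16], r.High-cuLowPC)
		buf = append(buf, entry[:]...)
	}
	// End-of-list entry: two zero words.
	buf = append(buf, make([]byte, 16)...)
	return buf, nil
}

func main() {
	// Hypothetical addresses for illustration.
	out, err := encodeCURelative(0x1000, []pcRange{{0x1010, 0x1020}, {0x1040, 0x1080}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", out)
}
```

If the linker drops a function that the offsets were computed against, the CU's low pc changes and every offset would be wrong, which is the dead-code problem described above.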

Someone should probably file an LLVM bug to get them to support base selection entries so there's some light at the end of the tunnel, at least.

@gopherbot

commented Oct 21, 2017

Change https://golang.org/cl/72371 mentions this issue: compile, link: remove base address selector from DWARF range lists

@gopherbot gopherbot closed this in 018642d Nov 1, 2017

@golang golang locked and limited conversation to collaborators Nov 1, 2018
