cmd/link, cmd/go: emit split DWARF on darwin #62577

Open
cherrymui opened this issue Sep 11, 2023 · 18 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Debugging OS-Darwin Proposal Proposal-Accepted
Milestone

Comments

@cherrymui
Member

Overview

On macOS, we currently generate debug information combined into the executable. This is not Apple's convention, and it's been difficult to make the platform toolchain happy with the combined debug info. With Apple's new linker to be released in Xcode 15, it is even harder. We propose to generate debug info in a separate file on macOS, following the system convention.

Background and context

Platform conventions

Currently, on macOS, for DWARF debug information, the Go toolchain generates it in the executable as a __DWARF segment, similar to what we do on other platforms. However, this is not Apple's convention for its C toolchain. Instead, the C toolchain often creates debug info in a separate file/directory.

Specifically, for C compilation with debug info enabled,

  • the C compiler generates object files that contain debug info in debug_* sections
  • the C linker links the object files to an executable without debug info, but with STAB symbols referencing the object files
  • optionally another program, dsymutil, can be run on the executable, which extracts the debug info from the object files and stores it into either a dSYM directory or a single file. If this is done, the STAB symbols and object files are no longer needed and can be stripped/deleted.

Combined DWARF in Go

For the Go toolchain, we currently generate debug information combined into the executable, similar to what we do on other platforms. In internal linking mode, the Go linker directly produces a binary with debug info combined into the executable as a __DWARF segment. In external linking mode, the Go linker

  • passes Go and C object files with debug info to the C linker, which produces an executable with STAB symbols
  • runs dsymutil to extract the debug info
  • strips the STAB symbols (which contain object file paths, which are nondeterministic)
  • post-edits the executable to combine the debug info into it
  • (deletes the temp directory containing Go and C objects)

While it is simple in internal linking mode, in external linking mode this process is a bit convoluted.

Combining DWARF into the executable requires post-editing the executable to add a __DWARF segment, which in turn requires editing the program header and some other data. The Mach-O loaders in the platform's static linker and dynamic linker have a number of integrity checks for the program, and they generally don't like an extra unmapped segment. The code in the Go linker that adds the segment has been revised several times to make the dynamic linker happy.

With Apple's new linker to be released in Xcode 15, there are even more checks and it is hard to work around all the requirements. Currently, if one builds Go code into a c-shared object and then links it with C code using Apple's linker, the linker will reject the shared object produced by the Go toolchain (see also #61229). We could potentially try harder to work around more checks (if possible). But it may get harder and harder in the future, and eventually we may be forced to change.

Debugger support

For the debugger side, the system's default debugger, LLDB, understands the C toolchain's convention. When debugging an executable (say x),

  • it can automatically find debug info combined in the executable
  • it can automatically find debug info from object files referenced by the STAB symbols
  • it can automatically find debug info from the dSYM directory x.dSYM
  • or the debug info file can be specified with target create --symfile command.
    Notably, LLDB doesn't understand compressed DWARF, which we generate by default. So currently Go programs do not work out of the box with LLDB. (An easy workaround is -ldflags=-compressdwarf=0.)

Delve, a commonly used debugger for Go programs, understands the DWARF combined in the executable, and also the compressed DWARF. So Delve works for Go programs out of the box.
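To see which case a given binary falls into, one can inspect its section names with debug/macho. The sketch below assumes that uncompressed DWARF sections are named __debug_* and compressed ones __zdebug_* (the prefixes debug/macho itself recognizes); the helper names classify and dwarfSections are made up for illustration.

```go
package main

import (
	"debug/macho"
	"fmt"
	"os"
	"strings"
)

// classify reports whether any of the section names looks like a DWARF
// section, and whether any of them uses the compressed "__zdebug_" prefix.
func classify(names []string) (present, compressed bool) {
	for _, name := range names {
		switch {
		case strings.HasPrefix(name, "__zdebug_"):
			present, compressed = true, true
		case strings.HasPrefix(name, "__debug_"):
			present = true
		}
	}
	return present, compressed
}

// dwarfSections applies classify to the sections of an opened Mach-O file.
func dwarfSections(f *macho.File) (present, compressed bool) {
	names := make([]string, 0, len(f.Sections))
	for _, s := range f.Sections {
		names = append(names, s.Name)
	}
	return classify(names)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: dwarfcheck <mach-o file>")
		os.Exit(2)
	}
	f, err := macho.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	present, compressed := dwarfSections(f)
	fmt.Printf("DWARF present: %v, compressed: %v\n", present, compressed)
}
```

A binary reported as "compressed: true" is one that LLDB would presumably not handle without the -compressdwarf=0 workaround.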

Proposed changes

We propose that the Go toolchain switch to generating split DWARF on macOS, following the platform conventions. This would make the Go toolchain more consistent with Apple's convention, and behave more similarly to the system C toolchain. We would no longer need to "fight against" the checks in the Mach-O loaders in the system static and dynamic linkers. So it will be more forward compatible with platform updates.

Naming convention

Following the system convention, for an executable named x we will generate a directory named x.dSYM which contains a DWARF file at x.dSYM/Contents/Resources/DWARF/x. In the system convention, there are other files in the dSYM directory (an Info.plist file and a relocation file), which are irrelevant to DWARF. We may skip them for now. We could consider generating them if they are needed in the future. (For c-archive build mode, as we produce C objects, which contain combined DWARF in the C toolchain's convention, we will continue to do so.)

We could also consider using a different naming convention, e.g. for an executable named x we will generate a single DWARF file named x.dwarf. LLDB would not load it automatically. But as LLDB already does not work out of the box (due to compression), maybe this is not too bad. One needs to pass the --symfile flag. Feedback welcome.
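As a minimal sketch of the dSYM layout described above, the path of the DWARF file can be derived from the executable path alone. The function name dsymDWARFPath is made up for illustration; it is not part of the toolchain.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dsymDWARFPath returns where the DWARF file for the executable at exe
// would live under the dSYM naming convention:
// <exe>.dSYM/Contents/Resources/DWARF/<base name of exe>.
func dsymDWARFPath(exe string) string {
	return filepath.Join(exe+".dSYM", "Contents", "Resources", "DWARF", filepath.Base(exe))
}

func main() {
	fmt.Println(dsymDWARFPath("bin/x"))
}
```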

Go linker

The Go linker will generate split DWARF on macOS.

  • In internal linking mode the Go linker will emit an executable (without DWARF) and a separate DWARF file.
  • In external linking mode the Go linker will invoke the C linker to emit an executable and invoke dsymutil to generate a DWARF file; this is the same as before, but the Go linker will not post-edit the executable to combine the DWARF back into the executable.

The go command

The go command needs to understand that we now generate two output files, the executable and the DWARF file (in the case of c-shared build mode, three files: the shared object, the C header file, and the DWARF file). It needs to copy them from the temporary directory where the build is performed to the output directory. Specifically for file naming,

  • go build without the -o flag will generate executable <exe> (which is the default name matching the main module or .go file name) and a DWARF file in <exe>.dSYM
  • go build -o <exe> will generate executable <exe> and a DWARF file in <exe>.dSYM
  • go build -o <dir> will generate executable <dir>/<exe> and a DWARF file in <dir>/<exe>.dSYM (where <exe> is the default name based on the main module or .go file name)
  • a special case for go build -o /dev/null, which generates no file

go test -c will follow a similar naming convention.

In order not to clutter directories that contain installed binaries like $HOME/bin, we propose that go install will have DWARF disabled by default (by passing the -w flag to the linker). One can still explicitly ask for DWARF by passing -ldflags=-w=0 (the -w flag disables DWARF, -w=0 negates it).

There is prior art for emitting two output files: in c-shared build mode the go build command generates a C shared object (usually named with .so) and a C header file (usually named with .h). So outputting two files isn't completely new. Maybe it could be implemented similarly.
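The naming rules above can be sketched as a small function. This is only an illustration of the rules as listed, not how cmd/go is structured; the name outputs and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// outputs returns the executable path and the dSYM directory that the rules
// above would produce. oFlag is the value of -o ("" if the flag is absent),
// oIsDir reports whether oFlag names a directory, and defaultName is the
// default executable name. ok is false for -o /dev/null, which writes no file.
func outputs(oFlag string, oIsDir bool, defaultName string) (exe, dsym string, ok bool) {
	switch {
	case oFlag == "/dev/null":
		return "", "", false
	case oFlag == "":
		exe = defaultName
	case oIsDir:
		exe = filepath.Join(oFlag, defaultName)
	default:
		exe = oFlag
	}
	return exe, exe + ".dSYM", true
}

func main() {
	exe, dsym, ok := outputs("out", true, "hello")
	fmt.Println(exe, dsym, ok)
}
```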

go clean will also understand the naming convention, and remove the DWARF file if it is invoked to remove the executable file.
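A rough sketch of that cleanup behavior, assuming the x.dSYM directory sits next to the executable. The name removeWithDWARF is hypothetical, and the demo in main only touches a temporary directory.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeWithDWARF removes the executable at exe and, if present, the
// exe.dSYM directory next to it. Missing files are not treated as errors.
func removeWithDWARF(exe string) error {
	if err := os.Remove(exe); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	// RemoveAll returns nil if the path does not exist.
	return os.RemoveAll(exe + ".dSYM")
}

func check(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func main() {
	dir, err := os.MkdirTemp("", "clean-demo")
	check(err)
	defer os.RemoveAll(dir)

	// Empty stand-ins for a real executable and its dSYM directory.
	exe := filepath.Join(dir, "x")
	check(os.WriteFile(exe, nil, 0o755))
	check(os.MkdirAll(filepath.Join(exe+".dSYM", "Contents", "Resources", "DWARF"), 0o755))

	check(removeWithDWARF(exe))
	_, err = os.Stat(exe + ".dSYM")
	fmt.Println("dSYM removed:", errors.Is(err, fs.ErrNotExist))
}
```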

Build cache

Executables are not cached. So the DWARF file will not be cached, either. However, for executables the go command checks if the output file already exists and contains the expected build ID, and if so, it will assume it is up to date and not relink it. With split DWARF, we propose that it will also check if the DWARF file is up to date (the DWARF file will probably also contain the build ID so it can be checked, details TBD). If either the executable or the DWARF file is not up to date, it will relink and generate both.
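The proposed check amounts to the following, sketched with a hypothetical needRelink helper. How the build ID would be stored in and read from the DWARF file is left open above ("details TBD"), so the IDs are simply passed in as strings here, with "" standing for a missing file.

```go
package main

import "fmt"

// needRelink reports whether the go command would relink under the rule
// above: the link is skipped only if both the executable and the DWARF file
// carry the expected build ID. An empty ID stands for a missing file.
func needRelink(exeID, dwarfID, want string) bool {
	return exeID != want || dwarfID != want
}

func main() {
	fmt.Println(needRelink("abc", "abc", "abc")) // both up to date
	fmt.Println(needRelink("abc", "", "abc"))    // DWARF file missing
}
```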

Debugger support

LLDB understands the dSYM naming convention, so with this change it should still be able to load the DWARF info automatically (if it is not compressed). If either the executable or the DWARF file is moved or renamed, it can still be loaded with the --symfile flag.

Delve will need to be updated to understand the naming convention, finding the DWARF file from the dSYM directory. We suggest it also provide a way (e.g. a command line flag, if it does not already have one) to explicitly specify the DWARF file's location, in case the user wants to move or rename the file.

debug/macho package

Currently, for a Mach-O executable with combined DWARF, the debug/macho.(*File).DWARF function can load the debug information. With split DWARF, the binary will not contain DWARF, so it cannot be loaded from the same macho.File. One could open another macho.File for the DWARF file.

If the macho.File is from an OS file (e.g. opened from macho.Open), it may be possible for the macho package to automatically try to find the split DWARF from the DWARF file following the naming convention. Then the user won't need to open another file. On the other hand, automatically opening another file seems a bit magic. Feedback welcome.
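Without any such automatic lookup, a user of debug/macho could do the fallback by hand. The sketch below uses only macho.Open, (*File).Section and (*File).DWARF; the helper names loadDWARF and dwarfFrom are made up, and it assumes the DWARF file sits at the dSYM location described under "Naming convention".

```go
package main

import (
	"debug/dwarf"
	"debug/macho"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// dwarfFrom loads DWARF from a single Mach-O file, or returns an error if
// the file has no (compressed or uncompressed) debug info section.
func dwarfFrom(path string) (*dwarf.Data, error) {
	f, err := macho.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	if f.Section("__debug_info") == nil && f.Section("__zdebug_info") == nil {
		return nil, errors.New("no DWARF sections in " + path)
	}
	return f.DWARF()
}

// loadDWARF tries the executable first (combined DWARF) and then falls back
// to the DWARF file in the dSYM directory next to it (split DWARF).
func loadDWARF(exe string) (*dwarf.Data, error) {
	d, err := dwarfFrom(exe)
	if err == nil {
		return d, nil
	}
	split := filepath.Join(exe+".dSYM", "Contents", "Resources", "DWARF", filepath.Base(exe))
	d, err2 := dwarfFrom(split)
	if err2 != nil {
		return nil, fmt.Errorf("no DWARF for %s (%v): %w", exe, err, err2)
	}
	return d, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: loaddwarf <executable>")
		os.Exit(2)
	}
	if _, err := loadDWARF(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("loaded DWARF for", os.Args[1])
}
```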

If accepted, we plan to implement this in Go 1.22.

Thanks.

cc @golang/compiler @rsc @bcmills @aarzilli @derekparker @archanaravindar

@gopherbot gopherbot added this to the Proposal milestone Sep 11, 2023
@cherrymui cherrymui added compiler/runtime Issues related to the Go compiler and/or runtime. Debugging OS-Darwin labels Sep 11, 2023
@gopherbot
Contributor

Change https://go.dev/cl/527415 mentions this issue: cmd/link: disable DWARF by default in c-shared mode on darwin

@ianlancetaylor
Contributor

Will iOS be changed in the same ways as Darwin?

@cherrymui
Member Author

cherrymui commented Sep 12, 2023

We currently don't generate combined DWARF on iOS because its dynamic linker doesn't like it (I think we just drop the DWARF). With this we can start to generate split DWARF on iOS.

However, I guess many (most?) users targeting iOS use c-archive build mode, which is not affected by this.

gopherbot pushed a commit that referenced this issue Sep 12, 2023
Currently, linking a Go c-shared object with C code using Apple's
new linker, it fails with

% cc a.c go.so
ld: segment '__DWARF' filesize exceeds vmsize in 'go.so'

Apple's new linker has more checks for unmapped segments. It is
very hard to make it accept a Mach-O shared object with an
additional DWARF segment.

We may want to stop combining DWARF into the shared object (see
also #62577). For now, disable DWARF by default in c-shared mode
on darwin. (One can still enable it with -ldflags=-w=0, which will
contain DWARF, but it will need the old C linker to link against
with.)

For #61229.

Change-Id: I4cc77da54fac10e2c2cbcffa92779cba82706d75
Reviewed-on: https://go-review.googlesource.com/c/go/+/527415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Contributor

Change https://go.dev/cl/527816 mentions this issue: [release-branch.go1.21] cmd/link: disable DWARF by default in c-shared mode on darwin

@gopherbot
Contributor

Change https://go.dev/cl/527819 mentions this issue: [release-branch.go1.20] cmd/link: disable DWARF by default in c-shared mode on darwin

@thanm
Contributor

thanm commented Sep 13, 2023

Overall LGTM.

Although going with the x.dSYM directory seems to me to be the most consistent with the way things work with other tools on Darwin, I worry that my work areas and repos are going to be littered with x.dSYM dirs that I'm going to have to remember to clean up periodically. [NB: will we be adding *.dSYM entries to our Go .gitignore?]

I note that Delve (on unix) already seems to have support for reading debug info from /usr/lib/debug/.build-id (via the 'debug-info-directories' config parameter). Perhaps for "go install", rather than having the debug info copied into the bin dir, how about instead we copy it into ~/.cache/debug/.build-id, where this directory is indexed/structured by build ID in the same way that /usr/lib/debug/.build-id works. You could then tell Delve to look there, and at that point you would be good to go for debugging your installed binaries.
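For reference, the usual .build-id layout splits a hex-encoded build ID into a two-character directory and a file named after the rest with a .debug suffix. The sketch below assumes that layout and a hex-encoded ID; what ID a darwin build would actually use as the key is not settled here, and buildIDPath is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// buildIDPath returns the location of the debug file for a hex-encoded
// build ID under root, following the layout
// <root>/.build-id/<first two hex digits>/<remaining digits>.debug.
func buildIDPath(root, id string) (string, error) {
	if len(id) < 3 {
		return "", errors.New("build ID too short")
	}
	return filepath.Join(root, ".build-id", id[:2], id[2:]+".debug"), nil
}

func main() {
	p, err := buildIDPath("/usr/lib/debug", "abcdef12")
	fmt.Println(p, err)
}
```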

gopherbot pushed a commit that referenced this issue Sep 21, 2023
…d mode on darwin

[This is a (manual) backport of CL 527415 to Go 1.21.]

Currently, linking a Go c-shared object with C code using Apple's
new linker, it fails with

% cc a.c go.so
ld: segment '__DWARF' filesize exceeds vmsize in 'go.so'

Apple's new linker has more checks for unmapped segments. It is
very hard to make it accept a Mach-O shared object with an
additional DWARF segment.

We may want to stop combining DWARF into the shared object (see
also #62577). For now, disable DWARF by default in c-shared mode
on darwin.

Updates #61229.
For #62598.

Change-Id: I525987b7fe1a4e64571327cb4696f98cc7b419a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/527816
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Sep 21, 2023
…d mode on darwin

[This is a (manual) backport of CL 527415 to Go 1.20.]

Currently, linking a Go c-shared object with C code using Apple's
new linker, it fails with

% cc a.c go.so
ld: segment '__DWARF' filesize exceeds vmsize in 'go.so'

Apple's new linker has more checks for unmapped segments. It is
very hard to make it accept a Mach-O shared object with an
additional DWARF segment.

We may want to stop combining DWARF into the shared object (see
also #62577). For now, disable DWARF by default in c-shared mode
on darwin.

Updates #61229.
For #62597.

Change-Id: I313349f71296d6d7025db28469593825ce9f1866
Reviewed-on: https://go-review.googlesource.com/c/go/+/527819
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
bradfitz pushed a commit to tailscale/go that referenced this issue Sep 25, 2023
…d mode on darwin

[This is a (manual) backport of CL 527415 to Go 1.21.]

Currently, linking a Go c-shared object with C code using Apple's
new linker, it fails with

% cc a.c go.so
ld: segment '__DWARF' filesize exceeds vmsize in 'go.so'

Apple's new linker has more checks for unmapped segments. It is
very hard to make it accept a Mach-O shared object with an
additional DWARF segment.

We may want to stop combining DWARF into the shared object (see
also golang#62577). For now, disable DWARF by default in c-shared mode
on darwin.

Updates golang#61229.
For golang#62598.

Change-Id: I525987b7fe1a4e64571327cb4696f98cc7b419a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/527816
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@rsc
Contributor

rsc commented Oct 11, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@willfaught
Contributor

I'm surprised to see this proposal. One of the things I love about Go is that executables are self-contained and debuggable by default, and builds are "clean" in that the source tree isn't littered with build artifacts. This will break that. If there's any way to keep those properties, I hope we do that. I'm getting flashbacks to symbol servers and undebuggable executables from my Microsoft tech days. I would intensely dislike having to manage separate .dSYM files/directories.

On macOS, we currently generate debug information combined into the executable. This is not Apple's convention, and it's been difficult to make the platform toolchain happy with the combined debug info. With Apple's new linker to be released in Xcode 15, it is even harder.

What is the difficulty, specifically? Even if it's harder, is it still possible to do? Will there be a point in the future when it won't even be possible? I'm not familiar with the situation.

@cherrymui
Member Author

Thanks. I understand that the current combined DWARF is simple for users, and also easier to handle for the go command and tools, and I'm personally not a big fan of dSYM directories, either. With all that said, I think we pretty much have to do this in some form.

What is the difficulty, specifically? Even if it's harder, is it still possible to do? Will there be a point in the future when it won't even be possible?

The new Apple linker in Xcode 15 adds a number of checks for the Mach-O file. With the current DWARF combining code, the checks fail and the Apple linker rejects the file from the Go linker. I tried to modify the DWARF combining code to make the Apple linker happy. After appeasing 5 checks, I just found more, in yet another category of checks. I don't know how much modification is needed to satisfy all the checks.

Technically, it might not be strictly impossible, but it needs to work around a number of undocumented requirements and checks, sometimes on undocumented data structures. Combining DWARF is not a mode that Apple supports, so it may change when a new version of the Apple toolchain or macOS is released. Maybe one day it will just be impossible (currently it is already impossible on iOS). We are pretty much forced by Apple to follow the platform convention, just like in the past we had to switch to making syscalls via libSystem instead of directly using the SYSCALL instruction (#17490). It is probably better to do it now than on the very last day.

Thanks.

@qmuntal qmuntal closed this as completed Oct 23, 2023
@aarzilli

This comment was marked as off-topic.

@thanm

This comment was marked as off-topic.

@qmuntal qmuntal reopened this Oct 23, 2023
@qmuntal

This comment was marked as off-topic.

@rsc
Contributor

rsc commented Oct 24, 2023

I wrote the "put DWARF back in the macOS executable" code, precisely because I like having self-contained binaries, for all the reasons that @willfaught lists. That said, it's clear that Apple feels differently, and it's their system and their toolchain. They are making it more and more difficult to include DWARF in the main executable. At some point we will have to throw in the towel and go along. This happened also with Go making direct system calls instead of using the system DLLs.

From my point of view, we can put lots more effort into keeping this working, but eventually it seems clear that we are going to lose this fight, meaning it's wasted effort. Probably better to fold now.

@rsc
Contributor

rsc commented Oct 26, 2023

Have all remaining concerns about this proposal been addressed?

The details are in the top message, but in summary:

  • the Go toolchain will generate split DWARF on darwin, following the platform convention. The executable will not contain DWARF. The DWARF will be generated in a separate file in a x.dSYM directory for executable x.
  • go build and go test -c will generate the executable and the dSYM directory.
  • go install will by default install the executable without DWARF.
  • go clean will understand this naming convention and remove the DWARF file if it is invoked to remove the executable.

@willfaught
Contributor

Do we know how separate executables and debug data will be handled by debuggers like delve? Will delve have to add a new CLI flag and new implementation to support this? If so, I think it's worth highlighting to Go users that debugging won't be supported for the corresponding Go release until delve catches up (which isn't guaranteed to ever happen, although it likely will). For me, that would be a big reason not to upgrade right away.

@aarzilli
Contributor

Do we know how separate executables and debug data will be handled by debuggers like delve?

Delve will be changed to understand the naming convention and automatically load the file from inside the dSYM directory. It will also have a way to specify its path manually.

@rsc
Contributor

rsc commented Nov 2, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The details are in the top message, but in summary:

  • the Go toolchain will generate split DWARF on darwin, following the platform convention. The executable will not contain DWARF. The DWARF will be generated in a separate file in a x.dSYM directory for executable x.
  • go build and go test -c will generate the executable and the dSYM directory.
  • go install will by default install the executable without DWARF.
  • go clean will understand this naming convention and remove the DWARF file if it is invoked to remove the executable.

The Delve maintainers have agreed to add support for this new convention to Delve (thank you!).

@rsc
Contributor

rsc commented Nov 10, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The details are in the top message, but in summary:

  • the Go toolchain will generate split DWARF on darwin, following the platform convention. The executable will not contain DWARF. The DWARF will be generated in a separate file in a x.dSYM directory for executable x.
  • go build and go test -c will generate the executable and the dSYM directory.
  • go install will by default install the executable without DWARF.
  • go clean will understand this naming convention and remove the DWARF file if it is invoked to remove the executable.

The Delve maintainers have agreed to add support for this new convention to Delve (thank you!).

@rsc rsc changed the title proposal: cmd/link, cmd/go: emit split DWARF on darwin cmd/link, cmd/go: emit split DWARF on darwin Nov 10, 2023
@rsc rsc modified the milestones: Proposal, Backlog Nov 10, 2023
LBeernaertProton added a commit to ProtonMail/proton-mail-export that referenced this issue Nov 24, 2023
Debug symbol generation for the CGO shared library needs to be disabled
on mac. The new mac os linker does not work correctly with the output of
CGO.

Waiting on golang/go#62577 to be completed.
jakubgs added a commit to status-im/infra-role-golang that referenced this issue Jan 8, 2024
Fix for `emit split DWARF` error included in `1.20.9`.
golang/go#62577

Signed-off-by: Jakub Sokołowski <jakub@status.im>
pendo324 added a commit to runfinch/finch that referenced this issue Jun 10, 2024
Issue #, if available:

*Description of changes:*
Embed Info.plists into all of the executables that we vend. Requires
stripping debug information from the go executable (see [this
issue](golang/go#62577)).

*Testing done:*
Tested manually


- [x] I've reviewed the guidance in CONTRIBUTING.md


#### License Acceptance

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

Signed-off-by: Justin Alvarez <alvajus@amazon.com>
Projects
Status: In Progress
Status: Accepted