cmd/link: darwin_amd64: running dsymutil failed: signal: segmentation fault #23374

Closed
benesch opened this issue Jan 8, 2018 · 17 comments

@benesch
Contributor

commented Jan 8, 2018

Just to get this out of the way: a very similar issue was described in #23046, but I'm running a version of Go that includes the fix to that issue.

What version of Go are you using (go version)?

go version devel +a62071a209 Sat Jan 6 04:52:00 2018 +0000 darwin/amd64

Does this issue reproduce with the latest release?

No.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/benesch/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/benesch/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/HEAD-a62071a/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/HEAD-a62071a/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/2m/klw683vj1575nyyymnc0mr280000gn/T/go-build978307768=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

go get -d github.com/cockroachdb/cockroach
cd $(go env GOPATH)/src/github.com/cockroachdb/cockroach
make build GOVERS=.

What did you expect to see?

A built cockroach binary.

What did you see instead?

go build -i -o ./cockroach -v  -tags ' make x86_64_apple_darwin16.7.0' -ldflags ' -X github.com/cockroachdb/cockroach/pkg/build.typ=development -X "github.com/cockroachdb/cockroach/pkg/build.tag=up-784-g742b93b01" -X "github.com/cockroachdb/cockroach/pkg/build.utcTime=2018/01/08 15:31:57" -X "github.com/cockroachdb/cockroach/pkg/build.rev=742b93b0111e0370bf793aeb3323ad153dd2635d"' .
# github.com/cockroachdb/cockroach
/usr/local/Cellar/go/HEAD-a62071a/libexec/pkg/tool/darwin_amd64/link: /usr/local/Cellar/go/HEAD-a62071a/libexec/pkg/tool/darwin_amd64/link: running dsymutil failed: signal: segmentation fault

A few thoughts, in no particular order:

  • The affected package depends on several packages that make heavy use of cgo.
  • I'm afraid I don't have the expertise necessary to produce a smaller reproduction. I don't have the faintest idea what's going wrong.
  • You only need to run make build in the Cockroach repository once to compile a few C/C++ dependencies. We generate Go files with the appropriate cgo flags so that, after the first make build, a bare go build will work correctly (until the C/C++ dependencies need to be recompiled, of course).
  • I've collected the output of go build -x -ldflags=-v.
  • go build -ldflags=-s and go build -ldflags=-w both produce working binaries, which suggests the problem is DWARF-related, as in #23046.
@benesch

Contributor Author

commented Jan 8, 2018

/cc @thanm

@thanm

Member

commented Jan 8, 2018

I will take a look. What flavor of macOS and/or Xcode are you using?

@benesch

Contributor Author

commented Jan 8, 2018

Thank you much!

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.12.6
BuildVersion:	16G29

$ clang --version
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

That's macOS Sierra and, I believe, Xcode 9.2, but I don't know how to tell for sure given that I've only got the command line tools installed.

@thanm thanm self-assigned this Jan 8, 2018

@thanm

Member

commented Jan 8, 2018

I have been able to reproduce the problem (took a while to get things set up). Working on finding the root cause now.

@thanm

Member

commented Jan 8, 2018

The inline in question takes place within github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb.(*MVCCMetadata).Size, at the call to github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb.(*MVCCStats).Size. Here is the bad inline info DIE:

 <1><836cf5>: Abbrev Number: 4 (DW_TAG_subprogram)
    <836cf6>   DW_AT_abstract_origin: <0x8350c8>
    <836cfa>   DW_AT_low_pc      : 0x5b71d0
    <836d02>   DW_AT_high_pc     : 0x5b71da
    <836d0a>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
 <2><836d0c>: Abbrev Number: 12 (DW_TAG_variable)
    <836d0d>   DW_AT_abstract_origin: <0x83511a>
    <836d11>   DW_AT_location    : 0 byte block: 	()
 <2><836d12>: Abbrev Number: 17 (DW_TAG_formal_parameter)
    <836d13>   DW_AT_abstract_origin: <0x835123>
    <836d17>   DW_AT_location    : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
 <2><836d19>: Abbrev Number: 17 (DW_TAG_formal_parameter)
    <836d1a>   DW_AT_abstract_origin: <0x83512d>
    <836d1e>   DW_AT_location    : 2 byte block: 91 8 	(DW_OP_fbreg: 8)
 <2><836d21>: Abbrev Number: 0

The abstract origin for the second formal is off by 1 -- it is 0x83512d, should be 0x83512c. Not sure why... time to dig some more.
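
A minimal sketch of the consistency check that the dump above fails, assuming the abstract DIEs start at the offsets quoted in this comment (0x83512c is the value the last reference should have had). The helper name danglingRefs is hypothetical, and the sketch does not model what dsymutil actually does internally; it only shows that an abstract origin is usable only when it lands exactly on the start of a DIE.

```go
package main

import "fmt"

// danglingRefs returns every reference that does not land exactly on the
// start offset of a known DIE. (Hypothetical helper, for illustration only.)
func danglingRefs(dieStarts, refs []uint64) []uint64 {
	valid := make(map[uint64]bool, len(dieStarts))
	for _, off := range dieStarts {
		valid[off] = true
	}
	var bad []uint64
	for _, ref := range refs {
		if !valid[ref] {
			bad = append(bad, ref)
		}
	}
	return bad
}

func main() {
	// Start offsets of the abstract subprogram DIE and its children,
	// taken from the dump and the correction stated in the comment.
	dieStarts := []uint64{0x8350c8, 0x83511a, 0x835123, 0x83512c}
	// DW_AT_abstract_origin values as they appear in the bad inline DIE.
	refs := []uint64{0x8350c8, 0x83511a, 0x835123, 0x83512d}

	for _, ref := range danglingRefs(dieStarts, refs) {
		fmt.Printf("reference %#x does not point at the start of a DIE\n", ref)
	}
}
```

With the values from the dump, only the reference for the second formal parameter is reported.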

@thanm

Member

commented Jan 9, 2018

I think I finally have a handle on what's happening here. This was a very difficult bug to track down.

The problem seems to be that there are different versions of the abstract subprogram DIE being generated depending on whether we're compiling the package P containing an inlinable function F, vs. compiling some other package that imports P and uses F. Here is the abbreviation entry description for an abstract parameter:

		DW_TAG_formal_parameter,
		DW_CHILDREN_no,
		[]dwAttrForm{
			{DW_AT_name, DW_FORM_string},
			{DW_AT_variable_parameter, DW_FORM_flag},
			{DW_AT_decl_line, DW_FORM_udata},
			{DW_AT_type, DW_FORM_ref_addr},
		},

Note the decl_line attribute -- it uses a variable-length encoding (DW_FORM_udata), meaning the larger the line number, the more bytes consumed. Here is the inlinable function of interest:

package enginepb
...
func (m *MVCCStats) Size() (n int) { // line 553
	var l int
	_ = l
	n += 9
	...

}

When the package containing this function is compiled, the compiler captures the declaration line of all interesting variables, meaning line 553 for 'm' and 'n' and 554 for 'l'. It emits an abstract function DIE with abstract param/local child DIEs based on this info.

The build continues for a while, then some other package that imports 'enginepb' and uses the method above is compiled. In this version of the compile, somehow the declaration line values for 'l' and 'n' are correct, but the line for 'm' is being set not to the correct line number but to the line number of the 'import' statement that pulls in enginepb (something about the fact that it is the receiver?).

Ordinarily nobody would care (hardly anyone looks at this sort of debug info) but in this case when the line number is LEB-encoded as the DIE is emitted, the block only takes one byte instead of the 2 bytes it takes to emit the correct line of 553.
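
To make the size difference concrete, here is a minimal sketch of unsigned LEB128 encoding, the variable-length format behind DW_FORM_udata. The function name appendUleb128 is an illustrative choice rather than the toolchain's own helper, and line 12 is a hypothetical stand-in for the line of an import statement; only 553 comes from this thread.

```go
package main

import "fmt"

// appendUleb128 appends v to b in unsigned LEB128 form: seven payload bits
// per byte, least-significant group first, with the high bit set on every
// byte except the last.
func appendUleb128(b []byte, v uint64) []byte {
	for {
		c := byte(v & 0x7f)
		v >>= 7
		if v != 0 {
			c |= 0x80 // more bytes follow
		}
		b = append(b, c)
		if v == 0 {
			return b
		}
	}
}

func main() {
	// 553 is the declaration line quoted above; 12 is a hypothetical line
	// number for an import statement near the top of a file.
	for _, line := range []uint64{12, 553} {
		enc := appendUleb128(nil, line)
		fmt.Printf("line %d -> % x (%d bytes)\n", line, enc, len(enc))
	}
}
```

Any value below 128 fits in a single byte, while 553 needs two, which is the one-byte difference described above.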

This means that the offset of the next abstract parameter DIE ('n' in this case) is computed differently: instead of the offset of 101 in the home package, we get an offset of 100, or vice versa.
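
The arithmetic can be sketched as follows, under some stated assumptions: a one-byte abbreviation code, a NUL-terminated inline name, a one-byte flag, and a four-byte DW_FORM_ref_addr (32-bit DWARF). Those sizes are assumptions of this sketch, although they agree with the dump earlier in the thread, where the DIE for 'm' at 0x835123 is followed ten bytes later by 0x83512d. The import line 12 is hypothetical, and abstractParamSize is an illustrative name.

```go
package main

import "fmt"

// ulebLen returns the number of bytes needed to encode v as ULEB128.
func ulebLen(v uint64) int {
	n := 1
	for v >>= 7; v != 0; v >>= 7 {
		n++
	}
	return n
}

// abstractParamSize models the encoded size of an abstract formal parameter
// DIE using the abbreviation shown above: name, variable_parameter flag,
// decl_line, type. The fixed sizes are assumptions of this sketch.
func abstractParamSize(name string, declLine uint64) int {
	const (
		abbrevCode = 1 // assumes the abbreviation code is below 128
		flag       = 1 // DW_FORM_flag
		refAddr    = 4 // DW_FORM_ref_addr in 32-bit DWARF
	)
	return abbrevCode + len(name) + 1 + flag + ulebLen(declLine) + refAddr
}

func main() {
	const (
		subprogram = 0x8350c8 // abstract subprogram DIE, from the dump
		paramM     = 0x835123 // abstract DIE for 'm', from the dump
	)
	start := paramM - subprogram // offset of 'm' within the abstract function

	home := start + abstractParamSize("m", 553)    // compiled in its home package
	imported := start + abstractParamSize("m", 12) // hypothetical import line

	fmt.Printf("'m' at offset %d\n", start)
	fmt.Printf("'n' at offset %d (home) vs %d (imported)\n", home, imported)
}
```

Under these assumptions the two compilations place 'n' at offsets 101 and 100, matching the numbers quoted above.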

Possible solutions:

  1. Turn off DWARF inline info generation for macOS (given that it is very late in the release cycle, this might be the safest thing to do).

  2. Change the abbrev entry for abstract params/autos to remove the declaration line. This will mean less accurate/comprehensive DWARF, but I don't think it will compromise the debugging experience in any meaningful way.
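
As a rough illustration of why option 2 sidesteps the problem, the sketch below models an abbreviation entry as a list of attribute forms and compares the encoded DIE size with and without the decl_line attribute. The type and function names (form, param, dieSize) are hypothetical and are not the identifiers used in the compiler; the fixed sizes assume a one-byte abbreviation code and 32-bit DWARF, and line 12 is a made-up import line.

```go
package main

import "fmt"

// form is a simplified stand-in for a DWARF attribute form.
type form int

const (
	formString  form = iota // NUL-terminated inline string
	formFlag                // single byte
	formUdata               // ULEB128, variable length
	formRefAddr             // 4 bytes, assuming 32-bit DWARF
)

// param carries the values that the forms below encode.
type param struct {
	name     string
	declLine uint64
}

// ulebLen returns the number of bytes needed to encode v as ULEB128.
func ulebLen(v uint64) int {
	n := 1
	for v >>= 7; v != 0; v >>= 7 {
		n++
	}
	return n
}

// dieSize returns the encoded size of a DIE laid out by the given forms,
// assuming a one-byte abbreviation code.
func dieSize(forms []form, p param) int {
	size := 1
	for _, f := range forms {
		switch f {
		case formString:
			size += len(p.name) + 1
		case formFlag:
			size++
		case formUdata:
			size += ulebLen(p.declLine)
		case formRefAddr:
			size += 4
		}
	}
	return size
}

func main() {
	withLine := []form{formString, formFlag, formUdata, formRefAddr}
	withoutLine := []form{formString, formFlag, formRefAddr}

	home := param{"m", 553}    // line recorded in the home package
	imported := param{"m", 12} // hypothetical import line

	fmt.Println("same size with decl_line:   ", dieSize(withLine, home) == dieSize(withLine, imported))
	fmt.Println("same size without decl_line:", dieSize(withoutLine, home) == dieSize(withoutLine, imported))
}
```

Once the variable-length attribute is gone, the DIE size no longer depends on which line number the compiler happened to record, so the offsets of later sibling DIEs agree between compilations.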

Longer term we could track down the reason for the declaration line inconsistency, but it's not clear that there is an easy fix for that (since presumably it would mean making sure that it got written to the export data for the function).

@thanm

Member

commented Jan 10, 2018

I chatted with Austin for a bit this afternoon about possible solutions. For the short term I think it makes sense to pick option 2 (removing decl_line from abstract param abbrev entry).

@gopherbot


commented Jan 10, 2018

Change https://golang.org/cl/87055 mentions this issue: cmd/compile: workaround for inconsistent receiver param srcpos

@bradfitz bradfitz added this to the Go1.10 milestone Jan 10, 2018

gopherbot pushed a commit that referenced this issue Jan 10, 2018

cmd/compile: workaround for inconsistent receiver param srcpos
Given an inlinable method M in package P:

   func (r *MyStruct) M(...) {

When M is compiled within its home package, the source position that
the compiler records for 'r' (receiver parameter variable) is
accurate, whereas if M is built as part of the compilation of some
other package (body read from export data), the declaration line
assigned to 'r' will be the line number of the 'import' directive, not
the source line from M's source file.

This inconsistency can cause differences in the size of abstract
parameter DIEs (due to variable-length encoding), which can then in
turn result in bad abstract origin offsets, which in turn triggers
build failures on macOS (dsymutil crashes when it encounters an
incorrect abstract origin reference).

Work around the problem by removing the "declaration line number"
attribute within the abstract parameter abbreviation table entry. The
decl line attribute doesn't contribute a whole lot to the debugging
experience, and it gets rid of the inconsistencies that trigger the
dsymutil crashes.

Updates #23374.

Change-Id: I0fdc8e19a48db0ccd938ceadf85103936f89ce9f
Reviewed-on: https://go-review.googlesource.com/87055
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
@ianlancetaylor

Contributor

commented Jan 10, 2018

Thanks for the patch. Am I correct in thinking that this is now as fixed as it is going to be for 1.10, and so the issue should be moved to 1.11?

@thanm

Member

commented Jan 10, 2018

Correct. I want to understand why the srcpos for the receiver variable is different in the two scenarios -- ideally it would be identical in both cases (compiling a local inlinable routine vs. compiling an imported inlinable routine).

@ianlancetaylor ianlancetaylor modified the milestones: Go1.10, Go1.11 Jan 10, 2018

@ianlancetaylor ianlancetaylor changed the title from "cmd/go: darwin_amd/64/link: running dsymutil failed: signal: segmentation fault" to "cmd/link: darwin_amd64: running dsymutil failed: signal: segmentation fault" Jan 10, 2018

@benesch

Contributor Author

commented Jan 10, 2018

Thanks for the fast turnaround, @thanm! That doesn't sound like it was fun.

@thanm

Member

commented Apr 17, 2018

Closing this out, has been fixed for a while.

@thanm thanm closed this Apr 17, 2018

@jrwren


commented Apr 18, 2018

I just subscribed to this a couple of days ago because I'm still getting it after upgrading to 1.10.1. What was/is the fix?

@benesch

Contributor Author

commented Apr 18, 2018

Are you sure it's the same issue, @jrwren? Are you seeing the segfault when compiling Cockroach or another piece of software?

@jrwren


commented Apr 18, 2018

I definitely get

/usr/local/Cellar/go/1.10.1/libexec/pkg/tool/darwin_amd64/link: /usr/local/Cellar/go/1.10.1/libexec/pkg/tool/darwin_amd64/link: running dsymutil failed

I think it is followed by : signal: segmentation fault

I'm compiling our internal software. It does not include Cockroach.

@benesch

Contributor Author

commented Apr 18, 2018

"running dsymutil failed" means that the Go toolchain produced a binary that was rejected by macOS's toolchain. There are myriad reasons this could happen—basically any misplaced bit during linking could cause this—and it's statistically unlikely that it happens to be the same problem that was addressed here.

I think you're best off filing another issue with a code sample that reproduces the issue!

@thanm

Member

commented Apr 18, 2018

Agree with @benesch -- there are many reasons why dsymutil can have problems. Please file an issue if possible.

@golang golang locked and limited conversation to collaborators Apr 18, 2019
