Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: does not support multiple TOCs in internal linking mode #21961

Open
mkumatag opened this issue Sep 21, 2017 · 28 comments
Open

cmd/link: does not support multiple TOCs in internal linking mode #21961

mkumatag opened this issue Sep 21, 2017 · 28 comments

Comments

@mkumatag
Copy link

@mkumatag mkumatag commented Sep 21, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

master

root@localhost:~/go# git rev-parse HEAD
eca45997dfd6cd14a59fbdea2385f6648a0dc786
root@localhost:~/go#

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

ppc64le

What did you do?

export PATH=/usr/local/go/bin/:$PATH
export GOROOT_BOOTSTRAP=go env GOROOT

root@localhost:~/go/bin# go version
go version go1.8.3 linux/ppc64le

cloned the latest code here:

root@localhost:~/go/src# pwd
/root/go/src

run make.bash
root@localhost:~/go/src# ./make.bash

Use the build go binary:

root@localhost:~/go/bin# pwd
/root/go/bin
root@localhost:~/go/bin# ./go version
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc88320 pc=0x112c8]

runtime stack:
invalid spdelta runtime/cgo(.text).sigfillset 0x112c0 0x112c8 0x0 -1
runtime.throw(0x44825a, 0x2a)
	/root/go/src/runtime/panic.go:606 +0x68
runtime.sigpanic()
	/root/go/src/runtime/signal_unix.go:360 +0x264
invalid spdelta runtime/cgo(.text).sigfillset 0x112c0 0x112c8 0x0 -1

goroutine 1 [running]:
runtime.systemstack_switch()
	/root/go/src/runtime/asm_ppc64x.s:209 +0x10 fp=0xc420036758 sp=0xc420036738 pc=0x64e40
runtime.main()
	/root/go/src/runtime/proc.go:128 +0x60 fp=0xc4200367c0 sp=0xc420036758 pc=0x3cf80
runtime.goexit()
	/root/go/src/runtime/asm_ppc64x.s:1353 +0x4 fp=0xc4200367c0 sp=0xc4200367c0 pc=0x67464
root@localhost:~/go/bin#

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

What did you expect to see?

Instead, I should see a golang version

What did you see instead?

I see a SIGSEGV

@mkumatag
Copy link
Author

@mkumatag mkumatag commented Sep 21, 2017

/cc @laboger

@mkumatag
Copy link
Author

@mkumatag mkumatag commented Sep 21, 2017

@mundaym
Copy link
Member

@mundaym mundaym commented Sep 21, 2017

@mkumatag which revision of Go are you using? Can you try rolling back to 101fbc2 (git checkout 101fbc2) and retrying? It looks like the ppc64le builder has been failing since 8e5ac83 (cmd/go: stop linking cgo objects together with ld -r).

@mkumatag
Copy link
Author

@mkumatag mkumatag commented Sep 21, 2017

@mundaym I'm using latest one

root@localhost:~/go# git rev-parse HEAD
eca45997dfd6cd14a59fbdea2385f6648a0dc786
root@localhost:~/go#
@mkumatag
Copy link
Author

@mkumatag mkumatag commented Sep 21, 2017

I rolled back to 101fbc2 and things are fine,

root@localhost:~/go/bin# ./go version
go version devel +101fbc2 Wed Sep 20 20:27:13 2017 +0000 linux/ppc64le
root@localhost:~/go/bin#

@mundaym @ianlancetaylor not really sure what is wrong with the latest version!

@mundaym mundaym added this to the Go1.10 milestone Sep 21, 2017
@laboger
Copy link
Contributor

@laboger laboger commented Sep 21, 2017

Yikes I haven't been watching the build board lately and I see we have been randomly failing in runtime since e7e4a4f and this latest failure of SEGV was introduced with @ianlancetaylor 's change 8e5ac83.

segv happens here:

0x112c0 <runtime/cgo(.text).sigfillset>: std r2,24(r1)
0x112c4 <runtime/cgo(.text).sigfillset+4>: addis r12,r2,100
=> 0x112c8 <runtime/cgo(.text).sigfillset+8>: ld r12,192(r12)
0x112cc <runtime/cgo(.text).sigfillset+12>: mtctr r12
0x112d0 <runtime/cgo(.text).sigfillset+16>: bctr

In golang prior to the change to omit the ld -r where it works, this code looks like this:

=> 0x112c0 <runtime/cgo(.text).sigfillset>: std r2,24(r1)
0x112c4 <runtime/cgo(.text).sigfillset+4>: addis r12,r2,-1
0x112c8 <runtime/cgo(.text).sigfillset+8>: ld r12,32368(r12)
0x112cc <runtime/cgo(.text).sigfillset+12>: mtctr r12
0x112d0 <runtime/cgo(.text).sigfillset+16>: bctr

So my theory is that the TOCs from the individual .o files should be getting merged into a single TOC at some point, but if the external linker isn't involved in the final link, how's it going to happen except by the ld -r? My understanding is that the go tool is linked with the internal linker and not the external linker, so in that case the ld -r is still useful for ppc64le.

This brings up a different concern however, why this wasn't caught in the buildbots?

@mundaym
Copy link
Member

@mundaym mundaym commented Sep 21, 2017

You mean the trybots? They don't run on ppc64x, they only check that the compilation works.

@laboger
Copy link
Contributor

@laboger laboger commented Sep 21, 2017

FYI... this doesn't affect ppc64 because 'go' is built statically there.

@laboger
Copy link
Contributor

@laboger laboger commented Sep 22, 2017

@mundaym This breaks the 'go' tool so I was somewhat thinking that it should fail during the build. But I see now that go_bootstrap is built as static but the final go tool is built as a dynamic binary containing cgo using the internal linker which is the case that is broken by the removal of ld -r.

@mkumatag
Copy link
Author

@mkumatag mkumatag commented Sep 25, 2017

Any updates? builds are still failing with the latest code. @ianlancetaylor any comments?

@laboger
Copy link
Contributor

@laboger laboger commented Sep 25, 2017

@ianlancetaylor I don't know what your thoughts are on this. This regression makes the go tool unusable on the golang build so we would like to get this fixed.

I can create a fix to back out your change for ppc64le, i.e., continue to use the ld -r on ppc64le but not on other platforms.

I believe the other option is to fix the internal linker to correctly handle merging of the TOCs, but that might take me a while because I don't know the internal linker well enough.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 25, 2017

What happens if we simply add sys.PPC64 to this list in cmd/link/internal/ld/config.go?

	if iscgo && SysArch.InFamily(sys.ARM64, sys.MIPS64, sys.MIPS) {
		return true, objabi.GOARCH + " does not support internal cgo"
	}

That seems to me like the right temporary fix unless and until somebody fixes the internal linker. The downside is that we won't generate fully statically linked binaries for pure Go programs on ppc64le. Is that acceptable?

@laboger
Copy link
Contributor

@laboger laboger commented Sep 25, 2017

Isn't this going to force external linking only when iscgo is true? So pure Go programs on ppc64le should still be truly statically linked with this change.

I just tried it, and that's what happened with a simple hello.go without cgo, which is pure Go AFAIK?

I think this is the correct temporary fix, but this means the 'go' tool is linked with the external linker and not the internal linker. In both cases it is a dynamic binary. I'm not sure what the downside might be for using a different linker.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 25, 2017

Sorry, what I wrote was a little confusing. I think this will change the behavior of any pure Go program that imports the net or os/user packages. I think that currently they will use internal linking, and with my suggestion they will use external linking. You're right that they will be dynamically linked either way.

I think the only significant difference is that with this change a pure Go program (that imports the net or os/user package) will only be buildable if the C linker is installed on the system where the Go program is being built. Without this change such a program would be buildable even if the C linker was not installed at all. On GNU/Linux I doubt that is worth worrying about (on Windows, on the other hand, it would be a problem).

Does the change actually fix the problem?

@laboger
Copy link
Contributor

@laboger laboger commented Sep 26, 2017

Yes this fixes the problem, sorry I wasn't explicit on that.

I don't have a good feel for how many programs might be affected or who would care. Obviously the building of the 'go' tool requires the 'ld -r' so it must have the C linker on the system. I think we fix it and see who complains.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 26, 2017

You're right that clearly invoking ld -r means that you have the C linker. I must be misremembering something about this.

@ianlancetaylor ianlancetaylor changed the title SIGSEGV while running go on ppc64le cmd/link: does not support multiple TOCs in internal linking mode Sep 26, 2017
@gopherbot
Copy link

@gopherbot gopherbot commented Sep 26, 2017

Change https://golang.org/cl/66270 mentions this issue: cmd/link: don't use internal linking mode for cgo on PPC64

gopherbot pushed a commit that referenced this issue Sep 26, 2017
The internal linker doesn't know how to handle multiple TOC sections
in internal linking mode. This used to work because before CL 64793 we
invoked ld -r on multiple objects, and that merged the TOC sections
for us.

Updates #21961

Change-Id: I48260a7195be660016f2f358ebc8cb79652210ab
Reviewed-on: https://go-review.googlesource.com/66270
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@mwhudson
Copy link
Contributor

@mwhudson mwhudson commented Oct 27, 2017

This is related to #15409 is it not?

From my POV I'd be fine with deleting all the code related to internal linking support for cgo but that might not be terribly friendly.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 27, 2017

Yes, it seems likely that fixing this would fix #15409.

Internal linking is quite important on Windows but I agree that it is less interesting on GNU/Linux systems. Still, in general, it is a goal that people can install a Go distribution and, as long as they don't use cgo, use it with no C compiler or linker installed. With the current standard library that means that the internal linker must support at least the code generated for the net and os/user packages.

@laboger
Copy link
Contributor

@laboger laboger commented Jun 14, 2018

The recent breakage in the ppc64x builds and the discussion about cross builds made me curious, so I did a cross build of golang for ppc64le and found that the go binary created is statically (and therefore internally) linked. I moved the cross built golang over to a ppc64le machine and the tests all passed. So I think that means an installation of a Go distribution for ppc64le from a cross build on an x86 could be installed with no C compiler or linker.

At this point I don't understand why the go binary is built differently when doing a cross. It seems that it should be built the same way whether cross or natively built. It should still have multiple TOCs in both cases?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 14, 2018

By default when cross-compiling we assume that we don't have a C cross-compiler available, so we build without cgo support. You should be able to get the same result cross and native, assuming you have a C cross-compiler, if you set CGO_ENABLED=1 in the environment.

@laboger
Copy link
Contributor

@laboger laboger commented Aug 13, 2020

@cherrymui @jeremyfaller Do you have a guess on how much work it would be to get internal linking to work with cgo on ppc64le? Maybe some relocation types are missing and then this TOC merging would have to be done.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Aug 13, 2020

I'm not familiar with the ABI to tell. I would guess as long as we actually understand the ABI, it won't take much work. Maybe keep track the multiple TOCs, and group them together (?) according to the ABI?

@laboger
Copy link
Contributor

@laboger laboger commented Aug 17, 2020

Actually I'm not sure if the title truly explains the problem here. It's been a while so I'm looking back through to try and understand what's really wrong. I know the problem happened because there were two relocatable files to be linked together and they both had references which needed plt generation. They previously worked because 'ld -r' was done for all those files prior to the final link. When the 'ld-r' was removed this quit working and the fix was to disable internal linking with cgo on ppc64le.

Is this the ABI you mean? https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html

@laboger
Copy link
Contributor

@laboger laboger commented Aug 17, 2020

I tried to enable internal linking with cgo on master in cmd/link/internal/ld/config.go and got this error during the build:

 ./make.bash
Building Go cmd/dist using /home/boger/golang/go-linux-ppc64le-bootstrap. (devel +91f997b Wed Jan 6 06:08:34 2016 +0000 linux/ppc64le)
Building Go toolchain1 using /home/boger/golang/go-linux-ppc64le-bootstrap.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/ppc64le.
# cmd/trace
.plt: initialize bounds (16 < 24)
panic: unexpected empty container symbol

goroutine 1 [running]:
cmd/link/internal/loader.(*Loader).AddInteriorSym(0xc000912000, 0x22a5e, 0x22a68)
	/home/boger/golang/link/go/src/cmd/link/internal/loader/loader.go:1678 +0x2d0
cmd/link/internal/ld.(*dodataState).allocateDataSections(0xc000012d80, 0xc000276000)
	/home/boger/golang/link/go/src/cmd/link/internal/ld/data.go:1661 +0x3874
cmd/link/internal/ld.(*Link).dodata(0xc000276000, 0xc0034b6000, 0x23a38, 0x23a38)
	/home/boger/golang/link/go/src/cmd/link/internal/ld/data.go:1503 +0xa20
cmd/link/internal/ld.Main(0x4386a0, 0x10, 0x20, 0x1, 0x1, 0x41, 0x0, 0x0, 0x2a58dc, 0x10, ...)
	/home/boger/golang/link/go/src/cmd/link/internal/ld/main.go:301 +0x1078
main.main()
	/home/boger/golang/link/go/src/cmd/link/main.go:68 +0x228

I added print statements before the panic and the empty container was the .got.

@laboger
Copy link
Contributor

@laboger laboger commented Sep 9, 2020

@cherrymui I think I found out why the "initialize bounds" error message is happening in .plt.

In cmd/link/internal/ppc64/asm.go function addpltsym, when an entry is added to the plt using plt.Grow, that doesn't update the size. If I add a call to plt.SetSize to increase the plt size then I can get past this error.

By the way, is there a tool like readelf to see what relocations are in the runtime/cgo(.text) and runtime/cgo(.toc) sections? It looks like the problem is that each input file has a runtime/cgo with (.text) and (.toc) but then at the final link, the relocations assume a single TOC. So some of the r2 addresses and relocated offsets are wrong because of this.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Sep 9, 2020

If I add a call to plt.SetSize to increase the plt size then I can get past this error.

That makes sense.

is there a tool like readelf to see what relocations are in the runtime/cgo(.text) and runtime/cgo(.toc) sections?

Sorry I'm not sure I understand what you mean. Why readelf itself doesn't do?

@laboger
Copy link
Contributor

@laboger laboger commented Sep 10, 2020

I was able to get readelf to work, I was trying it on the wrong type of file.

I'm trying to understand how the SELFGOT data should be processed because that's the problem I'm trying to resolve next.
In cmd/link/internal/loadelf/ldelf.go Load function is trying to load files and puts entries into the SELFGOT data. I added some prints here and see a few lines like this (the last number on the line is the localSymVersion).

### Reading file: /home/boger/golang/link/go/pkg/linux_ppc64le/runtime/cgo.a(_x004.o) ###
### Setting SELFGOT section: .toc for runtime/cgo(.toc) 250 ###
### Reading file: /home/boger/golang/link/go/pkg/linux_ppc64le/runtime/cgo.a(_x005.o) ###
### Setting SELFGOT section: .toc for runtime/cgo(.toc) 251 ###
### Reading file: /home/boger/golang/link/go/pkg/linux_ppc64le/runtime/cgo.a(_x009.o) ###
### Setting SELFGOT section: .toc for runtime/cgo(.toc) 255 ###

And then in cmd/link/internal/ld/data.go function allocateDataSections it tries to process the SELFGOT entries and there are a few problems with that code:

  • The .got is an entry in the SELFGOT but that has size 0 which causes a failure in the code below it. I added code to skip entries for the .got.
  • There is an expectation that runtime/cgo(.toc) is a symbol to be used as an interior symbol but that doesn't exist anywhere. I get an error saying that the interior symbol doesn't exist in the symtab. All I could find are places where the TOC symbols are created. Should these runtime/cgo(.toc) symbols be created? If not what should the interior symbol be for these.
  • There is an array called DotTOC in ctxt that has been filled in with .TOC. symbols for each symbol version but I don't see where or how those are to be used.
  • Once the program is created there is only one TOC symbol and all the relocations are based on a single TOC. But the code to load r2 and within the callstubs (plus maybe others?) are assuming the runtime/cgo(.toc) for the "version" it was generated. Maybe this is where the DotTOC values should be used but I don't see how they should be.

I don't know if the code in allocateDataSections for processing the SELFGOT for PPC64 is currently used for AIX or if it ever executed at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.