New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: shared object constructor functions are not invoked with internal linker and musl libc #28909

Open
Hollerberg opened this Issue Nov 21, 2018 · 11 comments

Comments

Projects
None yet
5 participants
@Hollerberg

Hollerberg commented Nov 21, 2018

What version of Go are you using (go version)?

1.11.2

$ go version
go version go1.11.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build960632520=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  • Create a dynamically linked Go application with the internal linker.
  • Create a shared object (DLL) which has a constructor function - e.g. a C function attributed with __attribute__((constructor))
  • Use LD_PRELOAD to preload the shared object into the Go application process - e.g LD_PRELOAD=libsotest.so ./myapp

This issue could be the root cause for failing alpine tests in #20120.

I have created docker image with a reproducer for this issue: https://gist.github.com/Hollerberg/5fc64f8abf0f16d4e801c4ad348f21b6

Download the files from the github gist and execute build-and-run-container.sh. This script will build the container image. Within the docker build process, libpreload.so is built from preload.c, and the Go application gogo.go is built once to gogo-int-linker using the internal and once to gogo-ext-linker using the external (system) linker.

Once the docker image is built and started, the two applications can be started with script runtest.sh.

The shared object constructor function in libpreload.so will write a message to stdout. Preloading the shared object into gogo-int-linker will not execute the constructor and therefore no message is printed to console. Repeating this with gogo-ext-linker, the constructor message is printed to console.

The root cause for this behavior is a subtle difference between musl and glibc dynamic linker behavior. glibc seems to call shared object constructors in dynamic linker context (_dl_start_user), while musl libc executes constructors in __libc_start_main called by crt1.c. musl libc dynamic linker loads the application and preloaded/depending shared object to memory. Then it jumps to the application entry point. Typically, the application entry point is the code defined in crt1.c - but not in case when the application has been linked with the internal Go linker - __libc_start_main is never invoked.

See also Rich Felkers comments on musl libc mailing list

Short background info on why we are preloading shared objects into Go processes - my company provides an application monitoring solution and the shared object pulls diverse data from the executing Go application. Thus, the use case is very relevant to us in order to be able to support musl libc based systems without constraints.

What did you expect to see?

The constructor function in the preloaded shared object should be invoked before Go application main function.

What did you see instead?

The shared object constructor is not invoked at all.

@dgryski

This comment has been minimized.

Contributor

dgryski commented Nov 21, 2018

Is there an official position on the state of musl support?

@ALTree

This comment has been minimized.

Member

ALTree commented Nov 21, 2018

@dgryski

Is there an official position on the state of musl support?

We had a Linux Alpine builder at some point, but then it was removed because it was always broken.

AFAIK the current status is 1) Go assumes glibc 2) musl may work sometimes, but since it's untested, it is not officially supported

@ianlancetaylor ianlancetaylor changed the title from shared object constructor functions are not invoked with internal linker and musl libc to cmd/link: shared object constructor functions are not invoked with internal linker and musl libc Nov 21, 2018

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Nov 21, 2018

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 21, 2018

Thanks for the investigation. My initial take on this is that the musl dynamic linker is not implementing the ELF ABI. I'm not opposed to changing this in Go if there is a clean fix, but offhand I'm not sure what that would look like.

@fweimer

This comment has been minimized.

Contributor

fweimer commented Nov 23, 2018

It looks to me that the internal linker does not link against crt1.o and just calls the main function directly from the startup code in the executable. This is fairly broken and needs to be fixed. This issue is not specific to musl, it applies to glibc as well.

@Hollerberg

This comment has been minimized.

Hollerberg commented Nov 23, 2018

Preamble - I am no expert in ELF or Linux ABI. I will try to digest the information and draw (hopefully) correct conclusions:

For me, a core statement in Rich Felkers reply to my inquiry was

If the program is not entered via __libc_start_main, libc is not usable.

It seems, that neither with glibc nor with musl libc __libc_start_main is invoked using the internal linker. Thus, resulting applications may encounter unexpected problems during execution - very likely for musl, less (to unlikely?) for glibc. But looking at the amount of initialization code in glibc __libc_start_main, leaves a bad stomach feeling, too. It isn't even in control of Go community - once one of the dependent libraries changes its initialization behavior, applications will fail in the field.

Using the external (system) linker or linking crt code with internal linker, solves the __libc_start_main issue. It is invoked by the crt code linked to the application resulting in a correctly initialized libc and correctly executed shared object constructors.

Thus, my first take away would be to either promote using the external linker or make internal linker linking crt code. At least users should be made aware about potential consequences using the internal linker.

@fweimer

This comment has been minimized.

Contributor

fweimer commented Nov 23, 2018

@Hollerberg From a glibc perspective, this is mainly a forward compatibility problem. There is a lot of code in csu/libc-start.c, but most of it is guarded by #ifndef SHARED, so it is not relevant here. For dynamic linking, all the library initialization happens in the dynamic linker. This is necessary because in glibc, constructors of shared objects run before __libc_start_main is invoked and already need a fully working libc. Musl appears to run shared object constructor from __libc_start_main, so it can delay initialization until __libc_start_main is called.

Anyway, the current internal linker behavior breaks these things for glibc, as far as I can tell:

  • Constructors in the main program are not executed. (But perhaps the internal linker has a different mechanism to deal with this.)
  • Auditing/LD_AUDIT lacks one checkpoint. (This is a very obscure feature, but important to some people.)
  • The main thread cannot be canceled, and pthread_exit will not work. This is likely unsupported in Go programs anyway.

I have not checked whether the Go linker uses libc_nonshared.a properly. If not, there are more issues, particularly on i386 with its support for legacy libio streams (which could get activated accidentally if glibc is linked improperly).

On the glibc side, we need to take the existing Go binaries into account going forward. It would have been nice if we could have switched to the simpler musl approach (I have a patch that changes __libc_start_main to run constructors in the main program via data from PT_DYNAMIC, replacing the current crt1.o indirection, and the musl approach would simplify it further), but probably we can't do that anymore due to the existing broken Go binaries.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 24, 2018

For what it's worth, the internal linker is only used for pure Go programs that are not linked against any C code. Such programs have no constructors. I suppose it's true that LD_AUDIT won't work; people who need that can use -linkmode=external to force use of the external linker. Pure Go programs never cancel threads and they never call pthread_exit.

The point of internal linking mode is to permit building Go programs without having a C development environment installed. Admittedly this case is uninteresting on GNU/Linux; it is more useful on Darwin and Windows.

@Hollerberg

This comment has been minimized.

Hollerberg commented Nov 26, 2018

I see your point - definitely, requiring external linker would reduce Go quality of service for developers.

I see three issues for (more or less) pure Go programs:

  1. the references into the resolver part of libc (getaddrinfo et al). Although they are safe in glibc case, other libc implementations (like musl libc) will not guarantee their function.
  2. one can enforce -linkmode=internal with cgo, which might imply quite obscure error scenarios for unaware developers
  3. LD_PRELOAD - this is the use case that initiated my investigations. Basically, LD_PRELOAD will "taint" a pure Go program / process (in my case, the loading of a shared object for performance monitoring). It will not have an impact on the pure Go application, but it makes other parts of the system fail.

Could Go startup code call __libc_start_main in case of an application linked dynamically with the internal linker? Could it resolve the implicit danger of not correctly initializing libc without sacrificing Go quality of service?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 26, 2018

Calling __libc_start_main is theoretically possible but it would be a pretty big overhaul of the Go runtime package. I would be particularly concerned about the effect on statically linked Go programs; although I know that at least some glibc developers do not care much about static linking, many Go developers prefer it because it gives them a standalone binary that they can use in a Docker container. It would not be easy to separate the startup code between static and dynamic linking for programs that use libc. I would be happier about calling __libc_start_main if it were in the ELF ABI or were a widespread convention, but as far as I know it is only used by glibc and musl.

I'm perfectly comfortable in saying that anybody who wants to use LD_PRELOAD must force the use of external linking. I'm also comfortable in saying that anybody who explicitly uses -linkmode=internal must clearly understand what they are doing.

@fweimer

This comment has been minimized.

Contributor

fweimer commented Nov 27, 2018

__libc_start_main is certainly an internal implementation detail. You need to link against the startup code (crt1.o etc.), which will take care of the details. The various variants already take care of the static vs dynamic vs PIE differences.

Static vs dynamic linking is not a matter of preference. With glibc and static linking, many things simply do not work, and you do a disservice to your users by pretending otherwise (particularly when they hard-code static linking because it happens to work on amd64 at present, but breaks horribly on other architectures).

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 27, 2018

The purpose of internal linking is to avoid requiring C development tools. It wouldn't make sense to start linking against crt1.o, etc., even if we were able to keep up with the many variations. If internal linking can't work, then it can't work; we shouldn't try to make it more like external linking. When using external linking, then of course the external linker is responsible for choosing the startup object files.

Regarding static linking, we aren't pretending anything one way or another. Our users want static linking, and they do so by passing -static to the external linker to ask for a statically linked binary.
Frankly, my take on this is the reverse of yours: the glibc maintainers are doing a disservice to our shared users by refusing to support a feature that users clearly want. Requiring shared linking is entirely acceptable in the desktop world. It is not a feature in the Docker world.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment