runtime: c-shared builds fail with musllibc #13492

Open
mdempsky opened this Issue Dec 5, 2015 · 14 comments

@mdempsky
Member

mdempsky commented Dec 5, 2015

Currently, some of the init_array functions provided by a c-shared build expect to be called with (argc, argv, envp) arguments as is done by glibc, but this isn't specified by the ELF gABI for DT_INIT_ARRAY (http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#init_fini), and isn't done with other libc implementations like musllibc.
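For anyone who wants to see the behavior in isolation, here is a minimal sketch in C. It relies on the GCC/Clang `constructor` attribute to place a function in `.init_array`. The names `record_startup_args`, `init_ran`, `seen_argc`, and `seen_argv` are made up for illustration, and the `__GLIBC__` guard assumes that macro reliably identifies glibc. This is not the Go runtime's code, just the calling convention it depends on.

```c
#include <assert.h>
#include <limits.h> /* on glibc this pulls in <features.h>, which defines __GLIBC__ */
#include <stddef.h>

/* Hypothetical globals recording what the init function observed. */
int init_ran = 0;
int seen_argc = -1;       /* stays -1 when the arguments are not trusted */
char **seen_argv = NULL;

/* The constructor attribute adds this function to .init_array.
 * glibc invokes such entries as f(argc, argv, envp); the ELF gABI
 * does not promise any arguments, so other libcs pass nothing. */
__attribute__((constructor))
static void record_startup_args(int argc, char **argv, char **envp) {
    (void)envp;
    init_ran = 1;
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    assert(argv != NULL); /* glibc hands over the process's real argv */
    seen_argc = argc;
    seen_argv = argv;
#else
    (void)argc; /* whatever is in these parameters is unspecified here */
    (void)argv;
#endif
}
```

Built against glibc, `seen_argv` should point at the process arguments by the time `main` runs. Built against musl, the guarded branch is compiled out and the defaults remain.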

CC @ianlancetaylor @mwhudson

@jamesr

jamesr commented Dec 5, 2015

musl does expose getauxval() for querying the auxiliary vector but does not provide any way to get the main program's argc/argv. The musl maintainers argue that this isn't something a shared library should have access to anyway.
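As a rough illustration of the part that does work on both libcs, this sketch reads one auxiliary vector entry through `getauxval`. The wrapper name `auxv_page_size` is hypothetical. It assumes Linux with `<sys/auxv.h>` available (glibc 2.16 or later, or musl).

```c
#include <assert.h>
#include <sys/auxv.h> /* getauxval and the AT_* constants */

/* Returns the page size reported by the kernel in the auxiliary vector. */
unsigned long auxv_page_size(void) {
    unsigned long pagesz = getauxval(AT_PAGESZ);
    /* getauxval returns 0 for a missing entry; Linux is expected
     * to supply AT_PAGESZ, so 0 here would indicate a broken auxv. */
    assert(pagesz != 0);
    return pagesz;
}
```

Nothing here needs argv or a pointer to the aux array, which is the point: the lookup goes through libc instead of walking memory past envp.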

@mdempsky

Member

mdempsky commented Dec 5, 2015

Somewhat relatedly, C99 says [section 5.1.2.2.1]:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So in the shared library cases where a C main function runs, the Go runtime should conservatively assume the C program may legitimately mutate the argv strings. In particular, it can't use gostringnocopy on them like it currently does in goargs.
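To make the hazard concrete, here is a small C sketch of the conservative alternative: take a deep copy of the argument strings so that later writes by the C program do not show through. `copy_args` is a hypothetical helper, not something in the Go runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Deep-copies argv so later writes by the C program cannot change our view.
 * Returns NULL on allocation failure; the caller owns the result. */
char **copy_args(int argc, char **argv) {
    assert(argc >= 0 && argv != NULL);
    char **out = calloc((size_t)argc + 1, sizeof *out); /* last slot stays NULL */
    if (out == NULL) return NULL;
    for (int i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1; /* include the terminating NUL */
        out[i] = malloc(n);
        if (out[i] == NULL) {
            while (i-- > 0) free(out[i]); /* release the strings copied so far */
            free(out);
            return NULL;
        }
        memcpy(out[i], argv[i], n);
    }
    return out;
}
```

A no-copy string that aliases `argv[i]` would change whenever C code overwrote that buffer, which C99 explicitly allows.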

@mdempsky

Member

mdempsky commented Dec 5, 2015

So I think in the cases where we need to play nicely with arbitrary C code:

  • syscall's {Clear,Get,Put,Set}env and Environ functions should all treat C as the source of truth for the environment, rather than keeping a local copy of the environment to manipulate and just trying to copy mutations to C. (E.g., even today, if cgo code calls setenv, it won't be visible to os.Getenv; whereas os.Setenv is visible to getenv.)
  • package runtime should use getauxval for accessing the aux value array, to avoid needing to locate the array in memory.
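A toy C sketch of the first point, showing how a private copy of the environment goes stale once C code changes it. The one-slot cache and the names `cache_env` and `cache_is_stale` are invented for illustration and are far simpler than what package syscall actually keeps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot cache standing in for a private copy of the environment. */
static char cached_value[128];

/* Copies the current C-side value of `name` into the cache (empty if unset). */
void cache_env(const char *name) {
    assert(name != NULL);
    const char *v = getenv(name);
    strncpy(cached_value, v ? v : "", sizeof cached_value - 1);
    cached_value[sizeof cached_value - 1] = '\0';
}

/* Returns 1 when the cache no longer matches what getenv reports right now. */
int cache_is_stale(const char *name) {
    assert(name != NULL);
    const char *v = getenv(name);
    return strcmp(cached_value, v ? v : "") != 0;
}
```

After a C-side `setenv` on the same variable, `cache_is_stale` starts returning 1. Asking libc on every lookup avoids that class of mismatch.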

That would just leave needing to figure out a solution for os.Args. It would kinda suck, but maybe we could just leave it nil in the case where Go isn't acting as the program's entry point?

@ianlancetaylor

Contributor

ianlancetaylor commented Dec 5, 2015

If we can't get os.Args, then we have to leave it nil. But I think we should only do that for environments where we can't get it. When we can get it, as for glibc, we should.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 5, 2015

@mdempsky

Member

mdempsky commented Dec 5, 2015

I just looked at glibc, Bionic, uClibc, musl, and dietlibc, as well as the dynamic linkers for FreeBSD, NetBSD, and OpenBSD. It looks like only glibc passes (argc, argv, envp) to the DSO init functions.

If we want to detect glibc at build time, it seems like we can include <limits.h> and test for defined(__GLIBC__) && !defined(__UCLIBC__). (The User-Agent saga continues as uClibc claims to be glibc.)

To detect at runtime, we could use a weak reference to a glibc-only symbol like gnu_get_libc_version and see if it resolves. I'm worried about making sure we pick a symbol that won't later be implemented by other C libraries though. (E.g., I found a thread where someone suggested adding a gnu_get_libc_version function to musl to make it compatible with Nvidia's binary drivers.)
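A sketch of what the weak-reference check could look like in C, using the GCC/Clang `weak` attribute. The helper names `looks_like_glibc` and `libc_version_or_null` are hypothetical. As noted above, a positive result only means that some loaded library defines the symbol, not that it is really glibc.

```c
#include <assert.h>
#include <stddef.h>

/* Weak declaration: the address is NULL when nothing defines the symbol. */
extern const char *gnu_get_libc_version(void) __attribute__((weak));

/* Returns 1 if the glibc-specific symbol resolved, else 0. */
int looks_like_glibc(void) {
    return gnu_get_libc_version != NULL;
}

/* Returns the reported version string, or NULL when the symbol is absent. */
const char *libc_version_or_null(void) {
    if (!looks_like_glibc()) return NULL;
    const char *v = gnu_get_libc_version();
    assert(v != NULL); /* glibc returns a static string such as "2.x" */
    return v;
}
```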

Any other suggestions/ideas for detecting glibc?

@mdempsky mdempsky self-assigned this Dec 7, 2015

@mdempsky

Member

mdempsky commented Dec 7, 2015

For what it's worth, I tracked down that glibc started passing (argc, argv, envp) to the DSO init functions in 1996: https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/dl-open.c;h=76f6329762308de4ba1620c50ff32d2c02359766;hp=40b52247253cf045498761342afd09ba3c7e1187;hb=dcf0671d905200c449f92ead6cf43c184637a0d5;hpb=4884d0f03c5a3b3d2459655e76fa2d0684d389dc

So at least we don't need to worry about glibc version detection.

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Nov 15, 2016

Benoît Canet
elf/app: Pass argc and argv to library initialization
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
@inolen

inolen commented Dec 29, 2016

I just ran into this issue while trying to help my partner debug a crash when using a golang plugin they'd written for fluentbit (https://github.com/fluent/fluent-bit).

When running on Alpine Linux, which uses musl, I get a segfault inside of runtime.sysargs due to a bad argv pointer. The cause is as described above: when the plugin is dlopened, musl does not pass any arguments to the functions in DT_INIT_ARRAY. The platform-specific init function (_rt0_amd64_linux_lib in my case) assumes it is being passed a valid argc / argv and eventually segfaults because they are garbage.

After finding this issue, it seems that the way forward is to:

  • update each platform-specific init file to contain an empty argument vector, and update _rt0_*_lib_argv to point at this empty vector
  • update each platform-specific init function to only overwrite the default argv pointer with the incoming arguments if glibc is detected
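In C terms, the two steps might look roughly like the sketch below. The real change would live in the per-platform assembly. The names `lib_argc`, `lib_argv`, and `set_library_args` are hypothetical, and the guard reuses the build-time `__GLIBC__` test suggested earlier in the thread.

```c
#include <assert.h>
#include <limits.h> /* on glibc this pulls in <features.h>, which defines __GLIBC__ */
#include <stddef.h>

/* Step 1: a default, empty, NULL-terminated argument vector. */
static char *empty_argv[] = { NULL };

int lib_argc = 0;
char **lib_argv = empty_argv;

/* Step 2: hypothetical hook called from the library init path.
 * The incoming values replace the defaults only when built against glibc. */
void set_library_args(int argc, char **argv) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    if (argc >= 0 && argv != NULL) {
        lib_argc = argc;
        lib_argv = argv;
    }
#else
    (void)argc; /* not passed by this libc, so the values are meaningless */
    (void)argv;
#endif
    assert(lib_argv != NULL); /* readers can always walk the vector safely */
}
```

With this shape, code that later walks `lib_argv` sees either the real arguments or an empty vector, never a wild pointer.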

Can anyone comment if testing for glibc would still be desired or not?

@rrozestw

rrozestw commented Dec 30, 2016

The same problem appears when using c-archive build mode with musl.

@ianlancetaylor

Contributor

ianlancetaylor commented Jan 2, 2017

If there is a reliable way to test for glibc, then I think it would be perfectly reasonable to do that. @mdempsky's comment above suggests a way.

@mdempsky mdempsky removed their assignment Jan 9, 2017

@mdempsky

Member

mdempsky commented Jan 9, 2017

Unassigning because I'm not planning to work on this, but still happy to review if anyone has suggestions.

@inolen

inolen commented Jan 9, 2017

@mdempsky I started work on the x64 / x86 / arm / arm64 versions the other night, but it was becoming quite time consuming to test. I have docker images running qemu, which are bootstrapped with a cross-compiled toolchain from my host using bootstrap.sh, but then recompile the toolchain locally inside of qemu for CGO support. Is there a better / faster way to test across the various targets?

Also, do you have any recommendations for getting access to the above pre-processor defines from the platform-specific assembly files? Is it possible, or would they need to call into some C function to do the work, e.g.:

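#include <limits.h> /* any libc header works; glibc defines __GLIBC__ via <features.h> */

/* Copy the incoming argc/argv to the outputs only when built against glibc. */
void override_args(int argc, char **argv, int *argc_out, char ***argv_out) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
  *argc_out = argc;
  *argv_out = argv;
#endif
}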
@mdempsky

Member

mdempsky commented Jan 9, 2017

@inolen I think calling out to a C function (or Go function using cgo) in runtime/cgo is probably simplest/cleanest. Even if POSIX guaranteed there was a system header that could be safely #include'd into non-C files, cmd/asm doesn't support the full C preprocessor language.

We could potentially have cmd/dist detect glibc and generate cmd/asm-compatible .h files, but then we need to re-run make.bash depending on target libc, which seems unfortunate.

Lastly, sorry, I don't have any good solution to efficiently testing either. That's a contributing factor to why I haven't gotten around to it yet. :/

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Jan 13, 2017

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 8, 2017

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 8, 2017

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 9, 2017

@gopherbot

gopherbot commented Mar 9, 2017

CL https://golang.org/cl/37868 mentions this issue.

@inolen

inolen commented Mar 9, 2017

@mdempsky pushed review to https://go-review.googlesource.com/c/37868/

Initially, I tried the approach I mentioned above, but setting up the default arguments / calling into the cgo function in each platform-specific assembler function became a lot of code. After digging around the runtime code more, I found the islibrary and isarchive bools which let me fix this outside of each platform's library init routines.

I'm not sure if the code allocating the default arguments is correct. I read through https://github.com/golang/go/blob/master/src/runtime/HACKING.md and I think using persistentalloc is sane for this case, but perhaps I need to use sysAlloc and set up the appropriate terminate functions to free it back up.

No new tests were added, as the old testcarchive / testcshared tests failed when using musl. However, I did set up a few scripts which ran the golang library tests for various os / arch combinations through docker / binfmt_misc / qemu here:
https://gist.github.com/inolen/499da4e40a866b3f8fa5be3635d78721

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jul 20, 2017

Dimrok pushed a commit to infinit/memo that referenced this issue Jul 26, 2017

docker: fix alpine
Currently we ship libgcc and libstdc++, but on alpine, they are useless.
For a start, libgcc_s.so is a linker script, not a library.  So we need
to actually get the real stuff: libstdc++.so.6 and libgcc_s.so.1.  This
fixes our current docker-alpine, which when started, crashed with:

    $ docker run --rm $PROJECT-alpine:$DESC memo --help
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/memo)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libmemo.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_webapi_aws.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_cryptography.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_core.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_protocol.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_reactor.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libprometheus-cpp.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libstdc++.so.6)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_filesystem.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_regex.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_dropbox.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libgrpc.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_thread.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_context.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libprotobuf.so)
    Error relocating /usr/bin/../lib/libmemo.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_webapi_aws.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_cryptography.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_core.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_protocol.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_reactor.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libprometheus-cpp.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetRegionStart: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetTextRelBase: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_RaiseException: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_Resume_or_Rethrow: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetIPInfo: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetLanguageSpecificData: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetDataRelBase: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_SetGR: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_DeleteException: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_SetIP: symbol not found
    Error relocating /usr/bin/../lib/libboost_filesystem.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libboost_regex.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_dropbox.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libgrpc.so: __udivti3: symbol not found
    Error relocating /usr/bin/../lib/libgrpc.so: __umodti3: symbol not found
    Error relocating /usr/bin/../lib/libgrpc.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libboost_thread.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libboost_context.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libprotobuf.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/memo: _Unwind_Resume: symbol not found

Unfortunately, it does not fix all the errors: bin/memo crashes very
very soon (logs don't even get a chance to start).  It seems to be
related to go, see golang/go#13492.  Go's
runtime tries to read argv, which is nullptr.  It happens in alpine
because its C library does not try to expose argv/argc/envp to the
shared libs, contrary to what the glibc does.

Dimrok pushed a commit to infinit/memo that referenced this issue Sep 7, 2017

build: disable libkvs on Alpine
Currently we cannot mix the go runtime and ours on musl libc (which is
the libc on Alpine).  See golang/go#13492 for details.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
