
runtime: c-shared builds fail with musllibc #13492

Open
mdempsky opened this issue Dec 5, 2015 · 19 comments

@mdempsky mdempsky commented Dec 5, 2015 (Member)

Currently, some of the init_array functions provided by a c-shared build expect to be called with (argc, argv, envp) arguments as is done by glibc, but this isn't specified by the ELF gABI for DT_INIT_ARRAY (http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#init_fini), and isn't done with other libc implementations like musllibc.

CC @ianlancetaylor @mwhudson
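To make the calling convention concrete, here is a minimal C sketch of an .init_array entry declared with the signature glibc uses. The names lib_init, lib_init_entry and the accessors are hypothetical. glibc calls such entries as f(argc, argv, envp). The gABI does not require any arguments, so under a loader that calls the entry with none (musl, per the comments below), the parameters are indeterminate. This sketch therefore only records them and never dereferences them.

```c
#include <stddef.h>

/* Values captured by the hypothetical init function below. */
static int saved_argc = -1;
static char **saved_argv = NULL;
static char **saved_envp = NULL;

/* glibc invokes DT_INIT_ARRAY entries as f(argc, argv, envp). Other
 * loaders may pass nothing, leaving these parameters as whatever happens
 * to be in the argument registers, so they are stored, never read through. */
void lib_init(int argc, char **argv, char **envp) {
    saved_argc = argc;
    saved_argv = argv;
    saved_envp = envp;
}

/* Place a pointer to lib_init in .init_array (GCC/Clang syntax);
 * "used" keeps the otherwise-unreferenced pointer from being discarded. */
__attribute__((section(".init_array"), used))
static void (*lib_init_entry)(int, char **, char **) = lib_init;

int lib_saved_argc(void) { return saved_argc; }
char **lib_saved_argv(void) { return saved_argv; }
char **lib_saved_envp(void) { return saved_envp; }
```

Code that, like the Go runtime's library entry point, goes on to read argv[0] or walk envp is only safe when the loader is known to supply real values.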

@jamesr jamesr commented Dec 5, 2015

musl does expose getauxval() for querying the auxiliary vector but does not provide any way to get the main program's argc/argv. The musl maintainers argue that this isn't something a shared library should have access to anyway.
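For reference, getauxval (declared in <sys/auxv.h> on both glibc and musl) looks up an auxiliary-vector entry by type, so the caller never has to locate the vector in memory. A minimal sketch reading the page size; the helper names are hypothetical:

```c
#include <sys/auxv.h>
#include <unistd.h>

/* Read AT_PAGESZ through getauxval(); this works the same from a shared
 * library as from the main program. getauxval returns 0 if the entry
 * is absent. */
unsigned long aux_page_size(void) {
    return getauxval(AT_PAGESZ);
}

/* Cross-check against the POSIX interface. */
long posix_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}
```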

@mdempsky mdempsky commented Dec 5, 2015 (Member, Author)

Somewhat relatedly, C99 says [section 5.1.2.2.1]:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So in the shared library cases where a C main function runs, the Go runtime should conservatively assume the C program may legitimately mutate the argv strings. In particular, it can't use gostringnocopy on them like it currently does in goargs.
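A minimal C sketch of why a zero-copy view of argv is unsafe under that rule. view_nocopy plays the role gostringnocopy plays in goargs (it keeps only the pointer), while view_copy takes a private snapshot. Both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Zero-copy view: keeps the caller's pointer, so it sees later writes. */
const char *view_nocopy(const char *arg) { return arg; }

/* Defensive snapshot: duplicates the bytes so later writes to the
 * argv string cannot change what we observe. Caller frees the result. */
char *view_copy(const char *arg) {
    size_t n = strlen(arg) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, arg, n);
    return p;
}
```

Usage: if the C program rewrites argv[0] in place (which C99 permits), the aliased view changes and the copy does not.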

@mdempsky mdempsky commented Dec 5, 2015 (Member, Author)

So I think in the cases where we need to play nicely with arbitrary C code:

  • syscall's {Clear,Get,Put,Set}env and Environ functions should all treat C as the source of truth for the environment, rather than keeping a local copy of the environment to manipulate and just trying to copy mutations to C. (E.g., even today, if cgo code calls setenv, it won't be visible to os.Getenv; whereas os.Setenv is visible to getenv.)
  • package runtime should use getauxval for accessing the aux value array, to avoid needing to locate the array in memory.

That would just leave needing to figure out a solution for os.Args. It would kinda suck, but maybe we could just leave it nil in the case where Go isn't acting as the program's entry point?
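The first bullet's concern, that a private copy of the environment diverges once C code calls setenv, can be shown in plain C. In this minimal sketch, snapshot_env and cached_env are hypothetical stand-ins for a runtime-private cache, while getenv always reads the C library's live environment.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical cached copy of one variable, taken at snapshot time,
 * the way a runtime-private environment would hold it. */
static char cached_value[256];

void snapshot_env(const char *name) {
    const char *v = getenv(name);
    strncpy(cached_value, v ? v : "", sizeof cached_value - 1);
    cached_value[sizeof cached_value - 1] = '\0';
}

const char *cached_env(void) { return cached_value; }
```

After a later setenv (for example from cgo code), getenv reports the new value while the cache still holds the old one, which is the os.Getenv visibility gap described above.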

@ianlancetaylor ianlancetaylor commented Dec 5, 2015 (Contributor)

If we can't get os.Args, then we have to leave it nil. But I think we should only do that for environments where we can't get it. When we can get it, as for glibc, we should.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 5, 2015
@mdempsky mdempsky commented Dec 5, 2015 (Member, Author)

I just looked at glibc, Bionic, uClibc, musl, and dietlibc, as well as the dynamic linkers for FreeBSD, NetBSD, and OpenBSD. It looks like only glibc passes (argc, argv, envp) to the DSO init functions.

If we want to detect glibc at build time, it seems like we can include <limits.h> and test for defined(__GLIBC__) && !defined(__UCLIBC__). (The User-Agent saga continues as uClibc claims to be glibc.)
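The build-time test can be written directly as a preprocessor check. A minimal sketch with a hypothetical function name. Note that it reports which headers the translation unit was compiled against, not which libc is loaded at run time.

```c
#include <limits.h> /* on glibc this ends up including <features.h>, which defines __GLIBC__ */

/* Returns 1 if compiled against glibc headers, 0 otherwise. uClibc also
 * defines __GLIBC__ for compatibility, so it is excluded explicitly. */
int built_against_glibc(void) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    return 1;
#else
    return 0;
#endif
}
```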

To detect at runtime, we could use a weak reference to a glibc-only symbol like gnu_get_libc_version and see if it resolves. I'm worried about making sure we pick a symbol that won't later be implemented by other C libraries though. (E.g., I found a thread where someone suggested adding a gnu_get_libc_version function to musl to make it compatible with Nvidia's binary drivers.)
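The run-time variant can be sketched with a weak declaration of gnu_get_libc_version, which glibc exports (and declares in <gnu/libc-version.h>). Declaring it ourselves as weak keeps the code linkable on other C libraries: if no loaded object defines the symbol, its address is null. The helper name is hypothetical, and as noted above this remains a heuristic, since another libc could add the same symbol.

```c
#include <stddef.h>

/* Weak reference: an unresolved symbol yields a NULL address instead of
 * a link-time or load-time error. */
extern const char *gnu_get_libc_version(void) __attribute__((weak));

/* Returns 1 if the glibc-only symbol resolved at run time, 0 otherwise. */
int runtime_looks_like_glibc(void) {
    return gnu_get_libc_version != NULL;
}
```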

Any other suggestions/ideas for detecting glibc?

@mdempsky mdempsky self-assigned this Dec 7, 2015
@mdempsky mdempsky commented Dec 7, 2015 (Member, Author)

For what it's worth, I tracked down that glibc started passing (argc, argv, envp) to the DSO init functions in 1996: https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/dl-open.c;h=76f6329762308de4ba1620c50ff32d2c02359766;hp=40b52247253cf045498761342afd09ba3c7e1187;hb=dcf0671d905200c449f92ead6cf43c184637a0d5;hpb=4884d0f03c5a3b3d2459655e76fa2d0684d389dc

So at least we don't need to worry about glibc version detection.

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Nov 15, 2016
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
@inolen inolen commented Dec 29, 2016

I just ran into this issue while trying to help my partner debug a crash when using a golang plugin they'd written for fluentbit (https://github.com/fluent/fluent-bit).

When running on Alpine Linux, which uses musl, I get a segfault inside runtime.sysargs due to a bad argv pointer. The cause is as described above: when the plugin is dlopened, musl does not pass any arguments to the members of DT_INIT_ARRAY. The platform-specific init function (_rt0_amd64_linux_lib in my case) assumes it's being passed a valid argc / argv, and eventually segfaults because they are not valid.

After finding this issue, it seems that the way forward is to:

  • update each platform-specific init file to contain an empty argument vector, and update _rt0_*_lib_argv to point at this empty vector
  • update each platform-specific init function to only overwrite the default argv pointer with the incoming arguments if glibc is detected
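The two steps above amount to: start from an empty argument vector, and adopt the incoming (argc, argv) only when the loader is known to pass them. A minimal C sketch with hypothetical names; the glibc decision is passed in as a flag so it could come from either detection approach discussed earlier.

```c
#include <stddef.h>

/* Empty default vector: argv[0] is already the terminating NULL. */
static char *empty_argv[] = { NULL };

static int lib_argc = 0;
static char **lib_argv = empty_argv;

/* Called from the library init path with whatever the loader supplied.
 * Only glibc is known to pass (argc, argv, envp) to DT_INIT_ARRAY
 * entries, so on other loaders the incoming values are ignored. */
void lib_set_args(int argc, char **argv, int loader_is_glibc) {
    if (loader_is_glibc) {
        lib_argc = argc;
        lib_argv = argv;
    } else {
        lib_argc = 0;
        lib_argv = empty_argv;
    }
}

int lib_get_argc(void) { return lib_argc; }
char **lib_get_argv(void) { return lib_argv; }
```

Using an empty, NULL-terminated vector rather than a NULL pointer keeps any code that walks argv up to its terminator safe.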

Can anyone comment on whether testing for glibc is still desired?

@rrozestw rrozestw commented Dec 30, 2016

The same problem appears when using c-archive build mode with musl.

@ianlancetaylor ianlancetaylor commented Jan 2, 2017 (Contributor)

If there is a reliable way to test for glibc, then I think it would be perfectly reasonable to do that. @mdempsky 's comment above suggests a way.

@mdempsky mdempsky removed their assignment Jan 9, 2017
@mdempsky mdempsky commented Jan 9, 2017 (Member, Author)

Unassigning because I'm not planning to work on this, but still happy to review if anyone has suggestions.

@inolen inolen commented Jan 9, 2017

@mdempsky I started work on the x64 / x86 / arm / arm64 versions the other night, but it was becoming quite time-consuming to test. I have docker images running qemu, which are bootstrapped with a cross-compiled toolchain from my host using bootstrap.sh, but then recompile the toolchain locally inside of qemu for CGO support. Is there a better / faster way to test across the various targets?

Also, do you have any recommendations for getting access to the above pre-processor defines from the platform-specific assembly files? Is it possible, or would they need to call into some C function to do the work, e.g.:

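    #include <limits.h> /* on glibc, brings in __GLIBC__ via <features.h> */

    void override_args(int argc, char **argv, int *argc_out, char ***argv_out) {
    #if defined(__GLIBC__) && !defined(__UCLIBC__)
      /* Only glibc is known to pass (argc, argv, envp) to init functions. */
      *argc_out = argc;
      *argv_out = argv;
    #else
      (void)argc; /* leave the caller's default argc/argv untouched */
      (void)argv;
    #endif
    }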
@mdempsky mdempsky commented Jan 9, 2017 (Member, Author)

@inolen I think calling out to a C function (or Go function using cgo) in runtime/cgo is probably simplest/cleanest. Even if POSIX guaranteed there was a system header that could be safely #include'd into non-C files, cmd/asm doesn't support the full C preprocessor language.

We could potentially have cmd/dist detect glibc and generate cmd/asm-compatible .h files, but then we need to re-run make.bash depending on target libc, which seems unfortunate.

Lastly, sorry, I don't have any good solution to efficiently testing either. That's a contributing factor to why I haven't gotten around to it yet. :/

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Jan 13, 2017
benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 8, 2017
benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 8, 2017
benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 9, 2017
@gopherbot gopherbot commented Mar 9, 2017

CL https://golang.org/cl/37868 mentions this issue.

@inolen inolen commented Mar 9, 2017

@mdempsky pushed review to https://go-review.googlesource.com/c/37868/

Initially, I tried the approach I mentioned above, but setting up the default arguments / calling into the cgo function in each platform-specific assembler function became a lot of code. After digging around the runtime code more, I found the islibrary and isarchive bools which let me fix this outside of each platform's library init routines.

I'm not sure if the code allocating the default arguments is correct. I read through https://github.com/golang/go/blob/master/src/runtime/HACKING.md and I think using persistentalloc is sane for this case, but perhaps I need to use sysAlloc and set up the appropriate terminate functions to free it back up.

No new tests were added, as the old testcarchive / testcshared tests failed when using musl. However, I did set up a few scripts which ran the golang library tests for various os / arch combinations through docker / binfmt_misc / qemu here:
https://gist.github.com/inolen/499da4e40a866b3f8fa5be3635d78721

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jul 20, 2017
Dimrok pushed a commit to infinit/memo that referenced this issue Jul 26, 2017
Currently we ship libgcc and libstdc++, but on alpine, they are useless.
For a start, libgcc_s.so is a linker script, not a library.  So we need
to actually get the real stuff: libstdc++.so.6 and libgcc_s.so.1.  This
fixes our current docker-alpine, which when started, crashed with:

    $ docker run --rm $PROJECT-alpine:$DESC memo --help
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/memo)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libmemo.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_webapi_aws.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_cryptography.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_core.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_protocol.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_reactor.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libprometheus-cpp.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libstdc++.so.6)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_filesystem.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_regex.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libelle_dropbox.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libgrpc.so)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_thread.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libboost_context.so.1.60.0)
    Error loading shared library libgcc_s.so.1: Exec format error (needed by /usr/bin/../lib/libprotobuf.so)
    Error relocating /usr/bin/../lib/libmemo.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_webapi_aws.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_cryptography.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_core.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_protocol.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_reactor.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libprometheus-cpp.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetRegionStart: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetTextRelBase: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_RaiseException: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_Resume_or_Rethrow: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetIPInfo: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetLanguageSpecificData: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_GetDataRelBase: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_SetGR: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_DeleteException: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libstdc++.so.6: _Unwind_SetIP: symbol not found
    Error relocating /usr/bin/../lib/libboost_filesystem.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libboost_regex.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libelle_dropbox.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libgrpc.so: __udivti3: symbol not found
    Error relocating /usr/bin/../lib/libgrpc.so: __umodti3: symbol not found
    Error relocating /usr/bin/../lib/libgrpc.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libboost_thread.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libboost_context.so.1.60.0: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/../lib/libprotobuf.so: _Unwind_Resume: symbol not found
    Error relocating /usr/bin/memo: _Unwind_Resume: symbol not found

Unfortunately, it does not fix all the errors: bin/memo crashes very
very soon (logs don't even get a chance to start).  It seems to be
related to go, see golang/go#13492.  Go's
runtime tries to read argv, which is nullptr.  It happens in alpine
because its C library does not try to expose argv/argc/envp to the
shared libs, contrary to what the glibc does.
Dimrok pushed a commit to infinit/memo that referenced this issue Sep 7, 2017
Currently we cannot mix the go runtime and ours on musl libc (which is
the libc on Alpine).  See golang/go#13492 for details.
@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
@rprichard rprichard commented Jan 15, 2019

I'm wondering if Go-on-musl has the same problem as Go-on-Bionic (#29674) where it wants to allocate a word of static TLS memory, but if the Go code is packaged into an solib and loaded with dlopen, there's no reliable way to do so.

musl appears to ignore DF_STATIC_TLS and allow TPREL/TPOFF relocations to a TLS symbol in a dlopen'ed solib. AFAICT, the relocations are valid only for new threads -- if any existing thread uses the TLS IE relocation to access a runtime.tlsg TLS symbol, it would access unallocated memory.
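For reference, the initial-exec model in question can be requested explicitly in C with the GCC/Clang tls_model attribute. Accesses then use a thread-pointer offset fixed at load time, so the variable must live in the static TLS area of every thread, including threads created before the library was loaded; a shared object containing such accesses is flagged DF_STATIC_TLS by the linker. A minimal sketch with a hypothetical variable name (in Go's case the symbol is runtime.tlsg):

```c
/* Thread-local word accessed through the initial-exec TLS model,
 * i.e. via a TPOFF/TPREL offset from the thread pointer. */
static __thread void *tls_word __attribute__((tls_model("initial-exec")));

void set_tls_word(void *p) { tls_word = p; }
void *get_tls_word(void) { return tls_word; }
```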

@ooepay ooepay commented Apr 9, 2019

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to LWP 400]
    runtime.sysargs (argc=0, argv=0x0) at /usr/local/go/src/runtime/os_linux.go:206

arm-hisiv500-linux-uclibcgnueabi-gcc

@severin83 severin83 commented Apr 18, 2019

Since this topic has been open for a while, I was wondering if there is any news about it, or any suggested workarounds?
I'm trying to use a c-shared go library in a docker container based on Alpine Linux. My application is in Java and uses the lib through jnr-ffi. It works on other distributions, but on Alpine Linux it gives me this error:
    java.lang.UnsatisfiedLinkError: Error relocating /usr/local/lib/liblicense2go-client.so: : initial-exec TLS resolves to dynamic definition in /usr/local/lib/liblicense2go-client.so
        at jnr.ffi.provider.jffi.NativeLibrary.loadNativeLibraries(NativeLibrary.java:87) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.provider.jffi.NativeLibrary.getNativeLibraries(NativeLibrary.java:70) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.provider.jffi.NativeLibrary.getSymbolAddress(NativeLibrary.java:49) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.provider.jffi.NativeLibrary.findSymbolAddress(NativeLibrary.java:59) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:158) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:89) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.provider.jffi.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:44) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.LibraryLoader.load(LibraryLoader.java:325) ~[jnr-ffi-2.1.9.jar!/:na]
        at jnr.ffi.LibraryLoader.load(LibraryLoader.java:304) ~[jnr-ffi-2.1.9.jar!/:na]

I also tried to link the c-shared go library to a c++ binary compiled on Alpine Linux, and it gives me a Segmentation Fault.

If I compile the same go code as an executable it runs nicely in the Alpine docker container.
Thanks

@ianlancetaylor ianlancetaylor commented Apr 18, 2019 (Contributor)

The initial comment on this issue explains the problem. Someone will need to fix it in the Go runtime package. I'm not aware of any workarounds.

Your Java link error seems like a different problem, though.

@severin83 severin83 commented Apr 23, 2019

For info: using dlopen in c++ to open the library dynamically, rather than linking against it, gives me the same error as with the Java JNI:
initial-exec TLS resolves to dynamic definition
