runtime: c-shared builds fail with musllibc #13492

Open
mdempsky opened this issue Dec 5, 2015 · 40 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime.

@mdempsky
Member

mdempsky commented Dec 5, 2015

Currently, some of the init_array functions provided by a c-shared build expect to be called with (argc, argv, envp) arguments as is done by glibc, but this isn't specified by the ELF gABI for DT_INIT_ARRAY (http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#init_fini), and isn't done with other libc implementations like musllibc.

CC @ianlancetaylor @mwhudson
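
To make the mismatch concrete, here is a minimal C sketch (not the Go runtime's code) of an init function written in the glibc style. The names record_args, seen_argc, and seen_argv are made up for illustration. On libcs that do not pass arguments, the parameters hold whatever happened to be in the registers or on the stack, so the sketch ignores them there.

```c
#include <limits.h>  /* any libc header; on glibc it makes __GLIBC__ visible */
#include <stddef.h>

/* Hypothetical storage for what the init function observed. */
static int seen_argc = -1;
static char **seen_argv = NULL;

/*
 * The compiler places constructors in .init_array (DT_INIT_ARRAY).
 * glibc calls each entry as fn(argc, argv, envp); the ELF gABI does not
 * promise any arguments, so other libcs may leave these parameters undefined.
 */
__attribute__((constructor))
static void record_args(int argc, char **argv, char **envp) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    (void)envp;
    seen_argc = argc;
    seen_argv = argv;
#else
    /* Not glibc: the values may be garbage, so never dereference them. */
    (void)argc;
    (void)argv;
    (void)envp;
#endif
}
```

The build-time guard is what keeps the sketch from reading garbage. It cannot help a library that is built against one libc and loaded under another.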

@jamesr

jamesr commented Dec 5, 2015

musl does expose getauxval() for querying the auxiliary vector but does not provide any way to get the main program's argc/argv. The musl maintainers argue that this isn't something a shared library should have access to anyway.
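
As a rough illustration of the getauxval() route, the following C sketch reads one entry of the auxiliary vector without locating the array in memory. It is Linux-specific, and the helper name auxv_page_size is hypothetical.

```c
#include <sys/auxv.h>  /* getauxval and the AT_* constants (glibc and musl) */

/* Returns the page size recorded in the auxiliary vector, or 0 if absent. */
unsigned long auxv_page_size(void) {
    return getauxval(AT_PAGESZ);
}
```

The same call works from a shared library, because it needs no pointer derived from the main program's stack.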

@mdempsky
Member Author

mdempsky commented Dec 5, 2015

Somewhat relatedly, C99 says [section 5.1.2.2.1]:

> The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So in the shared library cases where a C main function runs, the Go runtime should conservatively assume the C program may legitimately mutate the argv strings. In particular, it can't use gostringnocopy on them like it currently does in goargs.
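
A sketch of the conservative alternative, written in C rather than in the runtime's Go: copy every argv string once, so that later mutation by the C program cannot change what the library sees. The helper names copy_args and free_args are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated vector produced by copy_args. */
void free_args(char **args) {
    if (args == NULL)
        return;
    for (char **p = args; *p != NULL; p++)
        free(*p);
    free(args);
}

/*
 * Returns a NULL-terminated deep copy of argv[0..argc-1], or NULL on
 * allocation failure. Assumes argc >= 0 and argv[i] != NULL for i < argc.
 */
char **copy_args(int argc, char **argv) {
    char **out = calloc((size_t)argc + 1, sizeof *out);
    if (out == NULL)
        return NULL;
    for (int i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;
        out[i] = malloc(n);
        if (out[i] == NULL) {
            free_args(out);  /* calloc zeroed the rest, so this stops early */
            return NULL;
        }
        memcpy(out[i], argv[i], n);
    }
    return out;
}
```

The cost is one allocation per argument at startup, which is the price of not aliasing memory the C program is allowed to overwrite.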

@mdempsky
Member Author

mdempsky commented Dec 5, 2015

So I think in the cases where we need to play nicely with arbitrary C code:

  • syscall's {Clear,Get,Put,Set}env and Environ functions should all treat C as the source of truth for the environment, rather than keeping a local copy of the environment to manipulate and just trying to copy mutations to C. (E.g., even today, if cgo code calls setenv, it won't be visible to os.Getenv; whereas os.Setenv is visible to getenv.)
  • package runtime should use getauxval for accessing the aux value array, to avoid needing to locate the array in memory.

That would just leave needing to figure out a solution for os.Args. It would kinda suck, but maybe we could just leave it nil in the case where Go isn't acting as the program's entry point?
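
The environment point in the first bullet can be shown in plain C. In this sketch, a private snapshot stands in for a runtime-side copy of the environment, and it goes stale as soon as other code calls setenv. The names snapshot_env, lookup_cached, and lookup_live are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv in strict modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical private cache, standing in for a runtime-side copy. */
static char cached_value[64];

/* Takes a one-time snapshot of the variable (empty string if unset). */
void snapshot_env(const char *name) {
    const char *v = getenv(name);
    strncpy(cached_value, v != NULL ? v : "", sizeof cached_value - 1);
    cached_value[sizeof cached_value - 1] = '\0';
}

/* Stale-prone: answers from the snapshot. */
const char *lookup_cached(void) {
    return cached_value;
}

/* Treats libc as the source of truth: asks it on every call. */
const char *lookup_live(const char *name) {
    const char *v = getenv(name);
    return v != NULL ? v : "";
}
```

After a snapshot, a later setenv is visible through lookup_live but not through lookup_cached, which is the same shape as a cgo setenv not being visible to os.Getenv.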

@ianlancetaylor
Contributor

ianlancetaylor commented Dec 5, 2015

If we can't get os.Args, then we have to leave it nil. But I think we should only do that for environments where we can't get it. When we can get it, as for glibc, we should.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 5, 2015
@mdempsky
Member Author

mdempsky commented Dec 5, 2015

I just looked at glibc, Bionic, uClibc, musl, and dietlibc, as well as the dynamic linkers for FreeBSD, NetBSD, and OpenBSD. It looks like only glibc passes (argc, argv, envp) to the DSO init functions.

If we want to detect glibc at build time, it seems like we can include <limits.h> and test for defined(__GLIBC__) && !defined(__UCLIBC__). (The User-Agent saga continues as uClibc claims to be glibc.)

To detect at runtime, we could use a weak reference to a glibc-only symbol like gnu_get_libc_version and see if it resolves. I'm worried about making sure we pick a symbol that won't later be implemented by other C libraries though. (E.g., I found a thread where someone suggested adding a gnu_get_libc_version function to musl to make it compatible with Nvidia's binary drivers.)

Any other suggestions/ideas for detecting glibc?
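
Both detection ideas fit in a few lines of C. This is only a sketch: the function names built_against_glibc and runs_on_glibc are hypothetical, and the weak attribute is a GCC/Clang extension.

```c
#include <limits.h>  /* makes __GLIBC__ / __UCLIBC__ visible when present */
#include <stddef.h>

/* Weak declaration: the address is NULL if nothing defines the symbol. */
extern const char *gnu_get_libc_version(void) __attribute__((weak));

/* Build-time answer, using the macro test described above. */
int built_against_glibc(void) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    return 1;
#else
    return 0;
#endif
}

/* Run-time answer: does the glibc-only symbol resolve? */
int runs_on_glibc(void) {
    return gnu_get_libc_version != NULL;
}
```

The two answers can disagree. A weak reference does not pull a symbol out of a static archive, so a statically linked glibc program may report 0 at run time, and any libc that later adds gnu_get_libc_version would report 1.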

@mdempsky mdempsky self-assigned this Dec 7, 2015
@mdempsky
Member Author

mdempsky commented Dec 7, 2015

For what it's worth, I tracked down that glibc started passing (argc, argv, envp) to the DSO init functions in 1996: https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/dl-open.c;h=76f6329762308de4ba1620c50ff32d2c02359766;hp=40b52247253cf045498761342afd09ba3c7e1187;hb=dcf0671d905200c449f92ead6cf43c184637a0d5;hpb=4884d0f03c5a3b3d2459655e76fa2d0684d389dc

So at least we don't need to worry about glibc version detection.

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Nov 15, 2016
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
@inolen

inolen commented Dec 29, 2016

I just ran into this issue while trying to help my partner debug a crash when using a golang plugin they'd written for fluentbit (https://github.com/fluent/fluent-bit).

When running on Alpine Linux, which uses musl, I get a segfault inside runtime.sysargs due to a bad argv pointer. The cause is as mentioned above: when the plugin is dlopened, musl does not pass any arguments to the members of DT_INIT_ARRAY. The platform-specific init function (_rt0_amd64_linux_lib in my case) assumes it is being passed a valid argc / argv, and eventually segfaults because it is not.

After finding this issue, it seems that the way forward is to:

  • update each platform-specific init file to contain an empty argument vector, and update _rt0_*_lib_argv to point at this empty vector
  • update each platform-specific init function to only overwrite the default argv pointer with the incoming arguments if glibc is detected

Can anyone comment if testing for glibc would still be desired or not?
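
The two steps above can be sketched in C. The actual init code is per-platform assembly, so this only shows the shape: a default empty argument vector that is overwritten only in glibc builds. All names here (empty_argv, lib_argc, lib_argv, lib_set_args) are hypothetical.

```c
#include <limits.h>  /* makes __GLIBC__ / __UCLIBC__ visible when present */
#include <stddef.h>

/* Default: an empty, NULL-terminated argument vector. */
static char *empty_argv[] = { NULL };
static int lib_argc = 0;
static char **lib_argv = empty_argv;

/* Called from the library's init function with whatever it was handed. */
void lib_set_args(int argc, char **argv) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    /* glibc passes the real values, so take them. */
    lib_argc = argc;
    lib_argv = argv;
#else
    /* Other libcs pass nothing meaningful: keep the empty default. */
    (void)argc;
    (void)argv;
#endif
}
```

Either way, lib_argv[lib_argc] is NULL afterwards, so code that walks the vector stays in bounds.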

@rrozestw

rrozestw commented Dec 30, 2016

The same problem appears when using c-archive build mode with musl.

@ianlancetaylor
Contributor

ianlancetaylor commented Jan 2, 2017

If there is a reliable way to test for glibc, then I think it would be perfectly reasonable to do that. @mdempsky 's comment above suggests a way.

@mdempsky mdempsky removed their assignment Jan 9, 2017
@mdempsky
Member Author

mdempsky commented Jan 9, 2017

Unassigning because I'm not planning to work on this, but still happy to review if anyone has suggestions.

@inolen

inolen commented Jan 9, 2017

@mdempsky I started work on the x64 / x86 / arm / arm64 versions the other night, but it was becoming quite time consuming to test. I have docker images running qemu, which are bootstrapped with a cross-compiled toolchain from my host using bootstrap.sh, but then recompile the toolchain locally inside of qemu for CGO support. Is there a better / faster way to test across the various targets?

Also, do you have any recommendations for getting access to the above pre-processor defines from the platform-specific assembly files? Is it possible, or would they need to call into some C function to do the work, e.g.:

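#include <limits.h> /* makes __GLIBC__ (and __UCLIBC__ on uClibc) visible */

/* Copy the incoming arguments out only when built against glibc. */
void override_args(int argc, char **argv, int *argc_out, char ***argv_out) {
#if defined(__GLIBC__) && !defined(__UCLIBC__)
  *argc_out = argc;
  *argv_out = argv;
#endif
}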

@mdempsky
Member Author

mdempsky commented Jan 9, 2017

@inolen I think calling out to a C function (or Go function using cgo) in runtime/cgo is probably simplest/cleanest. Even if POSIX guaranteed there was a system header that could be safely #include'd into non-C files, cmd/asm doesn't support the full C preprocessor language.

We could potentially have cmd/dist detect glibc and generate cmd/asm-compatible .h files, but then we need to re-run make.bash depending on target libc, which seems unfortunate.

Lastly, sorry, I don't have any good solution to efficiently testing either. That's a contributing factor to why I haven't gotten around to it yet. :/

@gopherbot

gopherbot commented Mar 9, 2017

CL https://golang.org/cl/37868 mentions this issue.

@inolen

inolen commented Mar 9, 2017

@mdempsky pushed review to https://go-review.googlesource.com/c/37868/

Initially, I tried the approach I mentioned above, but setting up the default arguments / calling into the cgo function in each platform-specific assembler function became a lot of code. After digging around the runtime code more, I found the islibrary and isarchive bools which let me fix this outside of each platform's library init routines.

I'm not sure if the code allocating the default arguments is correct. I read through https://github.com/golang/go/blob/master/src/runtime/HACKING.md and I think using persistentalloc is sane for this case, but perhaps I need to use sysAlloc and set up the appropriate terminate functions to free it back up.

No new tests were added, as the old testcarchive / testcshared tests failed when using musl. However, I did set up a few scripts that run the golang library tests for various os / arch combinations through docker / binfmt_misc / qemu here:
https://gist.github.com/inolen/499da4e40a866b3f8fa5be3635d78721

@ericonr

ericonr commented Jun 29, 2021

> MUSL's position here is a bit dogmatic but probably not technically wrong.

It's not dogmatic, this glibc extension isn't part of the standardized ABI or documented anywhere; glibc docs and gcc docs don't mention its existence. No libc other than glibc implements this.

> Easy fix: abort processing args if argv is null.

Probably works in most cases, but breaks if any of the registers have garbage in them when the function is called, which afaik is allowed.

@ianlancetaylor
Contributor

ianlancetaylor commented Jun 29, 2021

For what it's worth, I believe that FreeBSD does this also nowadays: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249162.

@gopherbot

gopherbot commented Jul 16, 2021

Change https://golang.org/cl/334991 mentions this issue: runtime: add check before using arguments with -buildmode=c-archive and -buildmode=c-shared on non glibc systems such as musl/uclinux

@ansiwen

ansiwen commented Dec 28, 2021

What has been blocking this for 6 years? Is it that no one has time to work on it, or is there simply no good idea for how to fix it, because the glibc dependency is too entangled with Go?

Also, is there any workaround to compile a Go shared object that is loadable with dlopen on MUSL systems? I would even make my own Go runtime branch as a last resort, if in the end I am able to build Alpine containers that are able to load my c-shared Go library as a plugin.

@ansiwen

ansiwen commented Dec 28, 2021

For what it's worth, compiled with -buildmode=c-shared I get:

sc_dlopen failed: Error relocating ./foobar.so: (null): initial-exec TLS resolves to dynamic definition in ./foobar.so

with

$ go version
go version go1.17.5 linux/amd64
$ cat /etc/alpine-release 
3.15.0
$ apk version musl
Installed:                                Available:
musl-1.2.2-r7                           = 1.2.2-r7 

@ianlancetaylor
Contributor

ianlancetaylor commented Dec 28, 2021

@ansiwen Most of the discussion on this issue is about whether a shared library can get access to argc/argv. As far as I know that is impossible when using MUSL and on many other non-glibc systems. I don't know of any good workaround there, although it would be best if we simply leave os.Args as a nil slice in that case. Perhaps that already happens, I don't know. I see that there is a change that is meant to address that, but it is pending review (https://golang.org/cl/334991).

The TLS problem you mention looks different. I don't know what is going on with that. It most likely requires a fix in the Go linker.

> Is it that no-one has time to work on it, or is there simply no good idea how to fix it, because the glibc dependency is too entangled with Go?

I would say it's more that the core Go team would look to people using MUSL to fix a problem like this. Go is an open source project and in general the core Go team puts relatively little time into non-first-class ports (the first class ports are listed at https://golang.org/wiki/PortingPolicy).

@ansiwen

ansiwen commented Dec 28, 2021

Thanks again for your answer @ianlancetaylor, you're remarkably reactive. 👍

> The TLS problem you mention looks different. I don't know what is going on with that. It most likely required a fix in the Go linker.

Ok, I thought these issues were related, since I saw a few mentions of TLS in this discussion. Do you suggest opening a separate issue?

> Is it that no-one has time to work on it, or is there simply no good idea how to fix it, because the glibc dependency is too entangled with Go?

> I would say it's more that the core Go team would look to people using MUSL to fix a problem like this. Go is an open source project and in general the core Go team puts relatively little time into non-first-class ports (the first class ports are listed at https://golang.org/wiki/PortingPolicy).

The porting policy doesn't mention any specific libc implementations for linux/*, so it's not clear from it, that no-glibc-linux systems are non-first-class ports.

I would love to help out, but unfortunately I have no idea what the problem here is. My experience is limited to the building of alpine based containers.

It seems like MUSL used to crash when dlopen was used to load a shared object that contained initial-exec references to dynamic TLS. This got fixed in a way that it now bails out with a proper error: http://git.musl-libc.org/cgit/musl/commit/?id=5c2f46a214fceeee3c3e41700c51415e0a4f1acd

So, as far as I understand, dlopen is only compatible with initial-exec if static TLS is used. Would it be feasible to implement a Go buildmode like "c-plugin" that uses static TLS? As I said, I have barely any knowledge of the topic and feel like a blind person trying to describe colors. 😅
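
For readers unfamiliar with the terms, this C sketch shows the two TLS access models being discussed, using the tls_model attribute from GCC and Clang. It does not reproduce what the Go linker emits, and the variable and function names are hypothetical.

```c
/*
 * Initial-exec: the variable's offset from the thread pointer is fixed when
 * the object is loaded, so it must live in the static TLS area. A library
 * loaded later with dlopen cannot count on space being available there.
 */
__thread int ie_counter __attribute__((tls_model("initial-exec")));

/*
 * Global-dynamic: the address is looked up at run time (on ELF/glibc and
 * musl, through __tls_get_addr), which also works for dlopen'ed libraries.
 */
__thread int gd_counter __attribute__((tls_model("global-dynamic")));

/* Each thread gets its own copy of both counters. */
int bump_ie(void) { return ++ie_counter; }
int bump_gd(void) { return ++gd_counter; }
```

In a main executable both variants behave identically. The difference only matters once the code is built as a shared object and loaded after program start.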

@ianlancetaylor
Contributor

ianlancetaylor commented Dec 29, 2021

I do think that the TLS problem would be better handled in a separate issue.

> The porting policy doesn't mention any specific libc implementations for linux/*, so it's not clear from it, that no-glibc-linux systems are non-first-class ports.

Good point, I added a sentence to the page.

> It seems like MUSL used to crash when dlopen was used to load a shared object that contained initial-exec references to dynamic TLS.

I don't understand why the shared object contains initial-exec TLS uses. I think that would be the thing to fix. That would benefit all cases.

@fenos
Contributor

fenos commented May 20, 2022

I also came across "initial-exec TLS resolves to dynamic definition" when loading a c-archive library on Alpine.
Is there a way to work around it?

@ansiwen

ansiwen commented May 20, 2022

@fenos

> I also came across "initial-exec TLS resolves to dynamic definition" when loading a c-archive library on Alpine. Is there a way to work around it?

I'm not aware of any, but I am also still interested in it. I reverted to using Debian images instead of Alpine, which is unfortunate.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@Sfinx

Sfinx commented Jul 18, 2022

Heh, this 7-year-old Go bug still strikes: the Go plugin for fluent-bit just does not start:

bash-5.1# ./bin/fluent-bit -e plugins/out_grafana_loki.so 
[2022/07/18 21:09:47] [ info] [config] changing coro_stack_size from 3072 to 4096 bytes
[proxy] error opening plugin plugins/out_grafana_loki.so: 'Error relocating plugins/out_grafana_loki.so: (null): initial-exec TLS resolves to dynamic definition in plugins/out_grafana_loki.so'
[2022/07/18 21:09:47] [error] [plugin] error loading proxy plugin: plugins/out_grafana_loki.so

@charleskorn

charleskorn commented Sep 1, 2022

I've created #54805 to track the initial-exec TLS resolves to dynamic definition issue.

@donob4n

donob4n commented Sep 29, 2022

Hi, I just created a patch for gcc-go that avoids this problem by skipping goargs() and goenvs() when built as c-[shared|archive]. Essentially:

diff --git a/libgo/go/runtime/proc.go b/libgo/go/runtime/proc.go
index 881793b..52534ba 100644
--- a/libgo/go/runtime/proc.go
+++ b/libgo/go/runtime/proc.go
@@ -692,9 +692,11 @@ func schedinit() {
 		throw("sched.timeToRun not aligned to 8 bytes")
 	}
 
-	goargs()
-	goenvs()
-	parsedebugvars()
+	if !isarchive && !islibrary {
+		goargs()
+		goenvs()
+		parsedebugvars()
+	}
 	gcinit()

It works properly with libs that don't rely on this data, like https://github.com/hoehermann/purple-gowhatsapp/, but it will probably affect others that use it in their logic.

With the aim of better integration between musl and Go, would you be willing to change this glibc-ism and enforce that args and env are not accessible when building as c-archive or c-shared? There are plenty of ways to pass the needed info.

If yes I can submit a PR, currently it's based on gcc-go so probably it needs some changes.

@ianlancetaylor
Contributor

ianlancetaylor commented Sep 29, 2022

@donob4n That approach means that os.Args won't work in Go code built with -buildmode=c-archive or c-shared. While the current situation is not ideal, that situation is also not ideal. And, as you say, it may break existing working programs. It doesn't seem better overall than the current situation.

@richfelker

richfelker commented Sep 29, 2022

> While the current situation is not ideal

The current situation is that it crashes dereferencing a pointer that came from interpreting garbage on the stack or in a register as a pointer, so anything that doesn't crash from that seems like an improvement.

If there are libraries written in Go that are trying to interpret the main program's initial arguments, or other random data left there after the main program overwrote that storage, this is surely a bug that needs to be identified and fixed. It's very intentional that musl does not provide these arguments to ctors because (1) it's nonstandard functionality with no means to detect its presence, meaning the only way you can use it is by writing nonportable code that commits UB when the functionality isn't available, and (2) it's functionality whose only purpose is to write library-unsafe library code that peeks (or worse, pokes) at data that doesn't belong to it.

@ianlancetaylor
Contributor

ianlancetaylor commented Sep 29, 2022

From my perspective, the current situation is that these libraries work as expected when using glibc, which is the majority of Linux systems. I respect that musl has adopted a different approach, and I certainly think we should support that if we can figure out how. But while I don't know of any good answer, I don't think the approach of breaking existing code that works when using glibc is the best one.

@donob4n

donob4n commented Sep 30, 2022

If the problem is breaking working code for glibc, it could skip goargs() only when a non-glibc libc (or at least musl) is detected.

This way the affected apps will only fail when someone tries to build them against a non-glibc libc, and hopefully they will find the problem and report it upstream.

Since it needs some hooks on sysargs() and other funcs, it could also write some debug warning like "Trying to read args but built as c-xxxx".

@ianlancetaylor
Contributor

ianlancetaylor commented Oct 1, 2022

Is there a reasonable way for a Go archive to know whether it is being run on a glibc or a musl system?

@donob4n

donob4n commented Oct 2, 2022

Do you mean at runtime? There is a __GLIBC__ macro that can be used when building.
