runtime: panic when executing from multiple c-shared libraries #65050
Comments
CC @golang/runtime. |
Using multiple c-shared libraries in the same process has never really been supported. Currently a c-shared library assumes it is the only copy of the Go runtime in the process. Unlike plugins, it doesn't check whether any other Go runtime is loaded in the process and integrate with it. Having multiple c-shared libraries in the same process might work in some simple cases, where each shared library mostly works in isolation. But if the program passes pointers around, weird things can happen. You could try building them into a single c-shared library, or using plugins. Thanks. |
Speaking with our use case in mind: writing Python extensions in Go. The c-shared execution mode is the basis of Python extension loading, and we cannot build all the Go extensions into a single library. It's not even possible to know in advance which extensions will be loaded. The plugin execution mode is available only to Go applications, so it's not usable by the Python interpreter, which is written in C. At this point I'd like to understand what plugins do that cannot be done by c-shared, and what kind of 'passing pointers around' gets the runtime in trouble. Could you please elaborate? Substantially, what can be done to support multiple c-shared libraries? What's the root problem that cannot be solved? If this has already been discussed, please give me a pointer. |
The underlying problem is that currently each Go runtime assumes it has complete information about all Go code in the process, and that no other runtime or Go code exists in the same process. Specifically, say there are two c-shared libraries For plugins, at load time it takes extra steps to find existing Go runtime(s) and attach various metadata (tables) to the existing ones. For c-shared build mode, it currently doesn't do that. I think there is no fundamental reason that this couldn't be done. But it needs time to make it work. Another possibility is probably to make it possible to load plugins from a c-shared library, which may be less work. Then you can build one c-shared object that just does the plugin loading, and build the other extensions as plugins. |
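To make the "complete information" assumption concrete, here is a toy model, not the real runtime: `toyRuntime`, `register` and `describe` are hypothetical names invented for this sketch. Each copy only knows the code it registered itself, so an address that belongs to the other copy cannot be resolved.

```go
package main

import "fmt"

// toyRuntime is a hypothetical stand-in for one copy of the Go runtime.
// It only knows about the functions registered in its own private table.
type toyRuntime struct {
	name  string
	funcs map[uintptr]string // program counter -> function name
}

func newToyRuntime(name string) *toyRuntime {
	return &toyRuntime{name: name, funcs: make(map[uintptr]string)}
}

// register records a function in this copy's table only.
func (r *toyRuntime) register(pc uintptr, fn string) {
	r.funcs[pc] = fn
}

// describe mimics a runtime resolving an address, for example while
// printing a stack trace. Addresses owned by another copy are unknown.
func (r *toyRuntime) describe(pc uintptr) string {
	if fn, ok := r.funcs[pc]; ok {
		return fn
	}
	return "unknown pc"
}

func main() {
	lib1 := newToyRuntime("lib1")
	lib2 := newToyRuntime("lib2")
	lib1.register(0x1000, "lib1.fun")
	lib2.register(0x2000, "lib2.call")

	// lib1 can resolve its own function, lib2 cannot resolve lib1's.
	fmt.Println(lib1.name, "sees", lib1.describe(0x1000))
	fmt.Println(lib2.name, "sees", lib2.describe(0x1000))
}
```

The real runtime keeps far more than a function table (heap metadata, type information, signal state), but the failure shape is the same: data that crosses from one copy to the other is outside what the receiving copy believes exists.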
Also, currently, as the c-shared build mode is designed and implemented with the assumption that it is the only copy of Go, there is no care taken for unifying/deduplicating symbols from multiple copies. So if multiple c-shared libraries are loaded, currently whether a symbol of a function or global variable with the same name is deduplicated depends on the specific behavior of the system's dynamic linker. It may behave differently from one platform to another (e.g. on Linux and on macOS). Plugins are designed carefully with that in mind, so global variables are properly deduplicated. |
This clearly points in the direction of keeping only a single runtime around. It also sounds reasonable for efficiency and resources consumption.
I like this approach, would you supervise/support me if I try it? I know nothing of Go internals and surely I don't have enough time, but I already have some entries in the hall of shame so I've nothing to lose.
This is interesting but does not seem attractive, at least for my use case. |
This is quite cryptic; I think I already read it in some other similar issue. Isn't it a generic problem of every shared library? A clash with some other library's global symbol may well happen, and there is no alternative to leaving this in the hands of the library developer. Are you maybe referring to global symbols of the runtime? Could you clarify? In general it seems that plugins already solved most of the problems; I would try to adapt/reuse/generalize the solutions also to the c-shared case. |
I modified the repro and indeed it gets in trouble also on Go 1.20.12. Here
package main

// inline void call2(void *p)
// {
//     void (*f)(void) = p;
//     f();
// }
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
}

//export fun
func fun() {
	fmt.Println("fun!")
	panic("fun!")
}

//export call
func call(p unsafe.Pointer) {
	fmt.Printf("calling %p\n", p)
	C.call2(p)
}
The test now gets
#include <assert.h>
#include <stdio.h>
#include <dlfcn.h>

typedef void (*fun)(void);
typedef void (*call)(void*);

int main(int argc, char* argv[])
{
    void *lib1 = dlopen("./lib1.so", RTLD_NOW);
    assert(lib1);
    void *lib2 = dlopen("./lib2.so", RTLD_NOW);
    assert(lib2);
    fun fun = dlsym(lib1, "fun");
    assert(fun);
    call call = dlsym(lib2, "call");
    assert(call);
    printf("fun: %p\n", fun);
    call(fun);
    dlclose(lib1);
    dlclose(lib2);
    return 0;
}
The result is this panic:
If test is modified to take both
What I don't get is that in all (except Go 1.10.4 on Ubuntu 18.04, the only 1.10.x in the batch) of the Pygolo pipelines, which include also some macOS versions (??), the runtime does not panic in any of the 1000 iterations. What happens is that So, despite that the cross c-shared function invocation seems to work, I see the holes as you described above. @cherrymui, what should I add to this test in order to have a complete picture of what needs to be implemented? You mentioned types identity but Go types do not cross the C API barrier. Even when the underlying data of a |
I would really like to see at least some limited support for such a use case. I'm often considering writing plugins (plugin in the sense of a c-shared library) for things that use dynamic loading of c-shared libraries. While I would often favor Go for that, the pick-at-max-one-plugin-written-in-Go limitation makes me rule out using Go completely, as I have no control over what other plugins may be run within the process. I would not expect the two libraries to communicate with each other directly, only to co-exist in some way. Even requiring the same runtime version across all of them would be something to work with. |
@fzwoch I don't think anybody is opposed to this. It just doesn't work with the current implementation approach. Nobody has come up with an alternate implementation approach, and any such approach would take considerable implementation effort. |
I'd be quite happy if there was a way to detect that a runtime is already running so that troublesome situations could be handled in a controlled way. In my case, where the library is actually a Python extension, it would be nice to set up a Python exception and abort the installation of the module into the interpreter. This would be my very first step, without the aim of solving the full problem but with that in perspective. However I won't be able to focus on this for the next 3-6 months. Would a hacky PR be OK to start a practical conversation, or is there a more formal process? |
@cavokz are you suggesting that if a Go runtime is already loaded, the second one should just fail out loud? That is probably doable. But I'm worried that there are existing use cases that rely on loading multiple Go shared objects and using them in a careful and isolated way, which will break if we add a check. If we do that, we'll need a way to fall back (an environment variable, for example). |
@cherrymui I'd prefer the library to be in charge of such a check so that it can handle/report the failure according to its entry point protocol. Such a check could later be expanded to fail only if the two runtimes have different versions, in case multiple c-shared libraries become supported. If it's all done by the runtime startup, then how can it do anything aside from panicking? I expect that dlopen has long returned a success before the runtime startup has any opportunity to check for any existing selves. If panic is the only option I think it's way better than just letting the situation collapse later with some apparently unrelated crash. Is there any support for "loading multiple Go shared objects and using them in a careful and isolated way" that is not mentioned in the Go Execution Modes? |
No. Technically this is not supported. But it doesn't mean that there aren't programs that do this and "happen to work". |
What about using the c-archive build mode (with static linking)? Would the same problem happen when mixing a single c-shared library with a single c-archive library? Or two c-archive libraries? |
That's not an option for the Python modules use case. |
From my understanding the problem of runtime clash with multiple c-shared libraries is not Python-specific, so my question was general, for any tech stack where such a configuration is possible. |
Yes, at least theoretically, the same problem would arise, and this is not really supported currently. In practice it may be more likely or less likely to fail, depending on how the C static and dynamic linkers resolve and deduplicate symbols. |
I am in the same boat but slightly different situation. I only need to support 3-4 "plugins". So I am wondering, if I intentionally compile each of them with different go version would that work? My hope is that each of them would use their own runtime? |
I think this should be reopened -- after bisection this appears to have regressed in c426c87 but only on macos (and as far as I can tell only on x86_64 macos?) I used the reproducer in this issue and bisected using:
#!/usr/bin/env bash
# ../bisect.sh
set -euxo pipefail
git clean -fxfd
cd src && bash make.bash
cd ..
export "PATH=$PWD/bin:$PATH"
cd ~/workspace/cgo-repro
make clean
make
and
git bisect start
git bisect good go1.20.12
git bisect bad go1.21.5
git bisect run ../bisect.sh
I suspect there's maybe a bug in the the usecase above (as python modules) should not be running into symbol collisions -- python's import system loads modules by default using |
I agree that https://go.dev/cl/495855 is going to make it even harder to support multiple c-shared libraries. But I'm skeptical that it ever worked in general although it may well have worked for your case. CC @cherrymui |
As Ian mentioned, this has never worked in general. It is possible that some specific case happened to work, but it is not supported and subject to change. If reopening, it would be a feature request. Thanks. |
I'm sorry, but I think you're maybe misunderstanding symbol relocation and shared objects. build-shared is completely useless if you can't link multiple shared libraries simply because they are written in Go. This used to work perfectly for years and was broken by the patch above, which as far as I can tell is just a "performance hack". |
@asottile It's not only a matter of symbol relocation and shared objects. The Go runtime also expects to have control over process-wide concepts like signal handlers and thread-local storage at certain offsets. The Go team is unfortunately unable to commit to making everything work when there are multiple Go shared libraries in an executable process. I completely understand that this is frustrating. Our resources are limited. We can review patches to make these cases work better, but we can't promise to fix them ourselves. |
I don't understand why a very clear regression is not being taken seriously. |
I'm sorry about the trouble. I don't know what to say except that we never intended to support that use case and we didn't know that it worked. The change that appears to have broken it is a significant performance benefit to programs that use cgo and don't use shared libraries at all. As far as I can tell the breakage is indeed due to conflicting signal handlers, a case that will always be problematic when using multiple shared libraries. |
Just out of curiosity, what about WASI? Isn't there guarantee that multiple WASI libraries made in Go work in one process? |
It works if each WASI library runs on separate instances of Wasm linear memory. If you use a shared buffer for the backing store of the linear memory for multiple WASI library instances (I don't know if this is possible, but even if it is), it would not work. |
For c-shared mode on other platforms (ELF, Mach-O, etc.), we have a mechanism that supports multiple copies of the runtime, the plugin mode. In theory I think we can apply a similar mechanism to the c-shared mode. At startup, it would detect if there is an existing instance of the runtime, and if so, add itself as a new runtime module, which includes combining certain tables and deduplicating certain things (beyond what the system dynamic linker does). It would need to be built with this assumption, instead of assuming it is the only copy, and also use proper relocations and symbol visibilities that are compatible with this. I think this is probably possible, but it would need quite some work. |
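The "add itself as a new runtime module" idea can be illustrated with a simplified model. This is only a sketch: `module`, `process`, `attach` and `findFunc` are hypothetical names, and the real mechanism would also have to cover relocations, symbol visibility, heap state and signal handling.

```go
package main

import "fmt"

// module is a hypothetical, heavily simplified stand-in for the metadata
// one shared library would contribute (function table, type list).
type module struct {
	name  string
	funcs map[uintptr]string
	types []string
	next  *module
}

// process models the single runtime instance shared by all modules.
type process struct {
	first, last *module
	types       map[string]string // type name -> module owning the canonical copy
}

// attach links m into the existing module list instead of starting a second
// runtime. It reports whether m was the first module, i.e. the one that
// would have to boot the runtime. Types seen before are deduplicated: the
// earliest module keeps ownership.
func (p *process) attach(m *module) bool {
	if p.types == nil {
		p.types = make(map[string]string)
	}
	for _, t := range m.types {
		if _, seen := p.types[t]; !seen {
			p.types[t] = m.name
		}
	}
	if p.first == nil {
		p.first, p.last = m, m
		return true
	}
	p.last.next = m
	p.last = m
	return false
}

// findFunc resolves an address by walking every attached module.
func (p *process) findFunc(pc uintptr) (string, bool) {
	for m := p.first; m != nil; m = m.next {
		if fn, ok := m.funcs[pc]; ok {
			return fn, true
		}
	}
	return "", false
}

func main() {
	p := &process{}
	lib1 := &module{name: "lib1", funcs: map[uintptr]string{0x1000: "lib1.fun"}, types: []string{"error"}}
	lib2 := &module{name: "lib2", funcs: map[uintptr]string{0x2000: "lib2.call"}, types: []string{"error", "lib2.T"}}
	fmt.Println(p.attach(lib1), p.attach(lib2))
	fmt.Println(p.findFunc(0x1000))
	fmt.Println(p.types["error"], p.types["lib2.T"])
}
```

With a single shared instance, an address or type coming from the first library is still known after the second one attaches, which is the property the isolated c-shared copies lack today.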
Go version
go version go1.21.5 darwin/amd64
Output of go env in your module/workspace:
What did you do?
I encountered this issue on macOS while importing two Python extensions written in Go using Pygolo. I could reduce the problem to the following repro, completely removing Python from the picture.
This is a minimal shared library exporting a dummy function:
This test C program loads two libraries built from the above source code and invokes the exported fun function of each.
This Makefile builds the two libraries and the test program, then invokes the test multiple times. In a handful of attempts the runtime explodes.
What did you see happen?
After a few executions of the test program, the Go runtime panics. For example:
What did you expect to see?
I expect to see no panics. Given the repro, I expect a simple . for each execution of the test program.
Various Go versions
Used goenv to try different versions on the same system (macOS 14.2.1) starting from 1.10.8.
1.10.8: fails to build, no messages
1.11.13, 1.12.17, 1.13.15, 1.14.15, 1.15.15, 1.16.15: build of the shared library fails with combining dwarf failed: Unknown load command 0x80000034 (2147483700)
1.16.15, 1.17.13, 1.18.10, 1.19.13, and 1.20.12: tested 1000 iterations, no panics
1.21.0, 1.21.1, 1.21.2, 1.21.3, 1.21.4, 1.21.5, 1.21.6: fail after a few iterations with fatal error: bad sweepgen in refill, as above.
1.22-8db131082d: same as 1.21.x.
On Debian unstable, 1.21.5 works without issues. Did not try all the versions above but the test seems to pass on various distributions like Ubuntu, Debian, Alpine, RHEL, SLE. See https://gitlab.com/pygolo/py/-/blob/main/docs/TEST-MATRIX.md.
I will complete the map of success/failures at go-multi-c-shared.