
runtime: panic when executing from multiple c-shared libraries #65050


Closed
cavokz opened this issue Jan 10, 2024 · 29 comments
Labels: compiler/runtime, NeedsInvestigation

@cavokz

cavokz commented Jan 10, 2024

Go version

go version go1.21.5 darwin/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/cavok/Library/Caches/go-build'
GOENV='/Users/cavok/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/cavok/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/cavok/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/Cellar/go/1.21.6/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/Cellar/go/1.21.6/libexec/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/9y/hlpdgn0s10s5c3_k60jc0d5c0000gn/T/go-build4292255947=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I encountered this issue on macOS while importing two Python extensions written in Go using Pygolo. I could reduce the problem to the following repro, completely removing Python from the picture.

This is a minimal shared library exporting a dummy function:

package main

import "C"

func main() {
}

//export fun
func fun() {
}

This test C program loads two libraries built from the above source code and invokes the exported fun function of each.

#include <assert.h>
#include <stdio.h>
#include <dlfcn.h>

typedef void (*fun)(void);

int main(int argc, char* argv[])
{
	void *lib1 = dlopen("./lib1.so", RTLD_NOW);
	if (!lib1) {
		printf("%s\n", dlerror());
	}
	assert(lib1);

	void *lib2 = dlopen("./lib2.so", RTLD_NOW);
	if (!lib2) {
		printf("%s\n", dlerror());
	}
	assert(lib2);

	fun fun1 = dlsym(lib1, "fun");
	assert(fun1);
	fun1();

	fun fun2 = dlsym(lib2, "fun");
	assert(fun2);
	fun2();

	dlclose(lib1);
	dlclose(lib2);
	return 0;
}

This Makefile builds the two libraries and the test program, then invokes the test repeatedly. Within a handful of attempts, the runtime crashes.

GO ?= go
LIBS := lib1.so lib2.so

ITERATIONS ?= 1000

all: $(LIBS) test
	for n in `seq $(ITERATIONS)`; do ./test || exit 1; printf .; done; echo ok

test: export LDFLAGS := -ldl

%.so: lib.go FORCE
	$(GO) build -buildmode=c-shared -o $@ $<

clean:
	rm -rf test $(LIBS) $(LIBS:.so=.h)

FORCE:
.PHONY: FORCE
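The Makefile loop stops at the first crash. When comparing Go versions, it can help to count failures over a fixed number of runs instead, to get a rough failure rate. A minimal Go sketch, assuming the test binary is ./test as in the Makefile; countFailures is a hypothetical helper, not part of the repro:

```go
package main

import (
	"errors"
	"os/exec"
)

// countFailures runs the given command n times and returns how many runs
// exited unsuccessfully. Unlike the Makefile loop, it does not stop at the
// first failure, so it gives a rough failure rate for one toolchain.
// It returns early with an error only if the command cannot be started.
func countFailures(n int, name string, args ...string) (int, error) {
	failures := 0
	for i := 0; i < n; i++ {
		err := exec.Command(name, args...).Run()
		if err == nil {
			continue
		}
		// *exec.ExitError: the program started but did not exit cleanly
		// (a Go "fatal error" exits with status 2, for example).
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			failures++
			continue
		}
		// Anything else (e.g. binary not found) is a setup problem.
		return failures, err
	}
	return failures, nil
}
```

A small main calling countFailures(1000, "./test") and printing the result gives a number comparable to the Makefile's ITERATIONS ?= 1000 runs.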

What did you see happen?

After a few executions of the test program, the Go runtime panics. For example:

fatal error: bad sweepgen in refill

goroutine 17 [running, locked to thread]:
runtime.throw({0x1505830c6?, 0x0?})
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/panic.go:1077 +0x5c fp=0xc00006ec10 sp=0xc00006ebe0 pc=0x15055341c
runtime.(*mcache).refill(0x1096e6a68, 0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mcache.go:157 +0x20d fp=0xc00006ec50 sp=0xc00006ec10 pc=0x150537b6d
runtime.(*mcache).nextFree(0x1096e6a68, 0x10)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/malloc.go:929 +0x85 fp=0xc00006ec98 sp=0xc00006ec50 pc=0x150531645
runtime.mallocgc(0x58, 0x1505a6ec0, 0x1)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/malloc.go:1116 +0x448 fp=0xc00006ed00 sp=0xc00006ec98 pc=0x150531c08
runtime.newobject(0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/malloc.go:1328 +0x25 fp=0xc00006ed28 sp=0xc00006ed00 pc=0x150532145
runtime.acquireSudog()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:437 +0x229 fp=0xc00006ed90 sp=0xc00006ed28 pc=0x150556489
runtime.chanrecv(0x1c0000a0000, 0x0, 0x1)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/chan.go:563 +0x225 fp=0xc00006ee08 sp=0xc00006ed90 pc=0x15052bc25
runtime.chanrecv1(0x10984e950?, 0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/chan.go:442 +0x12 fp=0xc00006ee30 sp=0xc00006ee08 pc=0x15052b9f2
runtime.cgocallbackg1(0x150580620, 0xc00006efe0?, 0x0)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/cgocall.go:306 +0x214 fp=0xc00006ef00 sp=0xc00006ee30 pc=0x15052a7f4
runtime.cgocallbackg(0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/cgocall.go:245 +0x109 fp=0xc00006ef90 sp=0xc00006ef00 pc=0x15052a549
runtime.cgocallbackg(0x150580620, 0x7ff7b685ff50, 0x0)
	<autogenerated>:1 +0x29 fp=0xc00006efb8 sp=0xc00006ef90 pc=0x15057e4e9
runtime.cgocallback(0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/asm_amd64.s:1035 +0xcc fp=0xc00006efe0 sp=0xc00006efb8 pc=0x15057bfcc
runtime: g 17: unexpected return pc for runtime.cgocallback called from 0x1098211e1
stack: frame={sp:0xc00006efb8, fp:0xc00006efe0} stack=[0xc00006e000,0xc00006f000)
0x000000c00006eeb8:  0x000000015057a46e <runtime.exitsyscall+0x000000000000012e>  0x000000c000006680
0x000000c00006eec8:  0x0000000200000003  0x000000c000006680
0x000000c00006eed8:  0x0000000000000000  0x0000000000000000
0x000000c00006eee8:  0x00000001505a99e8  0x000000c00006ef80
0x000000c00006eef8:  0x000000015052a549 <runtime.cgocallbackg+0x0000000000000109>  0x0000000150580620 <_cgoexp_47f08e3a3bbd_fun+0x0000000000000000>
0x000000c00006ef08:  0x000000c00006efe0  0x0000000000000000
0x000000c00006ef18:  0x00000001098211e1  0x0000000000000000
0x000000c00006ef28:  0x0000000000000000  0x0000000000000000
0x000000c00006ef38:  0x0000000000000000  0x0000000000000000
0x000000c00006ef48:  0x0000000000000000  0x0000000000000000
0x000000c00006ef58:  0x000000c00006efe0  0x000000c000006680
0x000000c00006ef68:  0x000000c000060000  0x0000000150580620 <_cgoexp_47f08e3a3bbd_fun+0x0000000000000000>
0x000000c00006ef78:  0x00007ff7b685ff50  0x000000c00006efa8
0x000000c00006ef88:  0x000000015057e4e9 <runtime.cgocallbackg+0x0000000000000029>  0x0000000000000000
0x000000c00006ef98:  0x0000000000000000  0x0000000000000000
0x000000c00006efa8:  0x00007ff7b685fef0  0x000000015057bfcc <runtime.cgocallback+0x00000000000000cc>
0x000000c00006efb8: <0x0000000150580620 <_cgoexp_47f08e3a3bbd_fun+0x0000000000000000>  0x00007ff7b685ff50
0x000000c00006efc8:  0x0000000000000000  0x0000000000000000
0x000000c00006efd8:  0x00000001098211e1 >0x0000000000000000
0x000000c00006efe8:  0x0000000000000000  0x0000000000000000
0x000000c00006eff8:  0x0000000000000000

goroutine 1 [runnable, locked to thread]:
runtime.gcTrigger.test({0x0?, 0x0?, 0x0?})
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgc.go:569 +0xdc fp=0x1c00006ac70 sp=0x1c00006ac68 pc=0x15053947c
runtime.mallocgc(0x38, 0x15059f6e0, 0x1)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/malloc.go:1245 +0x75d fp=0x1c00006acd8 sp=0x1c00006ac70 pc=0x150531f1d
runtime.newobject(0x1c00005a590?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/malloc.go:1328 +0x25 fp=0x1c00006ad00 sp=0x1c00006acd8 pc=0x150532145
syscall.nametomib({0x150582afe, 0x14})
	/usr/local/Cellar/go/1.21.6/libexec/src/syscall/syscall_darwin.go:50 +0x28 fp=0x1c00006ad60 sp=0x1c00006ad00 pc=0x15057fbe8
syscall.SysctlUint32({0x150582afe?, 0x1c00005a5f0?})
	/usr/local/Cellar/go/1.21.6/libexec/src/syscall/syscall_bsd.go:465 +0x1c fp=0x1c00006adb8 sp=0x1c00006ad60 pc=0x15057fb1c
syscall.adjustFileLimit(0x1c00006adf0)
	/usr/local/Cellar/go/1.21.6/libexec/src/syscall/rlimit_darwin.go:13 +0x25 fp=0x1c00006add8 sp=0x1c00006adb8 pc=0x15057fa05
syscall.init.0()
	/usr/local/Cellar/go/1.21.6/libexec/src/syscall/rlimit.go:37 +0x73 fp=0x1c00006ae10 sp=0x1c00006add8 pc=0x15057f9b3
runtime.doInit1(0x1505f8ee0)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:6740 +0xd8 fp=0x1c00006af40 sp=0x1c00006ae10 pc=0x150562a38
runtime.doInit(...)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:6707
runtime.main()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:249 +0x374 fp=0x1c00006afe0 sp=0x1c00006af40 pc=0x150555e54
runtime.goexit()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/asm_amd64.s:1650 +0x1 fp=0x1c00006afe8 sp=0x1c00006afe0 pc=0x15057c1e1

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:398 +0xce fp=0x1c00005afa8 sp=0x1c00005af88 pc=0x1505561ee
runtime.goparkunlock(...)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:404
runtime.forcegchelper()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:322 +0xb3 fp=0x1c00005afe0 sp=0x1c00005afa8 pc=0x150556073
runtime.goexit()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/asm_amd64.s:1650 +0x1 fp=0x1c00005afe8 sp=0x1c00005afe0 pc=0x15057c1e1
created by runtime.init.6 in goroutine 1
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:398 +0xce fp=0x1c00005b778 sp=0x1c00005b758 pc=0x1505561ee
runtime.goparkunlock(...)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgcsweep.go:280 +0x94 fp=0x1c00005b7c8 sp=0x1c00005b778 pc=0x150543f34
runtime.gcenable.func1()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgc.go:200 +0x25 fp=0x1c00005b7e0 sp=0x1c00005b7c8 pc=0x1505392c5
runtime.goexit()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/asm_amd64.s:1650 +0x1 fp=0x1c00005b7e8 sp=0x1c00005b7e0 pc=0x15057c1e1
created by runtime.gcenable in goroutine 1
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x1c00007c000?, 0x1505988a8?, 0x1?, 0x0?, 0x1c000007520?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:398 +0xce fp=0x1c00005bf70 sp=0x1c00005bf50 pc=0x1505561ee
runtime.goparkunlock(...)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x1505fa360)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgcscavenge.go:425 +0x49 fp=0x1c00005bfa0 sp=0x1c00005bf70 pc=0x1505417e9
runtime.bgscavenge(0x0?)
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgcscavenge.go:653 +0x3c fp=0x1c00005bfc8 sp=0x1c00005bfa0 pc=0x150541d7c
runtime.gcenable.func2()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgc.go:201 +0x25 fp=0x1c00005bfe0 sp=0x1c00005bfc8 pc=0x150539265
runtime.goexit()
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/asm_amd64.s:1650 +0x1 fp=0x1c00005bfe8 sp=0x1c00005bfe0 pc=0x15057c1e1
created by runtime.gcenable in goroutine 1
	/usr/local/Cellar/go/1.21.6/libexec/src/runtime/mgc.go:201 +0xa5

What did you expect to see?

I expect to see no panics. Given the repro, I expect a simple . for each execution of the test program.

Various Go versions

I used goenv to try different versions on the same system (macOS 14.2.1), starting from 1.10.8.

1.10.8: fails to build, no messages

1.11.13, 1.12.17, 1.13.15, 1.14.15, 1.15.15, 1.16.15: build of the shared library fails with combining dwarf failed: Unknown load command 0x80000034 (2147483700)

1.16.15, 1.17.13, 1.18.10, 1.19.13, and 1.20.12: tested 1000 iterations, no panics

1.21.0, 1.21.1, 1.21.2, 1.21.3, 1.21.4, 1.21.5, 1.21.6: fail after a few iterations with fatal error: bad sweepgen in refill as above.

1.22-8db131082d: same as 1.21.x.

On Debian unstable, 1.21.5 works without issues. I did not try all the versions above, but the test seems to pass on various distributions such as Ubuntu, Debian, Alpine, RHEL, and SLE. See https://gitlab.com/pygolo/py/-/blob/main/docs/TEST-MATRIX.md.

I will complete the map of successes and failures at go-multi-c-shared.

@dmitshur dmitshur changed the title Panic when executing from multiple c-shared libraries runtime: panic when executing from multiple c-shared libraries; used to work in Go 1.16—1.20 Jan 10, 2024
@dmitshur dmitshur added this to the Backlog milestone Jan 10, 2024
@dmitshur dmitshur added NeedsInvestigation compiler/runtime labels Jan 10, 2024
@dmitshur
Contributor

CC @golang/runtime.

@cherrymui
Member

Using multiple c-shared libraries in the same process has never really been supported. Currently a c-shared library assumes it is the only copy of the Go runtime in the process. Unlike plugins, it doesn't check whether another Go runtime is already loaded in the process and integrate with it. Having multiple c-shared libraries in the same process might work in some simple cases, where each shared library mostly works in isolation. But if the program passes pointers around, weird things can happen.

You could try building them into a single c-shared library, or using plugins. Thanks.

@cherrymui cherrymui closed this as not planned Jan 10, 2024
@cavokz
Author

cavokz commented Jan 10, 2024

Speaking with our use case in mind: writing Python extensions in Go.

The c-shared execution mode underlies how Python loads extensions, so we cannot build all the Go extensions into a single library. It's not even possible to know in advance which extensions will be loaded.

The plugin execution mode is available only to Go applications, so it is not usable by the Python interpreter, which is written in C.

At this point I'd like to understand what plugins do that cannot be done by c-shared, what kind of 'passing pointers around' gets the runtime in trouble. Could you please elaborate?

Substantially, what can be done to support multiple c-shared libraries? What's the problem at the root that cannot be solved?

If this has been already discussed, please give me a pointer.

@cherrymui
Member

The underlying problem is that currently each Go runtime assumes it has complete information about all Go code in the process, and that no other runtime or Go code exists in the same process. Specifically, say there are two c-shared libraries A and B. A doesn't know B exists, including all functions and types in B (that are not in A). So if the program has a call stack with functions from both A and B on it (say, by passing func values), A's runtime doesn't know how to unwind B's frames, and the garbage collector doesn't know how to scan them. Similarly, A's runtime doesn't know how to handle a type from B (e.g. for the garbage collector to scan it, if an object from A points to an object from B). Type identity may also not be handled correctly.

For plugins, extra steps are taken at load time to find the existing Go runtime(s) and attach various metadata (tables) to them. The c-shared build mode currently doesn't do that. I think there is no fundamental reason this couldn't be done, but it needs time to make it work.
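To make the unwinding problem concrete, here is a toy model in Go. It is not the runtime's real data structure, only loosely inspired by its internal per-module tables: each runtime copy can interpret only code addresses inside modules registered with it, and "attaching" a module is roughly what plugin loading does for an existing runtime. The module names, address ranges, and the error text (borrowed from the "unexpected return pc" lines in the traces above) are illustrative assumptions.

```go
package main

import "fmt"

// module is a toy stand-in for the metadata a Go runtime keeps about each
// loaded Go binary: here, just a name and the range of code addresses it covers.
type module struct {
	name         string
	minPC, maxPC uintptr
}

// runtimeModel is a toy model of one copy of the Go runtime. It can only
// interpret code addresses that fall inside a module registered with it.
type runtimeModel struct {
	modules []module
}

// findModule maps a return address to the module containing it, which is
// what unwinding a frame or scanning it for the GC needs first. An address
// from an unknown module cannot be interpreted at all.
func (r *runtimeModel) findModule(pc uintptr) (module, error) {
	for _, m := range r.modules {
		if pc >= m.minPC && pc < m.maxPC {
			return m, nil
		}
	}
	return module{}, fmt.Errorf("unexpected return pc %#x", pc)
}

// attach registers another module with this runtime, roughly what plugin
// loading does and c-shared loading currently does not.
func (r *runtimeModel) attach(m module) {
	r.modules = append(r.modules, m)
}
```

With lib1.so's runtime holding only its own module, a frame whose PC lies in lib2.so fails to resolve; after attach, it resolves.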

Another possibility is probably to make it possible to load plugins from a c-shared library, which may be less work. Then you could build one c-shared object that only does the plugin loading, and build the other extensions as plugins.
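For the plugin-based alternative, the loading side would look roughly like this when run from an ordinary Go program. Per the comment, loading plugins from a c-shared library is not currently possible, so this only illustrates the plugin package API (plugin.Open and Plugin.Lookup). The exported symbol name Fun and its func() signature are assumptions. Note that the plugin package documents support only on Linux, FreeBSD, and macOS, and warns that the host and all plugins should be built with the same toolchain.

```go
package main

import (
	"fmt"
	"plugin"
)

// loadExtension opens a Go plugin (built with -buildmode=plugin) and returns
// its exported function sym. The func() signature is an assumption.
func loadExtension(path, sym string) (func(), error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}
	s, err := p.Lookup(sym)
	if err != nil {
		return nil, fmt.Errorf("lookup %s in %s: %w", sym, path, err)
	}
	// For exported functions, Lookup returns the function value itself
	// (for exported variables it would be a pointer to the variable).
	f, ok := s.(func())
	if !ok {
		return nil, fmt.Errorf("%s in %s has type %T, want func()", sym, path, s)
	}
	return f, nil
}
```

A loader would call something like loadExtension("./ext1.so", "Fun") once per extension (the path is hypothetical); since each plugin registers with the single host runtime, that runtime knows the frames and types of every extension.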

@cherrymui
Member

Also, currently, as the c-shared build mode is designed and implemented with the assumption that it is the only copy of Go, there is no care taken for unifying/deduplicating symbols from multiple copies. So if multiple c-shared libraries are loaded, currently whether a symbol of a function or global variable with the same name is deduplicated depends on the specific behavior of the system's dynamic linker. It may behave differently from one platform to another (e.g. on Linux and on macOS). Plugins are designed carefully with that in mind, so global variables are properly deduplicated.

@cavokz
Author

cavokz commented Jan 10, 2024

The underlying problem is that currently each Go runtime assumes it has complete information about all Go code in the process, and that no other runtime or Go code exists in the same process. Specifically, say there are two c-shared libraries A and B. A doesn't know B exists, including all functions and types in B (that are not in A). So if the program has a call stack with functions from both A and B on it (say, by passing func values), A's runtime doesn't know how to unwind B's frames, and the garbage collector doesn't know how to scan them. Similarly, A's runtime doesn't know how to handle a type from B (e.g. for the garbage collector to scan it, if an object from A points to an object from B). Type identity may also not be handled correctly.

This clearly points in the direction of keeping only a single runtime around. That also sounds reasonable for efficiency and resource consumption.

For plugins at load time it takes extra steps to find existing Go runtime(s) and attach various metadata (tables) to the existing ones. For c-shared build mode, it currently doesn't do that. I think there is no fundamental reason that this couldn't be done. But it needs time to make it work.

I like this approach. Would you supervise/support me if I try it? I know nothing of Go internals and surely I don't have enough time, but I already have some entries in the hall of shame, so I've nothing to lose.

Another possibility is probably to make it possible to load plugins from a c-shared library, which may be less work. Then you could build one c-shared object that only does the plugin loading, and build the other extensions as plugins.

This is interesting but does not seem attractive, at least for my use case.

@cavokz
Author

cavokz commented Jan 10, 2024

Also, currently, as the c-shared build mode is designed and implemented with the assumption that it is the only copy of Go, there is no care taken for unifying/deduplicating symbols from multiple copies. So if multiple c-shared libraries are loaded, currently whether a symbol of a function or global variable with the same name is deduplicated depends on the specific behavior of the system's dynamic linker. It may behave differently from one platform to another (e.g. on Linux and on macOS). Plugins are designed carefully with that in mind, so global variables are properly deduplicated.

This is quite cryptic, I think I already read it in some other similar issue.

Isn't it a generic problem of every shared library? A clash with another library's global symbol can always happen, and there is no alternative to leaving this in the hands of the library developer.

Are you maybe referring to global symbols of the runtime? Could you clarify?

In general it seems that plugins already solved most of the problems, I would try to adapt/reuse/generalize the solutions also to the c-shared case.

@cavokz
Author

cavokz commented Jan 12, 2024

I modified the repro and indeed it gets in trouble also on Go 1.20.12.

Here fun prints something and then panics, while call invokes the C function passed as its argument:

package main

// static inline void call2(void *p)
// {
//     void (*f)(void) = p;
//     f();
// }
import "C"
import (
	"fmt"
	"unsafe"
)

func main() {
}

//export fun
func fun() {
	fmt.Println("fun!")
	panic("fun!")
}

//export call
func call(p unsafe.Pointer) {
	fmt.Printf("calling %p\n", p)
	C.call2(p)
}

The test now gets fun from lib1.so and passes it to call of lib2.so:

#include <assert.h>
#include <stdio.h>
#include <dlfcn.h>

typedef void (*fun_t)(void);
typedef void (*call_t)(void *);

int main(int argc, char* argv[])
{
	void *lib1 = dlopen("./lib1.so", RTLD_NOW);
	assert(lib1);

	void *lib2 = dlopen("./lib2.so", RTLD_NOW);
	assert(lib2);

	/* fun is taken from lib1.so... */
	fun_t fun = dlsym(lib1, "fun");
	assert(fun);

	/* ...and handed to call from lib2.so, which invokes it via call2. */
	call_t call = dlsym(lib2, "call");
	assert(call);

	printf("fun: %p\n", (void *)fun);
	call((void *)fun);

	dlclose(lib1);
	dlclose(lib2);
	return 0;
}

The result is this panic:

fun: 0x104b5ca60
calling 0x104b5ca60
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x104ade7bd]

goroutine 17 [running, locked to thread]:
runtime.throw({0x12c16bd7a?, 0x0?})
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/panic.go:1047 +0x5d fp=0x1c00006e810 sp=0x1c00006e7e0 pc=0x12c104add
runtime: g 17: unexpected return pc for runtime.sigpanic called from 0x104ade7bd
stack: frame={sp:0x1c00006e810, fp:0x1c00006e870} stack=[0x1c00006e000,0x1c00006f000)
0x000001c00006e710:  0x0000000000000000  0x0000000000000000
0x000001c00006e720:  0x0000000000000000  0x0000000000000000
0x000001c00006e730:  0x0000000000000000  0x0000000000000000
0x000001c00006e740:  0x0000000000000000  0x0000000000000000
0x000001c00006e750:  0x0000000000000000  0x0000000000000000
0x000001c00006e760:  0x0000000000000000  0x0000000000000000
0x000001c00006e770:  0x0000000000000000  0x0000000000000000
0x000001c00006e780:  0x0000000000000000  0x0000000000000000
0x000001c00006e790:  0x000000012c13080e <runtime.systemstack+0x000000000000002e>  0x000000012c104e2c <runtime.fatalthrow+0x000000000000006c>
0x000001c00006e7a0:  0x000001c00006e7b0  0x000001c000006680
0x000001c00006e7b0:  0x000000012c104e60 <runtime.fatalthrow.func1+0x0000000000000000>  0x000001c000006680
0x000001c00006e7c0:  0x000000012c104add <runtime.throw+0x000000000000005d>  0x000001c00006e7e0
0x000001c00006e7d0:  0x000001c00006e800  0x000000012c104add <runtime.throw+0x000000000000005d>
0x000001c00006e7e0:  0x000001c00006e7e8  0x000000012c104b00 <runtime.throw.func1+0x0000000000000000>
0x000001c00006e7f0:  0x000000012c16bd7a  0x000000000000002a
0x000001c00006e800:  0x000001c00006e860  0x000000012c1195e9 <runtime.sigpanic+0x00000000000003e9>
0x000001c00006e810: <0x000000012c16bd7a  0x0000000000000000
0x000001c00006e820:  0x0000000000000000  0x0000000000000000
0x000001c00006e830:  0x0000000000000000  0x0000000000000000
0x000001c00006e840:  0x000001c000006680  0x0000000000000000
0x000001c00006e850:  0x0000000000000000  0x0000000000000000
0x000001c00006e860:  0x000001c00006e880 !0x0000000104ade7bd
0x000001c00006e870: >0x0000000000000000  0x0000000000000000
0x000001c00006e880:  0x000001c00006e938  0x0000000104adefff
0x000001c00006e890:  0x0000000000000000  0x0000000000000000
0x000001c00006e8a0:  0x0000000000000000  0x0000000000000000
0x000001c00006e8b0:  0x0000000000000000  0x0000000000000000
0x000001c00006e8c0:  0x0000000000000040  0x0000000000000034
0x000001c00006e8d0:  0x0000000000000003  0x0000000000000000
0x000001c00006e8e0:  0x000e000e000e000e  0x0000000000000000
0x000001c00006e8f0:  0x0000000000000000  0x0000000000000000
0x000001c00006e900:  0x000001c00006e968  0x0000000104ad83ca
0x000001c00006e910:  0x0000000000000000  0x0000000000000000
0x000001c00006e920:  0x0000000000000000  0x0000000000000000
0x000001c00006e930:  0x0000000000000000  0x000001c00006e9a0
0x000001c00006e940:  0x0000000104ad8387  0x000001c0001ae800
0x000001c00006e950:  0x0000000000000800  0x0000000000000800
0x000001c00006e960:  0x0000000104b8bd80  0x000000c00006e9c8
runtime.sigpanic()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/signal_unix.go:825 +0x3e9 fp=0x1c00006e870 sp=0x1c00006e810 pc=0x12c1195e9

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:381 +0xd6 fp=0x1c00005cfb0 sp=0x1c00005cf90 pc=0x12c1077b6
runtime.goparkunlock(...)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:387
runtime.forcegchelper()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:305 +0xb0 fp=0x1c00005cfe0 sp=0x1c00005cfb0 pc=0x12c1075f0
runtime.goexit()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/asm_amd64.s:1598 +0x1 fp=0x1c00005cfe8 sp=0x1c00005cfe0 pc=0x12c132a01
created by runtime.init.6
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:293 +0x25

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:381 +0xd6 fp=0x1c00005d780 sp=0x1c00005d760 pc=0x12c1077b6
runtime.goparkunlock(...)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgcsweep.go:278 +0x8e fp=0x1c00005d7c8 sp=0x1c00005d780 pc=0x12c0f4b6e
runtime.gcenable.func1()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgc.go:178 +0x26 fp=0x1c00005d7e0 sp=0x1c00005d7c8 pc=0x12c0e9fe6
runtime.goexit()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/asm_amd64.s:1598 +0x1 fp=0x1c00005d7e8 sp=0x1c00005d7e0 pc=0x12c132a01
created by runtime.gcenable
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgc.go:178 +0x6b

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x1c00007c000?, 0x12c183238?, 0x1?, 0x0?, 0x0?)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:381 +0xd6 fp=0x1c00005df70 sp=0x1c00005df50 pc=0x12c1077b6
runtime.goparkunlock(...)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x12c21d4c0)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgcscavenge.go:400 +0x53 fp=0x1c00005dfa0 sp=0x1c00005df70 pc=0x12c0f2a53
runtime.bgscavenge(0x0?)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgcscavenge.go:628 +0x45 fp=0x1c00005dfc8 sp=0x1c00005dfa0 pc=0x12c0f3025
runtime.gcenable.func2()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgc.go:179 +0x26 fp=0x1c00005dfe0 sp=0x1c00005dfc8 pc=0x12c0e9f86
runtime.goexit()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/asm_amd64.s:1598 +0x1 fp=0x1c00005dfe8 sp=0x1c00005dfe0 pc=0x12c132a01
created by runtime.gcenable
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mgc.go:179 +0xaa

goroutine 19 [finalizer wait]:
runtime.gopark(0x1a0?, 0x12c21d900?, 0xa0?, 0x61?, 0x1c00005c770?)
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/proc.go:381 +0xd6 fp=0x1c00005c628 sp=0x1c00005c608 pc=0x12c1077b6
runtime.runfinq()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mfinal.go:193 +0x107 fp=0x1c00005c7e0 sp=0x1c00005c628 pc=0x12c0e9027
runtime.goexit()
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/asm_amd64.s:1598 +0x1 fp=0x1c00005c7e8 sp=0x1c00005c7e0 pc=0x12c132a01
created by runtime.createfing
	/Users/cavok/.goenv/versions/1.20.12/src/runtime/mfinal.go:163 +0x45

If test is modified to take both call and fun from lib1.so (or lib2.so), the output is much nicer:

fun: 0x10aa86a60
calling 0x10aa86a60
fun!
panic: fun!

goroutine 17 [running, locked to thread]:
main.fun(...)
	/Users/cavok/devel/go-multi-c-shared.git/lib.go:20
main._Cfunc_call2(0x10aa86a60)
	_cgo_gotypes.go:39 +0x45
main.call.func1(0x10aac10f8?)
	/Users/cavok/devel/go-multi-c-shared.git/lib.go:26 +0x3a
main.call(0x10aa86a60)
	/Users/cavok/devel/go-multi-c-shared.git/lib.go:26 +0x67

What I don't get is that in all of the Pygolo pipelines (except Go 1.10.4 on Ubuntu 18.04, the only 1.10.x in the batch), which also include some macOS versions (??), the runtime does not panic in any of the 1000 iterations: fun is executed correctly and panics as expected with fun!.

What happens is that call (from lib2.so) is totally absent from the stack traces. In another round where fun and call both come from lib1.so I see the nicer stack trace (also here Ubuntu 18.04 fails with runtime: address space conflict, let's ignore it).

So, despite that the cross c-shared function invocation seems to work, I see the holes as you described above.

@cherrymui, what should I add to this test in order to have a complete picture of what needs to be implemented?

You mentioned type identity, but Go types do not cross the C API barrier. Even when the data behind a void * pointer is actually a Go value, once it crosses the C API it eventually needs to be cast, and from that point on Go can only trust the developer to do the correct casts.
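When a Go value does travel through C as an opaque void * or integer, one common way to keep it reachable without violating cgo's pointer-passing rules is a handle: runtime/cgo.Handle (Go 1.17+) maps an integer to a Go value inside the runtime that created it. The sketch below re-implements the idea with the standard library only, so it runs without cgo; newHandle, handleValue and deleteHandle are hypothetical names. The point relevant to this issue: the table is package-level state, so each c-shared library would normally carry its own copy, and a handle issued by lib1.so would mean nothing to lib2.so (assuming the dynamic linker keeps the two copies' globals separate, which per the earlier comment is platform-dependent).

```go
package main

import (
	"sync"
	"sync/atomic"
)

// handle is an opaque number that C code can store (e.g. as uintptr_t) in
// place of a Go pointer. Same idea as runtime/cgo.Handle, simplified.
type handle uintptr

var (
	handles    sync.Map       // handle -> Go value; private to this runtime copy
	nextHandle atomic.Uintptr // last issued handle number
)

// newHandle registers v and returns a handle for it. Numbering starts at 1
// so that 0 can mean "no value" on the C side.
func newHandle(v any) handle {
	h := handle(nextHandle.Add(1))
	handles.Store(h, v)
	return h
}

// handleValue returns the Go value for h, or false if this copy has not
// issued h (or already deleted it). A handle created by another library's
// runtime is just a number here: it either fails this lookup or, if this
// copy happened to issue the same number, resolves to an unrelated value.
func handleValue(h handle) (any, bool) {
	return handles.Load(h)
}

// deleteHandle releases h so that its value can be garbage collected.
func deleteHandle(h handle) {
	handles.Delete(h)
}
```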

@cavokz cavokz changed the title runtime: panic when executing from multiple c-shared libraries; used to work in Go 1.16—1.20 runtime: panic when executing from multiple c-shared libraries Jan 12, 2024
@fzwoch

fzwoch commented Aug 18, 2024

I would really like to see at least some limited support for such a use case.

I'm often considering writing plugins (plugin in the sense of a c-shared library) for things that use dynamic loading of c-shared libraries. While I would often favor Go for that the pick-at-max-one-plugin-written-in-Go limitation makes me rule out using Go completely as I have no control over what other plugins may be run within the process.

I would not expect the two libraries to communicate with each other directly, only to co-exist in some way; even requiring the same runtime version across all of them would be something to work with.

@ianlancetaylor
Member

@fzwoch I don't think anybody is opposed to this. It just doesn't work with the current implementation approach. Nobody has come up with an alternate implementation approach, and any such approach would take considerable implementation effort.

@cavokz
Author

cavokz commented Aug 24, 2024

I'd be quite happy if there was a way to detect that a runtime is already running so that troublesome situations could be handled in a controlled way.

In my case, where the library is actually a Python extension, it would be nice to set up a Python exception and abort the installation of the module into the interpreter.

This would be my very first step, not aimed at solving the full problem but with the full solution in mind. However, I won't be able to focus on this for another 3-6 months.

Would a hacky PR be OK to start a practical conversation, or is there a more formal process?

@cherrymui
Member

@cavokz are you suggesting that, if a Go runtime is already loaded, the second one just fails loudly? That is probably doable. But I'm worried that there are existing use cases that rely on loading multiple Go shared objects and using them in a careful and isolated way, which would break if we added a check. If we do that, we'll need a fallback (an environment variable, for example).

@cavokz
Copy link
Author

cavokz commented Aug 26, 2024

@cherrymui I'd prefer the library to be in charge of such a check so that it can handle/report the failure according to its entry point protocol. The check could later be relaxed to fail only if the two runtimes have different versions, in case multiple c-shared libraries become supported.

If it's all done by the runtime startup then how can it do anything aside panicking? I expect that dlopen has long returned a success before the runtime startup has any opportunity to check for any existing selves. If panic is the only option I think it's way better than just letting the situation collapse later with some apparently unrelated crash.

Is there any support for "loading multiple Go shared objects and using them in a careful and isolated way" that is not mentioned in the Go Execution Modes?

@cherrymui
Member

Is there any support for "loading multiple Go shared objects and using them in a careful and isolated way" that is not mentioned in the Go Execution Modes?

No. Technically this is not supported. But it doesn't mean that there aren't programs that do this and "happen to work".

@antoinebj

What about using the c-archive build mode (with static linking)? Would the same problem happen when mixing a single c-shared library with a single c-archive library? Or two c-archive libraries?

@cavokz
Author

cavokz commented Oct 9, 2024

What about using the c-archive build mode (with static linking)? Would the same problem happen when mixing a single c-shared library with a single c-archive library? Or two c-archive libraries?

That's not an option for the Python modules use case.

@antoinebj

What about using the c-archive build mode (with static linking)? Would the same problem happen when mixing a single c-shared library with a single c-archive library? Or two c-archive libraries?

That's not an option for the Python modules use case.

From my understanding, the problem of runtime clashes with multiple c-shared libraries is not Python-specific, so my question was a general one, for any tech stack where such a configuration is possible.

@cherrymui
Member

Would the same problem happen when mixing a single c-shared library with a single c-archive library? Or two c-archive libraries?

Yes, at least theoretically, the same problem would arise, and this is not really supported currently. In practice it may be more likely or less likely to fail, depending on how the C static and dynamic linkers resolve and deduplicate symbols.

@zodiac1214

I am in the same boat, but in a slightly different situation. I only need to support 3-4 "plugins", so I am wondering: if I intentionally compile each of them with a different Go version, would that work? My hope is that each of them would use its own runtime.

@asottile
Contributor

I think this should be reopened -- after bisection this appears to have regressed in c426c87 but only on macos (and as far as I can tell only on x86_64 macos?)

I used the reproducer in this issue and bisected using:

#!/usr/bin/env bash
# ../bisect.sh
set -euxo pipefail

git clean -fxfd
cd src && bash make.bash

cd ..
export "PATH=$PWD/bin:$PATH"

cd ~/workspace/cgo-repro
make clean
make

and

git bisect start
git bisect good go1.20.12
git bisect bad go1.21.5
git bisect run ../bisect.sh

I suspect there may be a bug in the cgo/asm*.s files where a symbol is introduced incorrectly (however, I am not at all an expert on platform-specific assembly -- this is purely a guess based on the commit contents)

the use case above (as python modules) should not be running into symbol collisions: python's import system loads modules by default using RTLD_NOW (which, despite being unspecified, on every platform I have access to also implicitly has RTLD_LOCAL), which I believe should mean that each shared object will not clobber the others' symbols in the global namespace. at least that's how it works in general for python extension modules -- many modules can link unrelated versions of, say, openssl and not care about each other as long as they're not trying to pass pointers between each other. the reproducer in this issue doesn't try to cross the streams either -- it just calls a no-op function, so it shouldn't care (the two go extensions don't exchange any data with each other)

@ianlancetaylor
Member

I agree that https://go.dev/cl/495855 is going to make it even harder to support multiple c-shared libraries. But I'm skeptical that it ever worked in general although it may well have worked for your case.

CC @cherrymui

@cherrymui
Member

As Ian mentioned, this has never worked in general. It is possible that some specific cases happened to work, but this is not supported and is subject to change. If reopened, this would be a feature request. Thanks.

@asottile
Contributor

I'm sorry, but I think you may be misunderstanding symbol relocation and shared objects

-buildmode=c-shared is completely useless if you can't link multiple shared libraries simply because they are written in go

this worked perfectly for years and was broken by the patch above, which as far as I can tell is just a "performance hack"

@ianlancetaylor
Member

@asottile It's not only a matter of symbol relocation and shared objects. The Go runtime also expects to have control over process-wide concepts like signal handlers and thread-local storage at certain offsets. The Go team is unfortunately unable to commit to making everything work when there are multiple Go shared libraries in an executable process. I completely understand that this is frustrating. Our resources are limited. We can review patches to make these cases work better, but we can't promise to fix them ourselves.

@asottile
Contributor

I don't understand why a very clear regression is not being taken seriously.

@ianlancetaylor
Member

I'm sorry about the trouble. I don't know what to say except that we never intended to support that use case and we didn't know that it worked. The change that appears to have broken it is a significant performance benefit to programs that use cgo and don't use shared libraries at all. As far as I can tell the breakage is indeed due to conflicting signal handlers, a case that will always be problematic when using multiple shared libraries.

@hajimehoshi
Member

Using multiple c-shared libraries in the same process is never really supported.

Just out of curiosity, what about WASI? Isn't there a guarantee that multiple WASI libraries made in Go work in one process?

#65199

@cherrymui
Member

Isn't there a guarantee that multiple WASI libraries made in Go work in one process?

It works if each WASI library runs on separate instances of Wasm linear memory. If you use a shared buffer for the backing store of the linear memory for multiple WASI library instances (I don't know if this is possible, but even if it is), it would not work.

@cherrymui
Member

For c-shared mode on other platforms (ELF, Mach-O, etc.), we have a mechanism that supports multiple copies of the runtime: the plugin mode. In theory I think we can apply a similar mechanism to the c-shared mode. At startup, it would detect whether there is an existing instance of the runtime, and if so, add itself as a new runtime module, which includes combining certain tables and deduplicating certain things (beyond what the system dynamic linker does). It would need to be built with this assumption, instead of assuming it is the only copy, and also use proper relocations and symbol visibilities that are compatible with this. I think this is probably possible, but it would need quite a lot of work.
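For comparison, the plugin mode mentioned here is what a Go host uses today to load additional Go code into a process whose runtime is already running: the opened plugin is registered as an extra module of that single runtime rather than starting its own. Below is a minimal sketch of the host side using the standard `plugin` package; the path `./greeter.so` and the exported `Greet` function are hypothetical placeholders. Note that this mechanism is driven from Go, not from C, so on its own it does not help a C host such as the Python interpreter.

```go
package main

import (
	"fmt"
	"plugin"
)

// loadGreet opens a Go plugin (built with `go build -buildmode=plugin`) and
// looks up an exported function. The symbol name "Greet" and its signature are
// hypothetical placeholders for whatever the plugin actually exports.
func loadGreet(path string) (func(string) string, error) {
	p, err := plugin.Open(path) // the plugin joins the host's running runtime
	if err != nil {
		return nil, err
	}
	sym, err := p.Lookup("Greet") // exported functions come back as func values
	if err != nil {
		return nil, err
	}
	greet, ok := sym.(func(string) string)
	if !ok {
		return nil, fmt.Errorf("symbol Greet has unexpected type %T", sym)
	}
	return greet, nil
}

func main() {
	greet, err := loadGreet("./greeter.so") // hypothetical path
	if err != nil {
		fmt.Println("plugin load failed:", err)
		return
	}
	fmt.Println(greet("world"))
}
```

The matching plugin would be a `package main` exporting `func Greet(name string) string`, built with `go build -buildmode=plugin -o greeter.so`. Because the plugin shares the host's runtime, `plugin.Open` refuses plugins built with a different toolchain or different versions of shared packages, which is the kind of deduplication a multi-library c-shared mode would also have to handle.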

9 participants