cmd/cgo: TestCrossPackageTests fails on musl (Alpine Linux Edge) #39857

Open
nmeum opened this issue Jun 25, 2020 · 12 comments

@nmeum nmeum commented Jun 25, 2020

This is a follow-up to #39343, where I already briefly mentioned this problem. The issue is probably related to musl libc: I can reliably reproduce it on Alpine Linux, which uses musl.

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/soeren/.cache/go-build"
GOENV="/home/soeren/.config/go/env"
GOEXE=""
GOFLAGS="-buildmode=pie"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/soeren/src/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build802004994=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran TestCrossPackageTests from misc/cgo/test/pkg_test.go:

misc/cgo/test$ go test -run TestCrossPackageTests

What did you expect to see?

A successful test run.

What did you see instead?

An error message:

--- FAIL: TestCrossPackageTests (1.95s)
    pkg_test.go:67: go test: exit status 1
        --- FAIL: Test9400 (0.00s)
            issue9400_linux.go:55: entry 804 of test pattern is wrong; 0x7fc60b3b0cf4 != 0x123456789abcdef
        FAIL
        exit status 1
        FAIL	cgotest	0.005s
FAIL
exit status 1
FAIL	misc/cgo/test	1.958s

@cagedmantis cagedmantis changed the title TestCrossPackageTests fails on musl (Alpine Linux Edge) cmd/go: TestCrossPackageTests fails on musl (Alpine Linux Edge) Jun 29, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Jun 29, 2020
@jayconrod jayconrod changed the title cmd/go: TestCrossPackageTests fails on musl (Alpine Linux Edge) cmd/cgo: TestCrossPackageTests fails on musl (Alpine Linux Edge) Jul 24, 2020

@oflebbe oflebbe commented Aug 8, 2020

I didn't run any of the tests, but read a bunch of comments, since this seems an interesting problem...

As far as I understand, Test9400 checks whether the handling of the setxid class of system calls (i.e. setuid, setgid, ...) may smash the Go stack, since setxid() is implemented by sending signals to all threads to fulfill POSIX requirements (see the Linux man page for setgid).

To prevent signals from overrunning a stack, one usually creates an alternate signal stack and passes the SA_ONSTACK flag when installing signal handlers, so that the handlers run on the alternate stack. This way the original stack cannot be overrun.
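A minimal sketch of that setup in C, assuming Linux. The names `handler`, `alt_base` and `handler_runs_on_alt_stack` are hypothetical and not taken from Go, glibc or musl. The handler checks whether one of its own locals lies inside the alternate stack:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
static char *alt_base;                    /* start of the alternate stack */
static size_t alt_size;                   /* its size in bytes */
static volatile sig_atomic_t ran_on_alt;  /* result recorded by the handler */

static void handler(int sig) {
    (void)sig;
    char marker; /* lives in the handler's own stack frame */
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_base;
    ran_on_alt = (p >= lo && p < lo + alt_size);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if it did not,
 * and -1 if the setup failed. */
int handler_runs_on_alt_stack(void) {
    alt_size = 64 * 1024; /* assumed to be comfortably large for this handler */
    alt_base = malloc(alt_size);
    if (alt_base == NULL) return -1;

    /* Register the alternate signal stack for the calling thread. */
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) { free(alt_base); return -1; }

    /* Install the handler and ask for delivery on the alternate stack. */
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    ran_on_alt = 0;
    raise(SIGUSR1); /* returns only after the handler has run */
    int result = ran_on_alt;

    /* Clean up: default disposition, disable the alternate stack, free it. */
    signal(SIGUSR1, SIG_DFL);
    memset(&ss, 0, sizeof ss);
    ss.ss_flags = SS_DISABLE;
    sigaltstack(&ss, NULL);
    free(alt_base);
    alt_base = NULL;
    return result;
}
```

Calling `handler_runs_on_alt_stack()` from `main` should report that the handler's frame was inside the alternate stack. Dropping `SA_ONSTACK` from `sa_flags` should make the same handler run on the thread's normal stack instead.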

glibc doesn't set SA_ONSTACK, but it installs a signal handler for SIGSETXID at startup in nptl-init.c. The fix for #9400 can therefore enumerate all signal handlers and add the missing SA_ONSTACK flag, which fixes the issue on glibc.
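The enumeration idea can be sketched roughly as follows. This is an illustration with hypothetical function names, not the code Go uses. It goes through the public `sigaction()` interface, which rejects libc-reserved signal numbers such as the one behind SIGSETXID, so reaching those would require raw system calls, which this sketch deliberately leaves out:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Adds SA_ONSTACK to every handler that is already installed and visible
 * through the public sigaction() interface. Returns the number of handlers
 * updated. Hypothetical helper, for illustration only. */
int add_onstack_to_installed_handlers(void) {
    int updated = 0;
    for (int sig = 1; sig < NSIG; sig++) {
        struct sigaction sa;
        if (sigaction(sig, NULL, &sa) != 0)
            continue; /* invalid or libc-reserved signal number */
        if (sa.sa_handler == SIG_DFL || sa.sa_handler == SIG_IGN)
            continue; /* no handler installed */
        if (sa.sa_flags & SA_ONSTACK)
            continue; /* already runs on the alternate stack */
        sa.sa_flags |= SA_ONSTACK;
        if (sigaction(sig, &sa, NULL) == 0)
            updated++;
    }
    return updated;
}

static void noop(int sig) { (void)sig; }

/* Demo: install a handler without SA_ONSTACK, run the enumeration, and
 * report whether the flag is now set (1), not set (0), or an error (-1). */
int demo_flag_added(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;
    if (add_onstack_to_installed_handlers() < 1) return -1;
    if (sigaction(SIGUSR2, NULL, &sa) != 0) return -1;
    return (sa.sa_flags & SA_ONSTACK) != 0;
}
```

An approach like this only helps for handlers that exist when the loop runs. A handler that a libc installs later, on demand, is not covered.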

musl doesn't use an alternate stack or SA_ONSTACK for its internal signal-based implementation of setxid either. This was actually confirmed by @richfelker in a somewhat related issue, #19938 (comment). Unfortunately the fix for #9400 doesn't apply to musl, since musl installs the signal handler dynamically when setxid is called, via the __synccall() function in src/thread/synccall.c.

I would vote for adding SA_ONSTACK to musl's __synccall implementation.

@richfelker richfelker commented Aug 8, 2020

There's a thread from 2019 on this topic, "sigaltstack for implementation-internal signals?", that never reached a conclusion. Basically I'm unclear whether it's arguably conforming for the implementation to use the alternate signal stack for implementation-internal signals, since it may have observable side effects on the application in the absence of any signals/signal handlers set up to run on the alternate stack.

Contributor

@ianlancetaylor ianlancetaylor commented Aug 10, 2020

If musl doesn't use SA_ONSTACK for the signal handler that it installs, that won't work with any program that uses sigaltstack.

I don't see any way to fix this in Go. I don't see what change we could make that would make things work better.

@richfelker richfelker commented Aug 10, 2020

@ianlancetaylor I don't follow how it "doesn't work with any program that uses sigaltstack". It just doesn't work with any program that has a severely undersized stack, which is a known constraint. But it would be nice to be able to use the alt stack when it's available, assuming it's more likely to always have sufficient space for the signal handler to run.

@richfelker richfelker commented Aug 10, 2020

Note that I've reopened the topic on the musl list: https://www.openwall.com/lists/musl/2020/08/09/1

@oflebbe oflebbe commented Aug 10, 2020

Hi @richfelker, I tested it: applying a patch like this one to musl is sufficient to resolve the issue:
PATCH.txt
It fixes Go and does not harm musl.

If there is an alternate stack, it will use it. Go does create an alternate stack.
If there is no alternate stack, the kernel will ignore SA_ONSTACK. That's it.
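Both cases can be observed with a small experiment, sketched here under the assumption of a Linux system and with hypothetical function names. The handler is always installed with SA_ONSTACK and asks the kernel, via `sigaltstack(NULL, ...)`, whether it is currently running on the alternate stack:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t saw_onstack; /* -1 until the handler has run */

static void probe(int sig) {
    (void)sig;
    stack_t cur;
    /* sigaltstack() is not on POSIX's async-signal-safe list. It is used
     * here only because the signal is raised synchronously with raise()
     * and, on Linux, the call is a thin system call wrapper. */
    if (sigaltstack(NULL, &cur) == 0)
        saw_onstack = (cur.ss_flags & SS_ONSTACK) != 0;
}

/* Installs probe() for SIGUSR1 with SA_ONSTACK. Returns 0 on success. */
static int install_probe(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* No alternate stack registered: the handler should still run, on the
 * normal stack. Returns what the handler observed, or -1 on error. */
int probe_without_alt_stack(void) {
    stack_t off;
    memset(&off, 0, sizeof off);
    off.ss_flags = SS_DISABLE;
    if (sigaltstack(&off, NULL) != 0) return -1;
    if (install_probe() != 0) return -1;
    saw_onstack = -1;
    raise(SIGUSR1);
    return saw_onstack;
}

/* Alternate stack registered: the same handler should run on it. */
int probe_with_alt_stack(void) {
    size_t size = 64 * 1024; /* assumed large enough for this handler */
    void *base = malloc(size);
    if (base == NULL) return -1;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = base;
    ss.ss_size = size;
    if (sigaltstack(&ss, NULL) != 0 || install_probe() != 0) {
        free(base);
        return -1;
    }
    saw_onstack = -1;
    raise(SIGUSR1);
    int result = saw_onstack;

    /* Disable the alternate stack again before freeing its memory. */
    memset(&ss, 0, sizeof ss);
    ss.ss_flags = SS_DISABLE;
    sigaltstack(&ss, NULL);
    free(base);
    return result;
}
```

On Linux, `probe_without_alt_stack()` is expected to return 0 and `probe_with_alt_stack()` to return 1, which matches the two statements above.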

Contributor

@ianlancetaylor ianlancetaylor commented Aug 10, 2020

@richfelker I'm assuming that a program that calls sigaltstack does so for some good reason. I'm not sure why a program that calls sigaltstack would want to receive signals on the normal stack.

@richfelker richfelker commented Aug 10, 2020

Nobody said anything about wanting to receive signals on the normal stack. From the relevant perspective these aren't signals. They are asynchronous use of the alt signal stack by the implementation in a way the application isn't and can't be aware of.

Contributor

@ianlancetaylor ianlancetaylor commented Aug 10, 2020

That is a valid perspective.

But it is also a valid perspective for a program to say "I am in control of my stack, and do not use my stack for any unexpected purpose. In particular, don't use it to catch signals."

In any case I'm not sure there is anything we can do here in the Go standard library. If musl decides not to change, then as far as I can see code like this can't work on musl. So perhaps we should close this issue.

@richfelker richfelker commented Aug 11, 2020

@ianlancetaylor: I have been wanting to change this for a while (see the 2019 thread), but I'm making sure we actually consider the consequences of such a change and whether they break anything that someone can reasonably expect to work. (My leaning is that they don't, but I like to explore this kind of thing thoroughly since making hasty decisions has bitten us in the past.) The point of my bringing these things up is not to argue against the change, but to make sure it's well-supported when (technically if, but most likely when) it's made.

Contributor

@ianlancetaylor ianlancetaylor commented Aug 11, 2020

Understood.

(I suppose musl could also change to act as glibc does. Is there an advantage to only installing the signal handler when a relevant libc function is called?)

@richfelker richfelker commented Aug 11, 2020

Yes, it avoids syscall spam (strace) and wasted time in processes (the vast, vast majority) that don't need the handler. And I don't see how the glibc behavior makes it any easier unless you're poking at implementation internals which are not a stable interface. The signal numbers used for these internal signals are not a public interface, and they're not even pokable via public interfaces (as far as the public interfaces are concerned, the reserved signal numbers simply are not existent signals). The only way you can poke at them is via directly making syscalls, and this will break if signal handling is ever wrapped (which has been considered at times, but it turned out we could always get by without it).
