Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: musl uses signal 34 for setgid #39343

Open
nmeum opened this issue Jun 1, 2020 · 12 comments
Open

runtime: musl uses signal 34 for setgid #39343

nmeum opened this issue Jun 1, 2020 · 12 comments
Milestone

Comments

@nmeum
Copy link

@nmeum nmeum commented Jun 1, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/soeren/.cache/go-build"
GOENV="/home/soeren/.config/go/env"
GOEXE=""
GOFLAGS="-buildmode=pie"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/soeren/src/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build010166581=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Started the go test suite using:

./run.bash -no-rebuild -run='cgo_test'

What did you expect to see?

A successful test run.

What did you see instead?

A failed test run:

##### ../misc/cgo/test
SIGQUIT: quit
PC=0x46e261 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x7409e8, 0x80, 0x0, 0x0, 0x7fff00000000, 0x7fffd799a608, 0x440ef3, 0xc000034848, 0x7fffd799a628, 0x41249f, ...)
	/home/soeren/src/aports/community/go/src/go/src/runtime/sys_linux_amd64.s:567 +0x21
runtime.futexsleep(0x7409e8, 0x7fff00000000, 0xffffffffffffffff)
	/home/soeren/src/aports/community/go/src/go/src/runtime/os_linux.go:45 +0x46
runtime.notesleep(0x7409e8)
	/home/soeren/src/aports/community/go/src/go/src/runtime/lock_futex.go:151 +0x9f
runtime.stoplockedm()
	/home/soeren/src/aports/community/go/src/go/src/runtime/proc.go:1971 +0x88
runtime.schedule()
	/home/soeren/src/aports/community/go/src/go/src/runtime/proc.go:2454 +0x4a6
runtime.park_m(0xc000082600)
	/home/soeren/src/aports/community/go/src/go/src/runtime/proc.go:2690 +0x9d
runtime.mcall(0x5)
	/home/soeren/src/aports/community/go/src/go/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0000a4120, 0x5cd2ae, 0xa, 0x5d8420, 0x486c26)
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:1043 +0x37e
testing.runTests.func1(0xc0000a4000)
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:1284 +0x78
testing.tRunner(0xc0000a4000, 0xc000043e10)
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:991 +0xdc
testing.runTests(0xc0000a0000, 0x726e20, 0x59, 0x59, 0x0)
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:1282 +0x2a7
testing.(*M).Run(0xc00009a000, 0x0)
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:1199 +0x15f
main.main()
	_testmain.go:226 +0x135

goroutine 18 [select]:
misc/cgo/test.runTestSetgid(0xc0000a8180)
	/home/soeren/src/aports/community/go/src/go/misc/cgo/test/setgid_linux.go:30 +0xeb
misc/cgo/test.testSetgid(0xc0000a4120)
	/home/soeren/src/aports/community/go/src/go/misc/cgo/test/setgid_linux.go:46 +0x98
misc/cgo/test.TestSetgid(0xc0000a4120)
	/home/soeren/src/aports/community/go/src/go/misc/cgo/test/cgo_linux_test.go:16 +0x2b
testing.tRunner(0xc0000a4120, 0x5d8420)
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:991 +0xdc
created by testing.(*T).Run
	/home/soeren/src/aports/community/go/src/go/src/testing/testing.go:1042 +0x357

goroutine 20 [runnable]:
os/signal.loop()
	/home/soeren/src/aports/community/go/src/go/src/os/signal/signal_unix.go:21
created by os/signal.Notify.func1
	/home/soeren/src/aports/community/go/src/go/src/os/signal/signal.go:127 +0x44

goroutine 22 [syscall]:
misc/cgo/test._Cfunc_setgid(0x0, 0x0)
	_cgo_gotypes.go:2114 +0x49
misc/cgo/test.runTestSetgid.func1(0xc0000a2360)
	/home/soeren/src/aports/community/go/src/go/misc/cgo/test/setgid_linux.go:27 +0x29
created by misc/cgo/test.runTestSetgid
	/home/soeren/src/aports/community/go/src/go/misc/cgo/test/setgid_linux.go:26 +0x6a

rax    0xca
rbx    0x7408a0
rcx    0x46f273
rdx    0x0
rdi    0x7409e8
rsi    0x80
rbp    0x7fffd799a5f0
rsp    0x7fffd799a5a8
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0xc0000a2300
r13    0xc0000a2300
r14    0xc0000a22a0
r15    0x0
rip    0x46f271
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (11m0s).
exit status 2
FAIL	misc/cgo/test	660.005s
2020/06/01 16:22:03 Failed: exit status 1
PASS
ok  	misc/cgo/testtls	0.009s
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
skipped due to earlier error
go tool dist: FAILED

Not sure if this is just a timeout problem on my hardware (i.e. if the timeout needs to be increased) or if this is some kind of deadlock. Any ideas how I could debug this further? I am unfourtunatly not farmilar with go internals but currently trying to make the entire go testsuite pass with our Alpine Linux package and this is the only remaining test which keeps failing.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 1, 2020

According to the stack traces, it looks like the test calls the C function setgid and setgid never returns. Why would that happen?

Separately, it looks like select statement that has a 5 second timeout never times out in 11 minutes. That doesn't make any sense at all.

You can put this in a separate program by creating two files in an empty directory:

x.go

package cgotest

/*
#include <sys/types.h>
#include <unistd.h>
*/
import "C"

import (
	"os"
	"os/signal"
	"syscall"
	"testing"
	"time"
)

func runTestSetgid() bool {
	c := make(chan bool)
	go func() {
		C.setgid(0)
		c <- true
	}()
	select {
	case <-c:
		return true
	case <-time.After(5 * time.Second):
		return false
	}

}

func testSetgid(t *testing.T) {
	if !runTestSetgid() {
		t.Error("setgid hung")
	}

	// Now try it again after using signal.Notify.
	signal.Notify(make(chan os.Signal, 1), syscall.SIGINT)
	if !runTestSetgid() {
		t.Error("setgid hung after signal.Notify")
	}
}

x_test.go:

package cgotest

import "testing"

func TestSetgid(t *testing.T) {
	testSetgid(t)
}

What happens when you run go test in that directory?

@nmeum
Copy link
Author

@nmeum nmeum commented Jun 1, 2020

What happens when you run go test in that directory?

Doesn't return, outputs nothing at all.

Output with SIGQUIT:

SIGQUIT: quit
PC=0x558020e66791 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x558021046dc8, 0x80, 0x0, 0x0, 0x7ffe00000000, 0x7ffe41a0ab18, 0x558020e38655, 0xc000032bc8, 0x7ffe41a0ab38, 0x558020e0a273, ...)
	/usr/lib/go/src/runtime/sys_linux_amd64.s:567 +0x21
runtime.futexsleep(0x558021046dc8, 0x7ffe00000000, 0xffffffffffffffff)
	/usr/lib/go/src/runtime/os_linux.go:45 +0x46
runtime.notesleep(0x558021046dc8)
	/usr/lib/go/src/runtime/lock_futex.go:151 +0xa3
runtime.stoplockedm()
	/usr/lib/go/src/runtime/proc.go:1971 +0x8c
runtime.schedule()
	/usr/lib/go/src/runtime/proc.go:2454 +0x4aa
runtime.park_m(0xc000001c80)
	/usr/lib/go/src/runtime/proc.go:2690 +0xa3
runtime.mcall(0x2)
	/usr/lib/go/src/runtime/asm_amd64.s:318 +0x53

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000110120, 0x558020f010f5, 0xa, 0x558020f70ad8, 0x558020e789a8)
	/usr/lib/go/src/testing/testing.go:1043 +0x380
testing.runTests.func1(0xc000110000)
	/usr/lib/go/src/testing/testing.go:1284 +0x7a
testing.tRunner(0xc000110000, 0xc000041e10)
	/usr/lib/go/src/testing/testing.go:991 +0xde
testing.runTests(0xc00000e0c0, 0x55802102caa0, 0x1, 0x1, 0x0)
	/usr/lib/go/src/testing/testing.go:1282 +0x2a9
testing.(*M).Run(0xc00010e000, 0x0)
	/usr/lib/go/src/testing/testing.go:1199 +0x161
main.main()
	_testmain.go:44 +0x137

goroutine 6 [select]:
_/tmp/test.runTestSetgid(0xc000058300)
	/tmp/test/x.go:23 +0xed
_/tmp/test.testSetgid(0xc000110120)
	/tmp/test/x.go:39 +0x9a
_/tmp/test.TestSetgid(0xc000110120)
	/tmp/test/x_test.go:6 +0x2d
testing.tRunner(0xc000110120, 0x558020f70ad8)
	/usr/lib/go/src/testing/testing.go:991 +0xde
created by testing.(*T).Run
	/usr/lib/go/src/testing/testing.go:1042 +0x359

goroutine 8 [runnable]:
os/signal.loop()
	/usr/lib/go/src/os/signal/signal_unix.go:21
created by os/signal.Notify.func1
	/usr/lib/go/src/os/signal/signal.go:127 +0x46

goroutine 10 [syscall]:
_/tmp/test._Cfunc_setgid(0x0, 0x0)
	_cgo_gotypes.go:47 +0x4b
_/tmp/test.runTestSetgid.func1(0xc000064420)
	/tmp/test/x.go:20 +0x2b
created by _/tmp/test.runTestSetgid
	/tmp/test/x.go:19 +0x6c

rax    0xca
rbx    0x558021046c80
rcx    0x558020e66793
rdx    0x0
rdi    0x558021046dc8
rsi    0x80
rbp    0x7ffe41a0ab00
rsp    0x7ffe41a0aab8
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0xc0000643c0
r13    0xc0000643c0
r14    0xc000064360
r15    0x39
rip    0x558020e66791
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
exit status 2
FAIL	_/tmp/test	34.767s

Not sure if helpful but here is a strace -f output: strace-go-test.txt

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 1, 2020

Thanks for the strace output. What would be really helpful is to run

go test -c
strace -f ./x.test

Right now the strace output is mixing in all the go test system calls. Thanks.

@nmeum
Copy link
Author

@nmeum nmeum commented Jun 2, 2020

While doing that I noticed that this is definitely flaky sometimes the program terminates and sometimes (most of the times) it doesn't here is the strace -f ./test.test output of both runs:

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 2, 2020

Thanks. Looks like the C library is using signal 34 to implement setgid, perhaps in the way that glibc uses signal 33. That is fairly annoying.

I see it here in the musl sources: https://git.musl-libc.org/cgit/musl/tree/src/internal/pthread_impl.h#n117

The fix will be to change runtime/sigtab_linux_generic.go and runtime/sigtab_linux_mipsx.go to handle signal 34 the way we handle signals 32 and 33.

That will have a chance of breaking some existing programs, so marking this for 1.16.

@ianlancetaylor ianlancetaylor added this to the Go1.16 milestone Jun 2, 2020
@ianlancetaylor ianlancetaylor changed the title cgo_test fails on Alpine Linux Edge runtime: musl uses signal 34 for setgid Jun 2, 2020
algitbot pushed a commit to alpinelinux/aports that referenced this issue Jun 3, 2020
@GeorgeTsilias
Copy link

@GeorgeTsilias GeorgeTsilias commented Jun 3, 2020

Hello,
I'm looking to start contributing in Go and after your investigation this seems like a straightforward issue to start with.
Is it okay with you if I try to open a PR and get it fixed?
I'll get a standalone reproduction in an Alpine container, ensure that the fix works as intended and see if I can add any tests.
Thanks in advance!

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 4, 2020

@GeorgeTsilias Sure, go for it. Note that at this stage in the release process any change will not appear until the 1.16 release. Thanks.

@gopherbot
Copy link

@gopherbot gopherbot commented Jun 4, 2020

Change https://golang.org/cl/236518 mentions this issue: runtime: handle signal 34 for musl setgid

@fabled
Copy link

@fabled fabled commented Jun 20, 2020

This was reported earlier in musl list, and their comment is at https://www.openwall.com/lists/musl/2018/10/09/3. The suggestion there was to use SIGRTMIN which resolves to a function call to check what is the range for libc reserved signals.

@fabled
Copy link

@fabled fabled commented Jun 20, 2020

With the suggested patch, I don't see the hangs anymore. And most of the time things work. But on my application I also see sometimes gopanic with backtrace:

go version go1.14.3 linux/amd64

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x55813a8 pc=0x465b65]

goroutine 4 [running]:
runtime.throw(0xc43ebe, 0x2a)
	/usr/lib/go/src/runtime/panic.go:1116 +0x72 fp=0xc00005cdf0 sp=0xc00005cdc0 pc=0x4343d2
runtime.sigpanic()
	/usr/lib/go/src/runtime/signal_unix.go:679 +0x46a fp=0xc00005ce20 sp=0xc00005cdf0 pc=0x44a47a
runtime.memmove(0xc0036a8000, 0x0, 0x5581428)
	/usr/lib/go/src/runtime/memmove_amd64.s:392 +0x485 fp=0xc00005ce28 sp=0xc00005ce20 pc=0x465b65
runtime.growslice(0xb04520, 0x0, 0xab0285, 0x0, 0xab0286, 0x406842, 0xc000070058, 0xc000070060)
	/usr/lib/go/src/runtime/slice.go:188 +0x19e fp=0xc00005ce90 sp=0xc00005ce28 pc=0x44b70e
runtime.doaddtimer(0xc000041800, 0xc00007a000)
	/usr/lib/go/src/runtime/time.go:273 +0x131 fp=0xc00005cee8 sp=0xc00005ce90 pc=0x4537d1
runtime.modtimer(0xc00007a000, 0x18def59750, 0x0, 0xc5c640, 0x0, 0x0, 0x0)
	/usr/lib/go/src/runtime/time.go:462 +0x22c fp=0xc00005cf28 sp=0xc00005cee8 pc=0x4540fc
runtime.resettimer(...)
	/usr/lib/go/src/runtime/time.go:514
time.resetTimer(...)
	/usr/lib/go/src/runtime/time.go:223
runtime.scavengeSleep(0xd19458, 0x134b140)
	/usr/lib/go/src/runtime/mgcscavenge.go:210 +0x8b fp=0xc00005cf78 sp=0xc00005cf28 pc=0x421f2b
runtime.bgscavenge(0xc000070000)
	/usr/lib/go/src/runtime/mgcscavenge.go:337 +0x1db fp=0xc00005cfd8 sp=0xc00005cf78 pc=0x42216b
runtime.goexit()
	/usr/lib/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00005cfe0 sp=0xc00005cfd8 pc=0x464531
created by runtime.gcenable
	/usr/lib/go/src/runtime/mgc.go:215 +0x7e
@nmeum
Copy link
Author

@nmeum nmeum commented Jun 20, 2020

I also tested CL 236518 by backporting it to our Alpine go 1.14.3 package. When doing so the cgo_test does not hang anymore but it fails with the following error message now:

--- FAIL: TestCrossPackageTests (3.72s)
    pkg_test.go:67: go test -short: exit status 1
        --- FAIL: Test9400 (0.00s)
            issue9400_linux.go:55: entry 844 of test pattern is wrong; 0x7fd858b79c6d != 0x123456789abcdef
        FAIL
        exit status 1
        FAIL    cgotest 0.008s
FAIL
exit status 1
FAIL    misc/cgo/test   5.779s
2020/06/20 12:18:19 Failed: exit status 1
scatter = 0x4027f0
--- FAIL: TestCrossPackageTests (2.18s)
    pkg_test.go:67: go test -short: exit status 1
        --- FAIL: Test9400 (0.00s)
            issue9400_linux.go:55: entry 844 of test pattern is wrong; 0x7f575db46c6d != 0x123456789abcdef
        FAIL
        exit status 1
        FAIL    cgotest 0.004s
FAIL
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 20, 2020

@nmeum Thanks for the comments. It's best if we can keep issues focused on specific topics, so that we know when they are fixed. For problems with musl that are not related to the use of signal 34 for setgid, please open separate issues. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.