Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: signal handling changed by signal.Notify #36420

Closed
stevenh opened this issue Jan 7, 2020 · 18 comments
Closed

runtime: signal handling changed by signal.Notify #36420

stevenh opened this issue Jan 7, 2020 · 18 comments

Comments

@stevenh
Copy link
Contributor

@stevenh stevenh commented Jan 7, 2020

What version of Go are you using (go version)?

go version go1.13.5 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/steveh/.cache/go-build"
GOENV="/home/steveh/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/steveh/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/steveh/code/go.d.plugin/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build394267353=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Use signal.Notify to handle os.Interrupt

What did you expect to see?

All other signal handling to be unchanged

What did you see instead?

SEGV, BUS and FPE handling from external process kill(2) don't work if signal.Notify(...) is used.

Example

package main

import (
        "fmt"
        "os"
        "os/signal"
        "time"
)

func main() {
        c := make(chan os.Signal, 1)
        signal.Notify(c, os.Interrupt)
        go func() {
                s := <-c
                fmt.Println("Got signal:", s)
                signal.Stop(c)
        }()
        if signal.Ignored(os.Interrupt) {
                fmt.Println("ignored")
        }
        for i := 0; i < 1000; i++ {
                fmt.Println("sleep...")
                time.Sleep(time.Second)
        }
}

This seems related the definition in the runtime of these signals containing _SigPanic as those which are defined as _SigThrow continue to still function correctly; confirmed by changing SEGV from _SigPanic to _SigThrow which then continues to work even once signal.Notify(...) has been called.

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

Seems the without signal.Notify(...) gp.throwsplit is always true which means that _SigPanic is remapped to _SigThrow.

Given this and the docs in os/signal it seems that the behaviour without signal.Notify might actually be the unexpected behaviour even-though from a unix perspective it's unexpected.

Thoughts?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

What exactly do you mean when you say "SEGV, BUS and FPE handling from external process kill(2) don't work"? What precisely are you doing with your example program, and what precisely happens?

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

Without the call signal.Notify all of those signals cause the app to output a stack trace and exit with it called the app continues to run ignoring the external generated signals.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

I do not ever see this program output a stack trace when sent an external signal, even if I remove the call to signal.Notify, or remove the import of the os/signal package entirely. Can you show precisely what you are doing to cause a stack trace? By this I mean show us the exact commands that you run and the exact output that you see. Don't describe what you are showing, show us exactly what you are doing. Thanks.

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

my test app is called go-signal and I simply run the following to send a SIGSEGV

kill -SEGV `ps -ef |grep go-signal |grep -v grep | awk '{print $2}'`

With signal.Notify

./go-signal
sleep...
sleep...
sleep...
sleep...
sleep...
sleep...
sleep...
sleep...
^Cprocess: interrupt
Got signal: interrupt
^C

Without signal.Notify

./go-signal
sleep...
sleep...
sleep...
sleep...
SIGSEGV: segmentation violation
PC=0x4578a3 m=0 sigcode=0

goroutine 5 [syscall]:
runtime.notetsleepg(0x567820, 0x3b9ab614, 0x1415)
        /usr/local/go/src/runtime/lock_futex.go:227 +0x34 fp=0xc000044760 sp=0xc000044730 pc=0x409d64
runtime.timerproc(0x567800)
        /usr/local/go/src/runtime/time.go:311 +0x2f1 fp=0xc0000447d8 sp=0xc000044760 pc=0x446d11
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc0000447e0 sp=0xc0000447d8 pc=0x455991
created by runtime.(*timersBucket).addtimerLocked
        /usr/local/go/src/runtime/time.go:169 +0x10e

goroutine 1 [sleep]:
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:310
time.Sleep(0x3b9aca00)
        /usr/local/go/src/runtime/time.go:105 +0x157
main.main()
        /home/steveh/code/go-signal/main.go:32 +0xb6

goroutine 18 [syscall]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

rax    0xfffffffffffffffc
rbx    0x3b9ab614
rcx    0xc0000446d8
rdx    0x0
rdi    0x567820
rsi    0x80
rbp    0xc0000446e8
rsp    0xc0000446a0
r8     0x0
r9     0x0
r10    0xc0000446d8
r11    0xc0000446d8
r12    0xff
r13    0xffffffffffffff
r14    0x4dc522
r15    0x38
rip    0x4578a3
rflags 0x206
cs     0x33
fs     0x53
gs     0x2b

To clarify in both cases package os/signal is loaded in the second there's just no call to signal.Notify

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

For reference my test is running under WSL

uname -a
Linux vader 4.4.0-18362-Microsoft #476-Microsoft Fri Nov 01 16:53:00 PST 2019 x86_64 x86_64 x86_64 GNU/Linux

However also had my colleague tested on native Ubuntu and he sees the same too.

@danmrichards
Copy link

@danmrichards danmrichards commented Jan 7, 2020

I am also seeing the same behaviour as mentioned by @stevenh

For reference I am running Go version go version go1.13.5 linux/amd64 on Linux (Linux unity-XPS-15-9570 4.18.20-041820-generic #201812030624 SMP Mon Dec 3 11:25:55 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux)

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

Interesting. When I try the same thing, the program simply continues running.

Ah, I see. It works the way you describe for Go 1.13, but not for Go tip.

The difference is that in Go 1.13 the signal is being received by a G for which throwsplit is set. On tip it is not. When throwsplit is set SigPanic is changed to SigThrow.

When the program does not call signal.Notify, then sending an external SIGSEGV crashes the program with Go 1.11, Go 1.12, and Go 1.13. With earlier versions of Go, and with tip, the externally generated SIGSEGV is ignored.

Oddly, I think the change in tip is due to the new timers code. With the old timers code the only real thread in this program would sleep in notetsleepg, which calls entersyscallblock, which sets throwsplit. With the new timers code the only real thread sleeps in netpoll which does not call entersyscallblock. So on tip throwsplit is not set and the signal does not crash the program.

When the program does call signal.Notify, then we start a separate thread just to receive signals. That thread does not have throwsplit set, so the signal does not crash the program.

I think the question now is: what do we want to happen? Clearly whether the program calls signal.Notify for a signal other than SIGSEGV should not affect the behavior for an externally generated SIGSEGV. For Go versions through 1.10, we ignored externally generated SIGSEGV signals. For 1.11 through 1.13, for certain kinds of programs, we crashed on an externally generated SIGSEGV. On tip, we do not crash. But for Go 1.11 and later, including tip, whether we crash or not is actually slightly random, since a signal might randomly be delivered to a G with throwsplit set.

So we need to fix that. Whether throwsplit is set must not change the behavior for an externally generated SIGSEGV.

But still: what do we want to happen? Historically we have ignored externally generated SIGSEGV, SIGBUS, and SIGFPE signals. Should we continue to ignore them, or should we crash with a stack dump, or should we crash without a stack dump? We haven't documented any behavior for this, and I don't see any principle to follow. For example, currently for an externally generated SIGSYS we crash with a stack dump, for an externally generated SIGTERM we crash without a stack dump, for an externally generated SIGALRM we ignore the signal. I don't know which of those categories is right for an externally generated SIGSEGV.

@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Jan 7, 2020
@ianlancetaylor ianlancetaylor changed the title runtime: signal handling broken by signal.Notify runtime: signal handling changed by signal.Notify Jan 7, 2020
@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

In our case we're looking for feature parity apps built in other languages e.g. C & C++
SIGTERM:

sleep...
Terminated

SEGV:

sleep...
Segmentation fault (core dumped)

SIGBUS:

sleep...
Bus error (core dumped)

SIGFPE:

sleep...
Floating point exception (core dumped)

SIGALRM:

sleep...
Alarm clock

SIGSYS:

sleep...
Bad system call (core dumped)

In summary using the OS native behaviour if we don't handle it.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

Go has already deviated from OS native behavior for signals like SIGALRM. That is already documented at https://golang.org/pkg/os/signal, and I don't think we can reasonably change it now. Also, even if if OS native behavior tells us whether the crash the program or not, it doesn't tell us whether to print a stack dump or not. So while OS native behavior is a guide I don't think it determines the answer.

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

Agreed, using it as a guide is the most appropriate, as I'm not aware of other languages which have a standard convention to print a stack dump.

Another reason is stdout / err generally aren't monitored so the standard way to gain information about a "crash" is the generation of a core dump; there are of course workflows which are but cores always have the ability to work.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jan 7, 2020

I remember the old behavior, and didn't realize the behavior has changed (probably accidentally).

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 7, 2020

I would argue that an external SIGSEGV should crash the program.

(In fact, I think this may be more-or-less a duplicate of #19465.)

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

I came across that when investigating this issue, and agree it's very similar.

The notable difference is maintaining compatibility with previously installed C handlers, which is arguably a super-set of the functionality required for an external SIGSEGV etc to cause the program to crash.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

One way to approach this is to ask what the use case is. There are fairly well-defined use cases for several externally generated signals such as SIGHUP, SIGINT, SIGTERM, SIGQUIT, and the Go behavior tries to take those use cases into account.

What is the use case for an externally generated SIGSEGV, SIGBUS, SIGFPE signal?

@stevenh
Copy link
Contributor Author

@stevenh stevenh commented Jan 7, 2020

For SIGSEGV our use case is to force a core dump.

This is also the common definition for SIGQUIT but this is often used by apps more like a SIGTERM and proves unreliable.

For SIGBUS SIGFPE I've not come across an kill(2) based use, I just mentioned them here as they where both effected by the same change in behaviour under signal.Notify(..) on go 1.13.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

Note that Go programs don't dump core by default on receipt of any signal, SIGSEGV or otherwise. To get a Go program to dump core you need to set GOTRACEBACK=crash in the environment, or call runtime/debug.SetTraceback("crash"). If you do that then the program will dump core on receipt of a number of signals such as SIGQUIT, SIGILL, SIGABRT, etc.

kacf added a commit to kacf/mender that referenced this issue Feb 4, 2020
This was done to simplify the handler, now every interaction with the
handler is a request/response, so deadlocks are easy to analyze.

There is still one outstanding issue after this commit, but it appears
to be a bug in Go itself: SIGINT is not always delivered to sigChan.
It does not matter if you put the Notify call and the select on two
consecutive lines, it's random whether a ^C is delivered or not. Most
often it's not. I'm not able to reproduce it in a simple program, but
I cannot see how it could possibly be our fault if it doesn't even
work with Notify and select on consecutive lines. Although it's not
exactly the same issue, it may be related to using signals and timers
together, which appears to have been through some changes lately
[1]. This was tested with Go 1.13.3 and 1.13.7. The handler will
eventually recover by expiring the timer, so it's not a huge issue,
and I will leave this alone for now.

[1] golang/go#36420

Changelog: None

Signed-off-by: Kristian Amlie <kristian.amlie@northern.tech>
kacf added a commit to kacf/mender that referenced this issue Feb 4, 2020
This was done to simplify the handler, now every interaction with the
handler is a request/response, so deadlocks are easy to analyze.

There is still one outstanding issue after this commit, but it appears
to be a bug in Go itself: SIGINT is not always delivered to sigChan.
It does not matter if you put the Notify call and the select on two
consecutive lines, it's random whether a ^C is delivered or not. Most
often it's not. I'm not able to reproduce it in a simple program, but
I cannot see how it could possibly be our fault if it doesn't even
work with Notify and select on consecutive lines. Although it's not
exactly the same issue, it may be related to using signals and timers
together, which appears to have been through some changes lately
[1]. This was tested with Go 1.13.3 and 1.13.7. The handler will
eventually recover by expiring the timer, so it's not a huge issue,
and I will leave this alone for now.

[1] golang/go#36420

Changelog: None

Signed-off-by: Kristian Amlie <kristian.amlie@northern.tech>
kacf added a commit to kacf/mender that referenced this issue Feb 4, 2020
This was done to simplify the handler, now every interaction with the
handler is a request/response, so deadlocks are easy to analyze.

There is still one outstanding issue after this commit, but it appears
to be a bug in Go itself: SIGINT is not always delivered to sigChan.
It does not matter if you put the Notify call and the select on two
consecutive lines, it's random whether a ^C is delivered or not. Most
often it's not. I'm not able to reproduce it in a simple program, but
I cannot see how it could possibly be our fault if it doesn't even
work with Notify and select on consecutive lines. Although it's not
exactly the same issue, it may be related to using signals and timers
together, which appears to have been through some changes lately
[1]. This was tested with Go 1.13.3 and 1.13.7. The handler will
eventually recover by expiring the timer, so it's not a huge issue,
and I will leave this alone for now.

[1] golang/go#36420

Changelog: None

Signed-off-by: Kristian Amlie <kristian.amlie@northern.tech>
kacf added a commit to kacf/mender that referenced this issue Feb 4, 2020
This was done to simplify the handler, now every interaction with the
handler is a request/response, so deadlocks are easy to analyze.

There is still one outstanding issue after this commit, but it appears
to be a bug in Go itself: SIGINT is not always delivered to sigChan.
It does not matter if you put the Notify call and the select on two
consecutive lines, it's random whether a ^C is delivered or not. Most
often it's not. I'm not able to reproduce it in a simple program, but
I cannot see how it could possibly be our fault if it doesn't even
work with Notify and select on consecutive lines. Although it's not
exactly the same issue, it may be related to using signals and timers
together, which appears to have been through some changes lately
[1]. This was tested with Go 1.13.3 and 1.13.7. The handler will
eventually recover by expiring the timer, so it's not a huge issue,
and I will leave this alone for now.

[1] golang/go#36420

Changelog: None

Signed-off-by: Kristian Amlie <kristian.amlie@northern.tech>
kacf added a commit to kacf/mender that referenced this issue Feb 4, 2020
This was done to simplify the handler, now every interaction with the
handler is a request/response, so deadlocks are easy to analyze.

There is still one outstanding issue after this commit, but it appears
to be a bug in Go itself: SIGINT is not always delivered to sigChan.
It does not matter if you put the Notify call and the select on two
consecutive lines, it's random whether a ^C is delivered or not. Most
often it's not. I'm not able to reproduce it in a simple program, but
I cannot see how it could possibly be our fault if it doesn't even
work with Notify and select on consecutive lines. It only seems to
happen in cases where the writing is blocked due to a frozen
filesystem, so it may be related to the I/O thread that Go uses
somehow. If writing is proceeding as normal, then signal delivery
proceeds as normal as well, so it is possible to cancel an ongoing
operation. And a stalled operation due to a frozen filesystem will
also recover after the timer expires, so I'm not considering this a
big issue. It will just take a bit longer to cancel it.

[1] golang/go#36420

Changelog: None

Signed-off-by: Kristian Amlie <kristian.amlie@northern.tech>
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 19, 2020

Change https://golang.org/cl/228900 mentions this issue: runtime: crash on SI_USER SigPanic signal

@gopherbot gopherbot closed this in e5bd6e1 Apr 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.