Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/signal: notify should filter out runtime-generated SIGURG #37942

Open
minux opened this issue Mar 19, 2020 · 12 comments
Open

os/signal: notify should filter out runtime-generated SIGURG #37942

minux opened this issue Mar 19, 2020 · 12 comments
Milestone

Comments

@minux
Copy link
Member

@minux minux commented Mar 19, 2020

This program, when running under go 1.14, will show a huge number of SIGURG are received even though none should be (no network IO, no explicit kills).

https://play.golang.org/p/x7hFjPZnrg5

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGURG)

	for {
		select {
		case sig := <-ch:
			fmt.Printf("received %v: %s\n", sig, time.Now())
		default:
			_ = new(int) // generate some GC activities
		}
	}
}

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md changes to use SIGURG to preempt goroutines, which is fine, but I think os/signal.Notify should filter out those runtime generated SIGURGs.

The reason I found this is that I have a program that happens to use SIGURG as a custom signal like SIGUSR1, which works fine with previous Go releases. I understand that it should probably use SIGUSR1, but still I think os/signal should hide any signals generated by the runtime as it's irrelevant for the user and an implementation detail.

/cc @aclements thoughts?

@minux minux changed the title os/signal.Notfiy: filter out runtime-generated SIGURG? os/signal.Notify: filter out runtime-generated SIGURG? Mar 19, 2020
@aclements
Copy link
Member

@aclements aclements commented Mar 19, 2020

The reason I found this is that I have a program that happens to use SIGURG as a custom signal like SIGUSR1

Right. It was an explicit decision to pass these through to the program, in case the program is using SIGURG. Unfortunately, the runtime can't tell if a signal is "runtime generated" and thus can't filter it out. Because the kernel combines signals, it's not even well-specified what it would mean for a received signal to be runtime-generated because the kernel could have combined a runtime-generated signal with a signal from another source.

@changkun, I don't really understand your program. First, I don't understand the need for Gosched there. There's no tight loop, and even if there were, I don't see what that Gosched would accomplish. Second, if a program is going to subscribe to all signals, rather than passing specific signals of interest to signal.Notify, then it should be prepared to receive any signal. There are lots of reasons lots of signals get sent. For example, if I ran that program in a terminal emulator and started resizing the window, it would begin streaming warnings about unexpected SIGWINCHes.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 19, 2020

@minux I don't see how it is possible even in principle for os/signal to filter out signals generated from the runtime package. Do you have any suggestions as to how that could be done?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Mar 20, 2020

signal.Notify is to capture process-targeted signal, right? The preemption signals are thread-targeted. Maybe we can filter signals of which sigcode is SI_TKILL (at least on Linux)?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Mar 20, 2020

That would not work if the kernel combines signals with different sigcode. Does it?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 20, 2020

As far as I can tell, the Linux 5.3.12 kernel does combine signals with a different si_code value. In __send_signal I see

	if (legacy_queue(pending, sig))
		goto ret;

and

static inline bool legacy_queue(struct sigpending *signals, int sig)
{
	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}

where signals->signal is the set of pending signals.

@cagedmantis cagedmantis changed the title os/signal.Notify: filter out runtime-generated SIGURG? os/signal: notify should filter out runtime-generated SIGURG Mar 20, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Mar 20, 2020
@minux
Copy link
Member Author

@minux minux commented Mar 21, 2020

bfix added a commit to bfix/gnunet-go that referenced this issue Mar 31, 2020
Probably a Go1.14 issue; see golang/go#37942
@mengzhuo
Copy link
Contributor

@mengzhuo mengzhuo commented Apr 3, 2020

I found a lot of SIGURGs will timeout the runtime/regex testes of mips64le.

The origin test of runtime only takes 43s to finish, now it's 600s.

https://farmer.golang.org/temporarylogs?name=linux-mips64le-mengzhuo&rev=dcf0929de6a12103a8fd7097abd6e797188c366d&st=0xc010fd8ea0

rt_sigreturn({mask=[]})                 = -1 ENOENT (No such file or directory)     
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 824642013144                              
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824635811712)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 18446744073709551615                      
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824642012824)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824642012416)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824642002944)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 EDOM (Numerical argument out of domain)
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 0                                         
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 1098664997869)                  
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 18446744073709551615                      
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 27                                        
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 EFBIG (File too large)                 
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 ESRCH (No such process)                
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 EPERM (Operation not permitted)        
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 18446744073709551615                      
sched_yield()                           = 0                                         
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 3, 2020

@mengzhuo Can you open a separate issue for that? It sounds like the program is having a hard time making progress, which seems more like #37741. Thanks.

@aclements
Copy link
Member

@aclements aclements commented Apr 3, 2020

Perhaps we can repurpose this issue to "runtime: only use forced preemption when necessary"?

I've considered delaying the first attempt to use signals until some time after we've tried cooperative preemption and it's failed. Most of the time cooperative preemption responds very quickly. I'm not sure what to set that threshold to, though, since every millisecond of delay there is a millisecond added to STW if there's anything not cooperating.

@networkimprov
Copy link

@networkimprov networkimprov commented Apr 3, 2020

Could you also provide a build option to enable (or never enable) preemptive? Runtime env flags would presumably override that...

@aclements
Copy link
Member

@aclements aclements commented Apr 3, 2020

That option already exists (GODEBUG=asyncpreemptoff=1), but, as it says on the tin, it's meant for debugging.

@networkimprov
Copy link

@networkimprov networkimprov commented Apr 4, 2020

Um, I meant a compiler flag to set the default scheduler, which the env var could override.

mrsdizzie added a commit to mrsdizzie/gitea that referenced this issue Apr 28, 2020
…14 in git env

As seen in trouble shooting go-gitea#11032 the new feature of Go 1.14 is causing several second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG commands during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25    time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases as it happens for each call of the gitea hook command. This is likely the cause of go-gitea#10661 as well and would start to effect users once we release 1.12 which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementatation as there have been a few very similar bugs fixed and reported:

 golang/go#37741
 golang/go#37942

We should revisit this in the future and see if a newer version of Go has solved it, but for now disable this option in the environment that gitea hook runs in to avoid it.
techknowlogick pushed a commit to go-gitea/gitea that referenced this issue Apr 28, 2020
…14 in git env (#11237)

As seen in trouble shooting #11032 the new feature of Go 1.14 is causing several second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG commands during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25    time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases as it happens for each call of the gitea hook command. This is likely the cause of #10661 as well and would start to effect users once we release 1.12 which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementatation as there have been a few very similar bugs fixed and reported:

 golang/go#37741
 golang/go#37942

We should revisit this in the future and see if a newer version of Go has solved it, but for now disable this option in the environment that gitea hook runs in to avoid it.
ydelafollye pushed a commit to ydelafollye/gitea that referenced this issue Jul 31, 2020
…14 in git env (go-gitea#11237)

As seen in trouble shooting go-gitea#11032 the new feature of Go 1.14 is causing several second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG commands during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25    time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases as it happens for each call of the gitea hook command. This is likely the cause of go-gitea#10661 as well and would start to effect users once we release 1.12 which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementatation as there have been a few very similar bugs fixed and reported:

 golang/go#37741
 golang/go#37942

We should revisit this in the future and see if a newer version of Go has solved it, but for now disable this option in the environment that gitea hook runs in to avoid it.
pothos added a commit to kinvolk/coreos-overlay that referenced this issue Jan 12, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See kinvolk/Flatcar#315
pothos added a commit to kinvolk/coreos-overlay that referenced this issue Jan 12, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See kinvolk/Flatcar#315
pothos added a commit to kinvolk/coreos-overlay that referenced this issue Jan 13, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See kinvolk/Flatcar#315
cpuguy83 added a commit to cpuguy83/cli that referenced this issue Jan 13, 2021
In go1.14+, SIGURG is used by the runtime to handle preemtable system
calls.
In practice this signal caught *frequently*.

For reference:

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
golang/go#37942

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
cpuguy83 added a commit to cpuguy83/cli that referenced this issue Jan 13, 2021
In go1.14+, SIGURG is used by the runtime to handle preemtable system
calls.
In practice this signal caught *frequently*.

For reference:

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
golang/go#37942

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
cpuguy83 added a commit to cpuguy83/cli that referenced this issue Jan 13, 2021
In go1.14+, SIGURG is used by the runtime to handle preemtable system
calls.
In practice this signal caught *frequently*.

For reference:

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
golang/go#37942

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
cpuguy83 added a commit to cpuguy83/cli that referenced this issue Jan 14, 2021
In go1.14+, SIGURG is used by the runtime to handle preemtable system
calls.
In practice this signal caught *frequently*.

For reference:

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
golang/go#37942

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
cpuguy83 added a commit to cpuguy83/cli that referenced this issue Jan 15, 2021
In go1.14+, SIGURG is used by the runtime to handle preemtable system
calls.
In practice this signal caught *frequently*.

For reference:

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
golang/go#37942

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Jan 16, 2021
In go1.14+, SIGURG is used by the runtime to handle preemtable system
calls.
In practice this signal caught *frequently*.

For reference:

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
golang/go#37942

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: fff164c22e8dc904291fecb62307312fd4ca153e
Component: cli
pothos added a commit to kinvolk/coreos-overlay that referenced this issue Jan 18, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See kinvolk/Flatcar#315
pothos added a commit to kinvolk/coreos-overlay that referenced this issue Jan 18, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See kinvolk/Flatcar#315
pothos added a commit to kinvolk/coreos-overlay that referenced this issue Jan 18, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See kinvolk/Flatcar#315
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.