Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: starvation with frequent timer resets #38119

Closed
Stebalien opened this issue Mar 27, 2020 · 7 comments
Closed

runtime: starvation with frequent timer resets #38119

Stebalien opened this issue Mar 27, 2020 · 7 comments
Milestone

Comments

@Stebalien
Copy link

@Stebalien Stebalien commented Mar 27, 2020

What version of Go are you using (go version)?

$ go version
go version devel +af7eafd1505 Fri Mar 27 18:54:45 2020 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes. Tested with the latest master and the latest release (go 1.14.1).

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/steb/.cache/go-build"
GOENV="/home/steb/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/steb/.local/share/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/run/user/1000/tmp/go-build828622990=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Frequently reset a 100us timer in 4K goroutines with:

package main

import (
	"fmt"
	"time"
)

func main() {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	exit := make(chan struct{})
	running := 0
	for {
		for i := 0; i < 100; i++ {
			running++
			go func() {
				timer := time.NewTimer(1 * time.Microsecond)
				for {
					for k := 0; k < 100; k++ {
						timer.Reset(100 * time.Microsecond)
					}
					select {
					case <-timer.C:
					case exit <- struct{}{}:
						return
					}
				}
			}()
			if running > 4000 {
				<-ticker.C
				select {
				case <-exit:
					running--
				default:
				}
			}
		}
		fmt.Printf("running: %d\n", running)
	}
}

I'm investigating this bug as part of libp2p/go-libp2p#858. I believe this is triggered by https://github.com/libp2p/go-yamux/pull/4/files#diff-f2fccfb96064097b73fbe8eeae3703eeR457 (I'm testing with the patch reverted at the moment and everything seems fine).

What did you expect to see?

Go prints "running: 4000" (ish) every few seconds at most.

What did you see instead?

  • On go 1.13, it very consistently:
    • Gets to 4000 goroutines immediately.
    • Prints about once a second.
  • On go 1.14.1, it works for a while but quickly slows down to a crawl and eventually stops entirely. The CPU usage drops to about a single core.
    • Disabling the new scheduler doesn't help.
    • I believe the halting is due to #38070.
  • On af7eafd (both with preempt on & off):
    • It gets half way to 4000 goroutines immediately, pauses, then makes it the rest of the way instantly.
    • Once at 4000 goroutines, it prints infrequently and often in bursts. That is, it may print once every second for a few seconds, then stop for a while, then start printing, etc.

Profiles

I've included a sample of the thread states. The first is in go 1.13.8, and the second is from the latest master.

Go 1.13.8

Included as a reference for what this looked like when it worked well.

CPU Profile:
go-1.13.8-cpu.gz

Threads from delve:

Thread 21907 at 0x40a5f3 /usr/lib/go/src/runtime/lock_futex.go:106 runtime.unlock
Thread 21912 at 0x45e62d /usr/lib/go/src/runtime/sys_linux_amd64.s:131 runtime.usleep
Thread 21913 at 0x45ebc3 /usr/lib/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
Thread 21914 at 0x45ece7 /usr/lib/go/src/runtime/sys_linux_amd64.s:625 runtime.osyield
Thread 21915 at 0x45ebc3 /usr/lib/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
Thread 21916 at 0x4a654e /usr/lib/go/src/time/sleep.go:124 time.(*Timer).Reset
Thread 21917 at 0x45ebc3 /usr/lib/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
Thread 21918 at 0x45ebc3 /usr/lib/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
Thread 21920 at 0x45ebc3 /usr/lib/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
Thread 21907
0  0x000000000040a5f3 in runtime.unlock
   at /usr/lib/go/src/runtime/lock_futex.go:106
1  0x000000000044cec7 in runtime.deltimer
   at /usr/lib/go/src/runtime/time.go:191
2  0x000000000044cbab in time.stopTimer
   at /usr/lib/go/src/runtime/time.go:121
3  0x00000000004a6567 in time.(*Timer).Reset
   at /usr/lib/go/src/time/sleep.go:124
4  0x00000000004be4c6 in main.main.func1
   at ./tmp.go:20
5  0x000000000045cca1 in runtime.goexit
   at /usr/lib/go/src/runtime/asm_amd64.s:1357
Thread 21912
0  0x000000000045e62d in runtime.usleep
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:131
1  0x0000000000439d6d in runtime.sysmon
   at /usr/lib/go/src/runtime/proc.go:4296
2  0x0000000000432ab1 in runtime.mstart1
   at /usr/lib/go/src/runtime/proc.go:1201
3  0x00000000004329b4 in runtime.mstart
   at /usr/lib/go/src/runtime/proc.go:1149
4  0x000000000045ec43 in runtime.clone
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:587
Thread 21913
0  0x000000000045ebc3 in runtime.futex
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:536
1  0x000000000042bbb6 in runtime.futexsleep
   at /usr/lib/go/src/runtime/os_linux.go:44
2  0x000000000040a897 in runtime.notesleep
   at /usr/lib/go/src/runtime/lock_futex.go:151
3  0x0000000000433f58 in runtime.stopm
   at /usr/lib/go/src/runtime/proc.go:1928
4  0x0000000000437146 in runtime.exitsyscall0
   at /usr/lib/go/src/runtime/proc.go:3140
5  0x000000000045ad3b in runtime.mcall
   at /usr/lib/go/src/runtime/asm_amd64.s:318
Thread 21914
0  0x000000000045ece7 in runtime.osyield
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:625
1  0x000000000040a51d in runtime.lock
   at /usr/lib/go/src/runtime/lock_futex.go:93
2  0x000000000044ce8c in runtime.deltimer
   at /usr/lib/go/src/runtime/time.go:189
3  0x000000000044cbab in time.stopTimer
   at /usr/lib/go/src/runtime/time.go:121
4  0x00000000004a6567 in time.(*Timer).Reset
   at /usr/lib/go/src/time/sleep.go:124
5  0x00000000004be4c6 in main.main.func1
   at ./tmp.go:20
6  0x000000000045cca1 in runtime.goexit
   at /usr/lib/go/src/runtime/asm_amd64.s:1357
Thread 21915
0  0x000000000045ebc3 in runtime.futex
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:536
1  0x000000000042bbb6 in runtime.futexsleep
   at /usr/lib/go/src/runtime/os_linux.go:44
2  0x000000000040a897 in runtime.notesleep
   at /usr/lib/go/src/runtime/lock_futex.go:151
3  0x0000000000433f58 in runtime.stopm
   at /usr/lib/go/src/runtime/proc.go:1928
4  0x0000000000437146 in runtime.exitsyscall0
   at /usr/lib/go/src/runtime/proc.go:3140
5  0x000000000045ad3b in runtime.mcall
   at /usr/lib/go/src/runtime/asm_amd64.s:318
Thread 21916
0  0x00000000004a654e in time.(*Timer).Reset
   at /usr/lib/go/src/time/sleep.go:124
1  0x00000000004be4c6 in main.main.func1
   at ./tmp.go:20
2  0x000000000045cca1 in runtime.goexit
   at /usr/lib/go/src/runtime/asm_amd64.s:1357
Thread 21917
0  0x000000000045ebc3 in runtime.futex
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:536
1  0x000000000042bbb6 in runtime.futexsleep
   at /usr/lib/go/src/runtime/os_linux.go:44
2  0x000000000040a897 in runtime.notesleep
   at /usr/lib/go/src/runtime/lock_futex.go:151
3  0x0000000000433f58 in runtime.stopm
   at /usr/lib/go/src/runtime/proc.go:1928
4  0x0000000000437146 in runtime.exitsyscall0
   at /usr/lib/go/src/runtime/proc.go:3140
5  0x000000000045ad3b in runtime.mcall
   at /usr/lib/go/src/runtime/asm_amd64.s:318
Thread 21918
0  0x000000000045ebc3 in runtime.futex
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:536
1  0x000000000042bbb6 in runtime.futexsleep
   at /usr/lib/go/src/runtime/os_linux.go:44
2  0x000000000040a897 in runtime.notesleep
   at /usr/lib/go/src/runtime/lock_futex.go:151
3  0x0000000000433f58 in runtime.stopm
   at /usr/lib/go/src/runtime/proc.go:1928
4  0x0000000000437146 in runtime.exitsyscall0
   at /usr/lib/go/src/runtime/proc.go:3140
5  0x000000000045ad3b in runtime.mcall
   at /usr/lib/go/src/runtime/asm_amd64.s:318
Thread 21920
0  0x000000000045ebc3 in runtime.futex
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:536
1  0x000000000042bbb6 in runtime.futexsleep
   at /usr/lib/go/src/runtime/os_linux.go:44
2  0x000000000040a897 in runtime.notesleep
   at /usr/lib/go/src/runtime/lock_futex.go:151
3  0x0000000000433f58 in runtime.stopm
   at /usr/lib/go/src/runtime/proc.go:1928
4  0x0000000000437146 in runtime.exitsyscall0
   at /usr/lib/go/src/runtime/proc.go:3140
5  0x000000000045ad3b in runtime.mcall
   at /usr/lib/go/src/runtime/asm_amd64.s:318

Go master

CPU Profile:
go-master-cpu.gz

Thread 23755 at 0x45549d /usr/lib/go/src/runtime/time.go:307 runtime.deltimer
Thread 23760 at 0x4688cd /usr/lib/go/src/runtime/sys_linux_amd64.s:146 runtime.usleep
Thread 23761 at 0x468af9 /usr/lib/go/src/runtime/sys_linux_amd64.s:300 runtime.nanotime1
Thread 23762 at 0x40b2e0 /usr/lib/go/src/runtime/lock_futex.go:56 runtime.lock
Thread 23763 at :0
Thread 23809 at 0x468ed3 /usr/lib/go/src/runtime/sys_linux_amd64.s:568 runtime.futex
Thread 23755
0  0x000000000045549d in runtime.deltimer
   at /usr/lib/go/src/runtime/time.go:307
1  0x000000000045505b in time.stopTimer
   at /usr/lib/go/src/runtime/time.go:214
2  0x00000000004aae17 in time.(*Timer).Reset
   at /usr/lib/go/src/time/sleep.go:126
3  0x00000000004c1db6 in main.main.func1
   at ./tmp.go:20
4  0x0000000000466d61 in runtime.goexit
   at /usr/lib/go/src/runtime/asm_amd64.s:1374
Thread 23760
0  0x00000000004688cd in runtime.usleep
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:146
1  0x0000000000441a6d in runtime.sysmon
   at /usr/lib/go/src/runtime/proc.go:4544
2  0x0000000000439c71 in runtime.mstart1
   at /usr/lib/go/src/runtime/proc.go:1134
3  0x0000000000439b74 in runtime.mstart
   at /usr/lib/go/src/runtime/proc.go:1081
4  0x0000000000468f53 in runtime.clone
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:619
Thread 23761
0  0x0000000000468af9 in runtime.nanotime1
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:300
1  0x000000c000880480 in ???
   at ?:-1
2  0x000000c0004a4690 in ???
   at ?:-1
   error: input/output error
Thread 23762
0  0x000000000040b2e0 in runtime.lock
   at /usr/lib/go/src/runtime/lock_futex.go:56
1  0x0000000000455bc4 in runtime.modtimer
   at /usr/lib/go/src/runtime/time.go:467
2  0x0000000000455d82 in runtime.resettimer
   at /usr/lib/go/src/runtime/time.go:520
3  0x00000000004550b5 in time.resetTimer
   at /usr/lib/go/src/runtime/time.go:223
4  0x00000000004aae43 in time.(*Timer).Reset
   at /usr/lib/go/src/time/sleep.go:127
5  0x00000000004c1db6 in main.main.func1
   at ./tmp.go:20
6  0x0000000000466d61 in runtime.goexit
   at /usr/lib/go/src/runtime/asm_amd64.s:1374
Thread 23763
0  0x000000000040b2e0 in runtime.lock
   at /usr/lib/go/src/runtime/lock_futex.go:56
1  0x0000000000455bc4 in runtime.modtimer
   at /usr/lib/go/src/runtime/time.go:467
2  0x0000000000455d82 in runtime.resettimer
   at /usr/lib/go/src/runtime/time.go:520
3  0x00000000004550b5 in time.resetTimer
   at /usr/lib/go/src/runtime/time.go:223
4  0x00000000004aae43 in time.(*Timer).Reset
   at /usr/lib/go/src/time/sleep.go:127
5  0x00000000004c1db6 in main.main.func1
   at ./tmp.go:20
6  0x0000000000466d61 in runtime.goexit
   at /usr/lib/go/src/runtime/asm_amd64.s:1374
Thread 23809
0  0x0000000000468ed3 in runtime.futex
   at /usr/lib/go/src/runtime/sys_linux_amd64.s:568
1  0x0000000000431336 in runtime.futexsleep
   at /usr/lib/go/src/runtime/os_linux.go:45
2  0x000000000040b747 in runtime.notesleep
   at /usr/lib/go/src/runtime/lock_futex.go:151
3  0x000000000043b138 in runtime.stopm
   at /usr/lib/go/src/runtime/proc.go:1865
4  0x000000000043ed66 in runtime.exitsyscall0
   at /usr/lib/go/src/runtime/proc.go:3331
5  0x0000000000464d4b in runtime.mcall
   at /usr/lib/go/src/runtime/asm_amd64.s:318
Stebalien added a commit to Stebalien/quic-go that referenced this issue Mar 27, 2020
This reverts commit c88a690.

Unfortunately, go 1.14's new scheduler and timer changes have a few bugs that
need to be ironed out before it's ready for production use.

See:
* libp2p/go-libp2p#858
* golang/go#38119
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 28, 2020

Thanks for the bug report. This is #38023/#38070, and will be fixed in 1.14.2. Sorry for the difficulty.

@Stebalien
Copy link
Author

@Stebalien Stebalien commented Mar 28, 2020

Sorry, I should have called this out more prominently in the issue.

I believe this is a new issue. I tested master as of af7eafd (this morning) and, while it no longer deadlocks and hangs completely, it still starves the main thread.

Please take a look at the "what did you see instead". I tested on go 1.13.8, go 1.14.1, and master.

  • Go 1.13: works perfectly and updates every ~1s.
  • Go 1.14: eventually hangs (#38070/#38023)
  • Master: starves. It makes progress, but sporadically.
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 28, 2020

Sorry, I missed that.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 28, 2020

At first glance, what seems to have changed here is that in 1.13 resetting a timer required taking a lock on a timer bucket. That effectively serialized the goroutines, so they ran in order, and they also alternated with the timer goroutine that took the same lock. On tip, resetting a timer does not require any lock. So you have 4000 goroutines running flat out with no locks. 100 microseconds isn't enough time for them all to finish, so they are always running and contending with each other. That seems to be enough to slow down the timer checks.

If I add a call to runtime.Gosched() in the loop, either just after the call to time.Reset or just after the loop that calls time.Reset 100 times, I get entirely different behavior, and can run many more goroutines.

So I think the key difference in 1.14 is that a loop calling timer.Reset is a busy loop, whereas in 1.13 it was not a busy loop. In Go, busy loops are bad in that the monopolize the scheduler. Go 1.14 does preempt goroutines, so the program does make some progress, but it doesn't work as it ideally should.

I can't decide whether there is a real bug here or not, or whether this is something that should be fixed by changing the program to not busy loop. It would be interesting to see whether the real program has the same busy looping behavior.

@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Mar 28, 2020
@ianlancetaylor ianlancetaylor changed the title Starvation with frequent timer resets runtime: starvation with frequent timer resets Mar 28, 2020
@Stebalien
Copy link
Author

@Stebalien Stebalien commented Mar 28, 2020

Ah, got it. The actual code that spawned this bug report writes to a socket occasionally so it shouldn't exhibit this issue.

I was mostly concerned that my test program for reproducing the bug worked on 1.13.8 but not on the latest master. Given your explanation, I can't think of a useful program that would have this problem.

@Stebalien Stebalien closed this Mar 28, 2020
@Stebalien Stebalien reopened this Mar 28, 2020
@Stebalien
Copy link
Author

@Stebalien Stebalien commented Mar 28, 2020

(sorry, didn't mean to close this for you)

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 28, 2020

I'm OK with closing this, unless and until we see some real code that runs into this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.