Skip to content

runtime: loops forever on sched_yield sometimes(timer related) #38023

@xtaci

Description

@xtaci

What version of Go are you using (go version)?

$ go version
go version go1.14.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
➜  ~ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/xtaci/.cache/go-build"
GOENV="/home/xtaci/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xtaci/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build204371315=/tmp/go-build -gno-record-gcc-switches"

What did you do?

a program ran for some unkown time(minutes to hours), then CPU goes to 100% and never back to normal.

What did you expect to see?

normal CPU usage

What did you see instead?

100% CPU usage for a single core, I use delve to attach to the process, and other goroutines are blocking on events forever, including some time.Tickers which expected to fire periodically.(I put breakpoints on first lines of timer handlers.)

(dlv) b osyield
Breakpoint 1 set at 0x4645b0 for runtime.osyield() /usr/local/go/src/runtime/sys_linux_amd64.s:655
(dlv) c
> runtime.osyield() /usr/local/go/src/runtime/sys_linux_amd64.s:655 (hits total:7) (PC: 0x4645b0)
Warning: debugging optimized function
   650:         JLS     2(PC)
   651:         MOVL    $0xf1, 0xf1  // crash
   652:         RET
   653:
   654: TEXT runtime·osyield(SB),NOSPLIT,$0
=> 655:         MOVL    $SYS_sched_yield, AX
   656:         SYSCALL
   657:         RET
   658:
   659: TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
   660:         MOVQ    pid+0(FP), DI
(dlv) bt
0  0x00000000004645b0 in runtime.osyield
   at /usr/local/go/src/runtime/sys_linux_amd64.s:655
1  0x0000000000452d5c in runtime.runtimer
   at /usr/local/go/src/runtime/time.go:740
2  0x000000000043b634 in runtime.checkTimers
   at /usr/local/go/src/runtime/proc.go:2621
3  0x000000000043b0db in runtime.schedule
   at /usr/local/go/src/runtime/proc.go:2483
4  0x000000000043ba76 in runtime.goschedImpl
   at /usr/local/go/src/runtime/proc.go:2705
5  0x000000000043bcc4 in runtime.gopreempt_m
   at /usr/local/go/src/runtime/proc.go:2733
6  0x000000000044bd9e in runtime.newstack
   at /usr/local/go/src/runtime/stack.go:1025
7  0x00000000004604ef in runtime.morestack
   at /usr/local/go/src/runtime/asm_amd64.s:449
8  0x0000000000462331 in runtime.goexit
   at /usr/local/go/src/runtime/asm_amd64.s:1373

also strace shows it call sched_yield() infinitely

sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
....

with 100% CPU usage
program build flag:
env CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build ...

/usr/local/go/src/runtime/time.go

 690 func runtimer(pp *p, now int64) int64 {
 691     for {
 692         t := pp.timers[0]
 693         if t.pp.ptr() != pp {
 694             throw("runtimer: bad p")
 695         }
 696         switch s := atomic.Load(&t.status); s {
 697         case timerWaiting:
 698             if t.when > now {
 699                 // Not ready to run.
 700                 return t.when
 701             }
 702
 703             if !atomic.Cas(&t.status, s, timerRunning) {
 704                 continue
 705             }
 706             // Note that runOneTimer may temporarily unlock
 707             // pp.timersLock.
 708             runOneTimer(pp, t, now)
 709             return 0
 710
 711         case timerDeleted:
....
 738         case timerModifying:
 739             // Wait for modification to complete.
 740             osyield()
 741
 742         case timerNoStatus, timerRemoved:
....

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.release-blocker

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions