time: excessive CPU usage when using Ticker and Sleep #27707
Comments
I am unable to reproduce this. Both programs show 5-6% CPU for me, and I have 4 CPUs. When you say 20-25%, how many CPUs do you have? |
As described, all reported % figures are per process %CPU value reported in top. I tried two servers, one with dual E5-2690v2 (10 cores X 2 with HT) and another server with a single E5-2696v4 (22 cores X 1 with HT). Other people have confirmed high CPU usage on their Mac OS machines as well, see below. https://groups.google.com/forum/#!topic/golang-nuts/_XEuq69kUOY |
Hi, top:
go version:
ticker1.go:
package main

import (
	"time"
)

func main() {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
} |
Dup of #25471? |
Percentage of CPU use is not very specific. Can you describe how it is actually affecting you? Is it preventing the program from working correctly somehow? Go is not necessarily specifically optimized for minimal CPU usage. |
On a laptop or other mobile device, percentage of CPU is more-or-less equivalent to battery lifetime. We should not waste users' battery capacity needlessly. (Using a lot more CPU than expected in a real program is a problem in and of itself: it doesn't need extra justification.) |
@bcmills I'd like to implement time.Ticker with timerfd to solve this issue, but timerfd_* APIs are merged into kernel 2.6.24, as our minimal kernel requirement is 2.6.23. When can we drop support of kernel 2.6.23? |
For that, you should probably file a proposal. |
@choleraehyq Can you sketch out how using the timerfd functions will help? It's not immediately obvious to me. Thanks. |
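For what it's worth, here is a user-level sketch of the timerfd idea (not the proposed runtime change): the kernel delivers periodic ticks through a file descriptor, so the waiting happens in a single blocking read instead of in the Go timer machinery. It assumes the golang.org/x/sys/unix wrappers TimerfdCreate and TimerfdSettime; note that the read below blocks an OS thread, since it is not integrated with the netpoller.
// Hedged sketch, not runtime code: a 1ms periodic tick via Linux timerfd.
package main

import (
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	fd, err := unix.TimerfdCreate(unix.CLOCK_MONOTONIC, unix.TFD_CLOEXEC)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	period := unix.NsecToTimespec(int64(time.Millisecond))
	spec := unix.ItimerSpec{Interval: period, Value: period}
	if err := unix.TimerfdSettime(fd, 0, &spec, nil); err != nil {
		panic(err)
	}

	buf := make([]byte, 8)
	for {
		// Each read blocks in the kernel until the timer expires; the 8
		// bytes hold a count of expirations since the last read.
		if _, err := unix.Read(fd, buf); err != nil {
			panic(err)
		}
		// do the periodic work here
	}
}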
I can confirm this behavior. It looks as if NewTicker is leaving futex locks open, and that drives up CPU/memory usage. Using |
Hi,
Please include the code that you use to measure.
package main

import (
	"flag"
	"fmt"
	"time"
)

func ticker() {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}

func timer() {
	t := time.NewTimer(time.Millisecond)
	for range t.C {
		t.Reset(time.Millisecond)
	}
}

func sleep() {
	for {
		time.Sleep(time.Millisecond)
	}
}

func main() {
	t := flag.String("t", "timer", "use timer")
	flag.Parse()
	switch *t {
	case "timer":
		timer()
	case "ticker":
		ticker()
	case "sleep":
		sleep()
	default:
		fmt.Println("use timer, ticker or sleep")
	}
}
I think that 10% is also high CPU usage. |
I tried djadala's program above on an Amazon EC2 a1.medium node with 1 ARM Cortex-A72 core, OS is Ubuntu 18.04LTS with a 4.15.0-1028-aws kernel, Go arm64 version 1.11.2. Interestingly, the CPU usage is only around 3% for ticker/timer/sleep. |
For reference, I included syscall and cgo variants: the cgo variant uses 2% CPU and the syscall variant uses 12%.
|
I did some testing on this, comparing Go, C, and Java. The Go version uses time.Sleep(), the C version uses usleep(), and the Java version uses LockSupport.parkNanos(). Interestingly, Java and Go both implement this as a timeout on a semaphore. The following shows CPU utilization (on OSX). |
I believe the performance issue in Go is due to multiple threads being involved (the timerproc, and the user routine) causing a lot of overhead. The simplest solution, although it doesn't work for channel select timeout, would be to have short duration time.Sleep() just invoke usleep(). There are probably very few threads using timers of this duration, so the overhead of allocating and blocking the thread shouldn't be that great. I would expect the resulting runtime to be superior to Java and on par with C. In fact, it is doubtful a sleep request for less than 10 us should even sleep at all - just call runtime.Gosched() since the overhead of a context switch is probably longer than the sleep request. |
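To illustrate that last idea at the user level (this is not a runtime change, and shortSleep is a made-up helper name): for very short waits, yield with runtime.Gosched() in a loop instead of parking the goroutine on a timer, and fall through to time.Sleep for anything longer. The 10µs cutoff is the figure suggested above.
package main

import (
	"runtime"
	"time"
)

// shortSleep is a hypothetical helper, not a runtime API: for very short
// waits it yields the processor in a loop instead of parking on a timer.
func shortSleep(d time.Duration) {
	if d >= 10*time.Microsecond {
		time.Sleep(d) // long enough that a real timer is worth the overhead
		return
	}
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
		runtime.Gosched() // yield instead of a timer-driven context switch
	}
}

func main() {
	for i := 0; i < 1000; i++ {
		shortSleep(5 * time.Microsecond)
	}
}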
I did some more testing by writing a CGo function USleep that calls C's usleep directly.
So this appears to be a viable alternative for polling with a short interval and low polling routine count. |
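The USleep code itself is not shown above; a minimal reconstruction of what such a cgo wrapper presumably looks like (the names are illustrative, not the original ones) is:
package main

/*
#include <unistd.h>
*/
import "C"

// USleep blocks the calling OS thread in C's usleep, bypassing the Go timer
// code entirely.
func USleep(usec uint) {
	C.usleep(C.useconds_t(usec))
}

func main() {
	for {
		USleep(1000) // roughly 1ms per iteration
	}
}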
@ianlancetaylor maybe this could be assigned to someone? |
I can't tell anybody to do anything. I can assign it to someone but that doesn't mean anything will happen. Rereading this I would like to double-check that there is still something to do here. @dvyukov sped up the timer code quite a bit in the 1.12 release while working on #25729. To what extent is this still a problem? |
We still handle timers on a separate thread, so there are like 4 excessive thread context switches per timer. Timers need to be part of scheduler/netpoll, i.e. call epoll_wait with the timers timeout. |
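As a toy illustration of that idea in user space (this is not the netpoller, and assumes golang.org/x/sys/unix): a single epoll_wait call can serve as both the I/O wait and the timer sleep by passing the time until the nearest timer deadline as its timeout, so no separate timer thread is needed.
package main

import (
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	epfd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
	if err != nil {
		panic(err)
	}
	defer unix.Close(epfd)

	// In a real poller, file descriptors would be registered with EpollCtl;
	// here the only "event source" is the timer deadline below.
	deadline := time.Now().Add(time.Millisecond)
	events := make([]unix.EpollEvent, 8)
	for {
		// Round up so we never wake before the deadline.
		timeout := int((time.Until(deadline) + time.Millisecond - 1) / time.Millisecond)
		if timeout <= 0 {
			fmt.Println("timer fired at", time.Now())
			deadline = deadline.Add(time.Millisecond)
			continue
		}
		// One wait covers both pending I/O and the nearest timer: it returns
		// early if I/O becomes ready, or when the timeout expires.
		if _, err := unix.EpollWait(epfd, events, timeout); err != nil && err != unix.EINTR {
			panic(err)
		}
	}
}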
I wonder if it would make sense to say that if we call time.Sleep for a time <= 10 milliseconds, and the number of active goroutines is <= GOMAXPROCS, then we should just sleep rather than going through the timer thread. That might handle microbenchmarks like this, including client programs, without compromising servers. (I chose 10 milliseconds based on sysmon.) |
That was my proposal - I think it is a good one :) but I think 10 ms might be too high. Although I could see reasonable per platform defaults with an environment variable to override for special cases.
… On Mar 8, 2019, at 8:30 AM, Ian Lance Taylor ***@***.***> wrote:
I wonder if it would make sense to say that if we call time.Sleep for a time <= 10 milliseconds, and the number of active goroutines is <= GOMAXPROCS, then we should just sleep rather than going through the timer thread. That might handle microbenchmarks like this, including client programs, without compromising servers. (I chose 10 milliseconds based on sysmon).
|
This issue sounds similar to what I described in #28808. In that issue a lot of the unexpected CPU load was due to the work stealing code in the scheduler after goroutines go to sleep. It would be nice if we could reduce that CPU load when there are short lived timers pending.
What do you mean by "active goroutines <= GOMAXPROCS"? Does active mean "existing" or does it mean "in a runnable state"? |
Change https://golang.org/cl/204800 mentions this issue: |
If multiple goroutines call time.(*Timer).Reset then the timer will go from timerWaiting to timerDeleted to timerModifying to timerModifiedLater. The timer can be on a different P, meaning that simultaneously cleantimers could change it from timerDeleted to timerRemoving to timerRemoved. If Reset sees timerRemoved, it was doing an atomic.Store of timerWaiting, meaning that it did not necessarily see the other values set in the timer, so the timer could appear to be in an inconsistent state. Use atomic.Cas to avoid that possibility.

Updates #6239
Updates #27707
Fixes #35272

Change-Id: I1d59a13dc4f2ff4af110fc6e032c8c9d59cfc270
Reviewed-on: https://go-review.googlesource.com/c/go/+/204717
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When dropping a P, if it has any timers, and if some thread is sleeping in the netpoller, wake the netpoller to run the P's timers. This mitigates races between the netpoller deciding how long to sleep and a new timer being added.

In sysmon, if all P's are idle, check the timers to decide how long to sleep. This avoids oversleeping if no thread is using the netpoller. This can happen in particular if some threads use runtime.LockOSThread, as those threads do not block in the netpoller.

Also, print the number of timers per P for GODEBUG=scheddetail=1.

Before this CL, TestLockedDeadlock2 would fail about 1% of the time. With this CL, I ran it 150,000 times with no failures.

Updates #6239
Updates #27707
Fixes #35274
Fixes #35288

Change-Id: I7e5193e6c885e567f0b1ee023664aa3e2902fcd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change https://golang.org/cl/206938 mentions this issue: |
In the discussion of CL 171828 we decided that it was not necessary to acquire timersLock around the call to moveTimers, because the world is stopped. However, that is not correct, as sysmon runs even when the world is stopped, and it calls timeSleepUntil which looks through the timers. timeSleepUntil acquires timersLock, but that doesn't help if moveTimers is running at the same time.

Updates #6239
Updates #27707
Updates #35462

Change-Id: I346c5bde594c4aff9955ae430b37c2b6fc71567f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206938
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change https://golang.org/cl/207348 mentions this issue: |
Dan Scales pointed out a theoretical deadlock in the runtime.

The timer code runs timer functions while holding the timers lock for a P. The scavenger queues up a timer function that calls wakeScavenger, which acquires the scavenger lock. The scavengeSleep function acquires the scavenger lock, then calls resetTimer which can call addInitializedTimer which acquires the timers lock for the current P. So there is a potential deadlock, in that the scavenger lock and the timers lock for some P may both be acquired in different order. It's not clear to me whether this deadlock can ever actually occur.

Issue 35532 describes another possible deadlock. The pollSetDeadline function acquires pd.lock for some poll descriptor, and in some cases calls resettimer which can in some cases acquire the timers lock for the current P. The timer code runs timer functions while holding the timers lock for a P. The timer function for poll descriptors winds up in netpolldeadlineimpl which acquires pd.lock. So again there is a potential deadlock, in that the pd lock for some poll descriptor and the timers lock for some P may both be acquired in different order. I think this can happen if we change the deadline for a network connection exactly as the former deadline expires.

Looking at the code, I don't see any reason why we have to hold the timers lock while running a timer function. This CL implements that change.

Updates #6239
Updates #27707
Fixes #35532

Change-Id: I17792f5a0120e01ea07cf1b2de8434d5c10704dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/207348
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Updates #6239
Updates #27707

Change-Id: I65e6471829c9de4677d3ac78ef6cd7aa0a1fc4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/171884
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
The new timer code is committed for 1.14. This code reduces lock contention and context switches for timers. This seems to have a clearly beneficial effect for real programs. With regard to this issue, these changes reduce the CPU requirements for the simple ticker program. They don't seem to affect the simple sleep program very much. Leaving this issue open for further investigation. |
Great work!
|
Change https://golang.org/cl/209580 mentions this issue: |
We were using the race context of the P that held the timer, but since we unlock the P's timers while executing a timer that could lead to a race on the race context itself.

Updates #6239
Updates #27707
Fixes #35906

Change-Id: I5f9d5f52d8e28dffb88c3327301071b16ed1a913
Reviewed-on: https://go-review.googlesource.com/c/go/+/209580
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change https://golang.org/cl/232298 mentions this issue: |
While responding to review comments by @ianlancetaylor on https://golang.org/cl/232298 I discovered that it appears to improve the issue reported here. Specifically I compiled the following program with Go 1.13, 1.14 and that CL.
package main

import (
	"time"
)

func main() {
	for {
		time.Sleep(time.Millisecond)
	}
}
I ran all three builds on the same machine at the same time. The machine has 56 hardware threads and was mostly idle. The OS is CentOS 7.4.1708. With that setup top reported the following typical %CPU (of one hardware thread) for each program:
So there is at least some evidence that the CL provides an improvement over both old and new timers. |
I don't know if this is related, but I myself experienced high CPU usage while not using sleep. This is one of my loops; without the sleep in the default case it uses a lot of CPU:
for {
	select {
	default:
		// unnoticeable sleep to prevent high cpu usage
		time.Sleep(time.Microsecond)
	case <-exit:
		// crawler has completed
		return
	case response := <-crawl.Crawler.ResponseChan:
		// handle response
		_ = response
	}
} |
Without the sleep your for/select loop will spin the default case, so this is expected. The code above would probably work fine with the default case removed entirely. |
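Concretely, a version of that loop with the default case removed blocks in select and uses no CPU while idle. In this sketch, exit and responses stand in for the original exit and crawl.Crawler.ResponseChan channels:
// crawlLoop blocks until one of its channels is ready; no busy default case.
func crawlLoop(exit <-chan struct{}, responses <-chan string) {
	for {
		select {
		case <-exit:
			// crawler has completed
			return
		case resp := <-responses:
			_ = resp // handle the response here
		}
	}
}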
Client: go This is a follow up to 4db7a0a. Due to the go runtime bug [1], the 1ms ticker used by this feature could cause excessive cpu usage. So until the bug is fixed in go it's better to leave this behavior disabled by default. The instructions in the README file has been updated to how to enable this feature if needed. [1] golang/go#27707
Client: go Because of the Go runtime bug [1], the previous default value of 1ms is not a great default as it could cause excessive cpu usage. Use 5ms instead as a balance between being useful and not causing too much cpu overhead. It's still configurable. [1]: golang/go#27707
Client: go This is a follow up to 4db7a0a. Because of the Go runtime bug [1], the previous default value of 1ms is not a great default as it could cause excessive cpu usage. Use 5ms instead as a balance between being useful and not causing too much cpu overhead. It's still configurable. [1]: golang/go#27707
Change the scheduler to treat expired timers with the same approach it uses to steal runnable G's.

Previously the scheduler ignored timers on P's not marked for preemption. That had the downside that any G's waiting on those expired timers starved until the G running on their P completed or was preempted. That could take as long as 20ms if sysmon was in a 10ms wake up cycle. In addition, a spinning P that ignored an expired timer and found no other work would stop despite there being available work, missing the opportunity for greater parallelism.

With this change the scheduler no longer ignores timers on non-preemptable P's or relies on sysmon as a backstop to start threads when timers expire. Instead it wakes an idle P, if needed, when creating a new timer because it cannot predict if the current P will have a scheduling opportunity before the new timer expires. The P it wakes will determine how long to sleep and block on the netpoller for the required time, potentially stealing the new timer when it wakes.

This change also eliminates a race between a spinning P transitioning to idle concurrently with timer creation using the same pattern used for submission of new goroutines in the same window.

Benchmark analysis: CL 232199, which was included in Go 1.15 improved timer latency over Go 1.14 by allowing P's to steal timers from P's not marked for preemption. The benchmarks added in this CL measure that improvement in the ParallelTimerLatency benchmark seen below. However, Go 1.15 still relies on sysmon to notice expired timers in some situations and sysmon can sleep for up to 10ms before waking to check timers. This CL fixes that shortcoming with modest regression on other benchmarks.

name \ avg-late-ns go14.time.bench go15.time.bench fix.time.bench
ParallelTimerLatency-8 17.3M ± 3% 7.9M ± 0% 0.2M ± 3%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=1-8 53.4k ±23% 50.7k ±31% 252.4k ± 9%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=2-8 204k ±14% 90k ±58% 188k ±12%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=3-8 1.17M ± 0% 0.11M ± 5% 0.11M ± 2%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=4-8 1.81M ±44% 0.10M ± 4% 0.10M ± 2%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=5-8 2.28M ±66% 0.09M ±13% 0.08M ±21%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=6-8 2.84M ±85% 0.07M ±15% 0.07M ±18%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=7-8 2.13M ±27% 0.06M ± 4% 0.06M ± 9%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=8-8 2.63M ± 6% 0.06M ±11% 0.06M ± 9%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=9-8 3.32M ±17% 0.06M ±16% 0.07M ±14%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=10-8 8.46M ±20% 4.37M ±21% 5.03M ±23%
StaggeredTickerLatency/work-dur=2ms/tickers-per-P=1-8 1.02M ± 1% 0.20M ± 2% 0.20M ± 2%

name \ max-late-ns go14.time.bench go15.time.bench fix.time.bench
ParallelTimerLatency-8 18.3M ± 1% 8.2M ± 0% 0.5M ±12%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=1-8 141k ±19% 127k ±19% 1129k ± 3%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=2-8 2.78M ± 4% 1.23M ±15% 1.26M ± 5%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=3-8 6.05M ± 5% 0.67M ±56% 0.81M ±33%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=4-8 7.93M ±20% 0.71M ±46% 0.76M ±41%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=5-8 9.41M ±30% 0.92M ±23% 0.81M ±44%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=6-8 10.8M ±42% 0.8M ±41% 0.8M ±30%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=7-8 9.62M ±24% 0.77M ±38% 0.88M ±27%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=8-8 10.6M ±10% 0.8M ±32% 0.7M ±27%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=9-8 11.9M ±36% 0.6M ±46% 0.8M ±38%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=10-8 36.8M ±21% 24.7M ±21% 27.5M ±16%
StaggeredTickerLatency/work-dur=2ms/tickers-per-P=1-8 2.12M ± 2% 1.02M ±11% 1.03M ± 7%

Other time benchmarks:
name \ time/op go14.time.bench go15.time.bench fix.time.bench
AfterFunc-8 137µs ± 4% 123µs ± 4% 131µs ± 2%
After-8 212µs ± 3% 195µs ± 4% 204µs ± 7%
Stop-8 165µs ± 6% 156µs ± 2% 151µs ±12%
SimultaneousAfterFunc-8 260µs ± 3% 248µs ± 3% 284µs ± 2%
StartStop-8 65.8µs ± 9% 64.4µs ± 7% 67.3µs ±15%
Reset-8 13.6µs ± 2% 9.6µs ± 2% 9.1µs ± 4%
Sleep-8 307µs ± 4% 306µs ± 3% 320µs ± 2%
Ticker-8 53.0µs ± 5% 54.5µs ± 5% 57.0µs ±11%
TickerReset-8 9.24µs ± 2% 9.51µs ± 3%
TickerResetNaive-8 149µs ± 5% 145µs ± 5%

Fixes #38860
Updates #25471
Updates #27707

Change-Id: If52680509b0f3b66dbd1d0c13fa574bd2d0bbd57
Reviewed-on: https://go-review.googlesource.com/c/go/+/232298
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Ian Lance Taylor <iant@golang.org>
Maybe the heap structure is too heavy when many goroutines use timer/ticker/sleep, each with a small sleep time (like < 10ms). I see much time spent in siftdownTimer maintaining the heap structure. Also, I noticed much time spent in runtime.casgstatus doing CAS; the comment says it gives way to goroutine GC. However, even if I disable GC, the CAS loop still uses much time. Simple code:
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	for i := 0; i < 10000; i++ {
		go func() {
			for {
				time.Sleep(time.Millisecond * 5)
			}
		}()
	}
	debug.SetGCPercent(-1)
	runtime.GC()
	fmt.Println(http.ListenAndServe("localhost:8081", nil))
} |
What version of Go are you using (go version)?
go1.11 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/lni/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/lni/golang_ws"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build440058908=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I need to call a function roughly every millisecond. There is no real-time requirement; as long as it is called roughly every millisecond, everything is fine. However, I realised that both time.Ticker and time.Sleep() cause excessive CPU overhead, probably due to runtime scheduling.
The following Go program uses 20-25% %CPU as reported by top on Linux.
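(The code block did not survive in this copy of the issue; a minimal program consistent with the description, a 1ms ticker drained in a select loop, and not necessarily the original, would be:)
package main

import "time"

func main() {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// the function that must run roughly every millisecond
		}
	}
}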
A for loop with range over the ticker's channel causes similar overhead,
while the following program shows 10-15% %CPU in top:
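(This code block is also missing here; the later comments reproduce the same 10-15% figure with a bare Sleep loop, so it was presumably equivalent to:)
package main

import "time"

func main() {
	for {
		time.Sleep(time.Millisecond)
		// the function that must run roughly every millisecond
	}
}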
To work around the issue, I had to move the ticker/sleep part to C and let the C code call the Go function that needs to be invoked every millisecond. Such an ugly cgo-based hack reduced %CPU in top to 2-3%. Please see the proof-of-concept code below.
ticker.h
ticker.c
ticker.go
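(The contents of these three files are not preserved in this copy. A minimal sketch of the described workaround, with C owning the 1ms loop and calling back into Go, might look like the following; startTicker and goTick are illustrative names, not necessarily the originals.)
ticker.h:

#ifndef TICKER_H
#define TICKER_H
void startTicker(void);
#endif

ticker.c:

#include <unistd.h>
#include "ticker.h"
#include "_cgo_export.h" /* generated by cgo; declares the exported Go function goTick */

void startTicker(void) {
	for (;;) {
		usleep(1000); /* 1ms sleep in C, outside the Go timer code */
		goTick();     /* call back into Go */
	}
}

ticker.go:

package main

// #include "ticker.h"
import "C"

var ticks uint64

//export goTick
func goTick() {
	ticks++ // the work that must run roughly every millisecond
}

func main() {
	C.startTicker() // never returns
}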
What did you expect to see?
Much lower CPU overhead when using time.Ticker or time.Sleep()
What did you see instead?
20-25% %CPU in top