
time: excessive CPU usage when using Ticker and Sleep #27707

Open · lni opened this issue Sep 17, 2018 · 47 comments
@lni commented Sep 17, 2018

What version of Go are you using (go version)?

go1.11 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/lni/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/lni/golang_ws"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build440058908=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I need to call a function roughly every millisecond. There is no hard real-time requirement; as long as it is called roughly every millisecond, everything is fine. However, I realised that both time.Ticker and time.Sleep() cause excessive CPU overhead, probably due to runtime scheduling.

The following Go program uses 20-25% CPU as reported by top on Linux.

package main

import (
  "time"
)

func main() {
  ticker := time.NewTicker(time.Millisecond)
  defer ticker.Stop()
  for {
    select {
    case <-ticker.C:
    }
  }
}

A for loop with range causes similar overhead:

package main

import (
  "time"
)

func main() {
  ticker := time.NewTicker(time.Millisecond)
  defer ticker.Stop()
  for range ticker.C {
  }
}

while the following program shows 10-15% CPU in top:

package main

import (
  "time"
)

func main() {
  for {
    time.Sleep(time.Millisecond)
  }
}

To work around the issue, I had to move the ticker/sleep part to C and let the C code call the Go function that needs to be invoked every millisecond. This ugly cgo-based hack reduced the CPU usage reported by top to 2-3%. Please see the proof-of-concept code below.

ticker.h

#ifndef TEST_TICKER_H
#define TEST_TICKER_H

void cticker();

#endif // TEST_TICKER_H

ticker.c

#include <unistd.h>
#include "ticker.h"
#include "_cgo_export.h"

void cticker()
{
  for(int i = 0; i < 30000; i++)
  {
    usleep(1000);
    Gotask();
  }
}

ticker.go

package main

/*
#include "ticker.h"
*/
import "C"
import (
  "log"
)

var (
  counter uint64 = 0
)

//export Gotask
func Gotask() {
  counter++
}

func main() {
  C.cticker()
  log.Printf("Gotask called %d times", counter)
}

What did you expect to see?

Much lower CPU overhead when using time.Ticker or time.Sleep()

What did you see instead?

20-25% CPU in top

@agnivade (Member) commented Sep 17, 2018

I am unable to reproduce this. Both programs show 5-6% CPU for me, and I have 4 CPUs. When you say 20-25%, how many CPUs do you have?

@lni (Author) commented Sep 17, 2018

As described, all reported figures are the per-process %CPU values reported by top.

I tried two servers, one with dual E5-2690v2 CPUs (2 × 10 cores with HT) and another with a single E5-2696v4 (1 × 22 cores with HT). Other people have confirmed high CPU usage on their macOS machines as well; see below.

https://groups.google.com/forum/#!topic/golang-nuts/_XEuq69kUOY

@djadala (Contributor) commented Sep 17, 2018

Hi,
Same here,
Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux,
4x AMD Phenom(tm) II X4 965 Processor,

top:

Tasks: 362 total,   1 running, 361 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  0.4 sy,  0.0 ni, 98.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8142260 total,   607488 free,  2363780 used,  5170992 buff/cache
KiB Swap:  4194300 total,  4144460 free,    49840 used.  5385516 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                   
 1689 djadala   20   0    2476    708    520 S  22.8  0.0   0:01.90 ticker1                                                   

go version:
go version go1.10.3 linux/amd64

ticker1.go:

package main

import (
	"time"
)

func main() {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}

agnivade changed the title from "Excessive CPU usage when using time.Ticker and time.Sleep" to "time: excessive CPU usage when using Ticker and Sleep" on Sep 17, 2018

agnivade added this to the Go1.12 milestone on Sep 17, 2018

@bcmills (Member) commented Sep 17, 2018

Dup of #25471?

@mbyio commented Sep 17, 2018

Percentage of CPU use is not very specific. Can you describe how it is actually affecting you? Is it preventing the program from working correctly somehow? Go is not necessarily optimized for minimal CPU usage.

@bcmills (Member) commented Sep 17, 2018

Can you describe how it is actually affecting you? Is it preventing the program from working correctly somehow?

On a laptop or other mobile device, percentage of CPU is more-or-less equivalent to battery lifetime. We should not waste users' battery capacity needlessly.

(Using a lot more CPU than expected in a real program is a problem in and of itself: it doesn't need extra justification.)

@choleraehyq (Contributor) commented Nov 19, 2018

@bcmills I'd like to implement time.Ticker with timerfd to solve this issue, but the timerfd_* APIs were only merged in kernel 2.6.24, while our minimum kernel requirement is 2.6.23. When can we drop support for kernel 2.6.23?
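
For context, a timerfd turns timer expirations into reads on a file descriptor, so a sleeping thread is woken directly by the kernel with no user-space timer thread. Below is a minimal Linux/amd64 sketch of the idea, illustrative only and not the proposed runtime change; the struct layout and constants follow timerfd_create(2).

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// timespec and itimerspec mirror the kernel structs on linux/amd64.
type timespec struct {
	sec  int64
	nsec int64
}

type itimerspec struct {
	interval timespec // period (zero means one-shot)
	value    timespec // initial expiration
}

func main() {
	const clockMonotonic = 1 // CLOCK_MONOTONIC
	fd, _, errno := syscall.Syscall(syscall.SYS_TIMERFD_CREATE, clockMonotonic, 0, 0)
	if errno != 0 {
		panic(errno)
	}
	// Arm the timer to fire every millisecond.
	spec := itimerspec{
		interval: timespec{nsec: 1e6},
		value:    timespec{nsec: 1e6},
	}
	if _, _, e := syscall.Syscall6(syscall.SYS_TIMERFD_SETTIME, fd, 0,
		uintptr(unsafe.Pointer(&spec)), 0, 0, 0); e != 0 {
		panic(e)
	}
	buf := make([]byte, 8) // each read(2) yields an 8-byte count of expirations
	for i := 0; i < 1000; i++ {
		if _, err := syscall.Read(int(fd), buf); err != nil {
			panic(err)
		}
	}
	fmt.Println("received 1000 ticks")
}

Note that syscall.Read here still blocks an OS thread; the benefit would come from the runtime registering the descriptor with netpoll (epoll) instead.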

@bcmills (Member) commented Nov 19, 2018

When can we drop support of kernel 2.6.23?

For that, you should probably file a proposal.

@ianlancetaylor (Contributor) commented Nov 19, 2018

@choleraehyq Can you sketch out how using the timerfd functions will help? It's not immediately obvious to me. Thanks.

@gmox commented Dec 1, 2018

I can confirm this behavior. It looks as if NewTicker leaves futex locks open, and that drives up CPU/memory usage. Using After does not result in increased CPU/memory usage.
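
For reference, the comment above does not include code; a polling loop built on time.After would presumably look like this (a sketch):

package main

import "time"

func main() {
	for {
		// time.After allocates a fresh one-shot Timer on every iteration,
		// so it avoids the Ticker machinery at the cost of per-iteration garbage.
		<-time.After(time.Millisecond)
	}
}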

@djadala (Contributor) commented Dec 1, 2018

Hi,

Using After does not result in increased CPU/memory usage

Please include the code that you use to measure.
Here is my code; sleep and timer use 10% CPU, ticker uses 20% CPU:

package main

import (
	"flag"
	"fmt"
	"time"
)

func ticker() {

	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}

func timer() {
	t := time.NewTimer(time.Millisecond)
	for range t.C {
		t.Reset(time.Millisecond)
	}
}

func sleep() {
	for {
		time.Sleep(time.Millisecond)
	}
}

func main() {
	t := flag.String("t", "timer", "use timer")
	flag.Parse()
	switch *t {
	case "timer":
		timer()
	case "ticker":
		ticker()
	case "sleep":
		sleep()
	default:
		fmt.Println("use  timer, ticker or sleep")
	}
}

I think that 10% is also high CPU usage.

@lni (Author) commented Dec 1, 2018

I tried djadala's program above on an Amazon EC2 a1.medium node with one ARM Cortex-A72 core, running Ubuntu 18.04 LTS with a 4.15.0-1028-aws kernel and Go 1.11.2 for arm64. Interestingly, the CPU usage is only around 3% for ticker/timer/sleep.

@djadala (Contributor) commented Dec 20, 2018

For reference, I included syscall and cgo variants; cgo uses 2% CPU and syscall uses 12%:

package main

import (
	"flag"
	"fmt"
	"time"

	"syscall"
)

// #include <unistd.h>
import "C"

func usleep() {
	for {
		C.usleep(1000)
	}
}

func sysns() {
	for {
		v := syscall.Timespec{
			Sec:  0,
			Nsec: 1000,
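			// Note: Nsec is in nanoseconds, so this sleeps 1 µs per
			// iteration, not the 1 ms used by the other variants.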
		}
		syscall.Nanosleep(&v, &v)
	}
}

func ticker() {

	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}

func timer() {
	t := time.NewTimer(time.Millisecond)
	for range t.C {
		t.Reset(time.Millisecond)
	}
}

func sleep() {
	for {
		time.Sleep(time.Millisecond)
	}
}

func main() {
	t := flag.String("t", "timer", "use timer")
	flag.Parse()
	switch *t {
	case "timer":
		timer()
	case "ticker":
		ticker()
	case "sleep":
		sleep()
	case "cgo":
		usleep()
	case "sys":
		sysns()
	default:
		fmt.Println("use  timer, ticker, sys, cgo or sleep")
	}
}

@robaho commented Dec 20, 2018

I did some testing on this, comparing Go, C, and Java. The Go version uses time.Sleep(), the C version uses usleep(), and the Java version uses LockSupport.parkNanos(). Interestingly, Java and Go use the same 'timeout on semaphore' implementation technique. The following shows CPU utilization (on OSX).

System   1 us    100 us   1 ms
Go       130%    23.0%    4.6%
Java      67%     6.0%    1.5%
C         42%     0.6%    0.0%

I believe the performance issue in Go is due to multiple threads being involved (the timerproc thread and the user goroutine), causing a lot of overhead.

The simplest solution, although it doesn't work for channel select timeouts, would be to have short-duration time.Sleep() calls just invoke usleep(). There are probably very few threads using timers of this duration, so the overhead of allocating and blocking the thread shouldn't be that great. I would expect the resulting runtime to be superior to Java and on par with C.

In fact, it is doubtful that a sleep request for less than 10 µs should even sleep at all: just call runtime.Gosched(), since the overhead of a context switch is probably longer than the requested sleep.
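
A user-level sketch of that idea (a hypothetical helper, not a proposed API; the 10 µs threshold comes from the comment above, not from measurement):

package main

import (
	"fmt"
	"runtime"
	"time"
)

// shortSleep stays runnable and yields instead of parking for very short
// waits: below some threshold, a context switch costs more than the
// requested sleep itself.
func shortSleep(d time.Duration) {
	if d >= 10*time.Microsecond {
		time.Sleep(d)
		return
	}
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
		runtime.Gosched() // burns CPU on this P but avoids the timer machinery
	}
}

func main() {
	start := time.Now()
	for i := 0; i < 100000; i++ {
		shortSleep(5 * time.Microsecond)
	}
	fmt.Println("elapsed:", time.Since(start))
}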

@robaho commented Dec 28, 2018

I did some more testing by writing a CGo function USleep that calls C's usleep directly.

System   1 us   100 us   1 ms
CGo       50%    5.3%    1.3%

So this appears to be a viable alternative for polling with a short interval and low polling routine count.

andybons modified the milestone: Go1.12 → Go1.13 on Feb 12, 2019

@ls0f commented Mar 8, 2019

Same issue here. I hope the Go team can decrease the CPU cost of tickers on Linux.

@networkimprov commented Mar 8, 2019

@ianlancetaylor maybe this could be assigned to someone?

@ianlancetaylor (Contributor) commented Mar 8, 2019

I can't tell anybody to do anything. I can assign it to someone but that doesn't mean anything will happen.

Rereading this I would like to double-check that there is still something to do here. @dvyukov sped up the timer code quite a bit in the 1.12 release while working on #25729. To what extent is this still a problem?

@dvyukov (Member) commented Mar 8, 2019

We still handle timers on a separate thread, so there are something like four excess thread context switches per timer. Timers need to be part of the scheduler/netpoll, i.e. call epoll_wait with the next timer's timeout.
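
A minimal sketch of what that means, outside the runtime (Linux-only; the real change would live in the scheduler's netpoll code): a single epoll_wait call serves as both the network poll and the timer sleep when it is given the next timer deadline as its timeout.

package main

import (
	"fmt"
	"syscall"
	"time"
)

func main() {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	events := make([]syscall.EpollEvent, 16)
	next := time.Now().Add(time.Millisecond) // stand-in for the nearest timer deadline
	for i := 0; i < 5; i++ {
		timeout := int(time.Until(next) / time.Millisecond)
		if timeout < 0 {
			timeout = 0
		}
		// Wakes up for either fd readiness or timer expiry; no extra thread.
		n, err := syscall.EpollWait(epfd, events, timeout)
		if err == syscall.EINTR {
			continue
		}
		if err != nil {
			panic(err)
		}
		if n == 0 { // timed out: the "timer" fired
			fmt.Println("timer fired")
			next = next.Add(time.Millisecond)
		}
	}
}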

@ianlancetaylor (Contributor) commented Mar 8, 2019

I wonder if it would make sense to say that if we call time.Sleep for a time <= 10 milliseconds, and the number of active goroutines is <= GOMAXPROCS, then we should just sleep rather than going through the timer thread. That might handle microbenchmarks like this, including client programs, without compromising servers. (I chose 10 milliseconds based on sysmon).

@ChrisHines (Contributor) commented Mar 8, 2019

This issue sounds similar to what I described in #28808. In that issue a lot of the unexpected CPU load was due to the work stealing code in the scheduler after goroutines go to sleep.

It would be nice if we could reduce that CPU load when there are short lived timers pending.

I wonder if it would make sense to say that if we call time.Sleep for a time <= 10 milliseconds, and the number of active goroutines is <= GOMAXPROCS, then we should just sleep rather than going through the timer thread.

What do you mean by "active goroutines <= GOMAXPROCS"? Does active mean "existing" or does it mean "in a runnable state"?

@dvyukov (Member) commented Mar 8, 2019

Am I reading this right that we want to both complicate the runtime and not improve real programs, and mask the problems of real programs in microbenchmarks so that it becomes harder to analyze the performance of real programs? :)
And in the end it may actually negatively affect real programs. Consider a server that does not utilize all CPUs at 100% (which they usually do): we consume several threads for sleeping; then there is a spike of load, but those threads are wasted on sleeping.

@ChrisHines (Contributor) commented Mar 8, 2019

Would usleep act like a syscall and consume an OS thread (an M in scheduler parlance) for each short sleep? If so, it seems like that would not behave nicely if there are many goroutines calling short sleeps concurrently as in the program described in #28808.

@dvyukov (Member) commented Mar 8, 2019

No you are reading it wrong. For short duration timers it is more efficient to call usleep rather than using the multi thread park notify currently used. It is not a hack to make micro benchmarks perform better.

But we are losing a whole P in this case and not doing useful work on it.
I believe it will be faster on a synthetic benchmark, but we are wasting resources.

@ChrisHines (Contributor) commented Mar 8, 2019

Our use case is described in general terms in the issue I linked. It is real and we're doing it at a large scale, meaning, ~3500 timers averaging a 2.4ms period on a 56 CPU box (and many of those boxes). At present we have to work around behaviors of the Go runtime in various ways to keep CPU load under control.

Losing a whole P as @dvyukov points out is a non-starter for us.

@ChrisHines (Contributor) commented Mar 8, 2019

@robaho Yes, that strategy is one we've been pursuing, but it's not as simple as it sounds. Taken to conclusion it amounts to implementing a custom scheduler on top of the Go scheduler. I don't think this issue discussion is an appropriate place to design such a thing, so let's just leave that line of thought at that.

In the interest of staying focused on the Go runtime here, I think it is fair to say that it would be nice if the Go scheduler could handle some of these use cases better. Then projects like ours wouldn't have to write a custom scheduler to work around the existing behavior.

@ChrisHines (Contributor) commented Mar 8, 2019

Fair enough, @robaho. Getting back to your usleep suggestion: if it did block the P, and a program had enough goroutines doing short sleeps to block all the available Ps, then the whole program would be unable to make progress on other, non-sleeping goroutines during that time. I don't think we want that to happen.

@dvyukov (Member) commented Mar 8, 2019

I am pretty sure it doesn’t work that way.

But then why/how will it be faster than the current code? The thread switches required to pass the P across threads seem to be effectively the same as what we have now.

@dvyukov (Member) commented Mar 8, 2019

I fail to understand the difference. We already do a sleep.
I see how it would be different if we took a P, but you say we need to allow the runtime to continue running Go code on the P. And then it does not seem different from what we have now.

@dvyukov (Member) commented Mar 9, 2019

Okay, those are pretty synthetic conditions, where the sysmon thread decides not to retake the P and cools down as a result because there is no other work to do.
What exact predicate do you propose to use? How do we detect the "few timers" scenario and ensure that we don't do any harm (there is lots of potential for it)?
And this is merely a stop-gap before we do the proper solution that removes the thread switches, right?

@networkimprov commented Mar 9, 2019

And there's a new runtime pkg API to support a configurable threshold, e.g. your "environment variable to override for special cases".

@networkimprov commented Mar 9, 2019

Ah yes, a new env variable would suffice. I think it's advisable to include that.

IMO, "hard numbers" result from empirical study and all we have is simple benchmarks.

The issue tracker sees a ton of traffic; discerning which of it is soundly reasoned can be a challenge :-)

@networkimprov commented Mar 13, 2019

Sounds like we have a consensus? Can we tag this NeedsFix?

@ianlancetaylor (Contributor) commented Mar 13, 2019

I'm not sure we have a consensus here.

@networkimprov commented Mar 20, 2019

Could we try the usleep() approach as an experiment during this cycle, and see if any issues appear?
