Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: excessive CPU usage when using Ticker and Sleep #27707

Open
lni opened this issue Sep 17, 2018 · 122 comments
Open

time: excessive CPU usage when using Ticker and Sleep #27707

lni opened this issue Sep 17, 2018 · 122 comments

Comments

@lni
Copy link

@lni lni commented Sep 17, 2018

What version of Go are you using (go version)?

go1.11 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/lni/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/lni/golang_ws"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build440058908=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I need to call a function roughly every millisecond, there is no real time requirement, as long as it is called roughly every millisecond everything is fine. However, I realised that both time.Ticker and time.Sleep() are causing excessive CPU overhead, probably due to runtime scheduling.

The following Go program uses 20-25% %CPU as reported by top on Linux.

package main

import (
  "time"
)

func main() {
  ticker := time.NewTicker(time.Millisecond)
  defer ticker.Stop()
  for {
    select {
    case <-ticker.C:
    }
  }
}

for loop with range is causing similar overhead

package main

import (
  "time"
)

func main() {
  ticker := time.NewTicker(time.Millisecond)
  defer ticker.Stop()
  for range ticker.C {
  }
}

while the following program is showing 10-15% %CPU in top

package main

import (
  "time"
)

func main() {
  for {
    time.Sleep(time.Millisecond)
  }
}

To workaround the issue, I had to move the ticker/sleep part to C and let the C code to call the Go function that need to be invoked every millisecond. Such a cgo based ugly hack reduced %CPU in top to 2-3%. Please see the proof of concept code below.

ticker.h

#ifndef TEST_TICKER_H
#define TEST_TICKER_H

void cticker();

#endif // TEST_TICKER_H

ticker.c

#include <unistd.h>
#include "ticker.h"
#include "_cgo_export.h"

void cticker()
{
  for(int i = 0; i < 30000; i++)
  {
    usleep(1000);
    Gotask();
  }
}

ticker.go

package main

/*
#include "ticker.h"
*/
import "C"
import (
  "log"
)

var (
  counter uint64 = 0
)

//export Gotask
func Gotask() {
  counter++
}

func main() {
  C.cticker()
  log.Printf("Gotask called %d times", counter)
}

What did you expect to see?

Much lower CPU overhead when using time.Ticker or time.Sleep()

What did you see instead?

20-25% %CPU in top

@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented Sep 17, 2018

I am unable to reproduce this. Both programs show 5-6% CPU for me. And I have 4 CPUs. When you say 20-25%, how many CPUs do you have ?

@lni

This comment has been minimized.

Copy link
Author

@lni lni commented Sep 17, 2018

As described, all reported % figures are per process %CPU value reported in top.

I tried two servers, one with dual E5-2690v2 (10 cores X 2 with HT) and another server with a single E5-2696v4 (22 cores X 1 with HT). Other people have confirmed high CPU usage on their Mac OS machines as well, see below.

https://groups.google.com/forum/#!topic/golang-nuts/_XEuq69kUOY

@djadala

This comment has been minimized.

Copy link
Contributor

@djadala djadala commented Sep 17, 2018

Hi,
Same here,
Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux,
4x AMD Phenom(tm) II X4 965 Processor,

top:

Tasks: 362 total,   1 running, 361 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  0.4 sy,  0.0 ni, 98.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8142260 total,   607488 free,  2363780 used,  5170992 buff/cache
KiB Swap:  4194300 total,  4144460 free,    49840 used.  5385516 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                   
 1689 djadala   20   0    2476    708    520 S  22.8  0.0   0:01.90 ticker1                                                   

go version:
go version go1.10.3 linux/amd64

ticker1.go:

package main

import (
	"time"
)

func main() {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}
@agnivade agnivade changed the title Excessive CPU usage when using time.Ticker and time.Sleep time: excessive CPU usage when using Ticker and Sleep Sep 17, 2018
@agnivade agnivade added this to the Go1.12 milestone Sep 17, 2018
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Sep 17, 2018

Dup of #25471?

@mbyio

This comment has been minimized.

Copy link

@mbyio mbyio commented Sep 17, 2018

Percentage of CPU use is not very specific. Can you describe how it is actually affecting you? Is it preventing the program from working correctly somehow? Go is not necessarily specifically optimized for minimal CPU usage.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Sep 17, 2018

Can you describe how it is actually affecting you? Is it preventing the program from working correctly somehow?

On a laptop or other mobile device, percentage of CPU is more-or-less equivalent to battery lifetime. We should not waste users' battery capacity needlessly.

(Using a lot more CPU than expected in a real program is a problem in and of itself: it doesn't need extra justification.)

@choleraehyq

This comment has been minimized.

Copy link
Contributor

@choleraehyq choleraehyq commented Nov 19, 2018

@bcmills I'd like to implement time.Ticker with timerfd to solve this issue, but timerfd_* APIs are merged into kernel 2.6.24, as our minimal kernel requirement is 2.6.23. When can we drop support of kernel 2.6.23?

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Nov 19, 2018

When can we drop support of kernel 2.6.23?

For that, you should probably file a proposal.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2018

@choleraehyq Can you sketch out how using the timerfd functions will help? It's not immediately obvious to me. Thanks.

@gmox

This comment has been minimized.

Copy link

@gmox gmox commented Dec 1, 2018

I can confirm this behavior. It looks as if NewTicker is leaving futex locks open, and that drives up CPU/memory usage. Using After does not result in increased CPU/memory usage.

@djadala

This comment has been minimized.

Copy link
Contributor

@djadala djadala commented Dec 1, 2018

Hi,

Using After does not result in increased CPU/memory usage

Please include code that you use to measure.
Here is my code, sleep & timer use 10%cpu, ticker use 20%cpu :

package main

import (
	"flag"
	"fmt"
	"time"
)

func ticker() {

	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}

func timer() {
	t := time.NewTimer(time.Millisecond)
	for range t.C {
		t.Reset(time.Millisecond)
	}
}

func sleep() {
	for {
		time.Sleep(time.Millisecond)
	}
}

func main() {
	t := flag.String("t", "timer", "use timer")
	flag.Parse()
	switch *t {
	case "timer":
		timer()
	case "ticker":
		ticker()
	case "sleep":
		sleep()
	default:
		fmt.Println("use  timer, ticker or sleep")
	}
}

I think that 10% is also high cpu usage.

@lni

This comment has been minimized.

Copy link
Author

@lni lni commented Dec 1, 2018

I tried djadala's program above on an Amazon EC2 a1.medium node with 1 ARM Cortex-A72 core, OS is Ubuntu 18.04LTS with a 4.15.0-1028-aws kernel, Go arm64 version 1.11.2. Interestingly, the CPU usage is only around 3% for ticker/timer/sleep.

@djadala

This comment has been minimized.

Copy link
Contributor

@djadala djadala commented Dec 20, 2018

for reference i included syscall and cgo variants, cgo use 2% cpu, and syscall use 12%

package main

import (
	"flag"
	"fmt"
	"time"

	"syscall"
)

// #include <unistd.h>
// //#include <errno.h>
// //int usleep(useconds_t usec);
import "C"

func usleep() {
	for {
		C.usleep(1000)
	}
}

func sysns() {
	for {
		v := syscall.Timespec{
			Sec:  0,
			Nsec: 1000,
		}
		syscall.Nanosleep(&v, &v)
	}
}

func ticker() {

	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	}
}

func timer() {
	t := time.NewTimer(time.Millisecond)
	for range t.C {
		t.Reset(time.Millisecond)
	}
}

func sleep() {
	for {
		time.Sleep(time.Millisecond)
	}
}

func main() {
	t := flag.String("t", "timer", "use timer")
	flag.Parse()
	switch *t {
	case "timer":
		timer()
	case "ticker":
		ticker()
	case "sleep":
		sleep()
	case "cgo":
		usleep()
	case "sys":
		sysns()
	default:
		fmt.Println("use  timer, ticker, sys, cgo or sleep")
	}
}

@robaho

This comment has been minimized.

Copy link

@robaho robaho commented Dec 20, 2018

I did some testing on this, comparing Go, C, and Java. The Go uses time.Sleep(), C uses usleep(), and Java uses LockSupport.ParkNanos(). Interestingly, Java and Go use the same 'timeout on semaphore' to implement. The following shows CPU utilization (on OSX).

System 1 us 100 us 1 ms
Go 130% 23.0% 4.6%
Java 67% 6.0 % 1.5%
C 42% 0.6 % 0.0%

I believe the performance issue in Go is due to multiple threads being involved (the timerproc, and the user routine) causing a lot of overhead.

The simplest solution, although it doesn't work for channel select timeout, would be to have short duration time.Sleep() just invoke usleep(). There are probably very few threads using timers of this duration, so the overhead of allocating and blocking the thread shouldn't be that great. I would expect the resulting runtime to be superior to Java and on par with C.

In fact, it is doubtful a sleep request for less than 10 us should even sleep at all - just call runtime.Gosched() since the overhead of a context switch is probably longer than the sleep request.

@robaho

This comment has been minimized.

Copy link

@robaho robaho commented Dec 28, 2018

I did some more testing by writing a CGo function USleep that calls C's usleep directly.

System 1 us 100 us 1 ms
CGo 50% 5.3% 1.3%

So this appears to be a viable alternative for polling with a short interval and low polling routine count.

@andybons andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019
@ls0f

This comment was marked as off-topic.

Copy link

@ls0f ls0f commented Mar 8, 2019

Same issue. Hope Go team to decrease the cpu cost of ticker in Linux.

@ianlancetaylor

This comment was marked as off-topic.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 8, 2019

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Mar 8, 2019

@ianlancetaylor maybe this could be assigned to someone?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 8, 2019

I can't tell anybody to do anything. I can assign it to someone but that doesn't mean anything will happen.

Rereading this I would like to double-check that there is still something to do here. @dvyukov sped up the timer code quite a bit in the 1.12 release while working on #25729. To what extent is this still a problem?

@dvyukov

This comment has been minimized.

Copy link
Member

@dvyukov dvyukov commented Mar 8, 2019

We still handle timers on a separate thread, so there are like 4 excessive thread context switches per timer. Timers need to be part of scheduler/netpoll, i.e. call epoll_wait with the timers timeout.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 8, 2019

I wonder if it would make sense to say that if we call time.Sleep for a time <= 10 milliseconds, and the number of active goroutines is <= GOMAXPROCS, then we should just sleep rather than going through the timer thread. That might handle microbenchmarks like this, including client programs, without compromising servers. (I chose 10 milliseconds based on sysmon).

@robaho

This comment has been minimized.

Copy link

@robaho robaho commented Mar 8, 2019

@ChrisHines

This comment has been minimized.

Copy link
Contributor

@ChrisHines ChrisHines commented Mar 8, 2019

This issue sounds similar to what I described in #28808. In that issue a lot of the unexpected CPU load was due to the work stealing code in the scheduler after goroutines go to sleep.

It would be nice if we could reduce that CPU load when there are short lived timers pending.

I wonder if it would make sense to say that if we call time.Sleep for a time <= 10 milliseconds, and the number of active goroutines is <= GOMAXPROCS, then we should just sleep rather than going through the timer thread.

What do you mean by "active goroutines <= GOMAXPROCS"? Does active mean "existing" or does it mean "in a runnable state"?

gopherbot pushed a commit that referenced this issue Oct 22, 2019
The adjusttimers function is where we check the adjustTimers field in
the P struct to see if we need to resort the heap. We walk forward in
the heap and find and resort timers that have been modified, until we
find all the timers that were modified to run earlier. Along the way
we remove deleted timers.

Updates #27707

Change-Id: I1cba7fe77b8112b7e9a9dba80b5dfb08fcc7c568
Reviewed-on: https://go-review.googlesource.com/c/go/+/171877
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 22, 2019
Updates #27707

Change-Id: I1e65effb708911c727d126c51e0f50fe219f42ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/171878
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 22, 2019
Updates #27707

Change-Id: I51da8a04ec12ba1efa435e86e3a15d4d13c96c45
Reviewed-on: https://go-review.googlesource.com/c/go/+/171879
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 23, 2019
Since timers are now on a P, rather than having a G running timerproc,
timejump changes to return a P rather than a G.

Updates #27707

Change-Id: I3d05af2d664409a0fd906e709fdecbbcbe00b9a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/171880
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 23, 2019
Updates #27707

Change-Id: Id4b37594511895f404ee3c09a85263b2b35f835d
Reviewed-on: https://go-review.googlesource.com/c/go/+/171881
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
gopherbot pushed a commit that referenced this issue Oct 23, 2019
Since the new timers run on g0, which does not have a race context,
we add a race context field to the P, and use that for timer functions.
This works since all timer functions are in the standard library.

Updates #27707

Change-Id: I8a5b727b4ddc8ca6fc60eb6d6f5e9819245e395b
Reviewed-on: https://go-review.googlesource.com/c/go/+/171882
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 29, 2019

Change https://golang.org/cl/203890 mentions this issue: runtime: clear js idle timeout before new one and after event handler

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 29, 2019

Change https://golang.org/cl/204058 mentions this issue: runtime: initialize netpoll in TestNetpollBreak

gopherbot pushed a commit that referenced this issue Oct 29, 2019
Netpoll must be always be initialized when TestNetpollBreak is launched.
However, when it is run in standalone, it won't be the case, so it must
be forced.

Updates: #27707

Change-Id: I28147f3834f3d6aca982c6a298feadc09b55f66e
Reviewed-on: https://go-review.googlesource.com/c/go/+/204058
Run-TryBot: Clément Chigot <clement.chigot%atos.net@gtempaccount.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 30, 2019

Change https://golang.org/cl/204045 mentions this issue: runtime: record stub netpoll initialization, add lock around note

gopherbot pushed a commit that referenced this issue Oct 30, 2019
This fixes the Plan 9 support for the new timer code.

Updates #6239
Updates #27707

Change-Id: Ia498c399b8924910b97fcde07545fae3588aad47
Reviewed-on: https://go-review.googlesource.com/c/go/+/204045
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 30, 2019
Updates #6239
Updates #27707

Change-Id: I0a62c1374db485dd830bf02e59625997d9247fc3
Reviewed-on: https://go-review.googlesource.com/c/go/+/203890
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gopherbot pushed a commit that referenced this issue Oct 30, 2019
No big changes in the runtime package benchmarks.

Changes in the time package benchmarks:

name                      old time/op  new time/op  delta
AfterFunc-12              1.57ms ± 1%  0.07ms ± 1%  -95.42%  (p=0.000 n=10+8)
After-12                  1.63ms ± 3%  0.11ms ± 1%  -93.54%  (p=0.000 n=9+10)
Stop-12                   78.3µs ± 3%  73.6µs ± 3%   -6.01%  (p=0.000 n=9+10)
SimultaneousAfterFunc-12   138µs ± 1%   111µs ± 1%  -19.57%  (p=0.000 n=10+9)
StartStop-12              28.7µs ± 1%  31.5µs ± 5%   +9.64%  (p=0.000 n=10+7)
Reset-12                  6.78µs ± 1%  4.24µs ± 7%  -37.45%  (p=0.000 n=9+10)
Sleep-12                   183µs ± 1%   125µs ± 1%  -31.67%  (p=0.000 n=10+9)
Ticker-12                 5.40ms ± 2%  0.03ms ± 1%  -99.43%  (p=0.000 n=10+10)
Sub-12                     114ns ± 1%   113ns ± 3%     ~     (p=0.069 n=9+10)
Now-12                    37.2ns ± 1%  36.8ns ± 3%     ~     (p=0.287 n=8+8)
NowUnixNano-12            38.1ns ± 2%  37.4ns ± 3%   -1.87%  (p=0.020 n=10+9)
Format-12                  252ns ± 2%   195ns ± 3%  -22.61%  (p=0.000 n=9+10)
FormatNow-12               234ns ± 1%   177ns ± 2%  -24.34%  (p=0.000 n=10+10)
MarshalJSON-12             320ns ± 2%   250ns ± 0%  -21.94%  (p=0.000 n=8+8)
MarshalText-12             320ns ± 2%   245ns ± 2%  -23.30%  (p=0.000 n=9+10)
Parse-12                   206ns ± 2%   208ns ± 4%     ~     (p=0.084 n=10+10)
ParseDuration-12          89.1ns ± 1%  86.6ns ± 3%   -2.78%  (p=0.000 n=10+10)
Hour-12                   4.43ns ± 2%  4.46ns ± 1%     ~     (p=0.324 n=10+8)
Second-12                 4.47ns ± 1%  4.40ns ± 3%     ~     (p=0.145 n=9+10)
Year-12                   14.6ns ± 1%  14.7ns ± 2%     ~     (p=0.112 n=9+9)
Day-12                    20.1ns ± 3%  20.2ns ± 1%     ~     (p=0.404 n=10+9)

Updates #6239
Updates #27707

Change-Id: I51e25a90f941574f1a9cf83a22e84ac8c678537d
Reviewed-on: https://go-review.googlesource.com/c/go/+/171883
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 30, 2019

Change https://golang.org/cl/204280 mentions this issue: runtime: use correct state machine in addAdjustedTimers

@changkun changkun mentioned this issue Oct 31, 2019
0 of 13 tasks complete
gopherbot pushed a commit that referenced this issue Nov 1, 2019
The addAdjustedTimers function was a late addition, and it got some of
the state machine wrong, leading to failures like
https://storage.googleapis.com/go-build-log/930576b6/windows-amd64-2016_53d0319e.log

Updates #6239
Updates #27707

Change-Id: I9e94e563b4698ff3035ce609055ca292b9cab3df
Reviewed-on: https://go-review.googlesource.com/c/go/+/204280
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 1, 2019

Change https://golang.org/cl/204717 mentions this issue: runtime: use atomic.Cas to change timerRemoved to timerWaiting

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 1, 2019

Change https://golang.org/cl/204800 mentions this issue: runtime: wake netpoller when dropping P, don't sleep too long in sysmon

gopherbot pushed a commit that referenced this issue Nov 1, 2019
If multiple goroutines call time.(*Timer).Reset then the timer will go
from timerWaiting to timerDeleted to timerModifying to timerModifiedLater.
The timer can be on a different P, meaning that simultaneously cleantimers
could change it from timerDeleted to timerRemoving to timerRemoved.
If Reset sees timerRemoved, it was doing an atomic.Store of timerWaiting,
meaning that it did not necessarily see the other values set in the timer,
so the timer could appear to be in an inconsistent state. Use atomic.Cas
to avoid that possibility.

Updates #6239
Updates #27707
Fixes #35272

Change-Id: I1d59a13dc4f2ff4af110fc6e032c8c9d59cfc270
Reviewed-on: https://go-review.googlesource.com/c/go/+/204717
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 4, 2019
When dropping a P, if it has any timers, and if some thread is
sleeping in the netpoller, wake the netpoller to run the P's timers.
This mitigates races between the netpoller deciding how long to sleep
and a new timer being added.

In sysmon, if all P's are idle, check the timers to decide how long to sleep.
This avoids oversleeping if no thread is using the netpoller.
This can happen in particular if some threads use runtime.LockOSThread,
as those threads do not block in the netpoller.

Also, print the number of timers per P for GODEBUG=scheddetail=1.

Before this CL, TestLockedDeadlock2 would fail about 1% of the time.
With this CL, I ran it 150,000 times with no failures.

Updates #6239
Updates #27707
Fixes #35274
Fixes #35288

Change-Id: I7e5193e6c885e567f0b1ee023664aa3e2902fcd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 13, 2019

Change https://golang.org/cl/206938 mentions this issue: runtime: acquire timersLocks around moveTimers

gopherbot pushed a commit that referenced this issue Nov 13, 2019
In the discussion of CL 171828 we decided that it was not necessary to
acquire timersLock around the call to moveTimers, because the world is
stopped. However, that is not correct, as sysmon runs even when the world
is stopped, and it calls timeSleepUntil which looks through the timers.
timeSleepUntil acquires timersLock, but that doesn't help if moveTimers
is running at the same time.

Updates #6239
Updates #27707
Updates #35462

Change-Id: I346c5bde594c4aff9955ae430b37c2b6fc71567f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206938
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 15, 2019

Change https://golang.org/cl/207348 mentions this issue: runtime: release timersLock while running timer

gopherbot pushed a commit that referenced this issue Nov 19, 2019
Dan Scales pointed out a theoretical deadlock in the runtime.

The timer code runs timer functions while holding the timers lock for a P.
The scavenger queues up a timer function that calls wakeScavenger,
which acquires the scavenger lock.

The scavengeSleep function acquires the scavenger lock,
then calls resetTimer which can call addInitializedTimer
which acquires the timers lock for the current P.

So there is a potential deadlock, in that the scavenger lock and
the timers lock for some P may both be acquired in different order.
It's not clear to me whether this deadlock can ever actually occur.

Issue 35532 describes another possible deadlock.

The pollSetDeadline function acquires pd.lock for some poll descriptor,
and in some cases calls resettimer which can in some cases acquire
the timers lock for the current P.

The timer code runs timer functions while holding the timers lock for a P.
The timer function for poll descriptors winds up in netpolldeadlineimpl
which acquires pd.lock.

So again there is a potential deadlock, in that the pd lock for some
poll descriptor and the timers lock for some P may both be acquired in
different order. I think this can happen if we change the deadline
for a network connection exactly as the former deadline expires.

Looking at the code, I don't see any reason why we have to hold
the timers lock while running a timer function.
This CL implements that change.

Updates #6239
Updates #27707
Fixes #35532

Change-Id: I17792f5a0120e01ea07cf1b2de8434d5c10704dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/207348
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 19, 2019
Updates #6239
Updates #27707

Change-Id: I65e6471829c9de4677d3ac78ef6cd7aa0a1fc4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/171884
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2019

The new timer code is committed for 1.14. This code reduces lock contention and context switches for timers. This seems to have a clearly beneficial effect for real programs.

With regard to this issue, these changes reduce the CPU requirements for the simple ticker program. They don't seem to affect the simple sleep program very much. Leaving this issue open for further investigation.

@robaho

This comment has been minimized.

Copy link

@robaho robaho commented Nov 19, 2019

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 2, 2019

Change https://golang.org/cl/209580 mentions this issue: runtime: use current P's race context in timer code

gopherbot pushed a commit that referenced this issue Dec 2, 2019
We were using the race context of the P that held the timer,
but since we unlock the P's timers while executing a timer
that could lead to a race on the race context itself.

Updates #6239
Updates #27707
Fixes #35906

Change-Id: I5f9d5f52d8e28dffb88c3327301071b16ed1a913
Reviewed-on: https://go-review.googlesource.com/c/go/+/209580
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.