Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: make timers faster #6239

Closed
dvyukov opened this issue Aug 24, 2013 · 26 comments
Closed

runtime: make timers faster #6239

dvyukov opened this issue Aug 24, 2013 · 26 comments
Labels
Milestone

Comments

@dvyukov
Copy link
Member

@dvyukov dvyukov commented Aug 24, 2013

This is a follow up to:
https://golang.org/cl/12876047/
time: lower level interface to Timer: embedding, compact interface callback with fast
callback

Timers can be heavily used in networking applications.
Current implementation at least has problems with scalability:

$ ./time.test -test.run=none -test.bench=StartStop -test.benchtime=1s
-test.cpu=1,2,4,8,16,32
PASS
BenchmarkStartStop  10000000           214 ns/op
BenchmarkStartStop-2     5000000           515 ns/op
BenchmarkStartStop-4     5000000           735 ns/op
BenchmarkStartStop-8     2000000           804 ns/op
BenchmarkStartStop-16    5000000           708 ns/op
BenchmarkStartStop-32    5000000           679 ns/op

Some spot optimizations can be applied as well. Probably more efficient data structure
can be used, but it's not clear to me how to do better than current 4-ary heap.

FTR here is BenchmarkStartStop profile with 8 procs:

+  13.75%  time.test  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave                     
                                                            ▒
+  11.25%  time.test  time.test          [.] runtime.lock                               
                                                            ◆
+  11.15%  time.test  time.test          [.] runtime.xchg                               
                                                            ▒
+   6.89%  time.test  time.test          [.] runtime.procyield                          
                                                            ▒
+   6.32%  time.test  [kernel.kallsyms]  [k] _raw_spin_lock                             
                                                            ▒
+   4.06%  time.test  time.test          [.] runtime.cas                                
                                                            ▒
+   3.49%  time.test  [kernel.kallsyms]  [k] gup_pte_range                              
                                                            ▒
+   1.87%  time.test  time.test          [.] runtime.deltimer                           
                                                            ▒
+   1.80%  time.test  [kernel.kallsyms]  [k] get_futex_key                              
                                                            ▒
+   1.71%  time.test  [kernel.kallsyms]  [k] put_page                                   
                                                            ▒
+   1.58%  time.test  [kernel.kallsyms]  [k] try_to_wake_up                             
                                                            ▒
+   1.55%  time.test  [kernel.kallsyms]  [k] __wait_on_bit_lock                         
                                                            ▒
+   1.42%  time.test  time.test          [.] flushptrbuf                                
                                                            ▒
+   1.38%  time.test  [kernel.kallsyms]  [k] get_user_pages_fast                        
                                                            ▒
+   1.38%  time.test  time.test          [.] siftup                                     
                                                            ▒
+   1.22%  time.test  [kernel.kallsyms]  [k] copy_user_generic_string                   
                                                            ▒
+   1.19%  time.test  time.test          [.] runtime.casp                               
                                                            ▒
+   1.10%  time.test  [kernel.kallsyms]  [k] unlock_page                                
                                                            ▒
+   1.04%  time.test  [kernel.kallsyms]  [k] get_futex_key_refs                         
                                                            ▒
+   1.01%  time.test  time.test          [.] addtimer                                   
                                                            ▒
+   1.00%  time.test  [kernel.kallsyms]  [k] drop_futex_key_refs                        
                                                            ▒
+   0.98%  time.test  [kernel.kallsyms]  [k] prepare_to_wait_exclusive                  
                                                            ▒
+   0.97%  time.test  [kernel.kallsyms]  [k] __wake_up_bit                              
                                                            ▒
+   0.94%  time.test  [kernel.kallsyms]  [k] __wake_up_common                           
                                                            ▒
+   0.81%  time.test  [kernel.kallsyms]  [k] audit_filter_syscall                       
                                                            ▒
+   0.75%  time.test  [kernel.kallsyms]  [k] __schedule                                 
                                                            ▒
+   0.72%  time.test  time.test          [.] runtime.mallocgc                           
                                                            ▒
+   0.72%  time.test  time.test          [.] siftdown
@rsc
Copy link
Contributor

@rsc rsc commented Nov 27, 2013

Comment 1:

Labels changed: added go1.3maybe.

@rsc
Copy link
Contributor

@rsc rsc commented Dec 4, 2013

Comment 2:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Copy link
Contributor

@rsc rsc commented Dec 4, 2013

Comment 3:

Labels changed: added repo-main.

@dvyukov
Copy link
Member Author

@dvyukov dvyukov commented Apr 6, 2016

@aclements @RLH @randall77

Optimization of the timer heap can give up to 10%. On some of our prod servers we see 5-8% of time in siftdown.

But the real problem is the global mutex. We need:

  1. Merge timers into netpoll (epoll/kqueue/IOCP can wait with timeout).
  2. Get rid of the timer thread (this will also reduce timer latency, because currently we need 2 thread context switches to handle a timer).
  3. Distribute netpoll+timers per P.

Here is a prototype for 1 and 2:
https://go-review.googlesource.com/#/c/5760

Distribution is somewhat tricky. We will need hierarchical epoll descriptors, and kqueue/IOCP does not support hierarchical descriptors AFAICT. For timers we will need to somehow block on global minimum time, rather than on per-P minimum time.

@bryanpkc
Copy link
Contributor

@bryanpkc bryanpkc commented Nov 28, 2017

@dvyukov How does CL 5760 "get rid of the timer thread"? There is a timerproc goroutine which remains after the change. I am probably just too dumb to see it, but could you point me to where the timer thread was?

@dvyukov
Copy link
Member Author

@dvyukov dvyukov commented Nov 28, 2017

timerproc blocks in notetsleepg, which consumes a thread, so timerproc is also always a thread.

@bryanpkc
Copy link
Contributor

@bryanpkc bryanpkc commented Nov 28, 2017

Thanks for the quick explanation. If I am reading it correctly, at least notetsleepg will hand off the P to another thread right away, so it doesn't just consume the threadblock the P for a whole sysmon tick.

@dvyukov
Copy link
Member Author

@dvyukov dvyukov commented Nov 28, 2017

Yes, it hands off P, but it still consumes thread and causes 2 context switches per timer.

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171883 mentions this issue: runtime: switch to using new timer code

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171829 mentions this issue: runtime, syscall, time: prepare for adding timers to P's

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171827 mentions this issue: runtime: add wasm support for timers on P's

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171826 mentions this issue: runtime: initial scheduler changes for timers on P's

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171821 mentions this issue: runtime: change netpoll to take an amount of time to block

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171828 mentions this issue: runtime: handle timers on P's in procresize/(*pp).destroy

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2019

Change https://golang.org/cl/171884 mentions this issue: runtime, syscall, time: remove old timer code

gopherbot pushed a commit that referenced this issue Oct 15, 2019
This new facility will be used by future CLs in this series.

Change the only blocking call to netpoll to do the right thing when
netpoll returns an empty list.

Updates #6239
Updates #27707

Change-Id: I58b3c2903eda61a3698b1a4729ed0e81382bb1ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/171821
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
gopherbot pushed a commit that referenced this issue Oct 21, 2019
Add support to the main scheduler loop for handling timers on P's.
This is not used yet, as timers are not yet put on P's.

Updates #6239
Updates #27707

Change-Id: I6a359df408629f333a9232142ce19e8be8496dae
Reviewed-on: https://go-review.googlesource.com/c/go/+/171826
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 21, 2019
When we put timers on P's, the wasm code will not be able to rely on
the timer goroutine. Use the beforeIdle hook to schedule a wakeup.

Updates #6239
Updates #27707

Change-Id: Idf6309944778b8c3d7178f5d09431940843ea233
Reviewed-on: https://go-review.googlesource.com/c/go/+/171827
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
gopherbot pushed a commit that referenced this issue Oct 21, 2019
Updates #6239
Updates #27707

Change-Id: I52cab8bf3dc8c552463725fc1d9e4e6b12230b03
Reviewed-on: https://go-review.googlesource.com/c/go/+/171828
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 21, 2019
Add new fields to runtime.timer, and adjust the various timer
functions in preparation for adding timers to P's. This continues to
use the old timer code.

Updates #6239
Updates #27707

Change-Id: I9adb3814f657e083ec5e22736c4b5b52b77b6a3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/171829
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 29, 2019

Change https://golang.org/cl/203890 mentions this issue: runtime: clear js idle timeout before new one and after event handler

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 30, 2019

Change https://golang.org/cl/204045 mentions this issue: runtime: record stub netpoll initialization, add lock around note

gopherbot pushed a commit that referenced this issue Oct 30, 2019
This fixes the Plan 9 support for the new timer code.

Updates #6239
Updates #27707

Change-Id: Ia498c399b8924910b97fcde07545fae3588aad47
Reviewed-on: https://go-review.googlesource.com/c/go/+/204045
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Oct 30, 2019
Updates #6239
Updates #27707

Change-Id: I0a62c1374db485dd830bf02e59625997d9247fc3
Reviewed-on: https://go-review.googlesource.com/c/go/+/203890
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gopherbot pushed a commit that referenced this issue Oct 30, 2019
No big changes in the runtime package benchmarks.

Changes in the time package benchmarks:

name                      old time/op  new time/op  delta
AfterFunc-12              1.57ms ± 1%  0.07ms ± 1%  -95.42%  (p=0.000 n=10+8)
After-12                  1.63ms ± 3%  0.11ms ± 1%  -93.54%  (p=0.000 n=9+10)
Stop-12                   78.3µs ± 3%  73.6µs ± 3%   -6.01%  (p=0.000 n=9+10)
SimultaneousAfterFunc-12   138µs ± 1%   111µs ± 1%  -19.57%  (p=0.000 n=10+9)
StartStop-12              28.7µs ± 1%  31.5µs ± 5%   +9.64%  (p=0.000 n=10+7)
Reset-12                  6.78µs ± 1%  4.24µs ± 7%  -37.45%  (p=0.000 n=9+10)
Sleep-12                   183µs ± 1%   125µs ± 1%  -31.67%  (p=0.000 n=10+9)
Ticker-12                 5.40ms ± 2%  0.03ms ± 1%  -99.43%  (p=0.000 n=10+10)
Sub-12                     114ns ± 1%   113ns ± 3%     ~     (p=0.069 n=9+10)
Now-12                    37.2ns ± 1%  36.8ns ± 3%     ~     (p=0.287 n=8+8)
NowUnixNano-12            38.1ns ± 2%  37.4ns ± 3%   -1.87%  (p=0.020 n=10+9)
Format-12                  252ns ± 2%   195ns ± 3%  -22.61%  (p=0.000 n=9+10)
FormatNow-12               234ns ± 1%   177ns ± 2%  -24.34%  (p=0.000 n=10+10)
MarshalJSON-12             320ns ± 2%   250ns ± 0%  -21.94%  (p=0.000 n=8+8)
MarshalText-12             320ns ± 2%   245ns ± 2%  -23.30%  (p=0.000 n=9+10)
Parse-12                   206ns ± 2%   208ns ± 4%     ~     (p=0.084 n=10+10)
ParseDuration-12          89.1ns ± 1%  86.6ns ± 3%   -2.78%  (p=0.000 n=10+10)
Hour-12                   4.43ns ± 2%  4.46ns ± 1%     ~     (p=0.324 n=10+8)
Second-12                 4.47ns ± 1%  4.40ns ± 3%     ~     (p=0.145 n=9+10)
Year-12                   14.6ns ± 1%  14.7ns ± 2%     ~     (p=0.112 n=9+9)
Day-12                    20.1ns ± 3%  20.2ns ± 1%     ~     (p=0.404 n=10+9)

Updates #6239
Updates #27707

Change-Id: I51e25a90f941574f1a9cf83a22e84ac8c678537d
Reviewed-on: https://go-review.googlesource.com/c/go/+/171883
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 30, 2019

Change https://golang.org/cl/204280 mentions this issue: runtime: use correct state machine in addAdjustedTimers

gopherbot pushed a commit that referenced this issue Nov 1, 2019
The addAdjustedTimers function was a late addition, and it got some of
the state machine wrong, leading to failures like
https://storage.googleapis.com/go-build-log/930576b6/windows-amd64-2016_53d0319e.log

Updates #6239
Updates #27707

Change-Id: I9e94e563b4698ff3035ce609055ca292b9cab3df
Reviewed-on: https://go-review.googlesource.com/c/go/+/204280
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 1, 2019

Change https://golang.org/cl/204717 mentions this issue: runtime: use atomic.Cas to change timerRemoved to timerWaiting

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 1, 2019

Change https://golang.org/cl/204800 mentions this issue: runtime: wake netpoller when dropping P, don't sleep too long in sysmon

gopherbot pushed a commit that referenced this issue Nov 1, 2019
If multiple goroutines call time.(*Timer).Reset then the timer will go
from timerWaiting to timerDeleted to timerModifying to timerModifiedLater.
The timer can be on a different P, meaning that simultaneously cleantimers
could change it from timerDeleted to timerRemoving to timerRemoved.
If Reset sees timerRemoved, it was doing an atomic.Store of timerWaiting,
meaning that it did not necessarily see the other values set in the timer,
so the timer could appear to be in an inconsistent state. Use atomic.Cas
to avoid that possibility.

Updates #6239
Updates #27707
Fixes #35272

Change-Id: I1d59a13dc4f2ff4af110fc6e032c8c9d59cfc270
Reviewed-on: https://go-review.googlesource.com/c/go/+/204717
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 4, 2019
When dropping a P, if it has any timers, and if some thread is
sleeping in the netpoller, wake the netpoller to run the P's timers.
This mitigates races between the netpoller deciding how long to sleep
and a new timer being added.

In sysmon, if all P's are idle, check the timers to decide how long to sleep.
This avoids oversleeping if no thread is using the netpoller.
This can happen in particular if some threads use runtime.LockOSThread,
as those threads do not block in the netpoller.

Also, print the number of timers per P for GODEBUG=scheddetail=1.

Before this CL, TestLockedDeadlock2 would fail about 1% of the time.
With this CL, I ran it 150,000 times with no failures.

Updates #6239
Updates #27707
Fixes #35274
Fixes #35288

Change-Id: I7e5193e6c885e567f0b1ee023664aa3e2902fcd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 13, 2019

Change https://golang.org/cl/206938 mentions this issue: runtime: acquire timersLocks around moveTimers

gopherbot pushed a commit that referenced this issue Nov 13, 2019
In the discussion of CL 171828 we decided that it was not necessary to
acquire timersLock around the call to moveTimers, because the world is
stopped. However, that is not correct, as sysmon runs even when the world
is stopped, and it calls timeSleepUntil which looks through the timers.
timeSleepUntil acquires timersLock, but that doesn't help if moveTimers
is running at the same time.

Updates #6239
Updates #27707
Updates #35462

Change-Id: I346c5bde594c4aff9955ae430b37c2b6fc71567f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206938
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 15, 2019

Change https://golang.org/cl/207348 mentions this issue: runtime: release timersLock while running timer

gopherbot pushed a commit that referenced this issue Nov 19, 2019
Dan Scales pointed out a theoretical deadlock in the runtime.

The timer code runs timer functions while holding the timers lock for a P.
The scavenger queues up a timer function that calls wakeScavenger,
which acquires the scavenger lock.

The scavengeSleep function acquires the scavenger lock,
then calls resetTimer which can call addInitializedTimer
which acquires the timers lock for the current P.

So there is a potential deadlock, in that the scavenger lock and
the timers lock for some P may both be acquired in different order.
It's not clear to me whether this deadlock can ever actually occur.

Issue 35532 describes another possible deadlock.

The pollSetDeadline function acquires pd.lock for some poll descriptor,
and in some cases calls resettimer which can in some cases acquire
the timers lock for the current P.

The timer code runs timer functions while holding the timers lock for a P.
The timer function for poll descriptors winds up in netpolldeadlineimpl
which acquires pd.lock.

So again there is a potential deadlock, in that the pd lock for some
poll descriptor and the timers lock for some P may both be acquired in
different order. I think this can happen if we change the deadline
for a network connection exactly as the former deadline expires.

Looking at the code, I don't see any reason why we have to hold
the timers lock while running a timer function.
This CL implements that change.

Updates #6239
Updates #27707
Fixes #35532

Change-Id: I17792f5a0120e01ea07cf1b2de8434d5c10704dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/207348
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 19, 2019
Updates #6239
Updates #27707

Change-Id: I65e6471829c9de4677d3ac78ef6cd7aa0a1fc4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/171884
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2019

The ideas suggested here have been implemented for 1.14. Any further timer speedups can be done in new issues.

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 2, 2019

Change https://golang.org/cl/209580 mentions this issue: runtime: use current P's race context in timer code

gopherbot pushed a commit that referenced this issue Dec 2, 2019
We were using the race context of the P that held the timer,
but since we unlock the P's timers while executing a timer
that could lead to a race on the race context itself.

Updates #6239
Updates #27707
Fixes #35906

Change-Id: I5f9d5f52d8e28dffb88c3327301071b16ed1a913
Reviewed-on: https://go-review.googlesource.com/c/go/+/209580
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@minusdavid
Copy link

@minusdavid minusdavid commented Aug 23, 2020

After reviewing proc.go (and #18507 and c05b06a), am I right in assuming that the naming of "netpoll" is due to its origin for only polling for network I/O, but now the event loop uses the same "netpoll" mechanism to poll for file I/O and timers too?

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 23, 2020

Yes. Interruptible sleep is often modelled using select, kqueue, or epoll, all of which started life as ways to poll a set of non blocking file descriptors, of which a network fd was the most common as cross platform async file io is a unicorn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.