runtime: make timers faster #6239

Open
dvyukov opened this Issue Aug 24, 2013 · 8 comments

@dvyukov
Member

dvyukov commented Aug 24, 2013

This is a follow-up to:
https://golang.org/cl/12876047/
time: lower level interface to Timer: embedding, compact interface callback with fast callback

Timers can be heavily used in networking applications.
The current implementation has, at the very least, scalability problems:

$ ./time.test -test.run=none -test.bench=StartStop -test.benchtime=1s -test.cpu=1,2,4,8,16,32
PASS
BenchmarkStartStop  10000000           214 ns/op
BenchmarkStartStop-2     5000000           515 ns/op
BenchmarkStartStop-4     5000000           735 ns/op
BenchmarkStartStop-8     2000000           804 ns/op
BenchmarkStartStop-16    5000000           708 ns/op
BenchmarkStartStop-32    5000000           679 ns/op

Some spot optimizations can be applied as well. A more efficient data structure could probably be used, but it's not clear to me how to do better than the current 4-ary heap.
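For readers unfamiliar with the structure mentioned above, here is a minimal sketch of a 4-ary min-heap keyed on deadlines. It is an illustration only, not the runtime's code: the type `timerHeap` and its methods are hypothetical names, and only the `siftUp`/`siftDown` roles mirror the `siftup` and `siftdown` entries in the profile below. Each node has up to four children, so the tree is shallower than a binary heap at the cost of more comparisons per level in `siftDown`.

```go
package main

import "fmt"

// timerHeap is a hypothetical 4-ary min-heap of deadlines in nanoseconds.
// The children of index i live at 4*i+1 .. 4*i+4; the parent of i is (i-1)/4.
type timerHeap struct {
	when []int64
}

// push adds a deadline and restores heap order by moving it up.
func (h *timerHeap) push(w int64) {
	h.when = append(h.when, w)
	h.siftUp(len(h.when) - 1)
}

// popMin removes and returns the earliest deadline, if any.
func (h *timerHeap) popMin() (int64, bool) {
	n := len(h.when)
	if n == 0 {
		return 0, false
	}
	top := h.when[0]
	h.when[0] = h.when[n-1]
	h.when = h.when[:n-1]
	if len(h.when) > 0 {
		h.siftDown(0)
	}
	return top, true
}

// siftUp moves the element at index i toward the root until its parent is not larger.
func (h *timerHeap) siftUp(i int) {
	w := h.when[i]
	for i > 0 {
		p := (i - 1) / 4
		if h.when[p] <= w {
			break
		}
		h.when[i] = h.when[p]
		i = p
	}
	h.when[i] = w
}

// siftDown moves the element at index i toward the leaves,
// swapping with the smallest of its (up to four) children.
func (h *timerHeap) siftDown(i int) {
	n := len(h.when)
	w := h.when[i]
	for {
		first := 4*i + 1
		if first >= n {
			break
		}
		end := first + 4
		if end > n {
			end = n
		}
		c := first
		for j := first + 1; j < end; j++ {
			if h.when[j] < h.when[c] {
				c = j
			}
		}
		if h.when[c] >= w {
			break
		}
		h.when[i] = h.when[c]
		i = c
	}
	h.when[i] = w
}

func main() {
	h := &timerHeap{}
	for _, w := range []int64{50, 10, 40, 30, 20, 60, 5} {
		h.push(w)
	}
	// Deadlines come back in ascending order.
	for {
		w, ok := h.popMin()
		if !ok {
			break
		}
		fmt.Println(w)
	}
}
```

Note that the sketch has no locking; in the benchmark above the cost is dominated by contention on the lock around the heap rather than by the heap operations themselves.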

FTR here is BenchmarkStartStop profile with 8 procs:

+  13.75%  time.test  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
+  11.25%  time.test  time.test          [.] runtime.lock
+  11.15%  time.test  time.test          [.] runtime.xchg
+   6.89%  time.test  time.test          [.] runtime.procyield
+   6.32%  time.test  [kernel.kallsyms]  [k] _raw_spin_lock
+   4.06%  time.test  time.test          [.] runtime.cas
+   3.49%  time.test  [kernel.kallsyms]  [k] gup_pte_range
+   1.87%  time.test  time.test          [.] runtime.deltimer
+   1.80%  time.test  [kernel.kallsyms]  [k] get_futex_key
+   1.71%  time.test  [kernel.kallsyms]  [k] put_page
+   1.58%  time.test  [kernel.kallsyms]  [k] try_to_wake_up
+   1.55%  time.test  [kernel.kallsyms]  [k] __wait_on_bit_lock
+   1.42%  time.test  time.test          [.] flushptrbuf
+   1.38%  time.test  [kernel.kallsyms]  [k] get_user_pages_fast
+   1.38%  time.test  time.test          [.] siftup
+   1.22%  time.test  [kernel.kallsyms]  [k] copy_user_generic_string
+   1.19%  time.test  time.test          [.] runtime.casp
+   1.10%  time.test  [kernel.kallsyms]  [k] unlock_page
+   1.04%  time.test  [kernel.kallsyms]  [k] get_futex_key_refs
+   1.01%  time.test  time.test          [.] addtimer
+   1.00%  time.test  [kernel.kallsyms]  [k] drop_futex_key_refs
+   0.98%  time.test  [kernel.kallsyms]  [k] prepare_to_wait_exclusive
+   0.97%  time.test  [kernel.kallsyms]  [k] __wake_up_bit
+   0.94%  time.test  [kernel.kallsyms]  [k] __wake_up_common
+   0.81%  time.test  [kernel.kallsyms]  [k] audit_filter_syscall
+   0.75%  time.test  [kernel.kallsyms]  [k] __schedule
+   0.72%  time.test  time.test          [.] runtime.mallocgc
+   0.72%  time.test  time.test          [.] siftdown

@rsc

Contributor

rsc commented Nov 27, 2013

Comment 1:

Labels changed: added go1.3maybe.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 2:

Labels changed: added release-none, removed go1.3maybe.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 3:

Labels changed: added repo-main.

@dvyukov

Member

dvyukov commented Apr 6, 2016

@aclements @RLH @randall77

Optimization of the timer heap can give up to 10%. On some of our prod servers we see 5-8% of time in siftdown.

But the real problem is the global mutex. We need to:

  1. Merge timers into netpoll (epoll/kqueue/IOCP can wait with timeout).
  2. Get rid of the timer thread (this will also reduce timer latency, because currently we need 2 thread context switches to handle a timer).
  3. Distribute netpoll+timers per P.

Here is a prototype for 1 and 2:
https://go-review.googlesource.com/#/c/5760
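
As a rough illustration of items 1 and 2 (this is not the code from the CL above), the sketch below shows the general idea of letting the poller's wait timeout be derived from the earliest timer deadline, so that the same thread that waits for I/O also notices expired timers. All names here (`pollTimeout`, `waitEvent`) are hypothetical, and `waitEvent` is only a channel-based stand-in for a real `epoll_wait`/`kevent` call with a timeout argument.

```go
package main

import (
	"fmt"
	"time"
)

// pollTimeout returns how long (in nanoseconds) a poller may block.
// It returns -1 ("block indefinitely") when no timer is pending,
// 0 when the earliest timer is already due, and the remaining time otherwise.
func pollTimeout(now, next int64, hasTimer bool) int64 {
	if !hasTimer {
		return -1
	}
	if d := next - now; d > 0 {
		return d
	}
	return 0
}

// waitEvent is a stand-in for epoll_wait/kevent (an assumption of this sketch):
// it blocks until an event arrives or timeoutNs elapses.
// A negative timeout blocks forever; a zero timeout only polls.
func waitEvent(events <-chan string, timeoutNs int64) (string, bool) {
	if timeoutNs < 0 {
		return <-events, true
	}
	if timeoutNs == 0 {
		select {
		case ev := <-events:
			return ev, true
		default:
			return "", false
		}
	}
	t := time.NewTimer(time.Duration(timeoutNs))
	defer t.Stop()
	select {
	case ev := <-events:
		return ev, true
	case <-t.C:
		return "", false
	}
}

func main() {
	events := make(chan string) // no I/O arrives in this demo
	deadline := time.Now().UnixNano() + int64(20*time.Millisecond)
	for {
		timeout := pollTimeout(time.Now().UnixNano(), deadline, true)
		if timeout == 0 {
			// The timer is handled by the polling loop itself,
			// without waking a separate timer thread.
			fmt.Println("timer fired on the polling loop")
			return
		}
		if ev, ok := waitEvent(events, timeout); ok {
			fmt.Println("I/O event:", ev)
		}
	}
}
```

The point of the design is that there is a single blocking wait covering both I/O readiness and timer expiry, which is what removes the dedicated timer thread and the extra context switches.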

Distribution is somewhat tricky. We will need hierarchical epoll descriptors, and kqueue/IOCP do not support hierarchical descriptors AFAICT. For timers we will need to somehow block on the global minimum time, rather than on the per-P minimum time.
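
To make the "global minimum" concern concrete, here is a small sketch of timers split across several independently locked shards (standing in for per-P timer heaps). It is not a proposal for the runtime; `shardedTimers`, `add`, and `globalMin` are hypothetical names. Adding a timer only takes one shard's lock, which avoids the single global mutex, but anything that wants to sleep until the next timer has to look across all shards.

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

// int64Heap is a min-heap of deadlines implementing heap.Interface.
type int64Heap []int64

func (h int64Heap) Len() int            { return len(h) }
func (h int64Heap) Less(i, j int) bool  { return h[i] < h[j] }
func (h int64Heap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *int64Heap) Push(x interface{}) { *h = append(*h, x.(int64)) }
func (h *int64Heap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// shard holds one independently locked timer heap (a stand-in for a per-P heap).
type shard struct {
	mu     sync.Mutex
	timers int64Heap
}

// shardedTimers is a hypothetical container of per-shard timer heaps.
type shardedTimers struct {
	shards []shard
}

func newShardedTimers(n int) *shardedTimers {
	return &shardedTimers{shards: make([]shard, n)}
}

// add inserts a deadline into the shard selected by p, locking only that shard.
func (s *shardedTimers) add(p int, when int64) {
	sh := &s.shards[p%len(s.shards)]
	sh.mu.Lock()
	heap.Push(&sh.timers, when)
	sh.mu.Unlock()
}

// globalMin scans every shard for the earliest deadline.
// A sleeper must use this value, not its own shard's minimum,
// or timers in other shards could fire late.
func (s *shardedTimers) globalMin() (int64, bool) {
	var best int64
	found := false
	for i := range s.shards {
		sh := &s.shards[i]
		sh.mu.Lock()
		if len(sh.timers) > 0 && (!found || sh.timers[0] < best) {
			best, found = sh.timers[0], true
		}
		sh.mu.Unlock()
	}
	return best, found
}

func main() {
	st := newShardedTimers(4)
	var wg sync.WaitGroup
	for p := 0; p < 4; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for k := 1; k <= 3; k++ {
				st.add(p, int64(100*k+p))
			}
		}(p)
	}
	wg.Wait()
	if m, ok := st.globalMin(); ok {
		fmt.Println("earliest deadline across all shards:", m)
	}
}
```

The scan in `globalMin` is the simplest possible approach and is itself a cross-shard operation; how to do this cheaply is exactly the open question raised above.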

@bryanpkc

Contributor

bryanpkc commented Nov 28, 2017

@dvyukov How does CL 5760 "get rid of the timer thread"? There is a timerproc goroutine which remains after the change. I am probably just too dumb to see it, but could you point me to where the timer thread was?

@dvyukov

Member

dvyukov commented Nov 28, 2017

timerproc blocks in notetsleepg, which consumes a thread, so timerproc is also always a thread.

@bryanpkc

Contributor

bryanpkc commented Nov 28, 2017

Thanks for the quick explanation. If I am reading it correctly, at least notetsleepg will hand off the P to another thread right away, so it doesn't just block the P for a whole sysmon tick.

@dvyukov

Member

dvyukov commented Nov 28, 2017

Yes, it hands off the P, but it still consumes a thread and causes 2 context switches per timer.
