Skip to content
This repository has been archived by the owner on May 4, 2018. It is now read-only.

windows: use a heap to hold timers on windows, #1460 #1476

Closed
wants to merge 1 commit into from
Closed

windows: use a heap to hold timers on windows, #1460 #1476

wants to merge 1 commit into from

Conversation

cynron
Copy link

@cynron cynron commented Sep 15, 2014

a new pr on v1.x, #1475


if (a->start_id < b->start_id)
return 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: use 2 spaces for indenting

@saghul
Copy link
Contributor

saghul commented Sep 15, 2014

Left some comments, but it looks solid, thanks!

@cynron
Copy link
Author

cynron commented Sep 15, 2014

i have updated it according to your comments


timer = container_of(heap_node, uv_timer_t, heap_node);
if (timer->due > loop->time)
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: 2 spaces.

@saghul
Copy link
Contributor

saghul commented Sep 15, 2014

Looks good! I'll run tests a bit later in the day.

@cynron
Copy link
Author

cynron commented Sep 15, 2014

Thanks for your patient help.

@saghul
Copy link
Contributor

saghul commented Sep 15, 2014

I'm going to hold this one, for some reason it seems to introduce a quite acute performance penalty, as measured by the million timer benchmark:

pr/1476
=======

C:\Users\saghul\src\libuv\Debug>run-benchmarks.exe million_timers
27.95 seconds total
3.09 seconds init
23.84 seconds dispatch
1.02 seconds cleanup

C:\Users\saghul\src\libuv\Debug>run-benchmarks.exe million_timers
28.08 seconds total
3.09 seconds init
23.97 seconds dispatch
1.03 seconds cleanup

C:\Users\saghul\src\libuv\Debug>run-benchmarks.exe million_timers
28.59 seconds total
3.13 seconds init
24.43 seconds dispatch
1.03 seconds cleanup

------------------------------------------------------------------------------

v1.x
====

C:\Users\saghul\src\libuv\Debug>run-benchmarks.exe million_timers
15.15 seconds total
10.92 seconds init
3.20 seconds dispatch
1.02 seconds cleanup

C:\Users\saghul\src\libuv\Debug>run-benchmarks.exe million_timers
15.39 seconds total
10.92 seconds init
3.45 seconds dispatch
1.02 seconds cleanup

C:\Users\saghul\src\libuv\Debug>run-benchmarks.exe million_timers
15.21 seconds total
10.97 seconds init
3.22 seconds dispatch
1.02 seconds cleanup

@cynron
Copy link
Author

cynron commented Sep 15, 2014

Yeah, I have found it, I will review the code carefully

@saghul
Copy link
Contributor

saghul commented Sep 16, 2014

Thank you!
On Sep 16, 2014 1:49 AM, "cynron" notifications@github.com wrote:

Yeah, I have found it, I will review the code carefully


Reply to this email directly or view it on GitHub
#1476 (comment).

@txdv
Copy link
Contributor

txdv commented Sep 16, 2014

Maybe we should try RB trees on the linux branch?

@saghul
Copy link
Contributor

saghul commented Sep 16, 2014

The heap was faster there, that's why we switched to it in the first place.

@saghul
Copy link
Contributor

saghul commented Sep 16, 2014

Also, conceptually, a heap is the appropriate data structure to use here. I didn't dive into the details as of why this patch makes the code slower though.

@bnoordhuis
Copy link
Contributor

A heap is faster when the prevalent operation is looking up the first element (i.e. the min-heap operation) but it may be slower when there is a lot of mutation taking place. A heap has to walk up all the way to the root when a node is removed whereas a red-black tree usually only has to move up two or three nodes.

The million-timers benchmark is kind of pathological in that it creates a large number of timers, then immediately expires and removes them again. I added it to make sure that worst case behavior with a heap is not significantly worse than with a red-black tree but it's not really representative of average real world usage. It does surprise me that performance is so bad on Windows, though.

@saghul
Copy link
Contributor

saghul commented Sep 26, 2014

Oh, you're totally right, Ben! I'll reevaluate.

@cynron
Copy link
Author

cynron commented Sep 28, 2014

Ben is exactly right! I have done a performance analysis, and the report below:

pr/1476

19.88 seconds total
2.32 seconds init
16.97 seconds dispatch
0.59 seconds cleanup

Call Hierarchy:
"(99.75) RtlInitializeExceptionChain ?"
,"(99.75) RtlInitializeExceptionChain ?"
,,"(99.75) BaseThreadInitThunk ?"
,,,"(99.75) mainCRTStartup"
,,,,"(99.75) __tmainCRTStartup"
,,,,,"(99.73) main"
,,,,,,"(98.16) run_test_part"
,,,,,,,"(98.16) run_benchmark_million_timers"
,,,,,,,,"(83.31) uv_run"
,,,,,,,,,"(82.08) uv_process_timers"
,,,,,,,,,,"(81.06) uv_timer_stop"
,,,,,,,,,,,"(79.89) heap_remove"
,,,,,,,,,,,,"(40.76) timer_less_than"
,,,,,,,,,,,,"(20.95) heap_node_swap"
,,,,,,,,,,,,,"(3.59) _RTC_CheckStackVars"
,,,,,,,,,,,,"(0.13) _RTC_CheckStackVars"
,,,,,,,,,,,"(0.86) _RTC_CheckEsp"
,,,,,,,,,,"(0.27) uv_timer_again"
,,,,,,,,,,"(0.05) timer_cb"
,,,,,,,,,,"(0.03) heap_min"
,,,,,,,,,"(1.21) uv_process_endgames"
,,,,,,,,,,"(0.20) uv_timer_endgame"
,,,,,,,,,,"(0.05) _RTC_CheckEsp"
,,,,,,,,,"(0.01) uv_poll_ex"
,,,,,,,,,,"(0.01) ZwRemoveIoCompletionEx ?"
,,,,,,,,"(10.56) uv_timer_start"
,,,,,,,,,"(9.51) heap_insert"
,,,,,,,,,,"(0.68) timer_less_than"
,,,,,,,,,"(0.28) get_clamped_due_time"
,,,,,,,,"(1.39) malloc"
,,,,,,,,,"(1.39) _nh_malloc_dbg"
,,,,,,,,,,"(1.39) _nh_malloc_dbg_impl"
,,,,,,,,,,,"(1.39) memset"
,,,,,,,,"(1.25) uv_close"
,,,,,,,,,"(0.11) uv_want_endgame"
,,,,,,,,,"(0.08) uv_timer_stop"
,,,,,,,,"(0.63) free"
,,,,,,,,,"(0.63) _free_dbg"
,,,,,,,,,,"(0.62) memset"
,,,,,,,,,,"(0.01) _free_dbg_nolock"
,,,,,,,,,,,"(0.01) _free_base"

As we can see from the report, the test spends 79.89% time on heap_remove

v1.x

12.24 seconds total
9.12 seconds init
2.57 seconds dispatch
0.55 seconds cleanup

Call Hierarchy:
"(99.44) RtlInitializeExceptionChain ?"
,"(99.44) RtlInitializeExceptionChain ?"
,,"(99.44) BaseThreadInitThunk ?"
,,,"(99.44) mainCRTStartup"
,,,,"(99.44) __tmainCRTStartup"
,,,,,"(99.42) main"
,,,,,,"(97.29) run_test_part"
,,,,,,,"(97.29) run_benchmark_million_timers"
,,,,,,,,"(69.04) uv_timer_start"
,,,,,,,,,"(67.41) uv_timer_tree_s_RB_INSERT"
,,,,,,,,,,"(50.64) uv_timer_compare"
,,,,,,,,,,"(4.89) uv_timer_tree_s_RB_INSERT_COLOR"
,,,,,,,,,"(0.54) get_clamped_due_time"
,,,,,,,,"(22.05) uv_run"
,,,,,,,,,"(20.08) uv_process_timers"
,,,,,,,,,,"(14.63) uv_timer_tree_s_RB_MINMAX"
,,,,,,,,,,"(4.48) uv_timer_stop"
,,,,,,,,,,,"(4.06) uv_timer_tree_s_RB_REMOVE"
,,,,,,,,,,,,"(3.29) uv_timer_tree_s_RB_REMOVE_COLOR"
,,,,,,,,,,,"(0.04) _RTC_CheckEsp"
,,,,,,,,,,"(0.17) uv_timer_again"
,,,,,,,,,,"(0.04) timer_cb"
,,,,,,,,,,"(0.02) _RTC_CheckEsp"
,,,,,,,,,"(1.95) uv_process_endgames"
,,,,,,,,,,"(0.46) uv_timer_endgame"
,,,,,,,,,,,"(0.02) close_cb"
,,,,,,,,,,"(0.02) _RTC_CheckEsp"
,,,,,,,,,"(0.02) _RTC_CheckEsp"
,,,,,,,,"(2.30) malloc"
,,,,,,,,,"(2.30) _nh_malloc_dbg"
,,,,,,,,,,"(2.30) _nh_malloc_dbg_impl"
,,,,,,,,,,,"(2.30) memset"
,,,,,,,,"(1.95) uv_close"

the uv_timer_tree_s_RB_REMOVE just costs 4.06% time.

@txdv
Copy link
Contributor

txdv commented Sep 28, 2014

what did you use to test this? looks like a nice tool

@cynron
Copy link
Author

cynron commented Sep 29, 2014

@txdv I used LTProf, ^^

@saghul
Copy link
Contributor

saghul commented Sep 29, 2014

I was about to write a benchmark which tested a non-pathological usage of timers, to test this impact again, but when reading the benchmark this is what it does:

  • create timers in 1000 blocks, having each block expire 1ms apart from each other
  • run the loop until all are done
  • close all timers
  • run the loop again

Is this really the pathological use case @bnoordhuis? We are actually leveing the timers expire. Well, each loop iteration we'll remove a thousand of them, but they are not readded because they are not repeating, so I'm a bit confused :-S

@bnoordhuis
Copy link
Contributor

It's pathological in that most activity is inserting/deleting timers whereas the common use case normally is looking up the first expiring timer. You can approximate the common use case by calling uv_run(UV_RUN_NOWAIT) a few thousand times between steps.

@bnoordhuis
Copy link
Contributor

Oh, another thing: it's possible the heap performs worse due to cache eviction issues. Benchmark numbers on my machine were pretty good but it has obscene amounts of L1 and L2 data cache. YMMV.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants