Preliminary work here: bnoordhuis/libuv@495ac83...d7f273a
Quoting the commit log:
Sometimes improves performance by a substantial margin (async_pummel_) but unfortunately
regresses async2 and async_pummel_8.
Benchmark numbers, higher is better:
                  HEAD~2    HEAD~1        HEAD
async1            92,403    89,284      95,232
async1            93,325    88,715      98,610
async1            93,467    89,509      97,823
async2           128,070   126,377      93,293
async2           135,520   105,071      93,227
async2           137,828   143,562      93,502
async4           198,192   200,984     221,910
async4           199,347   198,864     224,752
async4           200,839   203,165     224,204
async8           106,850   108,500     108,629
async8           108,112   108,449     110,243
async8           108,593   108,433     108,961
async_pummel_1   467,278   550,532   1,515,974
async_pummel_1   472,080   507,830   1,519,934
async_pummel_1   475,117   542,153   1,487,933
async_pummel_2   452,631   459,222   1,169,264
async_pummel_2   463,484   191,564     996,789
async_pummel_2   469,932   192,780   1,096,375
async_pummel_4   198,106   277,783     609,135
async_pummel_4   425,196   165,933     505,209
async_pummel_4   550,310   344,396     317,331
async_pummel_8   280,258   273,188     281,671
async_pummel_8   326,921   275,581     263,683
async_pummel_8   335,018   269,845     270,633
Needs further investigation.
Forgot to mention that HEAD~1 and HEAD show clear improvement on the million_async benchmark. Before:
million_async: 12,682,736 async events in 5.0 seconds (2,536,547/s)
After:
million_async: 15,959,391 async events in 5.0 seconds (3,191,878/s)
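To put a number on that improvement, here is a small sketch of the arithmetic, assuming the first million_async line is the "before" run and the second is the "after" run. The helper names are made up for illustration and are not part of the benchmark code.

```c
#include <assert.h>

/* Hypothetical helpers, not taken from the libuv benchmark sources. */

/* Throughput in events per second for a run of the given length. */
static double events_per_second(double events, double seconds)
{
    assert(seconds > 0.0);
    return events / seconds;
}

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Calling `percent_change(events_per_second(12682736.0, 5.0), events_per_second(15959391.0, 5.0))` gives a gain of a little under 26% under that assumption.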
I think we should try reworking my write-ptr patch instead. A shared structure under a lock carries a high cost under heavy concurrency (i.e. 24 threads or more).
I use async to communicate between multiple loops on different threads.
Indeed that's what everyone is using it for.
@bnoordhuis I ran the million_async benchmark on my branch today, and it looks 2x slower than current master. Go ahead with your thing, then :)
Why not use a pipe instead? IIRC there used to be a way to do it automatically in libuv a long time ago.
@saghul because when you have millions of async handles, walking through them every time starts to take a lot of time... Actually, you don't need millions of handles for this: 300-400 is already enough to feel the performance regression.
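A rough sketch of why the handle count matters, assuming the loop visits every registered async handle on each wakeup, as described above. `struct fake_async` and `scan_on_wakeup` are hypothetical stand-ins, not libuv's real types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an async handle; not libuv's real struct. */
struct fake_async {
    bool pending;   /* set by a sender, cleared by the loop */
    int fired;      /* how many times the loop dispatched this handle */
};

/* One wakeup: visit every handle and dispatch the ones marked pending.
 * Returns the number of handles visited, which is always n. */
static size_t scan_on_wakeup(struct fake_async *handles, size_t n)
{
    size_t visited = 0;

    assert(handles != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        visited++;
        if (handles[i].pending) {
            handles[i].pending = false;
            handles[i].fired++;
        }
    }
    return visited;
}
```

With 400 handles and a single pending one, the scan still touches all 400 entries to dispatch one callback, so the cost per wakeup grows with the number of handles rather than with the number of pending events.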
I have one async for every Loop. Once the async is called, it goes through a lockless queue, dequeues callbacks until there are none left, and executes them on the same thread the loop runs on. 24 loops = 24 asyncs.
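The pattern described above could look roughly like this in plain C11. This is only a sketch under assumptions: it uses a lock-free stack that is swapped out and reversed on drain rather than whatever queue the commenter actually uses, and every name in it (`task_queue`, `queue_push`, `queue_drain`, `record_digit`, `demo_drain_order`) is hypothetical. In a real setup the producer would wake the loop after pushing, for example with `uv_async_send`, and the drain would run inside the async callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is libuv API. */
typedef void (*task_fn)(void *arg);

struct task {
    struct task *next;
    task_fn fn;
    void *arg;
};

struct task_queue {
    _Atomic(struct task *) head;   /* most recently pushed task, or NULL */
};

/* Any thread: push a task without taking a lock. */
static void queue_push(struct task_queue *q, struct task *t)
{
    struct task *old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        t->next = old;             /* old is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->head, &old, t,
                 memory_order_release, memory_order_relaxed));
}

/* Loop thread: grab everything queued so far in one atomic swap,
 * restore FIFO order, then run each task. Returns the number run. */
static size_t queue_drain(struct task_queue *q)
{
    struct task *list =
        atomic_exchange_explicit(&q->head, NULL, memory_order_acquire);
    struct task *fifo = NULL;
    size_t n = 0;

    while (list != NULL) {         /* reverse LIFO into FIFO */
        struct task *next = list->next;
        list->next = fifo;
        fifo = list;
        list = next;
    }
    while (fifo != NULL) {
        struct task *next = fifo->next;  /* read before fn may reuse the task */
        assert(fifo->fn != NULL);
        fifo->fn(fifo->arg);
        fifo = next;
        n++;
    }
    return n;
}

/* Demo state: a decimal trace, so the execution order is visible. */
static int trace_value;

static void record_digit(void *arg)
{
    trace_value = trace_value * 10 + *(int *)arg;
}

/* Pushes three tasks carrying 1, 2, 3, drains once, returns the trace. */
static int demo_drain_order(void)
{
    struct task_queue q;
    struct task tasks[3];
    int digits[3] = { 1, 2, 3 };

    atomic_init(&q.head, NULL);
    trace_value = 0;
    for (int i = 0; i < 3; i++) {
        tasks[i].fn = record_digit;
        tasks[i].arg = &digits[i];
        queue_push(&q, &tasks[i]);
    }
    queue_drain(&q);
    return trace_value;
}
```

Because one wakeup drains the whole batch, many pushes can be served by a single async per loop, which fits the "24 loops = 24 asyncs" setup.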
I also used this async to implement uv_run_nowait (async_send before running run_once) and Ref and Unref on the Loop. They simply count the refs and unrefs and call Unref or Ref on the async instance, which is quite handy for integrating libuv into the C# await/async/context paradigm.
@txdv I see. We do have a nowait run now ;-) I also thought the problem was with the number of asyncs per loop, not the total number of them.
FWIW, b04fc33 speeds up the async handle benchmarks by 10-330% (not a typo) on Linux so it's a less pressing issue now - for Linux, anyway.
Close this please, the commit is in mainline.