This repository has been archived by the owner. It is now read-only.

Does linux kernel version affect the performance of asyncio ? #311

Closed
ly0 opened this Issue Jan 15, 2016 · 21 comments


ly0 commented Jan 15, 2016

I used asyncio-redis to access redis, but got different performance on different kernels.
Running the same code on Linux 2.6.32 is far slower than on 3.19.0.

@ly0 ly0 changed the title from Does linux kernel affect the performance of asyncio ? to Does linux kernel version affect the performance of asyncio ? Jan 15, 2016

Member

1st1 commented Jan 15, 2016

The kernel affects everything you run on Linux. Without some profiling it's hard to say what specifically was going on. Maybe it's not even a problem with asyncio, but with redis?
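A quick way to follow that advice is to run the suspect call under cProfile and see where the time actually goes. This is a generic sketch (the `profile` helper is mine, not from this issue's code):

```python
import cProfile
import io
import pstats

def profile(fn, *args, **kwargs):
    """Run fn under cProfile and print the five hottest call sites."""
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    out = io.StringIO()
    pstats.Stats(prof, stream=out).sort_stats('cumulative').print_stats(5)
    print(out.getvalue())
    return result

# For an asyncio program, one option is to profile the blocking entry point,
# e.g. profile(loop.run_until_complete, some_coroutine()).
```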

asvetlov commented Jan 15, 2016

I bet it's because asyncio-redis doesn't use the TCP_NODELAY option.
Modern Linux has much more comfortable defaults for this.
I ran into the same problem during aiohttp development.
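For reference, TCP_NODELAY disables Nagle's algorithm, so small writes are sent immediately instead of being buffered behind an unacknowledged segment. A minimal sketch of enabling it (the helper name is mine; with asyncio the raw socket is reachable via transport.get_extra_info('socket')):

```python
import socket

def enable_nodelay(sock: socket.socket) -> None:
    # Disable Nagle's algorithm: small segments go out immediately
    # instead of waiting for the previous segment to be ACKed.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# With asyncio, grab the socket from the transport:
#   sock = transport.get_extra_info('socket')
#   if sock is not None:
#       enable_nodelay(sock)
```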

Member

methane commented Jan 16, 2016

Modern Linux has much more comfortable defaults for this.

I'm very interested in this.
Could you give me more info about it?

asvetlov commented Jan 16, 2016

Sorry, I cannot point to a specific Linux kernel change (version, release notes, etc.).

What I know: in aiohttp we had poor performance in a benchmark run over a single connection with HTTP keep-alives enabled. The benchmark was executed on localhost, both client and server.

I was able to reproduce the issue on Ubuntu 12.04 LTS.

After installing a new Ubuntu on another laptop (15.04 or 15.10, I honestly don't remember, but I'm pretty sure 14.10 had the issue), the problem disappeared on the new Ubuntu but was still present on the old 12.04.

After that, a month ago, I added TCP_CORK / TCP_NODELAY handling to aiohttp. Basically it switches on TCP_CORK at the start of the response, sends the HTTP headers and body in two send syscalls, and disables TCP_CORK while enabling TCP_NODELAY at the end of the HTTP response transmission.

The fix solves the slow-performance problem on 12.04 as well. Honestly, TCP_CORK is not necessary for the demonstration; TCP_NODELAY alone was enough.

A friend told me it is most likely coupled with TCP_QUICKACK being enabled by default now, but I did not test that explicitly.

From my understanding, TCP_QUICKACK doesn't make the TCP_NODELAY option obsolete, because it has to be toggled on the peer side, while TCP_NODELAY doesn't rely on the peer's settings.

That's the summary of my current understanding. If you find more information, please share.
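The cork/uncork dance described above can be sketched with plain socket options. This is Linux-only (TCP_CORK is a Linux extension), and the helper names are mine, not aiohttp's API:

```python
import socket

def start_response(sock: socket.socket) -> None:
    # Cork the socket: the headers and body written next are held
    # back and coalesced into as few TCP segments as possible.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)

def finish_response(sock: socket.socket) -> None:
    # Uncork and enable TCP_NODELAY: anything still buffered is
    # flushed right away instead of waiting on Nagle / delayed ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

The write path in between (headers, then body) stays unchanged; only the option toggles frame it.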

asvetlov commented Jan 16, 2016

@tailhook maybe you have something to add?

tailhook commented Jan 16, 2016

Okay, here is a link about TCP_NODELAY and TCP_QUICKACK: https://news.ycombinator.com/item?id=10608356

If it's really related to NODELAY and QUICKACK, then it's probably fixed in Linux 3.14, but this is just a guess.

(And my guess about TCP_QUICKACK being enabled by default, which @asvetlov mentioned, is probably wrong; at least, I have no evidence to back it up.)

Anyway, without any info about the specific test performed and the actual numbers, it's hard to take this issue seriously. In particular, aio-redis buffers the whole request and sends it in one syscall, AFAIK, and in that case it's not affected by NODELAY and QUICKACK. It's an entirely different project from asyncio-redis, though.

Also, there are many TCP stack improvements in this range of kernel versions.

asvetlov commented Jan 18, 2016

Just a note: the Ubuntu 12.04 LTS mentioned above (which requires explicit NODELAY) runs a Linux 3.13 kernel.

ly0 commented Jan 20, 2016

OK, here's some simple code.
It's fast with a single client, but when the number of clients increases, the query time increases significantly, even up to 100 times (see the output in the console).
I can't figure out where the problem is.

import asyncio
import time

import asyncio_redis
from aiohttp import web

connection_args = {
    'host': '127.0.0.1',
    'port': 6379,
    'poolsize': 100,
    'auto_reconnect': True
}

# test data
keys = ["10003:1000031463",
        "10003:1000031055",
        "10003:1000031467",
        "10003:1000031077",
        "10003:1000031868",
        "10003:1000031464",
        "10003:1000031076",
        "10003:1000031934",
        "10003:1000031834",
        "10003:1000031469",
        "10003:1000031068",
        "10003:1000031462",
        "10003:1000031466",
        "10003:1000031899",
        "10003:1000031858",
        "10003:1000031900",
        "10003:1000031056",
        "10003:1000031078",
        "10003:1000031340",
        "10003:1000031331"]

client = asyncio.get_event_loop().run_until_complete(
    asyncio_redis.Pool.create(**connection_args))


async def hello(request):
    # time a single MGET round-trip through the connection pool
    start = time.time()
    result = await client.mget(keys)
    result_real = await result.aslist()
    print(time.time() - start)
    return web.Response(text=str(result_real))


loop = asyncio.get_event_loop()

# populate the test keys
for k in keys:
    loop.run_until_complete(client.set(k, '1'))

app = web.Application()
app.router.add_route('GET', '/', hello)

handler = app.make_handler()
f = loop.create_server(handler, '0.0.0.0', 8080)
srv = loop.run_until_complete(f)

try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    srv.close()
    loop.run_until_complete(srv.wait_closed())
    loop.run_until_complete(handler.finish_connections(1.0))
    loop.run_until_complete(app.finish())
tailhook commented Jan 20, 2016

@ly0, can you show us how you test? Do you use wrk or ab? The exact command line? Also, please show the test results from the different kernels. It takes too much time to test with different kernels and run the benchmarks on real hardware (and it's almost useless to benchmark in a virtual machine). The test results would show us whether there is a real issue or the result is expected, and the order of magnitude of the difference. Also, looking at the CPU usage of the server with different numbers of clients would be helpful. I.e. if a process uses 100% of a CPU with one client, it's expected that it will look about 100x slower with 100 clients (that's an edge case, of course).

Anyway, thank you for trying to investigate the issue.

ly0 commented Jan 24, 2016

Ummm... let's forget the kernel differences; first I want to figure out why this code exhibits such erratic response times.
Obviously it's far too slow: just an mget from redis shouldn't take this long. 0.0072994232177734375 is acceptable at least, but 0.1448051929473877 is too bad.

Here is a part of the output:

...
0.1080176830291748
0.1448051929473877
0.12695050239562988
0.10846233367919922
0.14527034759521484
0.12740874290466309
0.10893917083740234
0.14483380317687988
0.1278972625732422
0.10933065414428711
0.14528989791870117
0.12837624549865723
0.1097879409790039
0.14567160606384277
0.1288306713104248
0.11026549339294434
0.1461162567138672
0.12933778762817383
0.11076664924621582
0.14661741256713867
0.1298220157623291
0.11123061180114746
0.14703106880187988
0.13022756576538086
0.11162781715393066
0.1474297046661377
0.1306285858154297
0.11207866668701172
0.1478431224822998
0.1310739517211914
0.11252617835998535
0.14826011657714844
0.13150715827941895
0.1129751205444336
0.14873504638671875
0.13202333450317383
0.11342573165893555
0.1491858959197998
0.13248419761657715
0.11384844779968262
0.1495809555053711
0.1329026222229004
0.11420822143554688
0.15000414848327637
0.13330650329589844
0.1146395206451416
0.15040111541748047
0.13375425338745117
0.11508774757385254
0.15084528923034668
0.13423490524291992
0.1155552864074707
0.15129709243774414
0.1346888542175293
0.11595749855041504
0.15169405937194824
0.1350243091583252
0.1163642406463623
0.1520991325378418
0.13540124893188477
0.11674857139587402
0.15247631072998047
0.13579106330871582
0.11713695526123047
0.15285706520080566
0.1361536979675293
0.11752533912658691
0.15329289436340332
0.1366562843322754
0.11803507804870605
0.15376782417297363
0.13707804679870605
0.11842203140258789
0.15414834022521973
0.1374831199645996
0.11881351470947266
0.15452933311462402
0.13788771629333496
0.11919331550598145
0.15491104125976562
0.1382598876953125
0.11960124969482422
0.15535187721252441
0.13872265815734863
0.12000131607055664
0.15575528144836426
0.13912200927734375
0.12040066719055176
0.15617156028747559
0.13958001136779785
0.11180520057678223
0.15662550926208496
0.14000225067138672
0.11219191551208496
0.15702486038208008
0.14040517807006836
0.11257791519165039
0.15742111206054688
0.14083313941955566
0.11301183700561523
0.15789246559143066
0.14128351211547852
0.11341714859008789
0.1581873893737793
0.14168667793273926
0.11384129524230957
0.15860319137573242
0.14209699630737305
0.11425209045410156
0.15902018547058105
0.14260387420654297
0.11469817161560059
0.15951085090637207
0.14302444458007812
0.159745454788208
0.14324378967285156
0.1599743366241455
0.14348912239074707
0.16021347045898438
0.14372682571411133
0.16043591499328613
0.14393925666809082
0.16064858436584473
0.14417266845703125
0.16091203689575195
0.14442849159240723
0.16112303733825684
0.14467477798461914
0.16135525703430176
0.1449265480041504
0.1616809368133545
0.14516973495483398
0.16190409660339355
0.14540791511535645
0.16222620010375977
0.14572596549987793
0.16245007514953613
0.1459639072418213
0.16269707679748535
0.14618515968322754
0.1629188060760498
0.14637088775634766
0.16311883926391602
0.14659547805786133
0.16329622268676758
0.14684104919433594
0.16349530220031738
0.1470658779144287
0.10486626625061035
0.10479283332824707
0.0072994232177734375
0.007265567779541016
0.007268190383911133
0.007291555404663086
0.007415294647216797
0.007447242736816406
...

Test command

ab -kc 1000 -n 10000 http://127.0.0.1:8080/

And I found an interesting thing: the response time increases as the number of requests increases.
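One way to read these numbers (an interpretation, not a measurement): with ab -c 1000 against a single-process server, all in-flight requests share one event loop, so each request's wall-clock time grows with how many others are queued ahead of it. Rough arithmetic using the observed ~7.3 ms single-client time:

```python
# Observed single-client mget time (from the tail of the output above).
single_request_s = 0.0073

# If roughly 20 requests are in flight on the same event loop, each one
# observes about 20x the solo latency before its reply is written out.
in_flight = 20
expected_latency = in_flight * single_request_s
print(round(expected_latency, 3))  # 0.146, close to the ~0.14s seen above
```

That would also explain why the times drop back to ~0.007s at the very end of the run, once the queue drains.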

Member

methane commented Jan 26, 2016

Without TCP_NODELAY, the ACK is sent without a payload.
This is tcpdump output on the echo server side.

$ uname -a
Linux bench 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux
$ sudo tcpdump -n port 8888
12:17:03.947686 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [S], seq 3519676234, win 28400, options [mss 1420,sackOK,TS val 410897 ecr 0,nop,wscale 7], length 0
12:17:03.947744 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [S.], seq 1615309279, ack 3519676235, win 28160, options [mss 1420,sackOK,TS val 4665864 ecr 410897,nop,wscale 7], length 0
12:17:03.948356 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [.], ack 1, win 222, options [nop,nop,TS val 410897 ecr 4665864], length 0
12:17:03.948607 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 1:13, ack 1, win 222, options [nop,nop,TS val 410897 ecr 4665864], length 12
12:17:03.948616 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [.], ack 13, win 220, options [nop,nop,TS val 4665864 ecr 410897], length 0
12:17:03.950441 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 1:13, ack 13, win 220, options [nop,nop,TS val 4665865 ecr 410897], length 12
12:17:03.950638 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [.], ack 13, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 0
12:17:03.950962 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 13:25, ack 13, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 12
12:17:03.951292 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 13:25, ack 25, win 220, options [nop,nop,TS val 4665865 ecr 410898], length 12
12:17:03.951646 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 25:37, ack 25, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 12
12:17:03.951976 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 25:37, ack 37, win 220, options [nop,nop,TS val 4665865 ecr 410898], length 12
12:17:03.952198 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 37:49, ack 37, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 12
12:17:03.952465 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 37:49, ack 49, win 220, options [nop,nop,TS val 4665865 ecr 410898], length 12
12:17:03.952653 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 49:61, ack 49, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 12
12:17:03.952917 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 49:61, ack 61, win 220, options [nop,nop,TS val 4665865 ecr 410898], length 12
12:17:03.953152 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 61:73, ack 61, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 12
12:17:03.953419 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 61:73, ack 73, win 220, options [nop,nop,TS val 4665865 ecr 410898], length 12
12:17:03.953646 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 73:85, ack 73, win 222, options [nop,nop,TS val 410898 ecr 4665865], length 12
12:17:03.954147 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 73:85, ack 85, win 220, options [nop,nop,TS val 4665865 ecr 410898], length 12
12:17:03.954399 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 85:97, ack 85, win 222, options [nop,nop,TS val 410899 ecr 4665865], length 12
12:17:03.954664 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 85:97, ack 97, win 220, options [nop,nop,TS val 4665866 ecr 410899], length 12
12:17:03.954951 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 97:109, ack 97, win 222, options [nop,nop,TS val 410899 ecr 4665866], length 12
12:17:03.955222 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 97:109, ack 109, win 220, options [nop,nop,TS val 4665866 ecr 410899], length 12
12:17:03.955427 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [P.], seq 109:121, ack 109, win 222, options [nop,nop,TS val 410899 ecr 4665866], length 12
12:17:03.955696 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [P.], seq 109:121, ack 121, win 220, options [nop,nop,TS val 4665866 ecr 410899], length 12
12:17:03.956141 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [F.], seq 121, ack 121, win 222, options [nop,nop,TS val 410899 ecr 4665866], length 0
12:17:03.956742 IP 10.240.0.9.8888 > 10.240.0.10.34632: Flags [F.], seq 121, ack 122, win 220, options [nop,nop,TS val 4665866 ecr 410899], length 0
12:17:03.956908 IP 10.240.0.10.34632 > 10.240.0.9.8888: Flags [.], ack 122, win 222, options [nop,nop,TS val 410899 ecr 4665866], length 0
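A minimal asyncio echo round-trip for reproducing this kind of capture (a sketch; note that recent asyncio versions enable TCP_NODELAY on TCP transports by default, so on a modern Python you would toggle it off to see the old behaviour):

```python
import asyncio
import socket

async def run_echo_roundtrip(payload: bytes, nodelay: bool) -> bytes:
    async def handle(reader, writer):
        data = await reader.read(100)
        writer.write(data)          # echo the payload back
        await writer.drain()
        writer.close()

    # Bind to an ephemeral port on loopback.
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    sock = writer.get_extra_info('socket')
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY,
                    1 if nodelay else 0)
    writer.write(payload)           # small writes, as in the capture
    await writer.drain()
    reply = await reader.read(100)
    writer.close()
    server.close()
    await server.wait_closed()
    return reply
```

Run it with asyncio.run(run_echo_roundtrip(b'hello world\n', True)) while watching the port with tcpdump, as above.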
Member

methane commented Jan 27, 2016

The same test on Ubuntu 12.04 (3.13.0-74-generic #118~precise1-Ubuntu SMP Fri Dec 18 10:38:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux):

10:08:45.301389 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [S], seq 1036017195, win 28400, options [mss 1420,sackOK,TS val 87456 ecr 0,nop,wscale 7], length 0
10:08:45.301443 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [S.], seq 253942418, ack 1036017196, win 28160, options [mss 1420,sackOK,TS val 864145 ecr 87456,nop,wscale 7], length 0
10:08:45.301969 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [.], ack 1, win 222, options [nop,nop,TS val 87457 ecr 864145], length 0
10:08:45.302477 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 1:13, ack 1, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.302490 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [.], ack 13, win 220, options [nop,nop,TS val 864145 ecr 87457], length 0
10:08:45.302636 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 1:13, ack 13, win 220, options [nop,nop,TS val 864145 ecr 87457], length 12
10:08:45.302808 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [.], ack 13, win 222, options [nop,nop,TS val 87457 ecr 864145], length 0
10:08:45.302950 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 13:25, ack 13, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.303066 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 13:25, ack 25, win 220, options [nop,nop,TS val 864145 ecr 87457], length 12
10:08:45.303385 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 25:37, ack 25, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.303460 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 25:37, ack 37, win 220, options [nop,nop,TS val 864145 ecr 87457], length 12
10:08:45.303746 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 37:49, ack 37, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.303821 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 37:49, ack 49, win 220, options [nop,nop,TS val 864145 ecr 87457], length 12
10:08:45.304231 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 49:61, ack 49, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.304324 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 49:61, ack 61, win 220, options [nop,nop,TS val 864145 ecr 87457], length 12
10:08:45.304572 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 61:73, ack 61, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.304649 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 61:73, ack 73, win 220, options [nop,nop,TS val 864145 ecr 87457], length 12
10:08:45.304919 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 73:85, ack 73, win 222, options [nop,nop,TS val 87457 ecr 864145], length 12
10:08:45.305009 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 73:85, ack 85, win 220, options [nop,nop,TS val 864146 ecr 87457], length 12
10:08:45.305259 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 85:97, ack 85, win 222, options [nop,nop,TS val 87457 ecr 864146], length 12
10:08:45.305350 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 85:97, ack 97, win 220, options [nop,nop,TS val 864146 ecr 87457], length 12
10:08:45.305620 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 97:109, ack 97, win 222, options [nop,nop,TS val 87457 ecr 864146], length 12
10:08:45.305710 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 97:109, ack 109, win 220, options [nop,nop,TS val 864146 ecr 87457], length 12
10:08:45.305933 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [P.], seq 109:121, ack 109, win 222, options [nop,nop,TS val 87458 ecr 864146], length 12
10:08:45.306011 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [P.], seq 109:121, ack 121, win 220, options [nop,nop,TS val 864146 ecr 87458], length 12
10:08:45.306455 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [F.], seq 121, ack 121, win 222, options [nop,nop,TS val 87458 ecr 864146], length 0
10:08:45.306631 IP 10.240.0.11.8888 > 10.240.0.12.45658: Flags [F.], seq 121, ack 122, win 220, options [nop,nop,TS val 864146 ecr 87458], length 0
10:08:45.306814 IP 10.240.0.12.45658 > 10.240.0.11.8888: Flags [.], ack 122, win 222, options [nop,nop,TS val 87458 ecr 864146], length 0

No problem without TCP_NODELAY: the ACKs are sent promptly.
I don't know exactly how delayed ACK decides this.
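
The captures above show a fixed-size 12-byte ping-pong between client and server. A minimal asyncio sketch of that kind of test (hypothetical; this is not the actual script used to produce these dumps) might look like:

```python
import asyncio
import time

MSG = b"x" * 12  # 12-byte messages, matching the "length 12" packets above


async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Echo every fixed-size message back until the client disconnects.
    try:
        while True:
            data = await reader.readexactly(len(MSG))
            writer.write(data)
            await writer.drain()
    except asyncio.IncompleteReadError:
        pass  # client closed the connection
    finally:
        writer.close()


async def ping_pong(rounds: int = 10) -> float:
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    start = time.perf_counter()
    for _ in range(rounds):
        writer.write(MSG)
        await writer.drain()
        assert await reader.readexactly(len(MSG)) == MSG
    elapsed = time.perf_counter() - start
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return elapsed


if __name__ == "__main__":
    print(f"{asyncio.run(ping_pong()):.4f}s for 10 round trips")
```

Running this under tcpdump, with and without TCP_NODELAY set on the underlying sockets, should show whether the delayed-ACK stalls appear on a given kernel.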


@methane

methane Jan 27, 2016

Member

@asvetlov Could you share a script to reproduce the TCP_NODELAY problem?


@asvetlov

asvetlov Feb 15, 2016

@methane very sorry, your last comment was lost in my inbox.

Reproducing is relatively easy:

  1. Clone git@github.com:KeepSafe/aiohttp.git
  2. Create a python virtualenv
  3. git checkout v0.19.0
  4. pip install -e .
  5. pip install -r benchmark/requirements.txt
  6. python benchmark/async.py --tries 1 --count 10000 --concurrency 1

--concurrency 1 is crucial; please compare with --concurrency 500.

The bench runs tornado and twisted as well; you may comment out the run_twisted and run_tornado suites in the main function.

The next tag, v0.20.0, uses TCP_NODELAY and TCP_CORK; running the benchmark with the same parameters shows a significant performance increase.


@methane

methane Feb 17, 2016

Member

@asvetlov Thanks! I can reproduce it.

v0.19.0 (* = apparently a delayed ACK):

11:30:13.848452 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [S], seq 262148633, win 43690, options [mss 65495,sackOK,TS val 1671760 ecr 0,nop,wscale 7], length 0
11:30:13.848467 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [S.], seq 3208463149, ack 262148634, win 43690, options [mss 65495,sackOK,TS val 1671760 ecr 1671760,nop,wscale 7], length 0
11:30:13.848479 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 1, win 342, options [nop,nop,TS val 1671760 ecr 1671760], length 0
11:30:13.849127 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [P.], seq 1:176, ack 1, win 342, options [nop,nop,TS val 1671760 ecr 1671760], length 175
11:30:13.849149 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [.], ack 176, win 350, options [nop,nop,TS val 1671760 ecr 1671760], length 0
11:30:13.862032 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 1:176, ack 176, win 350, options [nop,nop,TS val 1671763 ecr 1671760], length 175
11:30:13.862072 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 176, win 350, options [nop,nop,TS val 1671763 ecr 1671763], length 0
11:30:13.862121 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 176:178, ack 176, win 350, options [nop,nop,TS val 1671763 ecr 1671763], length 2
11:30:13.862133 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 178, win 350, options [nop,nop,TS val 1671763 ecr 1671763], length 0
11:30:13.863090 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [P.], seq 176:351, ack 178, win 350, options [nop,nop,TS val 1671764 ecr 1671763], length 175
11:30:13.873109 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 178:353, ack 351, win 359, options [nop,nop,TS val 1671766 ecr 1671764], length 175
*11:30:13.911083 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 353, win 359, options [nop,nop,TS val 1671776 ecr 1671766], length 0
11:30:13.911100 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 353:355, ack 351, win 359, options [nop,nop,TS val 1671776 ecr 1671776], length 2
11:30:13.911113 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 355, win 359, options [nop,nop,TS val 1671776 ecr 1671776], length 0
11:30:13.911688 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [P.], seq 351:526, ack 355, win 359, options [nop,nop,TS val 1671776 ecr 1671776], length 175
11:30:13.922575 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 355:530, ack 526, win 367, options [nop,nop,TS val 1671778 ecr 1671776], length 175
*11:30:13.959092 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 530, win 367, options [nop,nop,TS val 1671788 ecr 1671778], length 0
11:30:13.959109 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 530:532, ack 526, win 367, options [nop,nop,TS val 1671788 ecr 1671788], length 2
11:30:13.959115 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 532, win 367, options [nop,nop,TS val 1671788 ecr 1671788], length 0
11:30:13.959691 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [P.], seq 526:701, ack 532, win 367, options [nop,nop,TS val 1671788 ecr 1671788], length 175
11:30:13.970407 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 532:707, ack 701, win 375, options [nop,nop,TS val 1671790 ecr 1671788], length 175
*11:30:14.007095 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 707, win 375, options [nop,nop,TS val 1671800 ecr 1671790], length 0
11:30:14.007122 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 707:709, ack 701, win 375, options [nop,nop,TS val 1671800 ecr 1671800], length 2
11:30:14.007127 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 709, win 375, options [nop,nop,TS val 1671800 ecr 1671800], length 0
11:30:14.007771 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [P.], seq 701:876, ack 709, win 375, options [nop,nop,TS val 1671800 ecr 1671800], length 175
11:30:14.020166 IP 127.0.0.1.51322 > 127.0.0.1.45567: Flags [P.], seq 709:884, ack 876, win 384, options [nop,nop,TS val 1671803 ecr 1671800], length 175
*11:30:14.059099 IP 127.0.0.1.45567 > 127.0.0.1.51322: Flags [.], ack 884, win 384, options [nop,nop,TS val 1671813 ecr 1671803], length 0

v0.20.0:

11:29:54.451788 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [S], seq 127085847, win 43690, options [mss 65495,sackOK,TS val 1666911 ecr 0,nop,wscale 7], length 0
11:29:54.451802 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [S.], seq 3269442010, ack 127085848, win 43690, options [mss 65495,sackOK,TS val 1666911 ecr 1666911,nop,wscale 7], length 0
11:29:54.451813 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [.], ack 1, win 342, options [nop,nop,TS val 1666911 ecr 1666911], length 0
11:29:54.452616 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 1:176, ack 1, win 342, options [nop,nop,TS val 1666911 ecr 1666911], length 175
11:29:54.452638 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [.], ack 176, win 350, options [nop,nop,TS val 1666911 ecr 1666911], length 0
11:29:54.465378 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 1:178, ack 176, win 350, options [nop,nop,TS val 1666914 ecr 1666911], length 177
11:29:54.465417 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [.], ack 178, win 350, options [nop,nop,TS val 1666914 ecr 1666914], length 0
11:29:54.466275 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 176:351, ack 178, win 350, options [nop,nop,TS val 1666914 ecr 1666914], length 175
11:29:54.476194 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 178:355, ack 351, win 359, options [nop,nop,TS val 1666917 ecr 1666914], length 177
11:29:54.476977 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 351:526, ack 355, win 359, options [nop,nop,TS val 1666917 ecr 1666917], length 175
11:29:54.486715 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 355:532, ack 526, win 367, options [nop,nop,TS val 1666919 ecr 1666917], length 177
11:29:54.487477 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 526:701, ack 532, win 367, options [nop,nop,TS val 1666920 ecr 1666919], length 175
11:29:54.497199 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 532:709, ack 701, win 375, options [nop,nop,TS val 1666922 ecr 1666920], length 177
11:29:54.498008 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 701:876, ack 709, win 375, options [nop,nop,TS val 1666922 ecr 1666922], length 175
11:29:54.508105 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 709:886, ack 876, win 384, options [nop,nop,TS val 1666925 ecr 1666922], length 177
11:29:54.509052 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 876:1051, ack 886, win 384, options [nop,nop,TS val 1666925 ecr 1666925], length 175
11:29:54.518741 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 886:1063, ack 1051, win 392, options [nop,nop,TS val 1666927 ecr 1666925], length 177
11:29:54.519517 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 1051:1226, ack 1063, win 392, options [nop,nop,TS val 1666928 ecr 1666927], length 175
11:29:54.529200 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 1063:1240, ack 1226, win 400, options [nop,nop,TS val 1666930 ecr 1666928], length 177
11:29:54.529972 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 1226:1401, ack 1240, win 400, options [nop,nop,TS val 1666930 ecr 1666930], length 175
11:29:54.539961 IP 127.0.0.1.50198 > 127.0.0.1.46597: Flags [P.], seq 1240:1417, ack 1401, win 409, options [nop,nop,TS val 1666933 ecr 1666930], length 177
11:29:54.540768 IP 127.0.0.1.46597 > 127.0.0.1.50198: Flags [P.], seq 1401:1576, ack 1417, win 409, options [nop,nop,TS val 1666933 ecr 1666933], length 175
Member
@methane

methane Feb 18, 2016

Member

I can reproduce the delayed ACK & Nagle issue on Ubuntu 14.04.
Linux 3.19 doesn't solve the issue: TCP_NODELAY is required.

inada-n@inada-n-3:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:    14.04
Codename:   trusty
inada-n@inada-n-3:~$ uname -a
Linux inada-n-3 3.19.0-28-generic #30~14.04.1-Ubuntu SMP Tue Sep 1 09:32:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inada-n@inada-n-3:~/work/aiohttp/benchmark$ ~/pyenv/versions/3.5.1/bin/python async.py --tries 1 --count 10000 --concurrency 1
.
Results for run_aiohttp
RPS: 24,    mean: 39.917 ms,    standard deviation 0.127 ms median 39.923 ms

inada-n@inada-n-3:~/work/aiohttp/benchmark$ git checkout v0.20.0
inada-n@inada-n-3:~/work/aiohttp/benchmark$ ~/pyenv/versions/3.5.1/bin/python async.py --tries 1 --count 10000 --concurrency 1
.
Results for run_aiohttp
RPS: 787,   mean: 1.220 ms, standard deviation 0.173 ms median 1.187 ms
Member
@Martiusweb

Martiusweb Feb 18, 2016

Member

It's hard to give insights about TCP performance for the examples cited in this issue, as we don't have many details (kernel version on the server and the client, network setup, etc.). I'll put here a few things I think are facts; I may be making false assumptions or writing incorrect things (feel free to check, test, and correct me).

cwnd initial value in the kernel

The initial congestion window was increased from 4 to 10 segments in kernel 2.6.39. This significantly reduces the time the slow-start mechanism needs to reach the optimal congestion window value, and hence reduces the number of round trips when transmitting data over a freshly opened connection (note that it affects the two sides differently, as down/up bandwidth can be asymmetric). It has a significant impact on HTTP, which usually sends small bursts of data (a whole HTTP response can be sent before the optimal congestion window is reached) rather than performing an interactive exchange between peers.

I suppose the impact of the initial cwnd can be ignored on localhost, as there is virtually no latency between round trips.

I also know that kernel 3.2 improved the way packet loss is handled while discovering the optimal cwnd, but I assume this too has no effect when benchmarking on localhost.

About TCP_CORK

As we know, it coalesces write()s in a kernel buffer and puts a TCP packet on the wire only when a full packet can be filled, 200 ms have elapsed, or TCP_CORK is unset.

On localhost, I believe it mostly affects CPU usage: fewer packets are sent, hence less work for the TCP stack on both sides. On the wire, I believe it has a positive impact during slow start: since cwnd is expressed in number of packets, and CORK reduces the number of packets in flight (sent but not yet ACK'ed), we can fill everything the cwnd allows us to send:
cwnd * (MTU - TCP_header) => 10 * 1460 bytes = ~14kB
For messages larger than ~14kB, I think it might have a negative impact on performance, as it takes more time for slow start to reach the optimal cwnd (which is a function of the number of packets sent).

I think TCP_CORK makes sense when an HTTP/1 message is sent in several write() calls: TCP packets are then emitted efficiently with respect to the HTTP/1 message boundaries. However, I would not enable it for message bodies whose content I don't know, as some of them are time sensitive (long polling, server-sent events, and anything a twisted mind can put in an HTTP/1 response body).
Useful read: http://baus.net/on-tcp_cork
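
A rough sketch of that corking pattern (the option is Linux-only, so it is guarded; `send_corked` is an illustrative name, not an existing API):

```python
import socket


def send_corked(sock: socket.socket, headers: bytes, body: bytes) -> None:
    """Cork the socket so headers and body leave the host as full packets,
    then uncork, which flushes any remaining partial packet immediately."""
    cork = getattr(socket, "TCP_CORK", None)  # Linux only
    if cork is not None:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        sock.sendall(headers)
        sock.sendall(body)
    finally:
        if cork is not None:
            # Unsetting TCP_CORK pushes out whatever is still buffered.
            sock.setsockopt(socket.IPPROTO_TCP, cork, 0)
```

This way the header/body boundary in the application code no longer dictates the packet boundary on the wire.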

About TCP_NODELAY

Nagle in a nutshell: data from the first call to write() is sent ASAP; subsequent writes are corked until the ACK of the first packet is received.
TCP_NODELAY disables this "micro-corking" (so to speak): each write() call emits a packet (subject to the congestion/receive windows).

However, a receiver usually doesn't ACK a packet immediately:
it waits until it receives a following packet, has data to write (the ACK is piggybacked on the TCP packet carrying data), or a 200 ms delay expires.
This 200 ms delay is common with HTTP/1, because a client that has sent its request has nothing more to say (it just waits for the response).
In some cases (for instance, when the headers are sent in one packet and the body in the next), the client doesn't ACK soon enough and the server waits up to 200 ms before sending the second packet. This pattern is common with Windows clients and is well documented here: https://support.microsoft.com/en-us/kb/214397

To the extent of my understanding, I would recommend enabling both TCP_CORK and TCP_NODELAY at the same time while sending the HTTP headers.
I would say that TCP_NODELAY can be activated all the time, except when the body is sent in small fragments (a lot of inefficient write() calls). I deactivate TCP_NODELAY for HTTP/1 chunked bodies, since that usually means an upstream sends its data as soon as it is produced (e.g. a FastCGI script which outputs data as it is generated, without knowing the body size in advance).
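
A minimal sketch of applying this to an asyncio connection (`enable_nodelay` is an illustrative helper; note that recent asyncio versions already set TCP_NODELAY on TCP transports by default):

```python
import asyncio
import socket


def enable_nodelay(writer: asyncio.StreamWriter) -> None:
    # Disable Nagle's algorithm so small writes go out immediately
    # instead of waiting for the previous packet's ACK.
    sock = writer.get_extra_info("socket")
    if sock is not None:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

This is essentially the kind of change aiohttp made between v0.19.0 and v0.20.0 to get rid of the 40 ms stalls measured above.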

About TCP_QUICKACK

It doesn't affect the sender side, only the receiver side. It forces ACKs to be sent ASAP.
With HTTP/1, we don't have many opportunities to piggyback ACKs to writes, so sending an ACK ASAP sounds like a good idea. I would enable TCP_QUICKACK as long as I've got nothing to write.
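
A sketch of arming it on the receiving socket (`arm_quickack` is an illustrative name; the option is Linux-only, and per tcp(7) the flag is not permanent, so it has to be re-armed after reads):

```python
import socket


def arm_quickack(sock: socket.socket) -> None:
    # Linux-only; tcp(7) notes the flag is not permanent, so call this
    # again after each recv() if ACKs should keep going out ASAP.
    quickack = getattr(socket, "TCP_QUICKACK", None)
    if quickack is not None:
        sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
```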

Notes about mobile devices

Network (radio) operations have a huge battery cost, especially when the radio switches between the idle state and an active (send/receive) state. I recommend sending data in large bursts as much as possible (CORK) instead of optimizing for latency (NODELAY), unless the data is time sensitive (websockets, etc.). I would also disable TCP_KEEPALIVE, which wakes up the radio for "no reason".
See: http://chimera.labs.oreilly.com/books/1230000000545/ch08.html#MOBILE_NETWORK_LATENCY_OVERHEAD

tl;dr

  • use TCP_CORK to give the kernel a hint about your message boundaries,
  • use TCP_NODELAY when the receiver can't ACK your packets quickly (doesn't use TCP_QUICKACK, is not expected to write data anytime soon),
  • don't use TCP_KEEPALIVE,
  • the kernel version has a direct impact on performance,
  • a benchmark on localhost will likely be CPU bound; over the network it will likely be IO (latency) bound, so optimize for your use case.

I wanted to take some time to read the benchmark results you posted, but I'm running out of time!

Member

Martiusweb commented Feb 18, 2016


@methane

methane Feb 18, 2016

Member

The HTTP server of aiohttp v0.19.0 sends the header and body separately.

Without TCP_NODELAY, the server waits for an ACK before sending the body.
And the client doesn't send an ACK soon, since it has no data to send.

Here is a more verbose tcpdump:

# request
12:56:26.807184 IP 127.0.0.1.53554 > 127.0.0.1.46202: Flags [P.], seq 1226:1401, ack 1240, win 400, options [nop,nop,TS val 4294932338 ecr 4294932338], length 175
E...%.@.@..T.........2.z..WPA3.T...........
..wr..wrGET /prepare HTTP/1.1
ACCEPT: */*
ACCEPT-ENCODING: gzip, deflate
HOST: 127.0.0.1:46202
USER-AGENT: Python/3.5 aiohttp/0.19.0
CONTENT-LENGTH: 0
CONNECTION: keep-alive

# response header
12:56:26.813679 IP 127.0.0.1.46202 > 127.0.0.1.53554: Flags [P.], seq 1240:1415, ack 1401, win 409, options [nop,nop,TS val 4294932339 ecr 4294932338], length 175
E....b@.@.f..........z.2A3.T..W............
..ws..wrHTTP/1.1 200 OK
CONTENT-TYPE: text/plain; charset=utf-8
CONTENT-LENGTH: 2
CONNECTION: keep-alive
DATE: Thu, 18 Feb 2016 12:56:26 GMT
SERVER: Python/3.5 aiohttp/0.19.0

# after 30ms, client sends ack for header
12:56:26.850705 IP 127.0.0.1.53554 > 127.0.0.1.46202: Flags [.], ack 1415, win 409, options [nop,nop,TS val 4294932349 ecr 4294932339], length 0
E..4%.@.@............2.z..W.A3.......(.....
..w}..ws

# body
12:56:26.850740 IP 127.0.0.1.46202 > 127.0.0.1.53554: Flags [P.], seq 1415:1417, ack 1401, win 409, options [nop,nop,TS val 4294932349 ecr 4294932349], length 2
E..6.c@.@.g\.........z.2A3....W......*.....
..w}..w}OK

# in this case, client sends ack soon.  I don't know when ack is delayed...
12:56:26.850747 IP 127.0.0.1.53554 > 127.0.0.1.46202: Flags [.], ack 1417, win 409, options [nop,nop,TS val 4294932349 ecr 4294932349], length 0
E..4%.@.@............2.z..W.A3.......(.....
..w}..w}

# Next request started
12:56:26.851679 IP 127.0.0.1.53554 > 127.0.0.1.46202: Flags [P.], seq 1401:1576, ack 1417, win 409, options [nop,nop,TS val 4294932349 ecr 4294932349], length 175
E...%.@.@..Q.........2.z..W.A3.............
..w}..w}GET /prepare HTTP/1.1
ACCEPT: */*
ACCEPT-ENCODING: gzip, deflate
HOST: 127.0.0.1:46202
USER-AGENT: Python/3.5 aiohttp/0.19.0
CONTENT-LENGTH: 0
CONNECTION: keep-alive
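One way to avoid this stall without touching socket options is to assemble the whole response in one buffer, so header and body leave in a single write(). A minimal sketch (this is not aiohttp's actual code; handler and helper names are mine):

```python
import asyncio

def build_response(body: bytes) -> bytes:
    """Assemble an HTTP/1.1 response head and body into one buffer,
    so a single write() emits them together in one TCP segment."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "CONTENT-TYPE: text/plain; charset=utf-8\r\n"
        f"CONTENT-LENGTH: {len(body)}\r\n"
        "CONNECTION: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body

async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    await reader.readuntil(b"\r\n\r\n")   # consume the request head
    writer.write(build_response(b"OK"))   # one write: no Nagle stall
    await writer.drain()
```

With one write there is only one packet for the response, so neither Nagle on the server nor delayed ACK on the client has anything to delay.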
@methane

methane Feb 18, 2016

Member

FYI, I'm interested in this comment from @asvetlov:

I bet it's because asyncio-redis doesn't use TCP_NODELAY option.
Modern linux has much more comfortable defaults for this.
I ran into the same problem in aiohttp development.

I can reproduce this typical Nagle + delayed-ACK problem on both Ubuntu 12.04 and 14.04.
I don't know what caused the different result in @asvetlov's case.

@asvetlov

asvetlov Feb 18, 2016

@methane I'm pretty sure I saw good aiohttp 0.19.0 performance on a fresh Ubuntu. It was 15.04 at that moment. Sorry, I cannot prove it right now but will be able to recheck a bit later.

@1st1

1st1 Mar 2, 2016

Member

I'm closing this issue since it's not about any particular bug or improvement in asyncio.

@1st1 1st1 closed this Mar 2, 2016

@python python locked and limited conversation to collaborators Mar 2, 2016
