Skip to content
This repository has been archived by the owner. It is now read-only.

Set TCP_NODELAY on TCP transports by default #373

Merged
merged 1 commit into from Sep 12, 2016

Conversation

Projects
None yet
9 participants
@1st1
Copy link
Member

commented Jul 5, 2016

This PR enables TCP_NODELAY for all TCP transports.

@gvanrossum

This comment has been minimized.

Copy link
Member

commented Jul 5, 2016

Shouldn't we at least have a way to turn it off? Or why do sockets not have this on by default?

@jimfulton

This comment has been minimized.

Copy link

commented Jul 5, 2016

Assuming that appveyor eventually goes green, LGTM. Thanks!

@1st1

This comment has been minimized.

Copy link
Member Author

commented Jul 6, 2016

@gvanrossum Please see #286, where I wanted to add another base Transport class - TCPTransport, with a set_nodelay method. I still think it'd be a good idea ;)

Or why do sockets not have this on by default?

NODELAY can be a bad thing in cases where you want to send as few packages as possible. Setting it causes writes to be sent asap, instead of waiting until the buffer is full or for a TCP ACK.

Without TCP_NODELAY socket operations can have a huge latency. It's particularly important to set for database drivers, HTTP servers and clients etc, where we want the latency to be as small as possible. For modern applications always setting TCP_NODELAY is a good default (and, for instance, that's what Golang does for all TCP connections).

@Martiusweb

This comment has been minimized.

Copy link
Member

commented Jul 6, 2016

If TCP_NODELAY is enabled, a call to write()/send() means that a TCP packet is emitted regardless of if the client ACK'ed the previous packet (as long as the congestion window isn't full), while when disabled, the packet is emited only when there is as much as the MTU to send (or after a well known delay, usually 200ms).

This is a full win when the user sends a whole message (as seen by the application protocol) in one call, and for "synchronized" protocols such as http when peers don't usually write on the socket "at the same time".

Regarding the PR, isn't TCP_NODELAY available on windows/with proactor?

@1st1

This comment has been minimized.

Copy link
Member Author

commented Jul 6, 2016

If TCP_NODELAY is enabled, a call to write()/send() means that a TCP packet is emitted regardless of if the client ACK'ed the previous packet (as long as the congestion window isn't full), while when disabled, the packet is emited only when there is as much as the MTU to send (or after a well known delay, usually 200ms).

I'm curious what happens when you use os.writev() call – writing a bunch of buffers with one syscall. Will a bunch of small buffers sent with writev be aggregated in one TCP packet?

Regarding the PR, isn't TCP_NODELAY available on windows/with proactor?

Yes, good catch. I'll add it to the proactor too.

@1st1

This comment has been minimized.

Copy link
Member Author

commented Jul 6, 2016

If TCP_NODELAY is enabled, a call to write()/send() means that a TCP packet is emitted regardless of if the client ACK'ed the previous packet (as long as the congestion window isn't full), while when disabled, the packet is emited only when there is as much as the MTU to send (or after a well known delay, usually 200ms).

I'm curious what happens when you use os.writev() call – writing a bunch of buffers with one syscall. Will a bunch of small buffers sent with writev be aggregated in one TCP packet?

I think I found the answer -- writev gathers the output and transfers the data in a single operation. This is important because we want to start using it when #339 is merged.

@Martiusweb

This comment has been minimized.

Copy link
Member

commented Jul 6, 2016

Only one packet is sent, even with TCP_NODELAY (tested with Linux 4.6.3 and wireshark).

# Disable the Nagle algorithm -- small writes will be
# sent without waiting for the TCP ACK. This generally
# decreases the latency (in some cases significantly.)
_set_nodelay(self._sock)

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 6, 2016

This will work only for TCP sockets, and not for UNIX stream sockets...

This comment has been minimized.

Copy link
@1st1

1st1 Jul 6, 2016

Author Member

Good catch. We need to add functional unittests for TCP/UNIX transports to catch these sort of errors...

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 6, 2016

Also, accepted sockets should be set as NODELAY too. Please check that this is true.

This comment has been minimized.

Copy link
@1st1

1st1 Jul 6, 2016

Author Member

Alright. I'll prepare a new patch some time later. Thanks for reviewing this iteration!

This comment has been minimized.

Copy link
@socketpair

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 7, 2016

Good catch. We need to add functional unittests for TCP/UNIX transports to catch these sort of errors...

Tests probably will show nothing, since error is swallowed. Or, tests should test that TCP_NODELAY has been actually set.

try:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
except OSError:
pass

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 6, 2016

  1. Hiding error is bad practice. Is any real case when it will fail?
  2. Why not to embed this code into the place where it is called ?

This comment has been minimized.

Copy link
@1st1

1st1 Jul 6, 2016

Author Member

IIRC some OSes have TCP_NODELAY defined, but not implemented in the kernel, in which case you'll get an OSError.

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 6, 2016

Can you name that OSes ?

This comment has been minimized.

Copy link
@1st1

1st1 Jul 6, 2016

Author Member

Without extensive googling no. Golang ignores any errors when setting TCP_NODELAY for similar reasons.

It's an optional flag, raising an error doesn't make any sense here as it can break a perfectly working program in some obscure case. What I can do is to log it in debug mode, but I don't think we need that.

@@ -39,6 +39,17 @@ def _test_selector_event(selector, fd, event):
return bool(key.events & event)


if hasattr(socket, 'TCP_NODELAY'):

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 6, 2016

Is there case when this option is not available ? If yes, please add comment to source code. If no, just remove that check.

This comment has been minimized.

Copy link
@1st1

1st1 Jul 6, 2016

Author Member

Yes, some old linux kernels don't implement this. And some other OSes might not too.

How would a comment add any kind of clarity here?

This comment has been minimized.

Copy link
@socketpair

socketpair Jul 6, 2016

  1. Seems, you mean TCP_CORK, which was intorduced near Linux 2.4. AFAIK, TCP_NODELAY was implemented much earlier
  2. Which OSes really still not implements that?

Don't misunderstand me. I'm asking that in order to prevent code bloating and helping the code to be simple to read and maintain.

This comment has been minimized.

Copy link
@1st1

1st1 Jul 6, 2016

Author Member

Seems, you mean TCP_CORK, which was intorduced near Linux 2.4. AFAIK, TCP_NODELAY was implemented much earlier

Maybe you're right. Anyways, from socketmodule.c:

#ifdef  TCP_NODELAY
    PyModule_AddIntMacro(m, TCP_NODELAY);
#endif

CPython doesn't make an assumption that it's always available, so shouldn't we.

@1st1 1st1 force-pushed the 1st1:nodelay branch 2 times, most recently from 7db527f to bea3a42 Sep 12, 2016

@1st1 1st1 merged commit bea3a42 into python:master Sep 12, 2016

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@socketpair

This comment has been minimized.

Copy link

commented on asyncio/selector_events.py in bea3a42 Sep 12, 2016

set is slightly slower than tuple, since it will be built on each _set_nodelay() call. It is faster to lookup in tuple. So just replace curly braces with ( ).

This comment has been minimized.

Copy link

replied Sep 12, 2016

Wouldn't it just be faster and simpler to do two equality checks since there's only two elements?

This comment has been minimized.

Copy link

replied Sep 12, 2016

faster, but not simplier to read.

This comment has been minimized.

Copy link

replied Sep 12, 2016

I'd argue they're similar in readability but with how often _set_nodelay is used I figured speed would be a bigger issue than a slight hit in readability.

This comment has been minimized.

Copy link

replied Sep 12, 2016

I'm stopping this demagogy. Just change please. :)

@1st1

This comment has been minimized.

Copy link
Member Author

commented Sep 12, 2016

@socketpair I don't care that much about "performance" here. Questions like "what is faster to build, set or tuple" should only be addressed when you have an extremely tight and performance critical loop in your algorithm. Even then your decision will be very CPython specific. Anyways, I like the code as it is now.

@sethmlarson

This comment has been minimized.

Copy link

commented Sep 12, 2016

@1st1 Sorry, misunderstood the usage of _set_nodelay, I thought it was applied to each socket created by asyncio, not just server sockets. If that's the case then how it is now is fine.

@1st1

This comment has been minimized.

Copy link
Member Author

commented Sep 12, 2016

@1st1 Sorry, misunderstood the usage of _set_nodelay, I thought it was applied to each socket created by asyncio, not just server sockets. If that's the case then how it is now is fine.

@SethMichaelLarson @socketpair Hm, NODELAY should be applied to client connections & server connections created with loop.create_connection & loop.create_server. Transports created by those functions are instances of _SelectorSocketTransport which applies NODELAY in its constructor, so all of them have the flag set. Am I missing something?

@sethmlarson

This comment has been minimized.

Copy link

commented Sep 12, 2016

@1st1 Oh, so it applies to all sockets as I thought previously. Disregard my above comment. If that's the case then I think I'm still +1 on not using a tuple/set to check socket.family just because of how often it's used.

@asvetlov

This comment has been minimized.

Copy link

commented Sep 12, 2016

I believe the check for set is nothing comparing to syscall.
But using TCP_NODELAY makes sense for almost all use cases and should be enabled by default.

@AndreLouisCaron

This comment has been minimized.

Copy link

commented Feb 14, 2017

@1st1

Yes, good catch. I'll add it to the proactor too.

I just installed Python 3.6 and seems like the proactor doesn't use TCP_NODELAY. Was there some kind of limitation that prevented adding it? If it's a simple omission, would you be open to a PR that adds it there too?

@AndreLouisCaron

This comment has been minimized.

Copy link

commented Feb 14, 2017

For context, I opened issue aio-libs/aiomysql#149 because it has its own switch for TCP_NODELAY (presumably for Python 3.4 and 3.5) that breaks when run against the proactor event loop.

It would be neat if we could have this fixed upstream and if we can remove that broken code from aiomysql.

@@ -640,6 +651,11 @@ def __init__(self, loop, sock, protocol, waiter=None,
self._eof = False
self._paused = False

# Disable the Nagle algorithm -- small writes will be
# sent without waiting for the TCP ACK. This generally

This comment has been minimized.

Copy link
@socketpair

socketpair Feb 15, 2017

No, TCP_NODELAY is not connected with ACK. Actually, when nodelay is NOT set, kernel will delay sending small TCP packet, waiting for possible additional data to form one big packet. Since we buffer data in our own way, we already concatenate sequence of small writes to big one. So, kernel never sees sequence of small writes and therefore it is not needed to wait for data to concatenate in kernel.

This comment has been minimized.

Copy link
@methane

methane Feb 15, 2017

Member

@socketpair see https://en.wikipedia.org/wiki/Nagle%27s_algorithm
nagle waits ACK or enough data in send buffer. That's why delayed ACK + nagle cause trouble.

This comment has been minimized.

Copy link
@socketpair

socketpair Feb 15, 2017

image

You have said only part of algorightm (that is connected with ACK). Much more important thing is what I tried to describe (that is connected with writes < MSS).

This comment has been minimized.

Copy link
@socketpair

socketpair Feb 15, 2017

P.S. I'm not right too. It's best not to describe Nagle's algorithm in the comment.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.