New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-17561: add socket.create_server_sock() and socket.has_dual_stack() #11215

Open
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@giampaolo
Copy link
Contributor

giampaolo commented Dec 18, 2018

@pitrou

This comment has been minimized.

Copy link
Member

pitrou commented Dec 18, 2018

You're hitting CI issues. In particular, IPv6 is well-known to be broken on Travis :-/

@pitrou
Copy link
Member

pitrou left a comment

Thanks for doing this!

@@ -595,6 +595,26 @@ The following functions all create :ref:`socket objects <socket-objects>`.
.. versionchanged:: 3.2
*source_address* was added.

.. function:: create_server_sock(address, *, dual_stack=True, queue_size=5, reuse_addr=True)

Convenience function which creates a TCP server bound to *address* (a

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

Could the function be extended to support UDP sockets or that would require a different implementation?
If the latter, perhaps you want to rename the function to create_tcp_server?

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

You're right. I forgot about UDP. =) Yes, I think we can support it. I can simply add a type=SOCK_STREAM parameter.

This comment has been minimized.

@asvetlov

asvetlov Dec 19, 2018

Contributor

If you want to support UDP please consider adding allow_broadcast argument as we have in asyncio: https://github.com/python/cpython/blob/master/Lib/asyncio/base_events.py#L1154-L1158

This comment has been minimized.

@giampaolo

giampaolo Dec 20, 2018

Contributor

I don't think this is worth adding because one can simply set SO_BROADCAST against the returned socket. reuse_addr and reuse_port parameters are different because they necessarily must be set before bind(). I would rather keep the API as simple as possible.

This comment has been minimized.

@asvetlov

asvetlov Dec 20, 2018

Contributor

Ok

"""
if not has_ipv6 \
or not hasattr(_socket, 'AF_INET6') \
or not hasattr(_socket, 'IPV6_V6ONLY'):

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

You must also test for IPPROTO_IPV6 apparently.

This comment has been minimized.

@njsmith

njsmith Dec 19, 2018

Contributor

And note that due to a bug in the socket module, socket.IPPROTO_IPV6 is not exposed on Windows, even though this constant does exist: https://bugs.python.org/issue29515

# http://mail.python.org/pipermail/python-ideas/2013-March/019937.html
host = None
# If dual stack is supported and no host was specified '::' will
# accept AF_INET and AF_INET6 connections and all interfaces.

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

"on all interfaces"

if reuse_addr:
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
if af == AF_INET6_ and dual_stack:
if sock.getsockopt(IPPROTO_IPV6, IPV6_V6ONLY):

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

Is it worth calling getsockopt() before setsockopt()?

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

Good question. I'm not sure. My reasoning here was: being setsockopt a write operation it may have more chances of failing (EPERM or whatever) than just a get operation.

sock.settimeout(2)
t = threading.Thread(target=run)
t.start()
time.sleep(.1)

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

Not sure this is reliable. @vstinner will probably object.

This comment has been minimized.

@vstinner

vstinner Dec 18, 2018

Member

I object. Don't do that.

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

Agree. I will use a sync primitive.

cl.sendall(b'foo')
self.assertEqual(cl.recv(1024), b'foo')

with socket.create_server_sock((None, 0)) as sock:

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

You'll probably want to skip this if the testing platform doesn't support IPv6 at all (there has to be a flag for this already :-)).

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

yes, will look into it

Show resolved Hide resolved Lib/socket.py
return False

def create_server_sock(address, *, dual_stack=has_dual_stack(), queue_size=5,
reuse_addr=os.name == 'posix' and sys.platform != 'cygwin'):

This comment has been minimized.

@asvetlov

asvetlov Dec 18, 2018

Contributor

In asyncio the flag is called reuse_address. Maybe use the same name here for sake of consistency?

queue_size is named backlog with the same meaning.
I suspect 5 as default is too small. In asyncio the parameter default value is 100. For blocking sockets high value is even more important than for async case.

Also asyncio loop.create_server() supports reuse_port() flag.

This comment has been minimized.

@pitrou

pitrou Dec 18, 2018

Member

By the way, it would be useful to find out if asyncio's variant can be rewritten to benefit from this new function.

This comment has been minimized.

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

By the way, it would be useful to find out if asyncio's variant can be rewritten to benefit from this new function.

No because internally asyncio has to bind on multiple sockets (on purpose):
https://bugs.python.org/issue17561#msg185765

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

In asyncio the flag is called reuse_address. Maybe use the same name here for sake of consistency?

Mmm. I think that for an asyncio app this function is fundamentally useless because it will use asyncio API. You're not supposed to import socket where you import asyncio, so I'm not sure how much consistency matters in this case.

queue_size is named backlog with the same meaning.

I will rename it to backlog

I suspect 5 as default is too small. In asyncio the parameter default value is 100. For blocking sockets high value is even more important than for async case.

Good point. I took a look at some projects. Tornado uses 128, Twisted uses 50, curio uses 100, Trio (and Golang) uses a big size (but never exceding 65536):
https://github.com/python-trio/trio/blob/cfa82d8abc42a0bd56be9bc788686cb22590ace6/trio/_highlevel_open_tcp_listeners.py#L10
One may argue whether we may want to set a default kwarg for socket.listen() and also socketserver module (but those are separate issues).
CC-ing @njsmith who I suspect may be interested in this.

Also asyncio loop.create_server() supports reuse_port() flag.

Interesting. I never needed SO_REUSEPORT myself but it appears it's useful so it probably deserves its own argument:
https://medium.com/uckey/the-behaviour-of-so-reuseport-addr-1-2-f8a440a35af6

@giampaolo
Copy link
Contributor

giampaolo left a comment

Thanks for your review.

@@ -595,6 +595,26 @@ The following functions all create :ref:`socket objects <socket-objects>`.
.. versionchanged:: 3.2
*source_address* was added.

.. function:: create_server_sock(address, *, dual_stack=True, queue_size=5, reuse_addr=True)

Convenience function which creates a TCP server bound to *address* (a

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

You're right. I forgot about UDP. =) Yes, I think we can support it. I can simply add a type=SOCK_STREAM parameter.

Show resolved Hide resolved Lib/socket.py
return False

def create_server_sock(address, *, dual_stack=has_dual_stack(), queue_size=5,
reuse_addr=os.name == 'posix' and sys.platform != 'cygwin'):

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

By the way, it would be useful to find out if asyncio's variant can be rewritten to benefit from this new function.

No because internally asyncio has to bind on multiple sockets (on purpose):
https://bugs.python.org/issue17561#msg185765

if reuse_addr:
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
if af == AF_INET6_ and dual_stack:
if sock.getsockopt(IPPROTO_IPV6, IPV6_V6ONLY):

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

Good question. I'm not sure. My reasoning here was: being setsockopt a write operation it may have more chances of failing (EPERM or whatever) than just a get operation.

sock.settimeout(2)
t = threading.Thread(target=run)
t.start()
time.sleep(.1)

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

Agree. I will use a sync primitive.

cl.sendall(b'foo')
self.assertEqual(cl.recv(1024), b'foo')

with socket.create_server_sock((None, 0)) as sock:

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

yes, will look into it

return False

def create_server_sock(address, *, dual_stack=has_dual_stack(), queue_size=5,
reuse_addr=os.name == 'posix' and sys.platform != 'cygwin'):

This comment has been minimized.

@giampaolo

giampaolo Dec 19, 2018

Contributor

In asyncio the flag is called reuse_address. Maybe use the same name here for sake of consistency?

Mmm. I think that for an asyncio app this function is fundamentally useless because it will use asyncio API. You're not supposed to import socket where you import asyncio, so I'm not sure how much consistency matters in this case.

queue_size is named backlog with the same meaning.

I will rename it to backlog

I suspect 5 as default is too small. In asyncio the parameter default value is 100. For blocking sockets high value is even more important than for async case.

Good point. I took a look at some projects. Tornado uses 128, Twisted uses 50, curio uses 100, Trio (and Golang) uses a big size (but never exceding 65536):
https://github.com/python-trio/trio/blob/cfa82d8abc42a0bd56be9bc788686cb22590ace6/trio/_highlevel_open_tcp_listeners.py#L10
One may argue whether we may want to set a default kwarg for socket.listen() and also socketserver module (but those are separate issues).
CC-ing @njsmith who I suspect may be interested in this.

Also asyncio loop.create_server() supports reuse_port() flag.

Interesting. I never needed SO_REUSEPORT myself but it appears it's useful so it probably deserves its own argument:
https://medium.com/uckey/the-behaviour-of-so-reuseport-addr-1-2-f8a440a35af6

.. function:: has_dual_stack()

Return ``True`` if the kernel supports creating a socket which can handle
both :data:`AF_INET` and :data:`AF_INET6` (IPv4 / IPv6) connections.

This comment has been minimized.

@njsmith

njsmith Dec 19, 2018

Contributor

I think the name of this function is pretty confusing. A "dual stack host" is one that has both IPv4 and IPv6 support. A host can have dual stack support without being able to support both of them in a single socket.

Also, for Python docs it should probably be "the platform", not "the kernel".

This comment has been minimized.

@giampaolo

giampaolo Dec 20, 2018

Contributor

I agree the name is not the best. Actually, on a second thought, I don't think it's necessary to expose this function after all.

@njsmith

This comment has been minimized.

Copy link
Contributor

njsmith commented Dec 19, 2018

I'm not sure if this is a good idea. For users who don't understand all the intricacies of BSD sockets and IPv4/IPv6, dual-protocol sockets are kind of confusing. People expect them to handle IPv4 connections like IPv4 connections, but in fact they convert IPv4 connections into "virtual" IPv6 connections. This can be especially problematic when porting legacy IPv4-only code. For example, right now, python -m http.server prints the source address for each request:

~$ python -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [18/Dec/2018 19:32:10] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [18/Dec/2018 19:32:10] "GET /favicon.ico HTTP/1.1" 404 -

It does this by using sock.getpeername() to check the client address. But, if you're using a dual-protocol socket and gets a connection over IPv4, then sock.getpeername() doesn't return an IPv4 address, it returns a magical IPv6 address that has the IPv4 address encoded inside it. So flipping http.server to use this would produce:

~$ python -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
::ffff:127.0.0.1 - - [18/Dec/2018 19:32:10] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [18/Dec/2018 19:32:10] "GET /favicon.ico HTTP/1.1" 404 -

This is surprising and confusing (and in the case of http.server, might break existing code). In some cases it can create security vulnerabilities. (E.g., you blacklisted 1.2.3.4, but forget to blacklist ::ffff:1.2.3.4.)

I think this is why tornado/asyncio/trio all make a special effort to disable the dual-protocol mode, and instead handle dual-stack servers by opening multiple single-protocol sockets.

OTOH, for people who do understand the intricacies of BSD sockets and IPv4/IPv6, a helper is not really needed... and if you want to use the socket API directly you kind of have to know a lot of obscure intricacies.

I'm not like, massively opposed. If it's merged I'll just ignore it and it won't hurt me :-). http.server.HTTPServer could work around the logging issue by manually converting V4-mapped-V6-addresses back to V4 addresses before printing them.

If you do go ahead, the docs definitely need to highlight this issue, because otherwise you're going to get complaints from confused users.

@giampaolo

This comment has been minimized.

Copy link
Contributor

giampaolo commented Dec 20, 2018

People expect them to handle IPv4 connections like IPv4 connections, but in fact they convert IPv4 connections into "virtual" IPv6 connections. [...] if you're using a dual-protocol socket and gets a connection over IPv4, then sock.getpeername() doesn't return an IPv4 address, it returns a magical IPv6 address that has the IPv4 address encoded inside it.

That's a good point which I didn't consider. I guess that invalidates my original idea of indiscriminately using this function in other parts of the stdlib, like the tcpserver module. It also means assuming dual_stack=True as the default is not a good idea. I think a better signature could be the following:

def create_server_sock(address,
                       family=AF_UNSPEC,
                       type=SOCK_STREAM,
                       *,
                       reuse_addr=os.name == 'posix' and sys.platform != 'cygwin',
                       reuse_port=False,
                       backlog=100,
                       hybrid_v46=False):

Note: if family is AF_UNSPEC or None it will be determined from address via getaddrinfo(), similarly to create_connection(). This way one can force a given socket family if they want.

It probably also makes sense to rename has_dual_stack() to supports_hybrid_v46_socket() or something.
Thoughts?

@njsmith

This comment has been minimized.

Copy link
Contributor

njsmith commented Dec 20, 2018

reuse_addr=os.name == 'posix' and sys.platform != 'cygwin',

I wouldn't expose any option for controlling this. SO_REUSEADDR is so tricky that every library I've looked at gets it wrong, so what hope do regular people have.

The right way to do it is:

  • On Unix, everyone always wants SO_REUSEADDR
  • On Windows, you should never ever ever use SO_REUSEADDR (because it means something completely different than it does on Unix, and what it means is completely broken), and instead you should always use SO_EXCLUSIVEADDRUSE.
  • No idea about cygwin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment