-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix socket.type on OSes with SOCK_NONBLOCK #76512
Comments
On Linux, socket type is both a socket type and a bit mask (of SOCK_CLOEXEC and SOCK_NONBLOCK). Therefore, anyone who write code like 'if sock.type == SOCK_STREAM' writes non-portable code, that occasionally breaks on Linux. This caused some hard to spot bugs in asyncio already: https://bugs.python.org/issue27456 -- this one was discovered only 1 year after the actual change. On Linux, in 'include/linux/net.h' there's a SOCK_TYPE_MASK macro defined to 0xF. The idea is that on Linux you should mask socket type before comparing it to SOCK_STREAM or other socket types. Using socket.SOCK_NONBLOCK on Python was always error-prone even if one targets only Linux. For instance, socket.type won't be updated to include SOCK_NONBLOCK if the socket was updated via ioctl call directly. Therefore, I propose to fix socket.type getter to always (on Linux) apply that mask in Python. |
Should we backport it to 3.6? |
Document of socket.type says: The socket type. https://docs.python.org/3/library/socket.html#socket.socket.type OK, so, are SOCK_CLOEXEC and SOCK_NONBLOCK socket type? These two constants, if defined, can be combined with the socket types and allow you to set some flags atomically... https://docs.python.org/3/library/socket.html#socket.SOCK_CLOEXEC So they are not socket type. They are constants which can be used for Regarding to this document, I think this can be bugfix rather than |
Python's handling of socket has multiple issues, see bpo-28134 for more fun. At the moment the values of type, family and protocol are unreliable at best and dangerously wrong at worst. The attributes are only correct if and only if the socket has not been created from a file descriptor and the socket hasn't been manipulated with ioctl or setsockopt(). Now to the issue at hand. I'm -1 on applying a mask to type. You can easily work around your problem by replacing "sock.type == SOCK_STREAM" with "sock.type & SOCK_STREAM == SOCK_STREAM". |
Heh :) No, that would be a buggy code. Try this on your Linux box:
The _only_ way of checking socket type reliably is to explicitly reset SOCK_NONBLOCK and SOCK_CLOEXEC or to apply 0xF mask on Linux, and don't do that on other platforms. That's why I think it's important to fix this one way or another. |
On GH (#4877), Antoine wrote:
I agree. Here's what I propose:
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM | socket.SOCK_NONBLOCK)
>>> s
<socket.socket fd=3, family=AddressFamily.AF_INET, type=2049, proto=0, laddr=('0.0.0.0', 0)> What I want to see:
Thoughts? |
Should socket.true_type use getsockopt(fd, ...) to get the actual type? While we are talking about socket types, please have a look at bpo-28134 . |
Update: I've rewritten the PR from scratch.
a/ It's highly unlikely that Linux will implement 5 more new socket types anytime soon, so 0xF should last for a *very* long time. b/ It's more likely than "a/" that they add a new flag like SOCK_CLOEXEC, in which case the "type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC)" approach will fail. So given a/ and b/ I think it's safe to just use 0xF mask on Linux. |
Another name option: socket.realtype |
I think the earliest open bug about this is bpo-21327. |
I've been thinking a lot about this problem, and I'm really tempted to fix sock.type property:
Let's just fix sock.type? |
There's also a precedent of us breaking socket API. In Python < 3.6: sock = socket.socket()
fd = sock.fileno()
os.close(fd)
sock.close() Everything works fine. In 3.6+: sock.close() # Will raise OSError uvloop broke on this during the beta period and I fixed the issue quickly. I don't remember seeing any complaints about this, so I suspect that this issue wasn't a big deal, people just fixed their code. |
I did a little digging. It's more complicated than just "on Linux, these show up in the socket type". Background: SOCK_NONBLOCK and SOCK_CLOEXEC are Linux-isms. The standard way to control blocking-ness and cloexec-ness is via ioctl/fcntl, and Linux supports that too. But as an extension, it also has these flags that can be passed as part of the socket type, which is nice because it lets you set the flag atomically at the time the socket is created, which is very important for CLOEXEC, and ... kinda pointless for NONBLOCK, but whatever, Linux supports it. Note that as far as Linux is concerned, this is just a sneaky way to smuggle an extra argument through the socket() call. The flags *do not become part of the socket type*. If you create a socket with type=SOCK_STREAM | SOCK_NONBLOCK, and then do getsockopt(SOL_SOCKET, SOCK_TYPE) on it, Linux reports that the socket's type is just SOCK_STREAM, *without* mentioning SOCK_NONBLOCK. One more important piece of background: Python's socket.type attribute doesn't really have any connection to the OS-level socket type. (This is part of what bpo-28134 is about.) Python just takes whatever value the user passed as the type= argument when calling socket(), and stashes it in an instance variable so it can be retrieved later. Okay, so what's going on with socket.type? Python gained support for SOCK_NONBLOCK and SOCK_CLOEXEC in commit b1c5496 (bpo-7523), which mainly just exposed the constants, so that users could manually pass these in as part of the type= argument when creating the socket. However, it also made another change: it made it so that that whenever Python toggles the blocking state of the socket, it also toggles the SOCK_NONBLOCK bit in the Python-level variable that represents socket.type. Logically, this doesn't make any sense. As noted above, SOCK_NONBLOCK is never considered part of the socket type, on any operating system. And the function that is doing this doesn't even have anything to do with SOCK_NONBLOCK; it's actually using ioctl/fcntl, which is a totally unrelated mechanism for toggling blocking-ness. And it only does this on Linux (because only Linux *has* a SOCK_NONBLOCK bit to toggle), so this is kind of just introducing gratuitous brokenness. The one advantage of this is that it makes these two pieces of code consistent: # Version A
s = socket.socket(type=socket.SOCK_STREAM | socket.SOCK_NONBLOCK)
# Blindly returns what we passed to type=, even though it's
# not the actual type:
s.type
# Version B
s = socket.socket()
s.setblocking(False)
s.type This is a weird choice. Basically version A here is revealing a bug in how Python's type tracking works (s.type disagrees with s.getsockopt(SOL_SOCKET, SOL_TYPE)), that is only triggered if the user takes advantage of a rarely-used, non-portable feature. Then version B was written to intentionally introduce the same bug in a much more common case, I guess for consistency. But the bug is still non-portable. The only time SOCK_CLOEXEC shows up in socket.type is if the user explicitly passed it in their call to socket.socket(), which has been a no-op (except for its effect on socket.type) since PEP-446 made SOCK_CLOEXEC the default. I think there's a pretty strong argument here that what we really ought to do is *never* show SOCK_NONBLOCK or SOCK_CLOEXEC in in socket.type. That's what Linux itself does, it's the only possibility on every other OS, and it fixes this very annoying issue. (To add to Yury's comments about bugs in asycnio, it took me ages to figure out wtf was going on here and work around it in trio. The current behavior really does cause bugs.) It also sets the stage for fixing bpo-28134 -- right now socket.type's semantics are "whatever you passed in, whether that's actually the socket's type or not", and that's something that's impossible to determine reliably for a bare fd; I'm suggesting they should become "the socket's type", which *is* possible to determine reliably for a bare fd, at least in some cases. |
Re-reading my above post, I realized I want to emphasize a little more how odd the current relationship is between socket.type and SOCK_CLOEXEC. Right now, the way it works is: Python *always* adds SOCK_CLOEXEC to the type that it passes to the operating system when creating a socket. You can later toggle this with socket.set_inheritable, and check it with socket.get_inheritable. You might expect that you could also check it with (socket.type | SOCK_CLOEXEC), and this does tell you *something*... but it has *no relationship at all* to whether the socket has close-on-exec enabled. Instead, what it tells you whether you passed SOCK_CLOEXEC to the socket constructor. Given that passing SOCK_CLOEXEC to the socket constructor is otherwise *completely ignored*, this seems like a weird piece of information to make available. |
Nathaniel, thanks a lot for the comprehensive analysis. It's now obvious that this weird Linux-specific socket.type quirk is of our own making and specific only to Python. I've updated the PR:
That's it. I'm now certain that this is the correct way of handling this situation. socket.type must be fixed. Please review the updated PR. |
This issue is not specific to Linux. FreeBSD is also affected: >>> s=socket.socket()
>>> s.type
<SocketKind.SOCK_STREAM: 1>
>>> s.setblocking(False)
>>> s.type
536870913 See for example https://sourceware.org/ml/libc-alpha/2013-08/msg00528.html |
So instead of '#ifdef linux' we should use |
The most portable solution seems to getsockopt(): >>> s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
1 But I would prefer to avoid a syscall just to clear flags :-/ |
Yes, that's what I want to do. We control what constants the socket module exports. Let's just clear them if we export them -- that's a good working solution (and also the most backwards compatible, as opposed to calling sock.getsockopt) |
I've merged the PR. Summary of the final change:
This is 3.7 only change. Big thanks to Nathaniel and Victor for the help! |
In this case, bpo-27456 should be reopened or a new issue should be opened to fix asyncio in Python 3.6, no? |
I'll make a PR to fix it in 3.7. Code in 3.6 is good. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: