Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix socket.type on OSes with SOCK_NONBLOCK #76512

Closed
1st1 opened this issue Dec 15, 2017 · 23 comments
Closed

Fix socket.type on OSes with SOCK_NONBLOCK #76512

1st1 opened this issue Dec 15, 2017 · 23 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@1st1
Copy link
Member

1st1 commented Dec 15, 2017

BPO 32331
Nosy @pitrou, @vstinner, @tiran, @njsmith, @asvetlov, @methane, @1st1
PRs
  • bpo-32331: Fix socket.type on Linux; expose SOCK_TYPE_MASK. #4877
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-12-19.01:08:26.719>
    created_at = <Date 2017-12-15.03:42:49.184>
    labels = ['3.7', 'type-feature', 'library']
    title = 'Fix socket.type on OSes with SOCK_NONBLOCK'
    updated_at = <Date 2017-12-19.01:14:14.988>
    user = 'https://github.com/1st1'

    bugs.python.org fields:

    activity = <Date 2017-12-19.01:14:14.988>
    actor = 'yselivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-12-19.01:08:26.719>
    closer = 'yselivanov'
    components = ['Library (Lib)']
    creation = <Date 2017-12-15.03:42:49.184>
    creator = 'yselivanov'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32331
    keywords = ['patch']
    message_count = 23.0
    messages = ['308363', '308370', '308371', '308376', '308409', '308410', '308411', '308432', '308437', '308442', '308444', '308446', '308450', '308451', '308455', '308589', '308590', '308591', '308592', '308603', '308604', '308605', '308606']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'vstinner', 'christian.heimes', 'njs', 'asvetlov', 'methane', 'yselivanov']
    pr_nums = ['4877']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue32331'
    versions = ['Python 3.7']

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 15, 2017

    On Linux, socket type is both a socket type and a bit mask (of SOCK_CLOEXEC and SOCK_NONBLOCK). Therefore, anyone who write code like 'if sock.type == SOCK_STREAM' writes non-portable code, that occasionally breaks on Linux.

    This caused some hard to spot bugs in asyncio already: https://bugs.python.org/issue27456 -- this one was discovered only 1 year after the actual change.

    On Linux, in 'include/linux/net.h' there's a SOCK_TYPE_MASK macro defined to 0xF. The idea is that on Linux you should mask socket type before comparing it to SOCK_STREAM or other socket types.

    Using socket.SOCK_NONBLOCK on Python was always error-prone even if one targets only Linux. For instance, socket.type won't be updated to include SOCK_NONBLOCK if the socket was updated via ioctl call directly.

    Therefore, I propose to fix socket.type getter to always (on Linux) apply that mask in Python.

    @1st1 1st1 added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Dec 15, 2017
    @methane
    Copy link
    Member

    methane commented Dec 15, 2017

    Should we backport it to 3.6?

    @methane
    Copy link
    Member

    methane commented Dec 15, 2017

    Document of socket.type says:

    The socket type.

    https://docs.python.org/3/library/socket.html#socket.socket.type

    OK, so, are SOCK_CLOEXEC and SOCK_NONBLOCK socket type?

    These two constants, if defined, can be combined with the socket types and allow you to set some flags atomically...

    https://docs.python.org/3/library/socket.html#socket.SOCK_CLOEXEC

    So they are not socket type. They are constants which can be used for
    setting some flags.

    Regarding to this document, I think this can be bugfix rather than
    behavior change.

    @tiran
    Copy link
    Member

    tiran commented Dec 15, 2017

    Python's handling of socket has multiple issues, see bpo-28134 for more fun. At the moment the values of type, family and protocol are unreliable at best and dangerously wrong at worst. The attributes are only correct if and only if the socket has not been created from a file descriptor and the socket hasn't been manipulated with ioctl or setsockopt().

    Now to the issue at hand. I'm -1 on applying a mask to type. You can easily work around your problem by replacing "sock.type == SOCK_STREAM" with "sock.type & SOCK_STREAM == SOCK_STREAM".

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 15, 2017

    You can easily work around your problem by replacing "sock.type == SOCK_STREAM" with "sock.type & SOCK_STREAM == SOCK_STREAM".

    Heh :) No, that would be a buggy code. Try this on your Linux box:

    (socket.SOCK_SEQPACKET & socket.SOCK_STREAM) == socket.SOCK_STREAM
    

    The _only_ way of checking socket type reliably is to explicitly reset SOCK_NONBLOCK and SOCK_CLOEXEC or to apply 0xF mask on Linux, and don't do that on other platforms.

    That's why I think it's important to fix this one way or another.

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 15, 2017

    On GH (#4877), Antoine wrote:

    I agree with Victor:

    it can't go into 3.6
    making the change in 3.7 is contentious
    Can there be an other way to solve the issue? Can we for example keep socket.type as it is and add a utility function or a property reporting the "masked" type?

    I agree. Here's what I propose:

    1. We don't touch socket.type. We don't know what kind of code exists out there and in what ways it will break. The ship has sailed to fix it.

    2. We add socket.SOCK_TYPE_MASK (on Linux).

    3. We fix socket.__repr__ to correctly render type. Currently:

        >>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM | socket.SOCK_NONBLOCK)
        >>> s
        <socket.socket fd=3, family=AddressFamily.AF_INET, type=2049, proto=0, laddr=('0.0.0.0', 0)>

    What I want to see:

    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SOCK_STREAM | SOCK_NONBLOCK, proto=0, laddr=('0.0.0.0', 0)>
    
    1. We add socket.true_type which will apply SOCK_TYPE_MASK and return a socket type that's safe to compare to socket.SOCK_* constants.

    Thoughts?

    @tiran
    Copy link
    Member

    tiran commented Dec 15, 2017

    Should socket.true_type use getsockopt(fd, ...) to get the actual type?

    While we are talking about socket types, please have a look at bpo-28134 .

    @1st1 1st1 changed the title apply SOCK_TYPE_MASK to socket.type on Linux Add socket.truetype property Dec 16, 2017
    @1st1
    Copy link
    Member Author

    1st1 commented Dec 16, 2017

    Update:

    I've rewritten the PR from scratch.

    1. SOCK_TYPE_MASK isn't exported on Linux. Therefore we will not export socket.SOCK_TYPE_MASK too.

    2. I've added the new socket.truetype property.

    3. When a socket is created without an FD, e.g. "socket.socket(family, type, proto)", we trust that the type of the socket is correct. It might contain extra flags on Linux, so we filter them out.

    4. When a socket is created from an FD, e.g. with "socket.fromfd" or with "socket.socket(family, type, proto, FD)" we don't trust the type. So socket.truetype will call getsockopt(SOL_SOCKET, SO_TYPE) on the FD to ensure we have the correct socket type at hand.

    5. Since Linux doesn't export SOCK_TYPE_MASK I decided to hardcode it in socketmodule.c. My reasoning:

    a/ It's highly unlikely that Linux will implement 5 more new socket types anytime soon, so 0xF should last for a *very* long time.

    b/ It's more likely than "a/" that they add a new flag like SOCK_CLOEXEC, in which case the "type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC)" approach will fail.

    So given a/ and b/ I think it's safe to just use 0xF mask on Linux.

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 16, 2017

    Another name option: socket.realtype

    @njsmith
    Copy link
    Contributor

    njsmith commented Dec 16, 2017

    I think the earliest open bug about this is bpo-21327.

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 16, 2017

    I've been thinking a lot about this problem, and I'm really tempted to fix sock.type property:

    1. The problem first appeared in Python 3.2.

    2. Python 2.7 doesn't have this problem at all, and doesn't even export socket.SOCK_NONBLOCK. If we fix this in 3.7 it *will* actually help some poor souls with porting their network applications.

    3. People use Python when they want a high-level portable language. This annoying Linux quirk makes it super hard to write correct socket code.

    4. I can't actually come up with any decent Linux-only example of using this quirk of socket.type. Why would one check if sock.type has SOCK_NONBLOCK? And even if they check it, one call to sock.settimeout() will make sock.type information outdated and simply wrong.

    Let's just fix sock.type?

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 16, 2017

    There's also a precedent of us breaking socket API. In Python < 3.6:

      sock = socket.socket()
      fd = sock.fileno()
      os.close(fd)
      sock.close()

    Everything works fine. In 3.6+:

      sock.close()  # Will raise OSError

    uvloop broke on this during the beta period and I fixed the issue quickly. I don't remember seeing any complaints about this, so I suspect that this issue wasn't a big deal, people just fixed their code.

    @njsmith
    Copy link
    Contributor

    njsmith commented Dec 16, 2017

    I did a little digging. It's more complicated than just "on Linux, these show up in the socket type".

    Background: SOCK_NONBLOCK and SOCK_CLOEXEC are Linux-isms. The standard way to control blocking-ness and cloexec-ness is via ioctl/fcntl, and Linux supports that too. But as an extension, it also has these flags that can be passed as part of the socket type, which is nice because it lets you set the flag atomically at the time the socket is created, which is very important for CLOEXEC, and ... kinda pointless for NONBLOCK, but whatever, Linux supports it.

    Note that as far as Linux is concerned, this is just a sneaky way to smuggle an extra argument through the socket() call. The flags *do not become part of the socket type*. If you create a socket with type=SOCK_STREAM | SOCK_NONBLOCK, and then do getsockopt(SOL_SOCKET, SOCK_TYPE) on it, Linux reports that the socket's type is just SOCK_STREAM, *without* mentioning SOCK_NONBLOCK.

    One more important piece of background: Python's socket.type attribute doesn't really have any connection to the OS-level socket type. (This is part of what bpo-28134 is about.) Python just takes whatever value the user passed as the type= argument when calling socket(), and stashes it in an instance variable so it can be retrieved later.

    Okay, so what's going on with socket.type? Python gained support for SOCK_NONBLOCK and SOCK_CLOEXEC in commit b1c5496 (bpo-7523), which mainly just exposed the constants, so that users could manually pass these in as part of the type= argument when creating the socket. However, it also made another change: it made it so that that whenever Python toggles the blocking state of the socket, it also toggles the SOCK_NONBLOCK bit in the Python-level variable that represents socket.type.

    Logically, this doesn't make any sense. As noted above, SOCK_NONBLOCK is never considered part of the socket type, on any operating system. And the function that is doing this doesn't even have anything to do with SOCK_NONBLOCK; it's actually using ioctl/fcntl, which is a totally unrelated mechanism for toggling blocking-ness. And it only does this on Linux (because only Linux *has* a SOCK_NONBLOCK bit to toggle), so this is kind of just introducing gratuitous brokenness.

    The one advantage of this is that it makes these two pieces of code consistent:

       # Version A
       s = socket.socket(type=socket.SOCK_STREAM | socket.SOCK_NONBLOCK)
       # Blindly returns what we passed to type=, even though it's
       # not the actual type:
       s.type
    
       # Version B
       s = socket.socket()
       s.setblocking(False)
       s.type

    This is a weird choice. Basically version A here is revealing a bug in how Python's type tracking works (s.type disagrees with s.getsockopt(SOL_SOCKET, SOL_TYPE)), that is only triggered if the user takes advantage of a rarely-used, non-portable feature. Then version B was written to intentionally introduce the same bug in a much more common case, I guess for consistency. But the bug is still non-portable.

    The only time SOCK_CLOEXEC shows up in socket.type is if the user explicitly passed it in their call to socket.socket(), which has been a no-op (except for its effect on socket.type) since PEP-446 made SOCK_CLOEXEC the default.

    I think there's a pretty strong argument here that what we really ought to do is *never* show SOCK_NONBLOCK or SOCK_CLOEXEC in in socket.type. That's what Linux itself does, it's the only possibility on every other OS, and it fixes this very annoying issue. (To add to Yury's comments about bugs in asycnio, it took me ages to figure out wtf was going on here and work around it in trio. The current behavior really does cause bugs.) It also sets the stage for fixing bpo-28134 -- right now socket.type's semantics are "whatever you passed in, whether that's actually the socket's type or not", and that's something that's impossible to determine reliably for a bare fd; I'm suggesting they should become "the socket's type", which *is* possible to determine reliably for a bare fd, at least in some cases.

    @njsmith
    Copy link
    Contributor

    njsmith commented Dec 16, 2017

    Re-reading my above post, I realized I want to emphasize a little more how odd the current relationship is between socket.type and SOCK_CLOEXEC. Right now, the way it works is:

    Python *always* adds SOCK_CLOEXEC to the type that it passes to the operating system when creating a socket. You can later toggle this with socket.set_inheritable, and check it with socket.get_inheritable.

    You might expect that you could also check it with (socket.type | SOCK_CLOEXEC), and this does tell you *something*... but it has *no relationship at all* to whether the socket has close-on-exec enabled. Instead, what it tells you whether you passed SOCK_CLOEXEC to the socket constructor. Given that passing SOCK_CLOEXEC to the socket constructor is otherwise *completely ignored*, this seems like a weird piece of information to make available.

    @1st1 1st1 changed the title Add socket.truetype property Fix socket.type on Linux Dec 16, 2017
    @1st1
    Copy link
    Member Author

    1st1 commented Dec 16, 2017

    Nathaniel, thanks a lot for the comprehensive analysis.

    It's now obvious that this weird Linux-specific socket.type quirk is of our own making and specific only to Python.

    I've updated the PR:

    1. *type* argument of 'socket.socket()' is passed as is to OS functions. It is now cleared of SOCK_NONBLOCK and SOCK_CLOEXEC on Linux.

    2. socket.setblocking() no longer applies SOCK_NONBLOCK to socket.type.

    That's it. I'm now certain that this is the correct way of handling this situation. socket.type must be fixed.

    Please review the updated PR.

    @vstinner
    Copy link
    Member

    This issue is not specific to Linux. FreeBSD is also affected:

    >>> s=socket.socket()
    >>> s.type
    <SocketKind.SOCK_STREAM: 1>
    >>> s.setblocking(False)
    >>> s.type
    536870913

    See for example https://sourceware.org/ml/libc-alpha/2013-08/msg00528.html

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 18, 2017

    This issue is not specific to Linux. FreeBSD is also affected:

    So instead of '#ifdef linux' we should use #ifdef SOCK_NONBLOCK etc. I'll update the PR.

    @vstinner
    Copy link
    Member

    The most portable solution seems to getsockopt():

    >>> s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    1

    But I would prefer to avoid a syscall just to clear flags :-/

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 18, 2017

    But I would prefer to avoid a syscall just to clear flags :-/

    Yes, that's what I want to do.

    We control what constants the socket module exports. Let's just clear them if we export them -- that's a good working solution (and also the most backwards compatible, as opposed to calling sock.getsockopt)

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 19, 2017

    New changeset 9818142 by Yury Selivanov in branch 'master':
    bpo-32331: Fix socket.type when SOCK_NONBLOCK is available (bpo-4877)
    9818142

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 19, 2017

    I've merged the PR.

    Summary of the final change:

    1. socket.socket(family, type, proto) constructor clears SOCK_NONBLOCK and SOCK_CLOEXEC from 'type' before assigning it to 'sock.type'.

    2. socket.socket(family, SOCK_STREAM | SOCK_NONBLOCK) will still create a non-blocking socket.

    3. socket.setblocking() no longer sets/unsets SOCK_NONBLOCK flag on sock.type.

    This is 3.7 only change.

    Big thanks to Nathaniel and Victor for the help!

    @1st1 1st1 closed this as completed Dec 19, 2017
    @1st1 1st1 changed the title Fix socket.type on Linux Fix socket.type on OSes with SOCK_NONBLOCK Dec 19, 2017
    @vstinner
    Copy link
    Member

    This is 3.7 only change.

    In this case, bpo-27456 should be reopened or a new issue should be opened to fix asyncio in Python 3.6, no?

    @1st1
    Copy link
    Member Author

    1st1 commented Dec 19, 2017

    In this case, bpo-27456 should be reopened or a new issue should be opened to fix asyncio in Python 3.6, no?

    I'll make a PR to fix it in 3.7. Code in 3.6 is good.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants