SSLSocket.send() returns 0 for non-blocking socket #65150

Closed
nikratio mannequin opened this issue Mar 16, 2014 · 31 comments
Labels
stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@nikratio
Mannequin

nikratio mannequin commented Mar 16, 2014

BPO 20951
Nosy @pitrou, @giampaolo, @tiran, @bitdancer, @bdarnell
Files
  • issue20951.diff
  • issue20951.diff
  • issue20951.diff
  • deprecation_patch.diff
  • docpatch.diff
  • issue20951_r2.diff
  • issue20951_r3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-04-29.08:06:41.244>
    created_at = <Date 2014-03-16.21:16:31.275>
    labels = ['type-feature', 'library']
    title = 'SSLSocket.send() returns 0 for non-blocking socket'
    updated_at = <Date 2014-05-01.12:09:37.606>
    user = 'https://bugs.python.org/nikratio'

    bugs.python.org fields:

    activity = <Date 2014-05-01.12:09:37.606>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-04-29.08:06:41.244>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2014-03-16.21:16:31.275>
    creator = 'nikratio'
    dependencies = []
    files = ['34448', '34504', '34541', '34608', '34633', '35062', '35082']
    hgrepos = []
    issue_num = 20951
    keywords = ['patch']
    message_count = 31.0
    messages = ['213759', '213761', '213764', '213766', '213774', '213776', '213778', '213779', '213780', '213782', '214042', '214121', '214316', '214772', '214847', '214848', '214876', '214877', '214878', '214889', '214931', '217322', '217483', '217484', '217485', '217489', '217492', '217567', '217583', '217673', '217689']
    nosy_count = 8.0
    nosy_names = ['janssen', 'pitrou', 'giampaolo.rodola', 'christian.heimes', 'r.david.murray', 'nikratio', 'python-dev', 'Ben.Darnell']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue20951'
    versions = ['Python 3.5']

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 16, 2014

    When using non-blocking operation, the SSLSocket.send method returns 0 if no data can be sent at this point.

    This is counterintuitive, because in the same situation (write to non-blocking socket that isn't ready for IO):

    • A regular (non-SSL) socket raises BlockingIOError
    • libc's send(2) does not return 0, but returns -1 with errno set to EAGAIN or EWOULDBLOCK.
    • OpenSSL's SSL_write() does not return 0, but fails with an SSL_ERROR_WANT_WRITE error.
    • The ssl module's documentation describes the SSLWantWriteError exception as "A subclass of SSLError raised by a non-blocking SSL socket when trying to read or write data, but more data needs to be sent on the underlying TCP transport before the request can be fulfilled."
    • Consistent with that, trying to *read* from a non-blocking SSLSocket when no data is ready raises SSLWantReadError instead of returning zero.
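
For comparison, the plain-socket behaviour from the first bullet can be reproduced with a small sketch. It assumes a local `socket.socketpair()` whose peer never reads, and the helper name `fill_until_blocked` is made up for illustration; none of this is code from the report.

```python
import socket


def fill_until_blocked(sock, chunk=b"x" * 65536, max_chunks=100_000):
    """Send on a non-blocking socket until the kernel buffer is full.

    Returns (bytes_sent, exc), where exc is the BlockingIOError that was
    raised, or None if the safety cap was reached first.
    """
    total = 0
    for _ in range(max_chunks):
        try:
            total += sock.send(chunk)
        except BlockingIOError as exc:
            # A regular socket signals "would block" with an exception,
            # never with a return value of 0.
            return total, exc
    return total, None


a, b = socket.socketpair()
a.setblocking(False)
try:
    sent, blocked = fill_until_blocked(a)  # b never reads, so a fills up
finally:
    a.close()
    b.close()

print(sent, type(blocked).__name__)
```

The exact number of bytes accepted before the exception depends on the platform's socket buffer sizes.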

    This behavior also makes it more complicated to write code that works with both SSLSockets and regular sockets.

    Since the current behavior is undocumented at best (and contradicts the documentation at worst), can we change this in Python 3.5?

    @nikratio nikratio mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Mar 16, 2014
    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 16, 2014

    This actually seems to be not just an inconvenience, but a real bug: since SSLSocket.sendall() uses SSLSocket.send() internally, the former method will busy-loop when called on a non-blocking socket.

    Note also that the .sendto and .write methods already behave consistently and raise SSLWantWriteError. It seems it's really just the send() method that is the lone outlier.

    The attached patch changes SSLSocket.send() to raise SSLWantWriteError instead of returning zero. The full test suite still runs fine. I'm a bit sceptical though, because the code looks as if send() was deliberately written to catch the SSLWantWriteError exception and return zero instead. Can anyone familiar with the code comment on this?
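
The busy-loop concern can be illustrated with a simulation. This is only a sketch: `ZeroReturningSocket` is a hypothetical stand-in that returns 0 a fixed number of times, and `naive_sendall` is an assumed loop in the style of sendall(), not the actual implementation in ssl.py.

```python
class ZeroReturningSocket:
    """Hypothetical stand-in for a non-blocking socket whose send()
    returns 0 while the transport is not ready (not the real SSLSocket)."""

    def __init__(self, blocked_calls):
        self.blocked_calls = blocked_calls
        self.calls = 0
        self.received = bytearray()

    def send(self, data):
        self.calls += 1
        if self.calls <= self.blocked_calls:
            return 0  # "would block", reported as zero bytes sent
        self.received += bytes(data)
        return len(data)


def naive_sendall(sock, data):
    """Keep calling send() until every byte has been accepted."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # With a 0 return there is no exception to break out on, so the
        # loop simply spins until the socket becomes writable.
        sent += sock.send(view[sent:])
    return sent


fake = ZeroReturningSocket(blocked_calls=1000)
naive_sendall(fake, b"hello")
print(fake.calls)  # 1000 wasted calls plus the one that succeeded
```

With a real socket the number of wasted iterations is unbounded, since nothing in the loop waits for the socket to become writable.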

    @bitdancer
    Member

    A little hg sleuthing (which I assume you did but I'll record for the record) reveals that this was introduced by Bill Janssen in changeset 8a281bfc058d. Following the bugs mentioned in the checkin message, it looks like it *might* have been related to bpo-1251, but there really isn't enough information in the issues or the checkin to tell for sure. It certainly sounds like the problems mentioned in that issue may be relevant, though (the disconnect between the unencrypted data sent and what actually gets placed on the wire and when).

    I see you already added Bill Janssen to nosy, so that's probably the best bet for getting an answer, if we are lucky and he both responds and remembers :)

    @pitrou
    Member

    pitrou commented Mar 16, 2014

    It's probably too late to change this, unfortunately. There are non-blocking frameworks and libraries out there relying on the current behaviour.

    As for sendall(), it doesn't really make sense on a non-blocking socket anyway.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 16, 2014

    Antoine, do you know that there are frameworks out there using this, or is that a guess? asyncio, for example, seems to expect an SSLWantWriteError exception as well (it also works with a zero return, but it's not clear from the code whether that's by design or by chance).

    @pitrou
    Member

    pitrou commented Mar 16, 2014

    > Antoine, do you know that there are frameworks out there using this,
    > or is that a guess?

    It's just a guess.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 16, 2014

    Twisted does not seem to rely on it either (there's no mention of SSLWant* in the source at all, and without that, you can't possibly have support for non-blocking ssl sockets).

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 16, 2014

    gevent is calling _sslobject.write() directly, so it would not be affected by any change.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 16, 2014

    Tornado uses SSLSocket.send(), and it looks as if an SSLWantWriteError exception would not be caught but would propagate, so this would probably break.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 17, 2014

    More info on twisted: it uses PyOpenSSL rather than the stdlib ssl module, so it's not affected at all.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 19, 2014

    Since this behavior cannot be changed without breaking third-party libraries (why did they work around this rather than reporting a bug?), I'd suggest documenting the current behavior and allowing programs to opt in to getting exceptions.

    I've attached a patch to that end. Feedback would be appreciated.

    @bitdancer bitdancer added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 19, 2014
    @pitrou
    Member

    pitrou commented Mar 19, 2014

    I don't think complicating the situation by exposing two different kinds of non-blocking sockets is the solution here.

    Either we decide it is worth breaking compatibility and we change the behaviour by default (I'm rather against this), or we simply document the discrepancy.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 21, 2014

    I'd like to argue with the wise words of Nick Coghlan here:

    --snip--
    There's a great saying in the usability world: "You can't document your way out of a usability problem". What it means is that if all the affordances of your application (or programming language!) push users towards a particular logical conclusion ([...]), having a caveat in your documentation isn't going to help, because people aren't even going to think to ask the question. It doesn't matter if you originally had a good reason for the behaviour, you've ended up in a place where your behaviour is confusing and inconsistent, because there is one piece of behaviour that is out of line with an otherwise consistent mental model.
    --snip--

    This was said in the context of the bool(datetime.time) discussion, but I think it applies here as well. The rest of Python consistently raises an exception when something would block in non-blocking mode. This is reasonable behavior to expect. I agree that we shouldn't suddenly break this, but emitting a deprecation warning in Python 3.5 and changing the default in 3.6 seems reasonable to me. This is three years of transition time, and based on my random sampling so far, I doubt that there are a lot of affected modules or applications.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 25, 2014

    (refreshed patch)

    @pitrou
    Member

    pitrou commented Mar 25, 2014

    > There's a great saying in the usability world: "You can't document
    > your way out of a usability problem".

    However, adding a flag to change behaviour at runtime creates *another* usability problem. It's not obvious it would actually make things better (and implementors of async networking frameworks haven't asked for it, AFAICT).

    @giampaolo
    Contributor

    -1 on adding a raise_on_blocking_send=False option, as IMO it unnecessarily complicates the API.

    Note: when working with plain sockets, send() returning 0 means the connection has been closed by the other peer; the same goes for os.sendfile().
    It appears the ssl module is the only one behaving differently, therefore I'd be for signaling the discrepancy in the docs.

    @bdarnell
    Mannequin

    bdarnell mannequin commented Mar 26, 2014

    Giampaolo, where do you see that send() may return zero if the other side has closed? I've always gotten an error in that case (EPIPE).

    I vote -1 to adding a new flag to control whether it returns zero or raises and +0 to just fixing it in Python 3.5 (I don't think returning zero is an unreasonable thing to do; it's not obvious to me from send(2) that it is guaranteed to never return zero although I believe that to be the case). It'll break Tornado, but there will be plenty of time to get a fix out before then. If there were a convenient place to put a deprecation warning I'd vote to deprecate in 3.5 and fix in 3.6, but there's no good way for the application to signal that it expects a WANT_WRITE exception.

    Another option may be to have SSLSocket.send() convert the WANT_WRITE exception into a socket.error with errno EAGAIN. This wouldn't break Tornado and would make socket.send and SSLSocket.send more consistent, but it's weird to hide the true error like this.
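
The conversion described in the last paragraph could look roughly like the sketch below. It is written as an external wrapper rather than a change to SSLSocket itself; `send_as_eagain` and `AlwaysBlocked` are hypothetical names used only for illustration.

```python
import errno
import ssl


def send_as_eagain(sock, data):
    """Call sock.send() and re-raise SSL "want" errors as EAGAIN."""
    try:
        return sock.send(data)
    except (ssl.SSLWantReadError, ssl.SSLWantWriteError) as exc:
        # Both directions collapse into a single EAGAIN here, so the
        # caller can no longer tell which one was requested.
        raise BlockingIOError(errno.EAGAIN, "SSL operation would block") from exc


class AlwaysBlocked:
    """Hypothetical test double whose send() always wants a write."""

    def send(self, data):
        raise ssl.SSLWantWriteError(ssl.SSL_ERROR_WANT_WRITE, "would block")


try:
    send_as_eagain(AlwaysBlocked(), b"payload")
except BlockingIOError as exc:
    caught = exc

print(caught.errno == errno.EAGAIN)
```

Code that already handles BlockingIOError from plain sockets would then work unchanged, at the cost of hiding the original SSL exception behind `__cause__`.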

    @giampaolo
    Contributor

    Sorry, my fault. I got confused with os.sendfile() which returns 0 on EOF.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 26, 2014

    On 03/25/2014 06:53 PM, Ben Darnell wrote:

    > Another option may be to have SSLSocket.send() convert the WANT_WRITE exception into a socket.error with errno EAGAIN. This wouldn't break Tornado and would make socket.send and SSLSocket.send more consistent, but it's weird to hide the true error like this.

    I think that would only make sense if the SSLWant{Read/Write}Error
    exceptions are eliminated completely, so that all methods raise
    BlockingIOError (==EAGAIN) instead.

    Raising BlockingIOError is marginally better than returning zero, but I
    think not worth the change.

    @pitrou
    Member

    pitrou commented Mar 26, 2014

    > I vote -1 to adding a new flag to control whether it returns zero or
    > raises and +0 to just fixing it in Python 3.5 (I don't think returning
    > zero is an unreasonable thing to do; it's not obvious to me from
    > send(2) that it is guaranteed to never return zero although I believe
    > that to be the case). It'll break Tornado, but there will be plenty
    > of time to get a fix out before then.

    If that's your opinion then I'm inclined to trust you.

    > Another option may be to have SSLSocket.send() convert the WANT_WRITE
    > exception into a socket.error with errno EAGAIN.

    I don't think it's a good idea, since it hides the true reason for the
    error (also, it suppresses the distinction between WANT_READ and
    WANT_WRITE, which tells you whether you need to select() the socket for
    reading or writing).
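
To make the WANT_READ / WANT_WRITE distinction concrete, here is a minimal sketch of a retry loop that waits on whichever direction the exception asks for. `send_when_ready` and `FlakyOnce` are hypothetical names; the test double merely imitates an SSL socket on top of a plain `socket.socketpair()`.

```python
import select
import socket
import ssl


def send_when_ready(sock, data, timeout=5.0):
    """Retry sock.send(), waiting on the direction the SSL layer asked for."""
    while True:
        try:
            return sock.send(data)
        except ssl.SSLWantReadError:
            # The TLS layer needs incoming data before it can proceed.
            ready, _, _ = select.select([sock], [], [], timeout)
        except ssl.SSLWantWriteError:
            # The underlying transport cannot take more outgoing data yet.
            _, ready, _ = select.select([], [sock], [], timeout)
        if not ready:
            raise TimeoutError("socket not ready within timeout")


class FlakyOnce:
    """Hypothetical test double: raises SSLWantWriteError on the first
    send(), then delegates to a real socket."""

    def __init__(self, real):
        self._real = real
        self._raised = False

    def fileno(self):
        return self._real.fileno()

    def send(self, data):
        if not self._raised:
            self._raised = True
            raise ssl.SSLWantWriteError(ssl.SSL_ERROR_WANT_WRITE, "would block")
        return self._real.send(data)


left, right = socket.socketpair()
try:
    sent = send_when_ready(FlakyOnce(left), b"hello")
    echoed = right.recv(16)
finally:
    left.close()
    right.close()

print(sent, echoed)
```

An event-loop based framework would register the socket for the appropriate event instead of calling select() inline, but the branching on the exception type is the same.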

    @nikratio
    Mannequin Author

    nikratio mannequin commented Mar 27, 2014

    As an alternative, I have attached a pure docpatch that just documents the future behavior.

    Someone with commit privileges: please take your pick :-).

    @nikratio
    Mannequin Author

    nikratio mannequin commented Apr 27, 2014

    As discussed on python-dev, here is a patch that changes the behavior of send() and sendall() to raise SSLWant* exceptions instead of returning zero.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 29, 2014

    New changeset 3cf067049211 by Antoine Pitrou in branch 'default':
    Issue bpo-20951: SSLSocket.send() now raises either SSLWantReadError or SSLWantWriteError on a non-blocking socket if the operation would block. Previously, it would return 0.
    http://hg.python.org/cpython/rev/3cf067049211
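
Code that has to run both before and after this changeset might normalise the two behaviours along these lines. This is only a sketch under stated assumptions: `try_send`, `OldStyle`, and `NewStyle` are hypothetical names, and the two classes are test doubles rather than real SSL sockets.

```python
import ssl


def try_send(sock, data):
    """Return the number of bytes sent, or None if the send would block.

    Treats the old behaviour (send() returning 0) and the new behaviour
    (send() raising SSLWantReadError / SSLWantWriteError) the same way,
    and also covers BlockingIOError from plain sockets.
    """
    try:
        sent = sock.send(data)
    except (BlockingIOError, ssl.SSLWantReadError, ssl.SSLWantWriteError):
        return None
    if sent == 0 and len(data) > 0:
        return None  # pre-3.5 SSLSocket style "would block"
    return sent


class OldStyle:
    """Test double imitating the previous behaviour."""

    def send(self, data):
        return 0


class NewStyle:
    """Test double imitating the behaviour after this changeset."""

    def send(self, data):
        raise ssl.SSLWantWriteError(ssl.SSL_ERROR_WANT_WRITE, "would block")


print(try_send(OldStyle(), b"x"), try_send(NewStyle(), b"x"))
```

Returning None discards whether a read or a write was wanted, so a real event loop would more likely catch the two exceptions separately.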

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 29, 2014

    New changeset b0f6983d63df by Antoine Pitrou in branch 'default':
    Add porting note for issue bpo-20951.
    http://hg.python.org/cpython/rev/b0f6983d63df

    @pitrou
    Member

    pitrou commented Apr 29, 2014

    Patch finally committed. Thanks Nikolaus!

    @pitrou pitrou closed this as completed Apr 29, 2014
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 29, 2014

    New changeset 7f50e1836ddb by Antoine Pitrou in branch 'default':
    Fix failure in test_poplib after issue bpo-20951.
    http://hg.python.org/cpython/rev/7f50e1836ddb

    @pitrou
    Member

    pitrou commented Apr 29, 2014

    Ok, there was a failure in test_poplib when run with -unetwork, I fixed it.

    @nikratio
    Mannequin Author

    nikratio mannequin commented Apr 30, 2014

    Antoine, are you sure this was a problem related to this patch?

    The test seems to work just fine for me:

    $ hg update -C -r b0f6983d63df
    $ make clean
    $ ./configure --with-pydebug && make -j1
    $ ./python -m test -u network,urlfetch -j 8 test_poplib
    [1/1] test_poplib
    1 test OK.

    Am I doing something wrong?

    @pitrou
    Member

    pitrou commented Apr 30, 2014

    > Am I doing something wrong?

    I can reproduce the failure here.
    There might be different behaviour across OpenSSL versions (mine is
    1.0.1e).

    @nikratio
    Mannequin Author

    nikratio mannequin commented May 1, 2014

    Maybe. I have 1.0.1g. Could you maybe post the output of the failed test? I'd like to understand how the patch broke the test (looking at your patch alone didn't tell me much).

    @pitrou
    Member

    pitrou commented May 1, 2014

    Actually, the test hangs after one of the threads crashes:

    test__all__ (test.test_poplib.TestPOP3_SSLClass) ... Exception in thread Thread-23:
    Traceback (most recent call last):
      File "/home/antoine/cpython/default/Lib/threading.py", line 920, in _bootstrap_inner
        self.run()
      File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 218, in run
        asyncore.loop(timeout=0.1, count=1)
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 212, in loop
        poll_fun(timeout, map)
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 153, in poll
        read(obj)
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 87, in read
        obj.handle_error()
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 83, in read
        obj.handle_read_event()
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 422, in handle_read_event
        self.handle_accept()
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 499, in handle_accept
        self.handle_accepted(*pair)
      File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 228, in handle_accepted
        self.handler_instance = self.handler(conn)
      File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 368, in __init__
        self.push('+OK dummy pop3 server ready. <timestamp>')
      File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 82, in push
        asynchat.async_chat.push(self, data.encode("ISO-8859-1") + b'\r\n')
      File "/home/antoine/cpython/default/Lib/asynchat.py", line 190, in push
        self.initiate_send()
      File "/home/antoine/cpython/default/Lib/asynchat.py", line 243, in initiate_send
        self.handle_error()
      File "/home/antoine/cpython/default/Lib/asynchat.py", line 241, in initiate_send
        num_sent = self.send(data)
      File "/home/antoine/cpython/default/Lib/asyncore.py", line 366, in send
        result = self.socket.send(data)
      File "/home/antoine/cpython/default/Lib/ssl.py", line 667, in send
        return self._sslobj.write(data)
    ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:1636)

    This was due to a simplistic handling of asyncore SSL connections in test_poplib, which I've fixed by reusing the code from test_ftplib.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022