Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_socket.SendfileUsingSendfileTest fails on illumos #86403

Closed
wiedi mannequin opened this issue Nov 1, 2020 · 8 comments
Closed

test_socket.SendfileUsingSendfileTest fails on illumos #86403

wiedi mannequin opened this issue Nov 1, 2020 · 8 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-IO

Comments

@wiedi
Copy link
Mannequin

wiedi mannequin commented Nov 1, 2020

BPO 42237
Nosy @asvetlov, @jstasiak, @miss-islington, @kulikjak, @wiedi
PRs
  • Disable some sendfile tests on sunos until bpo-42237 is resolved #23085
  • bpo-42237: Fix os.sendfile() on illumos #23154
  • [3.9] bpo-42237: Fix os.sendfile() on illumos (GH-23154) #23244
  • [3.8] bpo-42237: Fix os.sendfile() on illumos (GH-23154). #23246
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-11-12.12:46:46.566>
    created_at = <Date 2020-11-01.20:07:47.968>
    labels = ['3.8', '3.10', 'library', '3.9', 'expert-IO']
    title = 'test_socket.SendfileUsingSendfileTest fails on illumos'
    updated_at = <Date 2020-11-12.12:46:46.563>
    user = 'https://github.com/wiedi'

    bugs.python.org fields:

    activity = <Date 2020-11-12.12:46:46.563>
    actor = 'asvetlov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-11-12.12:46:46.566>
    closer = 'asvetlov'
    components = ['Library (Lib)', 'IO']
    creation = <Date 2020-11-01.20:07:47.968>
    creator = 'wiedi'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42237
    keywords = ['patch']
    message_count = 8.0
    messages = ['380160', '380348', '380356', '380358', '380362', '380810', '380812', '380817']
    nosy_count = 5.0
    nosy_names = ['asvetlov', 'jstasiak', 'miss-islington', 'kulikjak', 'wiedi']
    pr_nums = ['23085', '23154', '23244', '23246']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue42237'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @wiedi
    Copy link
    Mannequin Author

    wiedi mannequin commented Nov 1, 2020

    I'm investigating some test failures related to sendfile on illumos:

    testCount (test.test_socket.SendfileUsingSendfileTest) ... FAIL
    testCountSmall (test.test_socket.SendfileUsingSendfileTest) ... ok
    testCountWithOffset (test.test_socket.SendfileUsingSendfileTest) ... ok
    testEmptyFileSend (test.test_socket.SendfileUsingSendfileTest) ... ok
    testNonBlocking (test.test_socket.SendfileUsingSendfileTest) ... ok
    testNonRegularFile (test.test_socket.SendfileUsingSendfileTest) ... ok
    testOffset (test.test_socket.SendfileUsingSendfileTest) ... ERROR
    testRegularFile (test.test_socket.SendfileUsingSendfileTest) ... ok
    testWithTimeout (test.test_socket.SendfileUsingSendfileTest) ... FAIL
    testWithTimeoutTriggeredSend (test.test_socket.SendfileUsingSendfileTest) ... ok
    test_errors (test.test_socket.SendfileUsingSendfileTest) ... ok

    ======================================================================
    ERROR: testOffset (test.test_socket.SendfileUsingSendfileTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/root/cpython/Lib/socket.py", line 386, in _sendfile_use_sendfile
        sent = os_sendfile(sockno, fileno, offset, blocksize)
    OSError: [Errno 22] Invalid argument
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/root/cpython/Lib/test/test_socket.py", line 374, in _tearDown
        raise exc
      File "/root/cpython/Lib/test/test_socket.py", line 392, in clientRun
        test_func()
      File "/root/cpython/Lib/test/test_socket.py", line 6057, in _testOffset
        sent = meth(file, offset=5000)
      File "/root/cpython/Lib/socket.py", line 399, in _sendfile_use_sendfile
        raise _GiveupOnSendfile(err)
    socket._GiveupOnSendfile: [Errno 22] Invalid argument

    ======================================================================
    FAIL: testCount (test.test_socket.SendfileUsingSendfileTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/root/cpython/Lib/test/test_socket.py", line 6085, in testCount
        self.assertEqual(len(data), count)
    AssertionError: 5405948743 != 5000007

    ======================================================================
    FAIL: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/root/cpython/Lib/test/test_socket.py", line 6159, in testWithTimeout
        self.assertEqual(len(data), self.FILESIZE)
    AssertionError: 429006848 != 10485760

    Ran 11 tests in 33.603s

    FAILED (failures=2, errors=1)

    Looking at the testCount case I could observe repeated calls to sendfile() with out_fd=7, in_fd=6 off=0, len=5000007 which returned -1 with errno set to EAGAIN.

    @wiedi wiedi mannequin added 3.10 only security fixes stdlib Python modules in the Lib dir topic-IO labels Nov 1, 2020
    @jstasiak
    Copy link
    Mannequin

    jstasiak mannequin commented Nov 4, 2020

    I thought I'd give it a shot and I believe i found the issue. Let's use the testCount test as an example.

    The client side (or the data sending side) looks like this:

        def _testCount(self):
            address = self.serv.getsockname()
            file = open(os_helper.TESTFN, 'rb')
            sock = socket.create_connection(address,
                                            timeout=support.LOOPBACK_TIMEOUT)
            with sock, file:
                count = 5000007
                meth = self.meth_from_sock(sock)
                sent = meth(file, count=count)
                self.assertEqual(sent, count)
                self.assertEqual(file.tell(), count)

    So we're sending 5000007 bytes of data at once to a socket that has a timeout of 5.0 seconds set (default in those tests).

    Somewhere along the way socket._sendfile_use_sendfile() is called and in it there's a loop:

            try:
                while True:
                    (...)
                    try:
                        sent = os_sendfile(sockno, fileno, offset, blocksize)
                    except BlockingIOError:
                        if not timeout:
                            # Block until the socket is ready to send some
                            # data; avoids hogging CPU resources.
                            selector_select()
                        continue
                (...)
                return total_sent
    

    On my test VM running openindiana 5.11 (I think that's the version number?) this is basically an infinite loop (I think it'll end at some point, but I didn't have the patience to verify this). That's because trying to send 5000007 bytes to that socket with 5 seconds timeout will trigger BlockingIOError. Why?

    This is the relevant part of os.sendfile() implementation from posixmodule.c:

    do {
        Py_BEGIN_ALLOW_THREADS
        ret = sendfile(out_fd, in_fd, &offset, count);
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));
    if (ret < 0)
        return (!async_err) ? posix_error() : NULL;
    return Py_BuildValue("n", ret);
    

    offset is 0 in our case, but its exact value doesn't matter. The trouble is this is what illumos sendfile man page[1] says:

    RETURN VALUES

       (...) In some error cases sendfile() may still write some
       data before encountering an error and returning -1.  When that occurs,
       off is updated to point to the byte that follows the last byte copied and
       should be compared with its value before calling sendfile() to determine
       how much data was sent.
    

    After some input from Jakub Kulik I believe this is a unique behavior (Linux, Oracle Solaris, Mac OS, FreeBSD and likely all or almost all other OSes don't do this) and it lacks handling on the Python side.

    I tested this and indeed the sendfile(out_fd, in_fd, &offset, count) call *does* do partial writes in our case which gets reported as EAGAIN which gets converted to BlockingIOError which makes socket._sendfile_use_sendfile() retry again and again, each time resending the very beginning of the data and not going any further, therefore accumulating a lot of garbage on the receiving socket's side.

    This patch works for me and I run the whole test_socket test suite with it, no more failures:

    diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
    index 203f98515dfd..eeff11b5b8ea 100644
    --- a/Modules/posixmodule.c
    +++ b/Modules/posixmodule.c
    @@ -9475,13 +9475,20 @@ os_sendfile_impl(PyObject *module, int out_fd, int in_fd, PyObject *offobj,
    }
    #endif

    +    off_t original_offset = offset;
        do {
            Py_BEGIN_ALLOW_THREADS
            ret = sendfile(out_fd, in_fd, &offset, count);
            Py_END_ALLOW_THREADS
        } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));
    -    if (ret < 0)
    -        return (!async_err) ? posix_error() : NULL;
    +    if (ret < 0) {
    +       if (offset != original_offset) {
    +           ret = offset - original_offset;
    +       }
    +       else {
    +           return (!async_err) ? posix_error() : NULL;
    +       }
    +    }
        return Py_BuildValue("n", ret);
    #endif
    }

    If it's verified to be a good change I'm happy to resubmit it in a PR with appropriate illumos-specific #ifndefs.

    [1] https://illumos.org/man/3ext/sendfile

    @wiedi
    Copy link
    Mannequin Author

    wiedi mannequin commented Nov 4, 2020

    Excellent analysis, that's it! I've also tested your patch on SmartOS and it works great.

    @kulikjak
    Copy link
    Mannequin

    kulikjak mannequin commented Nov 4, 2020

    I did some further digging, and this is indeed not possible on Oracle Solaris (that is, sendfile() cannot write bytes and fail during the same call). We considered this a bug and changed/fixed it several years ago.

    Manual page doesn't mention that either:
    https://docs.oracle.com/cd/E88353_01/html/E37843/sendfile-3c.html

    I am not sure whether Illumos will want to change this as well (probably not since it is mentioned in the man page as expected behavior), but either way, the proposed change doesn't harm Oracle Solaris because 'if (offset != original_offset)' can never be true.

    @jstasiak
    Copy link
    Mannequin

    jstasiak mannequin commented Nov 4, 2020

    Thank you! I submitted a PR with a slightly modified patch (the comparison only happens on Solaris family of systems), I'd appreciate your confirmation that it still works (it's working for me on openindiana).

    @kulikjak kulikjak mannequin added the 3.9 only security fixes label Nov 5, 2020
    @asvetlov
    Copy link
    Contributor

    New changeset fd4ed57 by Jakub Stasiak in branch 'master':
    bpo-42237: Fix os.sendfile() on illumos (GH-23154)
    fd4ed57

    @miss-islington
    Copy link
    Contributor

    New changeset 7ae19ef by Miss Islington (bot) in branch '3.9':
    bpo-42237: Fix os.sendfile() on illumos (GH-23154)
    7ae19ef

    @asvetlov
    Copy link
    Contributor

    New changeset f37628e by Jakub Stasiak in branch '3.8':
    [3.8] bpo-42237: Fix os.sendfile() on illumos (GH-23154). (GH-23246)
    f37628e

    @asvetlov asvetlov added the 3.8 (EOL) end of life label Nov 12, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-IO
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants