Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add splice() to the os module #85791

Closed
pablogsal opened this issue Aug 24, 2020 · 30 comments
Closed

Add splice() to the os module #85791

pablogsal opened this issue Aug 24, 2020 · 30 comments
Assignees
Labels
3.10 stdlib

Comments

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Aug 24, 2020

BPO 41625
Nosy @vstinner, @serhiy-storchaka, @aixtools, @corona10, @pablogsal, @miss-islington, @shihai1991
PRs
  • #21947
  • #23340
  • #23350
  • #23351
  • #23354
  • #23608
  • Files
  • socket.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pablogsal'
    closed_at = <Date 2020-12-02.17:57:39.517>
    created_at = <Date 2020-08-24.15:57:18.528>
    labels = ['library', '3.10']
    title = 'Add splice() to the os module'
    updated_at = <Date 2020-12-08.21:54:24.782>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2020-12-08.21:54:24.782>
    actor = 'Michael.Felt'
    assignee = 'pablogsal'
    closed = True
    closed_date = <Date 2020-12-02.17:57:39.517>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2020-08-24.15:57:18.528>
    creator = 'pablogsal'
    dependencies = []
    files = ['49624']
    hgrepos = []
    issue_num = 41625
    keywords = ['patch']
    message_count = 30.0
    messages = ['375851', '375852', '375857', '375872', '375873', '375875', '375876', '375877', '375878', '378581', '381196', '381228', '381233', '381259', '381261', '381268', '381281', '381282', '381290', '381295', '381311', '381327', '381888', '381902', '382290', '382291', '382292', '382302', '382331', '382768']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'serhiy.storchaka', 'Michael.Felt', 'corona10', 'pablogsal', 'miss-islington', 'shihai1991']
    pr_nums = ['21947', '23340', '23350', '23351', '23354', '23608']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue41625'
    versions = ['Python 3.10']

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Aug 24, 2020

    The splice system call moves data between two file descriptors without copying between kernel address space and user address space. This can be a very useful addition for libraries implementing low-level file management.

    @pablogsal pablogsal added stdlib 3.10 labels Aug 24, 2020
    @pablogsal pablogsal self-assigned this Aug 24, 2020
    @pablogsal pablogsal added stdlib 3.10 labels Aug 24, 2020
    @pablogsal pablogsal self-assigned this Aug 24, 2020
    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Aug 24, 2020

    I don't recall the subtle differences between sendfile() and splice(). I recall that in early Linux versions, one was limited to sockets, and only on one side. But later, it became possible to pass two sockets, or one file on disk and one socket, etc.

    Python exposes sendfile() as os.sendfile() since Python 3.3:
    https://docs.python.org/dev/library/os.html#os.sendfile

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Aug 24, 2020

    I don't recall the subtle differences between sendfile() and splice().

    Basically, splice() is specialized for pipes:

    splice() only works if one of the file descriptors refer to a pipe. So you can use for e.g. socket-to-pipe or pipe-to-file without copying the data into userspace. But you can't do file-to-file copies with it.

    sendfile() only works if the source file descriptor refers to something that can be mmap()ed (i.e. mostly normal files) and before 2.6.33 the destination must be a socket.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Aug 25, 2020

    The API of splice() looks complicated. How would you use it in Python?

    Are off_in and off_out adjusted as in copy_file_range() and sendfile()? It is not clear from the man page. If they are, how would you return updated values?

    Are you going to add vmsplice() and tee() too? Since it is Linux-specific API, would not be better to add a purposed module linux?

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Aug 25, 2020

    Are you going to add vmsplice() and tee() too? Since it is Linux-specific API, would not be better to add a purposed module linux?

    It's not uncommon that a syscall added to the Linux kernel is later added to other platforms.

    Example: getrandom() exists in Linux and Solaris.

    Example: memfd_create() was designed in Linux, and added later to FreeBSD: freebsd/freebsd-src@575e351 (see bpo-41013).

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Aug 25, 2020

    OpenBSD uses a different API:
    https://man.openbsd.org/sosplice.9

    int sosplice(struct socket *so, int fd, off_t max, struct timeval *tv);
    int somove(struct socket *so, int wait);

    "The function sosplice() is used to splice together a source and a drain socket."

    "The function somove() transfers data from the source's receive buffer to the drain's send buffer."

    "Socket splicing can be invoked from userland via the setsockopt(2) system-call at the SOL_SOCKET level with the socket option SO_SPLICE."

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Aug 25, 2020

    The API of splice() looks complicated. How would you use it in Python?

    It has the same API as copy_file_range and other similar system calls that we already expose, so we just need to do the same thing we do there.

    Are off_in and off_out adjusted as in copy_file_range() and sendfile()? It is not clear from the man page. If they are, how would you return updated values?

    It behaves the same as in copy_file_range() with the exception that one has to be None (the one associated with the pipe file descriptor). We don't return the updated values (neither we do in copy_file_range()).

    Are you going to add vmsplice() and tee() too? Since it is Linux-specific API, would not be better to add a purposed module linux?

    We can certainly discuss adding vmsplice() and tee() (probably tee is more interesting), but in my humble oppinion that would be a different discussion.

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Aug 25, 2020

    OpenBSD uses a different API:

    The semantics are considerably different (splice() is about pipes while sosplice() talks about general sockets). Also, the point of splice() is to skip copying from kernel buffers, but sosplice() does not mention that it does not copy between userspace and kernel space

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Aug 25, 2020

    Since it is Linux-specific API, would not be better to add a purposed module linux?

    This is an interesting point, but I think that at this particular point it would be more confusing for users than not (normally people go to the os module for system calls) and as Victor mention, we would need to update the os module if some other operative system adds the system call later

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Oct 13, 2020

    Heads up: I plant to land this next week in case someone could to do a review or has something against

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Nov 17, 2020

    New changeset a57b3d3 by Pablo Galindo in branch 'master':
    bpo-41625: Expose the splice() system call in the os module (GH-21947)
    a57b3d3

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 17, 2020

    .. availability:: Linux kernel >= 2.6.17 or glibc >= 2.5

    Do you mean "Linux kernel >= 2.6.17 and glibc >= 2.5" ?

    .. data:: SPLICE_F_MOVE

    Maybe also add " .. versionadded:: 3.10" on these constants.

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Nov 17, 2020

    Do you mean "Linux kernel >= 2.6.17 and glibc >= 2.5" ?

    My understanding is that glibc provides emulation for glibc >= 2.5

    The section from the manpage says:

       The splice() system call first appeared in Linux 2.6.17; library
       support was added to glibc in version 2.5.
    

    Not sure how to interpret that. You want to change the "or" to "and"?

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 17, 2020

    I reopen the issue. This issue broke Python compilation on AIX.

    https://buildbot.python.org/all/#/builders/302/builds/377

    configure: "checking for splice... yes"

    "./Modules/posixmodule.c", line 15146.53: 1506-045 (S) Undeclared identifier SPLICE_F_MOVE.
    "./Modules/posixmodule.c", line 15147.57: 1506-045 (S) Undeclared identifier SPLICE_F_NONBLOCK.
    "./Modules/posixmodule.c", line 15148.53: 1506-045 (S) Undeclared identifier SPLICE_F_MORE.

    make: 1254-004 The error code from the last command is 1.

    The code:

    /* constants for splice */
    #ifdef HAVE_SPLICE
        if (PyModule_AddIntConstant(m, "SPLICE_F_MOVE", SPLICE_F_MOVE)) return -1;
        if (PyModule_AddIntConstant(m, "SPLICE_F_NONBLOCK", SPLICE_F_NONBLOCK)) return -1;
        if (PyModule_AddIntConstant(m, "SPLICE_F_MORE", SPLICE_F_MORE)) return -1;
    #endif

    @vstinner vstinner reopened this Nov 17, 2020
    @vstinner vstinner reopened this Nov 17, 2020
    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 17, 2020

    The splice() system call first appeared in Linux 2.6.17;
    library support was added to glibc in version 2.5.

    There is no emulation. It's just a function which wraps the syscall:

    https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/splice.c;h=fe21cf1988c48ce887a22c9e5e5f36cbd653a4c8;hb=HEAD

    I understand that you need Linux kernel >= 2.6.17 *and* glibc >= 2.5.

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Nov 17, 2020

    New changeset fa96608 by Pablo Galindo in branch 'master':
    bpo-41625: Add versionadded to os.splice() constants (GH-23340)
    fa96608

    @miss-islington
    Copy link
    Contributor

    @miss-islington miss-islington commented Nov 17, 2020

    New changeset e59958f by Pablo Galindo in branch 'master':
    bpo-41625: Specify that Linux >= 2.6.17 *and* glibc >= 2.5 are requir… (GH-23351)
    e59958f

    @miss-islington
    Copy link
    Contributor

    @miss-islington miss-islington commented Nov 17, 2020

    New changeset 2a9eddf by Pablo Galindo in branch 'master':
    bpo-41625: Add a guard for Linux for splice() constants in the os module (GH-23350)
    2a9eddf

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 17, 2020

    Nice, AIX can build again Python. But now the 3 tests fail since the test uses a pipe and a file, whereas on AIX, it seems like splice() requires one end to be a socket.

    I wrote attached PR 23354 to skip the 3 tests on AIX.

    ======================================================================
    ERROR: test_splice (test.test_os.FileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/test_os.py", line 406, in test_splice
        i = os.splice(in_fd, write_fd, 5)
    OSError: [Errno 57] Socket operation on non-socket

    ======================================================================
    ERROR: test_splice_offset_in (test.test_os.FileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/test_os.py", line 440, in test_splice_offset_in
        i = os.splice(in_fd, write_fd, bytes_to_copy, offset_src=in_skip)
    OSError: [Errno 57] Socket operation on non-socket

    ======================================================================
    ERROR: test_splice_offset_out (test.test_os.FileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/test_os.py", line 479, in test_splice_offset_out
        i = os.splice(read_fd, out_fd, bytes_to_copy, offset_dst=out_seek)
    OSError: [Errno 57] Socket operation on non-socket

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 17, 2020

    New changeset 1de61d3 by Victor Stinner in branch 'master':
    bpo-41625: Skip os.splice() tests on AIX (GH-23354)
    1de61d3

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 18, 2020

    FYI I checked and AIX is fixed. All tests pass again on POWER6 AIX 3.x buildbot.

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Nov 18, 2020

    Thanks a lot Victor

    @aixtools
    Copy link
    Contributor

    @aixtools aixtools commented Nov 26, 2020

    This is still broken.

    Since this was included in master - the AIX buildbot is failing to compile (https://buildbot.python.org/all/#/builders/438/builds/391 and https://buildbot.python.org/all/#/builders/302/builds/377)

    Strangely enough - the first bot continues to fail compile at the same location - while the second bot (running in a different environment) starting passing compile and all tests starting with https://buildbot.python.org/all/#/builders/302/builds/406.

    Note: bot 1 is using what I call (personal opinion) a mixed environment with some libraries coming from OSS packages and some from IBM AIX. bot 2 - relies on IBM AIX libraries.

    ++++++
    Note: manual build on same system as bot 1 using gcc - gives same error:

    aixtools@gcc119:[/home/aixtools/cpython/cpython-master]make V=1
    gcc -pthread -c -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal -I. -I./Include -DPy_BUILD_CORE -DABIFLAGS='""' -o Python/sysmodule.o ./Python/sysmodule.c
    gcc -pthread -c -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal -I. -I./Include -DPy_BUILD_CORE -o Modules/config.o Modules/config.c
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal -I. -I./Include -DPy_BUILD_CORE_BUILTIN -DPy_BUILD_CORE_BUILTIN -I./Include/internal -c ./Modules/posixmodule.c -o Modules/posixmodule.o
    ./Modules/posixmodule.c: In function 'os_splice_impl':
    ./Modules/posixmodule.c:10429:15: error: implicit declaration of function 'splice'; did you mean 'plock'? [-Werror=implicit-function-declaration]
    ret = splice(src, p_offset_src, dst, p_offset_dst, count, flags);
    ^~~~~~
    plock
    cc1: some warnings being treated as errors
    make: 1254-004 The error code from the last command is 1.

    • On same system, using xlc-v13, the build completes normally.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 26, 2020

    ./Modules/posixmodule.c:10429:15: error: implicit declaration of function 'splice'; did you mean 'plock'? [-Werror=implicit-function-declaration]

    Is it possible that posixmodule.c lacks an #include to get the function on AIX?

    On AIX 7.1, man splice says:

           #include <sys/types.h>
           #include <sys/socket.h>
           int splice(socket1, socket2, flags)
           int socket1, socket2;
           int flags;

    posixmodule.c doesn't include it on AIX:

    #if defined(__FreeBSD__) || defined(__DragonFly__) || defined(__APPLE__)
    #  ifdef HAVE_SYS_SOCKET_H
    #    include <sys/socket.h>
    #  endif
    #endif

    Michael: Would you mind to try building the master branch of Python with attached socket.patch? (on the worker where Python no longer builds)

    @vstinner vstinner reopened this Nov 26, 2020
    @vstinner vstinner reopened this Nov 26, 2020
    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Dec 2, 2020

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Dec 2, 2020

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Dec 2, 2020

    Seems that adding #include <sys/socket.h> does not work so I am going to skip adding this function on AIX. If someone is interested in fixing it, they can remove the #ifdef and figure out what's going on with that buildbot

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Dec 2, 2020

    Seems that adding #include <sys/socket.h> does not work so I am going to skip adding this function on AIX.

    I'm fine with not implementing the function on AIX for now.

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Dec 2, 2020

    New changeset dedc2cd by Pablo Galindo in branch 'master':
    bpo-41625: Do not add os.splice on AIX due to compatibility issues (GH-23608)
    dedc2cd

    @aixtools
    Copy link
    Contributor

    @aixtools aixtools commented Dec 8, 2020

    Sorry Victor - family matters - so I was not watching for several days.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 stdlib
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants