Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

Closed
giampaolo opened this issue May 28, 2018 · 18 comments
Closed

Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

giampaolo opened this issue May 28, 2018 · 18 comments
Labels
3.8 performance stdlib

Comments

@giampaolo
Copy link
Contributor

@giampaolo giampaolo commented May 28, 2018

BPO 33671
Nosy @facundobatista, @ncoghlan, @pitrou, @scoder, @vstinner, @giampaolo, @tarekziade, @merwok, @bitdancer, @encukou, @asvetlov, @methane, @socketpair, @vadmium, @MojoVampire, @asottile
PRs
  • #7160
  • #7681
  • #12016
  • #27516
  • Files
  • shutil-zero-copy.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-12.21:48:06.592>
    created_at = <Date 2018-05-28.16:17:23.634>
    labels = ['3.8', 'library', 'performance']
    title = 'Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win)'
    updated_at = <Date 2021-07-31.19:15:48.402>
    user = 'https://github.com/giampaolo'

    bugs.python.org fields:

    activity = <Date 2021-07-31.19:15:48.402>
    actor = 'eric.araujo'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-12.21:48:06.592>
    closer = 'giampaolo.rodola'
    components = ['Library (Lib)']
    creation = <Date 2018-05-28.16:17:23.634>
    creator = 'giampaolo.rodola'
    dependencies = []
    files = ['47621']
    hgrepos = []
    issue_num = 33671
    keywords = ['patch', 'needs review']
    message_count = 18.0
    messages = ['317878', '317880', '317905', '317906', '317932', '317989', '319401', '319405', '319484', '319486', '319980', '320247', '320456', '336492', '336502', '336514', '336516', '398660']
    nosy_count = 21.0
    nosy_names = ['facundobatista', 'ncoghlan', 'pitrou', 'scoder', 'vstinner', 'giampaolo.rodola', 'gps', 'StyXman', 'tarek', 'eric.araujo', 'r.david.murray', 'petr.viktorin', 'asvetlov', 'methane', 'SilentGhost', 'neologix', 'socketpair', 'python-dev', 'martin.panter', 'josh.r', 'Anthony Sottile']
    pr_nums = ['7160', '7681', '12016', '27516']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue33671'
    versions = ['Python 3.8']

    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented May 28, 2018

    Patch in attachment uses platform specific zero-copy syscalls on Linux and Solaris (os.sendfile(2)), Windows (CopyFileW) and OSX (fcopyfile(2)) speeding up shutil.copyfile() and other functions using it (copy(), copy2(), copytree(), move()).

    Average speedup for a 512MB file copy is +24% on Linux, +50% on OSX and +48% on Windows by copying file on the same partition (SSD disk was used).

    Follows some benchmarks.

    Setup
    =====

    Create 128K, 8M, 512M file:

        $ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(128 * 1024))"
        $ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(8 * 1024 * 1024))"
        $ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(512 * 1024 * 1024))"

    Benchmark:

        $ time ./python -m timeit -s 'import shutil; p1 = "f1"; p2 = "f2"' 'shutil.copyfile(p1, p2)'

    Linux
    =====

    128K copy (+13%):

    without patch:
        1000 loops, best of 5: 228 usec per loop
        real    0m1.756s
        user    0m0.386s
        sys     0m1.116s
    
        with patch:
            1000 loops, best of 5: 198 usec per loop
            real    0m1.464s
            user    0m0.281s
            sys     0m0.958s

    8MB copy (+24%):

    without patch:
        50 loops, best of 5: 10.1 msec per loop
        real    0m2.703s
        user    0m0.316s
        sys     0m1.847s
    
        with patch:
            50 loops, best of 5: 7.78 msec per loop
            real    0m2.447s
            user    0m0.086s
            sys     0m1.682s

    512MB copy (+26%):

    without patch:
        1 loop, best of 5: 872 msec per loop
        real    0m5.574s
        user    0m0.402s
        sys     0m3.115s
    
        with patch:
            1 loop, best of 5: 646 msec per loop
            real    0m5.475s
            user    0m0.037s
            sys     0m2.959s

    OSX
    ===

    128K copy (+8.5%):

    without patch:
        500 loops, best of 5: 508 usec per loop
        real    0m2.971s
        user    0m0.442s
        sys     0m2.168s
    
        with patch:
            500 loops, best of 5: 464 usec per loop
            real    0m2.798s
            user    0m0.379s
            sys     0m2.031s

    8MB copy (+67%):

    without patch:
        20 loops, best of 5: 32.8 msec per loop
        real    0m3.672s
        user    0m0.357s
        sys     0m1.434s
    
        with patch:
            20 loops, best of 5: 10.8 msec per loop
            real    0m1.860s
            user    0m0.079s
            sys     0m0.719s

    512MB copy (+50%):

    without patch:
        1 loop, best of 5: 953 msec per loop
        real    0m5.930s
        user    0m1.021s
        sys     0m4.835s
    
    with patch:
        1 loop, best of 5: 480 msec per loop
        real    0m3.150s
        user    0m0.067s
        sys     0m2.740s
    

    Windows
    =======

    128K copy (+69%):

    without patch:
        50 loops, best of 5: 6.45 msec per loop
    with patch:
        50 loops, best of 5: 1.99 msec per loop
    

    8M copy (+64%):

    without patch:
        10 loops, best of 5: 22.6 msec per loop
    with patch:
        50 loops, best of 5: 7.95 msec per loop
    

    512M copy (+48%):

    without patch:
        1 loop, best of 5: 1.21 sec per loop
    with patch:
        1 loop, best of 5: 629 msec per loop
    

    @giampaolo giampaolo added 3.8 stdlib performance labels May 28, 2018
    @giampaolo giampaolo changed the title Efficient efficient zero-copy syscalls for shutil.copy* functions (Linux, OSX and Win) Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) May 28, 2018
    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented May 28, 2018

    PR: #7160

    @scoder
    Copy link
    Contributor

    @scoder scoder commented May 28, 2018

    Nice, I really like this.

    Apart from the usual bit of minor style issues, I couldn't see anything inherently wrong with the PR, but I'll leave the detailed reviews to those who'd have to maintain the code in the future. :)

    @scoder
    Copy link
    Contributor

    @scoder scoder commented May 28, 2018

    Regarding the benchmarks, just to be sure, did you try reversing the run order to make sure you don't get unfair caching effects for the later runs?

    @socketpair
    Copy link
    Mannequin

    @socketpair socketpair mannequin commented May 28, 2018

    http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html

    That possibly should be used under Linux in order to really acheive zero-copying. Just like modern cp command.

    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented May 29, 2018

    Yes, I tried changing benchmarks order and zero-copy variants are always faster. As for instantaneous CoW copy, it is debatable. E.g. "cp" command does not do it by default:
    https://unix.stackexchange.com/questions/80351/why-is-cp-reflink-auto-not-the-default-behaviour
    I think shutil should follow the same lead, and perhaps provide a cow=bool argument in the future.

    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented Jun 12, 2018

    New changeset 4a172cc by Giampaolo Rodola in branch 'master':
    bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) (bpo-7160)
    4a172cc

    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented Jun 12, 2018

    For future reference, as per #7160 discussion, we decided not to use CopyFileEx on Windows and instead increase read() buffer size from 16KB to 1MB (Windows only) resulting in a 40.8% speedup (instead of 48%). Also copyfileobj() has been optimized on all platforms by using readinto()/memoryview()/bytearray().
    Updated benchmarks on Windows:

    128KB copy (+27%)

    without patch:
        50 loops, best of 5: 7.69 sec per loop
    with patch:
        50 loops, best of 5: 5.61 sec per loop
    

    8MB copy (+45.6%)

    without patch:
        10 loops, best of 5: 20.8 sec per loop
    with patch:
        20 loops, best of 5: 11.3 sec per loop
    

    512MB copy (+40.8%)

    without patch:
        1 loop, best of 5: 1.26 sec per loop
    with patch:
        1 loop, best of 5: 646 msec per loop
    

    @StyXman
    Copy link
    Mannequin

    @StyXman StyXman mannequin commented Jun 13, 2018

    Thanks Gianpaolo for pushing for this. Great job.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Jun 13, 2018

    Thanks Gianpaolo for pushing for this. Great job.

    I concur: great job! Cool optimization.

    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented Jun 19, 2018

    New changeset c7f02a9 by Giampaolo Rodola in branch 'master':
    bpo-33671 / shutil.copyfile: use memoryview() with dynamic size on Windows (bpo-7681)
    c7f02a9

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Jun 22, 2018

    New changeset 8fbbdf0 by Victor Stinner in branch 'master':
    bpo-33671: Add support.MS_WINDOWS and support.MACOS (GH-7800)
    8fbbdf0

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Jun 26, 2018

    New changeset 937ee9e by Victor Stinner in branch 'master':
    Revert "bpo-33671: Add support.MS_WINDOWS and support.MACOS (GH-7800)" (GH-7919)
    937ee9e

    @methane
    Copy link
    Member

    @methane methane commented Feb 24, 2019

    New changeset 3b0abb0 by Inada Naoki (Giampaolo Rodola) in branch 'master':
    bpo-33671: allow setting shutil.copyfile() bufsize globally (GH-12016)
    3b0abb0

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 25, 2019

    shutil.COPY_BUFSIZE isn't documented. Is it a deliberate choice?
    https://docs.python.org/dev/library/shutil.html

    @giampaolo
    Copy link
    Contributor Author

    @giampaolo giampaolo commented Feb 25, 2019

    Yes, it's deliberate, see PR-12016.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 25, 2019

    Yes, it's deliberate, see PR-12016.

    "I decided not to document shutil.COPY_BUFSIZE because I consider it a corner case."

    Ok. I have no opinion on that, I just wanted to ask the question :-)

    @merwok
    Copy link
    Member

    @merwok merwok commented Jul 31, 2021

    New changeset b08c48e by Anthony Sottile in branch 'main':
    bpo-33671 fix orphaned comment in shutil.copyfileobj (GH-27516)
    b08c48e

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 performance stdlib
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants