
Optimize new io library #48811

Closed
tiran opened this issue Dec 6, 2008 · 30 comments
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@tiran
Member

tiran commented Dec 6, 2008

BPO 4561
Nosy @birkenfeld, @rhettinger, @amauryfa, @pitrou, @giampaolo, @tiran
Superseder
  • bpo-4565: Rewrite the IO stack in C
Files
  • count_linenendings2.patch
  • test_write.txt

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-01-18.12:33:27.331>
    created_at = <Date 2008-12-06.12:14:50.793>
    labels = ['extension-modules', 'library', 'performance']
    title = 'Optimize new io library'
    updated_at = <Date 2009-01-18.12:33:27.330>
    user = 'https://github.com/tiran'

    bugs.python.org fields:

    activity = <Date 2009-01-18.12:33:27.330>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-01-18.12:33:27.331>
    closer = 'pitrou'
    components = ['Extension Modules', 'Library (Lib)']
    creation = <Date 2008-12-06.12:14:50.793>
    creator = 'christian.heimes'
    dependencies = []
    files = ['12255', '12257']
    hgrepos = []
    issue_num = 4561
    keywords = ['patch']
    message_count = 30.0
    messages = ['77116', '77117', '77118', '77120', '77124', '77125', '77129', '77131', '77145', '77160', '77172', '77176', '77177', '77178', '77762', '77884', '77894', '77895', '77896', '77900', '77901', '77903', '77904', '77906', '77910', '77912', '77915', '77917', '78097', '80094']
    nosy_count = 9.0
    nosy_names = ['georg.brandl', 'rhettinger', 'beazley', 'amaury.forgeotdarc', 'pitrou', 'giampaolo.rodola', 'christian.heimes', 'donmez', 'wplappert']
    pr_nums = []
    priority = 'critical'
    resolution = 'duplicate'
    stage = 'test needed'
    status = 'closed'
    superseder = '4565'
    type = 'performance'
    url = 'https://bugs.python.org/issue4561'
    versions = ['Python 3.0', 'Python 3.1']

    @tiran
    Member Author

    tiran commented Dec 6, 2008

    The new io library needs some serious profiling and optimization work.
    I've already fixed a severe slowdown in _fileio.FileIO's read buffer
    allocation algorithm (bpo-4533).

More profiling has shown a speed problem in write() for files opened in
text mode. For example, three str.count() calls take up 20% of the
time. The str.count calls can be replaced with an optimized C function
that returns the counts of (\r\n, \n, \r) in one pass instead of three
passes.
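The one-pass idea can be sketched in pure Python (a hypothetical equivalent of the helper; the attached patch implements it in C for speed):

```python
def count_line_endings(s):
    """Count '\\r\\n', lone '\\n', and lone '\\r' in a single pass.

    Sketch of the one-pass counting idea; returns (crlf, lf, cr).
    """
    crlf = lf = cr = 0
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if c == '\r':
            # A '\r' immediately followed by '\n' counts as one '\r\n'.
            if i + 1 < n and s[i + 1] == '\n':
                crlf += 1
                i += 2
                continue
            cr += 1
        elif c == '\n':
            lf += 1
        i += 1
    return crlf, lf, cr
```

Three separate str.count() calls each scan the whole string; this visits every character exactly once.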

    @tiran tiran added extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir performance Performance or resource usage labels Dec 6, 2008
    @beazley
    Mannequin

    beazley mannequin commented Dec 6, 2008

    I've done some profiling and the performance of reading line-by-line is
    considerably worse in Python 3 than in Python 2. For example, this
    code:

    for line in open("somefile.txt"):
        pass

    Ran 35 times slower in Python 3.0 than Python 2.6 when I tested it on a
    big text file (100 Megabytes). If you disable Unicode by opening the
    file in binary mode, it runs even slower.

    This slowdown is really unacceptable for anyone who uses Python for
    parsing big non-Unicode text files (and I would claim that there are
    many such people).
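A self-contained harness for reproducing this kind of measurement (the file here is generated and small; the report above used a file of roughly 100 MB):

```python
import os
import tempfile
import time

# Generate a throwaway text file (scale the line count up for real tests).
path = tempfile.mkstemp(suffix=".txt")[1]
with open(path, "w") as f:
    for i in range(100000):
        f.write("line %d\n" % i)

start = time.perf_counter()
count = 0
with open(path) as f:          # text mode: bytes are decoded, newlines translated
    for line in f:
        count += 1
elapsed = time.perf_counter() - start

print("%d lines in %.3f s" % (count, elapsed))
os.remove(path)
```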

    @tiran
    Member Author

    tiran commented Dec 6, 2008

Your issue is most likely caused by bpo-4533. Please download the latest svn
version of Python 3.0 (branches/release30-maint) and try again.

    @tiran
    Member Author

    tiran commented Dec 6, 2008

Here is a patch against the py3k branch that reduces the time for the
line-ending detection from 0.55s to 0.22s for a 50MB file on my test
system.

    @beazley
    Mannequin

    beazley mannequin commented Dec 6, 2008

    Tried this using projects/python/branches/release30-maint with the
    patch that was just attached. With a 66MB input file, here are the
    results of this code fragment:

    for line in open("BIGFILE"):
        pass

    Python 2.6: 0.67s
    Python 3.0: 32.687s (48 times slower)

    This is running on a MacBook with a warm disk cache. For what it's
    worth, I didn't see any improvement with that patch.

    @beazley
    Mannequin

    beazley mannequin commented Dec 6, 2008

    Just as one other followup, if you change the code in the last example
    to use binary mode like this:

    for line in open("BIG","rb"):
        pass

    You get the following results:

    Python 2.6: 0.64s
    Python 3.0: 42.26s (66 times slower)

    @birkenfeld
    Member

    David, the reading bug fix/optimization is not (yet?) on
    release30-maint, only on branches/py3k.

    @beazley
    Mannequin

    beazley mannequin commented Dec 6, 2008

    Just checked it with branches/py3k and the performance is the same.

    @tiran
    Member Author

    tiran commented Dec 6, 2008

    What's your OS, David? Please post the output of "uname -r" and ./python
    -c "import sys; print(sys.version)"

    @beazley
    Mannequin

    beazley mannequin commented Dec 6, 2008

    bash-3.2$ uname -a
    Darwin david-beazleys-macbook.local 9.5.1 Darwin Kernel Version 9.5.1: Fri
    Sep 19 16:19:24 PDT 2008; root:xnu-1228.8.30~1/RELEASE_I386 i386
    bash-3.2$ ./python.exe -c "import sys; print(sys.version)"
    3.1a0 (py3k:67609, Dec 6 2008, 08:47:06)
    [GCC 4.0.1 (Apple Inc. build 5465)]
    bash-3.2$

    @tiran
    Member Author

    tiran commented Dec 6, 2008

    I've updated the patch with proper formatting, some minor cleanups and a
    unit test.

    @pitrou
    Member

    pitrou commented Dec 6, 2008

    I don't think this is a public API, so the function should probably be
    renamed _count_lineendings.
    Also, are there some benchmark numbers?

    @tiran
    Member Author

    tiran commented Dec 6, 2008

    I'll come up with some reading benchmarks tomorrow. For now, here is a
    benchmark of write(). You can clearly see the excessive use of closed,
    len() and isinstance().
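A breakdown like that comes from profiling a tight write() loop, e.g. with cProfile (a sketch; on today's CPython the text layer is implemented in C, so the Python-level closed/len()/isinstance() hot spots from 2008 no longer dominate):

```python
import cProfile
import os
import pstats
import tempfile

def write_lines(path, n=10000):
    # Tight loop of small text-mode writes, the case being profiled above.
    with open(path, "w") as f:
        for i in range(n):
            f.write("line %d\n" % i)

path = tempfile.mkstemp(suffix=".txt")[1]
profiler = cProfile.Profile()
profiler.runcall(write_lines, path)
stats = pstats.Stats(profiler).sort_stats("cumulative")
# stats.print_stats(10) would list the ten most expensive functions
os.remove(path)
```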

    @tiran
    Member Author

    tiran commented Dec 6, 2008

    Roundup doesn't display .log files as plain text files.

    @pitrou
    Member

    pitrou commented Dec 13, 2008

    Christian, by benchmarks I meant a measurement of text reading with and
    without the patch.

    @pitrou
    Member

    pitrou commented Dec 15, 2008

    I've written a small file IO benchmark, available here:
    http://svn.python.org/view/sandbox/trunk/iobench/

    It runs under both 2.6 and 3.x, so that we can compare speeds of
    respective implementations.

    @pitrou
    Member

    pitrou commented Dec 16, 2008

    Without Christian's patch:

    [400KB.txt] read one byte/char at a time... 0.2685 MB/s (100% CPU)
    [400KB.txt] read 20 bytes/chars at a time... 4.536 MB/s (98% CPU)
    [400KB.txt] read one line at a time... 3.805 MB/s (99% CPU)
    [400KB.txt] read 4096 bytes/chars at a time... 29.23 MB/s (100% CPU)

    [ 20KB.txt] read whole contents at once... 52.42 MB/s (99% CPU)
    [400KB.txt] read whole contents at once... 45.83 MB/s (100% CPU)
    [ 10MB.txt] read whole contents at once... 48.78 MB/s (99% CPU)

    With the patch:

    [400KB.txt] read one byte/char at a time... 0.2761 MB/s (100% CPU)
    [400KB.txt] read 20 bytes/chars at a time... 4.656 MB/s (99% CPU)
    [400KB.txt] read one line at a time... 3.956 MB/s (98% CPU)
    [400KB.txt] read 4096 bytes/chars at a time... 33.85 MB/s (100% CPU)

    [ 20KB.txt] read whole contents at once... 66.17 MB/s (99% CPU)
    [400KB.txt] read whole contents at once... 56.65 MB/s (99% CPU)
    [ 10MB.txt] read whole contents at once... 63.69 MB/s (99% CPU)

    Python 2.6's builtin file object:

    [400KB.txt] read one byte/char at a time... 1.347 MB/s (97% CPU)
    [400KB.txt] read 20 bytes/chars at a time... 26.65 MB/s (99% CPU)
    [400KB.txt] read one line at a time... 184.4 MB/s (100% CPU)
    [400KB.txt] read 4096 bytes/chars at a time... 1163 MB/s (99% CPU)

    [ 20KB.txt] read whole contents at once... 1072 MB/s (100% CPU)
    [400KB.txt] read whole contents at once... 889.1 MB/s (100% CPU)
    [ 10MB.txt] read whole contents at once... 600 MB/s (100% CPU)

    @rhettinger
    Contributor

    I'm getting caught up with the IO changes in 3.0 and am a bit confused.
    The PEP says, "programmers who don't want to muck about in the new I/O
    world can expect that the open() factory method will produce an object
    backwards-compatible with old-style file objects." So, I would have
    expected that the old implementation could have remained in place and
    the resultant object registered as matching the appropriate IO ABC. If
    that had been done, the performance would be unchanged. Does anyone
    know why the entire old implementation had to be thrown out in cases
    where the API was unchanged? Is there anything about New IO that is
    fundamentally different so that the old implementation had to be tossed
    in all cases?

    @amauryfa
    Member

    The previous implementation only returns bytes and does not translate
    newlines. For this particular case, indeed, the plain old FILE* based
    object is faster.

    @pitrou
    Member

    pitrou commented Dec 16, 2008

    I seem to recall one of the design principles of the new IO stack was to
    avoid relying on the C stdlib's buffered API, which has too many
    platform-dependent behaviours.

    In any case, binary reading has acceptable performance in py3k (although
    3x-4x slower than in 2.x); it's text I/O which is truly horrendous.

    @rhettinger
    Contributor

    I don't agree that that was a worthy design goal. Tons of code
    (including the old CPython) successfully used the stdlib functions.
    IMO, a 3x or 4x falloff for binary reads/writes is a significant
    disincentive for adopting Py3.0. For binary reads/writes, I would like
    to see the open() factory function return the old, fast object instead
    of trying to slowly simulate it without adding any benefits noticeable
    to an end-user. IMO, it's a case of practicality beating purity.

    @beazley
    Mannequin

    beazley mannequin commented Dec 16, 2008

    I agree with Raymond. For binary reads, I'll go farther and say that
    even a 10% slowdown in performance would be surprising if not
    unacceptable to some people. I know that as hard as it might be for
    everyone to believe, there are a lot of people who crank lots of non-
    Unicode data with Python. In fact, Python 2.X is pretty good at it.

    It's fine that text mode now uses Unicode, but if I don't want that, I
    would certainly expect the binary file modes to run at virtually the
    same speed as Python 2 (e.g., okay, they work with bytes instead of
    strings, but is the bytes type really all that different from the old
    Python 2 str type?).

    @pitrou
    Member

    pitrou commented Dec 16, 2008

    I don't agree that that was a worthy design goal.

    I don't necessarily agree either, but it's probably too late now.
    The py3k buffered IO object has additional methods (e.g. peek(),
    read1()) which can be used by upper layers (text IO) and so can't be
    replaced with the old 2.x file object.
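For reference, peek() and read1() behave like this on a buffered reader (a sketch over an in-memory raw stream standing in for a binary file):

```python
import io

# BufferedReader over BytesIO stands in for open(path, "rb").
reader = io.BufferedReader(io.BytesIO(b"hello world"))

head = reader.peek(5)      # look at buffered bytes without consuming them
assert head.startswith(b"hello")

chunk = reader.read1(5)    # at most one read() call on the raw stream
assert chunk == b"hello"

rest = reader.read()
assert rest == b" world"
```

The text IO layer relies on read1() to fetch data without blocking for a full buffer, which is why the 2.x file object cannot simply be slotted in underneath it.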

    In any case, Amaury has started rewriting the IO lib in C (*) and
    getting good binary IO performance shouldn't be too difficult.

    (*) http://svn.python.org/view/sandbox/trunk/io-c/

    @pitrou
    Member

    pitrou commented Dec 16, 2008

    I know that as hard as it might be for
    everyone to believe, there are a lot of people who crank lots of non-
    Unicode data with Python.

    But "cranking data" implies you'll do something useful with it, and
    therefore spend CPU time doing those useful things (likely much more CPU
    time than you spent read()ing the data in the first place).

    In any case, you can try to open your file in unbuffered mode:
    open("foobar", "rb", buffering=0)

    it will bypass the Python buffering layer and will go directly to the
    raw C unbuffered object.
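The difference is visible in the type of object open() returns (a sketch, checked against modern CPython, where FileIO is the raw layer):

```python
import io
import os
import tempfile

path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"hello\n")

buffered = open(path, "rb")           # a BufferedReader wrapping a FileIO
raw = open(path, "rb", buffering=0)   # the raw FileIO object itself

assert isinstance(buffered, io.BufferedReader)
assert isinstance(raw, io.FileIO)     # no Python-level buffering layer

buffered.close()
raw.close()
os.remove(path)
```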

    (e.g., okay, they work with bytes instead of
    strings, but is the bytes type really all that different from the old
    Python 2 str type?)

    No. It's a bit more limited, doesn't support autoconversion to/from
    unicode, but that's all.

    @beazley
    Mannequin

    beazley mannequin commented Dec 16, 2008

    Good luck with that. Most people who get bright ideas such as "gee,
    maybe I'll write my own version of X" where "X" is some part of the
    standard C library pertaining to I/O, end up fighting a losing battle.
    Of course, I'd love to be proven wrong, but I don't think I will in this
    case.

    As for cranking data, that does not necessarily imply heavy-duty CPU
    processing. Someone might be reading large datafiles simply to perform
    some kind of data extraction, filtering, minor translation, or other
    operation that is easily carried out in Python, but where the programs
    are still I/O bound. For example, the kinds of processing one might
    otherwise do using awk, sed, perl, etc.

    @tiran
    Member Author

    tiran commented Dec 16, 2008

    David:
    Amaury's work is going to be a part of the standard library as soon as
    his work is done. I'm confident that we can reach the old speed of the
    2.x file type by carefully moving code to C modules.

    @beazley
    Mannequin

    beazley mannequin commented Dec 16, 2008

    I wish I shared your optimism about this, but I don't. Here's a short
    explanation why.

    The problem of I/O and the associated interface between hardware, the
    operating system kernel, and user applications is one of the most
    fundamental and carefully studied problems in all of computer systems.
    The C library and its associated I/O functionality provide the user-
    space implementation of this interface. However, if you peel the covers
    off of the C library, you're going to find a lot of really hairy stuff
    in there. Examples might include:

    1. Low-level optimization related to the system hardware (processor
      architecture, caching, I/O bus, etc.).

    2. Hand-written finely tuned assembly code.

    3. Low-level platform-specific system calls such as ioctl().

    4. System calls related to shared memory regions, kernel buffers, etc.
      (i.e., optimizations that try to eliminate buffer copies).

    5. Undocumented vendor-specific "proprietary" system calls (i.e.,
      unknown "magic").

    So, you'll have to forgive me for being skeptical, but I just don't
    think any programmer is going to sit down and bang out a new
    implementation of buffered I/O that is going to match the performance of
    what's provided by the C library.

    Again, I would love to be proven wrong.

    @pitrou
    Member

    pitrou commented Dec 16, 2008

    [...]

    Although I agree all this is important, I'd challenge the assumption it
    has its place in the buffered IO library rather than in lower-level
    layers (i.e. kernel & userspace unbuffered IO).

    In any case, it will be difficult to undo the current design decisions
    (however misguided they may or may not be) of the py3k IO library and
    we'll have to make the best out of them!

    @loewis loewis mannequin added release-blocker and removed deferred-blocker labels Dec 20, 2008
    @pitrou
    Member

    pitrou commented Dec 20, 2008

    We can't solve this for 3.0.1, downgrading to critical.

    @pitrou
    Member

    pitrou commented Jan 18, 2009

    Marking this as a duplicate of bpo-4565 "Rewrite the IO stack in C".

    @pitrou pitrou closed this as completed Jan 18, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    6 participants