Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_io.FileIO uses a quadratic-time buffer growth algorithm #57368

Closed
nadeemvawda mannequin opened this issue Oct 12, 2011 · 7 comments
Closed

_io.FileIO uses a quadratic-time buffer growth algorithm #57368

nadeemvawda mannequin opened this issue Oct 12, 2011 · 7 comments
Labels
performance Performance or resource usage topic-IO

Comments

@nadeemvawda
Copy link
Mannequin

nadeemvawda mannequin commented Oct 12, 2011

BPO 13159
Nosy @pitrou, @vstinner, @benjaminp
Files
  • buffer-growth.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-10-13.14:01:59.975>
    created_at = <Date 2011-10-12.16:02:00.072>
    labels = ['expert-IO', 'performance']
    title = '_io.FileIO uses a quadratic-time buffer growth algorithm'
    updated_at = <Date 2011-10-13.14:01:59.973>
    user = 'https://bugs.python.org/nadeemvawda'

    bugs.python.org fields:

    activity = <Date 2011-10-13.14:01:59.973>
    actor = 'nadeem.vawda'
    assignee = 'nadeem.vawda'
    closed = True
    closed_date = <Date 2011-10-13.14:01:59.975>
    closer = 'nadeem.vawda'
    components = ['IO']
    creation = <Date 2011-10-12.16:02:00.072>
    creator = 'nadeem.vawda'
    dependencies = []
    files = ['23389']
    hgrepos = []
    issue_num = 13159
    keywords = ['patch']
    message_count = 7.0
    messages = ['145403', '145418', '145443', '145448', '145449', '145450', '145453']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'nadeem.vawda', 'benjamin.peterson', 'stutzbach', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13159'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @nadeemvawda
    Copy link
    Mannequin Author

    nadeemvawda mannequin commented Oct 12, 2011

    As mentioned in bpo-6715, the buffer growth strategy used by _io.FileIO
    has quadratic running time if each reallocation is O(n). The code in
    question is new_buffersize() from Modules/_io/fileio.c:

        if (currentsize > SMALLCHUNK) {
            /* Keep doubling until we reach BIGCHUNK;
               then keep adding BIGCHUNK. */
            if (currentsize <= BIGCHUNK)
                return currentsize + currentsize;
            else
                return currentsize + BIGCHUNK;
        }
        return currentsize + SMALLCHUNK;

    Is there a reason for this? One possible improvement would be to instead
    use the same formula as list.resize() in Objects/listobject.c:

        new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

    which has amortized O(n) running time behaviour.

    Your thoughts?

    @nadeemvawda nadeemvawda mannequin added topic-IO performance Performance or resource usage labels Oct 12, 2011
    @nadeemvawda
    Copy link
    Mannequin Author

    nadeemvawda mannequin commented Oct 12, 2011

    I've attached a patch that makes the suggested change to FileIO, and also
    to _bz2.BZ2Compressor/Decompressor (which currently have the same issue).

    @pitrou
    Copy link
    Member

    pitrou commented Oct 12, 2011

    The patch looks good to me, thanks.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 13, 2011

    New changeset d18c80a8c119 by Nadeem Vawda in branch '3.2':
    Issue bpo-13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one.
    http://hg.python.org/cpython/rev/d18c80a8c119

    New changeset 4a6709a071d0 by Nadeem Vawda in branch 'default':
    Merge bpo-13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one.
    http://hg.python.org/cpython/rev/4a6709a071d0

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 13, 2011

    New changeset c1c434e30e06 by Nadeem Vawda in branch '2.7':
    Issue bpo-13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one.
    http://hg.python.org/cpython/rev/c1c434e30e06

    @pitrou
    Copy link
    Member

    pitrou commented Oct 13, 2011

    Thank you :)

    @nadeemvawda
    Copy link
    Mannequin Author

    nadeemvawda mannequin commented Oct 13, 2011

    No problem :)

    @nadeemvawda nadeemvawda mannequin closed this as completed Oct 13, 2011
    @nadeemvawda nadeemvawda mannequin self-assigned this Oct 13, 2011
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage topic-IO
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant