Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poor cStringIO.StringO seek performance #54254

Closed
boogenhagn mannequin opened this issue Oct 7, 2010 · 11 comments
Closed

poor cStringIO.StringO seek performance #54254

boogenhagn mannequin opened this issue Oct 7, 2010 · 11 comments
Assignees
Labels
performance Performance or resource usage

Comments

@boogenhagn
Copy link
Mannequin

boogenhagn mannequin commented Oct 7, 2010

BPO 10045
Nosy @freddrake, @pitrou, @bitdancer
Files
  • cStringIO.diff: cStringIO patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/freddrake'
    closed_at = <Date 2010-10-11.19:13:56.762>
    created_at = <Date 2010-10-07.17:37:48.395>
    labels = ['performance']
    title = 'poor cStringIO.StringO seek performance'
    updated_at = <Date 2010-10-11.19:13:56.761>
    user = 'https://bugs.python.org/boogenhagn'

    bugs.python.org fields:

    activity = <Date 2010-10-11.19:13:56.761>
    actor = 'fdrake'
    assignee = 'fdrake'
    closed = True
    closed_date = <Date 2010-10-11.19:13:56.762>
    closer = 'fdrake'
    components = []
    creation = <Date 2010-10-07.17:37:48.395>
    creator = 'boogenhagn'
    dependencies = []
    files = ['19150']
    hgrepos = []
    issue_num = 10045
    keywords = ['patch']
    message_count = 11.0
    messages = ['118123', '118124', '118126', '118147', '118215', '118353', '118354', '118355', '118356', '118357', '118382']
    nosy_count = 4.0
    nosy_names = ['fdrake', 'pitrou', 'boogenhagn', 'r.david.murray']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue10045'
    versions = ['Python 2.7']

    @boogenhagn
    Copy link
    Mannequin Author

    boogenhagn mannequin commented Oct 7, 2010

    cStringIO.StringO's seek method has O(n) characteristics in certain,
    albeit pathological, cases, while the pure Python implementation and
    cStringIO.StringI's seek methods both execute in constant time in all cases.

    When the file offset is set n bytes beyond the end of actual data,
    the gap is filled in with n bytes in cStringIO.StringO's seek method.

    however, POSIX states that reads of data in the gap will return null bytes
    only if a subsequent write has taken place, so filling in the gap is not
    required at the time of the seek.

    This patch for 2.7 corrects the behavior by unifying StringO and StringI's
    seek methods, and moving the writing of null bytes to StringO's write
    method. There may be a more elegant way to write this, I don't know.
    I believe this issue affects Python 3 as well, though I have yet to
    test it.

    NOTE: Perhaps this seems like an extreme edge case not worthy of a fix, but
    this actually caused problems for us when parsing images with malformed
    EXIF data; a web request for uploading such a photo was taking on the order
    of 15 minutes. When we stopped using cStringIO.StringO, it took seconds.

    @boogenhagn boogenhagn mannequin added the performance Performance or resource usage label Oct 7, 2010
    @boogenhagn
    Copy link
    Mannequin Author

    boogenhagn mannequin commented Oct 7, 2010

    The second sentence should have said "the gap is filled in with n null bytes"

    @bitdancer
    Copy link
    Member

    I'm changing the versions to just 2.7 (though I'm not sure this can be considered a bug fix), since StringIO is reimplemented as part of io in 3.x.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 7, 2010

    I don't think there's much point in fixing this. 2.7 users can use io.BytesIO, which is a fast type implemented in C.

    @boogenhagn
    Copy link
    Mannequin Author

    boogenhagn mannequin commented Oct 8, 2010

    Fair enough, but there is a great deal of existing code that already
    uses cStringIO.

    @freddrake
    Copy link
    Member

    Causing perfectly good Python 2 applications to degrade in performance is bad, even if something else is available.

    This should be fixed as a regression.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 11, 2010

    This should be fixed as a regression.

    As far as I understand, this is not a regression. I don't think the cStringIO code has changed in years.

    @freddrake
    Copy link
    Member

    Ok, reading more carefully, it's not a regression. But it's certainly a bug, and should be fixed.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 11, 2010

    Ok, reading more carefully, it's not a regression. But it's certainly
    a bug, and should be fixed.

    Right. The patch looks straightforward, but I'm not familiar with the
    cStringIO code. Could you take a look?

    @freddrake
    Copy link
    Member

    Assigning to myself for review.

    @freddrake freddrake self-assigned this Oct 11, 2010
    @freddrake
    Copy link
    Member

    Committed with minor changes in r85366 (release27-maint branch).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants