Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast BytesIO implementation + misc changes #46091

Closed
avassalotti opened this issue Jan 7, 2008 · 26 comments
Closed

Fast BytesIO implementation + misc changes #46091

avassalotti opened this issue Jan 7, 2008 · 26 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage stdlib Python modules in the Lib dir tests Tests in the Lib/test dir

Comments

@avassalotti
Copy link
Member

BPO 1751
Nosy @gvanrossum, @brettcannon, @birkenfeld, @pitrou, @tiran, @avassalotti
Files
  • bytesio+misc-fixes-2.patch
  • bytesio+misc-fixes-3.patch
  • bytesio+misc-fixes-4.patch
  • bytesio+misc-fixes-5.patch
  • bytesio+misc-fixes-6.patch
  • bytesio+misc-fixes-8.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/avassalotti'
    closed_at = <Date 2008-05-06.20:03:15.781>
    created_at = <Date 2008-01-07.07:20:30.665>
    labels = ['interpreter-core', 'tests', 'library', 'performance']
    title = 'Fast BytesIO implementation + misc changes'
    updated_at = <Date 2008-05-06.20:03:15.766>
    user = 'https://github.com/avassalotti'

    bugs.python.org fields:

    activity = <Date 2008-05-06.20:03:15.766>
    actor = 'alexandre.vassalotti'
    assignee = 'alexandre.vassalotti'
    closed = True
    closed_date = <Date 2008-05-06.20:03:15.781>
    closer = 'alexandre.vassalotti'
    components = ['Interpreter Core', 'Library (Lib)', 'Tests']
    creation = <Date 2008-01-07.07:20:30.665>
    creator = 'alexandre.vassalotti'
    dependencies = []
    files = ['9099', '9752', '9899', '9904', '9905', '10024']
    hgrepos = []
    issue_num = 1751
    keywords = ['patch']
    message_count = 26.0
    messages = ['59436', '59447', '59460', '59461', '59503', '64020', '64664', '64739', '64741', '64755', '64757', '64758', '64762', '64763', '64766', '64769', '65444', '65446', '65452', '65453', '65474', '65476', '66308', '66325', '66327', '66328']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'georg.brandl', 'pitrou', 'christian.heimes', 'alexandre.vassalotti']
    pr_nums = []
    priority = 'critical'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue1751'
    versions = ['Python 3.0']

    @avassalotti
    Copy link
    Member Author

    Finally, here is my C implementation of BytesIO. The code is well tested
    and include the proper unit tests. The only remaining issues are:

    • The behavior of the close() method.
    • The failure of test_profile and test_cProfile.

    Currently, I have no idea how to fix the tests for the profilers. The
    weird thing is both pass when run with:

    % ./python Lib/test/regrtest.py -R: test_profile

    @avassalotti avassalotti added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir tests Tests in the Lib/test dir labels Jan 7, 2008
    @gvanrossum
    Copy link
    Member

    Any chance of uploading a single patch that contains all these changes?

    The profile tests often fail when io.py changes because they happen to
    depend on "golden output" which includes line numbers of code in io.py
    that happens to be traced during the test. I don't know why -R: makes
    it pass but it could be that it doesn't compare the golden output.

    @avassalotti
    Copy link
    Member Author

    So, here's one big patch. I have updated the behavior of close(), so that

    The profile tests often fail when io.py changes because they happen to
    depend on "golden output" which includes line numbers of code in io.py
    that happens to be traced during the test.

    Yes, I knew that. But, how can I fix the test so that it passes even if
    _bytesio is not available?

    Oh, one more thing. In the misc fixes for io.py, I added a checkClosed
    in IOBase.readline(). As a side-effect, this make __next__ raises a
    ValueError, instead of StopIteration. Is that correct?

    >>> f = open("load.py")
    [45681 refs]
    >>> next(f)
    'import sys\n'
    [45700 refs]
    >>> f.close()
    [45703 refs]
    >>> next(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/alex/src/python.org/py3k/Lib/io.py", line 1440, in __next__
        line = self.readline()
      File "/home/alex/src/python.org/py3k/Lib/io.py", line 1449, in readline
        raise ValueError("read from closed file")
    ValueError: read from closed file
    [45703 refs]

    @avassalotti
    Copy link
    Member Author

    [grrr, I eat my words]

    So, here's one big patch. I have updated the behavior of close(), so that

    ... it matches the behavior of 2.x.

    As a side-effect, this make __next__ raises a
    ValueError, instead of StopIteration.

    ... when the file is closed.

    @avassalotti
    Copy link
    Member Author

    I got a patch that also fixes the profiler tests. That was easy after
    all. :-)

    @avassalotti
    Copy link
    Member Author

    Here is a patch against the latest trunk (r61578) that includes the
    accelerator module of io.BytesIO with its test suite. The patch also
    changes the behavior of the truncate method to imply a seek(). Please
    review!

    @birkenfeld
    Copy link
    Member

    Bumping priority so that this gets reviewed before next release.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 30, 2008

    From a quick glance: I may be wrong, but it looks like in
    bytesio_writelines(), you'll have some reference leaks if the while
    loop exits early because bytesio_write() returns NULL; item and it
    should be decred'ed before returning.

    @avassalotti
    Copy link
    Member Author

    Yes, that was a leak. It should be fixed now.

    Merci Antoine!

    @pitrou
    Copy link
    Member

    pitrou commented Mar 30, 2008

    Alexandre, is it normal that the unit tests look much less complete in
    your latest patch than they were in the previous one?
    Also, they don't test the Python fallback implementation anymore.

    @avassalotti
    Copy link
    Member Author

    Oops, I forgot to include the unit tests, with "svn add", when I made
    the diff.

    @avassalotti
    Copy link
    Member Author

    There is a small bug in the last patch I posted. The unit tests assumed
    (wrongly) that accelerator module for io.StringIO (i.e., _stringio) was
    present.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 30, 2008

    Ok I took a detailed look at _bytesio.c.

    In write_bytes() there is the following resizing logic:

        if (self->pos + len > self->string_size) {
            if (resize_buffer(self, self->pos + len) < 0)
                return -1;
        }

    Replacing self->string_size with self->buf_size should avoid some
    spurious reallocations.

    For some reason, using the help() function on a BytesIO instance does
    not display the class help.

    Overall, the new module looks fine. I can't say anything about the io.py
    or _fileio.c fixes.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 30, 2008

    One last thing: you probably noticed it, but there is one test which
    needs correcting in test_StringIO. It's due to the fact that calling
    next() on a closed BytesIO object raises ValueError now rather than
    StopIteration.

    @avassalotti
    Copy link
    Member Author

    I think that check in write_bytes() is a guard to avoid resize_buffer()
    from truncating the string stored in the buffer. However, I am not sure
    if it is still necessary.

    I don't know why help() doesn't work on BytesIO instances. But, the
    problem doesn't seems to be in my code, since help(sys.stdin) doesn't
    work either.

    Finally regarding test_StringIO, see msg59460. Anyway, test_StringIO is
    superseded by test_memoryio included my patch and should be removed from
    the stdlib.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 30, 2008

    Well, by construction self->buf_size should always be greater than
    self->string_size, so I don't think there's any risk of truncating
    anything here.
    I tried the change, and the tests still ran fine.

    @pitrou pitrou added the performance Performance or resource usage label Apr 9, 2008
    @tiran
    Copy link
    Member

    tiran commented Apr 13, 2008

    I'm going to review the patch later. How are we going to back port the
    stuff to 2.6?

    @tiran
    Copy link
    Member

    tiran commented Apr 13, 2008

    Hey Alexandre!

    The latest patch doesn't apply cleanly. Please provide a new patch.

    @avassalotti
    Copy link
    Member Author

    Here's an updated patch.

    @avassalotti
    Copy link
    Member Author

    Oops, I forgot again to "svn add" the new files.

    @gvanrossum
    Copy link
    Member

    Do I need to look at this, or is the review going well without my
    interference?

    @avassalotti
    Copy link
    Member Author

    Hi Guido,

    The patch changes a minor things to io.py to allow io.BytesIO to pass my
    test suite, so you may want to check it out. Other than that, I think
    the review is going fine.

    @pitrou
    Copy link
    Member

    pitrou commented May 6, 2008

    Since nothing happened for almost one month here, perhaps it would be
    better to merge Alexandre's changes as they are? We'll see quickly if
    unexpected things happen :-)

    It would also allow to progress on other issues, e.g. bpo-2523.

    Also, I stand by the suggestion I made about the resizing logic, but it
    shouldn't be a showstopper either. We can improve that later.

    @brettcannon
    Copy link
    Member

    Alexandre, you have until the end of the week to commit it or someone else
    will. =) I don't want this going in so close to the alphas, but it should
    go in almost immediately afterwards.

    @pitrou
    Copy link
    Member

    pitrou commented May 6, 2008

    By the way, is someone already working on a C-accelerated TextIO?

    @avassalotti
    Copy link
    Member Author

    Patch committed in r62778.

    Antoine wrote:

    Also, I stand by the suggestion I made about the resizing logic, but it
    shouldn't be a showstopper either. We can improve that later.

    I made your suggested change to the resizing logic. Thanks again for the
    review!

    By the way, is someone already working on a C-accelerated TextIO?

    Yes. I was waiting to commit the C optimization for io.BytesIO, before
    continuing the one for io.StringIO. The implementation is mostly
    done---it just needs to be updated with respect to the recent changes to
    TextIO.

    @avassalotti avassalotti self-assigned this May 6, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage stdlib Python modules in the Lib dir tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants