
xreadlines caching, file iterator #36878

Closed
orenti mannequin opened this issue Jul 11, 2002 · 15 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

orenti mannequin commented Jul 11, 2002

BPO 580331
Nosy @gvanrossum, @tim-one
Files
  • xreadlinescache.patch
  • xreadlinescache2.patch
  • fileiterreadahead.patch
  • fileiterreadahead2.patch
  • fileiterreadahead3.patch: xreadlines just return self now
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2002-08-06.15:56:40.000>
    created_at = <Date 2002-07-11.21:45:57.000>
    labels = ['interpreter-core']
    title = 'xreadlines caching, file iterator'
    updated_at = <Date 2002-08-06.15:56:40.000>
    user = 'https://bugs.python.org/orenti'

    bugs.python.org fields:

    activity = <Date 2002-08-06.15:56:40.000>
    actor = 'gvanrossum'
    assignee = 'gvanrossum'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2002-07-11.21:45:57.000>
    creator = 'orenti'
    dependencies = []
    files = ['4420', '4421', '4422', '4423', '4424']
    hgrepos = []
    issue_num = 580331
    keywords = ['patch']
    message_count = 15.0
    messages = ['40543', '40544', '40545', '40546', '40547', '40548', '40549', '40550', '40551', '40552', '40553', '40554', '40555', '40556', '40557']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'tim.peters', 'orenti']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue580331'
    versions = ['Python 2.3']


    orenti mannequin commented Jul 11, 2002

    Calling f.xreadlines() multiple times returns the same
    xreadlines object.

    A file is an iterator - __iter__() returns self and next() calls
    the cached xreadlines object's next method.
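The described behavior can be sketched in Python (illustrative only; the actual patch modifies the C file object, and the class and attribute names here are invented):

```python
class File:
    """Toy model of the proposed file-object semantics."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._pos = 0
        self._cached = None          # cached xreadlines object

    def xreadlines(self):
        # Repeated calls return the same cached iterator object.
        if self._cached is None:
            self._cached = self._line_iter()
        return self._cached

    def _line_iter(self):
        while self._pos < len(self._lines):
            line = self._lines[self._pos]
            self._pos += 1
            yield line

    def __iter__(self):
        return self                  # the file is its own iterator

    def __next__(self):
        # next() delegates to the cached xreadlines object's next method.
        return next(self.xreadlines())
```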

    @orenti orenti mannequin closed this as completed Jul 11, 2002
    @orenti orenti mannequin assigned gvanrossum Jul 11, 2002
    @orenti orenti mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jul 11, 2002
    gvanrossum (Member) commented

    I posted some comments to python-dev.

    orenti mannequin commented Jul 16, 2002

    The patch now invalidates the cache on a seek.
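A sketch of the invalidation, with invented names (the real change is in the C file object; here the cached iterator is simply dropped whenever the position changes):

```python
import io

class File:
    """Toy model: seeking drops the cached line iterator."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._cached = None          # cached xreadlines/readahead state

    def xreadlines(self):
        if self._cached is None:
            self._cached = iter(self._f.readlines())
        return self._cached

    def seek(self, pos, whence=0):
        self._cached = None          # seek invalidates the cached iterator
        self._f.seek(pos, whence)
```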

    gvanrossum (Member) commented

    I'm reviewing this and will check it in, or something like
    it (probably).

    gvanrossum (Member) commented

    Alas, there's a fatal flaw. The file object and the
    xreadlines object now both have pointers to each other,
    creating an unbreakable cycle (since neither participates in
    GC). Weak refs can't be used to resolve this dilemma. I
    personally think that's enough to just stick with the status
    quo (I was never more than +0 on the idea of making the file
    an iterator anyway). But I'll leave it to Oren to come up
    with another hack (please use this same SF patch).

    Oren, if you'd like to give up, please say so and I'll close
    the item in a jiffy. In fact, I positively encourage you to
    give up. But I don't expect you to take this offer. :-)
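The cycle Guido describes can be pictured with Python stand-ins (class and attribute names invented here; the real file and xreadlines objects are C types that, at the time, did not implement the cyclic-GC protocol):

```python
class FileObject:
    def __init__(self):
        self.cached_xreadlines = None

class XReadlines:
    def __init__(self, fileobj):
        self.fileobj = fileobj               # back-pointer to the file

f = FileObject()
f.cached_xreadlines = XReadlines(f)          # file -> xreadlines -> file

# Pure-Python instances like these are tracked by the cycle collector, but
# since neither 2.2-era C type participated in GC, such a pair could never
# be freed once the last outside reference to either object was dropped.
```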

    orenti mannequin commented Aug 5, 2002

    This version of the patch still makes a file an iterator, but it no
    longer depends on xreadlines - it implements the readahead
    buffering inside the file object.

    It is about 19% faster than xreadlines for normal text files and
    about 40% faster for files with 100k lines.

    The methods readline and read do not use this readahead
    mechanism because it skews the current file position (just like
    xreadlines does).
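A minimal sketch of the readahead idea (pure Python with an invented function name; the real patch implements this in C inside the file object):

```python
import io

def iter_lines(f, bufsize=8192):
    """Yield lines from binary file f via a readahead buffer.

    Like the patch's mechanism, this reads ahead of the logical position,
    so interleaving it with f.read()/f.readline() would see a skewed file
    offset -- which is why those methods bypass the readahead.
    """
    buf = b""
    while True:
        nl = buf.find(b"\n")
        if nl >= 0:
            yield buf[:nl + 1]
            buf = buf[nl + 1:]
            continue
        chunk = f.read(bufsize)       # read ahead in large chunks
        if not chunk:
            if buf:
                yield buf             # final line without trailing newline
            return
        buf += chunk
```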

    gvanrossum (Member) commented

    This begins to look good.

    What's a normal text file? One with a million bytes? :-)

    Have you made sure this works as expected in Universal
    newline mode?

    I'd like a patch that doesn't use #define WITH_READAHEAD_BUFFER.

    You might also experiment with larger buffer sizes (I
    predict that a larger buffer doesn't make much difference,
    since it didn't for xreadlines, but it would be nice to
    verify that and then add a comment; at least once a year
    someone asks whether the buffer shouldn't be much larger).
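The buffer-size question can be checked with a quick experiment along these lines (an illustrative pure-Python harness, not the C patch; timings will vary by machine):

```python
import io
import timeit

def count_lines(data, bufsize):
    # Readahead line scan over an in-memory file, tracking a start offset
    # to avoid re-copying the buffer for every line.
    f = io.BytesIO(data)
    buf = b""
    start = 0
    n = 0
    while True:
        nl = buf.find(b"\n", start)
        if nl >= 0:
            n += 1
            start = nl + 1
            continue
        chunk = f.read(bufsize)
        if not chunk:
            return n + (1 if start < len(buf) else 0)
        buf = buf[start:] + chunk
        start = 0

data = (b"x" * 60 + b"\n") * 5000        # ~300 KB of 60-byte lines
for bufsize in (512, 4096, 8192, 65536):
    t = timeit.timeit(lambda: count_lines(data, bufsize), number=3)
    print(bufsize, round(t, 4))
```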

    orenti mannequin commented Aug 5, 2002

    > What's a normal text file? One with a million bytes? :-)

    I meant 100-kilobyte lines... Some apps actually use such long
    lines.

    Yes, it works just fine with universal newlines.

    Ok, the #ifdefs will go.

    Strange, a bigger buffer seems to actually slow it down... I'll
    have to investigate this further.

    gvanrossum (Member) commented

    OK, I'll await a new patch.

    tim-one (Member) commented Aug 5, 2002

    Just FYI, in apps that do "read + process" in a loop, a small
    buffer is often faster because the data has a decent shot at
    staying in L1 cache. Make the buffer very large (hundreds of
    KB), and it won't even stay in L2 cache.

    orenti mannequin commented Aug 5, 2002

    Updated patch.

    What should be done about the xreadlines method? The patch
    doesn't touch it, but it could be made an alias for __iter__,
    eliminating the file object's dependency on the xreadlines
    module.

    On my linux machine the highest performance is achieved for
    buffer sizes somewhere around 4096-8192. Higher or lower
    values are significantly slower.

    gvanrossum (Member) commented

    Thanks!

    Making xreadlines an alias for __iter__ sounds about right,
    for backwards compatibility.

    Then we should probably deprecate xreadlines, despite the
    fact that it could be useful for other file-like objects;
    it's just not a pretty enough interface.

    orenti mannequin commented Aug 6, 2002

    gvanrossum (Member) commented

    Hm, test_file fails on a technicality. I'll take it from
    here. Thanks!

    gvanrossum (Member) commented

    Thanks! Checked in.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022