
xreadlines caching, file iterator #36878

Closed
orenti mannequin opened this issue Jul 11, 2002 · 15 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

orenti mannequin commented Jul 11, 2002

BPO 580331
Nosy @gvanrossum, @tim-one
Files
  • xreadlinescache.patch
  • xreadlinescache2.patch
  • fileiterreadahead.patch
  • fileiterreadahead2.patch
  • fileiterreadahead3.patch: xreadlines just return self now
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2002-08-06.15:56:40.000>
    created_at = <Date 2002-07-11.21:45:57.000>
    labels = ['interpreter-core']
    title = 'xreadlines caching, file iterator'
    updated_at = <Date 2002-08-06.15:56:40.000>
    user = 'https://bugs.python.org/orenti'

    bugs.python.org fields:

    activity = <Date 2002-08-06.15:56:40.000>
    actor = 'gvanrossum'
    assignee = 'gvanrossum'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2002-07-11.21:45:57.000>
    creator = 'orenti'
    dependencies = []
    files = ['4420', '4421', '4422', '4423', '4424']
    hgrepos = []
    issue_num = 580331
    keywords = ['patch']
    message_count = 15.0
    messages = ['40543', '40544', '40545', '40546', '40547', '40548', '40549', '40550', '40551', '40552', '40553', '40554', '40555', '40556', '40557']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'tim.peters', 'orenti']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue580331'
    versions = ['Python 2.3']


    orenti mannequin commented Jul 11, 2002

    Calling f.xreadlines() multiple times returns the same
    xreadlines object.

    A file is an iterator - __iter__() returns self and next() calls
    the cached xreadlines object's next method.
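The described behavior can be sketched in Python (illustrative only; the actual patch modifies the C file object, and the class and attribute names here are invented):

```python
class File:
    """Toy model of the proposed file-object semantics."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._pos = 0
        self._cached = None          # cached xreadlines object

    def xreadlines(self):
        # Repeated calls return the same cached iterator object.
        if self._cached is None:
            self._cached = self._line_iter()
        return self._cached

    def _line_iter(self):
        while self._pos < len(self._lines):
            line = self._lines[self._pos]
            self._pos += 1
            yield line

    def __iter__(self):
        return self                  # the file is its own iterator

    def __next__(self):
        # next() delegates to the cached xreadlines object's next method.
        return next(self.xreadlines())
```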

    @orenti orenti mannequin closed this as completed Jul 11, 2002
    @orenti orenti mannequin assigned gvanrossum Jul 11, 2002
    @orenti orenti mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jul 11, 2002
    gvanrossum (Member) commented

    I posted some comments to python-dev.

    orenti mannequin commented Jul 16, 2002

    The patch now invalidates the cache on a seek.
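A sketch of the invalidation, with invented names (the real change is in the C file object; here the cached iterator is simply dropped whenever the position changes):

```python
import io

class File:
    """Toy model: seeking drops the cached line iterator."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._cached = None          # cached xreadlines/readahead state

    def xreadlines(self):
        if self._cached is None:
            self._cached = iter(self._f.readlines())
        return self._cached

    def seek(self, pos, whence=0):
        self._cached = None          # seek invalidates the cached iterator
        self._f.seek(pos, whence)
```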

    gvanrossum (Member) commented

    I'm reviewing this and will check it in, or something like
    it (probably).

    gvanrossum (Member) commented

    Alas, there's a fatal flaw. The file object and the
    xreadlines object now both have pointers to each other,
    creating an unbreakable cycle (since neither participates in
    GC). Weak refs can't be used to resolve this dilemma. I
    personally think that's enough to just stick with the status
    quo (I was never more than +0 on the idea of making the file
    an iterator anyway). But I'll leave it to Oren to come up
    with another hack (please use this same SF patch).

    Oren, if you'd like to give up, please say so and I'll close
    the item in a jiffy. In fact, I positively encourage you to
    give up. But I don't expect you to take this offer. :-)
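The cycle Guido describes can be pictured with Python stand-ins (class and attribute names invented here; the real file and xreadlines objects are C types that, at the time, did not implement the cyclic-GC protocol):

```python
class FileObject:
    def __init__(self):
        self.cached_xreadlines = None

class XReadlines:
    def __init__(self, fileobj):
        self.fileobj = fileobj               # back-pointer to the file

f = FileObject()
f.cached_xreadlines = XReadlines(f)          # file -> xreadlines -> file

# Pure-Python instances like these are tracked by the cycle collector, but
# since neither 2.2-era C type participated in GC, such a pair could never
# be freed once the last outside reference to either object was dropped.
```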

    orenti mannequin commented Aug 5, 2002

    This version of the patch still makes a file an iterator, but it no
    longer depends on xreadlines - it implements the readahead
    buffering inside the file object.

    It is about 19% faster than xreadlines for normal text files and
    about 40% faster for files with 100k lines.

    The methods readline and read do not use this readahead
    mechanism because it skews the current file position (just like
    xreadlines does).
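A minimal sketch of the readahead idea (pure Python with an invented function name; the real patch implements this in C inside the file object):

```python
import io

def iter_lines(f, bufsize=8192):
    """Yield lines from binary file f via a readahead buffer.

    Like the patch's mechanism, this reads ahead of the logical position,
    so interleaving it with f.read()/f.readline() would see a skewed file
    offset -- which is why those methods bypass the readahead.
    """
    buf = b""
    while True:
        nl = buf.find(b"\n")
        if nl >= 0:
            yield buf[:nl + 1]
            buf = buf[nl + 1:]
            continue
        chunk = f.read(bufsize)       # read ahead in large chunks
        if not chunk:
            if buf:
                yield buf             # final line without trailing newline
            return
        buf += chunk
```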

    gvanrossum (Member) commented

    This begins to look good.

    What's a normal text file? One with a million bytes? :-)

    Have you made sure this works as expected in Universal
    newline mode?

    I'd like a patch that doesn't use #define WITH_READAHEAD_BUFFER.

    You might also experiment with larger buffer sizes (I
    predict that a larger buffer doesn't make much difference,
    since it didn't for xreadlines, but it would be nice to
    verify that and then add a comment; at least once a year
    someone asks whether the buffer shouldn't be much larger).
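The buffer-size question can be checked with a quick experiment along these lines (an illustrative pure-Python harness, not the C patch; timings will vary by machine):

```python
import io
import timeit

def count_lines(data, bufsize):
    # Readahead line scan over an in-memory file, tracking a start offset
    # to avoid re-copying the buffer for every line.
    f = io.BytesIO(data)
    buf = b""
    start = 0
    n = 0
    while True:
        nl = buf.find(b"\n", start)
        if nl >= 0:
            n += 1
            start = nl + 1
            continue
        chunk = f.read(bufsize)
        if not chunk:
            return n + (1 if start < len(buf) else 0)
        buf = buf[start:] + chunk
        start = 0

data = (b"x" * 60 + b"\n") * 5000        # ~300 KB of 60-byte lines
for bufsize in (512, 4096, 8192, 65536):
    t = timeit.timeit(lambda: count_lines(data, bufsize), number=3)
    print(bufsize, round(t, 4))
```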

    orenti mannequin commented Aug 5, 2002

    > What's a normal text file? One with a million bytes? :-)

    I meant 100-kilobyte lines... Some apps actually use such long
    lines.

    Yes, it works just fine with universal newlines.

    Ok, the #ifdefs will go.

    Strange, a bigger buffer seems to actually slow it down... I'll
    have to investigate this further.

    gvanrossum (Member) commented

    OK, I'll await a new patch.

    tim-one (Member) commented Aug 5, 2002

    Just FYI, in apps that do "read + process" in a loop, a small
    buffer is often faster because the data has a decent shot at
    staying in L1 cache. Make the buffer very large (hundreds of
    KB), and it won't even stay in L2 cache.

    orenti mannequin commented Aug 5, 2002

    Updated patch.

    What should be done about the xreadlines method? The patch
    doesn't touch it, but it could be made an alias for __iter__,
    eliminating the file object's dependency on the xreadlines
    module.

    On my linux machine the highest performance is achieved for
    buffer sizes somewhere around 4096-8192. Higher or lower
    values are significantly slower.

    gvanrossum (Member) commented

    Thanks!

    Making xreadlines an alias for __iter__ sounds about right,
    for backwards compatibility.

    Then we should probably deprecate xreadlines, despite the
    fact that it could be useful for other file-like objects;
    it's just not a pretty enough interface.

    orenti mannequin commented Aug 6, 2002

    gvanrossum (Member) commented

    Hm, test_file fails on a technicality. I'll take it from
    here. Thanks!

    gvanrossum (Member) commented

    Thanks! Checked in.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022