breaking file iter loop leaves file in stale state #36189
Comments
Given a file created with this snippet:

>>> f = open("tmp.txt", "w")
>>> for i in range(10000):
...     f.write("%s\n" % i)
...
>>> f.close()

Iterating over a file multiple times has unexpected behavior:

>>> f = open("tmp.txt")
>>> for line in f:
...     print line.strip()
...     break
...
0
>>> for line in f:
...     print line.strip()
...     break
...
1861
>>>

I expected the last output line to be 1 instead of 1861.

While I understand the cause (xreadlines being [ ... ]), the docs say that each iteration returns the same result as file.readline(). That is true within one for loop, but not when you break out of one loop and start another.

Another example of breakage:

f = open(...)
for line in f:
    if somecondition(line):
        break
...
data = f.read()  # read rest in one slurp

The fundamental problem IMO is that the file [ ... ]

I understand that speed is a major issue here, so [ ... ]

Here's a report from an actual user: [ ... ]

Here's what I *think* should happen (but: I'm [ ... ]): maybe files should grow a .next() method, so iter(f) [ ... ]
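[Editorial note for readers of the migrated issue: the "stale state" comes from read-ahead buffering. A minimal sketch in modern Python, assuming an in-memory io.StringIO in place of a real file, with chunked_lines as a made-up stand-in for the old xreadlines behavior, shows how yielding a single line can leave the underlying file position a whole chunk ahead.]

```python
import io

def chunked_lines(f, chunksize=8192):
    """Naive read-ahead line iterator, similar in spirit to the old
    xreadlines: it slurps chunksize characters at a time, so the
    file's real position runs ahead of the lines it has yielded."""
    buf = ""
    while True:
        chunk = f.read(chunksize)
        if not chunk:
            if buf:
                yield buf
            return
        buf += chunk
        lines = buf.split("\n")
        buf = lines.pop()        # keep the trailing partial line
        for line in lines:
            yield line

# The same 10000-line file as in the report, built in memory.
f = io.StringIO("".join("%s\n" % i for i in range(10000)))
for line in chunked_lines(f):
    break                        # we consumed one line...
print(line, f.tell())            # -> 0 8192: the position is a full chunk in
```

Abandoning the loop after one line leaves the file positioned 8192 characters in, so anything that reads from f afterwards starts deep inside the buffered-but-unseen data, just as the second loop above starts at line 1861.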
Agreed on all points of fact. Also +1 on fixing it [ ... ]

The easy way: make f.__iter__() call readline() [ ... ]

The hard way (JvR's proposal): add a level of input [ ... ]

As it stands, iter(f) seems like a broken [ ... ]
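[Editorial note: the "easy way" above can be written out in a few lines. In this sketch, ReadlineFile is a name invented for the illustration, not anything that shipped; it defines iteration directly in terms of readline(), so breaking out of one loop and starting another resumes exactly where the first stopped.]

```python
import io

class ReadlineFile:
    """Iteration defined directly in terms of readline(): the
    iterator and the file can never disagree about the position."""
    def __init__(self, f):
        self._f = f

    def readline(self):
        return self._f.readline()

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

f = ReadlineFile(io.StringIO("".join("%s\n" % i for i in range(10000))))
for line in f:
    break
for line in f:
    break
print(line.strip())   # -> 1: the second loop resumes at the next line
```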
If I understand the checkin message Guido wrote for 2.113, [ ... ]
I'm sure Guido was aware of this. Making the simplest-to-spell idiom as fast as [ ... ]
At the cost of, what, sensible, predictable semantics?
Tim wrote: "I'm sure Guido was aware of this."

Tim wrote: "Making the simplest-to-spell idiom as fast as [ ... ]"

[ ... ]
I'm in favor of the "let files be their own iterator and set .next equal to .readline" solution. The above example can _really_ bite you. The current xreadlinesmodule.c could be converted to a standalone module, if it really is necessary to optimize.

The trivial solution for this problem is to change CHUNKSIZE (in Modules/xreadlinesmodule) to 1. Or, even better, to convert it into an instance variable, so you can do this:

f = open(...)
fi = f.iter(chunk=2000)
for line in fi:
    ...

if you want speed, or just write "for line in f:" (which internally converts to f.iter(chunk=1)) if you want safety.

I'm not too firm with Python C interfacing, otherwise I'd write a patch... any takers?
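[Editorial note: the proposed chunk-parameterized interface never materialized, but it can be modeled in pure Python. iter_lines below is a hypothetical stand-in that models the interface, not the C implementation's byte-level buffering: chunk=1 issues one readline() per step and leaves no hidden state, while a larger chunk batches readline() calls ahead of the consumer.]

```python
import io

def iter_lines(f, chunk=1):
    """Hypothetical f.iter(chunk=...): yield lines in batches of
    `chunk` readline() calls.  chunk=1 is safe to abandon mid-loop;
    larger chunks read ahead of what the caller has seen."""
    while True:
        batch = [line for line in (f.readline() for _ in range(chunk)) if line]
        if not batch:
            return
        for line in batch:
            yield line

f = io.StringIO("a\nb\nc\nd\n")
next(iter_lines(f, chunk=1))      # consume "a\n" only
print(repr(f.readline()))         # -> 'b\n': no read-ahead with chunk=1

f2 = io.StringIO("a\nb\nc\nd\n")
next(iter_lines(f2, chunk=2))     # consume "a\n"; "b\n" was read ahead
print(repr(f2.readline()))        # -> 'c\n': "b\n" is stranded in the iterator
```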
There are two forces at work here.

You want the most common case (a single "for line in file" loop) [ ... ]

And you want full generality, basically equating next() with readline() [ ... ]

Unfortunately, the only way to go blindingly fast is to do [ ... ]

We could make the default file iterator use readline, but [ ... ]

I'm not sure which requirement is more common (speed, or [ ... ]).

In the past we've had a lot of flak about the slowness of [ ... ]
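[Editorial note: the tension described here can be felt with a toy micro-benchmark. This is a sketch using in-memory io.StringIO files; absolute numbers are machine-dependent, and the point is only the shape of the tradeoff between per-line readline() calls and one bulk chunked read.]

```python
import io
import timeit

data = "".join("%s\n" % i for i in range(10000))

def count_via_readline():
    # "Full generality": one readline() call per line.
    f = io.StringIO(data)
    n = 0
    while f.readline():
        n += 1
    return n

def count_via_chunks(size=8192):
    # "Blindingly fast": read big chunks, count lines in bulk.
    f = io.StringIO(data)
    n = 0
    while True:
        chunk = f.read(size)
        if not chunk:
            return n
        n += chunk.count("\n")

assert count_via_readline() == count_via_chunks() == 10000
print("readline:", timeit.timeit(count_via_readline, number=20))
print("chunks:  ", timeit.timeit(count_via_chunks, number=20))
```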
Closing as won't fix.