Iteration breaks with bz2.open(filename,'rt') #59751
Comments
The bz2 library in Python 3.3b1 doesn't support iteration in text mode properly. Example:
>>> f = bz2.open('access-log-0108.bz2')
>>> next(f) # Works
b'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238\n'
>>> g = bz2.open('access-log-0108.bz2','rt')
>>> next(g) # Fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
I can't seem to reproduce this with an up-to-date checkout from Mercurial:
>>> import bz2
>>> g = bz2.open('access-log-0108.bz2','rt')
>>> next(g)
'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238\n'
(where 'access-log-0108.bz2' is a file I created with the output above as
Would it be possible for you to upload the file you used to trigger this
File attached. The file can be read in its entirety in binary mode.
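For anyone without the attached file, the comparison between binary-mode and text-mode iteration can be sketched as below. This is only an illustration under assumptions: the file name `sample-log.bz2` and the repeated log line are made up for the example, and a freshly written file like this one is not guaranteed to trigger the failure reported above. On an interpreter where the bug is fixed, both modes should yield the same lines.

```python
import bz2
import os
import tempfile

# Hypothetical sample data: the log line from the report, repeated.
line = ('140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] '
        '"GET /ply/ply.html HTTP/1.1" 200 97238\n')
payload = (line * 1000).encode("ascii")

with tempfile.TemporaryDirectory() as tmpdir:
    # "sample-log.bz2" is an assumed name, not the file attached to the issue.
    path = os.path.join(tmpdir, "sample-log.bz2")
    with bz2.open(path, "wb") as out:
        out.write(payload)

    # Binary mode (the default): iteration yields bytes lines.
    with bz2.open(path) as f:
        binary_lines = list(f)

    # Text mode: iteration should yield the same lines, decoded to str.
    with bz2.open(path, "rt", encoding="ascii") as g:
        text_lines = list(g)

print(len(binary_lines), len(text_lines))
```

If text-mode iteration stops early, `text_lines` will be shorter than `binary_lines`.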
The cause of this problem is that BZ2File.read1() sometimes returns b"", even though
It would seem that BZ2File cannot satisfy the contract of the read1() method - we
Simply removing the read1() method would simply trade this problem for a bigger one
Antoine, what do you think of this?
Agreed. IMO, read1()'s contract should be read as a best-effort thing, not an absolute guarantee. Returning an empty string when there is still data available is wrong.
I encountered this when I implemented bzip2 support in zipfile (bpo-14371). I also solved it by rewriting read and read1 to make as many reads from the underlying file as necessary to return a non-empty result.
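The "read until non-empty" approach can be sketched roughly as follows. This is not the code from BZ2File or zipfile; `LoopingReader`, `read1_naive` and the 16-byte chunk size are hypothetical names and values chosen to make the effect visible. It relies only on the fact that `bz2.BZ2Decompressor.decompress()` may return `b""` when it has been fed too little input to produce output.

```python
import bz2
import io


class LoopingReader:
    """Hypothetical reader used for illustration; not the real BZ2File."""

    def __init__(self, fileobj, chunk_size=16):
        self._fp = fileobj
        self._decomp = bz2.BZ2Decompressor()
        self._chunk_size = chunk_size

    def read1_naive(self):
        # One raw read and one decompress call: this can return b""
        # even though more compressed data is still available.
        raw = self._fp.read(self._chunk_size)
        return self._decomp.decompress(raw) if raw else b""

    def read1(self):
        # Keep feeding the decompressor until it yields data,
        # and return b"" only when the underlying file is exhausted.
        while True:
            raw = self._fp.read(self._chunk_size)
            if not raw:
                return b""  # genuine EOF
            data = self._decomp.decompress(raw)
            if data:
                return data


payload = b"140.180.132.213 - - GET /ply/ply.html\n" * 50
compressed = bz2.compress(payload)

# The naive version returns an empty result on its first call.
first_naive = LoopingReader(io.BytesIO(compressed)).read1_naive()

# The looping version returns b"" only at the real end of the data.
reader = LoopingReader(io.BytesIO(compressed))
chunks = []
while True:
    chunk = reader.read1()
    if not chunk:
        break
    chunks.append(chunk)
result = b"".join(chunks)
```

A caller that treats `b""` as end of file, as line iteration does, would stop immediately with the naive version, which matches the symptom in the report.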
New changeset cdf27a213bd2 by Nadeem Vawda in branch 'default':
OK, BZ2File should now be fixed. It looks like LZMAFile and GzipFile may
New changeset 5284e65e865b by Nadeem Vawda in branch 'default':
Done. Thanks for the bug report, David.
What about peek()? |
Before these fixes, it looks like all three classes' peek() methods were susceptible
The fixes for BZ2File.read1() and LZMAFile.read1() should have fixed peek() as well;
For GzipFile, peek() is still potentially broken - I'll push a fix shortly.
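A quick way to check the peek() behaviour of all three classes on a given interpreter might look like the sketch below. The payload and the `peeked` dictionary are assumptions made for the example; only the documented `peek()` and `read()` methods of `BZ2File`, `LZMAFile` and `GzipFile` are used. On a fixed interpreter, peek() should return at least one byte while data remains, without advancing the file position.

```python
import bz2
import gzip
import io
import lzma

# Hypothetical payload, small enough to compress in memory.
payload = b"140.180.132.213 - - GET /ply/ply.html\n" * 50

readers = {
    "BZ2File": bz2.BZ2File(io.BytesIO(bz2.compress(payload))),
    "LZMAFile": lzma.LZMAFile(io.BytesIO(lzma.compress(payload))),
    "GzipFile": gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload))),
}

peeked = {}
for name, reader in readers.items():
    with reader:
        # peek() must not return b"" here, since no data has been read yet.
        head = reader.peek(1)
        # peek() does not consume data, so read() still returns everything.
        body = reader.read()
    peeked[name] = (head, body)
    print(name, len(head) > 0, body == payload)
```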
New changeset 8c07ff7f882f by Nadeem Vawda in branch 'default':
I have a doubt. Won't this loop forever if the compressed data ends exactly at the end of a read block? Instead of "while self.extrasize <= 0:", might it be better to write "while self.extrasize <= 0 and self.fileobj is not None:"?
No, if _read() is called once the file is already at EOF, it raises an
New changeset 0f25119ceee8 by Serhiy Storchaka in branch '3.2':