tarfile stream read performance regression #78191
Comments
Buffered reads of large files in a compressed tarfile stream perform poorly. The buffered read in tarfile's _Stream works by repeatedly extending a bytes object. This performance regression was introduced in b506dc3.

How to test:

# read with tarfile as stream (note pipe symbol in 'r|gz')
import tarfile
tfile = tarfile.open("test.tgz", 'r|gz')
for t in tfile:
    file = tfile.extractfile(t)
    if file:
        print(len(file.read()))
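The test above assumes a `test.tgz` already exists. A minimal self-contained sketch, assuming an in-memory archive with a single zero-filled member is representative enough, could look like the following. The helper names `make_tgz` and `read_member_sizes` and the member name `big.bin` are hypothetical and chosen only for illustration; increase the payload size to make the slowdown visible.

```python
import io
import tarfile


def make_tgz(payload_size):
    """Build a gzip-compressed tar archive in memory with one member.

    Hypothetical helper: the member name and zero-filled payload are
    assumptions made for this sketch, not part of the original report.
    """
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="big.bin")
        info.size = payload_size
        tar.addfile(info, io.BytesIO(b"\0" * payload_size))
    raw.seek(0)
    return raw


def read_member_sizes(fileobj):
    """Read every regular file through the stream interface ('r|gz')."""
    sizes = []
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted:
                sizes.append(len(extracted.read()))
    return sizes
```

Wrapping a call such as `read_member_sizes(make_tgz(200 * 1024 * 1024))` in `time.perf_counter()` measurements on an affected and a fixed interpreter should show the difference.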
Nice catch. I confirmed this is a serious performance regression. Although we have lived with this regression for a long time, I feel it's worth backporting. Can you write a NEWS entry for it?
Yes, the performance is really bad for large files, and so is the memory consumption. I will write something for NEWS.
Thanks.
+ buf = b"".join(t)
@hajoscher: "It never caused a problem, since this line is never called; size is never None in the function call. But still, should be fixed, I guess." Would it be possible to have a unit test for this modified line? Untested code is broken, as you showed :-)
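For readers unfamiliar with the pattern behind the `b"".join(t)` line, here is a small sketch contrasting the two ways of accumulating chunks. It is not the actual `_Stream` code; the function names `concat_bytes` and `join_chunks` are hypothetical. Since `bytes` objects are immutable, each `+=` generally builds a new object and copies what has been accumulated so far, so total work can grow roughly quadratically with the amount of data, while collecting chunks in a list and joining once copies each byte a single time.

```python
def concat_bytes(chunks):
    """Accumulate by extending a bytes object (the slow pattern)."""
    buf = b""
    for chunk in chunks:
        # Each += allocates a new bytes object and copies the old contents.
        buf += chunk
    return buf


def join_chunks(chunks):
    """Accumulate in a list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    # A single join sizes the result once and copies each chunk once.
    return b"".join(parts)
```

Both functions return the same bytes; only the cost of getting there differs, which is why the regression shows up mainly with large files.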
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.