Add write buffering to gzip #43459
Comments
A series of small write() calls is dog slow compared to a single write() of the joined data. The attached script demonstrates the speed-up, although it does not work as-is (it is missing an import of "string").
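The original attachment is not available here; the following is a minimal sketch of the kind of comparison it presumably made. The variable name `snips` is borrowed from a later comment, and the data contents, sizes, and file name are assumptions, not the original code.

```python
import gzip
import time

# Hypothetical test data; contents and count are assumptions.
snips = [b"some data"] * 100_000

def many_writes(path):
    # One write() call per snippet: each call goes through the
    # compression machinery individually.
    with gzip.open(path, "wb") as f:
        for snip in snips:
            f.write(snip)

def one_write(path):
    # Join first, then hand the compressor a single large buffer.
    with gzip.open(path, "wb") as f:
        f.write(b"".join(snips))

for fn in (many_writes, one_write):
    start = time.perf_counter()
    fn("test.gz")
    print(fn.__name__, time.perf_counter() - start)
```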
This is true for any object whose input can be concatenated. For example, with hashlib:

```python
import hashlib

data = [b'foobar'] * 100000

# Feed the hash one chunk at a time.
mdX = hashlib.sha1()
for d in data:
    mdX.update(d)

# Feed the hash the joined data in a single call.
mdY = hashlib.sha1()
mdY.update(b"".join(data))

assert mdX.digest() == mdY.digest()
```

The second version is multiple times faster.
In the test script, simply changing

```python
def emit(f, data=snips):
    for datum in data:
        f.write(datum)
```

to

```python
def gemit(f, data=snips):
    datas = ''.join(data)
    f.write(datas)
```

improves direct gzip performance to [0.43065404891967773, 0.50007486343383789, 0.26698708534240723], which means that you're better off letting the application handle buffering. Furthermore, the problem with gzip-level buffering is the choice of the default buffer size. Time to close?
Agreed.
Additionally, since bpo-7471 was fixed, you should be able to wrap a GzipFile in a Buffered{Reader,Writer} object for faster buffering.
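A minimal sketch of that approach, using io.BufferedWriter on top of gzip.GzipFile; the file name and buffer size here are illustrative:

```python
import gzip
import io

# Wrap the GzipFile in an io.BufferedWriter so that many small
# application-level write() calls are coalesced into larger chunks
# before they reach the compressor.
with io.BufferedWriter(gzip.GzipFile("out.gz", "wb"),
                       buffer_size=64 * 1024) as f:
    for _ in range(100_000):
        f.write(b"foobar")
# Closing the BufferedWriter flushes its buffer and closes the
# underlying GzipFile.
```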