Add write buffering to gzip #43459

Closed
rhettinger opened this issue Jun 5, 2006 · 6 comments

Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@rhettinger
Contributor

BPO 1501108
Nosy @rhettinger, @pitrou, @devdanzin
Files
  • gztest.py: Script to generate comparative timings
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-04-03.12:45:17.892>
    created_at = <Date 2006-06-05.16:40:29.000>
    labels = ['extension-modules', 'type-feature']
    title = 'Add write buffering to gzip'
    updated_at = <Date 2010-04-03.12:45:17.891>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2010-04-03.12:45:17.891>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-04-03.12:45:17.892>
    closer = 'pitrou'
    components = ['Extension Modules']
    creation = <Date 2006-06-05.16:40:29.000>
    creator = 'rhettinger'
    dependencies = []
    files = ['8272']
    hgrepos = []
    issue_num = 1501108
    keywords = []
    message_count = 6.0
    messages = ['54815', '83968', '83984', '102233', '102249', '102253']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'pitrou', 'ajaksu2', 'ebfe', 'neologix']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1501108'
    versions = ['Python 3.1', 'Python 2.7']

    @rhettinger
    Contributor Author

    A series of write() calls is dog slow compared to
    building up a pool of data and then writing it in
    larger batches.

    The attached script demonstrates the speed-up
    potential. It compares a series of GzipFile.write()
    calls to an alternate approach using cStringIO.write()
    calls followed by a GzipFile.write(sio.getvalue()). On
    my box, there is a three-fold speed-up.
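
    A minimal sketch of the kind of comparison described above (this is not the
    attached gztest.py; the chunk data, repeat counts, and the use of io.BytesIO
    as a stand-in for cStringIO are assumptions for illustration). It times many
    small GzipFile.write() calls against buffering the chunks in memory and
    writing once:

    import gzip
    import io
    import os
    import timeit

    snips = [b"some small chunk of data\n"] * 10000

    def write_directly(path):
        # One GzipFile.write() call per small chunk.
        with gzip.open(path, "wb") as f:
            for chunk in snips:
                f.write(chunk)

    def write_batched(path):
        # Accumulate the chunks in an in-memory buffer, then write once.
        buf = io.BytesIO()
        for chunk in snips:
            buf.write(chunk)
        with gzip.open(path, "wb") as f:
            f.write(buf.getvalue())

    print("direct :", timeit.timeit(lambda: write_directly("direct.gz"), number=5))
    print("batched:", timeit.timeit(lambda: write_batched("batched.gz"), number=5))
    os.remove("direct.gz")
    os.remove("batched.gz")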

    @rhettinger rhettinger added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Jun 5, 2006
    @pitrou
    Member

    pitrou commented Mar 22, 2009

    Although the script does not work as-is (missing import of "string", and a
    typo between "frags" and "wfrags"), I can confirm the 3x speed-up.

    @ebfe
    Mannequin

    ebfe mannequin commented Mar 22, 2009

    This is true for all objects whose input can be concatenated.

    For example with hashlib:

    import hashlib

    data = ['foobar']*100000
    mdX = hashlib.sha1()
    for d in data:
        mdX.update(d)
    mdY = hashlib.sha1()
    mdY.update("".join(data))

    mdX.digest() == mdY.digest()

    the second version is multiple times faster...
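
    A hedged sketch of how one might time the two approaches with timeit (the
    exact ratio depends on the machine and Python version; the function names
    are illustrative, and the chunks are bytes so it runs on Python 3):

    import hashlib
    import timeit

    data = [b'foobar'] * 100000

    def update_per_chunk():
        md = hashlib.sha1()
        for d in data:
            md.update(d)
        return md.digest()

    def update_joined():
        md = hashlib.sha1()
        md.update(b''.join(data))
        return md.digest()

    assert update_per_chunk() == update_joined()
    print("per-chunk:", timeit.timeit(update_per_chunk, number=10))
    print("joined   :", timeit.timeit(update_joined, number=10))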

    @neologix
    Mannequin

    neologix mannequin commented Apr 3, 2010

    In the test script, simply changing

    def emit(f, data=snips):
        for datum in data:
            f.write(datum)

    to

    def gemit(f, data=snips):
        datas = ''.join(data)
        f.write(datas)

    improves direct gzip performance from
    [1.1799781322479248, 0.50524115562438965, 0.2713780403137207]
    [1.183434009552002, 0.50997591018676758, 0.26801109313964844]
    [1.173914909362793, 0.51325297355651855, 0.26233196258544922]

    to

    [0.43065404891967773, 0.50007486343383789, 0.26698708534240723]
    [0.43662095069885254, 0.49983596801757812, 0.2686460018157959]
    [0.43778109550476074, 0.50057196617126465, 0.2687230110168457]

    which means that you're better off letting the application handle buffering. Furthermore, the problem with gzip-level buffering is choosing a sensible default buffer size.

    Time to close?

    @ebfe
    Mannequin

    ebfe mannequin commented Apr 3, 2010

    agreed

    @pitrou
    Member

    pitrou commented Apr 3, 2010

    Additionally, since bpo-7471 was fixed, you should be able to wrap a GzipFile in a Buffered{Reader,Writer} object for faster buffering.
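
    A minimal sketch of that wrapping (the file name and the 64 KiB buffer_size
    are arbitrary choices for illustration): io.BufferedWriter collects the many
    small writes and passes larger blocks on to the underlying GzipFile.

    import gzip
    import io

    raw = gzip.GzipFile("out.gz", "wb")
    buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
    try:
        for chunk in [b"some small chunk of data\n"] * 10000:
            buffered.write(chunk)
    finally:
        buffered.close()  # flushes the buffer, then closes the GzipFile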

    @pitrou pitrou closed this as completed Apr 3, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022