Decompression performance #89
Archivemount does not use zlib directly but libarchive, hence the name. But libarchive itself seems to use zlib, which makes some of the timing differences even stranger. Here are some more low-level timings for decompressing 397 MiB of gzipped data into 512 MiB of base64-encoded random data (so basically pure Huffman coding with only a few LZ77 back-references):
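A scaled-down sketch of how such a test file can be generated with the standard library alone (the sizes here are assumptions; the original test used 512 MiB):

```python
import base64
import gzip
import os

# Base64-encoded random bytes carry ~6 bits of entropy per 8-bit
# character, so deflate can only win via Huffman coding (random data
# contains almost no LZ77 matches) -- the ratio should land near 0.75.
raw = base64.b64encode(os.urandom(3 * 2**20))   # ~4 MiB of base64 text
blob = gzip.compress(raw, compresslevel=6)

ratio = len(blob) / len(raw)
print(f"{len(raw)} B -> {len(blob)} B, ratio {ratio:.3f}")
```

Scaling the `os.urandom(...)` argument up reproduces the 512 MiB case described above.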
Running my self-written benchmark inside the
Some benchmarks via the command line:
Some observations:
Repeat benchmarks with tarred and gzipped Silesia corpus like lzbench uses.
Some observations on Silesia decompression:
Hi @mxmlnkn, I haven't spent any time optimising the indexed_gzip codebase, so I'm sure performance could be improved. The core of indexed_gzip is essentially an adaptation of the original

The conda-forge packages are built with

However, using indexed_gzip purely for a one-off decompress isn't its intended usage, so I'm not sure these comparisons make a lot of sense. The point of indexed_gzip is improved random seek time after an index has been built. (edited - added example code)

If you are interested in "raw" performance, you could adjust your benchmark code to disable the CRC32 check, increase the read buffer sizes, and use a seek point spacing larger than the size of the uncompressed data, which would effectively disable creation of the index, e.g.:
This should cause the read to be performed via a single call to the zlib
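A minimal sketch of that idea (the file name and sizes are made up, and it falls back to plain `gzip` when indexed_gzip is not installed):

```python
import gzip
import os
import tempfile
import time

# Create a small stand-in for the "small.gz" test file used below.
data = os.urandom(1 << 20)
path = os.path.join(tempfile.mkdtemp(), "small.gz")
with gzip.open(path, "wb") as f:
    f.write(data)

t0 = time.time()
try:
    import indexed_gzip as igz
    # A spacing larger than the uncompressed size means no seek point
    # is ever stored, so this behaves like a one-shot decompress.
    with igz.IndexedGzipFile(path, spacing=128 * 1024**2,
                             readbuf_size=1024**2) as f:
        out = f.read()
except ImportError:
    # Fallback so the sketch also runs without indexed_gzip installed.
    with gzip.open(path, "rb") as f:
        out = f.read()

assert out == data
print(f"Decompression took {time.time() - t0:.3f} s")
```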
I get some very weird results when testing out these parameters. When I increase the buffers or even just the spacing, it takes a lot more time. This might explain the 10x slower speed I observed with ratarmount! I think I changed the default spacing to 16 MiB.

```bash
python3 -m pip install --user pgzip indexed_gzip

python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz").read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 4.666 s

python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.IndexedGzipFile("small.gz", spacing=2**30, readbuf_size=2**30, buffer_size=2**30).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Floating point exception

python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz", spacing=int(1*1024**2), readbuf_size=int(1*1024**2), buffer_size=int(1*1024**2)).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 4.286 s

python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz", spacing=int(32*1024**2), readbuf_size=int(1*1024**2), buffer_size=int(1*1024**2)).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 7.407 s

python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz", spacing=int(128*1024**2), readbuf_size=int(1024**2), buffer_size=int(128*1024**2)).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 13.290 s

python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.IndexedGzipFile("small.gz", spacing=16*1024**2).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 6.403 s
```

Note also the weird floating point exception when I try the large buffer / spacing sizes you recommended. Also, I think there might be a problem with the buffers, because even 512 MiB buffers should not slow down a program that much, except maybe if they are being cleared to zeros without using memset or something like that.
That does sound strange - I'll look into it when I get a minute...
So I did try to look at the code and did some benchmarks, and I arrived at the conclusion that the underlying problem is that

As a quick workaround,

The above explanation still does not explain why runtimes get worse for larger

Here are some benchmarks I did for varying configurations. It can be seen that 4 KiB read calls even slow down Python's GzipFile, but anything above 32 KiB yields sufficiently large reads for optimal speed. This is in stark contrast to IndexedGzipFile, which might lead to performance bugs like the one I observed with ratarmount. With large enough reads (in the limit, reading the whole sample file at once), the speeds are good, though.
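The effect of the per-call read size on GzipFile can be reproduced with the standard library alone; this is a small sketch under assumed sizes, not the benchmark used above:

```python
import gzip
import io
import os
import time

payload = os.urandom(4 << 20)          # 4 MiB of incompressible data
blob = gzip.compress(payload)

# Time a full decompression while varying the per-call read size.
for chunk in (4 * 1024, 32 * 1024, 1024 * 1024):
    f = gzip.GzipFile(fileobj=io.BytesIO(blob))
    t0 = time.time()
    parts = []
    while True:
        b = f.read(chunk)
        if not b:
            break
        parts.append(b)
    assert b"".join(parts) == payload
    print(f"{chunk // 1024:5d} KiB reads: {time.time() - t0:.4f} s")
```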
I saw that you are already using a BufferedReader and that there even is a
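For illustration, `io.BufferedReader` can aggregate many small consumer-side reads into fewer large reads against a decompressing stream; this stdlib sketch mimics that idea, with a plain gzip stream standing in for the indexed_gzip backend:

```python
import gzip
import io
import os

payload = os.urandom(1 << 20)
blob = gzip.compress(payload)

# Wrap the decompressing stream so that 4 KiB consumer reads are
# served from a 1 MiB buffer instead of hitting zlib on every call.
raw = gzip.GzipFile(fileobj=io.BytesIO(blob))
buffered = io.BufferedReader(raw, buffer_size=1024 * 1024)

out = bytearray()
while chunk := buffered.read(4096):   # small reads from the consumer
    out += chunk
assert bytes(out) == payload
```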
Hello,

I noticed that `indexed_gzip` seems to be 10 times slower than archivemount and fuse-archive when decoding large gzipped TARs with ratarmount. I would think that both use zlib, which makes this surprising to me. Are you compiling with `-O3`? If not, would that help? Or maybe there are other bottlenecks, like storing the windows? I just think that this 10x difference would be really helpful to improve upon. I'm pretty sure that it isn't ratarmount that is at fault here, but some benchmarks using only `indexed_gzip` or even the C backend directly might still be helpful for performance "debugging".