q9 significantly faster than q6 for specific input #632
Comments
Additional measurements

As part of my investigation, I'm comparing the compressed size and compression speed of zstd and brotli at various quality levels. I'm posting a few of the results here, in CSV format.

Patch that compresses very well (manifold)

For the 2.2GB file:

Patch that doesn't compress well at all (perfect-heist)

For comparison, here's another patch mostly made up of `extra data`. Uncompressed data size is around 260MiB.

As you can see, the compression speeds (30MiB/s for q6, 5MiB/s for q9) are a lot more in line with what one would expect.

Another patch that compresses well (fate)

This patch is 1.6GiB uncompressed, and it also compresses quite well:

Again we see q9 being faster (and compressing better).
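For reference, here's a minimal sketch of the kind of one-shot harness that produces numbers like these, using brotli's public C API (`BrotliEncoderCompress` from `<brotli/encode.h>`); the input path and the q6..q9 sweep are placeholders, and `clock()` timing is only approximate:

```c
#include <brotli/encode.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  /* Placeholder path: the file being measured ("none" in this issue).
   * For simplicity this slurps the whole file; multi-GB inputs need
   * 64-bit file offsets and a streaming encoder instead. */
  FILE *f = fopen("none", "rb");
  if (!f) { perror("fopen"); return 1; }
  fseek(f, 0, SEEK_END);
  size_t in_size = (size_t)ftell(f);
  fseek(f, 0, SEEK_SET);
  uint8_t *in = malloc(in_size);
  if (!in || fread(in, 1, in_size, f) != in_size) { fclose(f); return 1; }
  fclose(f);

  printf("quality,seconds,compressed_bytes\n");
  for (int q = 6; q <= 9; q++) {
    size_t out_size = BrotliEncoderMaxCompressedSize(in_size);
    uint8_t *out = malloc(out_size);
    clock_t t0 = clock();
    BROTLI_BOOL ok = BrotliEncoderCompress(
        q, BROTLI_DEFAULT_WINDOW, BROTLI_DEFAULT_MODE,
        in_size, in, &out_size, out);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    if (ok) printf("%d,%.2f,%zu\n", q, secs, out_size);
    free(out);
  }
  free(in);
  return 0;
}
```

Link against the encoder library (e.g. brotlienc and brotlicommon).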
Thanks for the report, the investigation, and the sample file.
Hello. I'm terribly sorry - I haven't had time to analyze the root of the problem, so I will keep this issue open. On the bright side, the last update has improved the situation. Here are the results of measurements comparing brotli before and after the last commit.
Speed imbalance is fixed and compression ratio is improved =)
As pointed out by @fasterthanlime, LGBLOCK is adjusted differently for quality >= 9; in fact, it gets a larger value, which results in faster and denser compression for this particular data.

LGBLOCK is the length of so-called input blocks (typically 64K to 256K), and for each input block at least one command used to be produced (as of commit da254cf). As a consequence, long chunks of zeros (spanning k full input blocks) resulted in a sequence of k equal commands (insert-length = 0, copy-length = 2^LGBLOCK, copy-distance = 1), although a single command would actually suffice - and the mandatory extra bits prevent Huffman coding from mitigating the issue. This is why compression gets denser with longer input blocks on files with long chunks of zeros. As for compression time, with longer input blocks the hasher (which is invoked O(1) times per input block full of zeros) is invoked fewer times in total, which results in faster compression.

The described inefficiency has been fixed by introducing a method that merges such commands.

Finally, there is still some room for improving the compression time for qualities <= 9 by optimizing out some calls to the hasher which cannot improve on the HasherSearchResult found so far.
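To make the arithmetic concrete, here is an illustrative sketch - not brotli's actual code; the `Command` struct and the constants are simplified stand-ins - of how a 1 MiB zero run becomes k identical commands with 256 KiB input blocks, versus a single merged command after the fix:

```c
#include <stdio.h>

/* Illustrative only - a simplified stand-in for brotli's internal
 * command triple (insert-length, copy-length, copy-distance). */
typedef struct {
  unsigned insert_len;
  unsigned copy_len;
  unsigned copy_dist;
} Command;

int main(void) {
  const unsigned lgblock = 18;            /* 256 KiB input blocks */
  const unsigned block = 1u << lgblock;
  const unsigned zero_run = 1u << 20;     /* a 1 MiB run of zeros */

  /* Old behavior: at least one command per input block, so a run
   * spanning k blocks becomes k identical commands, each paying the
   * mandatory extra bits that Huffman coding cannot amortize. */
  unsigned k = zero_run / block;
  Command per_block = { 0, block, 1 };
  printf("before: %u commands (insert=%u, copy=%u, dist=%u)\n",
         k, per_block.insert_len, per_block.copy_len, per_block.copy_dist);

  /* Fixed behavior: the whole run collapses into a single command,
   * so the extra bits are paid once. */
  Command merged = { 0, zero_run, 1 };
  printf("after:  1 command  (insert=%u, copy=%u, dist=%u)\n",
         merged.insert_len, merged.copy_len, merged.copy_dist);
  return 0;
}
```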
I've been in touch with Jyrki Alakuijala on Twitter after I noticed compressing with q9 being faster than q6, and they mentioned that q9 being faster was "unusual".

The behavior was observed deep inside a Go program using (non-official) cgo bindings, so I've been trying to reproduce it with vanilla brotli, and I was finally able to!
Exhibit
This is not a "benchmark done right", but I've been playing with various input files over dozens of runs for the last 48 hours, and I've noticed this pattern quite a bit.
How I built brotli
I followed the instructions in README.md - this was run on windows-amd64, compiled with CMake. CMake auto-detected and used Visual Studio 14 2015 to compile.

Note that when I originally observed the performance difference, brotli's sources were built with gcc 7.2.0 (built by the MSYS2 project) via cgo.

The version of brotli I used was da254cf (HEAD of master at the time of this writing).
The data
Here's a `none.br.zip` file, which is a 2.3MB zip archive (due to GitHub issues limitations) containing a brotli-q9-compressed version of my sample input:

📦 none.br.zip

The uncompressed input (`none`) is a 2.2GB file. Its SHA-256 checksum is:

352994da7a1b3f10e665e7eb4f477e9ebeaf5088742e4fddfacd7d4901d51df5
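For anyone wanting to reconstruct the input: unzip the attachment, then decompress `none.br`. A minimal streaming-decode sketch using brotli's public C decoder API (file names assumed from the above):

```c
#include <brotli/decode.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  FILE *in = fopen("none.br", "rb");   /* extracted from none.br.zip */
  FILE *out = fopen("none", "wb");
  if (!in || !out) { perror("fopen"); return 1; }

  BrotliDecoderState *s = BrotliDecoderCreateInstance(NULL, NULL, NULL);
  uint8_t ibuf[1 << 16], obuf[1 << 16];
  size_t avail_in = 0;
  const uint8_t *next_in = ibuf;
  BrotliDecoderResult r = BROTLI_DECODER_RESULT_NEEDS_MORE_INPUT;

  for (;;) {
    /* Refill the input buffer only when the decoder asks for it. */
    if (r == BROTLI_DECODER_RESULT_NEEDS_MORE_INPUT) {
      avail_in = fread(ibuf, 1, sizeof(ibuf), in);
      next_in = ibuf;
      if (avail_in == 0) break;  /* truncated stream */
    }
    size_t avail_out = sizeof(obuf);
    uint8_t *next_out = obuf;
    r = BrotliDecoderDecompressStream(s, &avail_in, &next_in,
                                      &avail_out, &next_out, NULL);
    fwrite(obuf, 1, sizeof(obuf) - avail_out, out);
    if (r == BROTLI_DECODER_RESULT_SUCCESS) break;
    if (r == BROTLI_DECODER_RESULT_ERROR) {
      fprintf(stderr, "decode error\n");
      break;
    }
  }
  BrotliDecoderDestroyInstance(s);
  fclose(in);
  fclose(out);
  return 0;
}
```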
It compresses particularly well (brotli performs splendidly in particular) because it's a patch file inspired by the original BSDiff paper.
It's supposed to compress well because bsdiff output is basically:

- a byte-wise subtraction between matched regions of the old and new data, which yields mostly zeros with occasional small values (like `00000029000000000000290000000000`) - this is called `diff data` (see the sketch after this list)
- brand-new bytes that have no match in the old file and are stored verbatim - this is called `extra data`
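To illustrate why `diff data` is mostly zeros, here's a tiny example (simplified; real bsdiff finds matched regions via suffix sorting) of the byte-wise subtraction between an old and a new version of the same region:

```c
#include <stdio.h>

int main(void) {
  /* Two "matched" regions, as bsdiff would pair an old and a new
   * version of the same stretch of a binary: only one byte changed. */
  unsigned char old_buf[8] = {0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80};
  unsigned char new_buf[8] = {0x10, 0x20, 0x59, 0x40, 0x50, 0x60, 0x70, 0x80};
  for (int i = 0; i < 8; i++)
    printf("%02x", (unsigned char)(new_buf[i] - old_buf[i]));
  printf("\n");  /* prints 0000290000000000 - mostly zeros */
  return 0;
}
```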
Other differences from the original bsdiff:

- instead of grouping all the `diff data` and all the `extra data` into separate sections, the format stores `diff data` and `extra data` directly after the relevant instruction - and limits an instruction's slice size. This lets us do streaming diff & apply with lower memory/disk requirements, at a slight compression penalty. I have a tweet thread about this for the curious. (A generator for data with a roughly similar shape follows below.)
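Since the raw patch can't be attached, here is a sketch that writes a file with a roughly similar shape - long mostly-zero diff-style runs interleaved with short incompressible extra-style chunks. The file name, sizes, and layout are made up for illustration and are not the actual patch format:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  FILE *f = fopen("synthetic-patch.bin", "wb");  /* hypothetical name */
  if (!f) { perror("fopen"); return 1; }
  unsigned char run[1 << 16];
  memset(run, 0, sizeof(run));
  srand(42);
  /* 256 "instructions", each followed by its payloads: ~1 MiB of
   * diff-style data (zeros plus a rare small value), then 4 KiB of
   * extra-style data (fresh pseudo-random bytes). ~270 MB total. */
  for (int i = 0; i < 256; i++) {
    for (int j = 0; j < 16; j++) {
      run[rand() % sizeof(run)] = (unsigned char)(1 + rand() % 4);
      fwrite(run, 1, sizeof(run), f);
      memset(run, 0, sizeof(run));
    }
    for (int j = 0; j < 4096; j++) fputc(rand() & 0xff, f);
  }
  fclose(f);
  return 0;
}
```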
Hypotheses

Although I've played with compression formats a lot, I cannot claim to understand brotli's inner workings, but I've noticed the following:

- there's a "speed up RLE-ish data compression" commit (and this input data might be RLE-ish? depending on the meaning of it), but AFAICT I was seeing the performance difference before that commit.

Conclusion
At this point I'm tempted to use q=9 rather than q=6-8 (our diff process is much slower than brotli compression anyway, and we're running it on a machine with many cores, so it's not a bottleneck), but I figured this would be interesting to the brotli developers, and I might learn a thing or two in the process :)