This makes several changes that, in combination, give close to the same compression but with a big speedup in most cases.

- We change the hash table to contain hashes of 6 bytes. The speed is about the same, but this usually gives better compression since the hashes are of better quality. It typically also makes the output faster to decode, since longer matches are preferred.
- The hash table size is now defined separately from the window size. I found that 16 bits was a good value, especially since the better hash table opens up other optimizations.
- We check 3 bytes, then skip one (plus more if the data is hard to compress). This gives most of the speedup, but also costs us some compression.
- We index 2 bytes before the end of each match. This doesn't impact speed much and gives a nice compression boost.

This combines well with #49 (not included in this benchmark).

Now for the numbers. They are all before/after, best of 2 runs.

```
file                         out  level  insize       outsize     millis  mb/s
consensus.db.10gb            lz4  0      10737418240  5057961420  35446   288.88
consensus.db.10gb            lz4  0      10737418240  5077608378  23226   440.87

rawstudio-mint14.tar         lz4  0      8558382592   4568741520  25369   321.73
rawstudio-mint14.tar         lz4  0      8558382592   4592776475  17168   475.41

github-ranks-backup.bin      lz4  0      1862623243   579273817   4074    436.02
github-ranks-backup.bin      lz4  0      1862623243   627056167   3522    504.35

github-june-2days-2019.json  lz4  0      6273951764   1355117284  10763   555.86
github-june-2days-2019.json  lz4  0      6273951764   1293582359  9136    654.91

gob-stream                   lz4  0      1911399616   384235547   3481    523.66
gob-stream                   lz4  0      1911399616   384292384   2827    644.80

10gb.tar                     lz4  0      10065157632  6481808453  23629   406.23
10gb.tar                     lz4  0      10065157632  5902162074  22592   424.88

enwik9                       lz4  0      1000000000   489160425   3733    255.47
enwik9                       lz4  0      1000000000   482276927   3520    270.93

silesia.tar                  lz4  0      211947520    99218419    691     292.51
silesia.tar                  lz4  0      211947520    96766005    590     342.01

sharnd.out                   lz4  0      500000000    500000495   169     2821.52
sharnd.out                   lz4  0      500000000    500000495   166     2872.51
```

Only [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst) has a significant size increase. The others are very close to or better than before. All show a minor to significant speedup.
Showing 5 changed files with 81 additions and 33 deletions.