gtoubassi edited this page Mar 14, 2011 · 22 revisions

Performance

As of 4/13/2011, performance of femtozip vs gzip/deflate and gzip/deflate+dictionary:

Algorithm           Compression time (ms)   Decompression time (ms)   Compression ratio
FemtoZip            1362                    60                        29.7%
FemtoZip No Dict*   159                     115                       99.3%
GZip                283                     81                        92.9%
GZip+Dict           2886                    359                       52.8%
* FemtoZip No Dict is simply FemtoZip hacked to run without a dictionary. This should not occur in practice, but it gives an idea of how the core algorithm performs against vanilla gzip. (Comparing FemtoZip with GZip+Dict is not a true apples-to-apples comparison, because GZip+Dict performs so poorly on the compression side.)
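
To make the GZip vs GZip+Dict rows concrete, here is a minimal sketch of deflate with and without a preset dictionary, using Python's standard `zlib` module. It is not FemtoZip's code and not the harness used for the numbers above. The sample document, the dictionary contents, and the helper names (`deflate`, `inflate`, `ratio_percent`) are assumptions made for illustration, as is the reading of "compression ratio" as compressed size divided by original size.

```python
import zlib

# Hypothetical sample document and shared dictionary (not from the benchmark).
DOC = b'{"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}'
SHARED = b'{"first_name": "", "last_name": "", "email": "@example.com"}'


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with zlib, optionally priming it with a preset dictionary."""
    if zdict:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(data) + compressor.flush()


def inflate(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    if zdict:
        decompressor = zlib.decompressobj(zdict=zdict)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(blob) + decompressor.flush()


def ratio_percent(original: bytes, compressed: bytes) -> float:
    """Compressed size as a percentage of the original (assumed definition)."""
    return 100.0 * len(compressed) / len(original)


if __name__ == "__main__":
    plain = deflate(DOC)
    primed = deflate(DOC, SHARED)
    print(f"no dictionary:   {ratio_percent(DOC, plain):.1f}%")
    print(f"with dictionary: {ratio_percent(DOC, primed):.1f}%")
```

On small documents like this one, most of the repeated structure lives in the dictionary rather than in the document itself, which is why a shared dictionary matters so much more here than it would for large files.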

Conclusions

  • FemtoZip is faster on decompression (60 ms vs 81 ms for GZip and 359 ms for GZip+Dict). This is attributed to the fact that windowing complexity is eliminated and, more importantly, that a Huffman tree does not have to be computed on the fly. (In fact gzip computes Huffman trees two ways, custom and default, in order to compare the storage tradeoff, since the cost of storing a custom tree affects the compression ratio.)
  • FemtoZip is faster than GZip+Dict on compression, but slower than GZip. The existence of a dictionary slows down compression because more matches need to be pursued. This is to be expected, but it gives an idea of practical compression performance vs vanilla GZip. FemtoZip No Dict shows what the core compression algorithm does without a dictionary, for a more apples-to-apples comparison. In that case FemtoZip is faster than GZip (159 ms vs 283 ms).

Methodology

  • 10000 documents generated by cpp-datagen.
  • Generate the default (femtozip) model, the gzip model (--models GZip) and the gzip+dict model (--models GZipDictionary)
  • Compress using a command line like: ./fzip --model /tmp/user-json/model.fzm --compress --benchmark /tmp/user-json/test
  • Decompress using a command line like: ./fzip --model /tmp/user-json/model.fzm --decompress --benchmark /tmp/user-json/test-fz
  • Average of 5 runs on a MacBook Air
  • Note: --benchmark doesn't actually write out the output files, so before benchmarking decompression you will have to run the compression once without --benchmark to generate those files.
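
The "average of 5 runs" step can be sketched as a small timing loop. This is an illustration of the measurement approach only, not the fzip benchmark itself: the synthetic documents stand in for cpp-datagen output, plain `zlib` stands in for the compressors under test, and the names `average_millis`, `compress_all` and `decompress_all` are hypothetical.

```python
import time
import zlib

# Stand-in corpus: 10000 small synthetic documents (an assumption; the real
# benchmark used documents generated by cpp-datagen).
DOCS = [b'{"id": %d, "name": "user%d"}' % (i, i) for i in range(10000)]


def compress_all(docs):
    """Compress each document independently, as a per-document codec would."""
    return [zlib.compress(doc) for doc in docs]


def decompress_all(blobs):
    """Decompress each blob independently."""
    return [zlib.decompress(blob) for blob in blobs]


def average_millis(fn, runs=5):
    """Run fn `runs` times and return the mean wall-clock time in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return 1000.0 * total / runs


if __name__ == "__main__":
    # Produce the compressed data once, outside the timed region, so the
    # decompression measurement has input to work on.
    blobs = compress_all(DOCS)
    print(f"compress:   {average_millis(lambda: compress_all(DOCS)):.1f} ms")
    print(f"decompress: {average_millis(lambda: decompress_all(blobs)):.1f} ms")
```

Keeping the compressed input out of the timed region mirrors the note above about generating the compressed files before benchmarking decompression.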