gtoubassi edited this page Mar 13, 2011 · 22 revisions

Performance

As of 4/13/2011, performance of femtozip vs gzip/deflate and gzip/deflate+dictionary:

Algorithm    Compression time (millis)    Decompression time (millis)
FemtoZip     1568                         73
GZip         283                          81
GZip+Dict    2886                         359
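As a quick aid for reading the table, the relative speeds can be computed directly from the numbers above. This is only a small arithmetic sketch; the dictionary layout and the `ratio` helper are illustrative names, not part of femtozip.

```python
# Timings copied from the table above: (compression ms, decompression ms).
timings_ms = {
    "FemtoZip": (1568, 73),
    "GZip": (283, 81),
    "GZip+Dict": (2886, 359),
}

COMPRESS, DECOMPRESS = 0, 1


def ratio(slower: str, faster: str, column: int) -> float:
    """Return how many times longer `slower` takes than `faster`."""
    return timings_ms[slower][column] / timings_ms[faster][column]


print(f"FemtoZip vs GZip, compression:        {ratio('FemtoZip', 'GZip', COMPRESS):.2f}x")
print(f"GZip+Dict vs FemtoZip, compression:   {ratio('GZip+Dict', 'FemtoZip', COMPRESS):.2f}x")
print(f"GZip vs FemtoZip, decompression:      {ratio('GZip', 'FemtoZip', DECOMPRESS):.2f}x")
print(f"GZip+Dict vs FemtoZip, decompression: {ratio('GZip+Dict', 'FemtoZip', DECOMPRESS):.2f}x")
```

With the dictionary in use, FemtoZip compression takes roughly 5.5x as long as plain GZip, while GZip+Dict takes roughly 1.8x as long as FemtoZip.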

Conclusions

  • FemtoZip is faster than GZip on decompression. This is attributed to the elimination of windowing complexity and, more importantly, to the fact that a Huffman tree does not have to be computed on the fly (in fact gzip computes Huffman trees two ways, custom and default, in order to compare storage tradeoffs, since the cost of a custom tree impacts compression rate).
  • FemtoZip gets rocked by GZip on compression. This is partly because it uses a dictionary, which means more prior substrings to compare. When the dictionary is eliminated, FZ still takes about 3.2x longer, and initial investigation using Instruments.app shows the cost is all in the memory management associated with the PrefixHash (both allocation and freeing). There is also some overhead due to streams in BitInput/Output. Let's kill streams, as the higher level APIs assume buffer/length.
  • FemtoZip is faster on compression than GZip+Dict. This is because GZip+Dict must hash the dictionary for each document: there is no API for prehashing a dictionary once and reusing it as multiple documents are compressed. No idea why FemtoZip would be faster than GZip+Dict on decompression. Maybe there is some unnecessary dictionary-based overhead?
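The per-document dictionary cost described in the last point can be illustrated with Python's `zlib` module, used here only as a stand-in for the GZip+Dict model (this is not femtozip's code, and the dictionary and documents are made up). In this sketch a fresh compressor is created for every document, so the preset dictionary has to be loaded into the compressor's state each time.

```python
import zlib


def compress_with_dict(doc: bytes, dictionary: bytes) -> bytes:
    # A new compressor per document: the preset dictionary is loaded
    # (and indexed by deflate) again on every call.
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(doc) + compressor.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    # The decompressor must be given the same dictionary.
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


# Hypothetical dictionary holding substrings shared by the documents.
dictionary = b'{"first_name": "", "last_name": "", "email": "", "city": ""}'

# Hypothetical small JSON documents.
docs = [
    b'{"first_name": "Ann", "last_name": "Lee", "email": "ann@example.com", "city": "Oslo"}',
    b'{"first_name": "Raj", "last_name": "Rao", "email": "raj@example.com", "city": "Pune"}',
]

for doc in docs:
    with_dict = compress_with_dict(doc, dictionary)
    without_dict = zlib.compress(doc)
    assert decompress_with_dict(with_dict, dictionary) == doc
    print(len(doc), len(without_dict), len(with_dict))
```

For small documents like these the dictionary improves the compressed size, which is why the setup cost is paid on every document rather than skipped.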

Methodology

  • 10000 documents generated by cpp-datagen.
  • Generate default (femtozip) model and gzip (--models GZip) and gzip+dict (--models GZipDictionary).
  • Compress using command line like: ./fzip --model /tmp/user-json/model.fzm --compress --benchmark /tmp/user-json/test
  • Decompress using command line like: ./fzip --model /tmp/user-json/model.fzm --decompress --benchmark /tmp/user-json/test-fz
  • Average of 5 runs on a MacBook Air.
  • Note --benchmark doesn't actually write out the output files, so to do the decompress you will have to run it once without --benchmark to generate those files.
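The measurement approach above (time a whole pass over the documents, discard the output, average over 5 runs) can be sketched as follows. This is an illustrative harness using Python's `zlib`, not the `fzip --benchmark` implementation; `benchmark_ms` and the generated documents are hypothetical.

```python
import time
import zlib


def benchmark_ms(fn, inputs, runs=5):
    """Average wall-clock time in milliseconds for one full pass over `inputs`."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for item in inputs:
            fn(item)  # Output is discarded, nothing is written to disk.
        totals.append((time.perf_counter() - start) * 1000.0)
    return sum(totals) / len(totals)


# Hypothetical stand-in for the 10000 generated documents.
docs = [('{"id": %d, "name": "user%d"}' % (i, i)).encode() for i in range(10000)]

# The decompression benchmark needs real compressed inputs, so they are
# produced once up front, outside the timed region.
compressed = [zlib.compress(doc) for doc in docs]

compress_ms = benchmark_ms(zlib.compress, docs)
decompress_ms = benchmark_ms(zlib.decompress, compressed)
print(f"compress: {compress_ms:.1f} ms, decompress: {decompress_ms:.1f} ms")
```

Producing the compressed inputs before timing mirrors the note above: the decompression run needs files from an earlier pass that actually kept its output.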