Look to consolidate on HdrHistogram's V1 compressed wire form for storage instead of SkinnyHistogram #21
Comments
Hi @giltene, You are right, we created SkinnyHistogram to get better and faster serialization and compression rates and times. It is very promising that the V1 wire format can provide us with similar benefits. We will definitely upgrade to the latest version of HdrHistogram and try the zlib compression. We'll let you know about our results. Regards,
Hi @giltene, I would like to share with you some results comparing SkinnyHistogram and the latest HdrHistogram version. Some scenarios are based on latencies of a real-world service and others are artificial. It would be great to add more real cases to see how both implementations react to different scenarios, but I think this is a good start.
The following summary output was run with
VM version: JDK 1.8.0_40, VM 25.40-b25
Compression
Throughput
jmh full output: https://gist.github.com/pablosmedina/5d4bd1ef727f4ccdc6af
I'm sure we can include more cases. Please let me know your impressions. Thanks, Pablo
Ok. Results speak. And these certainly showed benefit. I now had time to do some more analysis and testing (in between walking the streets of Seattle on the weekend). It appears that your skinny representation wins in terms of space mainly due to two separate factors:
It also tends to win in terms of speed because of the reduced size of the stream that zlib compression has to process (presumably the additional but simpler scan doing delta encoding is much cheaper than the work removed).
So for HdrHistogram, I've decided to adopt the main winning features. To start with, I've moved to using ZigZag LEB128 (see ZigZagEncoding.java: https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/ZigZagEncoding.java). Next, I played with variants of RLE, "TZLE" (Trailing Zero Length Encoding), and both combined, and it looks like TZLE alone works pretty well (adding RLE on top of that helps a bit in some cases and hurts a bit in others, for an overall loss, probably because it is somewhat redundant with zlib's RLE).
The encoding scheme I ended up with is "simpler" than SkinnyHistogram's: positive counts are directly encoded in the stream as ZigZag longs, and a negative count indicates a trailing-zeros count associated with the following count value. (The nice way ZigZag handles small, closer-to-zero negative numbers makes this work well.) This scheme seems to produce similar or better compression ratios and speed compared to SkinnyHistogram across various test cases.
To establish this comparison, I ported a part of SkinnyHistogram to Java, along with the data sets used in the above tests, to which I added some additional data sets (read in from actual jHiccup log output).
See data sets and volume comparisons in https://github.com/HdrHistogram/HdrHistogram/blob/master/HdrHistogram-benchmarks/src/main/java/bench/HistogramData.java , which are all exercised when running the jmh benchmarks in https://github.com/HdrHistogram/HdrHistogram/blob/master/HdrHistogram-benchmarks/src/main/java/bench/HdrHistogramEncodingBench.java
It is exciting to see this new scheme provide significant compression (volume) benefits across the board, and especially on actual jHiccup log lines (the 2 decimal point values there are the more realistic ones, and those show 18%-24% improvements). The fact that the compression speed is also 4-5x faster is an obvious nicety as well. So: Thanks!!
The Java code that uses this new encoding scheme by default is now on github, but not yet released as a version or to maven central. (It still supports decoding the older V0 and V1 formats, but will encode in this new V2 format.) I'd appreciate it if you could play with it a bit and see how it looks in comparison to SkinnyHistogram in your environments.
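As an illustration of the scheme described above, here is a minimal sketch in plain Java. It is not the HdrHistogram implementation: the class and method names are hypothetical, it uses a textbook LEB128 varint (up to 10 bytes per long) rather than whatever layout ZigZagEncoding.java uses, and it assumes one particular reading of the scheme, namely that a negative value -n in the stream stands for n consecutive zero counts. Counts are assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: ZigZag LEB128 varints plus zero-run markers.
final class ZeroRunZigZagSketch {

    // Write v as a ZigZag-mapped LEB128 varint (7 payload bits per byte).
    static void putZigZag(ByteArrayOutputStream out, long v) {
        // ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
        // so small negative values stay small on the wire.
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Read one ZigZag LEB128 varint from the buffer.
    static long getZigZag(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo the ZigZag mapping
    }

    // Non-zero counts are written as-is; a run of n zeros is written as -n.
    static byte[] encode(long[] counts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < counts.length) {
            if (counts[i] == 0) {
                int run = 0;
                while (i < counts.length && counts[i] == 0) {
                    run++;
                    i++;
                }
                putZigZag(out, -run);
            } else {
                putZigZag(out, counts[i++]);
            }
        }
        return out.toByteArray();
    }

    // Rebuild the counts array; the caller supplies its original length.
    static long[] decode(byte[] encoded, int length) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        long[] counts = new long[length];
        int i = 0;
        while (in.hasRemaining()) {
            long v = getZigZag(in);
            if (v < 0) {
                i += (int) -v;   // skip a run of zeros (array is already zeroed)
            } else {
                counts[i++] = v;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = {0, 0, 0, 0, 5, 0, 0, 300, 1, 0};
        byte[] encoded = encode(counts);
        System.out.println(encoded.length + " bytes vs " + counts.length * Long.BYTES + " raw");
        System.out.println(Arrays.equals(counts, decode(encoded, counts.length)));
    }
}
```

For the ten sample counts in `main`, the stream holds the six values -4, 5, -2, 300, 1, -1, which take 7 bytes in this sketch against 80 bytes for fixed-width longs. Any zlib pass would then run over that shorter stream.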
Hi @giltene, That is very good news. We are glad to contribute to improving HdrHistogram. I'll take a look at your changes and try the new format. I'll let you know the results in our environments. Thanks! Pablo
A new HdrHistogram 2.1.7 was just pushed to both github and maven central. It includes a V2 encoding (will still decode V1 and V0) that uses ZigZag LEB128 and ZLE (Zero Length Encoding, indicated by negative counts in the stream), largely inspired by the discussion above. It also includes benchmarking against a Java port of the SkinnyHistogram logic.
Below are the compression ratio comparisons vs. HdrHistogram 2.1.7 and 2.1.6 for the various cases now covered by the benchmark (using your original data sets and adding some). The results demonstrate both how SkinnyHistogram had better compression ratios than HdrHistogram's V1, and how HdrHistogram's V2 (in 2.1.7) has dramatically improved, now beating SkinnyHistogram across the board (with a single exception where they are 2 bytes off in total size, making HdrHistogram's V2 0.97% larger).
Speed-wise things look pretty good as well, although the fact that my benchmarks of SkinnyHistogram's speed use my own Java port of Skinny and my own ZigZag LEB128 implementation (as opposed to the one from kryo that Khronus uses) may have a large effect here (the data size comparisons are much easier to trust than the code speed comparisons given the difference in implementation). HdrHistogram's (new) encodeIntoCompressedByteBuffer() now appears to be 2x+ to 4x+ faster than (my port of) skinnyEncodeIntoCompressedByteBuffer() across the entire set of data samples used. I'm not sure why the speed difference is so high, but the results look good...
HdrHistogram 2.1.7/Skinny/%Reduction [uncompressed] and compressed:
HdrHistogram 2.1.6/Skinny/%Reduction [uncompressed] and compressed:
Summary output from running the following on my laptop (2.3 GHz Intel Core i7 MacBook Pro):
Hi @giltene, Awesome! Looks very good! I'll definitely try the new hdr. I guess that your ZigZag implementation makes the difference in terms of speed. We will test it and upgrade to 2.1.7.
Done. Thanks @giltene!
SkinnyHistogram appears to have been created before HdrHistogram's V1 wire format stabilized. On the assumption that the purpose of SkinnyHistogram was improved compressibility, I believe that the V1 wire format should roughly match it (or beat it) in that sense.
Based on some experiments I had run (playing with adding forms of RLE and non-zero field encodings for sparse matrices), the zlib compression used in the compressed HdrHistogram wire form tends to pick up on all the benefits that RLE and/or only-non-zero encoding schemes get, resulting in similar compressed payload sizes regardless of whether or not some form of pre-zlib efficiency is attempted. Furthermore, when Histograms are not extremely sparse (e.g. whenever a latency hiccup occurs that ends up covering a wide spectrum of values), zlib alone seems to slightly beat RLE+zlib in eventual payload size compactness.
The benefit of using the common wire format is that it is currently supported in Java, C, and Python ports, and I expect it to be the one other language variants adopt as well...
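One way to try the kind of experiment described above is a small harness that deflates the same sparse counts array in two layouts: plain fixed-width longs, and an only-non-zero layout of (index, count) pairs. The sketch below is hypothetical and uses only java.util.zip; neither layout is HdrHistogram's or SkinnyHistogram's actual wire format, and the synthetic data is an assumption chosen purely for illustration. It prints sizes rather than asserting which layout wins, since that depends on the data.

```java
import java.nio.ByteBuffer;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical harness: compare zlib output sizes with and without a
// pre-zlib "only non-zero" encoding of a sparse counts array.
final class PreEncodingVsZlibSketch {

    // Compress a byte array with zlib at the default compression level.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Layout A: every count as a fixed-width 8-byte long, zeros included.
    static byte[] fixedWidth(long[] counts) {
        ByteBuffer buf = ByteBuffer.allocate(counts.length * Long.BYTES);
        for (long c : counts) {
            buf.putLong(c);
        }
        return buf.array();
    }

    // Layout B: only non-zero entries, as (int index, long count) pairs.
    static byte[] nonZeroPairs(long[] counts) {
        ByteBuffer buf = ByteBuffer.allocate(counts.length * (Integer.BYTES + Long.BYTES));
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                buf.putInt(i);
                buf.putLong(counts[i]);
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Synthetic sparse data (an assumption): one non-zero count every 50 slots.
    static long[] sampleCounts() {
        long[] counts = new long[1000];
        for (int i = 0; i < counts.length; i += 50) {
            counts[i] = i + 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = sampleCounts();
        byte[] a = fixedWidth(counts);
        byte[] b = nonZeroPairs(counts);
        System.out.println("fixed-width:    " + a.length + " raw, " + deflate(a).length + " deflated");
        System.out.println("non-zero pairs: " + b.length + " raw, " + deflate(b).length + " deflated");
    }
}
```

Swapping `sampleCounts()` for real histogram counts, both very sparse and widely spread, would show how much of the pre-encoding gain survives after zlib for a given data set.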