Uses MergingDigest instead of AVLDigest in percentiles agg #28702

colings86 · 2018-02-16T09:45:56Z

This change modifies TDigestState to use MergingDigest instead of AVLTreeDigest. Benchmarks comparing the insertion performance of the two implementations show that MergingDigest is between 2 and 250 times faster than AVLTreeDigest in the tested scenarios. Details of the benchmark are below.

The change is straight forward but one thing which is still outstanding is to add a test which tests the assumption that the serialisation is compatible between older versions using AVLTreeDigest and new versions after this change using MergingDigest. This should be covered by bwc tests in theory but I would like to add an explicit unit test to check this as well.

The benchmark was performed on the Raw classes outside of Elasticsearch. Three scenarios were tested but in each case the test was warmed up by inserting 10,000 values into each implementation over 10 warm up runs and then measurements were taken inserting 1,000,000 values over 100 test runs for each implementation. For each test run the same values were used on both implementations. The compression was 100.0 in all tests. Measurements were taken by calling System.nanoTime() before and after inserting the 1,000,000 values. Measurements were combined into a mean average for the final results.

Scenarios:

Random Value Doubles - Test values were doubles taken from java.util.Random.nextDouble(). For each test run the Random instance was seeded with the same seed for both implementations but the seed was changed for each test run.
Sequential Integer Values As Doubles - Test values were taken from an incrementing long variable and then converted to a double using Double.valueOf(long). The long variable was reset for every test run and between testing each implementation.
Repeated Same Value Doubles - Test values were taken from java.util.Random.nextDouble() at the beginning of a test run and this same value was repeatedly inserted into the implementation under test. The value was kept the same between the two implementations but a new value was obtained between test runs.

Results

Random Value Doubles

Average AVLTreeDigest (ns): 2.4393659506E8
Average AVLTreeDigest (ms): 243.93659506
Average MergingDigest (ns): 1.1975786487E8
Average MergingDigest (ms): 119.75786487
Average MergingDigest / Average AVLTreeDigest (%): 49.09384950648495
Average AVLTreeDigest / Average MergingDigest (raw value): 2.036915031215686

Sequential Integer Values As Doubles

Average AVLTreeDigest (ns): 9.4348391258E8
Average AVLTreeDigest (ms): 943.48391258
Average MergingDigest (ns): 1.0778998414E8
Average MergingDigest (ms): 107.78998414
Average MergingDigest / Average AVLTreeDigest (%): 11.424676425615287
Average AVLTreeDigest / Average MergingDigest (raw value): 8.752983128326491

Repeated Same Value Doubles

Average AVLTreeDigest (ns): 7.421614153E9
Average AVLTreeDigest (ms): 7421.614153
Average MergingDigest (ns): 2.755736346E7
Average MergingDigest (ms): 27.55736346
Average MergingDigest / Average AVLTreeDigest (%): 0.3713122629645282
Average AVLTreeDigest / Average MergingDigest (raw value): 269.315102069637521

Closes #19528

colings86 · 2018-02-19T08:59:40Z

Additionally to the benchmark I performed above, @danielmitterdorfer wrote a JMH benchmark test with the same methodology and ran it on our microbenchmark infrastructure. These benchmarks show less of a difference between the two algorithms but still show a 2-10 times improvement in speed for MergingDigest when compared to AVLTreeDigest:

Benchmark                                 (algorithm)  (compression)  (numRecordsToAdd)  Mode  Cnt    Score     Error  Units
TDigestBenchmark.measureRandomValues              avl          100.0             100000  avgt   30   25.539 ±   0.120  ms/op
TDigestBenchmark.measureRandomValues              avl          100.0            1000000  avgt   30  254.238 ±   1.494  ms/op
TDigestBenchmark.measureRandomValues          merging          100.0             100000  avgt   30   11.277 ±   0.017  ms/op
TDigestBenchmark.measureRandomValues          merging          100.0            1000000  avgt   30  112.874 ±   0.139  ms/op
TDigestBenchmark.measureRepeatedValues            avl          100.0             100000  avgt   30   18.373 ±   2.710  ms/op
TDigestBenchmark.measureRepeatedValues            avl          100.0            1000000  avgt   30  319.961 ± 233.514  ms/op
TDigestBenchmark.measureRepeatedValues        merging          100.0             100000  avgt   30    3.643 ±   0.100  ms/op
TDigestBenchmark.measureRepeatedValues        merging          100.0            1000000  avgt   30   34.807 ±   0.214  ms/op
TDigestBenchmark.measureSequentialValues          avl          100.0             100000  avgt   30   14.979 ±   0.269  ms/op
TDigestBenchmark.measureSequentialValues          avl          100.0            1000000  avgt   30  154.670 ±   1.717  ms/op
TDigestBenchmark.measureSequentialValues      merging          100.0             100000  avgt   30    6.857 ±   0.013  ms/op
TDigestBenchmark.measureSequentialValues      merging          100.0            1000000  avgt   30   69.149 ±   0.170  ms/op

It should be noted that TDigestBenchmark.measureRepeatedValues avl 100.0 1000000 did not reach a stable state so in order to be 100% sure we'd need to run more iterations and if it still does not reach a stable state we'd need to investigate why. I think the goal of this benchmark is still achieved regardless since we wanted to make sure that MergingDigest improved insertion performance for the percentiles aggregation in all three of these cases, which the above results show (even with the large error on one of the cases).

colings86 · 2018-03-13T11:53:57Z

@elastic/es-search-aggs

This change modifies TDigestState to use MergingDigest instead of AVLTreeDigest. Benchmarks comparing the insertion performance of the two implementations show that MergingDigest is between 2 and 250 times faster than AVLTreeDigest in the tested scenarios. Details of the benchmark are below. The benchmark was performed on the Raw classes outside of Elasticsearch. Three scenarios were tested but in each case the test was warmed up by inserting 10,000 values into each implementation over 10 warm up runs and then measurements were taken inserting 1,000,000 values over 100 test runs for each implementation. For each test run the same values were used on both implementations. The compression was 100.0 in all tests. Measurements were taken by calling `System.nanoTime()` before and after inserting the 1,000,000 values. Measurements were combined into a mean average for the final results. 1. Random Value Doubles - Test values were doubles taken from `java.util.Random.nextDouble()`. For each test run the Random instance was seeded with the same seed for both implementations but the seed was changed for each test run. 2. Sequential Integer Values As Doubles - Test values were taken from an incrementing long variable and then converted to a double using `Double.valueOf(long)`. The long variable was reset for every test run and between testing each implementation. 3. Repeated Same Value Doubles - Test values were taken from `java.util.Random.nextDouble()` at the beginning of a test run and this same value was repeatedly inserted into the implementation under test. The value was kept the same between the two implementations but a new value was obtained between test runs. Average AVLTreeDigest (ns): 2.4393659506E8 Average AVLTreeDigest (ms): 243.93659506 Average MergingDigest (ns): 1.1975786487E8 Average MergingDigest (ms): 119.75786487 Average MergingDigest / Average AVLTreeDigest (%): 49.09384950648495 Average AVLTreeDigest / Average MergingDigest (raw value): 2.036915031215686 Average AVLTreeDigest (ns): 9.4348391258E8 Average AVLTreeDigest (ms): 943.48391258 Average MergingDigest (ns): 1.0778998414E8 Average MergingDigest (ms): 107.78998414 Average MergingDigest / Average AVLTreeDigest (%): 11.424676425615287 Average AVLTreeDigest / Average MergingDigest (raw value): 8.752983128326491 Average AVLTreeDigest (ns): 7.421614153E9 Average AVLTreeDigest (ms): 7421.614153 Average MergingDigest (ns): 2.755736346E7 Average MergingDigest (ms): 27.55736346 Average MergingDigest / Average AVLTreeDigest (%): 0.3713122629645282 Average AVLTreeDigest / Average MergingDigest (raw value): 269.315102069637521

colings86 · 2018-09-13T08:57:00Z

Things still to do on this PR:

Ensure bwc in serialisation between AVLTree and merging digests with a test (i.e. a AVLTree digest should serialise to wire format and deserialise to a Meging digest)

colings86 · 2018-11-09T14:35:23Z

Since this PR is superseded by #35182 I'm going to close it

colings86 added >enhancement :Analytics/Aggregations Aggregations v7.0.0 v6.3.0 labels Feb 16, 2018

colings86 self-assigned this Feb 16, 2018

colings86 requested a review from jpountz February 16, 2018 09:45

colings86 added the WIP label Feb 16, 2018

danielmitterdorfer mentioned this pull request Mar 9, 2018

Use String.join() to describe a list of tasks #28941

Merged

colings86 mentioned this pull request Mar 13, 2018

percentiles aggregation return anomaly value when hits too large #19528

Closed

bleskes added v6.3.1 v6.4.0 and removed v6.3.0 v6.3.1 labels Apr 26, 2018

lcawl added v6.4.1 and removed v6.4.0 labels Aug 23, 2018

colings86 removed the v6.4.1 label Sep 13, 2018

not-napoleon mentioned this pull request Nov 1, 2018

Feature/replace avl digest with merging digest #35182

Closed

colings86 closed this Nov 9, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uses MergingDigest instead of AVLDigest in percentiles agg #28702

Uses MergingDigest instead of AVLDigest in percentiles agg #28702

colings86 commented Feb 16, 2018 •

edited

Loading

colings86 commented Feb 19, 2018 •

edited

Loading

colings86 commented Mar 13, 2018

colings86 commented Sep 13, 2018

colings86 commented Nov 9, 2018

Uses MergingDigest instead of AVLDigest in percentiles agg #28702

Uses MergingDigest instead of AVLDigest in percentiles agg #28702

Conversation

colings86 commented Feb 16, 2018 • edited Loading

Scenarios:

Results

Random Value Doubles

Sequential Integer Values As Doubles

Repeated Same Value Doubles

colings86 commented Feb 19, 2018 • edited Loading

colings86 commented Mar 13, 2018

colings86 commented Sep 13, 2018

colings86 commented Nov 9, 2018

colings86 commented Feb 16, 2018 •

edited

Loading

colings86 commented Feb 19, 2018 •

edited

Loading