Use atomics instead of synchronized for TimeWindowQuantiles #483

ghost · 2019-06-05T15:08:56Z

Idea:

We could implement TimeWindowQuantiles#rotate with atomics (compare-and-swap, retry-loop), as buckets are only rotated occasionally (every 2 minutes by default).

Pros:

avoids suspending threads for highly concurrent workloads (get, insert)
reduces constant overhead (get, insert)

Cons:

introduces busy waiting and a retry loop (rotate)
~~reduces locality (CKMSQuantiles[] vs ConcurrentLinkedQueue<CKMSQuantiles>)~~

Neutral:

changes thread contention characteristics (synchronized vs CAS & retry)
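A minimal sketch of the CAS-and-retry rotation idea. It assumes an immutable snapshot (bucket array, current index, last rotation time) held in an AtomicReference. `RotatingWindow` and `Bucket` are hypothetical names: `Bucket` is a thread-safe stand-in for CKMSQuantiles that only tracks a count and a maximum, and the clock is injected so rotation can be tested deterministically. This is not the code from this pull request.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.DoubleAccumulator;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Hypothetical thread-safe stand-in for CKMSQuantiles: tracks count and max only. */
final class Bucket {
    private final LongAdder count = new LongAdder();
    private final DoubleAccumulator max = new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);

    void record(double value) {
        count.increment();
        max.accumulate(value);
    }

    long count() { return count.sum(); }
    double max() { return max.get(); }
}

/** Hypothetical lock-free variant of a time-windowed bucket ring. */
final class RotatingWindow {
    /** Immutable snapshot; a rotation publishes a new one via compare-and-swap. */
    private static final class Snapshot {
        final Bucket[] ring;
        final int current;      // index of the oldest bucket, which answers queries
        final long lastRotate;  // clock value of the last completed rotation step

        Snapshot(Bucket[] ring, int current, long lastRotate) {
            this.ring = ring;
            this.current = current;
            this.lastRotate = lastRotate;
        }
    }

    private final AtomicReference<Snapshot> state;
    private final long interval;
    private final LongSupplier clock;

    RotatingWindow(int buckets, long interval, LongSupplier clock) {
        Bucket[] ring = new Bucket[buckets];
        for (int i = 0; i < buckets; i++) {
            ring[i] = new Bucket();
        }
        this.interval = interval;
        this.clock = clock;
        this.state = new AtomicReference<>(new Snapshot(ring, 0, clock.getAsLong()));
    }

    /** Rotates if due; on a lost CAS race, re-reads the winner's snapshot and retries. */
    private Snapshot rotate() {
        while (true) {
            Snapshot seen = state.get();
            long now = clock.getAsLong();
            if (now - seen.lastRotate <= interval) {
                return seen; // fast path: no rotation due, no write at all
            }
            Bucket[] ring = seen.ring.clone(); // copy-on-write: never mutate a published array
            int current = seen.current;
            long last = seen.lastRotate;
            while (now - last > interval) {
                ring[current] = new Bucket(); // replace the oldest bucket with an empty one
                current = (current + 1) % ring.length;
                last += interval;
            }
            Snapshot next = new Snapshot(ring, current, last);
            if (state.compareAndSet(seen, next)) {
                return next;
            }
            // Another thread published first: loop and re-check against its snapshot.
        }
    }

    void insert(double value) {
        // A value inserted concurrently with a rotation may land in the retired
        // bucket instead of its replacement: an accepted approximation here.
        for (Bucket b : rotate().ring) {
            b.record(value);
        }
    }

    long currentCount() { Snapshot s = rotate(); return s.ring[s.current].count(); }
    double currentMax() { Snapshot s = rotate(); return s.ring[s.current].max(); }
}
```

With a real clock this could be constructed as, for example, `new RotatingWindow(5, TimeUnit.MINUTES.toNanos(2), System::nanoTime)`. The design choice is that `get` and `insert` only pay one volatile read on the fast path; the CAS and retry loop run only when a rotation is due.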

Please check the code comments for considerations about correctness.

Relates to #480 , #481

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>

ghost · 2019-06-05T15:15:15Z

I ran multiple benchmarks with 24 threads and picked the result with the smallest error, but of course I can’t know how it behaves under different workloads.

It looks like this doesn't make a significant difference. 🙁

Benchmark:

# JMH version: 1.21
# VM version: JDK 11.0.2, OpenJDK 64-Bit Server VM, 11.0.2+9
# VM invoker: /Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home/bin/java
# VM options: -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=49590:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 4 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 24 threads, will synchronize iterations

Before:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     186.796 ±     15.029  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      61.782 ±      5.032  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      59.388 ±     12.457  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  530692.958 ± 627614.926  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  563807.323 ± 389413.917  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  559719.005 ± 631676.260  ns/op

After:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     203.730 ±     16.390  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      64.107 ±     19.793  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      70.482 ±     15.269  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  502982.254 ± 521478.460  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  531822.021 ± 753135.995  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  607377.546 ± 686712.470  ns/op

P.S.: I'm still running benchmarks with more frequent rotations.


ghost · 2019-06-07T19:37:29Z

c18a7dd:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.codahaleHistogramBenchmark                         avgt    4    5306.534 ±    585.112  ns/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark                 avgt    4     246.799 ±     39.907  ns/op
SummaryBenchmark.prometheusSimpleHistogramChildBenchmark            avgt    4      86.755 ±      3.767  ns/op
SummaryBenchmark.prometheusSimpleHistogramNoLabelsBenchmark         avgt    4      92.294 ±     14.266  ns/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     206.669 ±     20.043  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      61.726 ±      9.183  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      64.475 ±      3.540  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  508214.578 ± 445238.056  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  542223.756 ± 666929.481  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  541703.117 ± 658466.450  ns/op

brian-brazil

There's so much noise in the benchmark that we can't tell much.
What if you hack it to rotate on every insert, and benchmark then?


ghost · 2019-06-13T11:21:06Z

e364c84:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.codahaleHistogramBenchmark                         avgt    4    6577.726 ±    949.927  ns/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark                 avgt    4     367.783 ±    494.777  ns/op
SummaryBenchmark.prometheusSimpleHistogramChildBenchmark            avgt    4      95.329 ±     33.166  ns/op
SummaryBenchmark.prometheusSimpleHistogramNoLabelsBenchmark         avgt    4      98.552 ±     30.289  ns/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     227.764 ±    112.328  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      65.826 ±      5.762  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      64.002 ±      5.090  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  510744.048 ± 414448.241  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  526222.356 ± 525411.576  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  538330.329 ± 663403.555  ns/op


brian-brazil · 2019-06-14T12:30:12Z

Hmm, there's a test failure, but I don't think it's due to you.

ghost · 2019-06-15T13:13:10Z

Benchmarks with unrealistically frequent rotations, obtained by hardcoding TimeWindowQuantiles.durationBetweenRotates:
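To put "unrealistically frequent" into numbers: taking the default rotation interval of 2 minutes from the PR description, the hardcoded 1ms and 100us intervals rotate 120,000 and 1,200,000 times as often. A small check with `java.time.Duration` (the class name `RotationRatio` is only illustrative):

```java
import java.time.Duration;

final class RotationRatio {
    /** How many times more often `benchmark` rotates than `production`. */
    static long speedup(Duration production, Duration benchmark) {
        return production.toNanos() / benchmark.toNanos();
    }

    public static void main(String[] args) {
        Duration production = Duration.ofMinutes(2); // default interval from the PR description
        System.out.println(speedup(production, Duration.ofMillis(1)));      // 120000
        System.out.println(speedup(production, Duration.ofNanos(100_000))); // 1200000
    }
}
```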

Before - 4e0e752 - durationBetweenRotates=1ms:

Benchmark                                                           Mode  Cnt      Score      Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    180.660 ±   31.025  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     59.794 ±    2.183  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     62.603 ±    2.560  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  16668.278 ±  899.351  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  16473.560 ± 1271.724  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  16279.621 ± 1513.338  ns/op

After - ca4d1c6 - durationBetweenRotates=1ms:

Benchmark                                                           Mode  Cnt      Score      Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    212.302 ±   12.712  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     65.080 ±   13.284  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     65.784 ±    9.402  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  18453.884 ±  297.138  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  18400.019 ± 1027.233  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  18478.745 ±  512.781  ns/op

Before - 4e0e752 - durationBetweenRotates=100us:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    190.128 ±  37.522  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     59.393 ±   1.586  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     62.860 ±   7.727  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  12261.290 ± 206.572  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  12279.661 ± 117.961  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  12317.630 ± 270.580  ns/op

After - ca4d1c6 - durationBetweenRotates=100us:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    184.746 ±  11.240  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     60.173 ±   1.495  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     59.304 ±   0.370  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  12573.994 ± 151.325  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  12656.324 ± 580.513  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  12633.963 ± 596.408  ns/op

brian-brazil · 2019-06-17T07:36:55Z

Hmm, it's showing as slower.
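A quick way to quantify "slower" from the tables above, using the prometheusSimpleSummaryQuantilesBenchmark rows before (4e0e752) and after (ca4d1c6). Treating overlapping "Score ± Error" ranges as "no clear difference" is a rough heuristic assumed here, not a significance test; `ScoreDiff` is a hypothetical helper.

```java
/** Compares two JMH "Score ± Error" results (ns/op, lower is better). */
final class ScoreDiff {
    /** Relative change of `after` versus `before`, in percent. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** True when the [score - error, score + error] ranges share at least one point. */
    static boolean rangesOverlap(double s1, double e1, double s2, double e2) {
        return s1 - e1 <= s2 + e2 && s2 - e2 <= s1 + e1;
    }

    public static void main(String[] args) {
        // prometheusSimpleSummaryQuantilesBenchmark, 4e0e752 vs ca4d1c6
        System.out.printf("1ms:   %+.1f%%, overlap=%b%n",
                percentChange(16668.278, 18453.884),
                rangesOverlap(16668.278, 899.351, 18453.884, 297.138));
        System.out.printf("100us: %+.1f%%, overlap=%b%n",
                percentChange(12261.290, 12573.994),
                rangesOverlap(12261.290, 206.572, 12573.994, 151.325));
    }
}
```

This prints `+10.7%, overlap=false` for the 1ms runs and `+2.6%, overlap=true` for the 100us runs, so the slowdown is clear at 1ms but within the reported error at 100us.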

ghost · 2019-06-17T10:20:05Z

I'll experiment with CopyOnWriteArrayList / AtomicReferenceArray / volatile array; maybe one of them can be used in a non-blocking way without affecting correctness.
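One difference between these options: a `volatile` array field only makes the array reference volatile, not writes to its elements, whereas `AtomicReferenceArray` gives volatile reads and writes plus compare-and-swap per element. A minimal sketch, assuming a hypothetical `SlotRing` with `LongAdder` standing in for a bucket: a slot is replaced only if it still holds the bucket the caller observed, so two threads racing to retire the same bucket cannot both succeed.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical ring whose slots are retired individually with compare-and-swap. */
final class SlotRing {
    private final AtomicReferenceArray<LongAdder> slots;

    SlotRing(int size) {
        slots = new AtomicReferenceArray<>(size);
        for (int i = 0; i < size; i++) {
            slots.set(i, new LongAdder()); // volatile write per element
        }
    }

    LongAdder slot(int i) {
        return slots.get(i); // volatile read: sees the latest published bucket
    }

    /** Replaces slot i with a fresh bucket only if it still holds `observed`. */
    boolean retire(int i, LongAdder observed) {
        return slots.compareAndSet(i, observed, new LongAdder());
    }

    /** Records one observation into every bucket, as inserts do in a time window. */
    void recordAll() {
        for (int i = 0; i < slots.length(); i++) {
            slots.get(i).increment();
        }
    }
}
```

Per-slot CAS alone does not update the current index and the last-rotation timestamp atomically together with the slot, which is one reason to publish the whole ring as a single snapshot instead.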


ghost · 2019-06-18T10:24:20Z

ed59f81:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     181.323 ±      7.251  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      59.402 ±      0.821  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      61.234 ±      4.918  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  563721.916 ± 631300.414  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  570436.118 ± 961307.427  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  568640.914 ± 948878.246  ns/op

ed59f81 - durationBetweenRotates=1ms:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    183.888 ±   3.245  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     60.077 ±   2.056  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     61.730 ±   2.587  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  17614.669 ± 503.163  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  17400.240 ± 450.374  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  17494.129 ± 361.709  ns/op

ed59f81 - durationBetweenRotates=100us:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    188.570 ±  24.384  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     61.799 ±   4.451  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     67.595 ±   3.957  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  12654.044 ± 507.662  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  12626.006 ± 211.931  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  12492.773 ± 163.373  ns/op

ghost · 2019-07-01T09:51:44Z

I'll close this pull request in a couple of days: I won't be able to test and profile the changes, and according to the benchmarks #481 is significantly faster under high thread contention.

Use atomic update instead of synchronized for TimeWindowQuantiles

37d8303

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>

ghost commented Jun 5, 2019

simpleclient/src/main/java/io/prometheus/client/TimeWindowQuantiles.java (outdated)

ghost commented Jun 6, 2019

simpleclient/src/main/java/io/prometheus/client/TimeWindowQuantiles.java (outdated)

ghost mentioned this pull request Jun 6, 2019

HdrSummary based on Summary and HdrHistogram #484

Closed

Fix and simplify TimeWindowQuantiles concurrent bucket rotation

c18a7dd

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>

brian-brazil reviewed Jun 10, 2019

simpleclient/src/main/java/io/prometheus/client/TimeWindowQuantiles.java (outdated)

Cleanup code style (naming conventions) [code review]

e364c84

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>

brian-brazil reviewed Jun 12, 2019

simpleclient/src/main/java/io/prometheus/client/TimeWindowQuantiles.java (outdated)

Use System#nanoTime instead of System#currentTimeMillis [code review]

ca4d1c6

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>

Experiment with AtomicReference<CKMSQuantiles[]> instead of ConcurrentLinkedQueue<CKMSQuantiles>

ed59f81

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>

ghost mentioned this pull request Jun 18, 2019

Thread contention in TimeWindowQuantiles because of synchronized #480

Closed

ghost closed this Jul 3, 2019

ghost deleted the summary-atomic-twq branch July 3, 2019 09:21

This pull request was closed.
