Use atomics instead of synchronized for TimeWindowQuantiles #483

@ghost ghost commented Jun 5, 2019

Idea:

We could implement TimeWindowQuantiles#rotate with atomics (compare-and-swap, retry-loop), as buckets are only rotated occasionally (every 2 minutes by default).

Pros:

  • avoids suspending threads for highly concurrent workloads (get, insert)
  • reduces constant overhead (get, insert)

Cons:

  • introduces busy waiting and retry-loop (rotate)
  • reduces locality (CKMSQuantiles[] vs ConcurrentLinkedQueue<CKMSQuantiles>)

Neutral:

  • changes thread contention characteristics (synchronized vs CAS & retry)

Please check the code comments for considerations about correctness.
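As an illustration of the idea only (not the code in this PR), a compare-and-swap rotation with a retry loop could look roughly like the sketch below. `CasRotatingWindow` is a hypothetical class, `LongAdder` counters stand in for `CKMSQuantiles`, and timestamps are passed in explicitly so the behaviour is deterministic.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: LongAdder stands in for CKMSQuantiles. Not the PR's code. */
final class CasRotatingWindow {
    // Head of the queue is the oldest bucket, i.e. the one that has seen the most data.
    private final ConcurrentLinkedQueue<LongAdder> buckets = new ConcurrentLinkedQueue<>();
    private final AtomicLong lastRotateMillis;
    private final long durationBetweenRotatesMillis;

    CasRotatingWindow(int ageBuckets, long durationBetweenRotatesMillis, long nowMillis) {
        for (int i = 0; i < ageBuckets; i++) {
            buckets.add(new LongAdder());
        }
        this.lastRotateMillis = new AtomicLong(nowMillis);
        this.durationBetweenRotatesMillis = durationBetweenRotatesMillis;
    }

    /** Records one observation in every live bucket. */
    void insert(long nowMillis) {
        rotate(nowMillis);
        for (LongAdder bucket : buckets) {
            bucket.increment();
        }
    }

    /** Reads the oldest bucket, which covers the longest part of the window. */
    long currentCount(long nowMillis) {
        rotate(nowMillis);
        LongAdder head = buckets.peek();
        return head == null ? 0 : head.sum();
    }

    /** Rotates as often as needed; no lock is taken, so no thread is suspended. */
    private void rotate(long nowMillis) {
        while (true) {
            long last = lastRotateMillis.get();
            if (nowMillis - last < durationBetweenRotatesMillis) {
                return; // common fast path: nothing to do
            }
            // Only the thread that wins the CAS performs this particular rotation.
            if (lastRotateMillis.compareAndSet(last, last + durationBetweenRotatesMillis)) {
                buckets.add(new LongAdder()); // add first so the queue is never empty
                buckets.poll();               // then drop the oldest bucket
            }
            // On CAS failure another thread rotated; loop and re-read the timestamp.
        }
    }
}
```

The fast path is a single volatile read and a comparison, which is where the hoped-for reduction in constant overhead for get and insert would come from. The retry loop only spins when several threads notice an overdue rotation at the same moment.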

Relates to #480, #481

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>
Author

ghost commented Jun 5, 2019

I ran multiple benchmarks with 24 threads and picked the result with the smallest error, but of course I can’t know how it behaves under different workloads.

It looks like this doesn't make a significant difference. 🙁

Benchmark:

# JMH version: 1.21
# VM version: JDK 11.0.2, OpenJDK 64-Bit Server VM, 11.0.2+9
# VM invoker: /Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home/bin/java
# VM options: -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=49590:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 4 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 24 threads, will synchronize iterations

Before:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     186.796 ±     15.029  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      61.782 ±      5.032  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      59.388 ±     12.457  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  530692.958 ± 627614.926  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  563807.323 ± 389413.917  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  559719.005 ± 631676.260  ns/op

After:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     203.730 ±     16.390  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      64.107 ±     19.793  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      70.482 ±     15.269  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  502982.254 ± 521478.460  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  531822.021 ± 753135.995  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  607377.546 ± 686712.470  ns/op

P.S. I'm still running benchmarks with more frequent rotations.

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>
Author

ghost commented Jun 7, 2019

c18a7dd:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.codahaleHistogramBenchmark                         avgt    4    5306.534 ±    585.112  ns/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark                 avgt    4     246.799 ±     39.907  ns/op
SummaryBenchmark.prometheusSimpleHistogramChildBenchmark            avgt    4      86.755 ±      3.767  ns/op
SummaryBenchmark.prometheusSimpleHistogramNoLabelsBenchmark         avgt    4      92.294 ±     14.266  ns/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     206.669 ±     20.043  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      61.726 ±      9.183  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      64.475 ±      3.540  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  508214.578 ± 445238.056  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  542223.756 ± 666929.481  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  541703.117 ± 658466.450  ns/op

Contributor

@brian-brazil brian-brazil left a comment

There's so much noise in the benchmark that we can't tell much.
What if you hack it to rotate on every insert, and benchmark then?
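To put a number on the noise, one can treat each JMH row as the interval Score ± Error and check how wide it is and whether the before/after intervals overlap. The `ScoreInterval` class below is a hypothetical helper written for this note; the figures in the usage lines are the `prometheusSimpleSummaryQuantilesBenchmark` rows from the first Before/After tables.

```java
/** Hypothetical helper: a JMH result viewed as the interval score ± error. */
final class ScoreInterval {
    final double score;
    final double error;

    ScoreInterval(double score, double error) {
        this.score = score;
        this.error = error;
    }

    double low() {
        return score - error;
    }

    double high() {
        return score + error;
    }

    /** Error as a fraction of the score; above 1.0 the interval is wider than the score itself. */
    double relativeError() {
        return error / score;
    }

    /** True when the two intervals share at least one point. */
    boolean overlaps(ScoreInterval other) {
        return low() <= other.high() && other.low() <= high();
    }
}
```

For example, `new ScoreInterval(530692.958, 627614.926)` (before) has a relative error above 1.0 and overlaps `new ScoreInterval(502982.254, 521478.460)` (after), so those two rows cannot be told apart.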

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>
Author

ghost commented Jun 13, 2019

e364c84:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.codahaleHistogramBenchmark                         avgt    4    6577.726 ±    949.927  ns/op
SummaryBenchmark.prometheusSimpleHistogramBenchmark                 avgt    4     367.783 ±    494.777  ns/op
SummaryBenchmark.prometheusSimpleHistogramChildBenchmark            avgt    4      95.329 ±     33.166  ns/op
SummaryBenchmark.prometheusSimpleHistogramNoLabelsBenchmark         avgt    4      98.552 ±     30.289  ns/op
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     227.764 ±    112.328  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      65.826 ±      5.762  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      64.002 ±      5.090  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  510744.048 ± 414448.241  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  526222.356 ± 525411.576  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  538330.329 ± 663403.555  ns/op

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>
@brian-brazil
Contributor

Hmm, test failure but I don't think it's due to you.

Author

ghost commented Jun 15, 2019

Benchmarks with unrealistically frequent rotations, run by hardcoding TimeWindowQuantiles.durationBetweenRotates.

Before - 4e0e752 - durationBetweenRotates=1ms:

Benchmark                                                           Mode  Cnt      Score      Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    180.660 ±   31.025  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     59.794 ±    2.183  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     62.603 ±    2.560  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  16668.278 ±  899.351  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  16473.560 ± 1271.724  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  16279.621 ± 1513.338  ns/op

After - ca4d1c6 - durationBetweenRotates=1ms:

Benchmark                                                           Mode  Cnt      Score      Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    212.302 ±   12.712  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     65.080 ±   13.284  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     65.784 ±    9.402  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  18453.884 ±  297.138  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  18400.019 ± 1027.233  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  18478.745 ±  512.781  ns/op

Before - 4e0e752 - durationBetweenRotates=100us:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    190.128 ±  37.522  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     59.393 ±   1.586  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     62.860 ±   7.727  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  12261.290 ± 206.572  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  12279.661 ± 117.961  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  12317.630 ± 270.580  ns/op

After - ca4d1c6 - durationBetweenRotates=100us:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    184.746 ±  11.240  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     60.173 ±   1.495  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     59.304 ±   0.370  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  12573.994 ± 151.325  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  12656.324 ± 580.513  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  12633.963 ± 596.408  ns/op
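
For a sense of scale, the hardcoded intervals can be compared with the default of one rotation every 2 minutes mentioned in the description. The sketch below assumes that figure comes from a 10-minute window divided into 5 age buckets; those two inputs are an assumption of this note, since the thread only states the 2-minute result. `RotationInterval` is a hypothetical helper.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for comparing rotation intervals. */
final class RotationInterval {
    /** Window length divided evenly across the age buckets, in milliseconds. */
    static long millisBetweenRotates(long maxAgeSeconds, int ageBuckets) {
        return TimeUnit.SECONDS.toMillis(maxAgeSeconds) / ageBuckets;
    }

    /** How many rotations per second a given interval implies. */
    static double rotationsPerSecond(long intervalNanos) {
        return 1e9 / intervalNanos;
    }
}
```

With the assumed inputs, `millisBetweenRotates(600, 5)` gives 120000 ms, i.e. 2 minutes. A 1 ms interval is 120,000 times shorter than that and a 100 us interval is 1,200,000 times shorter, so these runs stress rotation far beyond anything a default configuration would see.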

@brian-brazil
Contributor

Hmm, it's showing as slower.
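
To quantify "slower", the sketch below computes the relative change between two scores and checks whether the Score ± Error intervals are fully separated. `Slowdown` is a hypothetical helper written for this note; the inputs in the usage lines are the `prometheusSimpleSummaryQuantilesBenchmark` rows from the tables above.

```java
/** Hypothetical helper for comparing a before/after pair of JMH scores (ns/op). */
final class Slowdown {
    /** Positive result means "after" takes longer per operation than "before". */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** True only when the whole "after" interval lies above the whole "before" interval. */
    static boolean clearlySlower(double before, double beforeError,
                                 double after, double afterError) {
        return after - afterError > before + beforeError;
    }
}
```

At 1 ms, `percentChange(16668.278, 18453.884)` is roughly 10.7 and `clearlySlower(16668.278, 899.351, 18453.884, 297.138)` is true. At 100 us, `percentChange(12261.290, 12573.994)` is roughly 2.6 and the intervals still overlap, so `clearlySlower(12261.290, 206.572, 12573.994, 151.325)` is false.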

Author

ghost commented Jun 17, 2019

I'll experiment with CopyOnWriteArrayList / AtomicReferenceArray / a volatile array; maybe one of them can be used in a non-blocking way without affecting correctness.
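
One possible shape for the AtomicReferenceArray variant is sketched below. It is not taken from any commit in this PR: `ArrayRingBuckets` is a hypothetical class and `LongAdder` counters again stand in for `CKMSQuantiles`. The comments call out where correctness would need care.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of array-backed buckets; LongAdder stands in for CKMSQuantiles. */
final class ArrayRingBuckets {
    private final AtomicReferenceArray<LongAdder> ring;
    private final AtomicInteger current = new AtomicInteger(0);

    ArrayRingBuckets(int ageBuckets) {
        ring = new AtomicReferenceArray<>(ageBuckets);
        for (int i = 0; i < ageBuckets; i++) {
            ring.set(i, new LongAdder());
        }
    }

    /** Records one observation in every bucket. */
    void insert() {
        for (int i = 0; i < ring.length(); i++) {
            // A bucket replaced between get() and increment() loses this observation;
            // this is the kind of race a real implementation would have to reason about.
            ring.get(i).increment();
        }
    }

    int currentIndex() {
        return current.get();
    }

    long currentCount() {
        return ring.get(current.get()).sum();
    }

    /**
     * Advances the current index with a CAS, then resets the bucket left behind.
     * Returns true if this caller performed the rotation. The index wraps around,
     * so a sufficiently stale caller could succeed a full cycle later (ABA); tying
     * the CAS to a timestamp instead would avoid that.
     */
    boolean rotateFrom(int expectedIndex) {
        int next = (expectedIndex + 1) % ring.length();
        if (!current.compareAndSet(expectedIndex, next)) {
            return false; // someone else already rotated
        }
        ring.set(expectedIndex, new LongAdder());
        return true;
    }
}
```

Compared with a linked queue, the array keeps the bucket references contiguous, which speaks to the locality concern listed in the description. The cost is that replacing a bucket in place makes lost updates possible unless the design accepts or prevents them.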

…tLinkedQueue<CKMSQuantiles>

Signed-off-by: Rudolf Rakos <rrakos@evolutiongaming.com>
Author

ghost commented Jun 18, 2019

ed59f81:

Benchmark                                                           Mode  Cnt       Score        Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4     181.323 ±      7.251  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4      59.402 ±      0.821  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4      61.234 ±      4.918  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  563721.916 ± 631300.414  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  570436.118 ± 961307.427  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  568640.914 ± 948878.246  ns/op

ed59f81 - durationBetweenRotates=1ms:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    183.888 ±   3.245  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     60.077 ±   2.056  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     61.730 ±   2.587  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  17614.669 ± 503.163  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  17400.240 ± 450.374  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  17494.129 ± 361.709  ns/op

ed59f81 - durationBetweenRotates=100us:

Benchmark                                                           Mode  Cnt      Score     Error  Units
SummaryBenchmark.prometheusSimpleSummaryBenchmark                   avgt    4    188.570 ±  24.384  ns/op
SummaryBenchmark.prometheusSimpleSummaryChildBenchmark              avgt    4     61.799 ±   4.451  ns/op
SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark           avgt    4     67.595 ±   3.957  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesBenchmark          avgt    4  12654.044 ± 507.662  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesChildBenchmark     avgt    4  12626.006 ± 211.931  ns/op
SummaryBenchmark.prometheusSimpleSummaryQuantilesNoLabelsBenchmark  avgt    4  12492.773 ± 163.373  ns/op

Author

ghost commented Jul 1, 2019

I'll close this Pull Request in a couple of days, as I won't be able to test and profile the changes, and #481 is significantly faster under high thread contention according to benchmarks.

@ghost ghost closed this Jul 3, 2019
@ghost ghost deleted the summary-atomic-twq branch July 3, 2019 09:21
This pull request was closed.