-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flatten MetricName builder #1373
Conversation
This drops the immutables implementation of the MetricName.builder in favor of a custom-tailored implementation using TagMap. Tag additions are copy-on-write, which would not scale well with large maps, however we tend to use small (N < 10) tag maps. ``` Benchmark Mode Cnt Score Error Units MetricNameBenchmark.benchmarkName avgt 5 142.270 ± 5.227 ns/op MetricNameBenchmark.benchmarkName:·gc.alloc.rate avgt 5 2000.444 ± 72.309 MB/sec MetricNameBenchmark.benchmarkName:·gc.alloc.rate.norm avgt 5 448.014 ± 0.007 B/op MetricNameBenchmark.benchmarkName:·gc.churn.G1_Eden_Space avgt 5 2003.666 ± 908.211 MB/sec MetricNameBenchmark.benchmarkName:·gc.churn.G1_Eden_Space.norm avgt 5 448.922 ± 209.311 B/op MetricNameBenchmark.benchmarkName:·gc.churn.G1_Survivor_Space avgt 5 0.005 ± 0.014 MB/sec MetricNameBenchmark.benchmarkName:·gc.churn.G1_Survivor_Space.norm avgt 5 0.001 ± 0.003 B/op MetricNameBenchmark.benchmarkName:·gc.count avgt 5 19.000 counts MetricNameBenchmark.benchmarkName:·gc.time avgt 5 21.000 ms ``` ``` Benchmark Mode Cnt Score Error Units MetricNameBenchmark.benchmarkName avgt 5 67.294 ± 2.825 ns/op MetricNameBenchmark.benchmarkName:·gc.alloc.rate avgt 5 1585.976 ± 66.417 MB/sec MetricNameBenchmark.benchmarkName:·gc.alloc.rate.norm avgt 5 168.007 ± 0.001 B/op MetricNameBenchmark.benchmarkName:·gc.churn.G1_Eden_Space avgt 5 1576.576 ± 0.744 MB/sec MetricNameBenchmark.benchmarkName:·gc.churn.G1_Eden_Space.norm avgt 5 167.027 ± 7.090 B/op MetricNameBenchmark.benchmarkName:·gc.churn.G1_Survivor_Space avgt 5 0.004 ± 0.008 MB/sec MetricNameBenchmark.benchmarkName:·gc.churn.G1_Survivor_Space.norm avgt 5 ≈ 10⁻³ B/op MetricNameBenchmark.benchmarkName:·gc.count avgt 5 20.000 counts MetricNameBenchmark.benchmarkName:·gc.time avgt 5 22.000 ms ```
Generate changelog in
|
TODO: I need to expand the benchmark to better understand the point at which the old implementation outperforms this one. |
Results from m1 macbook air on battery, so take them with a grain of salt. aarch64 jdk-17.0.1. A builder with 13 tag values resulted in ~10% more byes allocated per operation, but performed ~20% better. 7 tags per metric is more than we expect in most cases, but surprisingly showed lower allocation than the previous implementation with 30% greater throughput. After:
Before:
After (13 tags):
Before (13 tags):
|
tritium-registry/src/main/java/com/palantir/tritium/metrics/registry/ImmutableMetricName.java
Outdated
Show resolved
Hide resolved
tritium-registry/src/main/java/com/palantir/tritium/metrics/registry/ImmutableMetricName.java
Outdated
Show resolved
Hide resolved
tritium-registry/src/main/java/com/palantir/tritium/metrics/registry/RealMetricName.java
Show resolved
Hide resolved
tritium-registry/src/main/java/com/palantir/tritium/metrics/registry/RealMetricName.java
Show resolved
Hide resolved
@Nullable | ||
private String safeName; | ||
|
||
private TagMap tagMap = TagMap.EMPTY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it is worth keeping mutable Map<String, String> tagMap = new HashMap<>();
in builder and doing the final copy in build()
.
In the common cases of say 4 tags we would have 4 expansions & copies with immutable TagMap, but just one 1 expansion & copy with mutable builder. Probably nets out about the same, but might test out in the morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried this, although I only tested using the original three-tags case. Performance changed from ~142 ns/op down to ~138 ns/op, roughly equivalent to the original ImmutableSortedMap.naturalOrder()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to go as is. I ran a quick test locally on JDK 11.0.13, OpenJDK 64-Bit Server VM, 11.0.13+8-LTS
with 16 core Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
.
Builder using TagMap:
Benchmark Mode Cnt Score Error Units
MetricNameBenchmark.benchmarkName13Tags avgt 5 1096.259 ± 52.889 ns/op
MetricNameBenchmark.benchmarkName13Tags:·gc.alloc.rate avgt 5 691.103 ± 32.862 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.alloc.rate.norm avgt 5 1192.000 ± 0.001 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Eden_Space avgt 5 680.688 ± 650.573 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Eden_Space.norm avgt 5 1171.395 ± 1086.287 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Old_Gen avgt 5 0.009 ± 0.079 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Old_Gen.norm avgt 5 0.016 ± 0.134 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Survivor_Space avgt 5 0.800 ± 2.811 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Survivor_Space.norm avgt 5 1.387 ± 4.878 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.count avgt 5 9.000 counts
MetricNameBenchmark.benchmarkName13Tags:·gc.time avgt 5 19.000 ms
MetricNameBenchmark.benchmarkName3Tags avgt 5 110.109 ± 4.138 ns/op
MetricNameBenchmark.benchmarkName3Tags:·gc.alloc.rate avgt 5 969.659 ± 35.967 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.alloc.rate.norm avgt 5 168.000 ± 0.001 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Eden_Space avgt 5 909.846 ± 802.157 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Eden_Space.norm avgt 5 157.790 ± 142.323 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Old_Gen avgt 5 0.005 ± 0.031 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Old_Gen.norm avgt 5 0.001 ± 0.006 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Survivor_Space avgt 5 0.533 ± 4.590 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Survivor_Space.norm avgt 5 0.094 ± 0.808 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.count avgt 5 12.000 counts
MetricNameBenchmark.benchmarkName3Tags:·gc.time avgt 5 15.000 ms
MetricNameBenchmark.benchmarkName7Tags avgt 5 341.880 ± 3.927 ns/op
MetricNameBenchmark.benchmarkName7Tags:·gc.alloc.rate avgt 5 877.358 ± 10.247 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.alloc.rate.norm avgt 5 472.000 ± 0.001 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Eden_Space avgt 5 909.056 ± 795.569 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Eden_Space.norm avgt 5 488.978 ± 426.224 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Old_Gen avgt 5 0.006 ± 0.032 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Old_Gen.norm avgt 5 0.003 ± 0.017 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Survivor_Space avgt 5 0.800 ± 4.589 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Survivor_Space.norm avgt 5 0.431 ± 2.476 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.count avgt 5 12.000 counts
MetricNameBenchmark.benchmarkName7Tags:·gc.time avgt 5 19.000 ms
Builder using HashMap:
Benchmark Mode Cnt Score Error Units
MetricNameBenchmark.benchmarkName13Tags avgt 5 1102.731 ± 55.355 ns/op
MetricNameBenchmark.benchmarkName13Tags:·gc.alloc.rate avgt 5 562.526 ± 28.167 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.alloc.rate.norm avgt 5 976.000 ± 0.001 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Eden_Space avgt 5 603.632 ± 796.288 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Eden_Space.norm avgt 5 1048.448 ± 1394.980 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Survivor_Space avgt 5 0.267 ± 2.295 MB/sec
MetricNameBenchmark.benchmarkName13Tags:·gc.churn.G1_Survivor_Space.norm avgt 5 0.470 ± 4.048 B/op
MetricNameBenchmark.benchmarkName13Tags:·gc.count avgt 5 8.000 counts
MetricNameBenchmark.benchmarkName13Tags:·gc.time avgt 5 21.000 ms
MetricNameBenchmark.benchmarkName3Tags avgt 5 225.453 ± 8.105 ns/op
MetricNameBenchmark.benchmarkName3Tags:·gc.alloc.rate avgt 5 1105.089 ± 39.260 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.alloc.rate.norm avgt 5 392.000 ± 0.001 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Eden_Space avgt 5 1138.591 ± 7.194 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Eden_Space.norm avgt 5 403.913 ± 15.075 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Old_Gen avgt 5 0.013 ± 0.067 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Old_Gen.norm avgt 5 0.004 ± 0.024 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Survivor_Space avgt 5 0.533 ± 4.589 MB/sec
MetricNameBenchmark.benchmarkName3Tags:·gc.churn.G1_Survivor_Space.norm avgt 5 0.189 ± 1.624 B/op
MetricNameBenchmark.benchmarkName3Tags:·gc.count avgt 5 15.000 counts
MetricNameBenchmark.benchmarkName3Tags:·gc.time avgt 5 15.000 ms
MetricNameBenchmark.benchmarkName7Tags avgt 5 474.985 ± 20.967 ns/op
MetricNameBenchmark.benchmarkName7Tags:·gc.alloc.rate avgt 5 760.064 ± 33.249 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.alloc.rate.norm avgt 5 568.000 ± 0.001 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Eden_Space avgt 5 756.664 ± 10.632 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Eden_Space.norm avgt 5 565.509 ± 22.428 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Old_Gen avgt 5 0.009 ± 0.059 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Old_Gen.norm avgt 5 0.006 ± 0.044 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Survivor_Space avgt 5 0.800 ± 4.589 MB/sec
MetricNameBenchmark.benchmarkName7Tags:·gc.churn.G1_Survivor_Space.norm avgt 5 0.603 ± 3.452 B/op
MetricNameBenchmark.benchmarkName7Tags:·gc.count avgt 5 10.000 counts
MetricNameBenchmark.benchmarkName7Tags:·gc.time avgt 5 19.000 ms
I've rerun benchmarks on an x86_64 linux system (xeon W-2175) with azul zulu 17.0.2 The copy-on-write approach does more allocation in nontrivial cases, but creation throughput is better despite this. After:
After, using a HashMap in the builderAs described in #1373 (comment)
Before:
|
Released 0.42.0 |
This drops the immutables implementation of the MetricName.builder
in favor of a custom-tailored implementation using TagMap.
Tag additions are copy-on-write, which would not scale well
with large maps, however we tend to use small (N < 10) tag
maps.
After this PR
==COMMIT_MSG==
Optimize MetricName builder for small tag maps
==COMMIT_MSG==