Additional performance improvements for cluster mode#680
Conversation
|
Dropping hashCode rework commit - I have a better option in mind. |
|
Looks good. Can you rebase please? Sorry for the hassle, you're right that the changelog requirement makes this a pain. |
The receiver of this data expects metrics[] and we are naming the data as if we sent the registries instead of their metrics.
As $645 notes, the overhead of calculating real labels is substantial in cluster mode. single letter labels with single letter values don't reflect real world behavior.
Also rename the 'current' tests to align the output in a way one can actually read while the screen is scrolling. This is rescued from an attempt to support running cluster benchmarks. After substantial effort the code ran but the results were inaccurate.
I would have preferred end-to-end tests for the cluster code but that would require some substantial PRs to benchmark-regressions.
This function has been identified as being part of a hot path by prometheus#628
I'd be happy to replace it with a maintained lib 😀 |
|
Turns out I was wrong about benchmark.js allowing async setup and teardown. That’s all benchmark-regression and why benchmark.js’s lifecycle events don’t fire in time to say conditionally start background processes. I did file a PR for this. But benchmark.js using eval() to run benchmark code made that PR ten times harder than it needed to be. |
Most of these improvements are on the worker side of the connection, but when combined with #679 should provide some
relief for people using this mode.
This PR includes:
reworking hashObject() to optimize for the common case, which is metrics with labelsNo breaking changes.
This should partially address the concerns brought up in #628, and partly fix #676
Code coverage improvement in the benchmarks:
~~There's a lot of jitter in these tests, but the common theme is that the benchmarks with no labels show a tiny regression versus the label paths showing 10+% improvements.
Given that the no-label path was previously almost 500x faster, that's a reasonable tradeoff. If prom-client is not fast enough for anyone, it will be someone using a lot of labels.~~
Edit: most of the hashObject changes were dropped from this PR at the last moment, once I realized a much better option was possible. It still might be worth resurrecting these for the metrics aggregation logic, which still uses this code.
#687 introduces a new class which consolidates all of this logic and replaces the key lookup with a much more compact value.
⚠ counter ➭ inc is 0.06717% acceptably slower.
✓ counter ➭ inc with labels is 25.43% faster.
✓ gauge ➭ inc is 2.639% faster.
✓ gauge ➭ inc with labels is 21.03% faster.
✓ histogram ➭ observe#1 with 64 is 16.87% faster.
✓ histogram ➭ observe#2 with 8 is 10.97% faster.
⚠ histogram ➭ observe#2 with 4 and 2 with 2 is 1.345% acceptably slower.
⚠ histogram ➭ observe#2 with 2 and 2 with 4 is 3.383% acceptably slower.
⚠ histogram ➭ observe#6 with 2 is 3.862% acceptably slower.
✓ registry ➭ getMetricsAsJSON#1 with 64 is 4.661% faster.
✓ registry ➭ getMetricsAsJSON#2 with 8 is 5.511% faster.
✓ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 5.085% faster.
✓ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 5.989% faster.
✓ registry ➭ getMetricsAsJSON#6 with 2 is 4.661% faster.
✓ registry ➭ metrics#1 with 64 is 7.167% faster.
✓ registry ➭ metrics#1 with 64 and openMetrics is 4.562% faster.
✓ registry ➭ metrics#2 with 8 is 3.493% faster.
✓ registry ➭ metrics#2 with 8 and openMetrics is 4.773% faster.
✓ registry ➭ metrics#2 with 4 and 2 with 2 is 5.141% faster.
✓ registry ➭ metrics#2 with 4 and 2 with 2 and openMetrics is 2.863% faster.
✓ registry ➭ metrics#2 with 2 and 2 with 4 is 4.577% faster.
✓ registry ➭ metrics#2 with 2 and 2 with 4 and openMetrics is 3.778% faster.
✓ registry ➭ metrics#6 with 2 is 2.311% faster.
✓ registry ➭ metrics#6 with 2 and openMetrics is 1.362% faster.
✓ summary ➭ observe#1 with 64 is 18.31% faster.
✓ summary ➭ observe#2 with 8 is 5.565% faster.
✓ summary ➭ observe#2 with 4 and 2 with 2 is 0.6756% faster.
⚠ summary ➭ observe#2 with 2 and 2 with 4 is 5.010% acceptably slower.
⚠ summary ➭ observe#6 with 2 is 4.731% acceptably slower.
✓ cluster ➭ aggregate() is 1.476% faster.
Author notes:
Originally this PR was meant to include full e2e benchmarks of the cluster code, however the two benchmark frameworks being used are not particularly cooperative in this respect. I spent more time trying to get those to work than I did on the rest of the work combined. In the end I had code that ran properly, however the variability in the run time was so great (due to coordinating multiple runs of the benchmark code across multiple processes), that none of the output was actionable. That work inspired the addition of the debug module, and I opted to retain that code so that some value could be derived from that goose chase.
I believe this could be fixed with a single PR to benchmark-regressions to have it delegate setup() to benchmark.js, allowing the coordination overhead to occur between the measurement windows. However that project has been dormant for a very, very long time and I do not have any expectation of a PR being landed if offered.