Additional performance improvements for cluster mode by jdmarshall · Pull Request #680 · prometheus/client_js

jdmarshall · 2025-07-05T15:23:30Z

Most of these improvements are on the worker side of the connection, but when combined with #679 should provide some
relief for people using this mode.

This PR includes:

substantial additional code coverage in the benchmarks via addition of more scenarios
a workaround for benchmark.js not allowing async test bodies (despite supporting async setup and teardown?)
removal of truthy conditionals in hot paths
removal of double lookups during map insertions
~~reworking hashObject() to optimize for the common case, which is metrics with labels~~

No breaking changes.

This should partially address the concerns brought up in #628, and partly fix #676

Code coverage improvement in the benchmarks:

File                             | % Stmts | % Branch | % Funcs | % Lines |
---------------------------------|---------|----------|---------|---------|
All files                        |   46.54 |    22.47 |   42.65 |    47.3 | 
All files                        |    51.3 |    27.01 |   48.78 |   52.04 |

~~There's a lot of jitter in these tests, but the common theme is that the benchmarks with no labels show a tiny regression versus the label paths showing 10+% improvements.

Given that the no-label path was previously almost 500x faster, that's a reasonable tradeoff. If prom-client is not fast enough for anyone, it will be someone using a lot of labels.~~

Edit: most of the hashObject changes were dropped from this PR at the last moment, once I realized a much better option was possible. It still might be worth resurrecting these for the metrics aggregation logic, which still uses this code.

#687 introduces a new class which consolidates all of this logic and replaces the key lookup with a much more compact value.

⚠ counter ➭ inc is 0.06717% acceptably slower.
✓ counter ➭ inc with labels is 25.43% faster.
✓ gauge ➭ inc is 2.639% faster.
✓ gauge ➭ inc with labels is 21.03% faster.
✓ histogram ➭ observe#1 with 64 is 16.87% faster.
✓ histogram ➭ observe#2 with 8 is 10.97% faster.
⚠ histogram ➭ observe#2 with 4 and 2 with 2 is 1.345% acceptably slower.
⚠ histogram ➭ observe#2 with 2 and 2 with 4 is 3.383% acceptably slower.
⚠ histogram ➭ observe#6 with 2 is 3.862% acceptably slower.
✓ registry ➭ getMetricsAsJSON#1 with 64 is 4.661% faster.
✓ registry ➭ getMetricsAsJSON#2 with 8 is 5.511% faster.
✓ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 5.085% faster.
✓ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 5.989% faster.
✓ registry ➭ getMetricsAsJSON#6 with 2 is 4.661% faster.
✓ registry ➭ metrics#1 with 64 is 7.167% faster.
✓ registry ➭ metrics#1 with 64 and openMetrics is 4.562% faster.
✓ registry ➭ metrics#2 with 8 is 3.493% faster.
✓ registry ➭ metrics#2 with 8 and openMetrics is 4.773% faster.
✓ registry ➭ metrics#2 with 4 and 2 with 2 is 5.141% faster.
✓ registry ➭ metrics#2 with 4 and 2 with 2 and openMetrics is 2.863% faster.
✓ registry ➭ metrics#2 with 2 and 2 with 4 is 4.577% faster.
✓ registry ➭ metrics#2 with 2 and 2 with 4 and openMetrics is 3.778% faster.
✓ registry ➭ metrics#6 with 2 is 2.311% faster.
✓ registry ➭ metrics#6 with 2 and openMetrics is 1.362% faster.
✓ summary ➭ observe#1 with 64 is 18.31% faster.
✓ summary ➭ observe#2 with 8 is 5.565% faster.
✓ summary ➭ observe#2 with 4 and 2 with 2 is 0.6756% faster.
⚠ summary ➭ observe#2 with 2 and 2 with 4 is 5.010% acceptably slower.
⚠ summary ➭ observe#6 with 2 is 4.731% acceptably slower.
✓ cluster ➭ aggregate() is 1.476% faster.

Author notes:

Originally this PR was meant to include full e2e benchmarks of the cluster code, however the two benchmark frameworks being used are not particularly cooperative in this respect. I spent more time trying to get those to work than I did on the rest of the work combined. In the end I had code that ran properly, however the variability in the run time was so great (due to coordinating multiple runs of the benchmark code across multiple processes), that none of the output was actionable. That work inspired the addition of the debug module, and I opted to retain that code so that some value could be derived from that goose chase.

I believe this could be fixed with a single PR to benchmark-regressions to have it delegate setup() to benchmark.js, allowing the coordination overhead to occur between the measurement windows. However that project has been dormant for a very, very long time and I do not have any expectation of a PR being landed if offered.

jdmarshall · 2025-07-07T14:26:00Z

Dropping hashCode rework commit - I have a better option in mind.

zbjornson · 2025-07-08T03:49:19Z

Looks good.

Can you rebase please? Sorry for the hassle, you're right that the changelog requirement makes this a pain.

The receiver of this data expects metrics[] and we are naming the data as if we sent the registries instead of their metrics.

As $645 notes, the overhead of calculating real labels is substantial in cluster mode. single letter labels with single letter values don't reflect real world behavior.

Also rename the 'current' tests to align the output in a way one can actually read while the screen is scrolling. This is rescued from an attempt to support running cluster benchmarks. After substantial effort the code ran but the results were inaccurate.

I would have preferred end-to-end tests for the cluster code but that would require some substantial PRs to benchmark-regressions.

This function has been identified as being part of a hot path by prometheus#628

SimenB · 2025-07-08T06:14:55Z

Author notes:

Originally this PR was meant to include full e2e benchmarks of the cluster code, however the two benchmark frameworks being used are not particularly cooperative in this respect. I spent more time trying to get those to work than I did on the rest of the work combined. In the end I had code that ran properly, however the variability in the run time was so great (due to coordinating multiple runs of the benchmark code across multiple processes), that none of the output was actionable. That work inspired the addition of the debug module, and I opted to retain that code so that some value could be derived from that goose chase.

I believe this could be fixed with a single PR to benchmark-regressions to have it delegate setup() to benchmark.js, allowing the coordination overhead to occur between the measurement windows. However that project has been dormant for a very, very long time and I do not have any expectation of a PR being landed if offered.

I'd be happy to replace it with a maintained lib 😀

zbjornson

👍

jdmarshall · 2025-07-08T15:42:37Z

Turns out I was wrong about benchmark.js allowing async setup and teardown. That’s all benchmark-regression and why benchmark.js’s lifecycle events don’t fire in time to say conditionally start background processes.

I did file a PR for this. But benchmark.js using eval() to run benchmark code made that PR ten times harder than it needed to be.

jdmarshall mentioned this pull request Jul 5, 2025

Poor code coverage in the benchmarks #676

Closed

jdmarshall force-pushed the perf/cluster branch from 0169190 to 025d977 Compare July 6, 2025 18:33

This was referenced Jul 7, 2025

Improvements to defaultLabels in metrics, and matching benchmarks #683

Closed

Support for asynchronous functions, and skipping individual runs nowells/benchmark-regression#3

Open

Prom-client high memory usage #611

Open

jdmarshall force-pushed the perf/cluster branch from 025d977 to 4a023c3 Compare July 7, 2025 14:25

jdmarshall mentioned this pull request Jul 8, 2025

Performance: New storage engine for stats, with improved key management #687

Merged

jdmarshall added 10 commits July 7, 2025 21:11

Remove imperialism.

e5fe848

Fixed misleading variable name.

af196f9

The receiver of this data expects metrics[] and we are naming the data as if we sent the registries instead of their metrics.

Beef up getLabelCombinations to allow for more complex labels.

b07383e

As $645 notes, the overhead of calculating real labels is substantial in cluster mode. single letter labels with single letter values don't reflect real world behavior.

Adding tests for AggregatorRegistry

f2a9199

I would have preferred end-to-end tests for the cluster code but that would require some substantial PRs to benchmark-regressions.

Make the counter tests a little beefier.

b3caf75

Add more work to cluster benchmarks.

5209edc

Make sure prometheus and openMetrics paths have benchmarks.

34a9c02

if Map.has(), Map.get() is an extra lookup.

2ae6a64

This function has been identified as being part of a hot path by prometheus#628

Remove some additional truthy checks.

ca6330e

jdmarshall force-pushed the perf/cluster branch from 4a023c3 to ca6330e Compare July 8, 2025 04:12

zbjornson approved these changes Jul 8, 2025

View reviewed changes

zbjornson merged commit e913c11 into prometheus:master Jul 8, 2025
12 checks passed

jdmarshall mentioned this pull request Jul 13, 2025

Add integration with rpm package debug #649

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Additional performance improvements for cluster mode#680

Additional performance improvements for cluster mode#680
zbjornson merged 10 commits into
prometheus:masterfrom
jdmarshall:perf/cluster

jdmarshall commented Jul 5, 2025 •

edited

Loading

Uh oh!

jdmarshall commented Jul 7, 2025

Uh oh!

zbjornson commented Jul 8, 2025

Uh oh!

SimenB commented Jul 8, 2025

Uh oh!

zbjornson left a comment

Uh oh!

Uh oh!

jdmarshall commented Jul 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

jdmarshall commented Jul 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jdmarshall commented Jul 7, 2025

Uh oh!

zbjornson commented Jul 8, 2025

Uh oh!

SimenB commented Jul 8, 2025

Uh oh!

zbjornson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jdmarshall commented Jul 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jdmarshall commented Jul 5, 2025 •

edited

Loading