Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of MeterAdapter #443

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

exyi
Copy link

@exyi exyi commented Oct 20, 2023

(sorry for an empty email message, I accidentally pressed enter in the title)

The MeterAdapter is quite slow compared to using prometheus-net directly, I noticed that especially in a multi threaded scenario - it isn't that hard to hammer one metric in parallel on a many core machine. This PR adds benchmarks to demonstrate the improvement, and improves performance both in the single-threaded and the multi-threaded case.

Main changes are

  • _options.ResolveHistogramBuckets is not called on each observation. It is not cached together with the rest of metric initialization code.
  • The locks are removed from LifetimeManager

Other details should be described in the commit messages or comments.

Before

Method MeasurementCount ThreadCount TargetMetricType Mean Error StdDev Gen0 Completed Work Items Lock Contentions Gen1 Allocated
MeasurementPerformance 200000 1 CounterInt 101.2 ms 0.40 ms 0.80 ms 9000.0000 - - - 158.69 MB
MeasurementPerformance 200000 1 CounterFloat 102.8 ms 0.69 ms 1.35 ms 3000.0000 - - - 158.69 MB
MeasurementPerformance 200000 1 HistogramInt 154.2 ms 1.36 ms 2.71 ms 4000.0000 - - - 196.84 MB
MeasurementPerformance 200000 1 HistogramFloat 151.8 ms 1.15 ms 2.27 ms 4000.0000 - - - 196.84 MB
MeasurementPerformance 200000 16 CounterInt 1,438.3 ms 39.26 ms 165.81 ms 159000.0000 - 89825.0000 1000.0000 2539.07 MB
MeasurementPerformance 200000 16 CounterFloat 1,474.8 ms 29.78 ms 124.11 ms 53000.0000 - 56599.0000 - 2539.07 MB
MeasurementPerformance 200000 16 HistogramInt 1,769.7 ms 35.19 ms 115.24 ms 66000.0000 - 74652.0000 - 3149.42 MB
MeasurementPerformance 200000 16 HistogramFloat 1,725.7 ms 33.78 ms 112.99 ms 66000.0000 - 62252.0000 - 3149.42 MB

After

Method MeasurementCount ThreadCount TargetMetricType Mean Error StdDev Completed Work Items Lock Contentions Gen0 Allocated
MeasurementPerformance 200000 1 CounterInt 37.89 ms 0.041 ms 0.080 ms - - - 27.47 MB
MeasurementPerformance 200000 1 CounterFloat 38.83 ms 0.267 ms 0.520 ms - - 1000.0000 27.47 MB
MeasurementPerformance 200000 1 HistogramInt 48.21 ms 0.178 ms 0.347 ms - - 2000.0000 33.57 MB
MeasurementPerformance 200000 1 HistogramFloat 47.69 ms 0.125 ms 0.246 ms - - - 33.57 MB
MeasurementPerformance 200000 16 CounterInt 881.51 ms 29.158 ms 123.458 ms - 13.0000 27000.0000 439.46 MB
MeasurementPerformance 200000 16 CounterFloat 906.75 ms 27.166 ms 115.024 ms - 10.0000 9000.0000 439.46 MB
MeasurementPerformance 200000 16 HistogramInt 928.74 ms 30.498 ms 129.131 ms - 16.0000 33000.0000 537.11 MB
MeasurementPerformance 200000 16 HistogramFloat 851.88 ms 30.896 ms 130.815 ms - 15.0000 33000.0000 537.11 MB

@exyi exyi changed the title Perf meteradapter Improve performance of MeterAdapter Oct 20, 2023
@exyi
Copy link
Author

exyi commented Oct 20, 2023

The main remaining bottleneck now is the RWLock wrapping the LifetimeManager, in a profile 88% time is spent entering and exiting the lock. It's not that trivial to remove it, but I have an idea how to make this also lockfree (and also do away with the race which now may occur when deleting and incrementing metric at the same time). Would you be interested in that?

@sandersaares
Copy link
Member

Thanks! There have also been some other optimization-related changes that cause some code conflicts here but I will try manually merge some of the good ideas from this PR.

One thing that has already been done is reducing the lifetime management from metric instance to metric template level, although I am wondering if it might be feasible to kick it up even one more level to the metric factory to reduce the overhead further.

Single-threaded Counter benchmark goes from 41.2 MB to 27.47 MB allocated.
Runtime is only about 4% faster, which might as well be noise
I added a cache (Instrument, tags) -> prometheus metric, so all
the label processing is done only once per label combination.
More importantly, the user-specified options.ResolveHistogramBuckets
results are now cached.

The improvement to single-threaded performance is about 2x
with about 1/6 of allocated memory:
Counter | 116.1 ms + 158.69 MB -> 52.38 ms + 27.47 MB
Histogram | 179.3 ms + 196.84 MB ->  59.40 ms + 33.57 MB

Multithreaded performance is unfortunatelly unafected (maybe bit worse),
since the limiting factor are the locks in ManagedLifetimeMetricHandle class
We only need to be able to determine if a lease is active
or there was any activity in the last timer period.
That is now done 2 atomic counters - a count of leased
leases and a count returned leases.

The timer still has a potential race if a new lease occurs
right after the timer checks it didn't occur and attempts to
delete it. This was there even with the locks and this doesn't
make any worse (AFAIK)

Performance of single-threaded benchmark is unaffected,
the 16 thread test sees 1.5-3x improvement (it is noisy :/)

Counter (4 runs) | 1339ms...1971ms -> 881ms...939ms
Histogram (4 runs) | 2021ms...2158ms -> 771ms...928ms

The improvement is this small only because there is also
a RWLock which not that trivial to remove (and no,
.NET RWLock is not lockfree with purely read workload)
@exyi
Copy link
Author

exyi commented Nov 29, 2023

Great! Moving the the lifetime timer higher up is definitely good news. At least for me it doesn't really matter if it's global or per-template metric. I don't have thousands of meters, just many different label values.

I rebased it on the current master branch, there was only a minor conflict. Do you have these changes in a private branch?

@sandersaares
Copy link
Member

Yeah, I am currently working in the "optimizing" branch

@sandersaares
Copy link
Member

sandersaares commented Nov 30, 2023

| Method         | MeasurementCount | Mean     | Error    | StdDev   | Allocated |
|--------------- |----------------- |---------:|---------:|---------:|----------:|
| CounterInt     | 100000           | 31.15 ms | 0.336 ms | 0.314 ms |     106 B |
| CounterFloat   | 100000           | 31.03 ms | 0.283 ms | 0.265 ms |     106 B |
| HistogramInt   | 100000           | 32.06 ms | 0.169 ms | 0.158 ms |     106 B |
| HistogramFloat | 100000           | 31.52 ms | 0.296 ms | 0.277 ms |      53 B |

In optimizing branch the story is starting to look decent now. Thanks for peeking under the covers here, you motivated some good improvement both with your PR and with other ideas that came from this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants