Improve performance of MeterAdapter #443

exyi · 2023-10-20T20:41:01Z

(sorry for an empty email message, I accidentally pressed enter in the title)

The MeterAdapter is quite slow compared to using prometheus-net directly, I noticed that especially in a multi threaded scenario - it isn't that hard to hammer one metric in parallel on a many core machine. This PR adds benchmarks to demonstrate the improvement, and improves performance both in the single-threaded and the multi-threaded case.

Main changes are

_options.ResolveHistogramBuckets is not called on each observation. It is not cached together with the rest of metric initialization code.
The locks are removed from LifetimeManager

Other details should be described in the commit messages or comments.

Before

Method	MeasurementCount	ThreadCount	TargetMetricType	Mean	Error	StdDev	Gen0	Completed Work Items	Lock Contentions	Gen1	Allocated
MeasurementPerformance	200000	1	CounterInt	101.2 ms	0.40 ms	0.80 ms	9000.0000	-	-	-	158.69 MB
MeasurementPerformance	200000	1	CounterFloat	102.8 ms	0.69 ms	1.35 ms	3000.0000	-	-	-	158.69 MB
MeasurementPerformance	200000	1	HistogramInt	154.2 ms	1.36 ms	2.71 ms	4000.0000	-	-	-	196.84 MB
MeasurementPerformance	200000	1	HistogramFloat	151.8 ms	1.15 ms	2.27 ms	4000.0000	-	-	-	196.84 MB
MeasurementPerformance	200000	16	CounterInt	1,438.3 ms	39.26 ms	165.81 ms	159000.0000	-	89825.0000	1000.0000	2539.07 MB
MeasurementPerformance	200000	16	CounterFloat	1,474.8 ms	29.78 ms	124.11 ms	53000.0000	-	56599.0000	-	2539.07 MB
MeasurementPerformance	200000	16	HistogramInt	1,769.7 ms	35.19 ms	115.24 ms	66000.0000	-	74652.0000	-	3149.42 MB
MeasurementPerformance	200000	16	HistogramFloat	1,725.7 ms	33.78 ms	112.99 ms	66000.0000	-	62252.0000	-	3149.42 MB

After

Method	MeasurementCount	ThreadCount	TargetMetricType	Mean	Error	StdDev	Completed Work Items	Lock Contentions	Gen0	Allocated
MeasurementPerformance	200000	1	CounterInt	37.89 ms	0.041 ms	0.080 ms	-	-	-	27.47 MB
MeasurementPerformance	200000	1	CounterFloat	38.83 ms	0.267 ms	0.520 ms	-	-	1000.0000	27.47 MB
MeasurementPerformance	200000	1	HistogramInt	48.21 ms	0.178 ms	0.347 ms	-	-	2000.0000	33.57 MB
MeasurementPerformance	200000	1	HistogramFloat	47.69 ms	0.125 ms	0.246 ms	-	-	-	33.57 MB
MeasurementPerformance	200000	16	CounterInt	881.51 ms	29.158 ms	123.458 ms	-	13.0000	27000.0000	439.46 MB
MeasurementPerformance	200000	16	CounterFloat	906.75 ms	27.166 ms	115.024 ms	-	10.0000	9000.0000	439.46 MB
MeasurementPerformance	200000	16	HistogramInt	928.74 ms	30.498 ms	129.131 ms	-	16.0000	33000.0000	537.11 MB
MeasurementPerformance	200000	16	HistogramFloat	851.88 ms	30.896 ms	130.815 ms	-	15.0000	33000.0000	537.11 MB

exyi · 2023-10-20T21:03:47Z

The main remaining bottleneck now is the RWLock wrapping the LifetimeManager, in a profile 88% time is spent entering and exiting the lock. It's not that trivial to remove it, but I have an idea how to make this also lockfree (and also do away with the race which now may occur when deleting and incrementing metric at the same time). Would you be interested in that?

sandersaares · 2023-11-29T12:56:51Z

Thanks! There have also been some other optimization-related changes that cause some code conflicts here but I will try manually merge some of the good ideas from this PR.

One thing that has already been done is reducing the lifetime management from metric instance to metric template level, although I am wondering if it might be feasible to kick it up even one more level to the metric factory to reduce the overhead further.

Single-threaded Counter benchmark goes from 41.2 MB to 27.47 MB allocated. Runtime is only about 4% faster, which might as well be noise

I added a cache (Instrument, tags) -> prometheus metric, so all the label processing is done only once per label combination. More importantly, the user-specified options.ResolveHistogramBuckets results are now cached. The improvement to single-threaded performance is about 2x with about 1/6 of allocated memory: Counter | 116.1 ms + 158.69 MB -> 52.38 ms + 27.47 MB Histogram | 179.3 ms + 196.84 MB -> 59.40 ms + 33.57 MB Multithreaded performance is unfortunatelly unafected (maybe bit worse), since the limiting factor are the locks in ManagedLifetimeMetricHandle class

We only need to be able to determine if a lease is active or there was any activity in the last timer period. That is now done 2 atomic counters - a count of leased leases and a count returned leases. The timer still has a potential race if a new lease occurs right after the timer checks it didn't occur and attempts to delete it. This was there even with the locks and this doesn't make any worse (AFAIK) Performance of single-threaded benchmark is unaffected, the 16 thread test sees 1.5-3x improvement (it is noisy :/) Counter (4 runs) | 1339ms...1971ms -> 881ms...939ms Histogram (4 runs) | 2021ms...2158ms -> 771ms...928ms The improvement is this small only because there is also a RWLock which not that trivial to remove (and no, .NET RWLock is not lockfree with purely read workload)

exyi · 2023-11-29T14:34:41Z

Great! Moving the the lifetime timer higher up is definitely good news. At least for me it doesn't really matter if it's global or per-template metric. I don't have thousands of meters, just many different label values.

I rebased it on the current master branch, there was only a minor conflict. Do you have these changes in a private branch?

sandersaares · 2023-11-30T07:52:33Z

Yeah, I am currently working in the "optimizing" branch

sandersaares · 2023-11-30T21:41:17Z

| Method         | MeasurementCount | Mean     | Error    | StdDev   | Allocated |
|--------------- |----------------- |---------:|---------:|---------:|----------:|
| CounterInt     | 100000           | 31.15 ms | 0.336 ms | 0.314 ms |     106 B |
| CounterFloat   | 100000           | 31.03 ms | 0.283 ms | 0.265 ms |     106 B |
| HistogramInt   | 100000           | 32.06 ms | 0.169 ms | 0.158 ms |     106 B |
| HistogramFloat | 100000           | 31.52 ms | 0.296 ms | 0.277 ms |      53 B |

In optimizing branch the story is starting to look decent now. Thanks for peeking under the covers here, you motivated some good improvement both with your PR and with other ideas that came from this!

exyi changed the title ~~Perf meteradapter~~ Improve performance of MeterAdapter Oct 20, 2023

exyi force-pushed the perf-meteradapter branch from 2acff56 to ba895bf Compare October 20, 2023 21:08

exyi added 2 commits November 29, 2023 15:04

Add MeterAdapter benchmarks and unit tests

ed3639a

Avoid allocations in LabelSequence by implementing IEquatable

aecb5c8

Single-threaded Counter benchmark goes from 41.2 MB to 27.47 MB allocated. Runtime is only about 4% faster, which might as well be noise

exyi force-pushed the perf-meteradapter branch from ba895bf to a30ca40 Compare November 29, 2023 14:04

exyi added 2 commits November 29, 2023 15:07

exyi force-pushed the perf-meteradapter branch from a30ca40 to 4e86977 Compare November 29, 2023 14:21

sandersaares added a commit that referenced this pull request Nov 30, 2023

Incorporate MetricAdapterBenchmarks and MetricAdapterTests from #443

5cfa219

sandersaares added a commit that referenced this pull request Nov 30, 2023

MeterAdapter improvements based on ideas from #443

9f9bbf2

sandersaares added a commit that referenced this pull request Dec 4, 2023

Incorporate MetricAdapterBenchmarks and MetricAdapterTests from #443

1cd2a44

sandersaares added a commit that referenced this pull request Dec 4, 2023

MeterAdapter improvements based on ideas from #443

a6a737a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of MeterAdapter #443

Improve performance of MeterAdapter #443

exyi commented Oct 20, 2023 •

edited

exyi commented Oct 20, 2023

sandersaares commented Nov 29, 2023

exyi commented Nov 29, 2023

sandersaares commented Nov 30, 2023

sandersaares commented Nov 30, 2023 •

edited

Improve performance of MeterAdapter #443

Are you sure you want to change the base?

Improve performance of MeterAdapter #443

Conversation

exyi commented Oct 20, 2023 • edited

Before

After

exyi commented Oct 20, 2023

sandersaares commented Nov 29, 2023

exyi commented Nov 29, 2023

sandersaares commented Nov 30, 2023

sandersaares commented Nov 30, 2023 • edited

exyi commented Oct 20, 2023 •

edited

sandersaares commented Nov 30, 2023 •

edited