Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve perf in discovery jobs metrics to data lookup #1092

Merged

Conversation

thepalbi
Copy link
Contributor

@thepalbi thepalbi commented Aug 3, 2023

This PR addessed some performance issues we've been seeing with running YACE discovery with large enough jobs. We noticed there was a lot of cpu % usage around findGetMetricDataByID. This lead us to believe this is a hot path, and as well, performs lots of side-allocations.

This PR re-implements that logic with a map to lookup the corresponding metric data, rather than looping over. Also, a benchmark is added for further development.

The benchstat below shows an improvement of around ~90% in both cpu and memory.

benchstat

goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
                                      │    main.p     │                pr.p                │
                                      │    sec/op     │   sec/op     vs base               │
MapResultsToMetricDatas/small_case-10    271.13µ ± 3%   10.35µ ± 1%  -96.18% (p=0.002 n=6)
MapResultsToMetricDatas/big_case-10     28916.8µ ± 4%   927.1µ ± 6%  -96.79% (p=0.002 n=6)
geomean                                   2.800m        97.96µ       -96.50%

                                      │     main.p      │                pr.p                 │
                                      │      B/op       │     B/op      vs base               │
MapResultsToMetricDatas/small_case-10     364367.5 ± 0%     444.0 ± 0%  -99.88% (p=0.002 n=6)
MapResultsToMetricDatas/big_case-10     36133.26Ki ± 0%   15.32Ki ± 2%  -99.96% (p=0.002 n=6)
geomean                                    3.502Mi        2.577Ki       -99.93%

                                      │    main.p     │               pr.p                │
                                      │   allocs/op   │ allocs/op   vs base               │
MapResultsToMetricDatas/small_case-10   7921.000 ± 0%   2.000 ± 0%  -99.97% (p=0.002 n=6)
MapResultsToMetricDatas/big_case-10     809687.5 ± 0%   356.0 ± 3%  -99.96% (p=0.002 n=6)
geomean                                   80.08k        26.68       -99.97

main

goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
BenchmarkMapResultsToMetricDatas/small_case-10         	    4726	    269036 ns/op	  364367 B/op	    7921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    4749	    276557 ns/op	  364366 B/op	    7921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    4291	    280005 ns/op	  364369 B/op	    7921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    4486	    273217 ns/op	  364368 B/op	    7921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    4452	    268510 ns/op	  364368 B/op	    7921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    4748	    266801 ns/op	  364367 B/op	    7921 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	      40	  28897098 ns/op	36997460 B/op	  809555 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	      39	  29039734 ns/op	37003479 B/op	  809821 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	      39	  29934966 ns/op	37003487 B/op	  809820 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	      39	  28842040 ns/op	37003451 B/op	  809820 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	      42	  27786820 ns/op	36986299 B/op	  809062 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	      42	  28936529 ns/op	36986283 B/op	  809062 allocs/op
PASS
ok  	github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job	22.906s

pr

goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
BenchmarkMapResultsToMetricDatas/small_case-10         	  113601	     10365 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  117096	     10364 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  115774	     10357 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  117190	     10299 ns/op	     445 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  116364	     10344 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  116276	     10241 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	    1191	    955207 ns/op	   15689 B/op	     356 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	    1153	    925903 ns/op	   15933 B/op	     367 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	    1154	    923614 ns/op	   15936 B/op	     367 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	    1190	    928290 ns/op	   15686 B/op	     356 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	    1198	    925005 ns/op	   15635 B/op	     354 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	    1231	    979040 ns/op	   15436 B/op	     345 allocs/op
PASS
ok  	github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job	50.249s

@thepalbi thepalbi changed the title Pablo/improve perf Improve perf in discovery jobs metrics to data lookup Aug 3, 2023
Makefile Show resolved Hide resolved
Copy link
Contributor

@kgeckhart kgeckhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions. A nit on naming I would consider renaming dataIndex -> metricIDToData

pkg/job/discovery.go Outdated Show resolved Hide resolved
pkg/job/discovery.go Outdated Show resolved Hide resolved
pkg/job/discovery_test.go Outdated Show resolved Hide resolved
@thepalbi
Copy link
Contributor Author

thepalbi commented Aug 3, 2023

Pasting results for the new bench parameters. Seeing the same improvement for all cases.

// benchstat

goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
                                       │    main.p     │                pr.p                │
                                       │    sec/op     │   sec/op     vs base               │
MapResultsToMetricDatas/small_case-10    120.624µ ± 1%   5.273µ ± 0%  -95.63% (p=0.002 n=6)
MapResultsToMetricDatas/medium_case-10   51827.5µ ± 1%   874.9µ ± 0%  -98.31% (p=0.002 n=6)
MapResultsToMetricDatas/big_case-10      189.002m ± 1%   2.339m ± 1%  -98.76% (p=0.002 n=6)
geomean                                    10.57m        221.0µ       -97.91%

                                       │     main.p      │                pr.p                 │
                                       │      B/op       │     B/op      vs base               │
MapResultsToMetricDatas/small_case-10      180350.0 ± 0%     444.0 ± 0%  -99.75% (p=0.002 n=6)
MapResultsToMetricDatas/medium_case-10   17827.83Ki ± 0%   59.45Ki ± 0%  -99.67% (p=0.002 n=6)
MapResultsToMetricDatas/big_case-10       36674.7Ki ± 0%   129.9Ki ± 1%  -99.65% (p=0.002 n=6)
geomean                                     4.751Mi        14.96Ki       -99.69%

                                       │    main.p     │               pr.p                │
                                       │   allocs/op   │ allocs/op   vs base               │
MapResultsToMetricDatas/small_case-10    3921.000 ± 0%   2.000 ± 0%  -99.95% (p=0.002 n=6)
MapResultsToMetricDatas/medium_case-10   402139.0 ± 0%   160.5 ± 2%  -99.96% (p=0.002 n=6)
MapResultsToMetricDatas/big_case-10      851628.5 ± 0%   832.5 ± 7%  -99.90% (p=0.002 n=6)
geomean                                    110.3k        64.41       -99.94%


// main
goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
BenchmarkMapResultsToMetricDatas/small_case-10         	    9882	    121283 ns/op	  180350 B/op	    3921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    9498	    121337 ns/op	  180350 B/op	    3921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    9619	    120365 ns/op	  180349 B/op	    3921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    9457	    120595 ns/op	  180350 B/op	    3921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    9843	    120653 ns/op	  180350 B/op	    3921 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	    9766	    120513 ns/op	  180350 B/op	    3921 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	      20	  51746290 ns/op	18255670 B/op	  402139 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	      20	  52272069 ns/op	18255706 B/op	  402140 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	      22	  51884055 ns/op	18235357 B/op	  401218 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	      20	  51971454 ns/op	18255691 B/op	  402139 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	      20	  51740535 ns/op	18255695 B/op	  402139 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	      20	  51770856 ns/op	18255715 B/op	  402140 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	       6	 190434111 ns/op	37554901 B/op	  851628 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	       6	 189570618 ns/op	37554928 B/op	  851629 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	       6	 188955708 ns/op	37554866 B/op	  851628 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	       6	 188437854 ns/op	37554850 B/op	  851628 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	       6	 187887264 ns/op	37554928 B/op	  851629 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	       6	 189048431 ns/op	37554976 B/op	  851629 allocs/op
PASS
ok  	github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job	26.972s

// pr
goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
BenchmarkMapResultsToMetricDatas/small_case-10         	  221618	      5257 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  225144	      5275 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  217838	      5271 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  225252	      5288 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  226856	      5261 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/small_case-10         	  227900	      5282 ns/op	     444 B/op	       2 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	    1264	    872667 ns/op	   60931 B/op	     163 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	    1282	    876165 ns/op	   60881 B/op	     161 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	    1287	    875340 ns/op	   60867 B/op	     160 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	    1298	    873536 ns/op	   60840 B/op	     159 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	    1303	    874362 ns/op	   60824 B/op	     158 allocs/op
BenchmarkMapResultsToMetricDatas/medium_case-10        	    1279	    876690 ns/op	   60889 B/op	     161 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	     457	   2339118 ns/op	  134309 B/op	     890 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	     494	   2339057 ns/op	  132844 B/op	     824 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	     482	   2341229 ns/op	  133295 B/op	     844 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	     488	   2352369 ns/op	  133068 B/op	     834 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	     490	   2333186 ns/op	  132991 B/op	     831 allocs/op
BenchmarkMapResultsToMetricDatas/big_case-10           	     492	   2335667 ns/op	  132917 B/op	     827 allocs/op
PASS
ok  	github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job	85.981s

pkg/job/discovery.go Outdated Show resolved Hide resolved
pkg/job/discovery.go Outdated Show resolved Hide resolved
@thepalbi
Copy link
Contributor Author

thepalbi commented Aug 4, 2023

Posting results after rebasing, since there were optimizations in the last commit, over the same code.

//bench

goos: darwin
goarch: arm64
pkg: github.com/nerdswords/yet-another-cloudwatch-exporter/pkg/job
                                       │    main.p     │                pr.p                │
                                       │    sec/op     │   sec/op     vs base               │
MapResultsToMetricDatas/big_case-10      161.075m ± 1%   2.205m ± 5%  -98.63% (p=0.002 n=6)
MapResultsToMetricDatas/small_case-10    122.046µ ± 5%   5.413µ ± 2%  -95.57% (p=0.002 n=6)
MapResultsToMetricDatas/medium_case-10   48466.8µ ± 3%   805.2µ ± 2%  -98.34% (p=0.002 n=6)
geomean                                    9.840m        212.6µ       -97.84%

                                       │     main.p      │                pr.p                 │
                                       │      B/op       │     B/op      vs base               │
MapResultsToMetricDatas/big_case-10       36619.6Ki ± 0%   145.7Ki ± 1%  -99.60% (p=0.002 n=6)
MapResultsToMetricDatas/small_case-10      180430.0 ± 0%     524.0 ± 0%  -99.71% (p=0.002 n=6)
MapResultsToMetricDatas/medium_case-10   17837.60Ki ± 0%   67.17Ki ± 0%  -99.62% (p=0.002 n=6)
geomean                                     4.750Mi        17.11Ki       -99.65%

                                       │    main.p     │                pr.p                │
                                       │   allocs/op   │  allocs/op   vs base               │
MapResultsToMetricDatas/big_case-10      786.824k ± 0%   2.013k ± 0%  -99.74% (p=0.002 n=6)
MapResultsToMetricDatas/small_case-10     3931.00 ± 0%    12.00 ± 0%  -99.69% (p=0.002 n=6)
MapResultsToMetricDatas/medium_case-10   393.127k ± 0%   1.004k ± 0%  -99.74% (p=0.002 n=6)
geomean                                    106.7k         289.5       -99.73%

Co-authored-by: Cristian Greco <cristian@regolo.cc>
// skip elements that have been already mapped but still exist in metricIDToData
if metricData.MetricID == nil {
continue
}
// Copy to avoid a loop closure bug
dataPoint := metricDataResult.Datapoint
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cristiangreco this is the pointer reference bug right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

@cristiangreco cristiangreco merged commit 5f73bad into prometheus-community:master Aug 4, 2023
3 checks passed
@thepalbi thepalbi deleted the pablo/improve-perf branch August 4, 2023 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants