Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing the memory footprint of active series custom trackers #2568

Merged
merged 16 commits into from Oct 25, 2022

Conversation

gubjanos
Copy link
Contributor

@gubjanos gubjanos commented Jul 28, 2022

What this PR does

This PR is replacing the bool slice tracking matches for each matcher configured for each series reference with a preAllocDynamicSlice tracking only matching matchers' indexes. The usage pattern of the custom trackers implies that in most usecases only a fraction of matchers are matching, thus this design saves on the stored values. The new data structure preAllocDynamicSlice is optimized for this.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@gubjanos gubjanos requested review from colega and bboreham July 28, 2022 08:03
@pracucci
Copy link
Collaborator

Can we see benchmarks? I would expect accessing the map being way slower than accessing the slice.

@gubjanos gubjanos requested a review from pracucci July 28, 2022 08:22
@pracucci
Copy link
Collaborator

I've run the existing benchmarks this way:

git checkout main
go test -run '^$' -bench '.*' -benchmem -count=3 ./pkg/ingester/activeseries > before.txt

hub pr checkout 2568
go test -run '^$' -bench '.*' -benchmem -count=3 ./pkg/ingester/activeseries > after.txt

benchstat before.txt after.txt

No significant difference:

name                                                   old time/op    new time/op    delta
ActiveSeriesTest_single_series/50-12                     51.1ns ± 2%    55.1ns ± 3%   ~     (p=0.100 n=3+3)
ActiveSeriesTest_single_series/100-12                    51.0ns ± 1%    55.6ns ± 0%   ~     (p=0.100 n=3+3)
ActiveSeriesTest_single_series/500-12                    55.4ns ± 5%    58.1ns ± 4%   ~     (p=0.400 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=100000-12      93.5ms ± 1%    93.5ms ± 2%   ~     (p=0.700 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=1000000-12      1.30s ± 4%     1.30s ± 5%   ~     (p=1.000 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=100000-12      431ms ± 2%     428ms ± 1%   ~     (p=1.000 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=1000000-12     5.35s ± 0%     5.36s ± 0%   ~     (p=0.700 n=3+3)
ActiveSeries_Active_once-12                               242µs ± 0%     277µs ± 3%   ~     (p=0.100 n=3+3)
ActiveSeries_Active_twice-12                              255µs ± 1%     287µs ± 0%   ~     (p=0.100 n=3+3)

name                                                   old alloc/op   new alloc/op   delta
ActiveSeriesTest_single_series/50-12                      0.00B          0.00B        ~     (all equal)
ActiveSeriesTest_single_series/100-12                     0.00B          0.00B        ~     (all equal)
ActiveSeriesTest_single_series/500-12                     0.00B          0.00B        ~     (all equal)
ActiveSeries_UpdateSeries/rounds=1_series=100000-12      19.6MB ± 0%    18.0MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=1000000-12      238MB ± 0%     222MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=100000-12     19.6MB ± 0%    18.0MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=1000000-12     238MB ± 0%     222MB ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_Active_once-12                                104B ± 3%      115B ± 1%   ~     (p=0.100 n=3+3)
ActiveSeries_Active_twice-12                               108B ± 1%      120B ± 1%   ~     (p=0.100 n=3+3)

name                                                   old allocs/op  new allocs/op  delta
ActiveSeriesTest_single_series/50-12                       0.00           0.00        ~     (all equal)
ActiveSeriesTest_single_series/100-12                      0.00           0.00        ~     (all equal)
ActiveSeriesTest_single_series/500-12                      0.00           0.00        ~     (all equal)
ActiveSeries_UpdateSeries/rounds=1_series=100000-12        208k ± 0%      208k ± 0%   ~     (p=0.200 n=3+3)
ActiveSeries_UpdateSeries/rounds=1_series=1000000-12      2.02M ± 0%     2.02M ± 0%   ~     (p=0.100 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=100000-12       208k ± 0%      208k ± 0%   ~     (p=0.200 n=3+3)
ActiveSeries_UpdateSeries/rounds=10_series=1000000-12     2.02M ± 0%     2.02M ± 0%   ~     (p=0.400 n=3+3)
ActiveSeries_Active_once-12                                5.00 ± 0%      5.00 ± 0%   ~     (all equal)
ActiveSeries_Active_twice-12                               6.00 ± 0%      6.00 ± 0%   ~     (all equal)

Then I checked the benchmarks (I should have done it earlier) and I've realized no benchmark set the matchers, so obviously there's no difference. Please @gubjanos add benchmark(s) using matchers too.

@gubjanos gubjanos changed the title Changing active series matches from []bool to map[int]bool for better… Changing active series matches from []bool to []int of indexes for better… Jul 28, 2022
Comment on lines 44 to 47
matches[i] = sm.Matches(series)
if sm.Matches(series) {
matches = append(matches, i)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we see benchmarks of this? :)

This will have to allocate a new slice every 2^n matchers, which is probably more expensive than just allocating a slice of bools since it will push unnecessary pressure on the GC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're worried about memory consumption here, then IMO we should keep the previous approach and change []bool to a bitarray backed by []uint64

@gubjanos
Copy link
Contributor Author

gubjanos commented Aug 1, 2022

I have created a benchmark for testing Matchers.
The benchmark checks for different number of trackers and different number of labels on the serie. Please note that this setup is biased to my assumption, as only one of the trackers are matching. If we would increse the number of matching trackers, the memory footprint would increase. In this way I think it is better than implementing a bitset, but still only if the assumption of sparse matching holds.
Here are my results:

name                                                   old time/op    new time/op    delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          150ns ± 3%     154ns ± 2%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         148ns ± 2%     152ns ± 1%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        152ns ± 1%     155ns ± 2%   ~     (p=0.400 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16        1.09µs ± 2%    1.05µs ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16       1.12µs ± 1%    1.05µs ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16      1.14µs ± 1%    1.05µs ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16       9.90µs ± 2%   10.14µs ± 3%   ~     (p=0.400 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16      9.82µs ± 1%   10.16µs ± 2%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16     9.74µs ± 1%   10.48µs ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16       164µs ± 1%     169µs ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16      165µs ± 1%     168µs ± 2%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16     163µs ± 1%     180µs ± 1%   ~     (p=0.100 n=3+3)

name                                                   old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          16.0B ± 0%      8.0B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         16.0B ± 0%      8.0B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        16.0B ± 0%      8.0B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16          112B ± 0%        8B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16         112B ± 0%        8B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16        112B ± 0%        8B ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16       1.02kB ± 0%    0.01kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16      1.02kB ± 0%    0.01kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16     1.02kB ± 0%    0.01kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16      10.2kB ± 0%     0.0kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16     10.2kB ± 0%     0.0kB ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16    10.2kB ± 0%     0.0kB ± 0%   ~     (p=0.100 n=3+3)

name                                                   old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16           1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16          1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16         1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16          1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16         1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16        1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16         1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16        1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16        1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16      1.00 ± 0%      1.00 ± 0%   ~     (all equal)

@gubjanos gubjanos requested a review from colega August 1, 2022 14:27
@colega
Copy link
Contributor

colega commented Aug 2, 2022

This result looks good to me, however, if we want to optimize here and we know that most of the series match just once, we can do better.

I made a quick PoC replacing your []int by a fixedSlice type that tries to use a fixed array before allocating a slice, like:

const fixedSliceSize = 4

type fixedSlice struct {
	arr  [fixedSliceSize]int
	arrl int
	rest []int
}

Ran your benchmarks, and it's faster:

$ benchstat BenchmarkMatchesSeries.slice.txt BenchmarkMatchesSeries.fixed.txt
name                                                   old time/op  new time/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-12        216ns ± 4%   174ns ± 4%  -19.09%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-12       215ns ± 1%   176ns ± 2%  -17.93%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-12      208ns ± 5%   167ns ± 1%  -19.63%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-12      1.38µs ± 4%  1.30µs ± 2%   -5.93%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-12     1.35µs ± 2%  1.29µs ± 2%   -4.60%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-12    1.34µs ± 1%  1.30µs ± 2%   -3.34%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-12     16.6µs ± 1%  15.2µs ± 2%   -8.19%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-12    16.4µs ± 1%  14.9µs ± 1%   -9.20%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-12   17.2µs ± 3%  16.4µs ± 1%   -4.46%  (p=0.016 n=5+4)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-12     238µs ±14%   232µs ± 4%     ~     (p=1.000 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-12    236µs ± 1%   213µs ± 6%   -9.62%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-12   241µs ± 3%   211µs ± 1%  -12.19%  (p=0.008 n=5+5)

And the best thing, it does zero allocations for your test case (each series matching just one matcher):

name                                                   old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-12          8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-12         8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-12        8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-12         8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-12        8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-12       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-12        8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-12       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-12      8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-12       8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-12      8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-12     8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)

name                                                   old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-12           1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-12          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-12         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-12          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-12         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-12        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-12         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-12        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-12       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-12        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-12       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-12      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)

IMO, reducing one allocation per active series is a pretty big deal (that's 1.5M less stuff in the heap for GC to iterate on, in the worst case, because not all series match something), but I would like to hear @bboreham's opinion on this.

OTOH, I'm saying it's a quick PoC because I'd fine-tune a few things if we used this for prod:

  • I'd not use ints, but uint16 for matching tracker position, and limit the max number of trackers to 64k (let's be honest, if you have 64k trackers, you're not using the feature properly, and something will explode way before).
  • I'd reduce the size of fixedSliceSize to 3, and change arrl type to byte, so that three values plus len would pack into 8 bytes. This would mean that it's only an 8-byte overhead over the existing []int which is 24bytes plus a heap allocation.
  • I'd rename fixedSlice into something less terrible.

@pracucci
Copy link
Collaborator

@gubjanos I think the last comment from @colega was very valid. Are you still interested in following up with the suggestions from Oleg?

@pracucci
Copy link
Collaborator

pracucci commented Oct 7, 2022

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

@gubjanos
Copy link
Contributor Author

@pracucci @colega Sorry for the delay. Yes, I am indeed interested in pushing this forward, but could we please first agree that this is something we would like to ship? @bboreham any objection?

@colega
Copy link
Contributor

colega commented Oct 18, 2022

I think it's interesting to ship this improvements, then we can talk about how trackers can be further improved.

@gubjanos gubjanos requested a review from a team as a code owner October 18, 2022 15:46
pkg/ingester/activeseries/matchers.go Outdated Show resolved Hide resolved
pkg/ingester/activeseries/custom_trackers_config.go Outdated Show resolved Hide resolved
pkg/ingester/activeseries/matchers.go Outdated Show resolved Hide resolved
@colega
Copy link
Contributor

colega commented Oct 20, 2022

Sorry for the delay, changes lgtm with a few nitpicks and a missing changelog entry.

Can you also please post the latest benchmark comparison? Thank you.

@gubjanos
Copy link
Contributor Author

Here is the comparison between your PoC version and the one with the improvements.

name                                                   old time/op    new time/op    delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          128ns ± 0%     126ns ± 0%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         129ns ± 2%     126ns ± 1%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        133ns ± 4%     129ns ± 2%   ~     (p=0.400 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16         949ns ± 2%     919ns ± 2%   ~     (p=0.200 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16        930ns ± 1%     915ns ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16       921ns ± 1%     907ns ± 1%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16       8.55µs ± 1%    8.63µs ± 0%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16      8.82µs ± 4%    8.99µs ± 0%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16     8.68µs ± 2%    8.73µs ± 1%   ~     (p=1.000 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16       147µs ± 2%     148µs ± 2%   ~     (p=0.700 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16      145µs ± 2%     157µs ± 2%   ~     (p=0.100 n=3+3)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16     143µs ± 5%     154µs ± 3%   ~     (p=0.100 n=3+3)

name                                                   old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16          0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16         0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16        0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16         0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16        0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16       0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16        0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16       0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16      0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16       0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16      0.00B          0.00B        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16     0.00B          0.00B        ~     (all equal)

name                                                   old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_LabelCount:_1-16           0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_10-16          0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10,_LabelCount:_100-16         0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_1-16          0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_10-16         0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_100,_LabelCount:_100-16        0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_1-16         0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_10-16        0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_1000,_LabelCount:_100-16       0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_1-16        0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_10-16       0.00           0.00        ~     (all equal)
MatchesSeries/TrackerCount:_10000,_LabelCount:_100-16      0.00           0.00        ~     (all equal)

@gubjanos gubjanos changed the title Changing active series matches from []bool to []int of indexes for better… Reducing the memory footprint of active series custom trackers Oct 24, 2022
@colega
Copy link
Contributor

colega commented Oct 24, 2022

I've run a benchmark comparison between the current main version and the one that's being introduced here

Click here to see the benchmark I've ported to `main` to compare
func BenchmarkMatchesSeries(b *testing.B) {
	trackerCounts := []int{10, 100, 1000}
	asms := make([]*Matchers, len(trackerCounts))

	for i, matcherCount := range trackerCounts {
		configMap := map[string]string{}
		for j := 0; j < matcherCount; j++ {
			configMap[strconv.Itoa(j)] = fmt.Sprintf(`{this_will_match_%d="true"}`, j)
		}
		config, err := NewCustomTrackersConfig(configMap)
		require.NoError(b, err)
		asms[i] = NewMatchers(config)
	}

	makeLabels := func(total, matching int) labels.Labels {
		if total < matching {
			b.Fatal("wrong test setup, total < matching")
		}
		lbs := make(labels.Labels, 0, total)
		for i := 0; i < matching; i++ {
			lbs = append(lbs, labels.Label{Name: fmt.Sprintf(`this_will_match_%d`, i), Value: "true"})
		}
		for i := matching; i < total; i++ {
			lbs = append(lbs, labels.Label{Name: fmt.Sprintf("something_else_%d", i), Value: "true"})
		}
		return lbs
	}

	for i, trackerCount := range trackerCounts {
		for _, bc := range []struct {
			total, matching int
		}{
			{1, 0},
			{1, 1},
			{10, 1},
			{10, 2},
			{10, 5},
			{25, 1},
			{25, 2},
			{25, 5},
			{100, 1},
			{100, 2},
			{100, 5},
		} {
			series := makeLabels(bc.total, bc.matching)
			b.Run(fmt.Sprintf("TrackerCount: %d, Labels: %d, Matching: %d", trackerCount, bc.total, bc.matching), func(b *testing.B) {
				for x := 0; x < b.N; x++ {
					got := asms[i].Matches(series)
					matched := 0
					for _, v := range got {
						if v {
							matched++
						}
					}
					require.Equal(b, bc.matching, matched)
				}
			})
		}
	}
}

(I'll post a suggestion with some changes to your benchmark too in a separate comment, so it will match the one I ran here)

The results look great and kind of as the expected ones:

name                                                        old time/op    new time/op    delta
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_0         435ns ± 1%     428ns ± 1%    -1.69%  (p=0.016 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_1         461ns ± 1%     454ns ± 1%    -1.57%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_1        512ns ± 1%     509ns ± 1%      ~     (p=0.421 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_2        532ns ± 2%     531ns ± 2%      ~     (p=0.841 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_5        574ns ± 2%     625ns ± 3%    +9.02%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_1        777ns ± 1%     787ns ± 1%    +1.39%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_2        765ns ± 1%     789ns ± 3%    +3.15%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_5        723ns ± 2%     777ns ± 0%    +7.43%  (p=0.016 n=5+4)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_1      2.17µs ± 0%    2.29µs ± 2%    +5.35%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_2      2.00µs ± 0%    2.11µs ± 0%    +5.08%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_5      1.50µs ± 3%    1.60µs ± 1%    +6.44%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_0        885ns ± 1%     811ns ± 1%    -8.35%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_1        915ns ± 2%     846ns ± 0%    -7.53%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_1      1.43µs ± 1%    1.35µs ± 0%    -5.90%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_2      1.46µs ± 0%    1.38µs ± 0%    -5.33%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_5      1.52µs ± 0%    1.51µs ± 0%    -0.79%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_1      2.37µs ± 0%    2.39µs ± 2%      ~     (p=0.635 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_2      2.44µs ± 0%    2.53µs ± 3%    +3.73%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_5      2.51µs ± 0%    2.58µs ± 1%    +3.02%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_1     6.86µs ± 1%    6.94µs ± 0%      ~     (p=0.190 n=5+4)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_2     6.70µs ± 0%    6.77µs ± 0%    +1.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_5     6.23µs ± 1%    6.35µs ± 0%    +1.84%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_0      5.16µs ± 0%    4.62µs ± 1%   -10.48%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_1      5.31µs ± 0%    4.83µs ± 1%    -9.08%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_1     9.43µs ± 1%    8.83µs ± 0%    -6.27%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_2     9.72µs ± 0%    8.95µs ± 1%    -7.92%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_5     9.79µs ± 0%    9.22µs ± 3%    -5.79%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_1     16.4µs ± 1%    20.5µs ±12%   +25.12%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_2     17.2µs ± 3%    18.4µs ± 5%    +7.07%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_5     18.8µs ± 0%    20.0µs ± 4%    +6.34%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_1    52.7µs ± 2%    52.7µs ± 0%      ~     (p=0.421 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_2    53.0µs ± 1%    52.5µs ± 0%    -0.89%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_5    52.1µs ± 0%    52.2µs ± 0%      ~     (p=0.548 n=5+5)

name                                                        old alloc/op   new alloc/op   delta
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_0         16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_1         16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_1        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_2        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_5        16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_1        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_2        16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_5        16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_1       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_2       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_5       16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_0         112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_1         112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_1        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_2        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_5        112B ± 0%        8B ± 0%   -92.86%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_1        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_2        112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_5        112B ± 0%        8B ± 0%   -92.86%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_1       112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_2       112B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_5       112B ± 0%        8B ± 0%   -92.86%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_0      1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_1      1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_1     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_2     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_5     1.02kB ± 0%    0.01kB ± 0%   -99.22%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_1     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_2     1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_5     1.02kB ± 0%    0.01kB ± 0%   -99.22%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_1    1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_2    1.02kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_5    1.02kB ± 0%    0.01kB ± 0%   -99.22%  (p=0.008 n=5+5)

name                                                        old allocs/op  new allocs/op  delta
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_0          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_1,_Matching:_1          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_1         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_2         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_10,_Matching:_5         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_1         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_2         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_25,_Matching:_5         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_10,_Labels:_100,_Matching:_5        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_0         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_1,_Matching:_1         1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_10,_Matching:_5        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_2        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_25,_Matching:_5        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_1       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_2       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_100,_Labels:_100,_Matching:_5       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_0        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_1,_Matching:_1        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_1       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_2       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_10,_Matching:_5       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_1       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_2       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_25,_Matching:_5       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_1      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_2      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
MatchesSeries/TrackerCount:_1000,_Labels:_100,_Matching:_5      1.00 ± 0%      1.00 ± 0%      ~     (all equal)

There's a similar performance (+/- 5% everywhere) with zero allocations except for 5 matching trackers.

CHANGELOG.md Outdated Show resolved Hide resolved
gubjanos and others added 3 commits October 24, 2022 20:16
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Copy link
Contributor

@colega colega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @gubjanos, great work!

@colega colega enabled auto-merge (squash) October 25, 2022 07:23
@colega colega merged commit 7d30b5f into main Oct 25, 2022
@colega colega deleted the jg/active_series_memory_perf branch October 25, 2022 07:33
masonmei pushed a commit to udmire/mimir that referenced this pull request Nov 4, 2022
…na#2568)

* Changing active series matches from []bool to map[int]bool for better memory performance

* Changing map[int]bool to int slice

* Changing slice allocation based on Oleg's suggestion

* Benchmark

* PoC: fixed size backed array slice for matchers

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Adding Oleg's improvements as well

* Limiting the maximum number of trackers as 64k

* Making linter happy

* Making preAllocDynamicSlice and also activeseries.mathches unexported

* Commenting and cleaning up preAllocDynamicSlice

* Failing the benchmark if the config fails to initizalize

* Updating changelog

* Update CHANGELOG.md

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update pkg/ingester/activeseries/matchers_test.go

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
masonmei pushed a commit to udmire/mimir that referenced this pull request Nov 4, 2022
…na#2568)

* Changing active series matches from []bool to map[int]bool for better memory performance

* Changing map[int]bool to int slice

* Changing slice allocation based on Oleg's suggestion

* Benchmark

* PoC: fixed size backed array slice for matchers

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Adding Oleg's improvements as well

* Limiting the maximum number of trackers as 64k

* Making linter happy

* Making preAllocDynamicSlice and also activeseries.mathches unexported

* Commenting and cleaning up preAllocDynamicSlice

* Failing the benchmark if the config fails to initizalize

* Updating changelog

* Update CHANGELOG.md

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update pkg/ingester/activeseries/matchers_test.go

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants