Reduce postingsCacheKey() memory allocations #4861

pracucci · 2023-04-27T15:29:15Z

What this PR does

We've observed high memory allocation rates in postingsCacheKey() and FetchMultiPostings() in store-gateways when dealing with thousands or even millions of postings. In particular, under a production workload, we observed 14% of CPU taken by postingsCacheKey() and 20% of CPU taken by the mapassign done by FetchMultiPostings().

In this PR I'm proposing an optimization to reduce the memory allocations both in postingsCacheKey() and FetchMultiPostings().

Benchmarks:

                                       │ before.txt  │             after.txt              │
                                       │   sec/op    │   sec/op     vs base               │
RemoteIndexCache_FetchMultiPostings-12   17.18m ± 6%   12.09m ± 6%  -29.62% (p=0.002 n=6)
StringCacheKeys/postings-12              1.745µ ± 2%   1.216µ ± 2%  -30.32% (p=0.002 n=6)
StringCacheKeys/series_ref-12            115.6n ± 2%   114.4n ± 1%   -1.08% (p=0.041 n=6)
StringCacheKeys/expanded_postings-12     345.3n ± 5%   342.4n ± 2%        ~ (p=0.223 n=6)
geomean                                  5.881µ        4.898µ       -16.72%

                                       │  before.txt   │               after.txt               │
                                       │     B/op      │     B/op      vs base                 │
RemoteIndexCache_FetchMultiPostings-12   12.054Mi ± 0%   6.289Mi ± 0%  -47.83% (p=0.002 n=6)
StringCacheKeys/postings-12               2480.00 ± 0%     80.00 ± 0%  -96.77% (p=0.002 n=6)
StringCacheKeys/series_ref-12               64.00 ± 0%     64.00 ± 0%        ~ (p=1.000 n=6) ¹
StringCacheKeys/expanded_postings-12        192.0 ± 0%     192.0 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                   4.326Ki        1.558Ki       -63.98%
¹ all samples are equal

                                       │ before.txt  │              after.txt               │
                                       │  allocs/op  │  allocs/op   vs base                 │
RemoteIndexCache_FetchMultiPostings-12   61.11k ± 0%   20.27k ± 0%  -66.83% (p=0.002 n=6)
StringCacheKeys/postings-12               5.000 ± 0%    1.000 ± 0%  -80.00% (p=0.002 n=6)
StringCacheKeys/series_ref-12             1.000 ± 0%    1.000 ± 0%        ~ (p=1.000 n=6) ¹
StringCacheKeys/expanded_postings-12      3.000 ± 0%    3.000 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                   30.94         15.70       -49.25%
¹ all samples are equal

Note: the postings cached in the RemoteIndexCache_FetchMultiPostings benchmark are unrealistically small, so don't read too much into the memory reduction percentage there. The reduction in the number of allocations is still valid.

Which issue(s) this PR fixes or relates to

N/A

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

56quarters

LGTM modulo the test failure (I wasn't able to reproduce locally)

56quarters · 2023-04-27T17:48:32Z

pkg/storegateway/indexcache/remote.go

+	key := make([]byte, expectedLen)
+	offset := 0
+
+	offset += copy(key[offset:], prefix)


nit: this seems very similar to what bytes.Buffer does

That's true, but I haven't found an easy way to do base64.RawURLEncoding.Encode() without an extra allocation. Also bytes.Buffer.String() does an extra allocation too.
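For readers following this thread, here is a minimal sketch of the preallocated byte slice technique being discussed, next to a naive string-concatenation version for comparison. It is not the code from this PR: the names `keyPrefix`, `keySep`, `naiveCacheKey` and `preallocatedCacheKey`, the key layout, and passing the label hash in as a parameter are all assumptions made for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unsafe"
)

// Hypothetical key layout: "<prefix><tenant>:<block>:<base64url(labelHash)>".
const (
	keyPrefix = "P:"
	keySep    = ":"
)

// naiveCacheKey builds the key with string concatenation. EncodeToString
// allocates a string for the encoded hash, and the concatenation allocates
// the final string.
func naiveCacheKey(tenant, block string, labelHash []byte) string {
	return keyPrefix + tenant + keySep + block + keySep +
		base64.RawURLEncoding.EncodeToString(labelHash)
}

// preallocatedCacheKey computes the final length up front, allocates the
// buffer once, and writes every part (including the base64 encoding)
// directly into it.
func preallocatedCacheKey(tenant, block string, labelHash []byte) string {
	encodedLen := base64.RawURLEncoding.EncodedLen(len(labelHash))
	expectedLen := len(keyPrefix) + len(tenant) + len(keySep) +
		len(block) + len(keySep) + encodedLen

	key := make([]byte, expectedLen)
	offset := 0
	offset += copy(key[offset:], keyPrefix)
	offset += copy(key[offset:], tenant)
	offset += copy(key[offset:], keySep)
	offset += copy(key[offset:], block)
	offset += copy(key[offset:], keySep)

	// Encode writes into the destination slice instead of returning a new one.
	base64.RawURLEncoding.Encode(key[offset:], labelHash)

	// Reinterpret the slice as a string without copying. This is only safe
	// because key is never modified after this point.
	return *(*string)(unsafe.Pointer(&key))
}

func main() {
	hash := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(naiveCacheKey("tenant-1", "block-1", hash))
	fmt.Println(preallocatedCacheKey("tenant-1", "block-1", hash))
}
```

Both functions return the same key. The difference is that the second one writes into a single buffer sized in advance, which is the property that bytes.Buffer does not give for free here, since its String() method copies the contents.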


pracucci · 2023-04-28T04:35:23Z

> LGTM modulo the test failure (I wasn't able to reproduce locally)

I'm not sure what's happening, but I noticed that it reports 1 extra allocation when tests run with -race (as in CI). No extra allocation is reported without -race. To move forward, I've changed the test assertion to tolerate 1 extra allocation and added a comment to explain it.
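As an illustration of what such a tolerant assertion might look like, here is a minimal sketch built on testing.AllocsPerRun. It is not the assertion from this PR: `allocsWithinBudget`, `raceTolerance`, `sink` and the stand-in allocation are hypothetical names and choices made for the example.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the allocated slice reachable so the compiler cannot
// optimize the allocation away.
var sink []byte

// allocsWithinBudget reports whether the measured allocations per run stay
// within the expected count plus one tolerated extra allocation (the extra
// allocation described above was only seen when running with -race).
func allocsWithinBudget(measured float64, expected int) bool {
	const raceTolerance = 1
	return measured <= float64(expected+raceTolerance)
}

func main() {
	size := 64

	// AllocsPerRun returns the average number of allocations per call.
	measured := testing.AllocsPerRun(100, func() {
		sink = make([]byte, size) // stand-in for the function under test
	})

	fmt.Printf("allocs/op=%v withinBudget=%v\n", measured, allocsWithinBudget(measured, 1))
}
```

In a real test file the same check would sit inside a `TestXxx(t *testing.T)` function and fail the test via `t` instead of printing. An upper bound keeps the assertion stable with and without -race, at the cost of not catching a regression of exactly one allocation.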

pracucci force-pushed the optimize-postingsCacheKey branch from f4420da to 6bacfa2 Compare April 27, 2023 15:38

pracucci marked this pull request as ready for review April 27, 2023 15:50

pracucci requested a review from a team as a code owner April 27, 2023 15:50

pracucci requested review from 56quarters and dimitarvdimitrov April 27, 2023 15:50

56quarters approved these changes Apr 27, 2023


pracucci added 4 commits April 28, 2023 06:21

Optimized postingsCacheKey()

a4c7825

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Added CHANGELOG entry

c51223b

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Remove useless assignment

324f86f

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Tolerate 1 extra allocation when running with -race

cff339e

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci force-pushed the optimize-postingsCacheKey branch from bee3a9e to cff339e Compare April 28, 2023 04:34

pracucci merged commit b9df1ab into main Apr 28, 2023

pracucci deleted the optimize-postingsCacheKey branch April 28, 2023 06:46

colega mentioned this pull request Apr 28, 2023

Don't use zeropool for postings cache key building #4869

Merged

3 tasks
