store-gateway: more efficient series caching #3751
Conversation
 	res.series = append(res.series, seriesChunkRefs{
-		lset: lset,
+		lset: mimirpb.FromLabelAdaptersToLabels(lset.Labels),
This is non-blocking, since #3555 is not merged yet, but it's something I would like to think about.
@pracucci raised a point about the size of the cache entries with gob vs with protobuf. I used the existing benchmarks to also print the size of the items in the cache. Since we snappy-encode both, the length is pretty much the same. There are some small benefits with protobuf. I also ran without snappy, but the gains in replica memory will be dominated by the increased memory for memcached.
cache entry size - gob vs protobuf
gob with snappy vs protobuf without snappy
protobuf with snappy vs protobuf without snappy
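For illustration, here is a minimal sketch of how such a size comparison can be measured. This is not the benchmark code from the PR; the `label` stand-in type and the helper name are made up, and the protobuf side would use the real cache entry's `Marshal()`.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"

	"github.com/golang/snappy"
)

// label is a stand-in for the real label type held by the cache entry.
type label struct{ Name, Value string }

// printEntrySizes gob-encodes the label sets and prints the raw and the
// snappy-compressed sizes; the protobuf sizes would come from calling
// Marshal() on the real cache entry type and compressing the result the
// same way.
func printEntrySizes(sets [][]label) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(sets); err != nil {
		return err
	}
	raw := buf.Bytes()
	fmt.Printf("gob: %d bytes, gob+snappy: %d bytes\n", len(raw), len(snappy.Encode(nil, raw)))
	return nil
}
```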
Great work! I left a few minor comments, but I can't see anything wrong. Like we did for #3739, I would love to see a quick comparison load-testing the store-gateway with the scenario "many large requests but without chunks".
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/0
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/1
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/2
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/3
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000-10                                        2     723917000 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000-10                        2     893729125 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000-10                       2     838661438 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000-10        1    1976057791 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/0
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/1
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/2
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/3
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000-10                                    15      71419614 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                    12      95010087 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                   12      96728188 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10    16      64947492 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/0
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/1
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/2
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/3
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                 3210        362288 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10 10000        100400 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000-10                                  4920        246681 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                  3175        374901 ns/op
PASS
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10       423    2777498 ns/op   3450443 B/op   60247 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher-10                           3871     307280 ns/op    373021 B/op    6257 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10     3886     300735 ns/op    325350 B/op    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10  3943     303476 ns/op    325352 B/op    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/with_sharding-10                                       79370      14776 ns/op      9799 B/op     260 allocs/op
BenchmarkFetchCachedSeriesForPostings/without_sharding-10                                    80967      14740 ns/op      9783 B/op     259 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10        3955     299970 ns/op    325348 B/op    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                        435    2709429 ns/op   3370466 B/op   54257 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_10_matchers-10                         3841     308489 ns/op    373550 B/op    6257 allocs/op
PASS
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkStoreCachedSeriesForPostings/with_sharding-10                                     300554       3755 ns/op      2680 B/op      39 allocs/op
BenchmarkStoreCachedSeriesForPostings/without_sharding-10                                  318122       3722 ns/op      2680 B/op      39 allocs/op
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                       583    2040430 ns/op   3718159 B/op      64 allocs/op
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10      577    2021253 ns/op   3750927 B/op      64 allocs/op
BenchmarkStoreCachedSeriesForPostings/1000_series_with_1_matcher-10                          5826     196531 ns/op    269312 B/op      54 allocs/op
BenchmarkStoreCachedSeriesForPostings/1000_series_with_10_matchers-10                        5984     196326 ns/op    269801 B/op      54 allocs/op
PASS
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
I ran this PR in a dev cluster where requests were served almost exclusively from the series cache. In the graphs below zone-b is running with this PR, while zone-a is running on the commit titled "Add benchmark for storeCachedSeriesForPostings" to keep as much as possible common between the two. Zone-c is running the default implementation.
The increased RSS and working set are likely an artefact of the mmap-less store-gateway work (both zone-a and zone-b have it, compared to zone-c). I will now address the remaining comments on this PR.
 func storeCachedSeriesForPostings(ctx context.Context, indexCache indexcache.IndexCache, userID string, blockID ulid.ULID, matchers []*labels.Matcher, shard *sharding.ShardSelector, postingsKey indexcache.PostingsKey, set seriesChunkRefsSet, logger log.Logger) {
-	entry := seriesForPostingsCacheEntry{
-		LabelSets:   make([]labels.Labels, set.len()),
-		MatchersKey: indexcache.CanonicalLabelMatchersKey(matchers),
-		Shard:       maybeNilShard(shard),
+	nonNilShard := maybeNilShard(shard)
+	matchersKey := indexcache.CanonicalLabelMatchersKey(matchers)
+	data, err := encodeCachedSeriesForPostings(set, matchersKey, nonNilShard)
 	if err != nil {
 		logSeriesForPostingsCacheEvent(ctx, logger, userID, blockID, matchers, shard, postingsKey, "msg", "can't encode series for caching", "err", err)
 		return
 	}
 	indexCache.StoreSeriesForPostings(ctx, userID, blockID, matchersKey, shard, postingsKey, data)
 }
Not directly related to this PR, but I guess this method was created in the streaming store-gateway implementation PRs?
Why does it need all three of matchersKey, shard and postingsKey? The first two reference the same data set as the third one, don't they?
I'm going to move this comment to the original PR: #3687. Never mind, let's keep the conversation here.
I think this is outside the scope of this PR, but it's in scope for what I was going to do next.
Your comment made me realize that caching matchers may even introduce false-negative cache misses: you can reword a matcher to select the same series. {a="1"} and {a!="2", a!="3"} may still select the same postings, but we'll cache them separately.
I think the case is different for the shard, because we include it in the cache key verbatim rather than hashing it - that limits collisions to sets that have the same shard. If we also hashed the shard key, then cache keys for all shards could collide.
What I'm unsure about is the strength of the hash if we don't hash the matchers - if we hash a matcher and a set of postings, is it the same strength as hashing only a set of postings? Given that the matchers + shard have a 1:1 relationship with a set of postings, my answer would be no. We'll be hashing a smaller thing - only the set of postings - so the set of inputs that map onto 52 bits of hash will be smaller, which should give each input value a slightly smaller chance of collisions. So removing the matchers should even make the hash stronger?
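To make the trade-off concrete, here is a hypothetical sketch of a cache key built from the hashed postings plus the verbatim shard, with the matchers left out entirely. The key layout, function name and hash choice are illustrative only, not Mimir's actual key format.

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/cespare/xxhash/v2"
)

// seriesForPostingsKey hashes only the postings list; the shard is appended
// verbatim, so keys for different shards can never collide even if two
// postings lists hash to the same value.
func seriesForPostingsKey(userID, blockID string, shardIndex, shardCount uint64, postings []uint64) string {
	h := xxhash.New()
	buf := make([]byte, 8)
	for _, p := range postings {
		binary.LittleEndian.PutUint64(buf, p)
		_, _ = h.Write(buf)
	}
	return fmt.Sprintf("S:%s:%s:%d_of_%d:%x", userID, blockID, shardIndex, shardCount, h.Sum64())
}

func main() {
	fmt.Println(seriesForPostingsKey("tenant-1", "block-1", 3, 16, []uint64{10, 42, 97}))
}
```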
The method name says StoreSeriesForPostings, so you should just store series for the postingsKey. You don't care about matchers, shards or whatever else led you to look up those postings. Since the cache key will be a hash of the postings, you should also store the postings list itself in the cached value, to verify that you retrieved the expected cached item.
you should store the postings list itself in the cached value too, to verify that you brought the expected cached item when retrieving
In this case we'll store the postings twice - once for the expanded postings and once for the series. I wanted to avoid that. WDYT?
An alternative is to store the number of postings.
Okay, I see your concern now. However, when encoded with delta encoding, most of the postings are less than a couple of bytes each, so storing them alongside the series shouldn't be a concern (because series are potentially hundreds of bytes).
(OTOH, the number of postings is just another hashing function, one of the bad ones 😄)
I know why we need the shard - the postings we have are not necessarily postings that belong only to this shard - see this comment:
mimir/pkg/storegateway/series_refs.go
Lines 725 to 726 in 73e834e
// Calculate the cache key before we filter out anything from the postings,
// so that the key doesn't depend on the series hash cache or any other filtering we do on the postings list.
This means that the postings for different shards are actually the same, but the series that we cache are filtered by the shard. So we definitely need the shard in the cache key.
But I agree we can remove the matchers.
On verifying the postings for collisions: are you aware that we currently store two checksums of the postings - blake32 and sha1?
assert.Equal(t, metric, preallocMetric.Metric)
assert.Equal(t, len(metric.Labels), cap(preallocMetric.Labels))
Nit: you can use testing.AllocsPerRun to verify the number of allocations (which should be 10 + 1, I guess).
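For reference, a hedged sketch of what such a check could look like with testing.AllocsPerRun. The PreallocatingMetric type name, the encodeTestMetric helper and the 10 + 1 expectation are taken from this discussion and are assumptions, not the actual test code in the PR.

```go
func TestPreallocatingMetricAllocations(t *testing.T) {
	// encodeTestMetric is a hypothetical helper returning a marshalled Metric with 10 labels.
	encoded := encodeTestMetric(t)

	allocs := testing.AllocsPerRun(1000, func() {
		var m PreallocatingMetric // hypothetical preallocating wrapper discussed in this thread
		if err := m.Unmarshal(encoded); err != nil {
			panic(err)
		}
	})

	// The guess above is 10 + 1 allocations per run; log it rather than assert a hard number.
	t.Logf("allocations per Unmarshal: %.1f", allocs)
}
```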
I tried that and it didn't work; that's why I decided not to include the test.
The problem was that Metric.Unmarshal was doing the same number of allocations - 1. So it seemed like PreallocatingMetric had no effect in terms of allocations. I didn't look further into why.
-	var entry seriesForPostingsCacheEntry
-	if err := decodeSnappyGob(data, &entry); err != nil {
+	data, err := snappy.Decode(nil, data)
I think it's worth pooling the buffer provided to snappy.Decode here, wdyt?
Yeah, I can agree - it's accounting for 10% of the allocations in the store-gateway.
I'm reluctant to do it since I'm not sure about the effectiveness of pooling here and whether we won't end up with pooled slices much larger than what we actually need - I assume some items will be mere hundreds of bytes and others will be megabytes. Maybe we can use pool.BucketBytes. Do you have suggestions for the sizes of the buckets?
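A rough sketch of one pooling approach follows: a plain sync.Pool rather than a bucketed pool, using sync and github.com/golang/snappy. The names are illustrative and this is not the code from the PR.

```go
var decodeBufPool = sync.Pool{
	New: func() interface{} { return &[]byte{} },
}

// decodeSnappyPooled reuses decode buffers across calls. snappy.DecodedLen
// reports the exact uncompressed size, so the pooled buffer is only grown
// when it is too small. The caller must call release() once it is done with
// the returned bytes.
func decodeSnappyPooled(compressed []byte) (decoded []byte, release func(), err error) {
	n, err := snappy.DecodedLen(compressed)
	if err != nil {
		return nil, nil, err
	}
	bufPtr := decodeBufPool.Get().(*[]byte)
	if cap(*bufPtr) < n {
		// With very uneven entry sizes this keeps large buffers alive, which is
		// exactly the concern raised above; a bucketed pool would mitigate it.
		*bufPtr = make([]byte, n)
	}
	decoded, err = snappy.Decode((*bufPtr)[:n], compressed)
	if err != nil {
		decodeBufPool.Put(bufPtr)
		return nil, nil, err
	}
	*bufPtr = decoded
	return decoded, func() { decodeBufPool.Put(bufPtr) }, nil
}
```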
LGTM, left some nitpicks you might want to consider.
Great job, LGTM!
This PR changes the serialization format of messages stored in the series cache from gob to protobuf. The changes are only used in the streaming implementation.
Benchmarks
In benchmarks that use the series cache extensively, it shows an improvement. The benchmarks with 0% change are either not using the streaming implementation or are only caching 1 series.
Notes to reviewers
Protobuf
Much of the gains were due to more efficient protobuf deserialization. For slices (slices of labels in our case) the default protobuf implementation starts with a nil slice and appends to it. This makes a lot of extra allocations and leaves unused memory. To avoid this I had to copy some of the generated protobuf code for mimirpb.Metric in order to optimize the preallocation of the labels slices. I copied the code into storepb.PreallocatingSliceMetric.
The way I chose to do this is to first iterate the bytes buffer and count the number of elements (labels), then allocate a slice with that capacity and offload the actual decoding into the structs to the generated protobuf implementation.
A longer-term option is to write a protogen plugin that will generate the code.
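A simplified sketch of that two-pass idea is below. This is not the copied generated code from the PR; the PreallocatingMetric name is a stand-in, and the assumption that labels are field 1 of mimirpb.Metric is mine. It uses encoding/binary and fmt.

```go
// PreallocatingMetric is a hypothetical wrapper around the generated type
// (the PR uses a copied variant of the generated code; names here are illustrative).
type PreallocatingMetric struct {
	mimirpb.Metric
}

// countRepeatedField walks the protobuf wire format and counts occurrences of
// fieldNum without decoding any values.
func countRepeatedField(data []byte, fieldNum uint64) (int, error) {
	count := 0
	for len(data) > 0 {
		key, n := binary.Uvarint(data)
		if n <= 0 {
			return 0, fmt.Errorf("invalid field key")
		}
		data = data[n:]
		if key>>3 == fieldNum {
			count++
		}
		switch key & 0x7 {
		case 0: // varint
			_, m := binary.Uvarint(data)
			if m <= 0 {
				return 0, fmt.Errorf("invalid varint value")
			}
			data = data[m:]
		case 1: // 64-bit
			if len(data) < 8 {
				return 0, fmt.Errorf("truncated 64-bit value")
			}
			data = data[8:]
		case 2: // length-delimited (each label is a nested message)
			l, m := binary.Uvarint(data)
			if m <= 0 || l > uint64(len(data)-m) {
				return 0, fmt.Errorf("invalid length")
			}
			data = data[uint64(m)+l:]
		case 5: // 32-bit
			if len(data) < 4 {
				return 0, fmt.Errorf("truncated 32-bit value")
			}
			data = data[4:]
		default:
			return 0, fmt.Errorf("unsupported wire type %d", key&0x7)
		}
	}
	return count, nil
}

// Unmarshal counts the label fields, preallocates the slice with that
// capacity, and then offloads the actual decoding to the generated
// mimirpb.Metric code, which appends into the preallocated slice.
func (m *PreallocatingMetric) Unmarshal(data []byte) error {
	// Assumption: labels are field 1 of mimirpb.Metric.
	if n, err := countRepeatedField(data, 1); err == nil && n > 0 {
		m.Labels = make([]mimirpb.LabelAdapter, 0, n)
	}
	return m.Metric.Unmarshal(data)
}
```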
Writing a solution with generics was non-trivial because the struct that we marshal and send, mimirpb.LabelAdapter, is not what implements proto.Marshaler and proto.Unmarshaler - *mimirpb.LabelAdapter does.
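To illustrate the difficulty: the usual workaround is a second type parameter constrained to the pointer type that carries the Unmarshal method, along the lines of this sketch (not code from this PR):

```go
// unmarshalSlice decodes each encoded message into a freshly allocated T,
// using the constraint PT = *T to reach the pointer-receiver Unmarshal method.
func unmarshalSlice[T any, PT interface {
	*T
	Unmarshal([]byte) error
}](encoded [][]byte) ([]T, error) {
	out := make([]T, len(encoded))
	for i, buf := range encoded {
		if err := PT(&out[i]).Unmarshal(buf); err != nil {
			return nil, err
		}
	}
	return out, nil
}

// Example (assuming *mimirpb.LabelAdapter implements Unmarshal, as noted above):
//   adapters, err := unmarshalSlice[mimirpb.LabelAdapter](rawMessages)
```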