
store-gateway: more efficient series caching #3751

Merged


dimitarvdimitrov
Contributor

This PR changes the serialization format of messages stored in the series cache from gob to protobuf. The new format is only used by the streaming implementation.
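For context on why the format matters, especially for small entries: encoding/gob is self-describing, so a new gob.Encoder transmits a description of each type before the first value of that type. Protobuf instead relies on a schema shared by both sides and writes only tagged field values. The sketch below compares gob against a hand-rolled varint encoding that stands in for a schema-based format. `cachedEntry` is a hypothetical type, not the real cache struct, and the sketch assumes each cache entry is encoded with its own fresh encoder.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"fmt"
)

// cachedEntry is a hypothetical stand-in for a small series-cache value.
type cachedEntry struct {
	Shard uint64
	Refs  []uint64
}

// gobSize encodes e with a fresh gob.Encoder, so the output includes the
// type descriptors as well as the value, and returns the encoded length.
func gobSize(e cachedEntry) int {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(e); err != nil {
		panic(err)
	}
	return buf.Len()
}

// varintSize writes the same fields as plain varints with no type
// information, roughly what a schema-based format has to store.
func varintSize(e cachedEntry) int {
	buf := binary.AppendUvarint(nil, e.Shard)
	buf = binary.AppendUvarint(buf, uint64(len(e.Refs)))
	for _, r := range e.Refs {
		buf = binary.AppendUvarint(buf, r)
	}
	return len(buf)
}

func main() {
	e := cachedEntry{Shard: 3, Refs: []uint64{1, 2, 3}}
	fmt.Println("gob bytes:", gobSize(e), "varint bytes:", varintSize(e))
}
```

With a single long-lived Encoder the type descriptors would be sent only once per stream. The overhead therefore mainly hurts when every value is encoded independently, as with cache entries.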

Benchmarks

In benchmarks that use the series cache extensively, it shows:

  • between 14% and 65% reduced memory allocations
  • between 8% and 64% reduced CPU usage

The benchmarks with no significant change (~) either don't use the streaming implementation or only cache 1 series.

name                                                                                                                                old time/op    new time/op    delta
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000-10                                    69.8ms ± 1%    71.1ms ± 1%   +1.92%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000-10                                        732ms ± 1%     725ms ± 1%     ~     (p=0.056 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000-10                                     255µs ± 2%     257µs ± 2%     ~     (p=0.421 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                    366µs ± 1%     334µs ± 0%   -8.62%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                     393µs ± 1%     355µs ± 0%   -9.66%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                    90.3ms ± 1%    80.2ms ± 2%  -11.28%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000-10                       885ms ± 1%     777ms ± 2%  -12.25%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                   94.6ms ± 1%    82.7ms ± 1%  -12.59%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000-10                        899ms ± 2%     783ms ± 0%  -12.98%  (p=0.016 n=5+4)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10    65.2ms ± 1%    49.2ms ± 0%  -24.56%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000-10        566ms ± 2%     398ms ± 1%  -29.77%  (p=0.029 n=4+4)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                                                                             200µs ± 0%     106µs ± 1%  -46.91%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                                                                           202µs ± 1%     106µs ± 1%  -47.50%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                                                                        2.10ms ± 0%    1.07ms ± 0%  -48.86%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10                                                       2.08ms ± 0%    1.02ms ± 1%  -50.80%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                                                                        2.70ms ± 0%    1.15ms ± 1%  -57.37%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10                                                       2.77ms ± 0%    1.16ms ± 1%  -58.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10                                                          298µs ± 0%     115µs ± 4%  -61.41%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                                                                           322µs ± 1%     122µs ± 0%  -62.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                                                                             325µs ± 1%     121µs ± 0%  -62.78%  (p=0.016 n=4+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10                                                       310µs ± 0%     114µs ± 2%  -63.18%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10                                                    310µs ± 1%     113µs ± 0%  -63.44%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10     106µs ± 0%      37µs ± 0%  -64.73%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                                                                         3.87µs ± 0%    0.24µs ± 1%  -93.84%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                                                                      3.87µs ± 1%    0.20µs ± 0%  -94.86%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                                                                         14.6µs ± 0%     0.7µs ± 1%  -95.49%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                                                                      14.6µs ± 1%     0.7µs ± 1%  -95.54%  (p=0.008 n=5+5)

name                                                                                                                                old alloc/op   new alloc/op   delta
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000-10                                       1.95GB ± 0%    1.95GB ± 0%     ~     (p=0.548 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000-10                                     179MB ± 0%     179MB ± 0%     ~     (p=0.690 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000-10                                     710kB ± 0%     710kB ± 0%     ~     (p=0.516 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                    854kB ± 1%     844kB ± 1%   -1.15%  (p=0.016 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                     733kB ± 0%     722kB ± 0%   -1.52%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000-10        949MB ± 0%     815MB ± 0%  -14.09%  (p=0.029 n=4+4)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10    96.3MB ± 0%    82.1MB ± 0%  -14.79%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                                                                        3.37MB ± 0%    2.57MB ± 0%  -23.88%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000-10                       1.53GB ± 0%    1.16GB ± 0%  -24.49%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                     152MB ± 0%     115MB ± 0%  -24.71%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10                                                       3.45MB ± 0%    2.59MB ± 0%  -24.93%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                                                                           374kB ± 0%     273kB ± 0%  -26.97%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                                                                             373kB ± 0%     272kB ± 0%  -27.01%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                    147MB ± 0%     106MB ± 0%  -27.80%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000-10                      1.46GB ± 0%    1.05GB ± 0%  -27.82%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10                                                    325kB ± 0%     225kB ± 0%  -30.94%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10                                                       325kB ± 0%     225kB ± 0%  -30.94%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10                                                          325kB ± 0%     225kB ± 0%  -30.94%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                                                                           270kB ± 0%     132kB ± 0%  -51.21%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                                                                             269kB ± 0%     131kB ± 0%  -51.30%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10                                                       3.75MB ± 0%    1.31MB ± 0%  -65.05%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10    71.3kB ± 1%    24.8kB ± 2%  -65.24%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                                                                        3.72MB ± 0%    1.27MB ± 0%  -65.85%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                                                                         2.68kB ± 0%    0.20kB ± 0%  -92.65%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                                                                      2.68kB ± 0%    0.18kB ± 0%  -93.25%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                                                                         9.80kB ± 0%    0.26kB ± 0%  -97.39%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                                                                      9.78kB ± 0%    0.24kB ± 0%  -97.55%  (p=0.008 n=5+5)

name                                                                                                                                old allocs/op  new allocs/op  delta
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000-10                                        11.0M ± 0%     11.0M ± 0%     ~     (p=0.548 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10     1.50M ± 0%     0.90M ± 0%     ~     (p=0.079 n=4+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000-10                                     1.10M ± 0%     1.10M ± 0%     ~     (p=0.579 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000-10                                       802 ± 0%       802 ± 0%     ~     (all equal)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000-10                       10.0M ± 0%     10.0M ± 0%   -0.06%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                    1.00M ± 0%     1.00M ± 0%   -0.07%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000-10                        10.1M ± 0%     10.1M ± 0%   -0.50%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                     1.01M ± 0%     1.01M ± 0%   -0.50%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                       816 ± 0%       680 ± 0%  -16.67%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                      814 ± 0%       678 ± 0%  -16.71%  (p=0.008 n=5+5)
Bucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000-10        15.0M ± 0%      9.0M ± 0%  -40.08%  (p=0.029 n=4+4)
Bucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10     1.39k ± 0%     0.37k ± 0%  -73.49%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10                                                          6.28k ± 0%     1.04k ± 0%  -83.48%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10                                                       6.28k ± 0%     1.04k ± 0%  -83.48%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10                                                    6.28k ± 0%     1.04k ± 0%  -83.48%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                                                                             6.26k ± 0%     1.02k ± 0%  -83.71%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                                                                           6.26k ± 0%     1.02k ± 0%  -83.71%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                                                                           39.0 ± 0%       6.0 ± 0%  -84.62%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                                                                        39.0 ± 0%       6.0 ± 0%  -84.62%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                                                                              54.0 ± 0%       6.0 ± 0%  -88.89%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                                                                            54.0 ± 0%       6.0 ± 0%  -88.89%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                                                                         54.3k ± 0%      6.0k ± 0%  -88.90%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10                                                        60.2k ± 0%      6.0k ± 0%  -90.00%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                                                                          64.0 ± 0%       6.0 ± 0%  -90.62%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10                                                         64.0 ± 0%       6.0 ± 0%  -90.62%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                                                                         259 ± 0%         9 ± 0%  -96.53%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                                                                            260 ± 0%         9 ± 0%  -96.54%  (p=0.008 n=5+5)

Notes to reviewers

Protobuf

Much of the gain comes from more efficient protobuf deserialization. For repeated fields (slices of labels, in our case), the default generated protobuf code starts with a nil slice and appends to it. This causes many extra allocations as the slice grows and leaves unused capacity behind.
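The cost of growing a slice from empty can be seen without any protobuf code. This sketch counts how often `append` has to move the elements to a new backing array when n elements are appended to an empty slice, compared with a slice preallocated to capacity n. `countReallocs` is a hypothetical helper for illustration only.

```go
package main

import "fmt"

// countReallocs appends n elements to a slice created with the given initial
// capacity and reports how many times append had to move the elements to a
// new, larger backing array (detected as a change in capacity).
func countReallocs(n, initialCap int) int {
	s := make([]string, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, "label")
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	// Growing from an empty slice, as the generated decoder does.
	fmt.Println("from empty:", countReallocs(100, 0))
	// Preallocated with the exact number of elements.
	fmt.Println("preallocated:", countReallocs(100, 100))
}
```

The exact number of reallocations depends on the Go runtime's growth policy. The preallocated slice, however, never grows and ends with no spare capacity.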

To optimize the preallocation of the labels slices, I had to copy some of the generated protobuf code for mimirpb.Metric. I copied it into storepb.PreallocatingSliceMetric.

The approach is to first iterate over the byte buffer and count the number of elements (labels), then allocate a slice with that capacity, and finally hand off to the generated protobuf implementation to actually decode into the structs.
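The count-then-preallocate approach can be sketched with the standard library alone. This is a minimal sketch, not the PR's storepb.PreallocatingSliceMetric. It assumes that the labels are field 1 of the Metric message and that each label pair stores its name and value in fields 1 and 2. The hypothetical `decodeLabel` stands in for the generated Unmarshal code that the PR delegates to.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Label is a hypothetical stand-in for mimirpb.LabelAdapter.
type Label struct{ Name, Value string }

// labelsField is the assumed field number of the repeated labels in Metric.
const labelsField = 1

var errTruncated = errors.New("truncated protobuf message")

// nextField reads one length-delimited field (wire type 2) and returns its
// field number, its payload and the remaining bytes. Other wire types are
// rejected: this sketch assumes Metric and LabelPair only contain such fields.
func nextField(buf []byte) (int, []byte, []byte, error) {
	key, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, nil, nil, errTruncated
	}
	if key&7 != 2 {
		return 0, nil, nil, fmt.Errorf("unsupported wire type %d", key&7)
	}
	length, m := binary.Uvarint(buf[n:])
	if m <= 0 || length > uint64(len(buf)-n-m) {
		return 0, nil, nil, errTruncated
	}
	start, end := n+m, n+m+int(length)
	return int(key >> 3), buf[start:end], buf[end:], nil
}

// forEachField calls fn for every top-level field in buf.
func forEachField(buf []byte, fn func(num int, payload []byte) error) error {
	for len(buf) > 0 {
		num, payload, rest, err := nextField(buf)
		if err != nil {
			return err
		}
		if err := fn(num, payload); err != nil {
			return err
		}
		buf = rest
	}
	return nil
}

// decodeLabel stands in for the generated LabelPair Unmarshal code.
func decodeLabel(buf []byte) (Label, error) {
	var l Label
	err := forEachField(buf, func(num int, p []byte) error {
		switch num {
		case 1:
			l.Name = string(p)
		case 2:
			l.Value = string(p)
		}
		return nil
	})
	return l, err
}

// decodeMetric first counts the labels without decoding them, allocates the
// slice once with exactly that capacity, then decodes each label into it.
func decodeMetric(buf []byte) ([]Label, error) {
	count := 0
	err := forEachField(buf, func(num int, _ []byte) error {
		if num == labelsField {
			count++
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	labels := make([]Label, 0, count)
	err = forEachField(buf, func(num int, p []byte) error {
		if num != labelsField {
			return nil
		}
		l, err := decodeLabel(p)
		labels = append(labels, l)
		return err
	})
	if err != nil {
		return nil, err
	}
	return labels, nil
}

func main() {
	// Metric{labels: [{Name: "a", Value: "b"}]}, encoded by hand.
	ls, err := decodeMetric([]byte{0x0a, 0x06, 0x0a, 0x01, 'a', 0x12, 0x01, 'b'})
	fmt.Println(ls, len(ls), cap(ls), err)
}
```

The counting pass only reads field keys and lengths and skips over each payload, so it is cheap compared with the allocations it avoids.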

A longer-term option is to write a protogen plugin that will generate the code.

Writing a generic solution was non-trivial because the struct that we marshal and send, mimirpb.LabelAdapter, is not the type that implements proto.Marshaler and proto.Unmarshaler; *mimirpb.LabelAdapter is.
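A common Go workaround for "the slice holds T but only *T implements the interface" is a second type parameter constrained to be *T. This is not necessarily what the PR considered. The sketch uses a hypothetical `LabelAdapter` with a toy pointer-receiver `Unmarshal` ("name=value"), mimicking the pointer receivers of generated protobuf code.

```go
package main

import (
	"errors"
	"fmt"
)

// LabelAdapter is a hypothetical stand-in for mimirpb.LabelAdapter. Its
// Unmarshal method has a pointer receiver, so only *LabelAdapter satisfies
// an interface requiring Unmarshal.
type LabelAdapter struct {
	Name, Value string
}

// Unmarshal is a toy "name=value" decoder standing in for generated code.
func (l *LabelAdapter) Unmarshal(b []byte) error {
	for i, c := range b {
		if c == '=' {
			l.Name, l.Value = string(b[:i]), string(b[i+1:])
			return nil
		}
	}
	return errors.New("missing '='")
}

// unmarshalerPtr constrains PT to be exactly *T and to have Unmarshal.
type unmarshalerPtr[T any] interface {
	*T
	Unmarshal([]byte) error
}

// decodeAll preallocates a []T of the right length and decodes each
// element in place through a pointer to it.
func decodeAll[T any, PT unmarshalerPtr[T]](items [][]byte) ([]T, error) {
	out := make([]T, len(items))
	for i, b := range items {
		if err := PT(&out[i]).Unmarshal(b); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	// Only T is given explicitly; PT is inferred as *LabelAdapter.
	ls, err := decodeAll[LabelAdapter]([][]byte{[]byte("job=api"), []byte("env=prod")})
	fmt.Println(ls, err)
}
```

The drawback is that every call site has to spell out the value type explicitly, which is part of what makes a generic version less straightforward than copying the generated code.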

pkg/storegateway/storepb/cache.proto (two outdated, resolved review threads)
 res.series = append(res.series, seriesChunkRefs{
-    lset: lset,
+    lset: mimirpb.FromLabelAdaptersToLabels(lset.Labels),
Collaborator

I have some concerns about this. Once #3555 is in, mimirpb.FromLabelAdaptersToLabels will make a copy, probably wiping out the benefits of this PR. I suggest talking to @bboreham about how to make this PR #3555-friendly.
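To illustrate the concern, assume (as I understand the current code) that mimirpb.FromLabelAdaptersToLabels reinterprets the []LabelAdapter slice as labels.Labels without copying. That is possible only because the two element types share a memory layout, and it makes the conversion free. If the target representation changes, the conversion has to allocate and copy every label, which could reintroduce the allocations this PR saves. The types below are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"unsafe"
)

// adapter and label are hypothetical types with identical memory layouts.
type adapter struct{ Name, Value string }
type label struct{ Name, Value string }

// castLabels reinterprets the slice header: no allocation, O(1), and the
// result shares its backing array with the input. Only valid while both
// element types have the same layout.
func castLabels(as []adapter) []label {
	return *(*[]label)(unsafe.Pointer(&as))
}

// copyLabels allocates a new slice and copies every element, which is what
// a conversion must do once the target representation differs.
func copyLabels(as []adapter) []label {
	out := make([]label, len(as))
	for i, a := range as {
		out[i] = label(a)
	}
	return out
}

func main() {
	src := []adapter{{"job", "api"}}
	aliased, copied := castLabels(src), copyLabels(src)
	src[0].Value = "db"
	fmt.Println(aliased[0].Value, copied[0].Value)
}
```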

Collaborator

This is non-blocking, since #3555 is not merged yet, but it's something I would like to think about.

pkg/storegateway/storepb/custom.go (three outdated, resolved review threads)
@dimitarvdimitrov
Contributor Author

@pracucci raised a point about the size of the cache entries with gob vs. protobuf.

I used the existing benchmarks to also print the size of the items in the cache. Since we snappy-encode both, the lengths are pretty much the same, with small savings for protobuf.

I also ran the benchmarks without snappy, but any memory gains in the replicas would be outweighed by the increased memory needed in memcached.

cache entry size in bytes (StoreSeriesForPostings len) - gob vs protobuf, both snappy-encoded, and protobuf without snappy
case                                                 gob     protobuf  protobuf without snappy
1000 series with 1 matcher                           6328    6108      44006
1000 series with 1 matcher, mismatching matchers     6328    6108      44006
1000 series with 1 matcher, mismatching postingsKey  6328    6108      44006
1000 series with 1 matcher, mismatching shard        6330    6112      44010
1000 series with 10 matchers                         6362    6143      44042
6000 series with 6 labels each                       47848   47686     516006
6000 series with 6 labels with more repetitions      48960   48850     532896
with sharding                                        248     28        26
without sharding                                     237     18        16
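To put numbers on "pretty much the same" and on the memcached cost, the ratios can be computed directly from the size rows above. The sketch assumes the "without snappy" figures are protobuf-encoded entries, as the following benchmark comparison suggests. `entrySize` and its helpers are hypothetical names for illustration.

```go
package main

import "fmt"

// entrySize holds one row of the cache entry size comparison, in bytes.
type entrySize struct {
	name                    string
	gob, protobuf, noSnappy int
}

// protobufSaving returns how much smaller the snappy-encoded protobuf entry
// is than the snappy-encoded gob entry, as a percentage.
func protobufSaving(e entrySize) float64 {
	return 100 * (1 - float64(e.protobuf)/float64(e.gob))
}

// snappyRatio returns how many times larger the uncompressed protobuf entry
// is than the snappy-encoded one.
func snappyRatio(e entrySize) float64 {
	return float64(e.noSnappy) / float64(e.protobuf)
}

func main() {
	rows := []entrySize{
		{"1000 series with 1 matcher", 6328, 6108, 44006},
		{"6000 series with 6 labels each", 47848, 47686, 516006},
		{"6000 series with 6 labels with more repetitions", 48960, 48850, 532896},
	}
	for _, r := range rows {
		fmt.Printf("%s: protobuf %.2f%% smaller; %.1fx larger without snappy\n",
			r.name, protobufSaving(r), snappyRatio(r))
	}
}
```

For the 1000-series case this gives about 3.5% savings for protobuf and an uncompressed entry about 7.2 times larger. For the 6000-series cases the savings drop below 0.5%, while the uncompressed entries are about 11 times larger.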
gob with snappy vs protobuf without snappy
name                                                                                 old time/op    new time/op    delta
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        2.08ms ± 0%    0.87ms ± 1%  -58.06%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                         2.10ms ± 0%    0.86ms ± 1%  -58.80%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                         2.70ms ± 0%    1.10ms ± 1%  -59.15%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        2.77ms ± 0%    1.10ms ± 1%  -60.43%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                              200µs ± 0%      79µs ± 5%  -60.59%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                            202µs ± 1%      78µs ± 0%  -61.62%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            322µs ± 1%     114µs ± 2%  -64.67%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           298µs ± 0%     104µs ± 1%  -65.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              325µs ± 1%     113µs ± 1%  -65.41%  (p=0.016 n=4+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        310µs ± 0%     104µs ± 1%  -66.46%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     310µs ± 1%     104µs ± 1%  -66.61%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                          14.6µs ± 0%     0.6µs ± 0%  -95.57%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                       14.6µs ± 1%     0.6µs ± 0%  -95.61%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                          3.87µs ± 0%    0.17µs ± 0%  -95.64%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                       3.87µs ± 1%    0.16µs ± 0%  -95.81%  (p=0.008 n=5+5)

name                                                                                 old alloc/op   new alloc/op   delta
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                         3.37MB ± 0%    2.05MB ± 0%  -39.19%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            374kB ± 0%     224kB ± 0%  -40.13%  (p=0.000 n=5+4)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              373kB ± 0%     223kB ± 0%  -40.19%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        3.45MB ± 0%    2.05MB ± 0%  -40.60%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        325kB ± 0%     176kB ± 0%  -46.05%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     325kB ± 0%     176kB ± 0%  -46.05%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           325kB ± 0%     176kB ± 0%  -46.05%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                            270kB ± 0%      74kB ± 0%  -72.46%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                              269kB ± 0%      74kB ± 0%  -72.60%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        3.75MB ± 0%    0.69MB ± 0%  -81.65%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                         3.72MB ± 0%    0.66MB ± 0%  -82.15%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                          2.68kB ± 0%    0.13kB ± 0%  -95.04%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                       2.68kB ± 0%    0.12kB ± 0%  -95.63%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                       9.78kB ± 0%    0.22kB ± 0%  -97.71%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                          9.80kB ± 0%    0.22kB ± 0%  -97.71%  (p=0.008 n=5+5)

name                                                                                 old allocs/op  new allocs/op  delta
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           6.28k ± 0%     1.04k ± 0%  -83.50%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        6.28k ± 0%     1.04k ± 0%  -83.50%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     6.28k ± 0%     1.04k ± 0%  -83.50%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              6.26k ± 0%     1.02k ± 0%  -83.73%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            6.26k ± 0%     1.02k ± 0%  -83.73%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                            39.0 ± 0%       5.0 ± 0%  -87.18%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                         39.0 ± 0%       5.0 ± 0%  -87.18%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                          54.3k ± 0%      6.0k ± 0%  -88.90%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10         60.2k ± 0%      6.0k ± 0%  -90.00%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                               54.0 ± 0%       5.0 ± 0%  -90.74%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                             54.0 ± 0%       5.0 ± 0%  -90.74%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                           64.0 ± 0%       5.0 ± 0%  -92.19%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10          64.0 ± 0%       5.0 ± 0%  -92.19%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                          259 ± 0%         8 ± 0%  -96.91%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                             260 ± 0%         8 ± 0%  -96.92%  (p=0.008 n=5+5)
protobuf with snappy vs protobuf without snappy
name                                                                                 old time/op    new time/op    delta
FetchCachedSeriesForPostings/without_sharding-10                                        652ns ± 1%     641ns ± 0%   -1.65%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                           658ns ± 1%     646ns ± 0%   -1.79%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                         1.15ms ± 1%    1.10ms ± 1%   -4.16%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        1.16ms ± 1%    1.10ms ± 1%   -5.55%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            122µs ± 0%     114µs ± 2%   -6.78%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              121µs ± 0%     113µs ± 1%   -7.05%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     113µs ± 0%     104µs ± 1%   -8.67%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        114µs ± 2%     104µs ± 1%   -8.89%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           115µs ± 4%     104µs ± 1%   -9.57%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        1.02ms ± 1%    0.87ms ± 1%  -14.76%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                        199ns ± 0%     162ns ± 0%  -18.43%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                         1.07ms ± 0%    0.86ms ± 1%  -19.44%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                              106µs ± 1%      79µs ± 5%  -25.76%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                            106µs ± 1%      78µs ± 0%  -26.89%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                           239ns ± 1%     169ns ± 0%  -29.33%  (p=0.008 n=5+5)

name                                                                                 old alloc/op   new alloc/op   delta
FetchCachedSeriesForPostings/without_sharding-10                                         240B ± 0%      224B ± 0%   -6.67%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                            256B ± 0%      224B ± 0%  -12.50%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            273kB ± 0%     224kB ± 0%  -18.02%  (p=0.000 n=5+4)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              272kB ± 0%     223kB ± 0%  -18.06%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                         2.57MB ± 0%    2.05MB ± 0%  -20.12%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        2.59MB ± 0%    2.05MB ± 0%  -20.88%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        225kB ± 0%     176kB ± 0%  -21.88%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           225kB ± 0%     176kB ± 0%  -21.88%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     225kB ± 0%     176kB ± 0%  -21.88%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                            197B ± 0%      133B ± 0%  -32.49%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                         181B ± 0%      117B ± 0%  -35.36%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                            132kB ± 0%      74kB ± 0%  -43.56%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                              131kB ± 0%      74kB ± 0%  -43.72%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10        1.31MB ± 0%    0.69MB ± 0%  -47.50%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                         1.27MB ± 0%    0.66MB ± 0%  -47.74%  (p=0.008 n=5+5)

name                                                                                 old allocs/op  new allocs/op  delta
FetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10         6.02k ± 0%     6.02k ± 0%   -0.02%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                          6.02k ± 0%     6.02k ± 0%   -0.02%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        1.04k ± 0%     1.04k ± 0%   -0.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           1.04k ± 0%     1.04k ± 0%   -0.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     1.04k ± 0%     1.04k ± 0%   -0.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              1.02k ± 0%     1.02k ± 0%   -0.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            1.02k ± 0%     1.02k ± 0%   -0.10%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/with_sharding-10                                            9.00 ± 0%      8.00 ± 0%  -11.11%  (p=0.008 n=5+5)
FetchCachedSeriesForPostings/without_sharding-10                                         9.00 ± 0%      8.00 ± 0%  -11.11%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                           6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10          6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_1_matcher-10                               6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/1000_series_with_10_matchers-10                             6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/with_sharding-10                                            6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.008 n=5+5)
StoreCachedSeriesForPostings/without_sharding-10                                         6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.008 n=5+5)

Collaborator

@pracucci pracucci left a comment

Great work! I left a few minor comments, but I can't see anything wrong. Like we did for #3739, I would love to see a quick comparison from load testing the store-gateway with the "many large requests but without chunks" scenario.

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkBucket_Series_WithSkipChunks
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/0
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/1
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/2
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/3
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000-10         	       2	 723917000 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000-10         	       2	 893729125 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000-10        	       2	 838661438 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000-10         	       1	1976057791 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/0
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/1
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/2
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/3
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000-10                                      	      15	  71419614 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                      	      12	  95010087 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                     	      12	  96728188 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10      	      16	  64947492 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/0
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/1
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/2
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/3
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                     	    3210	    362288 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10      	   10000	    100400 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000-10                                      	    4920	    246681 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                      	    3175	    374901 ns/op
PASS

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkFetchCachedSeriesForPostings
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10         	     423	   2777498 ns/op	 3450443 B/op	   60247 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher-10                              	    3871	    307280 ns/op	  373021 B/op	    6257 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10        	    3886	    300735 ns/op	  325350 B/op	    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10     	    3943	    303476 ns/op	  325352 B/op	    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/with_sharding
BenchmarkFetchCachedSeriesForPostings/with_sharding-10                                           	   79370	     14776 ns/op	    9799 B/op	     260 allocs/op
BenchmarkFetchCachedSeriesForPostings/without_sharding
BenchmarkFetchCachedSeriesForPostings/without_sharding-10                                        	   80967	     14740 ns/op	    9783 B/op	     259 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10           	    3955	    299970 ns/op	  325348 B/op	    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_each
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                          	     435	   2709429 ns/op	 3370466 B/op	   54257 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_10_matchers
BenchmarkFetchCachedSeriesForPostings/1000_series_with_10_matchers-10                            	    3841	    308489 ns/op	  373550 B/op	    6257 allocs/op
PASS

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkStoreCachedSeriesForPostings
BenchmarkStoreCachedSeriesForPostings/with_sharding
BenchmarkStoreCachedSeriesForPostings/with_sharding-10         	  300554	      3755 ns/op	    2680 B/op	      39 allocs/op
BenchmarkStoreCachedSeriesForPostings/without_sharding
BenchmarkStoreCachedSeriesForPostings/without_sharding-10      	  318122	      3722 ns/op	    2680 B/op	      39 allocs/op
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_each
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_each-10         	     583	   2040430 ns/op	 3718159 B/op	      64 allocs/op
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10         	     577	   2021253 ns/op	 3750927 B/op	      64 allocs/op
BenchmarkStoreCachedSeriesForPostings/1000_series_with_1_matcher
BenchmarkStoreCachedSeriesForPostings/1000_series_with_1_matcher-10                              	    5826	    196531 ns/op	  269312 B/op	      54 allocs/op
BenchmarkStoreCachedSeriesForPostings/1000_series_with_10_matchers
BenchmarkStoreCachedSeriesForPostings/1000_series_with_10_matchers-10                            	    5984	    196326 ns/op	  269801 B/op	      54 allocs/op
PASS

@dimitarvdimitrov
Contributor Author

I ran this PR in a dev cluster where requests were served almost exclusively from the series cache. In the graphs below, zone-b runs this PR, while zone-a runs the commit titled "Add benchmark for storeCachedSeriesForPostings", so that the two zones have as much in common as possible. Zone-c runs the default implementation.

  • CPU usage is reduced by roughly 25%
  • allocated bytes/sec (and garbage collections/sec) are reduced by 50%
  • working set, heap in-use, and RSS memory are pretty much the same for zone-a and zone-b
  • average latency is arguably lower, but only marginally, and the difference may not be significant: this cluster uses the read-write (R/W) deployment, so other components sharing a Pod add some noise

The increased RSS and working set (compared to zone-c) are likely an artefact of the mmap-less store-gateway work, which both zone-a and zone-b include and zone-c does not.

Screenshot 2022-12-19 at 11 54 35

Screenshot 2022-12-19 at 11 55 27


I will now address the remaining comments on this PR.

Comment on lines 872 to +881
 func storeCachedSeriesForPostings(ctx context.Context, indexCache indexcache.IndexCache, userID string, blockID ulid.ULID, matchers []*labels.Matcher, shard *sharding.ShardSelector, postingsKey indexcache.PostingsKey, set seriesChunkRefsSet, logger log.Logger) {
-	entry := seriesForPostingsCacheEntry{
-		LabelSets:   make([]labels.Labels, set.len()),
-		MatchersKey: indexcache.CanonicalLabelMatchersKey(matchers),
-		Shard:       maybeNilShard(shard),
+	nonNilShard := maybeNilShard(shard)
+	matchersKey := indexcache.CanonicalLabelMatchersKey(matchers)
+	data, err := encodeCachedSeriesForPostings(set, matchersKey, nonNilShard)
+	if err != nil {
+		logSeriesForPostingsCacheEvent(ctx, logger, userID, blockID, matchers, shard, postingsKey, "msg", "can't encode series for caching", "err", err)
+		return
+	}
+	indexCache.StoreSeriesForPostings(ctx, userID, blockID, matchersKey, shard, postingsKey, data)
 }
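
The new version delegates serialization to `encodeCachedSeriesForPostings`, whose body isn't part of this excerpt. As a rough, standard-library-only illustration of a compact, length-prefixed layout, the sketch below writes label sets as varint-length-prefixed strings, similar in spirit to protobuf's wire format. It is not the PR's code: the PR uses protobuf messages plus snappy compression, and `label`, `encodeLabelSets`, `decodeLabelSets`, and the byte layout are all hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// label is a hypothetical stand-in for a Prometheus label pair.
type label struct{ Name, Value string }

var errCorrupt = errors.New("corrupt cache entry")

// appendUvarint appends v as a varint, the way protobuf encodes lengths.
func appendUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendString writes a length-prefixed string (protobuf-style).
func appendString(buf []byte, s string) []byte {
	return append(appendUvarint(buf, uint64(len(s))), s...)
}

// encodeLabelSets writes the set count, then per set its label count and names/values.
func encodeLabelSets(sets [][]label) []byte {
	buf := appendUvarint(nil, uint64(len(sets)))
	for _, ls := range sets {
		buf = appendUvarint(buf, uint64(len(ls)))
		for _, l := range ls {
			buf = appendString(appendString(buf, l.Name), l.Value)
		}
	}
	return buf
}

// reader consumes bytes produced by encodeLabelSets; the first error sticks.
type reader struct {
	buf []byte
	err error
}

func (r *reader) uvarint() uint64 {
	v, n := binary.Uvarint(r.buf)
	if n <= 0 {
		r.err, r.buf = errCorrupt, nil
		return 0
	}
	r.buf = r.buf[n:]
	return v
}

func (r *reader) str() string {
	n := r.uvarint()
	if n > uint64(len(r.buf)) {
		r.err, r.buf = errCorrupt, nil
		return ""
	}
	s := string(r.buf[:n])
	r.buf = r.buf[n:]
	return s
}

// decodeLabelSets reverses encodeLabelSets. Each count is read before the
// data it describes, so every slice is allocated once at its final size.
func decodeLabelSets(data []byte) ([][]label, error) {
	r := &reader{buf: data}
	numSets := r.uvarint()
	if numSets > uint64(len(r.buf)) { // guard against absurd counts
		return nil, errCorrupt
	}
	sets := make([][]label, 0, numSets)
	for i := uint64(0); i < numSets && r.err == nil; i++ {
		numLabels := r.uvarint()
		if numLabels > uint64(len(r.buf)) {
			return nil, errCorrupt
		}
		ls := make([]label, 0, numLabels)
		for j := uint64(0); j < numLabels && r.err == nil; j++ {
			name := r.str()
			value := r.str()
			ls = append(ls, label{Name: name, Value: value})
		}
		sets = append(sets, ls)
	}
	if r.err != nil || len(r.buf) != 0 {
		return nil, errCorrupt
	}
	return sets, nil
}

func main() {
	sets := [][]label{
		{{"__name__", "up"}, {"job", "api"}},
		{{"__name__", "up"}, {"job", "db"}},
	}
	data := encodeLabelSets(sets)
	got, err := decodeLabelSets(data)
	fmt.Println(len(data), got, err)
}
```

Writing every count before the data it describes lets the decoder size each slice exactly once instead of growing it with `append`.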
Contributor

Not directly related to this PR, but I guess this method was created in the streaming store-gateway implementation PRs?

Why does it need all three: matchersKey, shard, and postingsKey? Don't the first two reference the same data set as the third one?

Contributor

@colega colega Dec 19, 2022

I was going to move this comment to the original PR (#3687), but never mind, let's keep the conversation here.

Contributor Author

I think this is outside the scope of this PR, but it's in scope for what I was going to do next.

Your comment made me realize that including the matchers in the cache key may even cause false-negative cache misses: you can reword a matcher so that it selects the same series.

{a="1"}
and
{a!="2", a!="3"}

may still select the same postings, but we'll cache them separately.

I think the case is different for the shard, because we include it in the cache key verbatim instead of hashing it, which limits collisions to sets that have the same shard. If we hashed the shard key as well, the cache keys of different shards could collide.

What I'm unsure about is the strength of the hash if we don't hash the matchers: if we hash the matchers together with a set of postings, is that as strong as hashing only the set of postings? Given that the matchers + shard have a 1:1 relationship with a set of postings, my answer would be no.

We'd be hashing a smaller input (only the set of postings), so the set of inputs that map onto the 52 bits of hash would be smaller. That should give each input value a slightly lower chance of collision. So removing the matchers might even make the hash stronger?
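
To make the shard-versus-postings argument concrete, here is a hypothetical key layout in which the shard appears verbatim and only the postings are hashed. Two shards that share a postings list therefore always get distinct keys, and a collision is only possible between different postings lists within the same shard. SHA-256, the `S:` prefix, and the function names are assumptions chosen for illustration; Mimir's actual key format and hash function may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// postingsHash hashes a postings list (series references). SHA-256 is used
// here only because it ships with the standard library.
func postingsHash(postings []uint64) string {
	h := sha256.New()
	var b [8]byte
	for _, p := range postings {
		binary.BigEndian.PutUint64(b[:], p)
		h.Write(b[:]) // hash.Hash.Write never returns an error
	}
	return base64.RawURLEncoding.EncodeToString(h.Sum(nil))
}

// seriesForPostingsKey is a hypothetical key layout: the shard is embedded
// verbatim, so entries for different shards can never collide, while the
// postings contribute only a fixed-size hash.
func seriesForPostingsKey(userID, blockID, shard string, postings []uint64) string {
	return fmt.Sprintf("S:%s:%s:%s:%s", userID, blockID, shard, postingsHash(postings))
}

func main() {
	postings := []uint64{16, 48, 112}
	fmt.Println(seriesForPostingsKey("tenant-1", "block-1", "1_of_2", postings))
	fmt.Println(seriesForPostingsKey("tenant-1", "block-1", "2_of_2", postings))
}
```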

Contributor

The method name says "StoreSeriesForPostings", so you should just store series for the postingsKey. You don't care about matchers, shards, or whatever else led you to look up those postings. Since the cache key will be a hash of the postings, you should store the postings list itself in the cached value too, to verify that you got the expected cached item when retrieving it.
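
A minimal sketch of this suggestion, assuming the cached value simply carries the postings list next to the series: on fetch, the stored postings are compared with the ones being looked up, so a key collision turns into a cache miss instead of returning the wrong series. The `cachedEntry` type and the `fetch` function are hypothetical, and the series are plain strings to keep the example self-contained.

```go
package main

import "fmt"

// cachedEntry is a hypothetical cache value: the postings the series were
// computed for, stored next to the (here: already decoded) series.
type cachedEntry struct {
	Postings []uint64
	Series   []string
}

func equalPostings(a, b []uint64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// fetch returns the cached series only if the stored postings match the ones
// being looked up; a hash collision on the key degrades to a cache miss.
func fetch(cache map[string]cachedEntry, key string, postings []uint64) ([]string, bool) {
	e, ok := cache[key]
	if !ok || !equalPostings(e.Postings, postings) {
		return nil, false
	}
	return e.Series, true
}

func main() {
	// Pretend that two different postings lists hash to the same key "h".
	cache := map[string]cachedEntry{"h": {Postings: []uint64{1, 2, 3}, Series: []string{`{a="1"}`}}}
	fmt.Println(fetch(cache, "h", []uint64{1, 2, 3})) // hit
	fmt.Println(fetch(cache, "h", []uint64{4, 5}))    // collision detected: miss
}
```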

Contributor Author

> you should store the postings list itself in the cached value too, to verify that you brought the expected cached item when retrieving

In this case we'd store the postings twice: once for the expanded postings and once for the series. I wanted to avoid that. WDYT?

Contributor Author

An alternative is to store the number of postings.

Contributor

Okay, I see your concern now. However, when delta-encoded, most postings take less than a couple of bytes, so storing them alongside the series shouldn't be a concern (series are potentially hundreds of bytes each).

(OTOH, number of postings is just another hashing function, one of the bad ones 😄 )

Contributor Author

I know why we need the shard: the postings we have don't necessarily belong only to this shard. See this comment:

// Calculate the cache key before we filter out anything from the postings,
// so that the key doesn't depend on the series hash cache or any other filtering we do on the postings list.

This means that the postings for different shards are actually the same, but the series that we cache are filtered by shard. So we definitely need the shard in the cache key.

But I agree we can remove the matchers.


On verifying the postings to detect collisions: are you aware that we currently store two checksums of the postings, blake32 and sha1?

@dimitarvdimitrov force-pushed the dimitar/streaming-series-caching-more-efficient-protobuf branch from f6e9ebf to 161dcc0 on December 19, 2022 13:15
Comment on lines +196 to +197
assert.Equal(t, metric, preallocMetric.Metric)
assert.Equal(t, len(metric.Labels), cap(preallocMetric.Labels))
Contributor

Nit: you can use testing.AllocsPerRun to verify the number of allocations (which should be 10 + 1 I guess)

Contributor Author

I tried that and it didn't work; that's why I decided not to include the test.

The problem was that Metric.Unmarshal was doing the same number of allocations, minus one. So it seemed like `PreallocatingMetric` had no effect in terms of allocations. I didn't look further into why.


-	var entry seriesForPostingsCacheEntry
-	if err := decodeSnappyGob(data, &entry); err != nil {
+	data, err := snappy.Decode(nil, data)
Contributor

I think it's worth pooling the buffer provided to snappy.Decode here, wdyt?

Contributor Author

Yeah, I agree. It accounts for 10% of the store-gateway's allocations.

I'm reluctant to do it because I'm not sure how effective those pools are, and we might end up with pooled slices much larger than we actually need: I assume some entries will be mere hundreds of bytes and others will be megabytes. Maybe we can use pool.BucketBytes. Do you have suggestions for the bucket sizes?
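
One way to address the size concern is a bucketed pool, sketched below with one `sync.Pool` per power-of-two size class so that small entries never pin megabyte-sized buffers. This is a hypothetical, standard-library-only illustration, not `pool.BucketBytes`; the 256 B to 1 MiB range is an assumed example, not a recommendation from this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// bucketedBytes is a hypothetical byte-slice pool with one sync.Pool per
// power-of-two size class.
type bucketedBytes struct {
	sizes []int
	pools []sync.Pool
}

func newBucketedBytes(minSize, maxSize int) *bucketedBytes {
	if minSize <= 0 {
		minSize = 1 // avoid an endless loop below
	}
	b := &bucketedBytes{}
	for s := minSize; s <= maxSize; s *= 2 {
		b.sizes = append(b.sizes, s)
	}
	b.pools = make([]sync.Pool, len(b.sizes))
	return b
}

// Get returns an empty slice with capacity of at least size. Requests above
// the largest bucket are allocated directly.
func (b *bucketedBytes) Get(size int) []byte {
	for i, s := range b.sizes {
		if size <= s {
			if p, ok := b.pools[i].Get().(*[]byte); ok {
				return (*p)[:0]
			}
			return make([]byte, 0, s)
		}
	}
	return make([]byte, 0, size)
}

// Put recycles buf into the largest bucket it can serve. Buffers bigger than
// the largest bucket are dropped so the pool never retains oversized slices.
func (b *bucketedBytes) Put(buf []byte) {
	c := cap(buf)
	if len(b.sizes) == 0 || c > b.sizes[len(b.sizes)-1] {
		return
	}
	for i := len(b.sizes) - 1; i >= 0; i-- {
		if c >= b.sizes[i] {
			buf = buf[:0]
			b.pools[i].Put(&buf)
			return
		}
	}
}

func main() {
	pool := newBucketedBytes(256, 1<<20) // 256 B ... 1 MiB, an assumed range
	buf := pool.Get(300)
	fmt.Println(len(buf), cap(buf)) // 0 512
	pool.Put(buf)
}
```

With github.com/golang/snappy, a caller could size the request with `snappy.DecodedLen(data)` and pass `buf[:cap(buf)]` as `dst` to `snappy.Decode`, which decodes into `dst` when it is large enough. The buffer should only go back to the pool once nothing references the decoded bytes anymore.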

Contributor

@colega colega left a comment

LGTM, left some nitpicks you might want to consider.

Collaborator

@pracucci pracucci left a comment

Great job, LGTM!

@dimitarvdimitrov dimitarvdimitrov merged commit 73e834e into main Dec 19, 2022
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/streaming-series-caching-more-efficient-protobuf branch December 19, 2022 16:20