store-gateway: more efficient series caching #3751
Conversation
 	res.series = append(res.series, seriesChunkRefs{
-		lset: lset,
+		lset: mimirpb.FromLabelAdaptersToLabels(lset.Labels),
This is non-blocking, since #3555 is not merged yet, but it's something I would like to think about.
@pracucci raised a point about the size of the cache entries with gob vs with protobuf. I used the existing benchmarks to also print the size of the items in the cache. Since we snappy-encode both, the length is pretty much the same. There are some small benefits with protobuf. I also ran without snappy, but the gains in replica memory will be dominated by the increased memory for memcached.
cache entry size - gob vs protobuf
gob with snappy vs protobuf without snappy
protobuf with snappy vs protobuf without snappy
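For illustration, here is a minimal sketch of how such a size comparison can be measured. This is not the benchmark code from the PR; the `label` stand-in type and the helper name are made up, and the protobuf side would use the real cache entry's `Marshal()`.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"

	"github.com/golang/snappy"
)

// label is a stand-in for the real label type held by the cache entry.
type label struct{ Name, Value string }

// printEntrySizes gob-encodes the label sets and prints the raw and the
// snappy-compressed sizes; the protobuf sizes would come from calling
// Marshal() on the real cache entry type and compressing the result the
// same way.
func printEntrySizes(sets [][]label) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(sets); err != nil {
		return err
	}
	raw := buf.Bytes()
	fmt.Printf("gob: %d bytes, gob+snappy: %d bytes\n", len(raw), len(snappy.Encode(nil, raw)))
	return nil
}
```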
Great work! I left a few minor comments, but I can't see anything wrong. Like we did for #3739, I would love to see a quick comparison load-testing the store-gateway with the scenario "many large requests but without chunks".
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/0
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/1
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/2
    bucket_test.go:2382: Creating 250000 1-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1000000SeriesWith1Samples2097057031/001/3
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_default_options/1000000of1000000-10                                        2     723917000 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(1K_per_batch)/1000000of1000000-10                        2     893729125 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_(10K_per_batch)/1000000of1000000-10                       2     838661438 ns/op
BenchmarkBucket_Series_WithSkipChunks/1000000SeriesWith1Samples/with_series_streaming_and_index_cache_(1K_per_batch)/1000000of1000000-10        1    1976057791 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/0
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/1
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/2
    bucket_test.go:2382: Creating 25000 25-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks100000SeriesWith100Samples2208983086/001/3
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_default_options/10000000of10000000-10                                    15      71419614 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                    12      95010087 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                   12      96728188 ns/op
BenchmarkBucket_Series_WithSkipChunks/100000SeriesWith100Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10    16      64947492 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/0
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/1
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/2
    bucket_test.go:2382: Creating 1 2500000-sample series with 1ms interval in /var/folders/s2/gq3hbytx7szb_fmmfhnp4lrm0000gn/T/BenchmarkBucket_Series_WithSkipChunks1SeriesWith10000000Samples337119323/001/3
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(10K_per_batch)/10000000of10000000-10                 3210        362288 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_and_index_cache_(1K_per_batch)/10000000of10000000-10 10000        100400 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_default_options/10000000of10000000-10                                  4920        246681 ns/op
BenchmarkBucket_Series_WithSkipChunks/1SeriesWith10000000Samples/with_series_streaming_(1K_per_batch)/10000000of10000000-10                  3175        374901 ns/op
PASS
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10       423    2777498 ns/op   3450443 B/op   60247 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher-10                           3871     307280 ns/op    373021 B/op    6257 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_matchers-10     3886     300735 ns/op    325350 B/op    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_postingsKey-10  3943     303476 ns/op    325352 B/op    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/with_sharding-10                                       79370      14776 ns/op      9799 B/op     260 allocs/op
BenchmarkFetchCachedSeriesForPostings/without_sharding-10                                    80967      14740 ns/op      9783 B/op     259 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_1_matcher,_mismatching_shard-10        3955     299970 ns/op    325348 B/op    6278 allocs/op
BenchmarkFetchCachedSeriesForPostings/6000_series_with_6_labels_each-10                        435    2709429 ns/op   3370466 B/op   54257 allocs/op
BenchmarkFetchCachedSeriesForPostings/1000_series_with_10_matchers-10                         3841     308489 ns/op    373550 B/op    6257 allocs/op
PASS
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
BenchmarkStoreCachedSeriesForPostings/with_sharding-10                                     300554       3755 ns/op      2680 B/op      39 allocs/op
BenchmarkStoreCachedSeriesForPostings/without_sharding-10                                  318122       3722 ns/op      2680 B/op      39 allocs/op
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_each-10                       583    2040430 ns/op   3718159 B/op      64 allocs/op
BenchmarkStoreCachedSeriesForPostings/6000_series_with_6_labels_with_more_repetitions-10      577    2021253 ns/op   3750927 B/op      64 allocs/op
BenchmarkStoreCachedSeriesForPostings/1000_series_with_1_matcher-10                          5826     196531 ns/op    269312 B/op      54 allocs/op
BenchmarkStoreCachedSeriesForPostings/1000_series_with_10_matchers-10                        5984     196326 ns/op    269801 B/op      54 allocs/op
PASS
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
I ran this PR in a dev cluster where requests were served almost exclusively from the series cache. In the graphs below zone-b is running with this PR, while zone-a is running on the commit titled "Add benchmark for storeCachedSeriesForPostings" to keep as much as possible common between the two. Zone-c is running the default implementation.
The increased RSS and working set are likely an artefact of the mmap-less store-gateway work (both zone-a and zone-b have it, compared to zone-c). I will now address the remaining comments on this PR.
 func storeCachedSeriesForPostings(ctx context.Context, indexCache indexcache.IndexCache, userID string, blockID ulid.ULID, matchers []*labels.Matcher, shard *sharding.ShardSelector, postingsKey indexcache.PostingsKey, set seriesChunkRefsSet, logger log.Logger) {
-	entry := seriesForPostingsCacheEntry{
-		LabelSets:   make([]labels.Labels, set.len()),
-		MatchersKey: indexcache.CanonicalLabelMatchersKey(matchers),
-		Shard:       maybeNilShard(shard),
+	nonNilShard := maybeNilShard(shard)
+	matchersKey := indexcache.CanonicalLabelMatchersKey(matchers)
+	data, err := encodeCachedSeriesForPostings(set, matchersKey, nonNilShard)
 	if err != nil {
 		logSeriesForPostingsCacheEvent(ctx, logger, userID, blockID, matchers, shard, postingsKey, "msg", "can't encode series for caching", "err", err)
 		return
 	}
 	indexCache.StoreSeriesForPostings(ctx, userID, blockID, matchersKey, shard, postingsKey, data)
 }
Not directly related to this PR, but I guess this method was created in the streaming store-gateway implementation PRs?
Why does it need all three of matchersKey, shard and postingsKey? The first two reference the same data set as the third one, don't they?
I'm going to move this comment to the original PR: #3687. Never mind, let's keep the conversation here.
I think this is outside the scope of this PR, but it's in scope for what I was going to do next.
Your comment made me realize that caching matchers may even introduce false-negative cache misses: you can reword a matcher to select the same series. {a="1"} and {a!="2", a!="3"} may still select the same postings, but we'll cache them separately.
I think the case is different for the shard, because we include it in the cache key verbatim rather than hashing it - that limits collisions to sets that have the same shard. If we also hashed the shard key, then cache keys for all shards could collide.
What I'm unsure about is the strength of the hash if we don't hash the matchers - if we hash a matcher and a set of postings, is it the same strength as hashing only a set of postings? Given that the matchers + shard have a 1:1 relationship with a set of postings, my answer would be no. We'll be hashing a smaller thing - only the set of postings - so the set of inputs that map onto 52 bits of hash will be smaller, which should give each input value a slightly smaller chance of collisions. So removing the matchers should even make the hash stronger?
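To make the trade-off concrete, here is a hypothetical sketch of a cache key built from the hashed postings plus the verbatim shard, with the matchers left out entirely. The key layout, function name and hash choice are illustrative only, not Mimir's actual key format.

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/cespare/xxhash/v2"
)

// seriesForPostingsKey hashes only the postings list; the shard is appended
// verbatim, so keys for different shards can never collide even if two
// postings lists hash to the same value.
func seriesForPostingsKey(userID, blockID string, shardIndex, shardCount uint64, postings []uint64) string {
	h := xxhash.New()
	buf := make([]byte, 8)
	for _, p := range postings {
		binary.LittleEndian.PutUint64(buf, p)
		_, _ = h.Write(buf)
	}
	return fmt.Sprintf("S:%s:%s:%d_of_%d:%x", userID, blockID, shardIndex, shardCount, h.Sum64())
}

func main() {
	fmt.Println(seriesForPostingsKey("tenant-1", "block-1", 3, 16, []uint64{10, 42, 97}))
}
```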
The method name says StoreSeriesForPostings, so you should just store series for the postingsKey. You don't care about matchers, shards or whatever else led you to look up those postings. Since the cache key will be a hash of the postings, you should also store the postings list itself in the cached value, to verify that you retrieved the expected cached item.
you should store the postings list itself in the cached value too, to verify that you brought the expected cached item when retrieving
In this case we'll store the postings twice - once for the expanded postings and once for the series. I wanted to avoid that. WDYT?
An alternative is to store the number of postings.
Okay, I see your concern now. However, when encoded with delta encoding, most of the postings are less than a couple of bytes each, so storing them alongside the series shouldn't be a concern (because series are potentially hundreds of bytes).
(OTOH, the number of postings is just another hashing function, one of the bad ones 😄)
I know why we need the shard - the postings we have are not necessarily postings that belong only to this shard - see this comment:
mimir/pkg/storegateway/series_refs.go
Lines 725 to 726 in 73e834e
// Calculate the cache key before we filter out anything from the postings,
// so that the key doesn't depend on the series hash cache or any other filtering we do on the postings list.
This means that the postings for different shards are actually the same, but the series that we cache are filtered by the shard. So we definitely need the shard in the cache key.
But I agree we can remove the matchers.
On verifying the postings for collisions: are you aware that we currently store two checksums of the postings - blake32 and sha1?
assert.Equal(t, metric, preallocMetric.Metric)
assert.Equal(t, len(metric.Labels), cap(preallocMetric.Labels))
Nit: you can use testing.AllocsPerRun to verify the number of allocations (which should be 10 + 1, I guess).
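For reference, a hedged sketch of what such a check could look like with testing.AllocsPerRun. The PreallocatingMetric type name, the encodeTestMetric helper and the 10 + 1 expectation are taken from this discussion and are assumptions, not the actual test code in the PR.

```go
func TestPreallocatingMetricAllocations(t *testing.T) {
	// encodeTestMetric is a hypothetical helper returning a marshalled Metric with 10 labels.
	encoded := encodeTestMetric(t)

	allocs := testing.AllocsPerRun(1000, func() {
		var m PreallocatingMetric // hypothetical preallocating wrapper discussed in this thread
		if err := m.Unmarshal(encoded); err != nil {
			panic(err)
		}
	})

	// The guess above is 10 + 1 allocations per run; log it rather than assert a hard number.
	t.Logf("allocations per Unmarshal: %.1f", allocs)
}
```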
I tried that and it didn't work; that's why I decided not to include the test.
The problem was that Metric.Unmarshal was doing the same number of allocations - 1. So it seemed like PreallocatingMetric had no effect in terms of allocations. I didn't look further into why.
-	var entry seriesForPostingsCacheEntry
-	if err := decodeSnappyGob(data, &entry); err != nil {
+	data, err := snappy.Decode(nil, data)
I think it's worth pooling the buffer provided to snappy.Decode here, wdyt?
Yeah, I can agree - it's accounting for 10% of the allocations in the store-gateway.
I'm reluctant to do it since I'm not sure about the effectiveness of pooling here and whether we won't end up with pooled slices much larger than what we actually need - I assume some items will be mere hundreds of bytes and others will be megabytes. Maybe we can use pool.BucketBytes. Do you have suggestions for the sizes of the buckets?
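A rough sketch of one pooling approach follows: a plain sync.Pool rather than a bucketed pool, using sync and github.com/golang/snappy. The names are illustrative and this is not the code from the PR.

```go
var decodeBufPool = sync.Pool{
	New: func() interface{} { return &[]byte{} },
}

// decodeSnappyPooled reuses decode buffers across calls. snappy.DecodedLen
// reports the exact uncompressed size, so the pooled buffer is only grown
// when it is too small. The caller must call release() once it is done with
// the returned bytes.
func decodeSnappyPooled(compressed []byte) (decoded []byte, release func(), err error) {
	n, err := snappy.DecodedLen(compressed)
	if err != nil {
		return nil, nil, err
	}
	bufPtr := decodeBufPool.Get().(*[]byte)
	if cap(*bufPtr) < n {
		// With very uneven entry sizes this keeps large buffers alive, which is
		// exactly the concern raised above; a bucketed pool would mitigate it.
		*bufPtr = make([]byte, n)
	}
	decoded, err = snappy.Decode((*bufPtr)[:n], compressed)
	if err != nil {
		decodeBufPool.Put(bufPtr)
		return nil, nil, err
	}
	*bufPtr = decoded
	return decoded, func() { decodeBufPool.Put(bufPtr) }, nil
}
```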
LGTM, left some nitpicks you might want to consider.
Great job, LGTM!
This PR changes the serialization format of messages stored in the series cache from gob to protobuf. The changes are only used in the streaming implementation.
Benchmarks
In benchmarks that use the series cache extensively, it shows an improvement. The benchmarks with 0% change are either not using the streaming implementation or are only caching 1 series.
Notes to reviewers
Protobuf
Much of the gains were due to more efficient protobuf deserialization. For slices (slices of labels in our case) the default protobuf implementation starts with a nil slice and appends to it. This makes a lot of extra allocations and leaves unused memory. To avoid this I had to copy some of the generated protobuf code for mimirpb.Metric in order to optimize the preallocation of the labels slices. I copied the code into storepb.PreallocatingSliceMetric.
The way I chose to do this is to first iterate the bytes buffer and count the number of elements (labels), then allocate a slice with that capacity and offload the actual decoding into the structs to the generated protobuf implementation.
A longer-term option is to write a protogen plugin that will generate the code.
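A simplified sketch of that two-pass idea is below. This is not the copied generated code from the PR; the PreallocatingMetric name is a stand-in, and the assumption that labels are field 1 of mimirpb.Metric is mine. It uses encoding/binary and fmt.

```go
// PreallocatingMetric is a hypothetical wrapper around the generated type
// (the PR uses a copied variant of the generated code; names here are illustrative).
type PreallocatingMetric struct {
	mimirpb.Metric
}

// countRepeatedField walks the protobuf wire format and counts occurrences of
// fieldNum without decoding any values.
func countRepeatedField(data []byte, fieldNum uint64) (int, error) {
	count := 0
	for len(data) > 0 {
		key, n := binary.Uvarint(data)
		if n <= 0 {
			return 0, fmt.Errorf("invalid field key")
		}
		data = data[n:]
		if key>>3 == fieldNum {
			count++
		}
		switch key & 0x7 {
		case 0: // varint
			_, m := binary.Uvarint(data)
			if m <= 0 {
				return 0, fmt.Errorf("invalid varint value")
			}
			data = data[m:]
		case 1: // 64-bit
			if len(data) < 8 {
				return 0, fmt.Errorf("truncated 64-bit value")
			}
			data = data[8:]
		case 2: // length-delimited (each label is a nested message)
			l, m := binary.Uvarint(data)
			if m <= 0 || l > uint64(len(data)-m) {
				return 0, fmt.Errorf("invalid length")
			}
			data = data[uint64(m)+l:]
		case 5: // 32-bit
			if len(data) < 4 {
				return 0, fmt.Errorf("truncated 32-bit value")
			}
			data = data[4:]
		default:
			return 0, fmt.Errorf("unsupported wire type %d", key&0x7)
		}
	}
	return count, nil
}

// Unmarshal counts the label fields, preallocates the slice with that
// capacity, and then offloads the actual decoding to the generated
// mimirpb.Metric code, which appends into the preallocated slice.
func (m *PreallocatingMetric) Unmarshal(data []byte) error {
	// Assumption: labels are field 1 of mimirpb.Metric.
	if n, err := countRepeatedField(data, 1); err == nil && n > 0 {
		m.Labels = make([]mimirpb.LabelAdapter, 0, n)
	}
	return m.Metric.Unmarshal(data)
}
```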
Writing a solution with generics was non-trivial because the struct that we marshal and send, mimirpb.LabelAdapter, is not what implements proto.Marshaler and proto.Unmarshaler - *mimirpb.LabelAdapter does.
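To illustrate the difficulty: the usual workaround is a second type parameter constrained to the pointer type that carries the Unmarshal method, along the lines of this sketch (not code from this PR):

```go
// unmarshalSlice decodes each encoded message into a freshly allocated T,
// using the constraint PT = *T to reach the pointer-receiver Unmarshal method.
func unmarshalSlice[T any, PT interface {
	*T
	Unmarshal([]byte) error
}](encoded [][]byte) ([]T, error) {
	out := make([]T, len(encoded))
	for i, buf := range encoded {
		if err := PT(&out[i]).Unmarshal(buf); err != nil {
			return nil, err
		}
	}
	return out, nil
}

// Example (assuming *mimirpb.LabelAdapter implements Unmarshal, as noted above):
//   adapters, err := unmarshalSlice[mimirpb.LabelAdapter](rawMessages)
```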