Make `query-ingesters-within` per-tenant #4287

Conversation
Force-pushed from 757bbda to d287ba7
I haven't finished writing additional tests yet, but it's ready for a quick review to check I'm on the right track. I'd recommend going through each commit separately because there are quite a few places where `query-ingesters-within` is used and had to be updated.
Force-pushed from d287ba7 to ad92c17
pkg/querier/querier.go (outdated)

```diff
 func (cfg *Config) Validate(limits validation.Limits) error {
 	// Ensure the config wont create a situation where no queriers are returned.
-	if cfg.QueryIngestersWithin != 0 && cfg.QueryStoreAfter != 0 {
-		if cfg.QueryStoreAfter >= cfg.QueryIngestersWithin {
+	if limits.QueryIngestersWithin != 0 && cfg.QueryStoreAfter != 0 {
+		if cfg.QueryStoreAfter >= time.Duration(limits.QueryIngestersWithin) {
 			return errBadLookbackConfigs
 		}
 	}
```
I'm unsure that this is how the `QueryIngestersWithin` limit should be validated. This will only validate the default limit once at startup. If there's a tenant override, it won't be validated and we might have a gap where we don't query from anywhere.
I did it this way as it's consistent with how limits are validated for other components, e.g. the ruler (lines 120 to 129 in 5ac2d26):

```go
func (cfg *Config) Validate(limits validation.Limits, log log.Logger) error {
	if limits.RulerTenantShardSize < 0 {
		return errInvalidTenantShardSize
	}
	if err := cfg.ClientTLSConfig.Validate(log); err != nil {
		return errors.Wrap(err, "invalid ruler gRPC client config")
	}
	return nil
}
```
I could also add validation to the place where we read updated tenant overrides. This would require passing the global `query-store-after` config to the read tenant overrides method. This is doable but would require some refactoring to do cleanly.
I believe doing the actual validation and erroring when reading the override, as you say, would be best, but here are some ideas/alternatives if that gets tricky:

- Enforce and validate that `limits.QueryIngestersWithin` <= `limits[user].QueryIngestersWithin` (supposing that's easier to validate)
- If such a gap does exist and a query that falls into it arrives, log an error and query all (or none) of the sources (I don't really know the performance impact that querying all sources can have)
@pstibrany do you have any thoughts on this?
The problem with checking `limits[user].QueryIngestersWithin` against `cfg.QueryStoreAfter` during query time is that the user can't really do much about it... we need to let operators know.

Perhaps the best way to achieve that is to return 500, so that it counts towards the error budget SLO and the user doesn't accidentally get wrong results if `QueryStoreAfter >= QueryIngestersWithin` (because it means there is a gap that we don't query).
An alternative may be moving this validation to the Limits and then not updating the limits if this validation fails. We have an alert firing if the runtime config is invalid (you may push an invalid YAML, for example).
I've gone for Marco's suggestion, adding a validation check that runs every time the runtime config is loaded: 9121057.

The load function will now call a `ValidateLimits()` function for the mimir `Config` struct, so each tenant's limits are validated. The approach I went for is a bit more generic than necessary for just `QueryIngestersWithin`. That's because there are other limits (e.g. `limits.RulerTenantShardSize`) that are currently only validated on startup but actually should be validated every time configs are loaded. I think we can update those in a future PR.
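The "reject the new runtime config and keep serving the old one" behavior can be sketched roughly like this (simplified, hypothetical types; the real flow goes through Mimir's runtime-config loader and fires an alert when a load fails):

```go
package main

import "fmt"

// limits and runtimeConfig are hypothetical stand-ins for the real structs.
type limits struct{ queryIngestersWithinHours int }

type runtimeConfig struct{ tenants map[string]limits }

// validate is a stand-in for the per-tenant ValidateLimits() pass that runs
// on every runtime-config load.
func (c runtimeConfig) validate(queryStoreAfterHours int) error {
	for tenant, l := range c.tenants {
		if l.queryIngestersWithinHours != 0 && queryStoreAfterHours >= l.queryIngestersWithinHours {
			return fmt.Errorf("invalid limits for tenant %q", tenant)
		}
	}
	return nil
}

// reload keeps the previous config active when the proposed one fails
// validation; the returned error is what an "invalid runtime config" alert
// would be driven by.
func reload(current, proposed runtimeConfig, queryStoreAfterHours int) (runtimeConfig, error) {
	if err := proposed.validate(queryStoreAfterHours); err != nil {
		return current, fmt.Errorf("runtime config rejected: %w", err)
	}
	return proposed, nil
}

func main() {
	current := runtimeConfig{tenants: map[string]limits{"a": {13}}}
	bad := runtimeConfig{tenants: map[string]limits{"a": {6}}}
	active, err := reload(current, bad, 12)
	fmt.Println(err != nil, active.tenants["a"].queryIngestersWithinHours) // true 13
}
```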
```diff
@@ -38,9 +38,8 @@ import (
 // Config contains the configuration require to create a querier
 type Config struct {
 	Iterators      bool `yaml:"iterators" category:"advanced"`
 	BatchIterators bool `yaml:"batch_iterators" category:"advanced"`
-	QueryIngestersWithin time.Duration `yaml:"query_ingesters_within" category:"advanced"`
```
Unfortunately we need to preserve this YAML field for another two Mimir releases before removing it.

We can hide it by using the `doc:"hidden"` tag. If we find that it has a non-default value, we can use it to set the default limits (in `initRuntimeConfig` in `modules.go`).
Updated in 3edc844, is this what you're expecting? I had to do something a little ugly in `RegisterFlags()` to make sure the default value was set before the YAML is parsed.
pkg/distributor/query.go (outdated)

```go
// is non-zero since otherwise we can't determine if an ingester should be part
// of a tenant's shuffle sharding subring (we compare its registration time with
// the lookback period).
if shardSize > 0 && d.cfg.ShuffleShardingIngestersEnabled && queryIngestersWithin > 0 {
```
I don't think we need the `ShuffleShardingIngestersEnabled` flag -- we know that shuffle sharding is enabled if `shardSize > 0`.

Previous use of `QueryIngestersWithin` to set `ShuffleShardingLookbackPeriod` was based on in-order samples. I think the situation is not so clear with OOO samples -- if the OOO window is, say, 1 week, and we also set `QueryIngestersWithin` to one week, we don't necessarily want to set `ShuffleShardingLookbackPeriod` to the same time -- that would effectively disable shuffle sharding on the read path in our setup (with weekly rollouts).

I don't have a good idea how to solve this. Perhaps @pracucci has?
In my opinion, we should not use a lookback period that's longer than the block retention time on ingesters. So that can be our upper limit.
This issue wasn't introduced by this change, right? It's always been an issue with increasing query-ingesters-within to support longer OOO windows.

Would OOO samples be queryable if shuffle sharding is enabled and the lookback period is set to the block retention time, since the block retention time is probably lower than the OOO time window?
Right, the issue is not new to this PR, but the situation was simpler when using a single global value.

For example, let's say that the shuffle shard is 3, but one of those three ingesters has joined recently (within the lookback period). It means that before this ingester joined, there was another ingester that was part of the shuffle shard.

The lookback period is used to find such situations -- in this case, we would look for one extra ingester to make it part of the shuffle shard on the query path.

Since ingesters delete blocks after the retention period [*], no ingester has blocks older than the retention period, so I don't think we need to use a lookback longer than that.

[*] The deadline for block deletion is computed from when the block was shipped, or, if shipping is disabled, from when the block was created. A block cannot be shipped before creation, and the time of creation is ~3h after the block's MinTime.
I agree with Peter that we need to address it. Let me wrap my mind around it and I will follow up with my feedback.

> that would effectively disable shuffle sharding on read path in our setup (with weekly rollouts).

Ingesters don't leave the ring during rollouts, so "with weekly rollouts" doesn't apply, but your point is valid and true anyway, and we need to address it.
I agree with Peter. My take:

- We still need `ShuffleShardingIngestersEnabled` because shuffle sharding can be selectively enabled on the read and write paths independently. The value of `ShuffleShardingIngestersEnabled` here should be the one from the querier's config.
- Using the ingesters' TSDB retention instead of "query ingesters within" as the lookback period is a good strategy.
Updated the shuffle sharding lookback logic and documentation in https://github.com/grafana/mimir/pull/4287/files/25b204a8e7b27c2900b12fe59159b6bcf4e097fe..0690e96be924614d88cf70e8371cece9b05423fc
Force-pushed from 3656380 to a63a36c
Thanks, I think this is on the right track.

@pracucci, as a shuffle-sharding expert, would you please take a look at the lookback usage in distributor/query.go and check whether my suggestion makes sense?
pkg/mimir/modules.go (outdated)

```go
if t.Cfg.Querier.QueryIngestersWithin != querier.DefaultQuerierCfgQueryIngestersWithin {
	t.Cfg.LimitsConfig.QueryIngestersWithin = model.Duration(t.Cfg.Querier.QueryIngestersWithin)
}
```
We need to do this before line 228, which actually sets default values for limits (used during YAML unmarshalling).
Fixed in 8589ece
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase.
Good job! I haven't accurately looked at every change; I've just focused on the shuffle sharding discussion (where I've commented). I'll leave Peter to do the accurate review.

While reading the changes, I noticed a few random things (it's a draft PR, so I assume you would have done these anyway, but I'm mentioning them so they're not forgotten):

- Need a CHANGELOG entry
- The `-querier.query-ingesters-within` flag is referenced multiple times in `docs/` with regard to its impact on shuffle sharding. We should update the docs accordingly after this change.
- All references to "remove in Mimir 2.9.0" should change to "remove in Mimir 2.10.0" because we haven't made this PR in time for 2.7, so the change will be announced in 2.8.
I've updated the comments to have placeholder
Force-pushed from 8589ece to 25b204a
pkg/querier/distributor_queryable.go (outdated)

```go
// queryIngestersWithin might have changed since d.UseQueryable() was called, so we check it again when creating a querier
if !d.useQueryable(now, mint, maxt, queryIngestersWithin) {
	return storage.NoopQuerier(), nil
}
```
You can remove this check. There's no practical value in doing it. We get a Queryable for every query.
It solves a possible race: `queryIngestersWithin` may be different from when `d.UseQueryable()` was called, and now the querier is no longer needed. Rather than querying ingesters, we return an empty querier.

> We get a Queryable for every query.

There is only one long-lived queryable, which returns a new Querier for each query.
I meant the `storage.QueryableFunc()`, which is called for each query.
> It solves possible race: queryIngestersWithin may be different from when d.UseQueryable() was called, and now querier is no longer needed. Rather than querying ingesters, we return empty querier.

How long is the time window of this race condition?
Discussed on Slack with @pstibrany. Now I see the race condition. We discussed options to remove the double check and only have 1. He will follow up with a suggestion.
> We discussed options to remove the double check and only have 1.

The idea is simple: we can make `distributorQueryable.UseQueryable` always return `true` and only have a check in `distributorQueryable.Querier`.

As discussed privately, we may want to simplify this code a bit as well... get rid of `UseQueryable` and move the checks into `Querier` methods, but that's a topic for another PR.
Updated in 45f624c
Follow-up: removal of `UseQueryable` and support for multiple store-queryables: #4478
I won't be available for the next two weeks; after that I'll keep working on the PR. The main tasks remaining are: making sure the shuffle sharding lookback and
docs/sources/mimir/operators-guide/architecture/components/querier.md (outdated)

> - greater than the estimated minimum amount of time for the oldest samples stored in a block uploaded by ingester to be discovered and available for querying. When running Grafana Mimir with the default configuration, the estimated minimum amount of time for the oldest sample in an uploaded block to be available for querying is `3h`.
Where does this 3h come from?
Why do we talk about "oldest samples in the block"? All samples in the block become queryable at the same time, don't they?
I copied this over from another part of the documentation. I'm not sure about the 3h value.

Samples are queryable at the same time. I think the point about the oldest sample in the block is that you have to wait longer if the sample you've ingested is the oldest sample in the block, i.e. it explains the maximum wait time for any ingested sample.
pkg/distributor/distributor.go (outdated)

```go
ShuffleShardingLookbackPeriod time.Duration `yaml:"-"`
// The two fields below are dynamically injected from the querier and ingester configs
ShuffleShardingIngestersEnabled bool          `yaml:"-"`
IngesterTSDBRetention           time.Duration `yaml:"-"`
```
I would suggest keeping the original `ShuffleShardingLookbackPeriod` name, to express how the field is used, not where the value comes from.
pkg/distributor/distributor_test.go (outdated)

```diff
@@ -5255,6 +5256,8 @@ func TestSeriesAreShardedToCorrectIngesters(t *testing.T) {
 	assert.Equal(t, series, totalMetadata) // each series has unique metric name, and each metric name gets metadata
 }

+// TODO: test per-tenant query lookback
```
What is the plan for this TODO?
Removed the `TODO` since I wasn't changing anything in `distributor.go` anymore.
pkg/mimir/modules.go (outdated)

```go
// QueryIngestersWithin is moving from a global config that can in the querier yaml to a limit config
// We need to preserve the option in the querier yaml for two releases
// If the querier config is configured by the user, the default limit is overwritten
// TODO: Remove in Mimir 2.XX.0
```
If this is released in 2.8.0, we can remove it in 2.10.0.
pkg/distributor/distributor.go (outdated)

```go
// This config is dynamically injected because it is defined in the querier config.
ShuffleShardingLookbackPeriod time.Duration `yaml:"-"`
// The two fields below are dynamically injected from the querier and ingester configs
ShuffleShardingIngestersEnabled bool `yaml:"-"`
```
This name is too generic and may be confusing -- it looks like it may affect the write path too. Let's give it a more specific name, e.g. `ShuffleShardingIngestersOnReadPathEnabled` (or something better).

Alternatively we can keep the original logic and simply set `ShuffleShardingLookbackPeriod` to 0 if shuffle sharding of ingesters on the read path is not enabled.
Updated - thanks for the suggestion, much cleaner this way :)
```diff
@@ -233,6 +235,8 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
 	f.IntVar(&l.QueryShardingTotalShards, "query-frontend.query-sharding-total-shards", 16, "The amount of shards to use when doing parallelisation via query sharding by tenant. 0 to disable query sharding for tenant. Query sharding implementation will adjust the number of query shards based on compactor shards. This allows querier to not search the blocks which cannot possibly have the series for given query shard.")
 	f.IntVar(&l.QueryShardingMaxShardedQueries, "query-frontend.query-sharding-max-sharded-queries", 128, "The max number of sharded queries that can be run for a given received query. 0 to disable limit.")
 	f.Var(&l.SplitInstantQueriesByInterval, "query-frontend.split-instant-queries-by-interval", "Split instant queries by an interval and execute in parallel. 0 to disable it.")
+	_ = l.QueryIngestersWithin.Set("13h")
+	f.Var(&l.QueryIngestersWithin, QueryIngestersWithinFlag, "Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester.")
```
"0 means all queries are sent to ingester." -- is this still true?
Yes. We will still send all queries to ingesters. Changing the lookback period to use the TSDB retention (#4287 (comment)) just changes the subset of ingesters that are queried.
Force-pushed from 07c8804 to e85e34c

Force-pushed from fab41e0 to 9121057
Apologies for the long delay, but this PR is ready to review again. I think I've covered all the comments and taken it out of draft mode. Since it has been a while, I have updated the issue description with a recap of some of the major discussions we've had.
Thank you for all your latest changes. The PR looks good to me. Before merging, would you please rebase it on top of the current `main`, move the changelog entries to the unreleased / `main` section, and update references mentioning Mimir 2.8.0 as the release introducing this feature to 2.9.0 (and similarly update 2.10.0, the release when we can remove some deprecated options, to 2.11.0)?
distributor.go is no longer modified by this PR so the TODO to add tests here is not needed anymore.
The field previously used was set by tsdbIngesterConfig() rather than via YAML/command line.
Previous approach ignored per-tenant limits and only validated against the default limits when the application was starting up. New approach validates query-ingesters-within every time a config is loaded at runtime.
Force-pushed from 01d680f to bcdc027
Thank you, great work!
This reverts commit 5e36d6d.
The `-querier.query-ingesters-within` config has been moved from a global config to a per-tenant limit config. See #4287 for more details.
* Remove -querier.query-ingesters-within config: the `-querier.query-ingesters-within` config has been moved from a global config to a per-tenant limit config. See #4287 for more details.
* Remove -querier.iterators and -querier.batch-iterators: the `-querier.iterators` and `-querier.batch-iterators` configuration parameters have been removed. See #5114 for more details.
* Remove deprecated bucket store flags. The following deprecated flags are removed:
  - `-blocks-storage.bucket-store.max-chunk-pool-bytes`
  - `-blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes`
  - `-blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes`
  See #4996 for more details.
* Remove -blocks-storage.bucket-store.bucket-index.enabled config: this configuration parameter has been removed. Mimir has run with the bucket index enabled by default since 2.0 and it is now not possible to disable it. See #5051 for more details.
* Update CHANGELOG.md
What this PR does

For OOO time windows longer than 13h, `query-ingesters-within` will need to be extended from its default value, otherwise we wouldn't query older OOO samples until they're uploaded to storage. This is a global flag right now. This PR updates `query-ingesters-within` so it can be configured to different values for each tenant, so if there are tenants with different OOO requirements, we can avoid querying ingesters unnecessarily for tenants with low OOO time windows.

Additionally:

- The shuffle sharding lookback period now uses `-blocks-storage.tsdb.retention-period`. The lookback period was previously set to `-querier.query-ingesters-within`. With long OOO periods (weeks, months, etc.), `-querier.query-ingesters-within` needs to be long enough that newly ingested OOO samples can be queried. However, ingesters only contain samples that have been written in the last `-blocks-storage.tsdb.retention-period`, so there's no point in using a lookback covering the entirety of the `-querier.query-ingesters-within` period. See comment thread: Make `query-ingesters-within` per-tenant #4287 (comment)
- If `query-ingesters-within` changes for a tenant but turns out to be invalid, we don't apply the new config. See comment: Make `query-ingesters-within` per-tenant #4287 (comment)

Which issue(s) this PR fixes or relates to
Fixes #
Checklist

- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`
- `Remove in Mimir 2.XX.0` references updated to the correct version (should be two after the version this PR is merged into)