feat(metadata): introduce a separate split interval for recent query window #11897
Conversation
Overall looks good 👍 nice job @ashwanthgoli

Left a few minor comments. Two other things:
- I see you have some tests to lock in the behaviour of returning the "correct" split either inside or outside the recent metadata query window, but not enough tests to check the results of the split, I think? Asking because I'm curious: say for a smaller recent window (say 30m), if a query comes asking for `5m` of labels, how big would the returned data time range be: `1h`, `1w`?
- I'd like to see what others think of having a "second" window to split metadata queries before merging it.
```yaml
# `split_recent_metadata_queries_by_interval`. The value 0 disables using a
# different split interval for recent metadata queries.
# CLI flag: -experimental.querier.recent-metadata-query-window
[recent_metadata_query_window: <duration> | default = 0s]
```
The comment does a pretty good job of explaining "what" it does. IMHO, we should also explain "why" you need this flag in the first place.

Say the metadata queries coming into the system fall into more than one bucket, say 1. `>1h` 2. `<=30m`. Then it's preferable to have two ways of splitting the metadata queries: normal splitting with `1h` for all queries `>1h`, and splitting by say `1m` for all queries `<=30m`. Something similar.
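To make the two-bucket idea concrete, here is a hedged sketch of what such a limits configuration might look like. The values are illustrative only, and the exact placement of the new options follows the config block quoted above, not verified production settings:

```yaml
limits_config:
  # default split for metadata queries outside the recent window
  split_metadata_queries_by_interval: 1h
  # queries starting inside the last 30m get a finer split
  recent_metadata_query_window: 30m
  split_recent_metadata_queries_by_interval: 1m
```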
```go
}

// if the query start is not before window start, it would be split using recentMetadataQuerySplitInterval
if windowStart := ref.Add(-recentMetadataQueryWindow); !start.Before(windowStart) {
```
nit: `!start.Before(windowStart)` -> `windowStart.After(start)`. Seems a bit more intuitive IMO.
Had to use `!Before` because we also have to consider the case where `start` aligns with the `recentMetadataQueryWindow` start.
pkg/querier/queryrange/splitters.go (outdated):

```diff
- // move the ingester splits to the end to maintain correct order
+ // move the ingester/"recent metadata" splits to the end to maintain correct order
  reqs = append(reqs, ingesterSplits...)
```
same here
pkg/querier/queryrange/splitters.go (outdated):

```diff
  if endTimeInclusive {
  	end = end.Add(-util.SplitGap)
  }

- // query only overlaps ingester query window, nothing more to do
+ // query only overlaps ingester/"recent metadata" query window, nothing more to do
```
nit: `/` always confuses me as to whether it means "or" or "and". Can we be more specific?
```go
)

start, end, needsIngesterSplits := ingesterQueryBounds(execTime, s.iqo, req)
switch req.(type) {
case *LokiSeriesRequest, *LabelRequest:
```
Curious: why aren't we considering `split_ingester_queries_by_interval` for series and labels queries?
I feel these features have orthogonal goals.

`split_ingester_queries_by_interval` is introduced to reduce the number of sub-queries we send to ingesters; currently metadata queries send at most 1 sub-query to ingesters with the default split interval. While `split_recent_metadata_queries_by_interval` would increase the number of sub-queries, they should ideally deamplify over time as the results get cached.

I am not sure if we'd use these two together for metadata queries, or what the config would look like if we did.
Makes sense. Can you record this somewhere as a comment for future reference?
Haven't looked too closely at the implementation, but the idea makes good sense to me. First thought: does this need to be configurable? What circumstances would we ever change these configs?

mostly
LGTM 👍
Great work! Just one minor nit.
```go
// metadataSplitIntervalForTimeRange returns split interval for series and label requests.
// If `recent_metadata_query_window` is configured and the query start interval is within this window,
// it returns `split_recent_metadata_queries_by_interval`.
// For other cases, the default split interval of `split_metadata_queries_by_interval` will be used.
```
I actually think your code is pretty simple and self-documenting; all this comment can do is go stale.
```diff
@@ -334,6 +336,25 @@ func (s ResultsCache) partition(req Request, extents []Extent) ([]Request, []Res
 		continue
 	}

+	if s.onlyUseEntireExtent && (start > extent.GetStart() || end < extent.GetEnd()) {
```
This is awesome 👏
What this PR does / why we need it:

Metadata queries are split using a 24h interval by default, and the same interval gets used in the cache key. But it is not possible to extract a subset of labels/series from a cached extent; unlike samples, they are not associated with a timestamp. To prevent short queries from returning the results of an entire `24h` extent, caching is disabled for the last 24h by default using `max_metadata_cache_freshness`.

But we have noticed in our cloud environments that most queries fall in the last 24h interval. This PR introduces the following change to help cache the recent metadata query results: the portion of the query within `recent_metadata_query_window` is split using a different interval of `split_recent_metadata_queries_by_interval`.

To use a shorter split interval for recent metadata queries, the following needs to be configured:
- `recent_metadata_query_window` to configure the window inside which the shorter split interval gets applied. Disabled by default.
- `split_recent_metadata_queries_by_interval` to configure the split interval for the portion of the query within the recent metadata query window. Defaults to `1h`. Recommended to be configured to a value smaller than `split_metadata_queries_by_interval`.
- `max_metadata_cache_freshness` to control the cache freshness.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
- [ ] Reviewed the `CONTRIBUTING.md` guide (required)
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add the `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes, bump the version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. Example PR
- [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. Example PR