Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metadata): introduce a separate split interval for recent query window #11897

Merged
merged 9 commits into from Feb 14, 2024

Conversation

ashwanthgoli
Copy link
Contributor

@ashwanthgoli ashwanthgoli commented Feb 8, 2024

What this PR does / why we need it:

Metadata queries are split using a 24h interval by default. The same interval gets used in the cache key.
But it is not possible to extract a subset of labels/series from a cached extent, unlike samples they are not associated with a timestamp. To prevent short queries from returning the results of an entire 24h extent, caching is disabled for the last 24h by default using max_metadata_cache_freshness.

But we have noticed in our cloud environments that most queries fall in the last 24h interval. This pr introduces the following changes to help cache the recent metadata query results:

  • Use a smaller split interval for caching recent metadata queries. Portion of the query within recent_metadata_query_window is split using a different interval of split_recent_metadata_queries_by_interval.
  • Only use extents that are entirely within the requested metadata query range. This is done to avoid using extents that contain results from the outside of the requested range.

To use a shorter split interval for recent metadata queries, the following needs to be configured:

  • recent_metadata_query_window to configure the window inside which the shorter split interval gets applied. Disabled by default.
  • split_recent_metadata_queries_by_interval to configure the split interval for the portion of the query within the recent metadata query window. Defaults to 1h. Recommended to be configured to a value smaller than split_metadata_queries_by_interval.
  • reduce max_metadata_cache_freshness to control the cache freshness

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 8, 2024
@ashwanthgoli ashwanthgoli changed the title metadata: introduce recent metadata split window metadata: Introduce a separate split interval for recent query window Feb 8, 2024
@ashwanthgoli ashwanthgoli changed the title metadata: Introduce a separate split interval for recent query window feat(metadata): introduce a separate split interval for recent query window Feb 8, 2024
Copy link
Collaborator

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good 👍 nice job @ashwanthgoli

Left few minor comments. Two other things

  1. I see you have some tests to lock the behaviour of returning "correct" split either inside or outside recent metadata query window. But not enough tests to check the results of the split I think?. Asking because I'm curious say for smaller recent window (say 30m), if a query comes asking for 5m of labels, how big returned data time range would be 1h, 1w?
  2. Like to see what others think of having "second" window to split metadata queries before merging it.

# `split_recent_metadata_queries_by_interval`. The value 0 disables using a
# different split interval for recent metadata queries.
# CLI flag: -experimental.querier.recent-metadata-query-window
[recent_metadata_query_window: <duration> | default = 0s]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment does pretty good job of explaining "what" it does. IMHO, we should also explains "why" you need this flag in the first place?

Say if you have metadata queries coming into the system falls into some sort of more than 1 bucket. Say 1. >1h 2. <30m, Then it's preferable to have two ways of splitting the metadata queries. Normal splitting with 1h for all queries > 1hr and split by say 1m for all queries <=30m. Something similar.

}

// if the query start is not before window start, it would be split using recentMetadataQuerySplitInterval
if windowStart := ref.Add(-recentMetadataQueryWindow); !start.Before(windowStart) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: !start.Before(windowStart) -> windowStart.After(start). Seems bit more intuitive IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to use !before because we also have to consider the case where start aligns with the recentMetadataQueryWindow start.


// move the ingester splits to the end to maintain correct order
reqs = append(reqs, ingesterSplits...)
// move the ingester/"recent metadata" splits to the end to maintain correct order
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

if endTimeInclusive {
end = end.Add(-util.SplitGap)
}

// query only overlaps ingester query window, nothing more to do
// query only overlaps ingester/"recent metadata" query window, nothing more to do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: / always confuses me if it's "or" or "and". Can we be more specific?

)

start, end, needsIngesterSplits := ingesterQueryBounds(execTime, s.iqo, req)
switch req.(type) {
case *LokiSeriesRequest, *LabelRequest:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why aren't we considering split_ingester_queries_by_interval for series and labels queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel these features have orthogonal goals.

split_ingester_queries_by_interval is introduced to reduce the number of sub-queries we send to ingesters. currently metadata queries send atmost 1 subquery to ingesters with the default split interval.

while split_recent_metadata_queries_by_interval would increase the number of sub-queries, they should ideally deamplify over time as the results get cached.

I am not sure if we'd use these two together for metadata queries and what the config would look like if we do so.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. can you record this somewhere as comment for the future reference?

@ashwanthgoli ashwanthgoli marked this pull request as ready for review February 9, 2024 12:45
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner February 9, 2024 12:45
@slim-bean
Copy link
Collaborator

Haven't looked to closely at the implementation, but the idea makes good sense to me.

First thought, does this need to be configurable? What circumstances would we ever change these configs?

@ashwanthgoli
Copy link
Contributor Author

ashwanthgoli commented Feb 9, 2024

First thought, does this need to be configurable? What circumstances would we ever change these configs?

mostly recent_metadata_query_window will be left at 24h unless the orig split interval is changed. Having this configurable also allows disabling recent metadata splits if it's not working as expected.

split_recent_metadata_queries_by_interval would be useful to create even smaller cache buckets, smaller the bucket the higher the chance of a hit. To start with 1h seems like a good default to me.

Copy link
Collaborator

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just one minor nit.

Comment on lines +22 to +25
// metadataSplitIntervalForTimeRange returns split interval for series and label requests.
// If `recent_metadata_query_window` is configured and the query start interval is within this window,
// it returns `split_recent_metadata_queries_by_interval`.
// For other cases, the default split interval of `split_metadata_queries_by_interval` will be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think your code is pretty simple and self-documenting; all this comment can do is go stale.

@@ -334,6 +336,25 @@ func (s ResultsCache) partition(req Request, extents []Extent) ([]Request, []Res
continue
}

if s.onlyUseEntireExtent && (start > extent.GetStart() || end < extent.GetEnd()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome 👏

@ashwanthgoli ashwanthgoli merged commit 9e7725b into main Feb 14, 2024
9 checks passed
@ashwanthgoli ashwanthgoli deleted the ashwanth/recent-metadata-splits branch February 14, 2024 11:16
grafanabot pushed a commit that referenced this pull request Feb 14, 2024
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport k189 size/XXL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants