feat(metadata cache): adds max_metadata_cache_freshness #11682

ashwanthgoli · 2024-01-16T07:47:21Z

What this PR does / why we need it:
Adds max_metadata_cache_freshness to limit the metadata requests that get cached. When configured, only metadata requests with end time before now - max_metadata_cache_freshness are cacheable.

Reason for setting the default to 24h?
The metric results cache can extract samples for the desired time range from an extent, since each sample is associated with a timestamp. The same is not true for metadata caching: it is not possible to extract a subset of labels/series from a cached extent. As a result, we could return inaccurate results, more than what was requested. For example, returning results from an entire 1h extent for a 5m query.
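To illustrate the difference, here is a minimal sketch. The types `sample` and `labelExtent` and both functions are hypothetical simplifications for this discussion, not Loki's actual cache types. Samples can be trimmed to the query range because each one carries a timestamp, while a cached label set can only be returned whole.

```go
package main

import "fmt"

// sample is a hypothetical timestamped value, standing in for an entry
// in a cached metric extent.
type sample struct {
	ts    int64 // Unix seconds
	value float64
}

// trimSamples keeps only the samples inside [start, end]. This works
// because every sample carries its own timestamp.
func trimSamples(extent []sample, start, end int64) []sample {
	var out []sample
	for _, s := range extent {
		if s.ts >= start && s.ts <= end {
			out = append(out, s)
		}
	}
	return out
}

// labelExtent is a hypothetical cached metadata result: the labels seen
// somewhere in [start, end], with no per-label timestamps.
type labelExtent struct {
	start, end int64
	labels     []string
}

// labelsFor can only return the whole cached set for an overlapping
// query; there is nothing to trim on.
func labelsFor(e labelExtent, start, end int64) []string {
	if end < e.start || start > e.end {
		return nil
	}
	return e.labels
}

func main() {
	// A 1h extent (0..3600s) queried for its last 5 minutes.
	extent := []sample{{ts: 0, value: 1}, {ts: 1800, value: 2}, {ts: 3300, value: 3}}
	fmt.Println(len(trimSamples(extent, 3300, 3600)))

	cached := labelExtent{start: 0, end: 3600, labels: []string{"app", "env", "pod"}}
	fmt.Println(labelsFor(cached, 3300, 3600))
}
```

In this sketch the 5m query gets exactly one sample back from the metric extent, but every label from the full hour of the metadata extent.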

Setting max_metadata_cache_freshness to 24h should help us avoid caching recent data. For anything older, we would report cached metadata results at a granularity controlled by split_metadata_queries_by_interval

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

- [x] Reviewed the CONTRIBUTING.md guide (required)
- [x] Documentation added
- [x] Tests updated
- [ ] CHANGELOG.md updated
- [ ] If the change is worth mentioning in the release notes, add add-to-release-notes label
- [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
- [ ] For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
- [ ] If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

vlad-diachenko

LGTM. Could you please also update the changelog?

dannykopping · 2024-01-16T08:37:18Z

pkg/querier/queryrange/labels_cache.go

+
+			cacheReq, err := shouldCacheMetadataReq(ctx, r, limits)
+			if err != nil {
+				level.Error(logger).Log("msg", "failed to determine if metadata request should be cached. Won't cache", "err", err)


Nit

Suggested change

level.Error(logger).Log("msg", "failed to determine if metadata request should be cached. Won't cache", "err", err)

level.Error(logger).Log("msg", "failed to determine if metadata request should be cached, won't cache", "err", err)

dannykopping · 2024-01-16T08:38:20Z

pkg/querier/queryrange/series_cache.go

@@ -92,9 +97,33 @@ func NewSeriesCacheMiddleware(
 		merger,
 		seriesExtractor{},
 		cacheGenNumberLoader,
-		shouldCache,
+		func(ctx context.Context, r queryrangebase.Request) bool {


Should we extract this out to a function for reuse?

dannykopping · 2024-01-16T08:42:28Z

pkg/validation/limits.go

@@ -277,6 +278,9 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
 	_ = l.MaxCacheFreshness.Set("10m")
 	f.Var(&l.MaxCacheFreshness, "frontend.max-cache-freshness", "Most recent allowed cacheable result per-tenant, to prevent caching very recent results that might still be in flux.")

+	_ = l.MaxMetadataCacheFreshness.Set("24h")


I think it's worth explaining the 24h here, since this aligns with the index compaction period & the index query splitting period. These are all related but since we have no relationship established it will be very tricky to know if we have modified all the correct places if we ever change these periods.

shantanualsi · 2024-01-16T08:46:45Z

Do we still update the YAML files in docker/config, production/helm and production/ksonnet when we add new configs?
Also config/build_test for completeness :)

ashwanthgoli · 2024-01-16T09:47:53Z

@shantanualsi in this case, I don't see a need since this is inside the limits_config block.

But I agree this is something we haven't been paying close attention to when adding new configs.
I think we should at least update Helm, or ensure that it is configurable with the newly added configs. The rest of them look flexible.

dannykopping

LGTM

dannykopping · 2024-01-16T09:50:26Z

CHANGELOG.md

@@ -48,7 +48,8 @@
 * [11545](https://github.com/grafana/loki/pull/11545) **dannykopping** Force correct memcached timeout when fetching chunks.
 * [11589](https://github.com/grafana/loki/pull/11589) **ashwanthgoli** Results Cache: Adds `query_length_served` cache stat to measure the length of the query served from cache.
 * [11535](https://github.com/grafana/loki/pull/11535) **dannykopping** Query Frontend: Allow customisable splitting of queries which overlap the `query_ingester_within` window to reduce query pressure on ingesters.
-* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly. 
+* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly.
+* [11682](https://github.com/grafana/loki/pull/11682) **ashwanthgoli** Metadata cache: Adds `frontend.max-metadata-cache-freshness`.


Can you make this less technical and address the need for it rather than what was changed?


metadata cache: add max_metadata_cache_freshness

90fa0f3

pull-request-size bot added the size/L label Jan 16, 2024

github-actions bot added the type/docs label Jan 16, 2024

make format

6e252a6

ashwanthgoli marked this pull request as ready for review January 16, 2024 07:59

ashwanthgoli requested a review from a team as a code owner January 16, 2024 07:59

vlad-diachenko approved these changes Jan 16, 2024

dannykopping reviewed Jan 16, 2024

review comments

a417d24

fixup! review comments

6b7884c

dannykopping approved these changes Jan 16, 2024

fixup! fixup! review comments

0c6490b

ashwanthgoli enabled auto-merge (squash) January 16, 2024 10:31

ashwanthgoli added the backport k185 label Jan 16, 2024

ashwanthgoli merged commit 51899b5 into main Jan 16, 2024
11 checks passed

ashwanthgoli deleted the ashwanth/metadata-cache-freshness branch January 16, 2024 10:37

grafanabot mentioned this pull request Jan 16, 2024

[k185] feat(metadata cache): adds max_metadata_cache_freshness #11683

Merged

8 tasks

loki-gh-app bot mentioned this pull request Mar 27, 2024

chore(add-major-release-workflow): release 3.0.0-rc.1 #12380

Closed
