New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(metadata cache): adds max_metadata_cache_freshness #11682
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. could you please also update the changelog
|
||
cacheReq, err := shouldCacheMetadataReq(ctx, r, limits) | ||
if err != nil { | ||
level.Error(logger).Log("msg", "failed to determine if metadata request should be cached. Won't cache", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
level.Error(logger).Log("msg", "failed to determine if metadata request should be cached. Won't cache", "err", err) | |
level.Error(logger).Log("msg", "failed to determine if metadata request should be cached, won't cache", "err", err) |
@@ -92,9 +97,33 @@ func NewSeriesCacheMiddleware( | |||
merger, | |||
seriesExtractor{}, | |||
cacheGenNumberLoader, | |||
shouldCache, | |||
func(ctx context.Context, r queryrangebase.Request) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extract this out to a function for reuse?
@@ -277,6 +278,9 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) { | |||
_ = l.MaxCacheFreshness.Set("10m") | |||
f.Var(&l.MaxCacheFreshness, "frontend.max-cache-freshness", "Most recent allowed cacheable result per-tenant, to prevent caching very recent results that might still be in flux.") | |||
|
|||
_ = l.MaxMetadataCacheFreshness.Set("24h") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth explaining the 24h here, since this aligns with the index compaction period & the index query splitting period. These are all related but since we have no relationship established it will be very tricky to know if we have modified all the correct places if we ever change these periods.
do we still update the yaml files in docker/config, production/helm and production/ksonnet when we add new configs? |
@shantanualsi in this case, I don't see a need since this is inside but I agree this is something we haven't been playing close attention to when adding new configs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CHANGELOG.md
Outdated
@@ -48,7 +48,8 @@ | |||
* [11545](https://github.com/grafana/loki/pull/11545) **dannykopping** Force correct memcached timeout when fetching chunks. | |||
* [11589](https://github.com/grafana/loki/pull/11589) **ashwanthgoli** Results Cache: Adds `query_length_served` cache stat to measure the length of the query served from cache. | |||
* [11535](https://github.com/grafana/loki/pull/11535) **dannykopping** Query Frontend: Allow customisable splitting of queries which overlap the `query_ingester_within` window to reduce query pressure on ingesters. | |||
* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly. | |||
* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly. | |||
* [11682](https://github.com/grafana/loki/pull/11682) **ashwanthgoli** Metadata cache: Adds `frontend.max-metadata-cache-freshness`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this less technical and address the need for it rather than what was changed?
**What this PR does / why we need it**: Adds `max_metadata_cache_freshness` to limit the metadata requests that get cached. When configured, only metadata requests with end time before `now - max_metadata_cache_freshness` are cacheable. _reason for setting the default to 24h?_ metric results cache can [extract samples for the desired time range from an extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78) since the samples are associated with a timestamp. But the same is not true for metadata caching, it is not possible to extract a subset of labels/series from a cached extent. As a result, we could return inaccurate results, more that what was requested. for ex: returning results from an entire 1h extent for a 5m query Setting `max_metadata_cache_freshness` to 24h should help us avoid caching recent data. For anything older, we would report cached metadata results at a granularity controlled by `split_metadata_queries_by_interval` **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [x] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](0d4416a) (cherry picked from commit 51899b5)
Backport 51899b5 from #11682 --- **What this PR does / why we need it**: Adds `max_metadata_cache_freshness` to limit the metadata requests that get cached. When configured, only metadata requests with end time before `now - max_metadata_cache_freshness` are cacheable. _reason for setting the default to 24h?_ metric results cache can [extract samples for the desired time range from an extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78) since the samples are associated with a timestamp. But the same is not true for metadata caching, it is not possible to extract a subset of labels/series from a cached extent. As a result, we could return inaccurate results, more that what was requested. for ex: returning results from an entire 1h extent for a 5m query Setting `max_metadata_cache_freshness` to 24h should help us avoid caching recent data. For anything older, we would report cached metadata results at a granularity controlled by `split_metadata_queries_by_interval` **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [x] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](0d4416a) Co-authored-by: Ashwanth <iamashwanth@gmail.com>
**What this PR does / why we need it**: Adds `max_metadata_cache_freshness` to limit the metadata requests that get cached. When configured, only metadata requests with end time before `now - max_metadata_cache_freshness` are cacheable. _reason for setting the default to 24h?_ metric results cache can [extract samples for the desired time range from an extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78) since the samples are associated with a timestamp. But the same is not true for metadata caching, it is not possible to extract a subset of labels/series from a cached extent. As a result, we could return inaccurate results, more that what was requested. for ex: returning results from an entire 1h extent for a 5m query Setting `max_metadata_cache_freshness` to 24h should help us avoid caching recent data. For anything older, we would report cached metadata results at a granularity controlled by `split_metadata_queries_by_interval` **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [x] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](grafana@d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](grafana@0d4416a)
What this PR does / why we need it:
Adds
max_metadata_cache_freshness
to limit the metadata requests that get cached. When configured, only metadata requests with end time beforenow - max_metadata_cache_freshness
are cacheable.reason for setting the default to 24h?
metric results cache can extract samples for the desired time range from an extent since the samples are associated with a timestamp. But the same is not true for metadata caching, it is not possible to extract a subset of labels/series from a cached extent. As a result, we could return inaccurate results, more that what was requested. for ex: returning results from an entire 1h extent for a 5m query
Setting
max_metadata_cache_freshness
to 24h should help us avoid caching recent data. For anything older, we would report cached metadata results at a granularity controlled bysplit_metadata_queries_by_interval
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updatedadd-to-release-notes
labeldocs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PRdeprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR