Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metadata cache): adds max_metadata_cache_freshness #11682

Merged
merged 5 commits into from Jan 16, 2024

Conversation

ashwanthgoli
Copy link
Contributor

@ashwanthgoli ashwanthgoli commented Jan 16, 2024

What this PR does / why we need it:
Adds max_metadata_cache_freshness to limit the metadata requests that get cached. When configured, only metadata requests with end time before now - max_metadata_cache_freshness are cacheable.

reason for setting the default to 24h?
metric results cache can extract samples for the desired time range from an extent since the samples are associated with a timestamp. But the same is not true for metadata caching, it is not possible to extract a subset of labels/series from a cached extent. As a result, we could return inaccurate results, more that what was requested. for ex: returning results from an entire 1h extent for a 5m query

Setting max_metadata_cache_freshness to 24h should help us avoid caching recent data. For anything older, we would report cached metadata results at a granularity controlled by split_metadata_queries_by_interval

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jan 16, 2024
@ashwanthgoli ashwanthgoli marked this pull request as ready for review January 16, 2024 07:59
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner January 16, 2024 07:59
Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. could you please also update the changelog


cacheReq, err := shouldCacheMetadataReq(ctx, r, limits)
if err != nil {
level.Error(logger).Log("msg", "failed to determine if metadata request should be cached. Won't cache", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
level.Error(logger).Log("msg", "failed to determine if metadata request should be cached. Won't cache", "err", err)
level.Error(logger).Log("msg", "failed to determine if metadata request should be cached, won't cache", "err", err)

@@ -92,9 +97,33 @@ func NewSeriesCacheMiddleware(
merger,
seriesExtractor{},
cacheGenNumberLoader,
shouldCache,
func(ctx context.Context, r queryrangebase.Request) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extract this out to a function for reuse?

@@ -277,6 +278,9 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
_ = l.MaxCacheFreshness.Set("10m")
f.Var(&l.MaxCacheFreshness, "frontend.max-cache-freshness", "Most recent allowed cacheable result per-tenant, to prevent caching very recent results that might still be in flux.")

_ = l.MaxMetadataCacheFreshness.Set("24h")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth explaining the 24h here, since this aligns with the index compaction period & the index query splitting period. These are all related but since we have no relationship established it will be very tricky to know if we have modified all the correct places if we ever change these periods.

@shantanualsi
Copy link
Contributor

shantanualsi commented Jan 16, 2024

do we still update the yaml files in docker/config, production/helm and production/ksonnet when we add new configs?
also config/build_test for completeness :)

@ashwanthgoli
Copy link
Contributor Author

@shantanualsi in this case, I don't see a need since this is inside limits_config block.

but I agree this is something we haven't been playing close attention to when adding new configs.
I think we should atleast update or ensure that helm is configurable with the newly added configs, rest of them look flexible

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CHANGELOG.md Outdated
@@ -48,7 +48,8 @@
* [11545](https://github.com/grafana/loki/pull/11545) **dannykopping** Force correct memcached timeout when fetching chunks.
* [11589](https://github.com/grafana/loki/pull/11589) **ashwanthgoli** Results Cache: Adds `query_length_served` cache stat to measure the length of the query served from cache.
* [11535](https://github.com/grafana/loki/pull/11535) **dannykopping** Query Frontend: Allow customisable splitting of queries which overlap the `query_ingester_within` window to reduce query pressure on ingesters.
* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly.
* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly.
* [11682](https://github.com/grafana/loki/pull/11682) **ashwanthgoli** Metadata cache: Adds `frontend.max-metadata-cache-freshness`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this less technical and address the need for it rather than what was changed?

@ashwanthgoli ashwanthgoli enabled auto-merge (squash) January 16, 2024 10:31
@ashwanthgoli ashwanthgoli merged commit 51899b5 into main Jan 16, 2024
11 checks passed
@ashwanthgoli ashwanthgoli deleted the ashwanth/metadata-cache-freshness branch January 16, 2024 10:37
grafanabot pushed a commit that referenced this pull request Jan 16, 2024
**What this PR does / why we need it**:
Adds `max_metadata_cache_freshness` to limit the metadata requests that
get cached. When configured, only metadata requests with end time before
`now - max_metadata_cache_freshness` are cacheable.

_reason for setting the default to 24h?_
metric results cache can [extract samples for the desired time range
from an
extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78)
since the samples are associated with a timestamp. But the same is not
true for metadata caching, it is not possible to extract a subset of
labels/series from a cached extent. As a result, we could return
inaccurate results, more that what was requested. for ex: returning
results from an entire 1h extent for a 5m query

Setting `max_metadata_cache_freshness` to 24h should help us avoid
caching recent data. For anything older, we would report cached metadata
results at a granularity controlled by
`split_metadata_queries_by_interval`

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

(cherry picked from commit 51899b5)
ashwanthgoli added a commit that referenced this pull request Jan 16, 2024
Backport 51899b5 from #11682

---

**What this PR does / why we need it**:
Adds `max_metadata_cache_freshness` to limit the metadata requests that
get cached. When configured, only metadata requests with end time before
`now - max_metadata_cache_freshness` are cacheable.

_reason for setting the default to 24h?_
metric results cache can [extract samples for the desired time range
from an
extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78)
since the samples are associated with a timestamp. But the same is not
true for metadata caching, it is not possible to extract a subset of
labels/series from a cached extent. As a result, we could return
inaccurate results, more that what was requested. for ex: returning
results from an entire 1h extent for a 5m query

Setting `max_metadata_cache_freshness` to 24h should help us avoid
caching recent data. For anything older, we would report cached metadata
results at a granularity controlled by
`split_metadata_queries_by_interval`

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

Co-authored-by: Ashwanth <iamashwanth@gmail.com>
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
**What this PR does / why we need it**:
Adds `max_metadata_cache_freshness` to limit the metadata requests that
get cached. When configured, only metadata requests with end time before
`now - max_metadata_cache_freshness` are cacheable.

_reason for setting the default to 24h?_
metric results cache can [extract samples for the desired time range
from an
extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78)
since the samples are associated with a timestamp. But the same is not
true for metadata caching, it is not possible to extract a subset of
labels/series from a cached extent. As a result, we could return
inaccurate results, more that what was requested. for ex: returning
results from an entire 1h extent for a 5m query

Setting `max_metadata_cache_freshness` to 24h should help us avoid
caching recent data. For anything older, we would report cached metadata
results at a granularity controlled by
`split_metadata_queries_by_interval`

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport k185 size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants