results cache: add new stat `query_length_served` to measure cache effectiveness #11589

ashwanthgoli · 2024-01-05T10:42:14Z

What this PR does / why we need it:
The cache hit rate currently measured using metrics and stats does not account for the fact that a cache hit could return partial results. When we query the cache for a key, we get back a list of extents, and these need not cover the entire (split) range of the cache key.

This PR adds a new cache stat called `query_length_served` to better measure cache effectiveness.

query_length - sum(length of downstream queries)

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

- [ ] Reviewed the `CONTRIBUTING.md` guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
  - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. Example PR
- [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. Example PR

github-actions · 2024-01-05T10:45:31Z

Trivy scan found the following vulnerabilities:

HIGH, Target: docker.io/grafana/loki:main-6897056 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
HIGH, Target: docker.io/grafana/loki:main-6897056 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0
To see more details on these vulnerabilities, and how/where to fix them, please run `docker build -t grafana/loki:main-6897056 -f cmd/loki/Dockerfile .` followed by `trivy i grafana/loki:main-6897056` on your branch. If these were not introduced by your PR, please consider fixing them via a subsequent PR. Thanks!

dannykopping

Trying to understand: how would this be used to measure cache effectiveness? Would we use this field and perform some arithmetic against the query length to determine the overall "hit rate"?

On its own it'll be difficult to use as a measure of cache effectiveness, I feel.

ashwanthgoli · 2024-01-05T11:22:36Z

> Would we use this field and perform some arithmetic against the query length to determine the overall "hit rate"?

Yes @dannykopping, this is what I was thinking. It just logs the actual number, the same as we do with other stats like timings.

dannykopping · 2024-01-05T11:25:01Z

Gotcha, ok. Can we even do arithmetic on durations in this way?

ashwanthgoli · 2024-01-05T11:34:44Z

not pretty, but that should do it I think

divf (.cache_label_results_query_length_served | duration) (.length | duration)

dannykopping · 2024-01-05T11:38:57Z

> not pretty, but that should do it I think
> divf (.cache_label_results_query_length_served | duration) (.length | duration)

Seems plausible, can you confirm before we merge?

dannykopping

LGTM

results cache: add query_length_served

e9593c4

pull-request-size bot added the size/L label Jan 5, 2024

add changelog

6342932

ashwanthgoli changed the title ~~results cache: add query_length_served~~ results cache: add new stat query_length_served to measure cache effectiveness Jan 5, 2024

ashwanthgoli marked this pull request as ready for review January 5, 2024 10:45

ashwanthgoli requested a review from a team as a code owner January 5, 2024 10:45

dannykopping reviewed Jan 5, 2024

fix lint issues

b6e64e1

dannykopping approved these changes Jan 5, 2024

ashwanthgoli merged commit 852becf into main Jan 5, 2024
8 checks passed

ashwanthgoli deleted the ashwanth/cache-hit-rate branch January 5, 2024 13:45
