feat(caching): Support caching /series and /labels query results #11539

Merged: 27 commits, Jan 4, 2024

Commits (27)
54b42a2 feature(cache): Support caching for metadata query results (kavirajk, Dec 18, 2023)
6139807 idk (kavirajk, Dec 19, 2023)
fe12193 fix TestSeriesCache (ashwanthgoli, Dec 19, 2023)
ec99784 Complete series cache tests (kavirajk, Dec 20, 2023)
7158994 have separate loki config to run with memcached (kavirajk, Dec 20, 2023)
e4d1302 Add cache hit metrics and configs to test (kavirajk, Dec 20, 2023)
4cb7145 fix memcached config (kavirajk, Dec 20, 2023)
55a7c8a stats.proto change to support series cache (kavirajk, Dec 20, 2023)
64175cf tidy up the tests (ashwanthgoli, Dec 20, 2023)
d716461 update failing tests to include the new stats (ashwanthgoli, Dec 21, 2023)
01f4792 apply default configs to series cache (ashwanthgoli, Dec 26, 2023)
13ff271 add test for GenerateCacheKey (ashwanthgoli, Dec 26, 2023)
634b43b make doc (ashwanthgoli, Dec 26, 2023)
c654c82 Merge branch 'main' into kavirajk/metadata-caching (ashwanthgoli, Dec 26, 2023)
ed3db1a preserve cache prefix (ashwanthgoli, Dec 26, 2023)
836b9ee fixup! preserve cache prefix (ashwanthgoli, Dec 26, 2023)
dfe174f s/querier.cache-series-results/frontend.cache-series-results (ashwanthgoli, Dec 27, 2023)
ee9667f retain headers when merging series response (ashwanthgoli, Dec 28, 2023)
2cc04cf add label results cache (ashwanthgoli, Dec 29, 2023)
bb2280d make format && make doc (ashwanthgoli, Dec 29, 2023)
40f69e3 add CHANGELOG (ashwanthgoli, Dec 29, 2023)
db966bd Merge branch 'main' into kavirajk/metadata-caching (ashwanthgoli, Jan 2, 2024)
ccd4a55 introduce split_metadata_queries_by_interval (ashwanthgoli, Jan 2, 2024)
32f4d20 make doc (ashwanthgoli, Jan 2, 2024)
92a7e43 Make flags prefix consistent with rest of result cache flags (kavirajk, Jan 2, 2024)
8706162 nit (ashwanthgoli, Jan 2, 2024)
40d64ef PR remarks (kavirajk, Jan 4, 2024)
CHANGELOG.md (1 addition, 0 deletions)
@@ -42,6 +42,7 @@
* [10956](https://github.com/grafana/loki/pull/10956) **jeschkies** do not wrap requests but send pure Protobuf from frontend v2 via scheduler to querier when `-frontend.encoding=protobuf`.
* [10417](https://github.com/grafana/loki/pull/10417) **jeschkies** shard `quantile_over_time` range queries using probabilistic data structures.
* [11284](https://github.com/grafana/loki/pull/11284) **ashwanthgoli** Config: Adds `frontend.max-query-capacity` to tune per-tenant query capacity.
* [11539](https://github.com/grafana/loki/pull/11539) **kaviraj,ashwanthgoli** Support caching /series and /labels query results
* [11545](https://github.com/grafana/loki/pull/11545) **dannykopping** Force correct memcached timeout when fetching chunks.

##### Fixes
cmd/loki/loki-local-with-memcached.yaml (87 additions, 0 deletions)
@@ -0,0 +1,87 @@
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  align_queries_with_step: true
  cache_index_stats_results: true
  cache_results: true
  cache_volume_results: true
  cache_series_results: true
  series_results_cache:
    cache:
      default_validity: 12h
      memcached_client:
        consistent_hash: true
        addresses: "dns+localhost:11211"
        max_idle_conns: 16
        timeout: 500ms
        update_interval: 1m
  index_stats_results_cache:
    cache:
      default_validity: 12h
      memcached_client:
        consistent_hash: true
        addresses: "dns+localhost:11211"
        max_idle_conns: 16
        timeout: 500ms
        update_interval: 1m
  max_retries: 5
  results_cache:
    cache:
      default_validity: 12h
      memcached_client:
        consistent_hash: true
        addresses: "dns+localhost:11211"
        max_idle_conns: 16
        timeout: 500ms
        update_interval: 1m
  volume_results_cache:
    cache:
      default_validity: 12h
      memcached_client:
        consistent_hash: true
        addresses: "dns+localhost:11211"
        max_idle_conns: 16
        timeout: 500ms
        update_interval: 1m

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
# reporting_enabled: false
docs/sources/configure/_index.md (42 additions, 0 deletions)
@@ -880,6 +880,40 @@ volume_results_cache:
  # compression. Supported values are: 'snappy' and ''.
  # CLI flag: -frontend.volume-results-cache.compression
  [compression: <string> | default = ""]

# Cache series query results.
# CLI flag: -querier.cache-series-results
[cache_series_results: <boolean> | default = false]

# If series_results_cache is not configured and cache_series_results is true,
# the config for the results cache is used.
series_results_cache:
Review thread on `series_results_cache`:

Contributor: I don't see much value in having separate cache configs for this, label results, and the normal result cache. IMHO we should err on the side of simplicity until we find a very compelling case for needing separate caches for all three. Our config is already monstrously complex; we should do what we can to not make it more so.

Contributor: Agreed. One downside is that we would lose granularity in stats and metric collection, since those are tied to the cache instance: labels, series, and results caching would all be clubbed together.

Contributor: How valuable are the metrics? Could we use a different label based on which "subcomponent" is observing metrics?

Collaborator (author): Good point. I thought about this when introducing it. We now have so many repeated configs: for the normal results cache (metric range queries), index stats, volume, labels, series, and for metric instant queries in the future, all with exactly the same settings. One downside is that it's hard to clean this up without breaking changes. The other option is to introduce a shared config with a "subcomponent" label, but only for labels and series so existing configs don't break; then we lose consistency with how the other results caches are configured.

Contributor: I would be fine with leaving this as is for now but combining them all in v3.0 with breaking changes. Thoughts?

Collaborator (author): Good idea. Adding it to the epic 👍

Contributor: Another thing I am worried about is blowing up the size of the stats proto. Does encoding ignore fields that are not set? If that's the case, we don't have to worry about this. Only one of these would be set for a given request, so I guess we could collapse them into a single field with a new cache-type field:

Cache result = 3 [
  (gogoproto.nullable) = false,
  (gogoproto.jsontag) = "result"
];
Cache statsResult = 4 [
  (gogoproto.nullable) = false,
  (gogoproto.jsontag) = "statsResult"
];
Cache volumeResult = 5 [
  (gogoproto.nullable) = false,
  (gogoproto.jsontag) = "volumeResult"
];
Cache seriesResult = 6 [
  (gogoproto.nullable) = false,
  (gogoproto.jsontag) = "seriesResult"
];
Cache labelResult = 7 [
  (gogoproto.nullable) = false,
  (gogoproto.jsontag) = "labelResult"
];
}
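
For illustration, the collapse suggested here could look like the following Go sketch: a single stats entry keyed by cache type instead of one proto field per type. The names are hypothetical, not what the PR ultimately shipped.

```go
package main

import "fmt"

// Cache mirrors a subset of the per-cache counters in stats.proto.
type Cache struct {
	EntriesRequested int32
	EntriesFound     int32
	EntriesStored    int32
}

// CacheStatsByType keeps one entry per cache type that was actually
// exercised; types that were never touched simply do not appear, so the
// serialized stats stay small.
type CacheStatsByType map[string]Cache

func main() {
	s := CacheStatsByType{
		"series-result": {EntriesRequested: 2, EntriesFound: 1, EntriesStored: 1},
	}
	fmt.Printf("%+v\n", s) // map[series-result:{EntriesRequested:2 EntriesFound:1 EntriesStored:1}]
}
```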

  # The cache block configures the cache backend.
  # The CLI flags prefix for this block configuration is:
  # frontend.series-results-cache
  [cache: <cache_config>]

  # Use compression in cache. The default is an empty value '', which disables
  # compression. Supported values are: 'snappy' and ''.
  # CLI flag: -frontend.series-results-cache.compression
  [compression: <string> | default = ""]

# Cache label query results.
# CLI flag: -querier.cache-label-results
[cache_label_results: <boolean> | default = false]

# If label_results_cache is not configured and cache_label_results is true, the
# config for the results cache is used.
label_results_cache:
  # The cache block configures the cache backend.
  # The CLI flags prefix for this block configuration is:
  # frontend.label-results-cache
  [cache: <cache_config>]

  # Use compression in cache. The default is an empty value '', which disables
  # compression. Supported values are: 'snappy' and ''.
  # CLI flag: -frontend.label-results-cache.compression
  [compression: <string> | default = ""]
```
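
The fallback described in the block above (a dedicated series or label results cache reusing the general results-cache config when not explicitly configured) is straightforward wiring. A minimal, hypothetical Go sketch; the struct and method names are illustrative, not Loki's actual types:

```go
package main

import "fmt"

// CacheConfig stands in for a cache backend configuration block.
type CacheConfig struct{ Addresses string }

// IsSet reports whether the block was explicitly configured.
func (c CacheConfig) IsSet() bool { return c.Addresses != "" }

type Config struct {
	CacheSeriesResults bool
	ResultsCache       CacheConfig
	SeriesResultsCache CacheConfig
}

// applyDefaults implements the documented fallback: if series caching is
// enabled but series_results_cache is not configured, reuse the results
// cache config.
func (cfg *Config) applyDefaults() {
	if cfg.CacheSeriesResults && !cfg.SeriesResultsCache.IsSet() {
		cfg.SeriesResultsCache = cfg.ResultsCache
	}
}

func main() {
	cfg := Config{
		CacheSeriesResults: true,
		ResultsCache:       CacheConfig{Addresses: "dns+localhost:11211"},
	}
	cfg.applyDefaults()
	fmt.Println(cfg.SeriesResultsCache.Addresses) // dns+localhost:11211
}
```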

### ruler
@@ -2844,6 +2878,12 @@ The `limits_config` block configures global and per-tenant limits in Loki.
# CLI flag: -querier.split-queries-by-interval
[split_queries_by_interval: <duration> | default = 1h]

# Split metadata queries by a time interval and execute in parallel. The value 0
# disables splitting metadata queries by time. This also determines how cache
# keys are chosen when label/series result caching is enabled.
# CLI flag: -querier.split-metadata-queries-by-interval
[split_metadata_queries_by_interval: <duration> | default = 1d]
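
Because the split interval also determines how cache keys are chosen, sub-queries that land in the same window share one cache entry. Below is a hedged sketch of interval-bucketed key derivation; the key format is illustrative (the PR's actual logic lives in `GenerateCacheKey`, for which it adds a test):

```go
package main

import (
	"fmt"
	"time"
)

// metadataCacheKey buckets a query's start time by the split interval, so
// every sub-query falling inside the same window produces the same key.
func metadataCacheKey(tenant, matchers string, start time.Time, split time.Duration) string {
	bucket := start.UnixNano() / int64(split)
	return fmt.Sprintf("series:%s:%s:%d", tenant, matchers, bucket)
}

func main() {
	day := 24 * time.Hour
	t1 := time.Date(2024, 1, 4, 3, 0, 0, 0, time.UTC)
	t2 := time.Date(2024, 1, 4, 21, 0, 0, 0, time.UTC)
	// Both timestamps fall in the same 1d window, so the keys collide on
	// purpose and the second query can be served from cache.
	fmt.Println(metadataCacheKey("foo", `{app="nginx"}`, t1, day))
	fmt.Println(metadataCacheKey("foo", `{app="nginx"}`, t2, day))
}
```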

# Limit queries that can be sharded. Queries within the time range of now and
# now minus this sharding lookback are not sharded. The default value of 0s
# disables the lookback, causing sharding of all queries at all times.
Expand Down Expand Up @@ -4283,6 +4323,8 @@ The cache block configures the cache backend. The supported CLI flags `<prefix>`
- `bloom-gateway-client.cache`
- `frontend`
- `frontend.index-stats-results-cache`
- `frontend.label-results-cache`
- `frontend.series-results-cache`
- `frontend.volume-results-cache`
- `store.chunks-cache`
- `store.index-cache-read`
pkg/logql/metrics.go (10 additions, 1 deletion)
@@ -222,6 +222,10 @@ func RecordLabelQueryMetrics(
"query", query,
"query_hash", util.HashedQuery(query),
"total_entries", stats.Summary.TotalEntriesReturned,
"cache_label_results_req", stats.Caches.LabelResult.EntriesRequested,
"cache_label_results_hit", stats.Caches.LabelResult.EntriesFound,
"cache_label_results_stored", stats.Caches.LabelResult.EntriesStored,
"cache_label_results_download_time", stats.Caches.LabelResult.CacheDownloadTime(),
)

execLatency.WithLabelValues(status, queryType, "").Observe(stats.Summary.ExecTime)
@@ -272,7 +276,12 @@ func RecordSeriesQueryMetrics(ctx context.Context, log log.Logger, start, end ti
"status", status,
"match", PrintMatches(match),
"query_hash", util.HashedQuery(PrintMatches(match)),
"total_entries", stats.Summary.TotalEntriesReturned)
"total_entries", stats.Summary.TotalEntriesReturned,
"cache_series_results_req", stats.Caches.SeriesResult.EntriesRequested,
"cache_series_results_hit", stats.Caches.SeriesResult.EntriesFound,
"cache_series_results_stored", stats.Caches.SeriesResult.EntriesStored,
"cache_series_results_download_time", stats.Caches.SeriesResult.CacheDownloadTime(),
)

if shard != nil {
logValues = append(logValues,
pkg/logql/metrics_test.go (18 additions, 2 deletions)
@@ -106,10 +106,18 @@ func TestLogLabelsQuery(t *testing.T) {
TotalBytesProcessed: 100000,
TotalEntriesReturned: 12,
},
Caches: stats.Caches{
LabelResult: stats.Cache{
EntriesRequested: 2,
EntriesFound: 1,
EntriesStored: 1,
DownloadTime: 80,
},
},
})
require.Regexp(t,
fmt.Sprintf(
"level=info org_id=foo traceID=%s sampled=true latency=slow query_type=labels splits=0 start=.* end=.* start_delta=1h0m0.* end_delta=.* length=1h0m0s duration=25.25s status=200 label=foo query= query_hash=2166136261 total_entries=12\n",
"level=info org_id=foo traceID=%s sampled=true latency=slow query_type=labels splits=0 start=.* end=.* start_delta=1h0m0.* end_delta=.* length=1h0m0s duration=25.25s status=200 label=foo query= query_hash=2166136261 total_entries=12 cache_label_results_req=2 cache_label_results_hit=1 cache_label_results_stored=1 cache_label_results_download_time=80ns\n",
sp.Context().(jaeger.SpanContext).SpanID().String(),
),
buf.String())
@@ -132,10 +140,18 @@ func TestLogSeriesQuery(t *testing.T) {
TotalBytesProcessed: 100000,
TotalEntriesReturned: 10,
},
Caches: stats.Caches{
SeriesResult: stats.Cache{
EntriesRequested: 2,
EntriesFound: 1,
EntriesStored: 1,
DownloadTime: 80,
},
},
})
require.Regexp(t,
fmt.Sprintf(
"level=info org_id=foo traceID=%s sampled=true latency=slow query_type=series splits=0 start=.* end=.* start_delta=1h0m0.* end_delta=.* length=1h0m0s duration=25.25s status=200 match=\"{container_name=.*\"}:{app=.*}\" query_hash=23523089 total_entries=10\n",
"level=info org_id=foo traceID=%s sampled=true latency=slow query_type=series splits=0 start=.* end=.* start_delta=1h0m0.* end_delta=.* length=1h0m0s duration=25.25s status=200 match=\"{container_name=.*\"}:{app=.*}\" query_hash=23523089 total_entries=10 cache_series_results_req=2 cache_series_results_hit=1 cache_series_results_stored=1 cache_series_results_download_time=80ns\n",
sp.Context().(jaeger.SpanContext).SpanID().String(),
),
buf.String())
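
A note on the `..._download_time=80ns` values these regexps expect: the stats counter is recorded in nanoseconds, and assuming `CacheDownloadTime()` simply wraps the counter in a Go `time.Duration`, the `80ns` rendering follows directly from `Duration.String`:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	var downloadTime int64 = 80 // nanoseconds, as stored in stats.Cache.DownloadTime
	fmt.Println(time.Duration(downloadTime)) // 80ns
}
```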
pkg/logqlmodel/stats/context.go (22 additions, 0 deletions)
@@ -61,6 +61,8 @@
StatsResultCache = "stats-result"
VolumeResultCache = "volume-result"
WriteDedupeCache = "write-dedupe"
SeriesResultCache = "series-result"
LabelResultCache = "label-result"
BloomFilterCache = "bloom-filter"
BloomBlocksCache = "bloom-blocks"
)
@@ -100,6 +102,8 @@ func (c *Context) Caches() Caches {
Result: c.caches.Result,
StatsResult: c.caches.StatsResult,
VolumeResult: c.caches.VolumeResult,
SeriesResult: c.caches.SeriesResult,
LabelResult: c.caches.LabelResult,
}
}

@@ -215,6 +219,8 @@ func (c *Caches) Merge(m Caches) {
c.Result.Merge(m.Result)
c.StatsResult.Merge(m.StatsResult)
c.VolumeResult.Merge(m.VolumeResult)
c.SeriesResult.Merge(m.SeriesResult)
c.LabelResult.Merge(m.LabelResult)
}

func (c *Cache) Merge(m Cache) {
@@ -444,6 +450,10 @@ func (c *Context) getCacheStatsByType(t CacheType) *Cache {
stats = &c.caches.StatsResult
case VolumeResultCache:
stats = &c.caches.VolumeResult
case SeriesResultCache:
stats = &c.caches.SeriesResult
case LabelResultCache:
stats = &c.caches.LabelResult
default:
return nil
}
@@ -526,6 +536,18 @@ func (c Caches) Log(log log.Logger) {
"Cache.VolumeResult.EntriesStored", c.VolumeResult.EntriesStored,
"Cache.VolumeResult.BytesSent", humanize.Bytes(uint64(c.VolumeResult.BytesSent)),
"Cache.VolumeResult.BytesReceived", humanize.Bytes(uint64(c.VolumeResult.BytesReceived)),
"Cache.SeriesResult.Requests", c.SeriesResult.Requests,
"Cache.SeriesResult.EntriesRequested", c.SeriesResult.EntriesRequested,
"Cache.SeriesResult.EntriesFound", c.SeriesResult.EntriesFound,
"Cache.SeriesResult.EntriesStored", c.SeriesResult.EntriesStored,
"Cache.SeriesResult.BytesSent", humanize.Bytes(uint64(c.SeriesResult.BytesSent)),
"Cache.SeriesResult.BytesReceived", humanize.Bytes(uint64(c.SeriesResult.BytesReceived)),
"Cache.LabelResult.Requests", c.LabelResult.Requests,
"Cache.LabelResult.EntriesRequested", c.LabelResult.EntriesRequested,
"Cache.LabelResult.EntriesFound", c.LabelResult.EntriesFound,
"Cache.LabelResult.EntriesStored", c.LabelResult.EntriesStored,
"Cache.LabelResult.BytesSent", humanize.Bytes(uint64(c.LabelResult.BytesSent)),
"Cache.LabelResult.BytesReceived", humanize.Bytes(uint64(c.LabelResult.BytesReceived)),
"Cache.Result.DownloadTime", c.Result.CacheDownloadTime(),
"Cache.Result.Requests", c.Result.Requests,
"Cache.Result.EntriesRequested", c.Result.EntriesRequested,
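
To see how these per-cache stats combine across split sub-queries, here is a minimal sketch of the merge semantics. Field names follow the diff; that `Merge` sums the counters is an assumption for illustration, not Loki's exact code:

```go
package main

import "fmt"

type Cache struct {
	EntriesRequested int32
	EntriesFound     int32
	EntriesStored    int32
}

// Merge accumulates counters from another sub-query's cache stats.
func (c *Cache) Merge(m Cache) {
	c.EntriesRequested += m.EntriesRequested
	c.EntriesFound += m.EntriesFound
	c.EntriesStored += m.EntriesStored
}

type Caches struct {
	SeriesResult Cache
	LabelResult  Cache
}

func (c *Caches) Merge(m Caches) {
	c.SeriesResult.Merge(m.SeriesResult)
	c.LabelResult.Merge(m.LabelResult)
}

func main() {
	a := Caches{SeriesResult: Cache{EntriesRequested: 2, EntriesFound: 1}}
	b := Caches{SeriesResult: Cache{EntriesRequested: 3, EntriesFound: 3, EntriesStored: 1}}
	a.Merge(b)
	fmt.Printf("%+v\n", a.SeriesResult) // {EntriesRequested:5 EntriesFound:4 EntriesStored:1}
}
```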