Query-frontend: customisable query splitting for queries overlapping `query_ingester_within` window #11535
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
pkg/querier/queryrange/splitters.go
Outdated
// for queries outside the `query_ingesters_within` window.
//
// The given factory is responsible for building the splits and appending to reqs.
newStart, newEnd := buildIngesterQuerySplitsAndRebound(execTime, s.limits, s.iqo, tenantIDs, lokiReq, factory, false)
for metric queries, this does not round the split boundaries to the step. is that intentional?
You're right, I can't use this function here; thanks! I'll fix this.
Great catch on this @ashwanthgoli! I've addressed it in b7dfa09, PTAL.
something to be aware of: caching middlewares downstream expect the sub-queries to be split and aligned by for ex: a cache key spanning an interval of 15m might store a larger 1h extent.
Hmm, what should we do about this? I guess we shouldn't be caching metric queries within
pkg/querier/queryrange/splitters.go
Outdated
s.buildMetricSplits(lokiReq.GetStep(), ingesterQueryInterval, start, end, factory)

// rebound after ingester queries have been split out
end = start
we should also subtract step from end to leave a step gap on the ingester split boundary, similar to what we do with each split.
loki/pkg/querier/queryrange/splitters.go
Line 155 in b7dfa09
// Round up to the step before the next interval boundary.
Implemented in 75ee015, good catch!
@dannykopping should we also make the cache layer aware of the ingester splits and pick a different interval for the cache key based on the query range being served?
problem with this is that the current defaults for max_cache_freshness are much smaller, sub 30m. this might have some performance impact
Great idea! I'll do this in a follow-up PR.
lgtm
func (s *metricQuerySplitter) alignStartEnd(step int64, start, end time.Time) (time.Time, time.Time) {
// step align start and end time of the query. Start time is rounded down and end time is rounded up.
stepNs := step * 1e6
fingers crossed that step is ms
;)
looks awesome 🚀
… keys (#11679) **What this PR does / why we need it**: Follow up from #11535 (specifically from [this thread](#11535 (comment))), this PR modifies the results cache implementation to use the same interval for generating cache keys.
…`query_ingester_within` window (#11535)

**What this PR does / why we need it**:

The config option `query_ingesters_within` defines the window during which logs _could_ be present on ingesters, and as such queriers will send queries to ingesters instead.
`split_queries_by_interval` is defined to split queries into subqueries for increased parallelism.

Aggressive query splitting within the `query_ingesters_within` window can overload ingesters with unnecessarily large numbers of subqueries, which perversely can impact writes.

`query_ingesters_within` is set to 3h by default. In Grafana Cloud Logs we set `split_queries_by_interval` as low as 15m (defaults to 1h), which would result in 3*60/15=12 requests. Every querier queries every ingester during this window, so that's 12 requests _per ingester per query_ that has the `query_ingesters_within` window in its time range _(i.e. a query from now to now-7d would include the `query_ingesters_within` window as well, now-3h to now-7d would not)_.

However, we _do_ want to split queries so an ingester won't have to handle a query for a full `query_ingesters_within` window, since this could involve a large amount of data. To account for this, this PR introduces a new option `split_ingester_queries_by_interval` on the query-frontend; this setting is disabled by default.

![image](https://github.com/grafana/loki/assets/373762/2e671bd8-9e8d-4bf3-addf-bebcfc25e8d7)