Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider range and offset in queries while looking for schema config for query sharding #7880

Merged

Conversation

sandeepsukhani
Copy link
Contributor

@sandeepsukhani sandeepsukhani commented Dec 8, 2022

What this PR does / why we need it:
We disable query sharding when a query touches multiple schema configs since they might have different sharding factors and might handle sharding differently. This check was done purely on the start and end time of the query without considering range and offset, which would change the length of the actual data being queried.

Besides being incorrect, it also causes us to fail queries when migrating between tsdb and non-tsdb stores since both handle query sharding differently. For e.g., if the prev schema is boltdb-shipper and starting today, tsdb index is being used, so if we do a query sum(rate({foo="bar"}[24h]) even with start and end within tsdb range, we will fail the query with error incompatible index shard if dynamic sharding goes with shard factor other than 32(default shard factor in ingester for inverted index). The reason being the range here is 24h which causes us to process data from previous schema as well.

This PR takes care of the issue by factoring in range and offset in the queries while looking for schema config for query sharding.

Checklist

  • CHANGELOG.md updated

@sandeepsukhani sandeepsukhani requested a review from a team as a code owner December 8, 2022 09:02
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
-              logql	-0.4%
+               loki	0%

Copy link
Contributor

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch :)

LGTM. Feel like we should lock this behavior with some tests?

return 0, 0, err
}

if _, ok := expr.(syntax.SampleExpr); !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this also changes the previous behavior of returning non-nil errors for LogSelectorExpr but now it would return nil error.

Do you think this would break anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is okay since previous usage was only in metric query path and range and offset are only used in metric queries.

if r, ok := e.(*syntax.LogRange); ok && r.Interval > max {
max = r.Interval
if r, ok := e.(*syntax.LogRange); ok {
if r.Interval > maxRVDuration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: whats the point of comparing it with maxRvDuration and maxOffset. They have only zero values right? (from the above declarations)

why not just maxRVDuration, maxOffset = r.Internval, r.Offset? Given, there can be only one LogRange expression in a given LogQL.? (can LogQL contains multiple Range expression? I doubt it though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can have multiple of them with binary operations like rate({job="foo"} [1m] offset 5m) / rate({job="bar"} [1m] offset 15m)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! 👍

@sandeepsukhani
Copy link
Contributor Author

Nice catch :)

LGTM. Feel like we should lock this behavior with some tests?

yeah, I am working on the tests. Didn't get to work on them due to having my hand in multiple things and would like to get this PR in this week's build. Will let you know when I am done with the tests.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 8, 2022
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.4%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.4%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@sandeepsukhani sandeepsukhani merged commit e9c93cd into grafana:main Dec 9, 2022
sandeepsukhani added a commit that referenced this pull request Dec 9, 2022
…for query sharding (#7880)

**What this PR does / why we need it**:
We disable query sharding when a query touches multiple schema configs
since they might have different sharding factors and might handle
sharding differently. This check was done purely on the start and end
time of the query without considering `range` and `offset`, which would
change the length of the actual data being queried.

Besides being incorrect, it also causes us to fail queries when
migrating between tsdb and non-tsdb stores since both handle query
sharding differently. For e.g., if the prev schema is `boltdb-shipper`
and starting today, `tsdb` index is being used, so if we do a query
`sum(rate({foo="bar"}[24h])` even with start and end within `tsdb`
range, we will fail the query with error `incompatible index shard` if
dynamic sharding goes with shard factor other than `32`(default shard
factor in ingester for inverted index). The reason being the `range`
here is `24h` which causes us to process data from previous schema as
well.

This PR takes care of the issue by factoring in `range` and `offset` in
the queries while looking for schema config for query sharding.

**Checklist**
- [x] `CHANGELOG.md` updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants