Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve performance of first_over_time and last_over_time queries by sharding them #11605

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

jeschkies
Copy link
Contributor

@jeschkies jeschkies commented Jan 8, 2024

What this PR does / why we need it:
With the introduction of sending query plans with custom nodes to the querier we can now shard first_over_time queries.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

Copy link
Contributor

github-actions bot commented Jan 8, 2024

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-61a4205 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-61a4205 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0
    \nTo see more details on these vulnerabilities, and how/where to fix them, please run docker build -t grafana/loki:main-61a4205 -f cmd/loki/Dockerfile .
    trivy i grafana/loki:main-61a4205 on your branch. If these were not introduced by your PR, please considering fixing them in via a subsequent PR. Thanks!

@jeschkies jeschkies assigned cstyan and unassigned cstyan Jan 9, 2024
@jeschkies jeschkies requested a review from cstyan January 9, 2024 17:11
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
in each vector, not the first

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm extended this to include implementing last_over_time here.

My only comments would be:

  • that renaming first_over_time.go to something more generic, or first_last_over_time.go like I did in my branch, probably makes sense
  • some comments would be useful (such as where I commented on the PR) for first time readers of the code

pkg/logql/first_over_time.go Outdated Show resolved Hide resolved
pkg/logql/first_over_time.go Outdated Show resolved Hide resolved
T: ts,
//T: ts,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can do it like this. What do you think, @cstyan?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code on main overrides he timestamps in each vector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the timestamp from the actual vector samples more accurate than the one got from stepEvaluator.Next()?

Can you help me understand why it won't make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense for us. I'm just not sure some other component is relying on this overridden timestamp.

pkg/logql/first_over_time.go Outdated Show resolved Hide resolved
@jeschkies jeschkies marked this pull request as ready for review January 23, 2024 11:15
@jeschkies jeschkies requested a review from a team as a code owner January 23, 2024 11:15
@kavirajk kavirajk changed the title Shard first_over_time queries. Shard first_over_time and last_over_time queries. Jan 29, 2024
// order (see next, moves the current timestamp forward by step). So this
// means that for each downstreamed shard of a first_over_time, selecting the
// first sample here in each iterator is getting us the earliest timestamped
// value. Later on when we merge we select the earliest from all the shards.
Copy link
Collaborator

@kavirajk kavirajk Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this comment should be broken down into pieces and moved to three different iterators.
firstWithTimestampBatch, lastWithTimestampBatch and mergeOverTimeEvaluator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great. Nice work folks! learned few things about sharding in downstream engine.

Added few comments. One question for me is, how do we make sure the correctness of first_over_time and last_over_time running with old engine vs downstream engine.?

I see you added tests in TestMappingEquivalence. But do you think should be also added on other tests like TestRangeMappingEquivalence on downstream engine tests?

Copy link
Collaborator

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks for addressing the feedbacks.

Still this one applies I think.

I see you added tests in TestMappingEquivalence. But do you think should be also added on other tests like TestRangeMappingEquivalence on downstream engine tests? Rationale being the later tests for more cases of different range aggregation.

approving to unblock.

@cstyan
Copy link
Contributor

cstyan commented Feb 20, 2024

hey @kavirajk, actually something fun I learned this past week when finally finding the cause of the issues with our quantile_over_time, TestMappingEquivalence is actually testing the results equivalency for each query as a range query. What we don't have (AFAICT) is tests for the results equivalency for instant queries. The test names here are actually a bit misleading; TestMappingEquivalence is testing for equivalency with sharded queries while TestRangeMappingEquivalence is testing for equivalency with time range split queries.

@cstyan cstyan changed the title Shard first_over_time and last_over_time queries. feat: improve performance of first_over_time and last_over_time queries by sharding them Apr 23, 2024
@cstyan
Copy link
Contributor

cstyan commented Apr 23, 2024

@jeschkies any opinions on merging this as is vs as behind a feature flag? given that it's not being executed via a probabilistic data structure I don't think we need a feature flag

cstyan and others added 4 commits May 3, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants