Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid splitting large range vector aggregation. #5172

Merged
merged 5 commits into from
Jan 19, 2022

Conversation

cyriltovena
Copy link
Contributor

What this PR does / why we need it:

This PR ensure we don't split large range vector aggregation in smaller interval causing waste of resource by processing more than required.

The logic is very simple, it clamp the split by to maximum range vector found in the query.

I tested this in our dev cluster and it's way better:

before


 query="sum(rate({container=\"ingester\"}[7d]))" query_type=metric range_type=range length=1h0m0s step=1s duration=46.313273078s status=200 limit=2512 returned_lines=0 throughput=12GB total_bytes=539GB queue_time=18.756237ms

after

  query="sum(rate({container=\"ingester\"}[7d]))" query_type=metric range_type=range length=1h0m0.001s step=1s duration=25.962113827s status=200 limit=2512 returned_lines=0 throughput=4.2GB total_bytes=110GB queue_time=5.923129ms

5x less data processed, for a ~2x speed improvement.

Which issue(s) this PR fixes:
Fixes #5161

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

cc @cstyan @owen-d @slim-bean based on your previous experience.

@cyriltovena cyriltovena requested a review from a team as a code owner January 18, 2022 16:24
Copy link
Contributor

@chaudum chaudum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Added suggestions for improving the function comments.

pkg/querier/queryrange/split_by_interval.go Outdated Show resolved Hide resolved
pkg/querier/queryrange/split_by_interval.go Outdated Show resolved Hide resolved
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! LGTM!

cyriltovena and others added 3 commits January 19, 2022 10:51
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena merged commit 1a7614f into grafana:main Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Splitting large range vector can lead to high load on a Loki cluster
3 participants