Create a physical plan for block querying #2586

Merged
simonswine merged 15 commits into grafana:main from 20230928_deduplicate-blocks on Nov 9, 2023

Conversation

simonswine
Contributor

@simonswine simonswine commented Oct 25, 2023

This PR will reduce the work necessary to query blocks by planning out
which blocks are fetched from which ingester/store-gateway.

That way, queries will potentially not need profile-by-profile
deduplication for blocks that have already been deduplicated by the compactor.

TODO:

  • limit the BlockMetadata call to the query start/end in
    ingester/store-gateway
  • respect hints in ingester
  • Reuse part of the compactor's block deletion-mark algorithm to disqualify blocks (it is a bit more nuanced, so keeping it separate for now)
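
For illustration only, here is a rough sketch of what such a per-replica plan could look like, assuming a shape similar to the hints iterated over later in this PR; the type and field names are hypothetical, not the actual protobuf messages:

// Hypothetical sketch of a block query plan: which blocks each
// ingester/store-gateway replica should read, and whether it still has to
// deduplicate profiles in them. Names are illustrative, not the PR's API.
package main

import "fmt"

// BlockHints carries the per-replica instructions of the plan.
type BlockHints struct {
    BlockIDs      []string // ULIDs of the blocks this replica should query
    Deduplication bool     // false when the compactor already deduplicated them
}

func main() {
    // The plan maps a replica address to its hints.
    plan := map[string]*BlockHints{
        "ingester-0:9095":      {BlockIDs: []string{"block-a", "block-b"}, Deduplication: true},
        "store-gateway-1:9095": {BlockIDs: []string{"block-c"}, Deduplication: false},
    }
    for addr, hints := range plan {
        fmt.Printf("%s -> %d block(s), dedup=%v\n", addr, len(hints.BlockIDs), hints.Deduplication)
    }
}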

@simonswine simonswine changed the title WIP: Create a physical plan for blcok querying WIP: Create a physical plan for block querying Oct 26, 2023
@simonswine simonswine force-pushed the 20230928_deduplicate-blocks branch 3 times, most recently from 4300368 to 9338f89 on October 31, 2023 18:22
@simonswine simonswine force-pushed the 20230928_deduplicate-blocks branch 3 times, most recently from 8df68f6 to c7c1e11 on November 7, 2023 17:03
pkg/phlaredb/phlaredb.go
pkg/querier/ingester_querier.go
Comment on lines +364 to +369
// adapt the plan to make sure all replicas will deduplicate
if deduplicate {
for _, hints := range plan {
hints.Deduplication = deduplicate
}
}
Collaborator

Does it mean that if we get a query like "last 2 days" we will force deduplication, as the plan also includes blocks that have not been compacted yet? I guess this should be mitigated by the query time range split – I'm wondering if we should set it to the block duration by default.

Contributor Author

Yes, this is true. The first step is for the query time range split to roughly align with the processing time of the compacted block. So it should be slightly longer than the block duration, simply because we introduce a fair bit of latency before a block is compacted/deduplicated and available to store-gateways.

Ideally, in a future where the query-frontend is also aware of the plan, we can split according to the actual block layout.
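
As a rough illustration of that kind of split (a hypothetical helper, not the actual query-frontend code), cutting the query range at interval boundaries slightly larger than the block duration would keep the older sub-ranges on compacted, already-deduplicated blocks:

// Sketch of interval-aligned query time range splitting. The interval would
// be set a bit above the block duration; names are illustrative only.
package main

import (
    "fmt"
    "time"
)

type timeRange struct{ start, end time.Time }

// splitByInterval cuts [start, end) into sub-ranges aligned to interval boundaries.
func splitByInterval(start, end time.Time, interval time.Duration) []timeRange {
    var out []timeRange
    for cur := start; cur.Before(end); {
        next := cur.Truncate(interval).Add(interval) // next aligned boundary after cur
        if next.After(end) {
            next = end
        }
        out = append(out, timeRange{cur, next})
        cur = next
    }
    return out
}

func main() {
    end := time.Now().UTC().Truncate(time.Minute)
    start := end.Add(-48 * time.Hour) // a "last 2 days" query
    for _, r := range splitByInterval(start, end, 3*time.Hour) {
        fmt.Println(r.start.Format(time.RFC3339), "->", r.end.Format(time.RFC3339))
    }
}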

api/ingester/v1/ingester.proto
@simonswine simonswine marked this pull request as ready for review November 8, 2023 19:29
@simonswine simonswine requested review from a team as code owners November 8, 2023 19:29
Collaborator

@kolesnikovae kolesnikovae left a comment

LGTM 🚀 Looking forward to seeing this in action!

}

// TODO(simonswine): Split profiles per row group and run the MergeByStacktraces in parallel.
merge, err := querier.MergeByStacktraces(ctx, iter.NewSliceIterator(querier.Sort(profiles)))
Contributor

I think we can avoid sorting and pulling the actual labels. That could be a big saving.

Contributor Author

So you mean all in one step: a SelectAndMergeProfiles that runs one iterator through the data?
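
Something along these lines, as a rough sketch of the idea (types and names are made up for illustration, not the existing phlaredb API): drive a single iterator over the selected profiles and fold samples into per-stacktrace totals as they stream by, with no intermediate slice, sort, or label lookup:

// Illustrative only: a single-pass "select and merge by stacktraces".
package sketch

// Profile stands in for whatever handle the block querier yields per row.
type Profile struct {
    StacktraceIDs []uint64
    Values        []int64
}

// Iterator is a minimal pull-style iterator over selected profiles.
type Iterator interface {
    Next() bool
    At() Profile
    Err() error
}

// SelectAndMergeByStacktraces consumes the iterator once, accumulating
// per-stacktrace totals; nothing is materialized or sorted along the way.
func SelectAndMergeByStacktraces(it Iterator) (map[uint64]int64, error) {
    totals := make(map[uint64]int64)
    for it.Next() {
        p := it.At()
        for i, id := range p.StacktraceIDs {
            totals[id] += p.Values[i]
        }
    }
    return totals, it.Err()
}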

@simonswine simonswine changed the title WIP: Create a physical plan for block querying Create a physical plan for block querying Nov 9, 2023
@simonswine simonswine merged commit d4e3b03 into grafana:main Nov 9, 2023
17 checks passed
simonswine added a commit to simonswine/pyroscope that referenced this pull request Nov 9, 2023
* Create a physical plan for block querying

This PR will reduce the work necessary to query blocks, by planning out
which blocks are fetched from which ingester/store-gateway.

That way queries will potentially not need profile-by-profile
deduplication for blocks that had been deduplicated by the compactor.

* Remove unused field

* Missing `make generate` change

* Signal that there are no profiles streamed

* Generate block metadata correctly

* Improve tracing for block plan

* Request BlockMetadata correctly

Query the head first (heads and flushing) and then the block queriers; this makes
sure that we do not miss a block moving into flushing.

* Forward hints correctly to store-gateway

* Sort iterator properly

* Ensure block metadata responses are bounded by the query time range for ingesters

* Ignore storeQueryAfter for plan building

* Move the signalling to close the connection to the right place

* Ensure we only read replica for the correct instance type

* Handle no result profiles

* Log full plan in debug level