This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Split query by interval #713

Merged
merged 9 commits into from
May 31, 2023

kolesnikovae
Contributor

@kolesnikovae kolesnikovae commented May 24, 2023

Resolves #696

The PR contains a basic implementation of query parallelization by time sub-ranges. I decided not to over-complicate the solution for now, because the primary goal of the work is to find bottlenecks in the read path. Over time, however, we should have a composable query plan (logical and physical).

The implementation only covers the following APIs:

  • SelectMergeStacktraces
  • SelectSeries

The following configuration options were added:

  • querier.split-queries-by-interval. Defaults to 0 – the new mechanism is disabled.
  • querier.max-query-parallelism. Defaults to 0 – the limit is disabled. Specifies how many sub-queries of a single query the frontend can execute simultaneously. In practice, I don't think we need to limit this (or, if we do, the value can be very large, in the thousands), because it is up to the scheduler (query-scheduler.max-outstanding-requests-per-tenant defaults to 100) and the querier to decide when to execute a sub-query. The parameter is present for consistency with the existing frontend implementations.
  • querier.max-concurrent. Defaults to 4, the value that was previously hard-coded. Indicates how many requests (queries or sub-queries) a single querier can handle concurrently.

(I propose to set the values separately, after a sane default is found).


There's an issue that's worth mentioning: a flame graph aggregated from many SelectMergeStacktraces calls differs from the one fetched without parallelization because of how the truncation works. The difference is small but might be noticeable in some cases:

Comparison

The discrepancy comes from the fact that the set of truncated nodes for each "intermediate" flame graph (read: tree) is different and their weights vary from one to another. I'm not sure if this is something that requires an immediate fix, but we should keep an eye on this.

Before is on the left. Notice that the share of the other node has decreased. However, the opposite can happen in some cases: when nodes are truncated early (before we know about the other trees) and before a critical mass has built up, they contribute more to other than we want. This is a problem because our assumption that we preserve the top N most significant nodes no longer holds. If it becomes a real issue, I propose, as a first step, to simply increase the number of nodes for trees that are inputs of the aggregation (e.g. by 20-100%).

image
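
The truncation effect can be reproduced with a deliberately simplified model. The sketch below is an assumption-laden illustration, not the tree code from the repository: profiles are flat maps from a node name to its weight instead of trees, and truncate and merge are hypothetical helpers. It compares truncating once after merging with truncating every sub-range first, and then shows the proposed mitigation of giving the inputs a larger node budget.

```go
package main

import (
	"fmt"
	"sort"
)

// truncate keeps the maxNodes heaviest entries and folds the rest into "other".
func truncate(weights map[string]int64, maxNodes int) map[string]int64 {
	names := make([]string, 0, len(weights))
	for name := range weights {
		if name != "other" {
			names = append(names, name)
		}
	}
	// Heaviest first; ties are broken by name to keep the result deterministic.
	sort.Slice(names, func(i, j int) bool {
		wi, wj := weights[names[i]], weights[names[j]]
		if wi != wj {
			return wi > wj
		}
		return names[i] < names[j]
	})
	out := map[string]int64{}
	if w, ok := weights["other"]; ok {
		out["other"] = w
	}
	for i, name := range names {
		if i < maxNodes {
			out[name] = weights[name]
		} else {
			out["other"] += weights[name]
		}
	}
	return out
}

// merge sums the weights of equally named nodes.
func merge(profiles ...map[string]int64) map[string]int64 {
	out := map[string]int64{}
	for _, p := range profiles {
		for name, w := range p {
			out[name] += w
		}
	}
	return out
}

func main() {
	// Two sub-ranges of the same query; "h" is the heaviest node overall
	// but never the heaviest within a single sub-range.
	a := map[string]int64{"f": 10, "g": 9, "h": 8}
	b := map[string]int64{"f": 1, "i": 9, "h": 8}

	direct := truncate(merge(a, b), 2)
	split := truncate(merge(truncate(a, 2), truncate(b, 2)), 2)
	padded := truncate(merge(truncate(a, 3), truncate(b, 3)), 2) // 50% larger input budget

	fmt.Println("truncate after merge: ", direct)
	fmt.Println("truncate before merge:", split)
	fmt.Println("larger input budget:  ", padded)
}
```

In this toy input, truncating before the merge loses the node that is heaviest overall and inflates other, while the larger budget for the intermediate results restores the same answer as the unsplit query. Real trees and real budgets will behave less cleanly; the example only shows the mechanism.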

@cyriltovena
Collaborator

I definitely like this approach.

@kolesnikovae kolesnikovae changed the title Separate handlers in querier fronted Split query by interval May 25, 2023
@kolesnikovae kolesnikovae force-pushed the feat/query_split_by_interval branch 2 times, most recently from 2c5afd8 to f6efd32 Compare May 29, 2023 09:53
@kolesnikovae kolesnikovae marked this pull request as ready for review May 29, 2023 15:39
pkg/util/math/math.go (outdated, resolved)
Collaborator

@cyriltovena cyriltovena left a comment

LGTM

@cyriltovena
Collaborator

I'm waiting for this one to be merged to start working on query limits in the frontend.

@kolesnikovae kolesnikovae merged commit 9f19269 into main May 31, 2023
17 checks passed
@kolesnikovae kolesnikovae deleted the feat/query_split_by_interval branch May 31, 2023 15:27
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
* Separate handlers in querier fronted

* Clarify http responce decompression implementation details

* Draft query time split

* Split SelectSeries by time

* Align interval to step duration

* Remove unused code

* Add querier.max-concurrent option

* Fix connect headers
Successfully merging this pull request may close these issues.

Query time splitting