Search Query Runtime Cost Calculation #5174

Open
PritLadani opened this issue Nov 9, 2022 · 5 comments

Labels
enhancement Enhancement or improvement to existing feature or request Indexing & Search

Comments

@PritLadani
Contributor

Is your feature request related to a problem? Please describe.
#1179 aims to build a resource tracking framework for search queries. As part of #3982, we have enabled resource tracking for shard-level tasks. However, to support search back-pressure and to build a model for query cost estimation as discussed in #1042, we need coordinator-level (query-level) resource consumption stats.

Describe the solution you'd like
We will piggyback the shard tasks' resource consumption stats on the ClusterSearchShardsResponse sent from the child nodes (data nodes) to the parent node (coordinator node). The response structure needs to change to accommodate the resource stats.
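As a rough illustration of the piggybacking idea, the sketch below has each shard-level response carry its task's resource stats, and the coordinator sums them with its own usage once responses arrive. It is written in Python only for brevity. `ResourceStats`, `ShardResponse`, `CoordinatorTask`, and their fields are hypothetical names for this example, not OpenSearch classes or the actual response structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceStats:
    """Resource usage reported by one task (hypothetical shape)."""
    cpu_time_nanos: int = 0
    heap_bytes: int = 0

    def __add__(self, other: "ResourceStats") -> "ResourceStats":
        return ResourceStats(
            self.cpu_time_nanos + other.cpu_time_nanos,
            self.heap_bytes + other.heap_bytes,
        )


@dataclass
class ShardResponse:
    """Stand-in for a shard-level response that also carries the task's stats."""
    shard_id: int
    hits: list
    stats: ResourceStats


@dataclass
class CoordinatorTask:
    """Accumulates the runtime cost of one search request on the coordinator."""
    own_stats: ResourceStats = field(default_factory=ResourceStats)
    shard_stats: dict = field(default_factory=dict)

    def on_shard_response(self, response: ShardResponse) -> None:
        # The stats arrive with the response itself, so no extra message is sent.
        self.shard_stats[response.shard_id] = response.stats

    def runtime_cost(self) -> ResourceStats:
        # Query-level view: coordinator usage plus every shard task's usage.
        total = self.own_stats
        for stats in self.shard_stats.values():
            total = total + stats
        return total


# Example: one coordinator task and two shard responses (made-up numbers).
task = CoordinatorTask(own_stats=ResourceStats(cpu_time_nanos=1_000, heap_bytes=512))
task.on_shard_response(ShardResponse(0, [], ResourceStats(5_000, 2_048)))
task.on_shard_response(ShardResponse(1, [], ResourceStats(7_000, 4_096)))
total = task.runtime_cost()
```

Keying the stats by shard ID keeps the per-shard breakdown available alongside the aggregate, which may be useful if later work needs to see which shard dominated the cost.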

Describe alternatives you've considered
Another alternative we considered is to share the resource stats from the data nodes with the coordinator node periodically, rather than piggybacking them at task completion. However, query cost calculation does not need periodic stats from the data nodes. Moreover, sharing the stats periodically would add the overhead of a new background service to collect the data and send it to the parent node.

Additional context
Looking at, or aggregating, the resource stats of the child tasks alone does not give us an estimate of the resource consumption of the coordinator task. Hence we cannot use them to estimate whether a search task will push the node into duress, and so we do not need periodic resource stats from the data nodes.

@PritLadani added the enhancement and untriaged labels on Nov 9, 2022
@reta
Collaborator

reta commented Nov 9, 2022

I am not really sure what is being estimated as the "search query cost" here. Based on the description, it is deduced from the resource consumption stats, which are only available after the query has executed. The exact same query could have drastically different consumption stats over time (for example, because new data is being ingested all the time).

What would be useful though is to estimate search query cost before the execution, based on:

  • query complexity (criteria)
  • indices involved (size, # shards, # documents, ...)
  • cluster topology (data nodes, etc)
  • other factors

Does it make sense or am I missing something here?

@dblock
Member

dblock commented Nov 9, 2022

I think the proposal is a little unclear on cost estimation vs. query planning, and so @reta is rightfully confused.

I think the purpose of the proposal is to predict, as well as possible, the "runtime cost" (consumption of time and space) of an incoming query and use it in backpressure. Runtime cost is impacted by the query being made, but it's impacted a lot more by things like the size of the data.

So, I propose to explain the goals by calling the ask here "search query runtime cost" (vs. just search query cost), and calling the non-runtime aspects of a query "query (plan) cost (or complexity)". Does that help?

@PritLadani
Contributor Author

@reta We are not really estimating the query cost here; rather, we are just calculating the actual query cost (or call it runtime cost). We are building a coordinator-level view of resource consumption for each search request. As discussed in #1179 and #1181, we want to build an aggregated view of resource consumption stats for any given query. To do that, we want to piggyback the consumption stats to the parent node. However, as part of this issue, we will not make cancellation decisions yet.

Also, exactly the same query can have different resource consumption stats in different scenarios, but as @dblock mentioned, we are trying to calculate the "runtime cost" of a search query.

@dblock Appreciate your suggestion to change it to "Search Query Runtime Cost". Will update the title.

@PritLadani changed the title from "Search Query Cost Calculation" to "Search Query Runtime Cost Calculation" on Nov 9, 2022
@PritLadani reopened this on Jul 14, 2023
@anasalkouz
Member

Hi @PritLadani, are you actively working on this?

@PritLadani
Contributor Author

PritLadani commented Jul 19, 2023

Hey @anasalkouz, I might not be able to take this up as of now.
@kaushalmahi12, are you taking care of this as part of the next milestone of Search Backpressure?
@ajaymovva, I remember you were also building some kind of cost calculator for running tasks. Can this task be considered for that?

5 participants