
[RFC] Query Planning and Rewriting #12390

Open
jainankitk opened this issue Feb 20, 2024 · 17 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Roadmap:Cost/Performance/Scale Project-wide roadmap label Search:Performance

Comments

@jainankitk
Collaborator

jainankitk commented Feb 20, 2024

Problem statement

In traditional database systems, the query planner and optimizer choose the right indices and join order to ensure efficient execution of a query. OpenSearch has no such component, which causes degraded performance for queries that are not written well. The profiler can provide insight into the latency breakdown of each phase, but it does not automatically optimize queries for customers. The impact of manually rewriting queries has been confirmed both by GitHub issues and during managed service customer engagements.

High Level Approach

There are two primary approaches to solving this problem. One relies on cost estimation for planning the query, similar to traditional database systems; the other is more dynamic in nature, collecting feedback from the performance of rewritten queries and accordingly enabling/disabling and tuning the parameters for query rewriting.

Query Cost Estimation Based

The key aspect of this approach is the query cost estimation component. Given any query, it estimates the query cost well enough to allow the query rewriter to compare multiple query plans and choose the most efficient form for execution.

(Diagram: Query_Estimation.drawio)

Pros:

  • Low cost and insignificant overhead; no need to worry about workload changes
  • Handles more than one rewrite for a single query easily, by generating multiple plans with those rewrites

Cons:

  • The estimation process is complex due to the sheer number of cases, and might rely on clause execution time feedback
  • Coming up with accurate estimates is hard
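To make the idea concrete, here is a minimal sketch of cost-based plan selection. The cost model, plan representation, and all field names are hypothetical, purely for illustration; they are not OpenSearch APIs:

```python
# Hypothetical sketch of cost-based plan selection. The cost model and
# plan fields (est_docs, est_buckets) are illustrative assumptions.

def estimate_cost(plan):
    """Toy cost model: cost grows with estimated docs scanned and buckets built."""
    return plan["est_docs"] * 0.001 + plan["est_buckets"] * 0.5

def choose_plan(original, rewrites):
    """Compare the original plan against rewritten candidates by estimated cost."""
    candidates = [original] + rewrites
    return min(candidates, key=estimate_cost)

original = {"name": "date_histogram", "est_docs": 1_000_000, "est_buckets": 20}
rewritten = {"name": "date_histogram_filter_rewrite", "est_docs": 50_000, "est_buckets": 20}

best = choose_plan(original, [rewritten])
print(best["name"])  # prints: date_histogram_filter_rewrite
```

The hard part, as the cons above note, is making `estimate_cost` accurate across the sheer variety of query shapes; the selection logic itself is trivial once the estimates exist.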

Rewritten Query Execution Based

(Diagram: Query_Rewriting.drawio)

This approach starts with query rewriting in shadow mode. For every query, it checks whether the query is rewrite-eligible and samples such queries, executing the rewrites asynchronously (shadow mode) and comparing the execution time of the original vs. the rewritten query. Along with the execution time, every rewrite logic emits the values of the tunable rewrite parameters for its query rewrites. Taking the date histogram and doc values rewrites as examples, we can expect the following data after a few executions:

Date Histogram Parameters
Rewrite efficiency:

  • directly proportional to document hit count
  • inversely proportional to bucket count

| Original Time | Rewrite Time | Bucket Count | Hit Count |
|---|---|---|---|
| 500 | 50 | 20 | 1M |
| 300 | 30 | 50 | 500k |
| 40 | 60 | 500 | 500 |
| 20 | 40 | 1000 | 100 |

Doc Values Rewrite Parameters
Rewrite efficiency:

  • directly proportional to total fields indexed
  • inversely proportional to field requested

| Original Time | Rewrite Time | Fields Requested | Total Fields Indexed |
|---|---|---|---|
| 500 | 50 | 3 | 1000 |
| 300 | 30 | 1 | 500 |
| 350 | 500 | 70 | 100 |
| 250 | 400 | 150 | 200 |


Using the above data, the query insights plugin will be able to detect the right parameter values for each query rewrite type. Once it has sufficient confidence, it can operate in reverse shadow mode, where the original query is run occasionally to detect any changes in workload for that particular type of rewrite.
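As a minimal sketch of what "detecting the right parameter values" could look like for the date histogram rewrite: derive a bucket-count cutoff from shadow-mode samples, above which the rewrite is disabled. Sample field names and the learning rule are illustrative assumptions, not the query insights plugin's actual logic:

```python
# Hypothetical sketch: learn a bucket-count threshold for the date
# histogram rewrite from shadow-mode samples. A "win" is a sample
# where the rewritten query was faster than the original.

def learn_bucket_threshold(samples):
    """Return the largest bucket count at which the rewrite still won:
    a crude cutoff above which the rewrite should be disabled."""
    wins = [s["buckets"] for s in samples if s["rewrite_ms"] < s["original_ms"]]
    return max(wins) if wins else 0

# The date histogram table above, as sample records
samples = [
    {"original_ms": 500, "rewrite_ms": 50, "buckets": 20},
    {"original_ms": 300, "rewrite_ms": 30, "buckets": 50},
    {"original_ms": 40, "rewrite_ms": 60, "buckets": 500},
    {"original_ms": 20, "rewrite_ms": 40, "buckets": 1000},
]
print(learn_bucket_threshold(samples))  # prints: 50
```

A real implementation would need many more samples and a smoother decision rule (the gap between 50 and 500 buckets here is wide), but the shape of the feedback loop is the same.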

Pros:

  • Makes query rewriting pluggable as new rewrites are identified; each rewrite plugs itself in by providing the list of parameters on which it can be tuned
  • Decisions are based on actually running the query, not on estimates

Cons:
  • Cannot be done for every query without excessive overhead; relies on sampling
  • Caching must be accounted for to get accurate execution times
  • Difficult to handle more than one rewrite for a single query, as every rewrite has a cost
  • Even the original query needs to be re-run to detect workload changes
  • Additional resource consumption for async execution of the rewritten / original query

Mitigations:
  • Reduce the candidate set by limiting to expensive queries (> 100/200ms)
  • Try the query rewrite on a few shards to limit the performance impact
  • Threshold the query execution time based on the actual/rewritten query, maybe 2x (query sandboxing / hard cancellation)

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@jainankitk jainankitk added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 20, 2024

@peternied
Member

[Triage - attendees 1 2 3 4 5]
@jainankitk Thanks for filing this RFC

@kkmr
Contributor

kkmr commented Feb 21, 2024

Most existing query engines don't run optimized queries in shadow mode - They optimize the plan and execute it. I would consider requesting customers to opt-in when we launch the feature initially. Later, once we have tuned the system, we can make query planning the default and let customers turn it off.

@kkmr
Contributor

kkmr commented Feb 21, 2024

How Good are Query Optimizers, Really? (https://www.vldb.org/pvldb/vol9/p204-leis.pdf) might be relevant reading.

@jainankitk
Collaborator Author

How Good are Query Optimizers, Really? (https://www.vldb.org/pvldb/vol9/p204-leis.pdf) might be relevant reading.

Thanks @kkmr for sharing this. I went through the paper, and it is mostly about cardinality estimation and join order, neither of which is directly relevant for OpenSearch. A few observations from this publication:

  • Real vs. benchmark data can have very different characteristics, since uniformity and correlations differ far more in real data. While estimation is quite accurate for TPC-H generated data, that is not the case for the JOB (IMDB) data
  • Cost estimation is a hard problem, but the improvement can be significant, as called out in Section 5.4, where the estimates were replaced by actual cardinalities for planning the join

Most existing query engines don't run optimized queries in shadow mode - They optimize the plan and execute it. I would consider requesting customers to opt-in when we launch the feature initially.

I am also leaning towards introducing query rewriting as a feature that can be tuned or turned off using a cluster setting. This will allow front-loading of the core value proposition, while the framework for tuning those settings/parameters can be worked on in later milestones.

Later, once we have tuned the system, we can make query planning the default and let customers turn it off.

While I hope this will eventually be possible, I feel that tuning for query planning will be a continuous exercise across different types of customer workloads.

@sohami sohami added the Roadmap:Cost/Performance/Scale Project-wide roadmap label label May 14, 2024
@jainankitk
Collaborator Author

Recently, OpenSearch customers have shown strong interest in a more performant and resource-efficient Join operation. One approach to such a "Join" is making it a first-class citizen within OpenSearch and building a query planner that is aware of the OpenSearch shard and Lucene index architecture. Both the logical and physical plans for executing a query could be optimized over the existing SQL plugin architecture for "Join", which is based on executing OpenSearch queries, gathering all the data, and performing the Join operation over it. The SQL plugin doesn't deal with or make use of the intricacies of the OpenSearch architecture, but works more like an application sitting on top of OpenSearch. A very simple join operation across two indices in OpenSearch could look something like below:

{
    "join_query": {
        "queries": {
            // all single index query or nested join_query
            "left": {
                "index": "index1",
                "query": {
                    // regular single query
                },
                "fields": ["f1", "f2"]
            },
            "right": {
                "index": "index2",
                "query": {
                    // regular single query
                },
                "fields": ["f1", "f2"]
            },
            "derived_field": [
                {
                    "name": "f3",
                    "value": "left.f1 + right.f1"
                }
            ]
        },
        "join" : {
            "type": "inner/left/right/semi", 
            "query": {
                "boolean": {
                    "must": [
                        {
                            "match" : {
                                "left.f1": "emit(right.f1)"
                            }
                        }, 
                        {
                            "range" : {
                                "left.f2" : {
                                    "gte" : "emit(right.f2)"
                                }
                            }
                        }
                    ] 
                }
            }
        },
        "agg": { // aggregation on joined index
            // regular agg on single index
        },
        "fields": []
    }
}

Note: Extensive Query DSL grammar and query engine related changes will be covered in much more depth as part of separate RFC by @rishabhmaurya. This RFC is specifically focused on the Query Planning aspects around this Join operation.

@jainankitk
Collaborator Author

jainankitk commented Aug 9, 2024

High level design for Query Planner within OpenSearch could look something like below:

(Diagram: QueryPlanner.drawio)

The Query Planner can be a library or service available on all coordinator nodes. It will take the query DSL and IndexMetadata as primary inputs and generate a query execution plan understood by the OpenSearch query engine as output. There might be an intermediate step, similar to today's canMatch phase, for collecting additional metadata to allow better cardinality estimation and cost modelling. There can also be an option to collect this metadata asynchronously and keep it cached.
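The overall pipeline (DSL in, rules applied, physical plan out) could be sketched as below. Every function, rule, and field name here is hypothetical, purely to show the flow of data through the stages:

```python
# Hypothetical sketch of the planner pipeline:
# DSL -> logical plan -> rule-based rewrites -> physical plan.

def parse_to_logical(dsl):
    """Turn the query DSL into a toy logical plan node."""
    return {"op": "scan", "query": dsl, "predicates": dsl.get("filter", [])}

def push_down_predicates(plan):
    """Example rule: mark predicates to be evaluated during the scan."""
    return dict(plan, pushed=list(plan["predicates"]), predicates=[])

RULES = [push_down_predicates]  # dynamically enable/disable rules here

def plan_query(dsl, index_metadata):
    logical = parse_to_logical(dsl)
    for rule in RULES:
        logical = rule(logical)
    # Physical planning would consult index_metadata (shard routing,
    # histograms, data distribution); this sketch just tags the node type.
    return dict(logical, physical="shard_scan", shards=index_metadata["shards"])

plan = plan_query({"filter": [{"term": {"status": "active"}}]}, {"shards": 3})
print(plan["physical"], plan["shards"], len(plan["pushed"]))  # prints: shard_scan 3 1
```

The components below map onto these stages: the logical planner is `parse_to_logical`, the rule engine is the `RULES` loop, and the metadata-driven steps feed the physical planning call.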

  • Logical Planner - This will be responsible for generating a logical plan from the given query DSL. We can potentially have an intermediate step that translates query DSL into a common representation, allowing the Query Planner to be generic and engine agnostic. Any query engine could provide an adapter for this translation to reuse the Query Planner. That translation has a cost, though, and it might make sense to skip it for an OpenSearch-specific implementation.
  • Rule based Optimizations - This is the rule engine for optimizing the logical plan. The rule engine should expose APIs for dynamically enabling/disabling rules. Some of the rules might be predicate push down, projection push down, and aggregation push down for decomposable functions like min and max, simplifying the plan before physical planning.
  • Physical Plan Generation - Within OpenSearch, the Query Planner might need additional metadata like index data distribution, histograms, and shard routings, which the coordinator can collect from all the data nodes using an additional phase similar to canMatch and relay to the Query Planner. Shard routing can drive important decisions around the join algorithm to use (Hash, Nested, or MergeSort) and the associated network costs of each plan. As called out in the overview above, this information can be cached using an asynchronous job scheduler.
  • Cardinality Estimation - Having near-correct estimates of the number of documents processed across indices can have significant cost implications for the algorithm used to perform the join. We can leverage existing metadata for Lucene indices to improve these estimates. The accuracy of the estimates will still depend a lot on the data type, the data distribution (the more uniform, the better the estimates), any arithmetic or string operations applied before joining the column, and querying equals vs. not equals (the matching set is often bigger and more diverse for not equals).
  • Cost Optimization - Very low cardinality for one of the indices allows the join operation to be run as a query across the other index, significantly reducing the cost. Unlike SQL databases, in OpenSearch most of the document fields are indexed as well, making the query operation across fields very efficient. Also, we have multiple data structures, like doc values and stored fields, for fetching projection fields. In many cases, it might be better to fetch projection fields lazily, unlike SQL.
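The cost-optimization point above (very low cardinality on one side lets the join run as a plain query against the other index) could be sketched as a join-strategy chooser. The threshold and strategy names are illustrative assumptions, not OpenSearch APIs:

```python
# Hypothetical sketch: pick a join strategy from cardinality estimates.
# When one side is tiny, collect its keys and run the join as a terms
# query against the larger index; otherwise fall back to hash or
# sort-merge joins. All thresholds are illustrative.

LOW_CARDINALITY = 10_000

def pick_join_strategy(left_est, right_est):
    small, large = min(left_est, right_est), max(left_est, right_est)
    if small <= LOW_CARDINALITY:
        # Rewrite the join as a terms query on the larger index
        return "terms_lookup"
    if small * 4 < large:
        return "hash_join"       # build the hash table on the smaller side
    return "merge_sort_join"     # comparable sizes: sort-merge

print(pick_join_strategy(500, 5_000_000))  # prints: terms_lookup
```

This is exactly where cardinality estimation quality matters: an estimate that is off by an order of magnitude can flip the strategy choice, which is why the estimation bullet above stresses data distribution and operator effects.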

@dblock
Member

dblock commented Aug 12, 2024

At a high level this looks good. I would like to battle-test this proposal against vector search because it has some native parts, maybe @navneet1v could take a look?

@penghuo
Contributor

penghuo commented Aug 13, 2024

love this idea. some thoughts

  1. Execution Perspective: Should we consider developing the New-QueryEngine to primarily serve SQL first, and then gradually migrate DSL? This approach offers three key benefits:
    1.1. It aligns with our goal of moving SQL to the core.
    1.2. It has less impact on existing DSL users, while encouraging them to migrate to SQL.
    1.3. It avoids the need to introduce new JOIN grammar in DSL.
  2. Building New-QueryEngine: I suggest we construct the New-QueryEngine using battle-tested components. There's a relevant paper titled Assembling a Query Engine from Spare Parts that could provide valuable insights.

@rishabhmaurya
Contributor

rishabhmaurya commented Aug 13, 2024

@penghuo thanks for providing feedback. I'm a little hesitant about moving SQL support to core, as it could be a significant maintenance overhead in the core, and I like the plugin model better for supporting a new query language frontend.
However, I'm fully aligned on implementing functionality that SQL supports and OpenSearch query DSL doesn't, like JOIN, in the core if it's in demand among our users.
I would like to understand more about why we have "It aligns with our goal of moving SQL to the core." as a goal.
Tagging other members and maintainers for their input as well: @andrross @msfroh @reta @dblock @smacrakis

@kkmr
Contributor

kkmr commented Aug 13, 2024

Do look at the CrateDB implementation of joins (https://cratedb.com/docs/crate/reference/en/latest/concepts/joins.html). CrateDB is derived from Elasticsearch, so it uses Lucene for storage. It also ports the Postgres query engine code to Java (the Postgres query planner is reputed to be pretty good). So some of the ideas in the CrateDB code might translate well to this effort.

@navneet1v
Contributor

At a high level this looks good. I would like to battle-test this proposal against vector search because it has some native parts, maybe @navneet1v could take a look?

I took a high-level look. In terms of vector search, as long as the query comes via the Lucene Searcher interface, vector search won't have any issue. @dblock, is there anything specific you want me to look at?

After reading the doc, what I can see at a high level is that we are trying to rewrite queries into optimized queries, and I'm not sure how vector queries can make use of that, because vector search queries don't have complexities like histogram aggs.

Please correct me if my understanding is wrong.

@dblock
Member

dblock commented Aug 15, 2024

Because after reading the doc, what I can see on high level is we are trying to rewrite the queries to optimized queries and not sure how vector queries can make use of it because vector search queries doesn't have complexities like Histogram aggs etc.

Thanks for chiming in! I wanted to make sure that the cost estimation will be able to (eventually) account for native parts of search (faiss, etc.).

@reta
Collaborator

reta commented Aug 15, 2024

Thanks for putting it together, @jainankitk! I would really like to see a Query Cost Estimation Based implementation (although I 💯 agree this is a hard problem). On the bright side, we could engage with the Apache Lucene folks to sketch out a "fair" query cost estimation model per index (== OS shard).

@navneet1v
Contributor

navneet1v commented Aug 15, 2024

Because after reading the doc, what I can see on high level is we are trying to rewrite the queries to optimized queries and not sure how vector queries can make use of it because vector search queries doesn't have complexities like Histogram aggs etc.

Thanks for chiming in! I wanted to make sure that the cost estimation will be able to (eventually) account for native parts of search (faiss, etc.).

@dblock
Now I understand your ask. Thanks for clarification.

To know what the cost will be, I would like more details on how cost estimation works. I see we have sprinkled this information a little bit throughout the GH issue, but those approaches might not work for every type of query. For vector search we cannot estimate it just like this; there are other (algorithm-related and query-related) parameters that define the scope of the search. I think it will require more discussion.

@dblock
Member

dblock commented Aug 16, 2024

I think it will require more discussion.

More important is that we build mechanisms (interfaces) for the knn plugin to participate in the cost estimation.
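One possible shape for such an interface (a sketch only; the class names, the toy knn cost model, and the `ef_search`/`dimension` parameters are assumptions for illustration, not an existing OpenSearch or knn plugin API):

```python
from abc import ABC, abstractmethod

# Hypothetical pluggable cost-estimation interface so engines with
# native parts (e.g. a knn plugin over faiss) can contribute estimates.

class QueryCostEstimator(ABC):
    @abstractmethod
    def supports(self, query_type: str) -> bool: ...

    @abstractmethod
    def estimate_ms(self, query: dict) -> float: ...

class KnnCostEstimator(QueryCostEstimator):
    def supports(self, query_type):
        return query_type == "knn"

    def estimate_ms(self, query):
        # Toy model: cost scales with ef_search and vector dimension.
        return 0.001 * query["ef_search"] * query["dimension"]

def estimate(query_type, query, estimators):
    """Ask each registered estimator in turn; plugins register their own."""
    for e in estimators:
        if e.supports(query_type):
            return e.estimate_ms(query)
    raise ValueError(f"no estimator registered for {query_type}")

cost = estimate("knn", {"ef_search": 100, "dimension": 768}, [KnnCostEstimator()])
print(cost)
```

The point is the registration boundary, not the formula: the core planner only sees `estimate_ms`, while the plugin owns whatever algorithm-specific parameters drive the real cost.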

@jainankitk
Collaborator Author

To know what will be the cost, I would like to know more details on how cost estimation is working. I see we have sprinkled this information little bit in the GH issue, but those might not work for every type of query. For vector search we cannot estimate just like this. There are other(algorithm related and query related) parameters that defines the scope of search. I think it will require more discussion.

@navneet1v - Thank you for providing your feedback from the knn perspective. While we are working on ironing out more details around cardinality/cost estimation and the overall long-term approach, I am wondering if there are specific knn queries that can benefit from query planning. For example, a Join operation can benefit significantly from the order in which it is executed; are there similar rewriting optimizations for knn queries that would result in noticeable performance improvements?

Projects
Status: New
Status: In Progress
Status: Later (6 months plus)
Development

No branches or pull requests

10 participants