ES|QL: Add min competitive optimization for lucene operators #136267

@ioanatia

We currently push down only those ES|QL conditions that have an exact Lucene query equivalent.

Take the following query as an example, where all conditions are pushed down to Lucene and the SORT is pushed down as well:

FROM wikipedia METADATA _score
| WHERE title:"europe"
| sort _score desc
| limit 10

This query will use the LuceneTopNSourceOperator and should be close in performance to running a match query in the DSL. The LuceneTopNSourceOperator will emit at most 10 rows that need to be processed on the compute service side.

As soon as a filter condition cannot be pushed down, we no longer use LuceneTopNSourceOperator and fall back to LuceneSourceOperator. The match query is still pushed down, but LuceneSourceOperator outputs all docs that match the query string. These docs are then processed on the compute service side, where the non-pushable filter is applied and the remaining rows are sorted to get the top 10.

FROM wikipedia METADATA _score
| WHERE title:"europe" and length(title) > 10
| sort _score desc
| limit 10
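To make the cost difference between the two plans concrete, here is a minimal sketch that simulates both in plain Python. It is not the Elasticsearch implementation: the function names `top_n_source` and `source_then_filter`, the in-memory `docs` list, and the substring match standing in for the `title:"europe"` query are all assumptions made for illustration.

```python
import heapq

def top_n_source(docs, query_term, limit):
    """Stand-in for LuceneTopNSourceOperator: match, sort and limit at the source."""
    matches = (d for d in docs if query_term in d["title"].lower())
    return heapq.nlargest(limit, matches, key=lambda d: d["score"])

def source_then_filter(docs, query_term, predicate, limit):
    """Stand-in for LuceneSourceOperator followed by a filter and TopN in compute."""
    # Every matching doc leaves the source, because the source knows nothing
    # about the non-pushable filter or the limit.
    emitted = [d for d in docs if query_term in d["title"].lower()]
    kept = [d for d in emitted if predicate(d)]
    top = heapq.nlargest(limit, kept, key=lambda d: d["score"])
    return top, len(emitted)

# Hypothetical data: half of the titles are too short to pass length(title) > 10.
docs = [
    {"title": "Europe" if i % 2 == 0 else f"History of Europe {i}", "score": float(i)}
    for i in range(1000)
]

pushed = top_n_source(docs, "europe", 10)
top, emitted = source_then_filter(docs, "europe", lambda d: len(d["title"]) > 10, 10)

print(len(pushed))  # rows handed to compute in the fully pushed-down plan
print(emitted)      # rows handed to compute when the filter is not pushed down
```

In this toy setup the first plan hands 10 rows to the compute side, while the second hands over every matching document even though only 10 survive.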

We should look into whether we can push down more WHERE conditions as filters.
We will likely need a custom Lucene query for this, one that can evaluate an ES|QL expression.
We can start with something simple, such as pushing down only conditions that depend solely on literals and indexed fields (not on runtime columns produced by EVALs).
If we first want to validate whether this would improve things, we can start with a simple prototype that pushes down a set of conditions like length(title) > 10 as a Painless script query, and then run a simple benchmark on a larger dataset (where an improvement is more likely to show).
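For the prototype idea, the DSL request that such a pushdown would roughly correspond to could look like the sketch below, built as a Python dict. The helper name `script_filter_query` is hypothetical, and the `title.keyword` subfield is an assumption about the mapping (the script needs doc values to read the title). Painless `String.length()` counts UTF-16 code units, so it may not match ES|QL `length` exactly for every input.

```python
import json

def script_filter_query(match_field, match_text, painless_source):
    """Build a bool query: a scoring match clause plus a non-scoring script filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {match_field: match_text}}],
                "filter": [
                    {"script": {"script": {"lang": "painless", "source": painless_source}}}
                ],
            }
        },
        # Default sort is by _score descending, so size plays the role of LIMIT 10.
        "size": 10,
    }

# Assumption: the index has a `title.keyword` subfield with doc values.
body = script_filter_query(
    "title", "europe", "doc['title.keyword'].value.length() > 10"
)
print(json.dumps(body, indent=2))
```

Benchmarking this request against the ES|QL query above on a large index would give a first indication of whether evaluating the condition inside Lucene pays off.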

EDIT: We should look into adding a min competitive optimization.
For example, we could add a callback between LuceneSourceOperator and TopNOperator so that, once the priority queue in TopNOperator is full, we can set a min competitive score back on LuceneSourceOperator.
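The callback idea can be sketched with two small stand-in classes. This is a simulation under stated assumptions, not the actual operators: `TopN`, `Source`, `set_min_competitive`, and the block layout with a per-block maximum score are all hypothetical, and the block skipping only imitates what Lucene could do once it knows a minimum competitive score.

```python
import heapq

class TopN:
    """Stand-in for TopNOperator: keeps the n best scores in a min-heap."""
    def __init__(self, n, on_min_competitive):
        self.n = n
        self.heap = []
        self.on_min_competitive = on_min_competitive

    def add(self, score):
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, score)
        elif score > self.heap[0]:
            heapq.heapreplace(self.heap, score)
        else:
            return
        if len(self.heap) == self.n:
            # Queue is full: a score must beat the weakest entry to get in.
            self.on_min_competitive(self.heap[0])

class Source:
    """Stand-in for LuceneSourceOperator reading blocks of scored docs."""
    def __init__(self, blocks):
        self.blocks = blocks
        self.min_competitive = float("-inf")
        self.scored = 0  # number of docs actually visited

    def set_min_competitive(self, score):
        self.min_competitive = score

    def run(self, sink):
        for block in self.blocks:
            if max(block) <= self.min_competitive:
                continue  # no doc in this block can enter the queue
            for score in block:
                self.scored += 1
                if score > self.min_competitive:
                    sink.add(score)

blocks = [[5, 9, 7], [1, 2, 3], [8, 4, 6], [2, 2, 1]]
source = Source(blocks)
topn = TopN(3, source.set_min_competitive)
source.run(topn)

print(sorted(topn.heap, reverse=True))  # the top 3 scores
print(source.scored)                    # docs visited out of 12
```

If a non-pushable filter sits between the two operators, the threshold should remain valid when sorting by _score descending, because it is derived only from rows that already passed the filter.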
