Skip to content

Conversation

@dimitris-athanasiou
Copy link
Contributor

This POC implements automatic prefiltering for semantic_text queries.

We achieve this by adding an AutoPrefilteringScope object to the SearchExecutionContext. When we convert a query to a lucene query, queries may push prefilters to the AutoPrefilteringScope. At that stage, queries have already been rewritten. Semantic queries using text_embedding inference endpoints are rewritten to knn vector queries that are auto-prefiltering enabled. Then, when an auto-prefiltering enabled knn vector query is converted to its lucene equivalent, we fetch the prefilters from the SearchExecutionContext and we apply them to the knn vector query - which supports prefiltering already.

@dimitris-athanasiou dimitris-athanasiou added >non-issue :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.3.0 labels Nov 7, 2025
@dimitris-athanasiou
Copy link
Contributor Author

dimitris-athanasiou commented Nov 7, 2025

I had a very constructive discussion with @jimczi and we decided to try this out this way. This approach is way less intrusive than #137467. I'm still missing tests, but I wanted to share this ASAP to collect feedback.

@dimitris-athanasiou dimitris-athanasiou changed the title POC-2 - Auto prefiltering for knn vector query POC-2 - Auto prefiltering for semantic text queries Nov 7, 2025
This POC implements automatic prefiltering for `semantic_text` queries.

We achieve this by adding an `AutoPrefilteringScope` object to the
`SearchExecutionContext`. When we convert a query to a lucene query,
queries may push prefilters to the `AutoPrefilteringScope`. At that
stage, queries have already been rewritten. Semantic queries using
`text_embedding` inference endpoints are rewritten to knn vector queries
that are auto-prefiltering enabled. Then, when an auto-prefiltering
enabled knn vector query is converted to its lucene equivalent, we
fetch the prefilters from the `SearchExecutionContext` and we apply
them to the knn vector query - which supports prefiltering already.
@dimitris-athanasiou dimitris-athanasiou force-pushed the auto-prefiltering-for-knn-query branch from a338f0f to 6d68f0b Compare November 7, 2025 15:34
Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice approach, simple and effective 🙌

I'm not entirely sure how this will integrate with ES|QL, but I see it playing out in one of two ways:

  • If ES|QL builds a boolean query out of all its clauses, this would integrate automatically
  • If ES|QL builds a query with a different structure, we may need to add a filters() method to SemanticQueryBuilder and MatchQueryBuilder. This would be a method for internal use only to programmatically set pre-filters when necessary. We could then pass the pre-filters directly to the generated kNN query in SemanticTextFieldMapper#semanticQuery.

@carlosdelest
Copy link
Member

carlosdelest commented Nov 11, 2025

Interesting approach - instead of updating the QueryBuilders for propagation, we store the information in the search context. That's a very nice alternative.

ES|QL will be able to use this directly I think - bool queries are created for combining together ANDs. I think I'll be able to remove the prefiltering rules that are applied to KNN function there, as I believe this approach should take care of them. That's pretty neat and less complexity on ES|QL 💯

On the nit side - I wonder if we need a queue for storing the information - IIUC, we don't pop() anything from there, just add more as we get deeper inside the queries. Maybe a regular List would work?

Very excited about this - it's simple and powerful!

@dimitris-athanasiou
Copy link
Contributor Author

@carlosdelest

On the nit side - I wonder if we need a queue for storing the information - IIUC, we don't pop() anything from there, just add more as we get deeper inside the queries. Maybe a regular List would work?

We do pop. I made AutoPrefilteringScope AutoCloseable so we can use it with try-with-resources and get the pop() for free.

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction!

Some comments

Comment on lines +668 to +674
private List<QueryBuilder> getAllApplicableFilters(SearchExecutionContext context) {
List<QueryBuilder> applicableFilters = new ArrayList<>(filterQueries);
if (isAutoPrefiltering) {
applicableFilters.addAll(context.autoPrefilteringScope().getPrefilters());
}
return applicableFilters;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not mix and match user provided filters and auto-prefilters. It might lead to surprising behaviors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, as a user, I would expect the filters I provide particularly are the only ones applied.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which, I think we can safely assert that filters are empty here, right? Since the only thing that applies isAutoPrefiltering is semantic text :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. As things are today, it is impossible to have auto and explicit prefilters at the same time. I'll safe-guard against mixing them.

Copy link
Contributor Author

@dimitris-athanasiou dimitris-athanasiou Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but unfortunately we are programmatically adding prefilters in

Not sure how to enforce this. But we would have to mess up to apply auto prefiltering to a user knn query, as there is no way to enable auto prefiltering for a user knn query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but unfortunately we are programmatically adding prefilters in

🤔

This is messy, we need to ensure that users cannot do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, users cannot end up getting auto-prefilters on their bespoke knn queries. They cannot enable auto prefiltering, it is only programmatically enabled by semantic text.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They cannot enable auto prefiltering, it is only programmatically enabled by semantic text.

My worry is that somebody in the future will try to enable it.

Can we add a test or something to signal this shouldn't be allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I can add a test where we have explicit knn query under a bool and check it doesn't get any extra prefilters. With some commentary too. If a PR popped that changes that test people would at least have to read the warning, etc. Sounds good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! Thats great! Just making sure we have an adequate fence

}

private List<QueryBuilder> collectPrefilters() {
return Stream.of(mustClauses, mustNotClauses.stream().map(c -> QueryBuilders.boolQuery().mustNot(c)).toList(), filterClauses)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mustNot thing is tricky, since we already need to iterate all of it again when pushing down stream, why not have this collectPrefilters have a QueryBuilder argument, then we bypass the self concern, and then this can simply return QueryBuilder which is a single BooleanQuery we pass down (as it will be a boolean query anyways ultimately once its finally used right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no self concern for must_not as we do not apply prefilters to those. I'll add assertions in addBooleanClauses to communicate that contract and keep things simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants