pmpailis
Contributor

@pmpailis pmpailis commented Sep 29, 2025

This PR optimizes how we handle constant vectors in ES|QL vector similarity functions. The main idea is to reuse a single object for the constant vector, instead of creating huge FloatBlock instances and duplicating identical float arrays.
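As a rough illustration of the idea (not the code in this PR), the sketch below converts a constant vector once and hands back the same array for every row, while a non-constant argument is still read row by row into a reusable scratch buffer. All names here (`VectorSource`, `ConstantVectorSource`, `PerRowVectorSource`, `DotProductSketch`) are hypothetical and do not correspond to Elasticsearch classes.

```java
import java.util.List;

// Hypothetical abstraction over "where does the vector for this row come from".
interface VectorSource {
    float[] vectorAt(int row);
}

// Constant argument: the literal is converted once, and the same array is returned for every row.
final class ConstantVectorSource implements VectorSource {
    private final float[] vector;

    ConstantVectorSource(List<Float> literal) {
        vector = new float[literal.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = literal.get(i);
        }
    }

    @Override
    public float[] vectorAt(int row) {
        return vector; // no per-row allocation or copy
    }
}

// Non-constant argument: each row is copied into a scratch buffer that is reused across rows.
final class PerRowVectorSource implements VectorSource {
    private final float[][] rows;
    private final float[] scratch;

    PerRowVectorSource(float[][] rows, int dimensions) {
        this.rows = rows;
        this.scratch = new float[dimensions];
    }

    @Override
    public float[] vectorAt(int row) {
        System.arraycopy(rows[row], 0, scratch, 0, scratch.length);
        return scratch;
    }
}

// Illustrative similarity loop (dot product) that does not care which kind of source it receives.
final class DotProductSketch {
    static float[] score(VectorSource left, VectorSource right, int rowCount) {
        float[] scores = new float[rowCount];
        for (int row = 0; row < rowCount; row++) {
            float[] a = left.vectorAt(row);
            float[] b = right.vectorAt(row);
            float sum = 0f;
            for (int i = 0; i < a.length; i++) {
                sum += a[i] * b[i];
            }
            scores[row] = sum;
        }
        return scores;
    }
}
```

With a constant query vector on one side, the loop touches a single float array for that argument regardless of how many rows are scored.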

Benchmark results show a nice improvement for the script-score functions using the so_vector rally track:

| Metric                                      | Main (ms) | Candidate (ms) | Δ% (Candidate vs Main) |
|---------------------------------------------|-----------|----------------|-------------------------|
| esql-script-score-query-match-all            | 6528.64   | 2567.18        | -60.7%                 |
| esql-script-score-query-acceptedAnswerId     | 6388.91   | 2171.75        | -66.0%                 |
| esql-script-score-query-java                 | 714.05    | 261.21         | -63.4%                 |
| esql-script-score-query-javascript           | 594.97    | 212.74         | -64.3%                 |
| esql-script-score-query-css                  | 226.48    | 86.48          | -61.8%                 |
| esql-script-score-query-concurrency          | 23.44     | 13.06          | -44.3%                 |

Closes #134210

Member

@carlosdelest carlosdelest left a comment

This LGTM - a question about creating multiple subclasses

public final EvalOperator.ExpressionEvaluator.Factory toEvaluator(EvaluatorMapper.ToEvaluator toEvaluator) {
VectorValueProvider.Builder leftVectorProviderBuilder = new VectorValueProvider.Builder();
VectorValueProvider.Builder rightVectorProviderBuilder = new VectorValueProvider.Builder();
if (left() instanceof Literal && left().dataType() == DENSE_VECTOR) {
Member

I think you can count on the arguments being of type dense_vector: this is already checked as part of checkDenseVectorParam() and would have been caught at the Analyzer level.

private FloatBlock block;
private float[] scratch;

VectorValueProvider(ArrayList<Float> constantVector, EvalOperator.ExpressionEvaluator expressionEvaluator) {
Member

Good idea to use a class to wrap the different ways of retrieving vectors!

Would it make sense to create two separate subclasses, so we don't need to keep checking for what has been provided?

Contributor Author

That's a nice suggestion and will make things easier to follow, thanks! Will refactor to have 2 subclasses.

@pmpailis pmpailis marked this pull request as ready for review October 1, 2025 08:45
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Oct 1, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

assumeTrue("Similarity function is not enabled", capability().isEnabled());
}

public abstract String getBaseEvaluatorName();
Contributor Author

@carlosdelest I tried to find a better way to access this, as we already pass it in the similarityParameters, but it was getting a bit too complicated for a small-scoped change, as the PR only affects the similarity functions. Happy to discuss alternatives :)

@pmpailis
Contributor Author

pmpailis commented Oct 1, 2025

run elasticsearch-ci/part-3


/**
- * Fields with this type are dense vectors, represented as an array of double values.
+ * Fields with this type are dense vectors, represented as an array of float values.
Member

😁 thanks for fixing!

}

@Override
public String toString() {
Member

What do you think about creating a different factory for each type of VectorValueProvider, both of them implementing the same interface?

That would help with not having to check which of the two fields need to be used. Each of the classes would have its build() and toString() methods.

I think it makes sense as the factories create two different objects - makes sense to make them different.

Maybe it's a bit convoluted for what we're doing, but probably cleaner. WDYT?
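A minimal sketch of what that split could look like, assuming one factory per provider kind behind a shared interface, each with its own `build()` and `toString()`. The names `Provider`, `ProviderFactory`, `ConstantProviderFactory` and `ExpressionProviderFactory` are made up for illustration and are not the classes introduced in this PR; the `IntFunction` stands in for an expression evaluator.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

// Hypothetical provider abstraction: returns the vector to use for a given row.
interface Provider {
    float[] vectorAt(int row);
}

// Hypothetical shared factory interface; each implementation knows how to build its own provider.
interface ProviderFactory {
    Provider build();
}

// Factory for the constant case: holds the literal vector and needs no evaluator.
final class ConstantProviderFactory implements ProviderFactory {
    private final float[] constant;

    ConstantProviderFactory(float[] constant) {
        this.constant = constant.clone();
    }

    @Override
    public Provider build() {
        return row -> constant; // every row sees the same array
    }

    @Override
    public String toString() {
        return "ConstantProviderFactory[vector=" + Arrays.toString(constant) + "]";
    }
}

// Factory for the non-constant case: wraps a per-row expression (an assumption standing in for an evaluator).
final class ExpressionProviderFactory implements ProviderFactory {
    private final IntFunction<float[]> expression;
    private final String description;

    ExpressionProviderFactory(IntFunction<float[]> expression, String description) {
        this.expression = expression;
        this.description = description;
    }

    @Override
    public Provider build() {
        return expression::apply;
    }

    @Override
    public String toString() {
        return "ExpressionProviderFactory[expression=" + description + "]";
    }
}
```

Because each factory only carries the field it actually needs, neither `build()` nor `toString()` has to check which of two fields was populated.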

Contributor Author

++ makes sense. Didn't want to have a ton of new factories/interfaces, but it will be cleaner. Will refactor as suggested.

Contributor Author

Updated in ee019f2

…github.com:pmpailis/elasticsearch into pb_134210_optimize_vector_similarity_when_constant
Member

@carlosdelest carlosdelest left a comment

Great work! Your first PR achieved such a speedup 🚀

public final EvalOperator.ExpressionEvaluator.Factory toEvaluator(EvaluatorMapper.ToEvaluator toEvaluator) {
VectorValueProviderFactory leftVectorProviderFactory;
VectorValueProviderFactory rightVectorProviderFactory;
if (left() instanceof Literal) {
Member

Nit - we could extract this to a private method

Contributor Author

++ updated in 9f47ab2

@pmpailis pmpailis enabled auto-merge (squash) October 2, 2025 11:07
@pmpailis pmpailis merged commit abf225c into elastic:main Oct 2, 2025
34 checks passed
Labels
>non-issue, :Search Relevance/ES|QL (Search functionality in ES|QL), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v9.3.0
Development

Successfully merging this pull request may close these issues.

ES|QL: Optimize vector similarity functions when one arg is a constant
3 participants