@carlosdelest
Member

@carlosdelest carlosdelest commented Nov 20, 2025

The KNN function accepted a min_candidates option, described as the minimum number of candidates to retrieve per shard. In practice, its value replaced k in the underlying knn query.

This caused the following problems:

  • Users were unable to modify num_candidates to explore more candidates per shard before getting the top k
  • min_candidates was subject to oversampling, which was not clear in the docs

This PR adds more options to the KNN function:

  • Allows setting k to override the LIMIT applied to the function, and retrieve more results from each shard
  • Changes min_candidates to be equivalent to setting num_candidates
  • Adds a visit_percentage option for the bbq_disk index type
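
The change in option semantics can be sketched as a small model. This is not the Elasticsearch implementation: the function names and the returned dictionary shape are hypothetical, and the assumption that k previously fell back to the LIMIT when min_candidates was absent is an inference from the description. Only the mapping itself (min_candidates used to replace k; k now overrides the LIMIT and min_candidates maps to num_candidates) comes from the PR text.

```python
def knn_params_before(limit, options):
    """Model of the old behavior: min_candidates replaced k in the knn query.

    Assumption: without min_candidates, k came from the LIMIT.
    """
    return {"k": options.get("min_candidates", limit)}


def knn_params_after(limit, options):
    """Model of the new behavior described in this PR (hypothetical helper)."""
    # k overrides the LIMIT when it is set explicitly.
    params = {"k": options.get("k", limit)}
    # min_candidates is now equivalent to setting num_candidates.
    if "min_candidates" in options:
        params["num_candidates"] = options["min_candidates"]
    # visit_percentage is passed through when provided.
    if "visit_percentage" in options:
        params["visit_percentage"] = options["visit_percentage"]
    return params
```

With a LIMIT of 10 and `{"min_candidates": 20}`, the first model yields k = 20, while the second keeps k = 10 and sets num_candidates = 20, which is the distinction the bullets above describe.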

Usage including all possible options for KNN would become:

FROM test
| WHERE KNN(dense_vector, [0.1, 0.2, 0.3],
    {"k": 10, "min_candidates": 20, "rescore_oversample": 1.5, "similarity": 0.5, "boost": 2.0, "visit_percentage": 0.25})
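
For readers who build such queries programmatically, here is a minimal Python sketch that assembles the query text above and wraps it in a JSON body with a `query` field, as the ES|QL query endpoint expects. The helper names `build_knn_query` and `build_request_body` are hypothetical, and the index, field, and option values are simply the ones from the example above. The sketch relies on the options map in the example being written in JSON-like syntax and does no escaping or validation.

```python
import json


def build_knn_query(index, field, vector, options):
    """Render an ES|QL query that filters with KNN (hypothetical helper)."""
    vector_literal = json.dumps(vector)    # e.g. [0.1, 0.2, 0.3]
    options_literal = json.dumps(options)  # e.g. {"k": 10, ...}
    return (
        f"FROM {index}\n"
        f"| WHERE KNN({field}, {vector_literal}, {options_literal})"
    )


def build_request_body(query):
    """Wrap the query text in a JSON body with a "query" field."""
    return json.dumps({"query": query})


options = {
    "k": 10,
    "min_candidates": 20,
    "rescore_oversample": 1.5,
    "similarity": 0.5,
    "boost": 2.0,
    "visit_percentage": 0.25,
}
query = build_knn_query("test", "dense_vector", [0.1, 0.2, 0.3], options)
body = build_request_body(query)
```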

@carlosdelest carlosdelest added >enhancement :Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch Team:Search - Relevance The Search organization Search Relevance team :Search Relevance/ES|QL Search functionality in ES|QL labels Nov 20, 2025
@elasticsearchmachine
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

@github-actions
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

@carlosdelest carlosdelest marked this pull request as ready for review November 20, 2025 16:33
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed Team:Search - Relevance The Search organization Search Relevance team labels Nov 20, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

Contributor

@jimczi jimczi left a comment

I left one nit regarding how the default is set for visit_percentage, LGTM otherwise.

+ "Must be between 0 and 100. 0 will default to using num_candidates for calculating the percent visited. "
+ "Increasing visit_percentage tends to improve the accuracy of the final results. "
+ "If visit_percentage is set for bbq_disk, num_candidates is ignored. "
+ "Defaults to ~1% per shard for every 1 million vectors"
Contributor

Not sure about this wording and where this default is coming from, @benwtrent can you verify?

Member

It doesn't default to anything, and we shouldn't default it to anything. We dynamically set it according to num_candidates, which is itself determined dynamically from k (if not provided); k is required for knn search to work.
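
The dependency order in this comment can be sketched as follows. This is only an illustration of the fallback chain (k, then num_candidates, then visit_percentage). The two derivation functions are injected as parameters because the formulas are not given in this thread; the name `resolve_search_params` and both callables are hypothetical.

```python
def resolve_search_params(
    k,
    num_candidates=None,
    visit_percentage=None,
    *,
    candidates_from_k,
    visit_from_candidates,
):
    """Resolve knn parameters in dependency order (hypothetical sketch).

    k is required. num_candidates is derived from k only when not provided,
    and visit_percentage is derived from num_candidates only when not provided.
    """
    if k is None:
        raise ValueError("k is required for knn search")
    if num_candidates is None:
        num_candidates = candidates_from_k(k)
    if visit_percentage is None:
        visit_percentage = visit_from_candidates(num_candidates)
    return {
        "k": k,
        "num_candidates": num_candidates,
        "visit_percentage": visit_percentage,
    }
```

Passing the derivations in keeps the sketch honest about what is unknown here: callers supply whatever rule the engine uses, and an explicit visit_percentage bypasses the derivation from num_candidates.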

…ns-change

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@carlosdelest carlosdelest enabled auto-merge (squash) November 21, 2025 10:30
@carlosdelest carlosdelest merged commit 9c924bc into elastic:main Nov 21, 2025
34 checks passed

Labels

  • :Analytics/ES|QL (AKA ESQL)
  • >enhancement
  • ES|QL-ui (Impacts ES|QL UI)
  • :Search Relevance/ES|QL (Search functionality in ES|QL)
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
  • v9.3.0

6 participants