-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ES|QL - KNN function option changes #138372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL - KNN function option changes #138372
Conversation
|
Hi @carlosdelest, I've created a changelog YAML for you. |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
…ange' into bugfix/esql-knn-options-change
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Pinging @elastic/kibana-esql (ES|QL-ui) |
jimczi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one nit regarding how the default is set for visit_percentage, LGTM otherwise.
| + "Must be between 0 and 100. 0 will default to using num_candidates for calculating the percent visited. " | ||
| + "Increasing visit_percentage tends to improve the accuracy of the final results. " | ||
| + "If visit_percentage is set for bbq_disk, num_candidates is ignored. " | ||
| + "Defaults to ~1% per shard for every 1 million vectors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this wording and where this default is coming from, @benwtrent can you verify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't default to anything. We shouldn't default it to anything. We dynamically set it according to num_candidates, which is dynamically determined via k (if not provided), which is required for knn search to work.
…ns-change # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
KNN function allowed the use of
min_candidatesto set the minimum number of candidates to retrieve per shard. What it did was replacekfor the underlying knn query.This caused the following problems:
num_candidatesto explore more candidates per shard before getting the top kmin_candidateswas subject to oversampling, which was not clear in the docsThis PR adds more options to KNN function:
kto override the LIMIT applied to the function, and retrieve more results from each shardmin_candidatesto be equivalent to settingnum_candidatesvisit_percentageoption fordisk_bbqindex typeUsage including all possible options for KNN would become: