Knn query (not top-level knn section) producing zero results while top-level knn works fine #107574
Comments
Pinging @elastic/es-search (Team:Search)
I have been trying to replicate this and cannot. Could you provide data and steps to replicate? One thing to try is the explain API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
@benwtrent unfortunately it is not possible to share actual data, but I executed the suggested explain request with the same query vector against the first result of the top-level knn query (very handy btw, I was not aware of this functionality) and the response is as follows:
Are you sure that you tried with a nested type (as in the provided mapping example)? I also tried lowering the query similarity parameter to as low as 0.1, but the result is the same.
I could also debug the actual code step by step with breakpoints if you point me to the right classes. It seems that somewhere along the path all search results in the second case get lost.
@nemphys let me try to debug again with nested. I somehow skipped that in my initial reading of this issue.
@nemphys I just noticed your nested knn query syntax. Could you try the same query but within a When using the top level But for the
Right! I missed that because the knn query parameters are almost identical to the top-level knn (except for k) and there was no error thrown. It now works as expected, producing the same results as the top-level knn query. PS. Is there a reason why the top-level knn search is the "preferred" way to perform ANN search (according to the documentation)? Does the "normal" knn query have any disadvantages compared to the top-level one?
@nemphys the difference has to do with the number of documents collected and scored. We consider it preferred as it provides the most consistent experience, just maybe not the most flexible or powerful one. The top-level kNN utilizes the DFS phase to make sure it only counts the global top-k no matter the number of shards. This way, your total hit count won't change based on the number of shards. For the kNN query, the hit count may vary by number of shards. The number of nearest neighbors returned will be the same. But, your total hit count is now Here is an example:
@benwtrent does this apply even if the size parameter is explicitly set? I.e., could a kNN query return 30 results if the size parameter is set to 10 (or will it just do the fetch/calculations for 30 and return the top 10)?
@nemphys it's not about the actual hits (nearest neighbors) returned, but the You will still only get the
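To make the distinction between returned hits and total hit count concrete, here is a toy simulation in plain Python. It is not Elasticsearch code and does not call any Elasticsearch API. The function `simulate` and the `per_shard` knob (how many neighbors each shard contributes to the hit count in the query-style search) are hypothetical names chosen for illustration, and the exact per-shard count Elasticsearch uses is an assumption here.

```python
import heapq
import random


def simulate(num_shards, docs_per_shard, k, per_shard, size, seed=0):
    """Toy model: global top-k merge vs. per-shard collection.

    `per_shard` is a hypothetical parameter, not an Elasticsearch setting.
    Scores are random floats standing in for vector similarities.
    """
    rng = random.Random(seed)
    shards = [
        [rng.random() for _ in range(docs_per_shard)]
        for _ in range(num_shards)
    ]

    # Top-level style: each shard proposes its best k, then a global merge
    # keeps only k overall, so the hit count does not depend on shard count.
    proposals = [s for shard in shards for s in heapq.nlargest(k, shard)]
    global_top = heapq.nlargest(k, proposals)
    top_level = {"total_hits": len(global_top), "returned": global_top[:size]}

    # Query style: every shard's matches count as hits, but only the best
    # `size` of them are returned to the caller.
    matched = [s for shard in shards for s in heapq.nlargest(per_shard, shard)]
    query = {
        "total_hits": len(matched),
        "returned": heapq.nlargest(size, matched),
    }
    return top_level, query


if __name__ == "__main__":
    for n in (1, 3):
        top_level, query = simulate(
            num_shards=n, docs_per_shard=100, k=10, per_shard=10, size=10
        )
        print(n, top_level["total_hits"], query["total_hits"])
```

Under these assumptions, going from one shard to three leaves the top-level hit count at 10 while the query-style hit count grows from 10 to 30. The 10 results actually returned are the same in both styles, which matches the point that `size` still caps what comes back.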
@benwtrent OK, clear. Thank you for the detailed explanation!
Elasticsearch Version
8.13.2
Installed Plugins
analysis-icu
Java Version
bundled
OS Version
macOS Sonoma 14.4.1
Problem Description
Testing the kNN query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-knn-query.html) produces zero results, whereas the top-level kNN search (https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) with the exact same settings produces the expected results.
Steps to Reproduce
Top-level kNN:
(produces 10 results as expected)
kNN query:
(produces zero results)
Logs (if relevant)
No response