Knn query (not top-level knn section) producing zero results while top-level knn works fine #107574

Closed
nemphys opened this issue Apr 17, 2024 · 11 comments
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories :Search/Vectors Vector search Team:Search Meta label for search team v8.13.2

Comments

@nemphys

nemphys commented Apr 17, 2024

Elasticsearch Version

8.13.2

Installed Plugins

analysis-icu

Java Version

bundled

OS Version

macOS Sonoma 14.4.1

Problem Description

Testing the Knn query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-knn-query.html) produces zero results, whereas the top-level knn search (https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) with the exact same settings produces normal/expected results.

Steps to Reproduce

{
  "mappings": {
    "properties": {
      "_text_embeddings": {
        "type": "nested",
        "properties": {
          "key": {
            "type": "keyword"
          },
          "vector": {
            "type": "dense_vector",
            "dims": 1024,
            "index": true,
            "similarity": "dot_product"
          }
        }
      }
    }
  }
}

Top-level kNN:

GET /search-index-test/_search
{
  "knn": {
    "field": "_text_embeddings.vector",
    "query_vector": [ .... ],
    "k": 10,
    "num_candidates": 10000,
    "similarity": "0.55"
  }
}

(produces 10 results as expected)

kNN query:

GET /search-index-test/_search
{
  "query": {
    "knn": {
      "field": "_text_embeddings.vector",
      "query_vector": [ .... ],
      "num_candidates": 10000,
      "similarity": "0.55"
    }
  },
  "size": 10
}

(produces zero results)

Logs (if relevant)

No response

@nemphys nemphys added >bug needs:triage Requires assignment of a team area label labels Apr 17, 2024
@Mikep86 Mikep86 added :Search/Search Search-related issues that do not fall into other categories :Search/Vectors Vector search v8.13.2 and removed needs:triage Requires assignment of a team area label labels Apr 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Apr 17, 2024
@benwtrent
Member

I have been trying to replicate this and cannot.

Could you provide data and steps to replicate?

One thing to try is the _explain API. It indicates whether the document is in the top-k and whether its similarity meets the configured similarity threshold.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

GET search-index-test/_explain/<doc_id_that_matches_top_level_knn>
{
  "query": {
    "knn": {
      "field": "_text_embeddings.vector",
      "query_vector": [ .... ],
      "num_candidates": 10000,
      "similarity": "0.55"
    }
  }
}

@nemphys
Author

nemphys commented Apr 25, 2024

@benwtrent unfortunately it is not possible to share actual data, but I executed the suggested explain request with the same query vector against the first result of the top-level knn query (very handy btw, I was not aware of this functionality) and the response is as follows:

"explanation": {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
      {
        "value": 0,
        "description": "no match on required clause (VectorSimilarityQuery[similarity=0.5, docScore=0.75, innerKnnQuery=DocAndScore[10000]])",
        "details": [
          {
            "value": 0,
            "description": "not in top 10000",
            "details": []
          }
        ]
      },
      {
        "value": 0,
        "description": "match on required clause, product of:",
        "details": [
          {
            "value": 0,
            "description": "# clause",
            "details": []
          },
          {
            "value": 1,
            "description": "FieldExistsQuery [field=_primary_term]",
            "details": []
          }
        ]
      }
    ]
  }

Are you sure that you tried with a nested type (as in the provided mapping example)?

I also tried lowering the query similarity parameter as low as 0.1, but the result is the same.
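
For readers going through an _explain response like the one above, a small helper can flatten the tree to its leaf nodes so the failing reasons are easier to spot. This is only a sketch: `leaf_descriptions` is a hypothetical name, not part of any Elasticsearch client, and the dictionary is copied from the response quoted in this comment.

```python
def leaf_descriptions(explanation: dict) -> list:
    """Return (value, description) pairs for every node without child details."""
    details = explanation.get("details") or []
    if not details:
        return [(explanation["value"], explanation["description"])]
    leaves = []
    for child in details:
        leaves.extend(leaf_descriptions(child))
    return leaves


# The "explanation" object from the _explain response quoted above.
explanation = {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
        {
            "value": 0,
            "description": "no match on required clause (VectorSimilarityQuery[similarity=0.5, docScore=0.75, innerKnnQuery=DocAndScore[10000]])",
            "details": [
                {"value": 0, "description": "not in top 10000", "details": []}
            ],
        },
        {
            "value": 0,
            "description": "match on required clause, product of:",
            "details": [
                {"value": 0, "description": "# clause", "details": []},
                {
                    "value": 1,
                    "description": "FieldExistsQuery [field=_primary_term]",
                    "details": [],
                },
            ],
        },
    ],
}

leaves = leaf_descriptions(explanation)
# Leaves with a value of 0 are the ones that contributed nothing to the score.
zero_valued = [description for value, description in leaves if value == 0]
for description in zero_valued:
    print(description)
```

In this response the zero-valued leaves include "not in top 10000", which is the part of the output that says the document never made it into the knn candidate set.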

@nemphys
Author

nemphys commented Apr 25, 2024

I could also debug the actual code step by step with breakpoints if you point me to the right classes. It seems that somewhere along the path all search results get lost in the second case.

@benwtrent
Member

@nemphys let me try to debug again with nested. I somehow skipped that in my initial reading of this issue.

@benwtrent
Member

@nemphys I just noticed your nested knn query syntax. Could you try the same query, but within a nested query? When you use knn as a query, you must use the usual nested query syntax.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-knn-query.html#knn-query-with-nested-query

With the top-level knn object, we can easily infer from the field whether the search should run within a nested context.

The knn query, however, can be combined with other nested or non-nested clauses, so the context you want it to run in is not as easy to determine.

GET /search-index-test/_search
{
  "query": {
    "nested": {
      "path": "_text_embeddings",
      "query": {
        "knn": {
          "field": "_text_embeddings.vector",
          "query_vector": [ .... ],
          "num_candidates": 10000,
          "similarity": "0.55"
        }
      }
    }
  },
  "size": 10
}
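
As a rough illustration of the wrapping shown above, the following sketch builds the same request body programmatically. `wrap_knn_in_nested` is a hypothetical helper, not an Elasticsearch client API. It assumes a single level of nesting, so the nested path is simply the field name without its last dotted segment, and the three-element query vector is a placeholder rather than a real 1024-dimension embedding.

```python
def wrap_knn_in_nested(knn_clause: dict, size: int = 10) -> dict:
    """Build a search body that runs a knn query inside a nested query.

    Assumption: one level of nesting, so the path is the field name
    with its final dotted segment removed.
    """
    field = knn_clause["field"]
    if "." not in field:
        raise ValueError(f"{field!r} does not look like a nested field")
    path = field.rsplit(".", 1)[0]
    return {
        "query": {"nested": {"path": path, "query": {"knn": knn_clause}}},
        "size": size,
    }


knn_clause = {
    "field": "_text_embeddings.vector",
    "query_vector": [0.1, 0.2, 0.3],  # placeholder values for illustration
    "num_candidates": 10000,
    "similarity": 0.55,
}

body = wrap_knn_in_nested(knn_clause)
print(body["query"]["nested"]["path"])
```

The resulting dictionary has the same shape as the request above and could be sent as the body of a _search call by whatever client is in use.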

@benwtrent benwtrent self-assigned this Apr 25, 2024
@nemphys
Author

nemphys commented Apr 25, 2024

Right! I missed that because the knn query parameters are almost identical to the top-level knn (except for k) and there was no error thrown.

It now works as expected, producing the same results as the top-level knn query.

PS. Is there a reason why the top-level knn search is the "preferred" way to perform ANN search (according to the documentation)? Does the "normal" knn query have any disadvantages compared to the top-level one?

@benwtrent
Member

@nemphys the difference has to do with the number of documents collected and scored. We consider it preferred as it provides the most consistent experience, just maybe not the most flexible or powerful one.

The top-level kNN utilizes the DFS phase to make sure it only counts the global top-k no matter the number of shards. This way, your total hit count won't change based on the number of shards.

For the kNN query, the hit count may vary with the number of shards. The number of nearest neighbors returned will be the same, but your total hit count is now num_candidates multiplied by the number of shards instead of just num_candidates.

Here is an example:

  • Gathering 10 num_candidates from an index with 3 shards.
  • Top-level knn will indicate 10 total hits.
  • Query-level knn will indicate 30 total hits.
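
The arithmetic in this example can be written out as a tiny model. It is a simplification that follows the comment above, not Elasticsearch's actual accounting: the function names are hypothetical, and it assumes every shard holds at least num_candidates matching vectors.

```python
def total_hits_top_level(num_candidates: int, shards: int) -> int:
    # The DFS phase reduces per-shard candidates to one global set,
    # so the shard count does not multiply the reported total.
    return num_candidates


def total_hits_knn_query(num_candidates: int, shards: int) -> int:
    # Each shard collects up to num_candidates hits and the totals add up.
    return num_candidates * shards


top_level = total_hits_top_level(num_candidates=10, shards=3)
query_level = total_hits_knn_query(num_candidates=10, shards=3)
print(top_level, query_level)
```

Only the reported total differs between the two; the number of nearest neighbors returned is the same in both cases.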

@nemphys
Author

nemphys commented Apr 26, 2024

@benwtrent does this apply even if the size parameter is explicitly set? I.e., could a kNN query return 30 results if the size parameter is set to 10, or will it just do the fetch/calculations for 30 and return the top 10?

@benwtrent
Member

@nemphys it's not about the actual hits (nearest neighbors) returned, but about hits.total.value, the total hit count.

You will still get only as many results as size specifies, and we will still collect at most num_candidates per shard. The only difference is the total hit count reported.

@nemphys
Author

nemphys commented Apr 26, 2024

@benwtrent OK, clear. Thank you for the detailed explanation!
