There seems to be a correctness issue that manifests when all of the following conditions are met:
- Content storage is enabled for the Embeddings
- The documents being indexed are Python dicts
- One or more of the documents being indexed has an empty "text" field in the dict (i.e. {"text": ""})

When all of these conditions are met, search results for every document AFTER the first empty "text" field appear to be misaligned by the number of documents with empty text fields.
In the reproduction for this report, the embedding search incorrectly returns document 3 ("California") as the top match when the documents are Python dicts and content storage is enabled.
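To make the reported symptom concrete, here is a toy model in plain Python of how an offset like this could arise. It is not the library's code and not the original reproduction. It assumes, purely for illustration, that the vector side skips empty text while the content side keeps one entry per document. The function names, the word-overlap scoring, and the sample documents are all hypothetical.

```python
def build_toy_index(documents):
    """Toy model (assumption): vector rows skip empty text, content does not."""
    # One "vector row" per document that has non-empty text.
    rows = [doc["text"] for doc in documents if doc["text"]]
    # One content entry per document, including the empty ones.
    content = [doc["text"] for doc in documents]
    return rows, content


def toy_search(rows, content, query):
    """Return (best row position, content looked up at that position)."""
    query_words = set(query.lower().split())

    def score(text):
        # Stand-in for vector similarity: count of shared words.
        return len(set(text.lower().split()) & query_words)

    best_row = max(range(len(rows)), key=lambda i: score(rows[i]))
    # The flaw being modelled: the row position is used as a content position.
    return best_row, content[best_row]


# Hypothetical documents; the second one has an empty "text" field.
documents = [
    {"text": "red apple"},
    {"text": ""},
    {"text": "green pear"},
    {"text": "yellow banana"},
]

rows, content = build_toy_index(documents)
row, text = toy_search(rows, content, "yellow banana")
print(row, repr(text))  # prints: 2 'green pear'
```

The query matches "yellow banana", but the lookup returns "green pear": one position too early, matching the one empty document that precedes it. Documents before the empty entry are unaffected, which is consistent with the behavior described above.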
I would ensure that neither None nor an empty string is passed as the text to index, as it's not going to produce useful results. That said, the behavior should be the same regardless of whether content is enabled. I just checked in a fix for this; thank you again for finding this bug.
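One way to follow the advice above is to filter documents before indexing. The sketch below is a hypothetical helper in plain Python, not part of any library API. It keeps each document's original position so results can still be mapped back to the source list, and it also drops whitespace-only text, which is an assumption beyond what the comment states.

```python
def non_empty_documents(documents):
    """Yield (original_position, document) for documents with usable "text"."""
    for position, doc in enumerate(documents):
        text = doc.get("text")
        # Skip None, empty strings, and (by assumption) whitespace-only text.
        if isinstance(text, str) and text.strip():
            yield position, doc


# Hypothetical input containing an empty string and a None value.
documents = [
    {"text": "red apple"},
    {"text": ""},
    {"text": None},
    {"text": "yellow banana"},
]

kept = list(non_empty_documents(documents))
print(kept)  # prints: [(0, {'text': 'red apple'}), (3, {'text': 'yellow banana'})]
```

The retained positions (0 and 3 here) can then be passed along as explicit ids to whatever indexing call is in use, so that no lookup depends on list order.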