Reject zero-length vectors when using cosine similarity #82241

Merged 1 commit into elastic:master on Jan 11, 2022


jtibshirani
Contributor

Cosine similarity is not defined when one of the vectors has zero magnitude.
Before, the kNN search endpoint threw a confusing exception related to top docs
collection. Now we reject vectors early with a clear error message, failing
indexing if the vector has zero magnitude.

Closes #81167.
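
To illustrate the idea behind the description above, here is a minimal Python sketch of an early zero-magnitude check. It is not the Elasticsearch implementation: the function names (`magnitude`, `check_cosine_vector`, `cosine_similarity`) and the error text are hypothetical, chosen only to show why the denominator makes cosine similarity undefined for a zero vector.

```python
import math
from typing import Sequence


def magnitude(vector: Sequence[float]) -> float:
    """Euclidean (L2) norm of the vector."""
    return math.sqrt(sum(x * x for x in vector))


def check_cosine_vector(vector: Sequence[float]) -> None:
    """Hypothetical early check: reject a vector that cosine similarity cannot handle."""
    if magnitude(vector) == 0.0:
        # Illustrative message only, not the text used by Elasticsearch.
        raise ValueError(
            "cosine similarity is not defined for vectors with zero magnitude"
        )


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|), with both inputs validated first."""
    check_cosine_vector(a)
    check_cosine_vector(b)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))
```

Running the check at ingest time, as in `check_cosine_vector`, surfaces the problem where the bad vector enters the system instead of later during scoring.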

@jtibshirani jtibshirani added :Search/Search Search-related issues that do not fall into other categories >bug v8.0.0 v8.1.0 labels Jan 5, 2022
@jtibshirani
Contributor Author

This PR only addresses indexed vectors used in kNN search. For non-indexed vectors, we can't check the magnitude when they're ingested, since we don't know which similarity function will end up being used. During search, the cosineSimilarity script function has a pretty clear message when one vector has zero magnitude: "script_score script returned an invalid score [NaN] for doc [0]. Must be a non-negative score!"
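
As a rough illustration of the search-time behaviour described in this comment, the Python sketch below shows how an unguarded cosine computation can produce NaN and how a score check can then reject it. Everything here is an assumption for illustration: `raw_cosine` and `check_score` are hypothetical names, the message text is made up, and the NaN is returned explicitly because Python raises `ZeroDivisionError` on `0.0 / 0.0`, whereas IEEE 754 floating-point division (as in Java) yields NaN.

```python
import math
from typing import Sequence


def raw_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with no input validation."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_product = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm_product == 0.0:
        # Mimic IEEE 754 semantics, where 0.0 / 0.0 evaluates to NaN.
        return math.nan
    return dot / norm_product


def check_score(score: float, doc_id: int) -> float:
    """Hypothetical score check: reject NaN and negative scores."""
    if math.isnan(score) or score < 0:
        # Illustrative message only.
        raise ValueError(
            f"invalid score [{score}] for doc [{doc_id}]; must be non-negative"
        )
    return score
```

With a zero vector on either side, `raw_cosine` returns NaN and `check_score` raises, which is the search-time counterpart of rejecting the vector at index time.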

@jtibshirani jtibshirani marked this pull request as ready for review January 5, 2022 20:55
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 5, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

Member

@cbuescher cbuescher left a comment

Thanks for fixing this edge case, LGTM

@jtibshirani jtibshirani merged commit 6c44292 into elastic:master Jan 11, 2022
@jtibshirani jtibshirani deleted the zero-vectors branch January 11, 2022 17:34
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Jan 11, 2022
@jtibshirani jtibshirani added :Search/Vectors Vector search and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 21, 2022
Development

Successfully merging this pull request may close these issues.

Edge cases with 0-vectors in knn_search
4 participants