Normalize dataset vectors in the CAGRA InnerProduct tests #2287
Thanks @enp1s0 for the PR! I agree with Tarang that we should either use existing raft utilities or document in an issue why this is not possible.
… into cagra-test-normalize-vectors
Thanks @enp1s0 for the updates, LGTM!
/merge
This PR updates the CAGRA test to normalize the dataset and query vectors when the metric is InnerProduct. If they are not normalized, dataset vectors with a large L2 norm tend to appear in the search results for all queries. As a result, only part of the graph nodes may be traversed during the search, which leaves the test incomplete.

Authors:
- tsuki (https://github.com/enp1s0)

Approvers:
- Tarang Jain (https://github.com/tarang-jain)
- Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#2287
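To illustrate the normalization described above: the inner product of a query q and a dataset vector x equals |q| |x| cos(theta), so a vector with a large L2 norm scores highly against almost every query. Scaling each row to unit L2 norm removes that bias. The following is a minimal host-side sketch only, not the code from this PR and not a RAFT API; `normalize_rows` and `inner_product` are hypothetical helper names, and the row-major `std::vector<float>` layout is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of RAFT): scales each row of a row-major
// n_rows x dim matrix to unit L2 norm, in place.
inline void normalize_rows(std::vector<float>& data, std::size_t n_rows, std::size_t dim)
{
  assert(data.size() == n_rows * dim);
  for (std::size_t i = 0; i < n_rows; ++i) {
    float* row = data.data() + i * dim;

    // Accumulate the squared L2 norm in double to limit rounding error.
    double sq_sum = 0.0;
    for (std::size_t j = 0; j < dim; ++j) {
      sq_sum += static_cast<double>(row[j]) * static_cast<double>(row[j]);
    }

    // Leave all-zero rows untouched to avoid dividing by zero.
    if (sq_sum == 0.0) { continue; }

    const float inv_norm = static_cast<float>(1.0 / std::sqrt(sq_sum));
    for (std::size_t j = 0; j < dim; ++j) {
      row[j] *= inv_norm;
    }
  }
}

// Hypothetical helper: plain inner product of two dim-length vectors.
inline float inner_product(const float* a, const float* b, std::size_t dim)
{
  float acc = 0.0f;
  for (std::size_t j = 0; j < dim; ++j) {
    acc += a[j] * b[j];
  }
  return acc;
}
```

With this sketch, the rows (3, 4) and (30, 40) both become approximately (0.6, 0.8), so after normalization neither is favored over the other by an inner-product search, whereas before normalization the second row would score ten times higher against any query.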