Normalize dataset vectors in the CAGRA InnerProduct tests #2287

Merged
merged 14 commits into from
May 7, 2024

@enp1s0 enp1s0 commented May 2, 2024

This PR updates the CAGRA tests to normalize the dataset and query vectors when the metric is InnerProduct. Without normalization, dataset vectors with a large L2 norm tend to appear in the search results for all queries. As a result, only part of the graph may be traversed during search, which leaves the tests incomplete.
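
To illustrate the effect, here is a minimal brute-force sketch in pure Python. It is not the test code from this PR and does not use CAGRA or RAFT; the helper names (`l2_normalize`, `top_k_inner_product`, `distinct_hits`) and the synthetic data are assumptions made for illustration only. It plants a few large-norm vectors in a dataset and counts how many distinct dataset rows ever appear in the top-k inner-product results, before and after L2 normalization.

```python
import math
import random

DIM, NUM_LARGE, NUM_SMALL, NUM_QUERIES, K = 8, 4, 200, 50, 4


def l2_normalize(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k_inner_product(dataset, query, k):
    """Exhaustive top-k search by inner product (larger is better)."""
    scores = [(sum(a * b for a, b in zip(row, query)), i)
              for i, row in enumerate(dataset)]
    scores.sort(key=lambda s: -s[0])
    return [i for _, i in scores[:k]]


def distinct_hits(dataset, queries, k):
    """Number of distinct dataset rows returned across all queries."""
    hits = set()
    for q in queries:
        hits.update(top_k_inner_product(dataset, q, k))
    return len(hits)


rng = random.Random(0)
# Hypothetical data: rows 0..NUM_LARGE-1 have a much larger L2 norm.
large = [[100.0 * rng.uniform(0.5, 1.0) for _ in range(DIM)]
         for _ in range(NUM_LARGE)]
small = [[rng.uniform(0.0, 1.0) for _ in range(DIM)]
         for _ in range(NUM_SMALL)]
dataset = large + small
queries = [[rng.uniform(0.1, 1.0) for _ in range(DIM)]
           for _ in range(NUM_QUERIES)]

# Without normalization, the large-norm rows win for every query.
raw_hits = distinct_hits(dataset, queries, K)

# After normalization, ranking depends only on direction (cosine similarity).
normalized_hits = distinct_hits([l2_normalize(v) for v in dataset],
                                [l2_normalize(q) for q in queries], K)

print("distinct rows hit, raw:", raw_hits)
print("distinct rows hit, normalized:", normalized_hits)
```

In this construction every raw inner product with a large-norm row is at least 40, while the other rows score at most 8, so the raw results contain only the planted rows. After normalization the ranking no longer depends on vector length, so results can spread over more of the dataset.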

@enp1s0 enp1s0 requested a review from a team as a code owner May 2, 2024 10:22
@enp1s0 enp1s0 self-assigned this May 2, 2024
@github-actions github-actions bot added the cpp label May 2, 2024
@enp1s0 enp1s0 added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed cpp labels May 2, 2024
@github-actions github-actions bot added the cpp label May 2, 2024
@enp1s0 enp1s0 mentioned this pull request May 2, 2024
2 tasks
cpp/test/neighbors/ann_cagra.cuh (outdated, resolved)
cpp/test/neighbors/ann_cagra.cuh (outdated, resolved)
Contributor

@tfeher tfeher left a comment

Thanks @enp1s0 for the PR! I agree with Tarang that we should either use existing RAFT utilities or document in an issue why this is not possible.

cpp/test/neighbors/ann_cagra.cuh (outdated, resolved)
Contributor

@tfeher tfeher left a comment

Thanks @enp1s0 for the updates, LGTM!

@tfeher
Contributor

tfeher commented May 7, 2024

/merge

@rapids-bot rapids-bot bot merged commit 97e38eb into rapidsai:branch-24.06 May 7, 2024
69 checks passed
abc99lr pushed a commit to abc99lr/raft that referenced this pull request May 10, 2024
)

Authors:
  - tsuki (https://github.com/enp1s0)

Approvers:
  - Tarang Jain (https://github.com/tarang-jain)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#2287
Labels
cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
3 participants