Fix Hamming distance test to reflect normalized values #520

Merged on May 16, 2024 (1 commit)

cococo2000
Contributor

Description:

This PR updates the test_hamming function in our pytest suite to correctly reflect the normalized Hamming distance. The previous test was expecting a raw Hamming distance of 2, but since our metric function calculates the normalized Hamming distance, the expected value should be 0.5.

    # ann-benchmarks/ann_benchmarks/distance.py: line 29
    "hamming": Metric(
        distance=lambda a, b: np.mean(a.astype(np.bool_) ^ b.astype(np.bool_)),
        distance_valid=lambda a: True
    ),

Changes:

Updated the expected values in test_hamming from 2 to 0.5 to align with the normalized Hamming distance calculation.

Reasoning:

The metrics["hamming"].distance function calculates the normalized Hamming distance by taking the mean of the boolean XOR results. Thus, for arrays p and q given in the tests:

    p = [1, 1, 0, 0]
    q = [1, 0, 0, 1]

The raw Hamming distance is 2 (two differing positions), and the normalized Hamming distance is 2/4 = 0.5.
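
The arithmetic above can be checked with a short standalone sketch. The helper names `raw_hamming` and `normalized_hamming` are hypothetical and exist only for this illustration; the second one mirrors the lambda quoted from distance.py rather than importing it from ann_benchmarks.

```python
import numpy as np


def raw_hamming(a, b):
    """Count the positions where the two binary vectors differ."""
    return int(np.sum(a.astype(np.bool_) ^ b.astype(np.bool_)))


def normalized_hamming(a, b):
    """Fraction of differing positions (same expression as the quoted lambda)."""
    return float(np.mean(a.astype(np.bool_) ^ b.astype(np.bool_)))


# The vectors used in the test described above.
p = np.array([1, 1, 0, 0])
q = np.array([1, 0, 0, 1])

print(raw_hamming(p, q))         # 2
print(normalized_hamming(p, q))  # 0.5
```

Because the metric divides by the vector length, its result always lies between 0 and 1, which is why an expected value of 2 could never match.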

Update Hamming distance test to reflect normalized values
@cococo2000 cococo2000 changed the title Update Hamming distance test to reflect normalized values Fix Hamming distance test to reflect normalized values May 16, 2024
@maumueller maumueller merged commit 1a171c5 into erikbern:main May 16, 2024
36 of 43 checks passed
@maumueller
Collaborator

Thanks!

@GreateFang

Hello coco @cococo2000! I am a database developer, and I need to use ann_benchmark to test the performance of mainstream vector databases. I noticed that you recently updated the Milvus part of ann_benchmark. Have you verified that this part produces results? For development reasons, I need to run the tests offline in a CentOS environment. Does Milvus support this after your changes? I spent a long time trying to run the Milvus tests as they were before your commits, but could not get them to work.

@cococo2000 cococo2000 deleted the patch-1 branch May 17, 2024 06:15
@cococo2000
Contributor Author

I have tested the Milvus part of ann_benchmark on Ubuntu, and it has successfully produced results. Additionally, it has passed the GitHub Actions tests.
However, I have not verified it on CentOS specifically. You might need to make minor adjustments based on your specific environment.
