Fix Hamming distance test to reflect normalized values #520

cococo2000 · 2024-05-16T07:15:59Z

Description:

This PR updates the test_hamming function in our pytest suite to correctly reflect the normalized Hamming distance. The previous test was expecting a raw Hamming distance of 2, but since our metric function calculates the normalized Hamming distance, the expected value should be 0.5.

    # ann-benchmarks/ann_benchmarks/distance.py: line 29
    "hamming": Metric(
        distance=lambda a, b: np.mean(a.astype(np.bool_) ^ b.astype(np.bool_)),
        distance_valid=lambda a: True
    ),

Changes:

Updated the expected values in test_hamming from 2 to 0.5 to align with the normalized Hamming distance calculation.

Reasoning:

The metrics["hamming"].distance function calculates the normalized Hamming distance by taking the mean of the boolean XOR results. Thus, for the arrays p and q given in the tests:

    p = [1, 1, 0, 0]
    q = [1, 0, 0, 1]

the raw Hamming distance is 2 (two differing positions), and the normalized Hamming distance is 2/4 = 0.5.
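
The arithmetic above can be checked with a minimal sketch. It reuses the expression from the "hamming" metric quoted earlier; the helper names `normalized_hamming` and `raw_hamming` are hypothetical and exist only for this illustration, not in ann-benchmarks.

```python
import numpy as np


def normalized_hamming(a, b):
    # Same expression as the "hamming" metric quoted above:
    # fraction of positions where the two boolean vectors differ.
    return np.mean(a.astype(np.bool_) ^ b.astype(np.bool_))


def raw_hamming(a, b):
    # Hypothetical helper for comparison: count of differing positions.
    return int(np.sum(a.astype(np.bool_) ^ b.astype(np.bool_)))


p = np.array([1, 1, 0, 0])
q = np.array([1, 0, 0, 1])

raw = raw_hamming(p, q)          # positions 1 and 3 differ
norm = normalized_hamming(p, q)  # raw count divided by the vector length

print(raw, norm)
```

Because the metric divides by the vector length, its result always lies in [0, 1], so a test expecting the raw count of 2 cannot pass against it.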

maumueller · 2024-05-16T10:19:20Z

Thanks!

GreateFang · 2024-05-17T03:02:21Z

Hello coco @cococo2000 ! I am a database developer. Recently, I need to use ann_benchmark to test the performance of mainstream vector databases. I noticed that you recently updated the Milvus part of ann_benchmark. Have you verified that this part can produce results? Due to development needs, I need to conduct offline testing in a CentOS environment. Can the Milvus after this submission achieve this? I have been trying for a long time to run the Milvus testing part before your commits, but it didn't work.

cococo2000 · 2024-05-17T06:43:13Z

> Hello coco @cococo2000 ! I am a database developer. Recently, I need to use ann_benchmark to test the performance of mainstream vector databases. I noticed that you recently updated the Milvus part of ann_benchmark. Have you verified that this part can produce results? Due to development needs, I need to conduct offline testing in a CentOS environment. Can the Milvus after this submission achieve this? I have been trying for a long time to run the Milvus testing part before your commits, but it didn't work.

I have tested the Milvus part of ann_benchmark on Ubuntu, and it has successfully produced results. Additionally, it has passed the GitHub Actions tests.

However, I have not verified it on CentOS specifically. You might need to make minor adjustments based on your specific environment.

Update distance_test.py: Update Hamming distance test

9c48cac

Update Hamming distance test to reflect normalized values

cococo2000 changed the title ~~Update Hamming distance test to reflect normalized values~~ Fix Hamming distance test to reflect normalized values May 16, 2024

maumueller merged commit 1a171c5 into erikbern:main May 16, 2024
36 of 43 checks passed

cococo2000 deleted the patch-1 branch May 17, 2024 06:15
