[Bug]: Filtering results do not match expectations when using the radius param. #30327

Closed
1 task done
syjzlee opened this issue Jan 27, 2024 · 18 comments
Labels
kind/bug: Issues or changes related a bug
stale: indicates no udpates for 30 days
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@syjzlee

syjzlee commented Jan 27, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.3.3
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. I get the second record, whose id equals "71a5b70e-bcc1-11ee-ae22-708bcdba5a7a", when I leave the radius param unset.
    [Screenshot 1]

  2. But I can no longer see it when I set radius to 0.7, although I have not deleted that record.

[Screenshot 2]

My index uses the IP metric. Maybe I misunderstand the radius parameter? If so, can you tell me how to filter the results to those with a score greater than 0.7?
[Screenshot 3]

Expected Behavior

A parameter that filters the results to those with a score greater than 0.7.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@syjzlee syjzlee added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 27, 2024
@smellthemoon
Contributor

/assign

@smellthemoon
Contributor

If it's convenient, could you provide more detailed logs for further investigation?
@syjzlee

@syjzlee
Author

syjzlee commented Jan 29, 2024

There are no server logs. Only the parameters used by the visual client (Attu) are available.

@yanliang567 yanliang567 removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 29, 2024
@yanliang567 yanliang567 removed their assignment Jan 29, 2024
@yanliang567 yanliang567 added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 29, 2024
@Asperas13

Asperas13 commented Feb 6, 2024

I can confirm I see similar behaviour with the “Hamming” metric.
When I manually filter the results like this:
result = []
for hit in hits:
    if hit.distance < radius:
        result.append(hit)
I am able to see all similar entities, but this is slower.
However, when I use the “radius” search param, some of the closest neighbours disappear.
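
The manual filter above can be sketched as a small self-contained helper. This is only an illustration under stated assumptions: `Hit` and `filter_by_radius` are hypothetical names (not pymilvus API), and the comparison direction is assumed to flip for similarity metrics such as IP and COSINE, where a larger score means a closer match.

```python
from dataclasses import dataclass

# Assumption: for these metrics a larger score means "more similar".
SIMILARITY_METRICS = {"IP", "COSINE"}


@dataclass
class Hit:
    """Hypothetical stand-in for a search hit; only id and distance are modelled."""
    id: str
    distance: float


def filter_by_radius(hits, radius, metric_type):
    """Keep the hits on the 'closer than radius' side for the given metric."""
    if metric_type.upper() in SIMILARITY_METRICS:
        # Similarity metrics: keep scores strictly above the radius.
        return [hit for hit in hits if hit.distance > radius]
    # Distance metrics (e.g. Hamming, L2): keep distances strictly below the radius.
    return [hit for hit in hits if hit.distance < radius]


ip_hits = [Hit("a", 0.80), Hit("b", 0.65), Hit("c", 0.72)]
hamming_hits = [Hit("x", 3), Hit("y", 12)]

ip_kept = [hit.id for hit in filter_by_radius(ip_hits, 0.7, "IP")]
hamming_kept = [hit.id for hit in filter_by_radius(hamming_hits, 10, "HAMMING")]
print(ip_kept, hamming_kept)
```

Filtering on the client like this needs a plain top-k search with a large enough limit first, which is why it tends to be slower than a server-side range search.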


stale bot commented Mar 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Mar 7, 2024
@stale stale bot closed this as completed Mar 14, 2024
@mkhaliki24

I seem to have a problem related to this.

  • Milvus version: v2.3.4
  • Deployment mode(standalone or cluster):
  • MQ type(rocksmq, pulsar or kafka):
  • SDK version(e.g. pymilvus v2.0.0rc2): milvus-sdk-java 2.3.4
  • OS(Ubuntu or CentOS):
  • CPU/Memory:
  • GPU:
  • Others:

index_params = {
"index_type": "IVF_SQ8",
"params": {"nlist": 1024},
"metric_type": "COSINE"
}

Current Behavior

  1. Search with COSINE metric_type and params:

Screenshot 2024-03-14 at 2 03 17 PM

Results:
Screenshot 2024-03-14 at 2 02 53 PM

  2. Same search but with a higher radius (0.7)

Screenshot 2024-03-14 at 2 08 31 PM

Results:
Screenshot 2024-03-14 at 2 08 41 PM

Expected Behavior

In the first search result, the first record we get back has a distance of 0.80049676. Increasing the radius from 0.1 to 0.7 should still return the first record.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@maggonravi

Is there a resolution for this? I am facing the exact same issue when using radius with IP.

@yanliang567
Contributor

Could you please file a new issue and provide the Milvus logs for investigation?

@xiaofan-luan
Contributor

@smellthemoon @cydrain
Is there a knob, such as increasing nprobe, that would improve the search quality?

@xiaofan-luan
Contributor

I don't think nprobe will work in this case.

@cydrain
Contributor

cydrain commented Jun 19, 2024

Hi @mkhaliki24 @maggonravi,

For IVF-series index types, you can try the "max_empty_result_buckets" param to fine-tune their range search results.
Screenshot from 2024-06-19 11-11-11

@maggonravi

Hi @cydrain ,

I tried setting max_empty_result_buckets to a low value (10) and to the max value (65535), but still got empty results.

Here are the params. If I remove range_filter and radius, it returns 5 documents from the db. If I pass them, I get an empty result, even though the metric has a value of ~0.7 to 0.8. I tried varying radius between -1 and 1, but it did not help.

{
    'nprobe': 128, 
    'range_filter': 1.0, 
    'radius': -1, 
    'max_empty_result_buckets': 65535
}

@yanliang567
Contributor

@maggonravi could you please file a new issue and provide the Milvus logs? If possible, sharing a data sample with us to reproduce the issue would be very helpful.

@xiaofan-luan
Contributor

> Hi @cydrain ,
>
> I tried setting max_empty_result_buckets to a low value (10) and to the max value (65535), but still got empty results.
>
> Here are the params. If I remove range_filter and radius, it returns 5 documents from the db. If I pass them, I get an empty result, even though the metric has a value of ~0.7 to 0.8. I tried varying radius between -1 and 1, but it did not help.
>
> {
>     'nprobe': 128,
>     'range_filter': 1.0,
>     'radius': -1,
>     'max_empty_result_buckets': 65535
> }

What metric are you using?
For cosine metrics, 1.0 means most similar, and you just filtered out all the vectors.

@mprudra

mprudra commented Jun 25, 2024

> For cosine metrics, 1.0 means most similar, and you just filtered out all the vectors.

As per the official docs, the above filter seems correct to me.
From what I could gather from the docs, radius is the actual threshold on the metric, and range_filter is a secondary filter that can exclude even the closest results.

image
Ref: https://milvus.io/docs/single-vector-search.md#Range-search

So radius (-1) < distance <= range_filter (1.0).
This should have returned some results.
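
To make that reading concrete, here is a minimal sketch that applies the inequality above to the params posted earlier. It is a plain-Python model of the bounds as described in this comment, not Milvus code; `in_similarity_range` is a hypothetical helper and the sample scores are illustrative values in the ~0.7 to 0.8 band mentioned above.

```python
def in_similarity_range(distance, radius, range_filter):
    """Bounds as read from the docs in this thread: radius < distance <= range_filter."""
    return radius < distance <= range_filter


# Search params as posted earlier in the thread.
params = {
    "nprobe": 128,
    "range_filter": 1.0,
    "radius": -1,
    "max_empty_result_buckets": 65535,
}

# Illustrative scores only; the real values were reported as roughly 0.7 to 0.8.
scores = [0.70, 0.75, 0.80]

matches = [
    s for s in scores
    if in_similarity_range(s, params["radius"], params["range_filter"])
]
print(matches)

# With radius raised to 0.7, the lower bound is exclusive, so 0.70 itself drops out.
tighter = [s for s in scores if in_similarity_range(s, 0.7, params["range_filter"])]
print(tighter)
```

Under this reading every score in the band passes with radius -1, so an empty result points at the server side rather than at the params.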

@cydrain
Contributor

cydrain commented Jun 26, 2024

/assign

@cydrain
Contributor

cydrain commented Jun 26, 2024

Hi @mprudra ,

I can reproduce this issue on my local machine.
I have filed another issue to track it: #34199

@cydrain
Contributor

cydrain commented Jun 27, 2024

Hi @mprudra ,

This issue has been fixed in the Milvus v2.4.x release. You can see my comments in #34199.

9 participants