[Bug]: Zero Recall Rate, Very Strange for multi-vector search!!!!!!! #33294
Comments
/assign @czs007
I'm in a hurry, so please help me. Thanks very much. cc @czs007. Can you reproduce it?
Working on it. @JackTan25
Thanks, can you reproduce it? cc @czs007
milvus/internal/proxy/search_reduce_util.go Line 478 in 5452376
Please change the function here from "big" to "small". Previously, the activation function was applied and the results were ranked in descending order of score. Once you modify it to return the original L2 distance, the results should be ranked in ascending order instead.
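To illustrate the point about sort direction, here is a minimal Go sketch. It is not the code in `search_reduce_util.go`; the `hit` type and the `sortHits` function are hypothetical names introduced only for this example. It assumes what the comment above states: a similarity-style score ranks best-first in descending order, while a raw L2 distance ranks best-first in ascending order.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical (ID, score) pair; it is not a Milvus type.
type hit struct {
	ID    int64
	Score float32
}

// sortHits orders hits best-first.
// largerIsBetter is true for similarity-style scores (sort descending)
// and false for raw L2 distances (sort ascending).
func sortHits(hits []hit, largerIsBetter bool) {
	sort.SliceStable(hits, func(i, j int) bool {
		if largerIsBetter {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].Score < hits[j].Score
	})
}

func main() {
	// Scores here are treated as raw L2 distances, so smaller is better.
	hits := []hit{{ID: 1, Score: 0.9}, {ID: 2, Score: 0.1}, {ID: 3, Score: 0.5}}
	sortHits(hits, false)
	fmt.Println(hits)
}
```

If the comparison direction and the score type disagree (for example, raw L2 distances sorted in descending order), the farthest candidates end up first, which would be consistent with a recall of zero.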
Ok, let me try it. Thanks.
Hi, I tested a 1,000-row dataset and it works as expected, but when the dataset is as large as 1,000,000 rows (100w), I only get 2%. I want to raise the limit of 16384. Where in the source code should I modify it? @czs007
@JackTan25 What do you mean by 2%? Recall?
Yes. I raised the top-k to 990,000 (99w) and I get 64% now.
@JackTan25 Why does the value of TopK need to be so large?
I find it strange too. You can test with the small_data I gave you: when topk2 is low (50), the recall rate is still zero, and when I raise it to 1000 (the dataset has 1000 rows), it only reaches 94%. Multi-vector recall is very low, while single-vector recall is very high and the search is fast. cc @czs007
The point is that the query itself can make a big difference to the result. Different queries can get different recall.
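The thread does not show how the recall percentages were measured. As an assumption, the sketch below uses the common definition for a single query: the fraction of ground-truth IDs that appear in the returned IDs. The function name `recall` and the sample IDs are made up for illustration, and the ground-truth IDs are assumed to be unique.

```go
package main

import "fmt"

// recall returns the fraction of ground-truth IDs found in the returned IDs.
// It returns 0 when there is no ground truth to compare against.
func recall(truth, got []int64) float64 {
	if len(truth) == 0 {
		return 0
	}
	// Index the returned IDs for constant-time membership checks.
	seen := make(map[int64]struct{}, len(got))
	for _, id := range got {
		seen[id] = struct{}{}
	}
	found := 0
	for _, id := range truth {
		if _, ok := seen[id]; ok {
			found++
		}
	}
	return float64(found) / float64(len(truth))
}

func main() {
	truth := []int64{1, 2, 3, 4} // hypothetical exact nearest neighbours
	got := []int64{2, 4, 9}      // hypothetical search result
	fmt.Println(recall(truth, got))
}
```

Because the value is computed per query, averaging it over a set of queries gives a steadier number than judging from one query alone.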
I think it still means the ranking function you modified has a bug. Maybe you should debug into it.
Well, I agree the recall is low, but when I increase top-k2, the recall really does grow. The ranking function is right; maybe it's a bug in the algorithm. cc @xiaofan-luan
I only modified two places here. cc @xiaofan-luan @czs007 Is this right?
@czs007 @xiaofan-luan Is there any other logic that we need to check? I'm not familiar with this code module.
Well, one thing I found is that, for the weighted rank, the score seems to be a single column's score rather than a sum. @czs007 Where is the logic for this part?
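For reference, this is roughly what a "sum" would look like in a weighted rank: each entity's fused score is the weighted sum of its scores from every vector column. The sketch below is an illustration under that assumption, not the Milvus implementation; `weightedSum`, the map layout, and the sample numbers are all hypothetical.

```go
package main

import "fmt"

// weightedSum fuses per-column scores into one score per entity ID.
// perColumn[i] maps entity ID -> score from vector column i, and
// weights[i] is the weight for that column. An ID that is missing from
// a column's candidate list contributes nothing for that column.
func weightedSum(perColumn []map[int64]float64, weights []float64) map[int64]float64 {
	fused := make(map[int64]float64)
	for i, col := range perColumn {
		for id, score := range col {
			fused[id] += weights[i] * score
		}
	}
	return fused
}

func main() {
	// Two vector columns with hypothetical, already-normalized scores.
	col0 := map[int64]float64{1: 1.0, 2: 0.5}
	col1 := map[int64]float64{1: 0.5, 3: 1.0}
	fused := weightedSum([]map[int64]float64{col0, col1}, []float64{0.5, 0.5})
	fmt.Println(fused[1], fused[2], fused[3])
}
```

Under this scheme, an entity that appears in only one column's candidate list gets a score from that single column. That may be one reason a small per-column top-k lowers multi-vector recall, though the thread does not confirm it.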
Growing it to 1000 gets 100%. cc @xiaofan-luan
The code here is very strange. What does realTopK mean? The limit the user gives, or the number of vector columns? cc @czs007 @xiaofan-luan
Any updates?
"The code here is very strange. What does realTopK mean? The limit the user gives, or the number of vector columns? cc @czs007 @xiaofan-luan" I guess real topk means topk, but sometimes a search cannot return limit results. For example, if you ask for topk 1000 but there are only 500 entities in Milvus, then real topk is 500.
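The guess above amounts to taking the smaller of the requested limit and the number of results actually available. A minimal sketch of that reading follows; the helper is named `realTopK` after the variable under discussion, but it is a hypothetical function written for this example, not the one in the Milvus source.

```go
package main

import "fmt"

// realTopK returns how many results can actually be returned:
// the requested limit, capped by the number of available results.
func realTopK(limit, available int) int {
	if available < limit {
		return available
	}
	return limit
}

func main() {
	// Asking for 1000 results when only 500 entities exist.
	fmt.Println(realTopK(1000, 500))
	// Asking for 50 results when 1000 entities exist.
	fmt.Println(realTopK(50, 1000))
}
```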
Is there an existing issue for this?
Environment
Current Behavior
The recall rate is zero.
The search can't find any correct result.
Expected Behavior
A high recall rate is expected.
Steps To Reproduce
Milvus Log
No response
Anything else?
No response