
[Bug]: Missing data in hybrid search result when using partition key #30607

Closed
IsaacWhittakerTR opened this issue Feb 14, 2024 · 21 comments

@IsaacWhittakerTR

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.7 (upgraded recently from 2.3.2)
- Deployment mode(standalone or cluster): EKS Cluster
- MQ type(rocksmq, pulsar or kafka): Kafka
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.3.6
- OS(Ubuntu or CentOS): n/a - EKS
- CPU/Memory: 
- GPU: n/a
- Others: s3 for data persistence

Current Behavior

When performing a hybrid search for vectors that exist in the collection while filtering on the partition key, either an empty result is returned or a result containing only a few distant vectors is returned if the boolean expression uses the TermExpr syntax `partition_key in [123, 456]`.

The correct results are returned if the boolean expression uses the CmpOp syntax `partition_key == 123` or `partition_key == 456` instead. Correct results are also returned when filtering on any of the non-partition-key scalar fields.

Expected Behavior

When performing a hybrid search on a vector that exists in the collection, while filtering by the partition key associated with that vector, the search vector should be returned as one of the results with a distance of zero.

Steps To Reproduce

1. Query a vector to confirm it exists in the collection, with `partition_key` 123.
2. Perform a hybrid search on that vector with `partition_key in [123]` filter.
3. Query returns an empty result.
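For reference, a minimal pymilvus sketch of these steps; the collection name, field names, primary-key value, and search params below are assumptions for illustration, not the reporter's actual code:

```python
from pymilvus import Collection, connections

# Hypothetical connection and names, used only to illustrate the reproduction steps.
connections.connect(host="localhost", port="19530")
collection = Collection("my_collection")
collection.load()

# 1. Confirm the entity exists and fetch its vector (partition_key value 123).
rows = collection.query(
    expr="partition_key == 123 && pk == 1",  # pk value is a placeholder
    output_fields=["embedding"],
)
vec = rows[0]["embedding"]

# 2. Hybrid search on that vector, filtering with the TermExpr syntax.
res = collection.search(
    data=[vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=10,
    expr="partition_key in [123]",
)

# 3. Expected: the entity itself at distance 0.0; observed on the affected
#    versions: an empty result (or only a few distant vectors).
print(res[0].ids, res[0].distances)
```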

Milvus Log

I was not able to find any related error or warning logs, but I can provide the logs on request.

I could not find any issues with the collection using birdwatcher either.

Anything else?

Schema information:

field_1 -> type int32 with scalar index
field_2 -> partition key, type int64 with scalar index
field_3 -> primary key, type int64
field_4 -> type int32 with scalar index
field_5 -> type FloatVector with dim 768 and IVF_SQ8 index (metric type L2, nlist 8192)
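A pymilvus sketch of this schema; the field names follow the listing above, while the collection name and the index-creation call are assumptions for illustration (scalar indexes on field_1, field_2, and field_4 are omitted for brevity):

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

connections.connect(host="localhost", port="19530")

# Types, dim, and index params follow the report; field names are the generic ones above.
fields = [
    FieldSchema("field_1", DataType.INT32),
    FieldSchema("field_2", DataType.INT64, is_partition_key=True),
    FieldSchema("field_3", DataType.INT64, is_primary=True),
    FieldSchema("field_4", DataType.INT32),
    FieldSchema("field_5", DataType.FLOAT_VECTOR, dim=768),
]
schema = CollectionSchema(fields)
collection = Collection("my_collection", schema)  # hypothetical collection name

# IVF_SQ8 index on the vector field, metric L2, nlist 8192, as described above.
collection.create_index(
    "field_5",
    {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 8192}},
)
```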

A Zilliz user mentioned a very similar problem in this discord thread: https://discord.com/channels/1160323594396635310/1201965988451454996

@IsaacWhittakerTR IsaacWhittakerTR added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 14, 2024
@IsaacWhittakerTR
Author

Update - Downgrading to Milvus v2.3.2 and pymilvus v2.3.2 first, then re-upserting specific entities, seems to make those vectors searchable again with any of the hybrid-search filtering syntaxes mentioned above.

The issue appears to be specific to Milvus or pymilvus versions newer than 2.3.2. Unfortunately, this means we will need to downgrade our cluster and re-upsert 100 million+ vectors for search to work again, which is obviously less than ideal. Any idea what could have changed in the newer versions to break hybrid search?

@xiaofan-luan
Collaborator

Seems to be a potential bug introduced recently.

@yanliang567
please help on it.

@xiaofan-luan
Collaborator

if this is a bug, pls assign to @zhagnlu

@yanliang567
Contributor

Trying to reproduce it in-house.

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 17, 2024
@yanliang567
Contributor

yanliang567 commented Feb 18, 2024

@IsaacWhittakerTR I failed to reproduce the issue in-house. Could you please reproduce the issue and collect the Milvus logs for investigation?

I roughly tried the steps below on Milvus v2.3.8:

  1. create a collection
    1.1 with partition key, type int64 with scalar index(named category)
    1.2 with type FloatVector(named embedding) with dim 512 and IVF_SQ8 index (metric type L2, nlist 1024)
  2. insert 2 million entities
  3. run a query to confirm the vector exists in the collection, category==1
  4. search
    res1=c.search(data=[vec], anns_field="embedding", param={}, limit=10, expr="category in [1]")
    res2=c.search(data=[vec], anns_field="embedding", param={}, limit=10, expr="category==1")
    res1 equals res2
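For reference, a self-contained version of that comparison; the collection name, the way `vec` is fetched, and the search params are assumptions:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
c = Collection("repro_collection")  # hypothetical collection built as in steps 1-2
c.load()

# Fetch one vector from partition-key value 1 (step 3).
vec = c.query(expr="category == 1", output_fields=["embedding"], limit=1)[0]["embedding"]

params = {"metric_type": "L2", "params": {"nprobe": 16}}

# The TermExpr filter and the CmpOp filter should return identical results
# when partition-key routing works correctly (step 4).
res1 = c.search(data=[vec], anns_field="embedding", param=params, limit=10,
                expr="category in [1]")
res2 = c.search(data=[vec], anns_field="embedding", param=params, limit=10,
                expr="category == 1")
assert res1[0].ids == res2[0].ids
```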

@yanliang567
Contributor

/assign @IsaacWhittakerTR
please reproduce a search with an empty result, then refer to this doc to export the full Milvus logs. Thanks.

@zhagnlu
Contributor

zhagnlu commented Feb 20, 2024

I also failed to reproduce this error using 2.3.7. Please provide more detailed info.

@IsaacWhittakerTR
Author

Thank you both for trying to reproduce the issue. Did either of you try creating a collection first using an older version of Milvus (e.g. v2.3.1 or v2.3.2), then upgrading the cluster to a newer version (2.3.6) and inserting vectors after upgrading?

This is the exact situation that caused issues for us. Downgrading the cluster to v2.3.2 and re-upserting vectors fixed the issue, and then upgrading back to v2.3.7 and re-upserting the vectors would consistently give empty results for hybrid searches on our partition key.

I will try to reproduce the issue again and export the full Milvus logs. Is there by chance an e-mail address I can send the logs to instead of uploading them directly here?

@xiaofan-luan
Collaborator

Sounds like this could be a compatibility issue?

@yanliang567
Contributor

@IsaacWhittakerTR thank you for your patience; I will try to reproduce it with an upgrade. If you have logs, please mail them to me: yanliang.qiao@zilliz.com

@yanliang567
Contributor

@IsaacWhittakerTR I tried today with an upgrade from v2.3.1 to v2.3.7, but no luck. So please share the logs when you get them. Thanks.

@IsaacWhittakerTR
Author

@yanliang567 Thanks again for your continued assistance on this issue. It is strange to me that it is not reproducible for you, as it is very consistent on my end (but I can't think of anything else that would differ between our setups and change the behavior of the partitioning logic).

I sent over our logs via e-mail; here is a detailed list of the steps I took to reproduce the incorrect search results:

  1. Upgrade our cluster from Milvus v2.3.2 to v2.3.7 via helm upgrade.
  2. Query the cluster to get vector and metadata for an existing entity (with primary key value 1286322).
  3. Upsert that entity to the existing collection using pymilvus v2.3.6.
  4. Query again using <primary_key> in [1286322] to confirm the vector still exists in the collection.
  5. Perform several searches using this vector:
  • Hybrid search with TermExpr on partition key does not return the entity in results
  • Hybrid search with CmpOp on partition key does not return the entity in results
  • Hybrid search using another non-partition key correctly returns the entity with primary key 1286322 with distance 0.0
  • Regular search with no filter correctly returns the entity with primary key 1286322 in results with distance 0.0
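For reference, a pymilvus sketch of the searches in step 5; the collection name, the generic field names (from the schema listed earlier), the filter value in the non-partition-key case, and the search params are assumptions:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
c = Collection("my_collection")  # hypothetical name
c.load()

# Fetch the known entity's vector and partition-key value (primary key 1286322, step 2).
entity = c.query(expr="field_3 in [1286322]", output_fields=["field_5", "field_2"])[0]
vec, part_key_value = entity["field_5"], entity["field_2"]

params = {"metric_type": "L2", "params": {"nprobe": 16}}

# TermExpr on the partition key: entity missing from results on the affected versions.
r_term = c.search([vec], "field_5", params, limit=10, expr=f"field_2 in [{part_key_value}]")
# CmpOp on the partition key: also missing here (unlike in the original report above).
r_cmp = c.search([vec], "field_5", params, limit=10, expr=f"field_2 == {part_key_value}")
# Filter on a non-partition-key scalar field (illustrative value): returned at distance 0.0.
r_other = c.search([vec], "field_5", params, limit=10, expr="field_1 >= 0")
# No filter: returned at distance 0.0.
r_none = c.search([vec], "field_5", params, limit=10)
```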

@Richard-lrg

Richard-lrg commented Feb 22, 2024

I also reproduced the same problem.
The issue occurred when the Milvus cluster was upgraded from 2.3.3 to 2.3.9.
The steps to reproduce the problem are as follows:
Query expr `id in ["l6UDMYe17zSECY"]` returns the expected results.
[screenshot: query results containing the expected entity]

When adding conditions, `partitionKey in ["xxx"] and id in ["l6UDMYe17zSECY"]` returns no results (it can be confirmed that the relevant field values are correct).
[screenshot: empty query results]
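As a sketch of the two queries above (the collection name and the concrete partitionKey value are assumptions):

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
c = Collection("some_collection")  # hypothetical name
c.load()

# Filtering by id alone returns the entity as expected.
only_id = c.query(expr='id in ["l6UDMYe17zSECY"]', output_fields=["partitionKey"])

# Adding the partition-key condition should still return the same entity,
# but on the affected upgraded cluster it returns an empty list.
with_pk = c.query(expr='partitionKey in ["xxx"] and id in ["l6UDMYe17zSECY"]',
                  output_fields=["partitionKey"])

print(len(only_id), len(with_pk))
```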

@yanliang567
Contributor

@Cactus-L @IsaacWhittakerTR thank you both for the updates. We will look into the logs and keep you posted.

@xiaofan-luan xiaofan-luan added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Feb 22, 2024
@yanliang567
Contributor

@Cactus-L could you please share your collection schema and index params so I can try a reproduction?

sre-ci-robot pushed a commit that referenced this issue Feb 23, 2024
#30607

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
@Richard-lrg

@Cactus-L could you please share your collection schema and index params so I can try a reproduction?

Is this info still needed?
I see a fix has already been submitted.

@yanliang567
Contributor

Yep... we are testing the fix PR now and will release a new version soon after verification is done. Thank you both for raising this issue. @Cactus-L @IsaacWhittakerTR

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Feb 23, 2024
@yanliang567 yanliang567 added this to the 2.3.10 milestone Feb 23, 2024
@czs007
Collaborator

czs007 commented Feb 23, 2024

Fix PR: #30773
Please wait for the new version, 2.3.10.

@emaestre

Yep... we are testing the fix PR now and will release a new version soon after verification is done. Thank you both for raising this issue. @Cactus-L @IsaacWhittakerTR

Glad to hear that the issue is solved! Great job guys 🙌🏻

We will keep an eye out for the release of the new version and update it on our end as well.

@xiaofan-luan
Collaborator

We have already released 2.3.10.
The problem is solved in the new release.
We are also working on a fix for data already affected by the problem.

sre-ci-robot pushed a commit to milvus-io/birdwatcher that referenced this issue Feb 27, 2024
…r data (#247)

See also milvus-io/milvus#30607

This command can scan binlogs for a Milvus instance and check whether partition-key
data is located in the wrong partition due to the previously mentioned issue.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi.Xia <congqi.xia@zilliz.com>
@yanliang567 yanliang567 modified the milestones: 2.3.10, 2.3.11 Feb 28, 2024
yanliang567 added a commit to yanliang567/milvus that referenced this issue Feb 29, 2024
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Feb 29, 2024
related issue: #30607 
and update some test for groupby

Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
tedxu pushed a commit to tedxu/milvus that referenced this issue Feb 29, 2024
related issue: milvus-io#30607 
and update some test for groupby

Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
@yanliang567
Contributor

I'll close this issue as I have verified the fix on Milvus 2.3.10.
