[Bug]: Querynode terminated with log: failed to Deserialize index, cardinal inner error #30857
Comments
Perhaps it is because the |
/assign @liliu-z |
/assign @yanliang567 |
/assign @chyezh |
Index Node `` starts to release a segment while a new load request is incoming.
Repeated loads of a segment are checked by SegmentManager.
Releasing a segment removes it from SegmentManager and then releases the memory.
So a concurrent load and release happens.
Short-term fix: Implement mutual exclusivity between Release and Load on QN; |
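As a rough illustration of what mutual exclusivity between Release and Load could look like, here is a minimal Python sketch. It is not the Milvus implementation (the query node is written in Go); the class, its method names, and the per-segment lock table are all hypothetical. The idea is only that both steps of a release (unregister, then free resources) run under the same lock that a load of that segment must take.

```python
import threading
from collections import defaultdict


class SegmentManager:
    """Toy stand-in for a query node's segment registry (hypothetical names)."""

    def __init__(self):
        self._registered = set()   # segment IDs seen by the duplicate-load check
        self._resources = set()    # segment IDs whose resources currently exist
        self._locks = defaultdict(threading.Lock)  # one lock per segment ID
        self._locks_guard = threading.Lock()       # protects the lock table

    def _lock_for(self, segment_id):
        with self._locks_guard:
            return self._locks[segment_id]

    def load(self, segment_id):
        # While this lock is held, a release of the same segment has either
        # finished both of its steps or has not started yet.
        with self._lock_for(segment_id):
            if segment_id in self._registered:
                return False  # duplicate load, skipped
            self._resources.add(segment_id)
            self._registered.add(segment_id)
            return True

    def release(self, segment_id):
        with self._lock_for(segment_id):
            if segment_id not in self._registered:
                return False  # nothing to release
            self._registered.discard(segment_id)  # step 1: unregister
            self._resources.discard(segment_id)   # step 2: free resources
            return True

    def is_consistent(self):
        # Every registered segment has resources, and vice versa.
        return self._registered == self._resources
```

Without the shared lock, a load could pass the duplicate check between step 1 and step 2 of a release, which is the interleaving described above.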
/unassign |
@chyezh |
The load is triggered while the segment is being released, not the other way around (no release is triggered while the segment is loading). The release segment operation is divided into two steps on the query node.
@chyezh got it, thanks! |
After some offline discussion, the final solution shall be to separate the disk resources for different segment life-cycles. One more thing: it looks weird that a segment is released and then loaded back. Maybe the segment was bouncing between querynodes?
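One way to picture separating disk resources per segment life-cycle is to give every load of a segment its own directory, so that a late release of an older life-cycle cannot delete files belonging to a newer one. The Python sketch below is an assumption for illustration only: the directory layout, the `LifecycleDirs` class, and the generation counter are hypothetical and are not taken from Milvus.

```python
import itertools
import os
import shutil


class LifecycleDirs:
    """Hands out one directory per (segment, load) pair. Hypothetical layout."""

    def __init__(self, root):
        self._root = root
        self._generation = itertools.count(1)  # increases with every load

    def new_lifecycle(self, segment_id):
        # Each load gets a fresh directory, even for a segment ID seen before.
        gen = next(self._generation)
        path = os.path.join(self._root, f"segment-{segment_id}", f"load-{gen}")
        os.makedirs(path)
        return path

    @staticmethod
    def release(path):
        # Removes only the directory of the life-cycle being released.
        shutil.rmtree(path, ignore_errors=True)
```

With this layout, releasing the directory returned for an earlier load leaves the files of a later load of the same segment untouched.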
Release then load collection.
issue: #30857 --------- Signed-off-by: chyezh <chyezh@outlook.com>
This should be fixed in 2.4.5; please verify it.
Is there an existing issue for this?
Environment
Current Behavior
laion_stable_4 has 58m-768d+ data, and the schema is:
reload collection (64 segments) -> concurrent requests: insert + delete + search + query
One of the 4 querynodes terminated with error 134, with these error logs:
(Since the cardinal is private, please get in touch with me for more detailed querynode terminated logs)
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
argo: laion1b-test-new-1
pods:
/tmp/cores/core-laion1b-test-2-milvus-querynode-0-7977c8fdbf-zfhhz-milvus-8-1708964037 of 4am-node39
Anything else?
No response