[Bug]: Search return empty results while querynode recovering #20841

Closed
1 task done
XuanYang-cn opened this issue Nov 25, 2022 · 11 comments

@XuanYang-cn
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

image

Expected Behavior

  1. Start a Milvus cluster on master with 1 QN and 1 DN (milvus run querynode &> querynode.log &)
  2. Prepare a collection with 20 sealed segments, dim=128, numRows=5,000,000; create an index and load the collection.
  3. Loop indefinitely: search with nq=1 and print the latency.
  4. Start the second QN: milvus run querynode --alias=1 &> querynode1.log &
  5. Stop the first QN: milvus stop querynode

Observe:

  1. tail -f querynode1.log | grep "segments distribution"
  2. tail -f proxy | grep "WARN"
  3. The search latency from step 3

Current Behavior:

  1. Until querynode1 has finished loading all segments (that is, until the collection is available again), the proxy gets empty search results.
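
The search loop from step 3 can be sketched roughly as below, so that empty results show up next to the latency. This is only an illustration, not the script used for this report: `timed_search_loop`, `count_empty`, and the `search_once` callable are hypothetical names, and the loop is bounded by `iterations` so it can be run as-is.

```python
import time
from typing import Callable, List, Sequence


def timed_search_loop(search_once: Callable[[], Sequence], iterations: int) -> List[dict]:
    """Call search_once repeatedly and record latency and emptiness of each result.

    search_once is a hypothetical callable that runs one nq=1 search and
    returns the hits for that single query vector.
    """
    records = []
    for i in range(iterations):
        start = time.perf_counter()
        hits = search_once()
        latency_ms = (time.perf_counter() - start) * 1000.0
        records.append(
            {"iteration": i, "latency_ms": latency_ms, "empty": len(hits) == 0}
        )
    return records


def count_empty(records: List[dict]) -> int:
    """Number of iterations that came back with no hits."""
    return sum(1 for r in records if r["empty"])
```

With pymilvus, `search_once` could wrap a `Collection.search` call with a single query vector and return the first element of the result; the field name, search params, and limit would depend on the collection and are not given in this report.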

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@XuanYang-cn XuanYang-cn added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2022
@XuanYang-cn XuanYang-cn self-assigned this Nov 25, 2022
@XuanYang-cn
Contributor Author

See also #20502

@xiaofan-luan xiaofan-luan added this to the 2.3 milestone Nov 29, 2022
@xiaofan-luan
Contributor

So this is only an issue on master, right?

@XuanYang-cn
Contributor Author

> So this is only an issue on master, right?

I just tested on 2.2.0 and it reproduced.

@XuanYang-cn
Contributor Author

/assign @yah01
Please help investigate.

@XuanYang-cn
Contributor Author

/unassign

@yah01
Member

yah01 commented Dec 2, 2022

QueryCoord only detects that a QueryNode has gone offline after about 1 min (the etcd lease timeout). During that lag, QueryCoord can't pull the distribution from the QueryNode but still considers it online. Possible solutions:

  1. Check TargetID in QueryNode Query/Search
  2. QueryCoord shouldn't return QueryNodes whose distribution has not been updated for a while (with a threshold much shorter than 1 min)
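
As a rough illustration of solution 2 (not Milvus code), a staleness filter could look like the sketch below. `NodeDistribution`, `fresh_nodes`, and `max_staleness_s` are hypothetical names; the idea is only that a node whose distribution was last pulled longer ago than the threshold is left out, even though its etcd lease has not expired yet.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeDistribution:
    """Hypothetical record of the last distribution pulled from one QueryNode."""
    node_id: int
    segments: List[int]
    last_updated: float  # seconds, taken from a monotonic clock


def fresh_nodes(
    distributions: Dict[int, NodeDistribution],
    max_staleness_s: float,
    now: Optional[float] = None,
) -> List[int]:
    """Return IDs of nodes whose distribution was refreshed within max_staleness_s."""
    now = time.monotonic() if now is None else now
    return sorted(
        node_id
        for node_id, dist in distributions.items()
        if now - dist.last_updated <= max_staleness_s
    )
```

The threshold would have to be well under the 1 min lease timeout mentioned above, but long enough to tolerate a slow distribution pull; the right value is not stated in this thread.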

@XuanYang-cn
Contributor Author

/assign
/unassign @yah01

@sre-ci-robot sre-ci-robot assigned XuanYang-cn and unassigned yah01 Dec 5, 2022
@XuanYang-cn
Contributor Author

XuanYang-cn commented Dec 5, 2022

Still reproducing in master 243d8cf
image

/assign @yah01
/unassign

@yah01
Member

yah01 commented Dec 6, 2022

not reproduced with 11e44 for me

@yah01
Member

yah01 commented Dec 6, 2022

/assign @XuanYang-cn

@XuanYang-cn
Contributor Author

verified at e131915
