[Enhancement]: speed up target recover after qc restart #28491

Closed
1 task done
weiliu1031 opened this issue Nov 16, 2023 · 4 comments
Assignees
Labels
kind/enhancement Issues or changes related to enhancement stale indicates no udpates for 30 days

Comments

@weiliu1031
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

In a Milvus cluster with real-time write operations, the target may change every second. If the query coord (QC) restarts, it needs to pull a new target and wait for all segments to be loaded before the search service recovers. This may take a couple of minutes if the dataset is large enough.

To speed up target recovery, we can save a snapshot of the target to etcd, then restore it from etcd during QC recovery.
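
A minimal sketch of the proposal, not the actual Milvus implementation. The `KV` interface, the in-memory `memKV` store (standing in for etcd), the `Target` struct, and the key layout are all hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KV is a hypothetical stand-in for the etcd-backed meta store.
type KV interface {
	Save(key, value string) error
	Load(key string) (string, bool)
}

// memKV is an in-memory KV used only for illustration.
type memKV map[string]string

func (m memKV) Save(key, value string) error   { m[key] = value; return nil }
func (m memKV) Load(key string) (string, bool) { v, ok := m[key]; return v, ok }

// Target is a simplified, assumed shape of a collection target:
// the channels and segments that must be loaded to serve search.
type Target struct {
	CollectionID int64    `json:"collection_id"`
	Channels     []string `json:"channels"`
	SegmentIDs   []int64  `json:"segment_ids"`
}

// targetKey builds the (assumed) storage key for one collection's snapshot.
func targetKey(collectionID int64) string {
	return fmt.Sprintf("querycoord-target/%d", collectionID)
}

// SaveTarget serializes the target and writes it to the store.
func SaveTarget(kv KV, t Target) error {
	data, err := json.Marshal(t)
	if err != nil {
		return err
	}
	return kv.Save(targetKey(t.CollectionID), string(data))
}

// RecoverTarget reads a snapshot back. The bool reports whether a snapshot
// existed; when it does not, the caller would fall back to pulling a new target.
func RecoverTarget(kv KV, collectionID int64) (Target, bool, error) {
	var t Target
	raw, ok := kv.Load(targetKey(collectionID))
	if !ok {
		return t, false, nil
	}
	if err := json.Unmarshal([]byte(raw), &t); err != nil {
		return t, false, err
	}
	return t, true, nil
}
```

On restart, QC would call something like `RecoverTarget` first and only wait for a freshly pulled target when no snapshot is found.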

Why is this needed?

No response

Anything else?

No response

@weiliu1031 weiliu1031 added the kind/enhancement Issues or changes related to enhancement label Nov 16, 2023
@weiliu1031
Contributor Author

/assign


stale bot commented Dec 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Dec 16, 2023
@stale stale bot closed this as completed Dec 23, 2023
@weiliu1031
Contributor Author

/reopen

@sre-ci-robot sre-ci-robot reopened this Mar 13, 2024
@sre-ci-robot
Contributor

@weiliu1031: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sre-ci-robot pushed a commit that referenced this issue Mar 15, 2024
issue: #28491

After querycoord restarts, it pulls a new target, which includes the
channel and segment lists. Once the segments loaded on the querynodes
have reached the target, the collection can serve search/query. But if
the segment list changes over time, then after querycoord pulls a new
target it takes a few minutes to catch up with the target's segment
distribution. Until then, query/search fails due to missing segments.

This PR saves the current loaded target to the meta store during
querycoord's stop process and recovers it when querycoord starts, to
shorten the target recovery time.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
weiliu1031 added a commit to weiliu1031/milvus that referenced this issue Mar 15, 2024
…o#31240)

issue: milvus-io#28491

sre-ci-robot pushed a commit that referenced this issue Mar 18, 2024
issue: #28491

The target should be saved to the meta store after the target observer
stops, in case the target has changed.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Mar 18, 2024
issue: #28491
pr: #31315

weiliu1031 added a commit to weiliu1031/milvus that referenced this issue Mar 20, 2024
…o#31240)

issue: milvus-io#28491

weiliu1031 added a commit to weiliu1031/milvus that referenced this issue Mar 21, 2024
…o#31240)

issue: milvus-io#28491

sre-ci-robot pushed a commit that referenced this issue Mar 22, 2024
…31449)

issue: #28491
pr: #31240

@stale stale bot closed this as completed Mar 22, 2024
congqixia added a commit to congqixia/milvus that referenced this issue Mar 26, 2024
See also milvus-io#28491 milvus-io#31240

When the number of collections is large, querycoord saves collection
targets one by one, which is slow and may block querycoord from exiting.

In a local run, a 500-collection scenario may take about 40 seconds to
save the collection targets.

This PR changes the `SaveCollectionTarget` interface into a batch one and
groups the collections into batches of 16 to accelerate this
procedure.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
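
The batching idea above can be sketched as below. This is a simplified illustration, not the actual `SaveCollectionTarget` signature: `splitIntoBatches`, `saveTargetsInBatches`, and the `saveBatch` callback are hypothetical names, and collections are represented only by their IDs.

```go
package main

// splitIntoBatches groups ids into consecutive slices of at most batchSize.
func splitIntoBatches(ids []int64, batchSize int) [][]int64 {
	if batchSize <= 0 {
		return nil
	}
	var batches [][]int64
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

// saveTargetsInBatches calls saveBatch once per batch instead of once per
// collection, stopping at the first error.
func saveTargetsInBatches(ids []int64, batchSize int, saveBatch func([]int64) error) error {
	for _, batch := range splitIntoBatches(ids, batchSize) {
		if err := saveBatch(batch); err != nil {
			return err
		}
	}
	return nil
}
```

With 500 collections and a batch size of 16, this issues 32 store writes (31 full batches plus one batch of 4) instead of 500.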
sre-ci-robot pushed a commit that referenced this issue Mar 26, 2024
See also #28491 #31240

congqixia added a commit to congqixia/milvus that referenced this issue Mar 27, 2024
See also milvus-io#28491 milvus-io#31240

congqixia added a commit to congqixia/milvus that referenced this issue Mar 27, 2024
See also milvus-io#28491 milvus-io#31240

sre-ci-robot pushed a commit that referenced this issue Mar 27, 2024
Cherry-pick from master
pr: #31616
See also #28491 #31240

congqixia added a commit to congqixia/milvus that referenced this issue Mar 27, 2024
See also milvus-io#28491 milvus-io#31240

sre-ci-robot pushed a commit that referenced this issue Mar 28, 2024
…31655)

Cherry-pick from master
pr: #31616 
See also #28491 #31240


2 participants