Fix wrong watchedDmchannel and SealedSegment after querynode down #12624
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: xige-16 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
fix #12323
force-pushed from 783508c to 0564904
/re-run unit
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Codecov Report
@@ Coverage Diff @@
## master #12624 +/- ##
==========================================
+ Coverage 79.08% 79.15% +0.07%
==========================================
Files 457 458 +1
Lines 60712 60960 +248
==========================================
+ Hits 48016 48255 +239
- Misses 10231 10238 +7
- Partials 2465 2467 +2
case queryPb.WatchType_WatchPartition:
	lt = loadTypePartition
default:
	return errors.New("unKnow watch type, collectionID = " + fmt.Sprintln(collectionID))
Print the watchType as well, so it's easier to debug.
This should use a unified log format, for instance:
Failed to execute watchDmChannelsTask because of unknown watch type, collectionID xxx, watchType xxx
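The suggested unified format could be built with a small helper like the sketch below; the helper name and the exact field layout are illustrative, not the actual Milvus code.

```go
package main

import "fmt"

// unknownWatchTypeErr builds the error in a unified format, so every
// watchDmChannelsTask failure reports the same fields in the same order.
// The wording follows the review suggestion; the helper itself is
// hypothetical, not part of the Milvus codebase.
func unknownWatchTypeErr(collectionID int64, watchType int32) error {
	return fmt.Errorf(
		"failed to execute watchDmChannelsTask because of unknown watch type, collectionID = %d, watchType = %d",
		collectionID, watchType)
}

func main() {
	fmt.Println(unknownWatchTypeErr(100, 7))
}
```

Including the watchType value directly in the message addresses both review comments at once: the format is uniform and the offending input is visible in the log.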
@@ -59,6 +59,7 @@ type Cluster interface {
	watchDeltaChannels(ctx context.Context, nodeID int64, in *querypb.WatchDeltaChannelsRequest) error
	//TODO:: removeDmChannel
	getNumDmChannels(nodeID int64) (int, error)
Should we avoid returning an err here, since this is only an in-memory action?
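The suggested alternative signature could look like this sketch; the struct and field names are simplified stand-ins for the real cluster type, shown only to illustrate dropping the error return for a pure in-memory read.

```go
package main

import "fmt"

// queryNodeCluster is a pared-down stand-in for the real struct.
type queryNodeCluster struct {
	dmChannelCounts map[int64]int // nodeID -> watched dm-channel count
}

// getNumDmChannels only reads in-memory state, so per the review comment
// it could drop the error return entirely: an unknown nodeID simply
// yields the zero value. (The real interface returns (int, error); this
// signature is the suggested alternative, not the current code.)
func (c *queryNodeCluster) getNumDmChannels(nodeID int64) int {
	return c.dmChannelCounts[nodeID]
}

func main() {
	c := &queryNodeCluster{dmChannelCounts: map[int64]int{7: 3}}
	fmt.Println(c.getNumDmChannels(7), c.getNumDmChannels(8))
}
```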
defer c.RUnlock()

if node, ok := c.nodes[nodeID]; ok {
	watchInfos := node.getDmChannelWatchInfo(collectionID)
copy before return?
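Copying under the read lock before returning prevents handing callers a slice that the cluster can mutate concurrently. A minimal sketch, with the types and map layout simplified as assumptions for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// DmChannelWatchInfo is a simplified stand-in for the real watch-info struct.
type DmChannelWatchInfo struct {
	ChannelName string
}

type cluster struct {
	mu    sync.RWMutex
	infos map[int64][]*DmChannelWatchInfo // collectionID -> watch infos
}

// getWatchInfos copies the slice under the read lock before returning,
// so later appends by the cluster cannot race with the caller's iteration.
func (c *cluster) getWatchInfos(collectionID int64) []*DmChannelWatchInfo {
	c.mu.RLock()
	defer c.mu.RUnlock()
	src := c.infos[collectionID]
	dst := make([]*DmChannelWatchInfo, len(src))
	copy(dst, src)
	return dst
}

func main() {
	c := &cluster{infos: map[int64][]*DmChannelWatchInfo{
		1: {{ChannelName: "dml-ch-0"}},
	}}
	got := c.getWatchInfos(1)
	fmt.Println(len(got), got[0].ChannelName)
}
```

Note this copies only the slice header and pointers; if callers may also mutate the pointed-to structs, a deep copy would be needed instead.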
	removeDmChannelByPartitionID(info, partitionID)
}
// delete delta channels
delete(qn.watchedDeltaChannels, collectionID)
What about delta channels? I thought we also need to handle the partition case.
}
err = qn.addDmChannel(in.CollectionID, channelInfo)
if err != nil {
	log.Debug("watchDmChannels: add dm channel to node meta failed", zap.String("error", err.Error()))
Use a Warn log in the error-handling path.
@@ -643,8 +668,12 @@ func (c *queryNodeCluster) removeNodeInfo(nodeID int64) error {
		return err
	}

	if _, ok := c.nodes[nodeID]; ok {
		err = c.nodes[nodeID].clearNodeInfo()
	if node, ok := c.nodes[nodeID]; ok {
So far the logic is OK, but it seems the logic does not achieve idempotence.
For example, if removeDmChannelWatchInfosByNodeID succeeds but clearNodeInfo fails, will the task retry deleting nodeID from c.nodes? Can the retry succeed?
There are usually two ways to guarantee idempotency:
- do the IO in a batch or transaction
- do the critical IO in the last step, and make sure all the previous steps can be retried with no error
// different queryNode may watch same dmChannel and create same growing segment
// deduplicate result when reduce, the correctness of search has no effect
// and growing segment will be removed after handoff
if loadReq.WatchType == querypb.WatchType_WatchTypeNone {
This is weird. Do we have some meta recording whether this collection is loaded per collection or per segment?
Also, I guess delta channels need to be handled at the same time.
/hold
fix issue: #12340 #12190
/kind bug
Signed-off-by: xige-16 xi.ge@zilliz.com