Pause MCP before draining/rebooting node #93
Conversation
```go
if newMcp.GetName() != dn.mcpName {
    return
}
if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&
```
By checking poolDegraded, it makes sure the pool is not paused by the user, right?
@yuqi-zhang @sinnykumari, do you think the checks here can 100% guarantee there is no race condition between MCO and SNO trying to start an update simultaneously?
By checking poolDegraded, it makes sure the pool is not paused by the user, right?

No, this is to check that the MCP is not in the 'Degraded' state.
Pausing and degraded are separate concepts. I think the SRIOV operator shouldn't do anything if a pool is degraded (which means the pool is in an error state), as you have here. I think the checks (not degraded, not updating, updated) should be good.
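To make the three-condition gate concrete, here is a minimal sketch of the readiness check as a helper (the condition utilities are the mcfgv1 ones used in the diff; the helper name itself is illustrative):

```go
// isMcpReady reports whether the pool is safe to pause and drain:
// not Degraded, not Updating, and reporting Updated.
func isMcpReady(mcp *mcfgv1.MachineConfigPool) bool {
    return mcfgv1.IsMachineConfigPoolConditionFalse(mcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&
        mcfgv1.IsMachineConfigPoolConditionTrue(mcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdated) &&
        mcfgv1.IsMachineConfigPoolConditionFalse(mcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdating)
}
```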
```go
if newMcp.GetName() != dn.mcpName {
    return
}
if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&
```
When the node gets rebooted by the sriov config daemon, will drainNode be called again (in which case the MCP is degraded due to paused: true)?
Pausing an MCP will not make it 'Degraded'.
pkg/daemon/daemon.go
Outdated
```go
if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&
    mcfgv1.IsMachineConfigPoolConditionTrue(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdated) &&
    mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdating) {
    glog.Infof("drainNode(): MCP %s is ready", newMcp.GetName())
```
Check readyMachineCount == machineCount as well?
I suppose the 'Updated' condition of the MCP is sufficient.
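For reference, the suggested extra guard would amount to a single status-field comparison on top of the condition checks (a sketch of the suggestion, not part of the final patch):

```go
// Optional stricter guard discussed above: also require every machine
// in the pool to be Ready. The PR relies on the Updated condition alone.
if newMcp.Status.ReadyMachineCount != newMcp.Status.MachineCount {
    return // some machines in the pool are not Ready yet
}
```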
Force-pushed from cc8430e to 6f92a10
Overall, I think this approach is safer with respect to the MCO and race conditions. Some more specific comments below.
```go
// @@ -580,8 +590,17 @@ func (dn *Daemon) completeDrain() error {
    return err
}

if utils.ClusterType == utils.ClusterTypeOpenshift {
    glog.Infof("completeDrain(): resume MCP %s", dn.mcpName)
```
Making sure I understand this properly: the completeDrain() function gets called after the reboot, correct? To uncordon the node?
Yes. It will be executed after a reboot, or after draining when no reboot is needed.
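Based on the hunk above, the resume step inside completeDrain() presumably mirrors the pause patch used in drainNode(); a sketch (the error handling is illustrative):

```go
if utils.ClusterType == utils.ClusterTypeOpenshift {
    glog.Infof("completeDrain(): resume MCP %s", dn.mcpName)
    // Unpause the pool so the MCO can roll out any updates that were
    // held back while the node was being drained/rebooted.
    pausePatch := []byte("{\"spec\":{\"paused\":false}}")
    if _, err := dn.mcClient.MachineconfigurationV1().MachineConfigPools().Patch(
        context.Background(), dn.mcpName, types.MergePatchType, pausePatch, metav1.PatchOptions{}); err != nil {
        glog.Errorf("completeDrain(): failed to resume MCP %s: %v", dn.mcpName, err)
        return err
    }
}
```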
pkg/daemon/daemon.go
Outdated
```go
<-ctx.Done()

glog.Infof("drainNode(): pause MCP %s", dn.mcpName)
pausePatch := []byte("{\"spec\":{\"paused\":true}}")
```
If we want to ensure maximum safety and eliminate races, then after the pause goes through here, we should add another check like the one above to see if the pool is progressing, since there is a tiny chance that the MCP started some kind of update in between.
If we check here and the pool is still good, then we can proceed with the drain. If the pool somehow is updating now, we should consider unpausing and then waiting in a loop for the MCP to go back to not updating.
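A sketch of that pause-then-recheck sequence, using the illustrative isMcpReady helper from above (the exact retry strategy here is an assumption, not necessarily what the patch does):

```go
// Pause first, then re-read the pool to close the window in which the
// MCO could have started an update between our check and the pause.
pausePatch := []byte("{\"spec\":{\"paused\":true}}")
if _, err := dn.mcClient.MachineconfigurationV1().MachineConfigPools().Patch(
    context.Background(), dn.mcpName, types.MergePatchType, pausePatch, metav1.PatchOptions{}); err != nil {
    return err
}
mcp, err := dn.mcClient.MachineconfigurationV1().MachineConfigPools().Get(
    context.Background(), dn.mcpName, metav1.GetOptions{})
if err != nil {
    return err
}
if !isMcpReady(mcp) {
    // The pool started updating under us: unpause and let the caller
    // retry once the MCP settles back into a ready state.
    unpausePatch := []byte("{\"spec\":{\"paused\":false}}")
    if _, err := dn.mcClient.MachineconfigurationV1().MachineConfigPools().Patch(
        context.Background(), dn.mcpName, types.MergePatchType, unpausePatch, metav1.PatchOptions{}); err != nil {
        return err
    }
    return fmt.Errorf("MCP %s began updating during pause, retrying later", dn.mcpName)
}
```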
Do we also want to check that pause is not already set by the user?
I updated the patch, PTAL.
Force-pushed from 741949c to 0d42c93
Thanks for implementing it in this way! Overall, this approach looks good.
```go
}
if paused {
    glog.Infof("drainNode(): MCP is processing, resume MCP %s", dn.mcpName)
    pausePatch := []byte("{\"spec\":{\"paused\":false}}")
```
For my understanding, can you please explain the condition in which we are unpausing the pool here?
The MCP informer will not be stopped right away after we pause the MCP, so the mcpEventHandler will be invoked once more because we updated the MCP. In this cycle, we check whether the MCP is processing; if it is, and the MCP was paused during the last cycle (indicated by the 'paused' variable here), we resume the MCP and wait until it finishes.
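A sketch of that event-handler cycle, with 'paused' carried across invocations (simplified; the real handler in the patch may differ in detail):

```go
// Invoked by the MCP informer on every update to our pool. 'paused'
// records whether we paused the MCP in an earlier invocation.
if !isMcpReady(newMcp) {
    if paused {
        // The pool went back to processing after we paused it: resume
        // it and keep waiting rather than draining into an MCO-driven
        // update.
        glog.Infof("drainNode(): MCP is processing, resume MCP %s", dn.mcpName)
        pausePatch := []byte("{\"spec\":{\"paused\":false}}")
        if _, err := dn.mcClient.MachineconfigurationV1().MachineConfigPools().Patch(
            context.Background(), dn.mcpName, types.MergePatchType, pausePatch, metav1.PatchOptions{}); err != nil {
            glog.V(2).Infof("drainNode(): failed to resume MCP %s: %v", dn.mcpName, err)
        }
        paused = false
    }
    return
}
```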
This is to address #93 (comment)
Got it, makes sense.
Looks good, I just have small comments. Thanks!
/lgtm

@zshi-redhat @SchSeba I just updated the patch. A new annotation value

/lgtm
Just two small nits and I think the PR is ready.
pkg/daemon/daemon.go
Outdated
```go
pausePatch := []byte("{\"spec\":{\"paused\":true}}")
_, err = dn.mcClient.MachineconfigurationV1().MachineConfigPools().Patch(context.Background(), dn.mcpName, types.MergePatchType, pausePatch, metav1.PatchOptions{})
if err != nil {
    glog.V(2).Infof("drainNode(): fail to pause MCP %s: %v", dn.mcpName, err)
```
nit: fail -> Failed
fixed.
```go
if err != nil {
    glog.Errorf("drainNode(): Failed to annotate node: %v", err)
    return err
mcpInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
```
nit: can we add a check that this is the MCP the node is part of, even before we send it to the mcpEventHandler function?
Hmm, I don't know how we can do that. Normally, the informer watches all objects of the type. I think it's sufficient that we check it in the mcpEventHandler.
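For completeness: client-go does offer a way to filter events before they reach the handler, via cache.FilteringResourceEventHandler, though checking the pool name inside mcpEventHandler works just as well. A sketch (the handler wiring is illustrative, since the real mcpEventHandler signature may differ):

```go
mcpInformer.AddEventHandler(cache.FilteringResourceEventHandler{
    // Drop events for pools this node is not part of before they
    // ever reach the handler.
    FilterFunc: func(obj interface{}) bool {
        mcp, ok := obj.(*mcfgv1.MachineConfigPool)
        return ok && mcp.GetName() == dn.mcpName
    },
    Handler: cache.ResourceEventHandlerFuncs{
        UpdateFunc: func(oldObj, newObj interface{}) {
            dn.mcpEventHandler(newObj) // illustrative wiring
        },
    },
})
```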
The config daemon instance on each node needs to compete for the 'Draining' lock before it can drain or reboot a node. After a config daemon instance gets the lock, it checks its MachineConfigPool conditions. If the MCP is in a ready state, it pauses the MCP and then proceeds; otherwise, it waits for the MCP to become ready. The MCP is resumed, and the lock released, after the config daemon finishes its work.
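Putting the summary together, the overall flow reads roughly like this runnable sketch (all helper names are illustrative stand-ins for the real daemon code in pkg/daemon/daemon.go):

```go
package main

import "log"

// Illustrative stubs for the steps in the PR summary.
func acquireDrainLock()    { log.Println("acquired 'Draining' lock") }
func releaseDrainLock()    { log.Println("released 'Draining' lock") }
func waitForMcpReady()     { log.Println("MCP ready: Updated, not Degraded/Updating") }
func setMcpPaused(p bool)  { log.Printf("MCP spec.paused = %v", p) }
func drainAndMaybeReboot() { log.Println("draining node, rebooting if required") }

func main() {
    acquireDrainLock() // compete with config daemons on other nodes
    defer releaseDrainLock()

    waitForMcpReady()     // otherwise wait for the MCP to become ready
    setMcpPaused(true)    // keep the MCO from starting an update mid-drain
    drainAndMaybeReboot() // completeDrain() runs after reboot (or after drain)
    setMcpPaused(false)   // resume the MCP once the work is done
}
```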