Pause MCP before draining/rebooting node #93

Merged
pliurh merged 1 commit into k8snetworkplumbingwg:master from pause_mcp on Apr 7, 2021

Conversation


@pliurh pliurh commented Mar 23, 2021

The config daemon instance on each node needs to compete for the 'Draining' lock before
it can drain/reboot a node.

After a config daemon instance gets the lock, it checks its MachineConfigPool conditions.
If the MCP is in the ready state, it pauses the MCP and then proceeds; otherwise, it waits
for the MCP to become ready. The MCP is resumed and the lock released once the config
daemon finishes its work.
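
At a high level, the flow described above looks roughly like the following sketch. It is not the actual patch: the function parameters (acquireLock, releaseLock, waitForMCPReady, setMCPPaused, drainAndReboot) are hypothetical stand-ins for the daemon's real helpers; only the ordering is taken from the description.

```go
package daemon

import "context"

// Minimal sketch of the drain flow described in this PR; only the ordering
// matters, the helper parameters are placeholders.
func drainWithMCPPause(
    ctx context.Context,
    acquireLock, releaseLock, waitForMCPReady func(context.Context) error,
    setMCPPaused func(context.Context, bool) error,
    drainAndReboot func(context.Context) error,
) error {
    // Compete with the config daemons on the other nodes for the 'Draining' lock.
    if err := acquireLock(ctx); err != nil {
        return err
    }
    defer releaseLock(context.Background())

    // Wait for the node's MachineConfigPool to be ready (Updated, not Updating,
    // not Degraded) before touching it.
    if err := waitForMCPReady(ctx); err != nil {
        return err
    }

    // Pause the MCP so the MCO cannot start an update while the node is being
    // drained/rebooted, and resume it once the work is finished.
    if err := setMCPPaused(ctx, true); err != nil {
        return err
    }
    defer setMCPPaused(context.Background(), false)

    return drainAndReboot(ctx)
}
```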

@pliurh pliurh changed the title from "Pause MCP before draining/rebooting node" to "WIP: Pause MCP before draining/rebooting node" on Mar 23, 2021
pkg/daemon/daemon.go (outdated, resolved)
if newMcp.GetName() != dn.mcpName {
    return
}
if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&

Collaborator

By checking poolDegraded, it makes sure the pool is not paused by the user, right?

Collaborator

@yuqi-zhang @sinnykumari, do you think the checks here can 100% guarantee there is no race condition between MCO and SNO trying to start an update simultaneously?

Collaborator Author

By checking poolDegraded, it makes sure the pool is not paused by the user, right?

No, this is to check that the MCP is not in the 'Degraded' state.

Pausing and degraded are separate concepts. I think the SRIOV operator shouldn't do anything if a pool is degraded (which means the pool is in an error state), as you have here. I think the checks (not degraded, not updating, updated) should be good.

if newMcp.GetName() != dn.mcpName {
    return
}
if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&

Collaborator

When the node gets rebooted by the sriov config daemon, will drainNode be called again (in which case the MCP is degraded due to pause: true)?

Collaborator Author

Pausing an MCP will not make it 'Degraded'.
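
For context on 'paused' versus 'Degraded': Spec.Paused is the field the pause/resume patches in this PR toggle, while Degraded is a status condition the MCO sets when the pool hits an error. A minimal sketch of how the two could be checked together (the helper name mcpReadyForDrain is made up, and the mcfgv1 import path is assumed to be the usual machine-config-operator one):

```go
package daemon

import (
    mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// Sketch: treat a pool as drain-ready only if nobody has paused it and the
// MCO reports it as Updated, not Updating, and not Degraded.
func mcpReadyForDrain(mcp *mcfgv1.MachineConfigPool) bool {
    if mcp.Spec.Paused {
        // Paused already, possibly by the user; don't assume we own the pause.
        return false
    }
    return mcfgv1.IsMachineConfigPoolConditionFalse(mcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&
        mcfgv1.IsMachineConfigPoolConditionTrue(mcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdated) &&
        mcfgv1.IsMachineConfigPoolConditionFalse(mcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdating)
}
```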

if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&
    mcfgv1.IsMachineConfigPoolConditionTrue(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdated) &&
    mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdating) {
    glog.Infof("drainNode(): MCP %s is ready", newMcp.GetName())

Collaborator

Check readyMachineCount == machineCount as well?

Collaborator Author

I suppose the 'Updated' condition of MCP is sufficient.
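
For reference, the count-based check suggested above would look roughly like the sketch below (illustrative only; the merged patch relies on the Updated/Updating/Degraded conditions instead):

```go
package daemon

import (
    mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// Sketch of the readyMachineCount == machineCount check suggested above.
func allMachinesReady(mcp *mcfgv1.MachineConfigPool) bool {
    s := mcp.Status
    return s.MachineCount > 0 && s.ReadyMachineCount == s.MachineCount
}
```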

@pliurh pliurh force-pushed the pause_mcp branch 2 times, most recently from cc8430e to 6f92a10, on March 23, 2021 16:11

@yuqi-zhang yuqi-zhang left a comment

Overall, I think this approach is safer with respect to the MCO and race conditions. Some more specific comments below.

@@ -580,8 +590,17 @@ func (dn *Daemon) completeDrain() error {
        return err
    }

    if utils.ClusterType == utils.ClusterTypeOpenshift {
        glog.Infof("completeDrain(): resume MCP %s", dn.mcpName)

Making sure I understand this properly: the completeDrain() function gets called after the reboot, correct? To uncordon the node?

Collaborator Author

Yes. It will be executed after reboot or after draining without a reboot.
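
For context, the resume in completeDrain() is just the mirror image of the pause patch that appears elsewhere in this diff; roughly like the sketch below, with the clientset and pool name passed in explicitly and the machine-config-operator clientset import path assumed:

```go
package daemon

import (
    "context"

    mcclientset "github.com/openshift/machine-config-operator/pkg/generated/clientset/versioned"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
)

// Sketch of the resume step in completeDrain(); mcClient and mcpName stand in
// for the daemon's dn.mcClient and dn.mcpName.
func resumeMCP(mcClient mcclientset.Interface, mcpName string) error {
    resumePatch := []byte(`{"spec":{"paused":false}}`)
    _, err := mcClient.MachineconfigurationV1().MachineConfigPools().Patch(
        context.Background(), mcpName, types.MergePatchType, resumePatch, metav1.PatchOptions{})
    return err
}
```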

if newMcp.GetName() != dn.mcpName {
    return
}
if mcfgv1.IsMachineConfigPoolConditionFalse(newMcp.Status.Conditions, mcfgv1.MachineConfigPoolDegraded) &&

Pausing and degraded are separate concepts. I think the SRIOV operator shouldn't do anything if a pool is degraded (which means the pool is in an error state), as you have here. I think the checks (not degraded, not updating, updated) should be good.

<-ctx.Done()

glog.Infof("drainNode(): pause MCP %s", dn.mcpName)
pausePatch := []byte("{\"spec\":{\"paused\":true}}")

If we want to ensure maximum safety and eliminate races, then after the pause goes through here we should add another check like the one above to see if the pool is progressing, since there is a tiny chance that the MCP started some kind of update in between.

If we check here and the pool is still good, then we can proceed with the drain. If the pool somehow is updating now, we should consider unpausing and then waiting in a loop for the MCP to go back to not updating.
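
Something like this sketch captures that suggestion (the getMCP/setPaused parameters are stand-ins for the real get/patch calls; the merged patch implements the idea via the 'paused' flag in mcpEventHandler shown further down):

```go
package daemon

import (
    "context"
    "fmt"

    mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// Sketch of the "re-check after pausing" idea: if an MCO update slipped in
// between the readiness check and the pause, back off by unpausing and let
// the caller wait for the pool to settle before retrying.
func pauseAndRecheck(
    ctx context.Context,
    getMCP func(context.Context) (*mcfgv1.MachineConfigPool, error),
    setPaused func(context.Context, bool) error,
) error {
    if err := setPaused(ctx, true); err != nil {
        return err
    }
    mcp, err := getMCP(ctx)
    if err != nil {
        return err
    }
    if mcfgv1.IsMachineConfigPoolConditionTrue(mcp.Status.Conditions, mcfgv1.MachineConfigPoolUpdating) {
        // Lost the race: unpause and signal the caller to wait and retry.
        if err := setPaused(ctx, false); err != nil {
            return err
        }
        return fmt.Errorf("MCP %s started updating after pause, retry once it settles", mcp.GetName())
    }
    return nil // still quiescent, safe to proceed with the drain
}
```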

Collaborator

Do we also want to check that pause is not already set by user?

Collaborator Author

I updated the patch, PTAL.

@pliurh pliurh force-pushed the pause_mcp branch 2 times, most recently from 741949c to 0d42c93, on March 24, 2021 05:27

@sinnykumari sinnykumari left a comment

Thanks for implementing it in this way! Overall, this approach looks good.

pkg/daemon/daemon.go (outdated, resolved)
}
if paused {
    glog.Infof("drainNode(): MCP is processing, resume MCP %s", dn.mcpName)
    pausePatch := []byte("{\"spec\":{\"paused\":false}}")

For my understanding, can you please explain the condition in which we are unpausing the pool here?

Collaborator Author

The MCP informer will not be stopped right after we pause the MCP, so the mcpEventHandler will be invoked once more when the MCP is updated. In this cycle, we check whether the MCP is processing; if it is, and the MCP was paused during the last cycle (indicated by the 'paused' variable here), we resume the MCP and wait until it finishes.

Collaborator Author

This is to address #93 (comment)

got it, makes sense.

@SchSeba SchSeba left a comment

Looks good, I just have some small comments, thanks!

deploy/clusterrole.yaml (resolved)
pkg/daemon/daemon.go (resolved)
pkg/daemon/daemon.go (resolved)

SchSeba commented Mar 30, 2021

/lgtm

@github-actions github-actions bot added the lgtm label Mar 30, 2021
pkg/daemon/daemon.go (resolved)

pliurh commented Apr 1, 2021

@zshi-redhat @SchSeba I just updated the patch. A new annotation value, Draining_MCP_Paused, was introduced to indicate whether the MCP is paused by the config daemon. Also, I changed needDrainNode so that changing the MTU no longer requires a drain if ifaceStatus.NumVfs is 0.
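
The MTU special case boils down to something like the sketch below; the struct here is illustrative (only the Mtu and NumVfs field names are taken from the comment), not the operator's actual types:

```go
package daemon

// ifaceState is an illustrative stand-in for the operator's interface
// spec/status types; only Mtu and NumVfs matter for this check.
type ifaceState struct {
    Mtu    int
    NumVfs int
}

// Sketch of the needDrainNode relaxation described above: an MTU-only change
// can be applied in place when the interface has no VFs configured.
func mtuChangeNeedsDrain(spec, status ifaceState) bool {
    if spec.Mtu == status.Mtu {
        return false // MTU unchanged, nothing to do
    }
    // With zero VFs on the PF, no workloads depend on the VFs, so the MTU can
    // be changed without draining the node.
    return status.NumVfs != 0
}
```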

@zshi-redhat
Collaborator

/lgtm


@SchSeba SchSeba left a comment

Just two small nits and I think the PR is ready.

pausePatch := []byte("{\"spec\":{\"paused\":true}}")
_, err = dn.mcClient.MachineconfigurationV1().MachineConfigPools().Patch(context.Background(), dn.mcpName, types.MergePatchType, pausePatch, metav1.PatchOptions{})
if err != nil {
    glog.V(2).Infof("drainNode(): fail to pause MCP %s: %v", dn.mcpName, err)

Collaborator

nit: fail -> Failed

Collaborator Author

fixed.

if err != nil {
    glog.Errorf("drainNode(): Failed to annotate node: %v", err)
    return err
mcpInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{

Collaborator

nit: can we add a check that this is the MCP the node is part of, even before we send it to the mcpEventHandler function?

Collaborator Author

Hmm, I don't know how we can do that. Normally, the informer watches all the objects. I think it's sufficient that we check it in the mcpEventHandler.
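
For what it's worth, client-go does ship cache.FilteringResourceEventHandler, which can drop events for other pools before they reach the handler; a sketch is below. Checking the name inside mcpEventHandler, as the patch does, achieves the same effect.

```go
package daemon

import (
    mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
    "k8s.io/client-go/tools/cache"
)

// Sketch: wrap an existing handler so it only sees events for the node's own
// MachineConfigPool. The merged patch keeps this check inside mcpEventHandler
// instead, which is equivalent in effect.
func newMCPFilteredHandler(mcpName string, handler cache.ResourceEventHandler) cache.ResourceEventHandler {
    return cache.FilteringResourceEventHandler{
        FilterFunc: func(obj interface{}) bool {
            mcp, ok := obj.(*mcfgv1.MachineConfigPool)
            return ok && mcp.GetName() == mcpName
        },
        Handler: handler,
    }
}
```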

@pliurh pliurh changed the title from "WIP: Pause MCP before draining/rebooting node" to "Pause MCP before draining/rebooting node" on Apr 7, 2021
@pliurh pliurh merged commit f17625b into k8snetworkplumbingwg:master on Apr 7, 2021