Fix "kata-oc" node role handling #271

pmores · 2023-03-02T11:49:37Z

This PR modifies handling of the "kata-oc" node label so that changes of the pool of kata-enabled workers can be handled correctly.

The OSC controller now processes user-supplied label requirements internally and doesn't "publish" them in its "kata-oc" MCP nodeSelector. Instead, if the new internal label processing results in a match for a Node the node is labelled with the "kata-oc" node role, and similarly on mismatch the controller ensures the pertinent Node does not have the node role. The "kata-oc" node role is the single unambiguous externally-visible sign whether a Node labels did or did not match KataConfig.spec.kataConfigPoolSelector.

This, with an introduction of Node resource watching, allows us to handle correctly any changes in Node labeling that affect whether a Node question should or should not have kata installed on it.

Fixes: KATA-1928

littlejawa · 2023-03-02T13:55:19Z

controllers/openshift_controller.go

+	stateOld := nodeOld.GetAnnotations()["machineconfiguration.openshift.io/state"]
+	stateNew := nodeNew.GetAnnotations()["machineconfiguration.openshift.io/state"]
+	if stateOld != stateNew {
+		foundRelevantChange = true


At this point, you can just queue the reconcile request and return right?
There is no need to compare the labels?

For bare functionality, yes you're right. But we want the complete logs, trust me. ;-D I found it extremely helpful to know for a fact what the states of individual nodes are when trying to understand or debug something, esp. from an incomplete log file where you cannot check the earlier messages for information that you're missing.

littlejawa · 2023-03-02T13:57:45Z

controllers/openshift_controller.go

+		r.Log.Info("couldn't get kata node selector", "err", err)
+		// If we cannot find out whether a Node matches assuming that it
+		// doesn't seems to be the safer assumption.  This also seems
+		// consistent with error-handling sematics of earlier similar code.


nit:

Suggested change

// consistent with error-handling sematics of earlier similar code.

// consistent with error-handling semantics of earlier similar code.

good catch, thanks!

bpradipt · 2023-03-09T09:52:41Z

controllers/openshift_controller.go

+	if !reflect.DeepEqual(labelsOld, labelsNew) {
+
+		log.Info("labels changed", "old", labelsOld, "new", labelsNew)
+		added, modified, removed := getStringMapDiff(labelsOld, labelsNew)


Looks like getStringMapDiff() is only used for logging? Can we just log the old and new labels and avoid the diff ?

That's right, it's only used for logging. I've added it because I was initially only logging the old and new labels and as it turns out, that's really hard to read. In practice, this is how an actual old and new labels log message can look like:

2023-03-07T09:25:04Z INFO controllers.KataConfig.NodeUpdate labels changed {"node name": "kata-412-stable-worker-1.stable412.com", "old": {"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"kata-412-stable-worker-1.stable412.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}, "new": {"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","custom-kata1":"test","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"kata-412-stable-worker-1.stable412.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}}

Now spot the difference. ;-) With the diff you get:

2023-03-07T09:25:04Z INFO controllers.KataConfig.NodeUpdate labels diff {"node name": "kata-412-stable-worker-1.stable412.com", "added": {"custom-kata1":"test"}, "modified": {}, "removed": {}}

I understand that including such a low-level generic function here can feel dissonant. I blame it to the general weakness of golang std lib though.

Worth the effort :-) . Thanks for sharing the example..

bpradipt · 2023-03-09T09:58:13Z

controllers/openshift_controller.go

+	err = r.unlabelNodes(&metav1.LabelSelector{MatchLabels: r.getNodeSelectorAsMapNoKataOc()})
+	if err != nil {
+		if k8serrors.IsConflict(err) {
+			return ctrl.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil


nit: How about having the requeueAfter timeout as a constant for consistent usage across the code ?

This will be dealt with explicitly and systematically across the controller code as part of KATA-1968. At the moment though, I'd probably be in favour of using timeouts based on a particular use-case.

E.g. when requeueing to wait for the MCO 10-15 seconds might be reasonable (the MCO usually takes about 5 s to react).

In this particular case, I'd be actually inclined to use a much shorter wait, like a second or two since this is caused by the often-occurring "the object has been modified" error where long wait times don't seem to have much benefit (in fact, retrying right away, without deferring to the requeueing mechanism, would probably succeed AFAICT).

We should properly discuss this before work on KATA-1968 starts. I do agree that 10 s is kind of arbitrary but it's going to change.

gkurz · 2023-03-09T10:07:04Z

controllers/openshift_controller.go

+	log := eh.reconciler.Log.WithName("NodeCreate").WithValues("node name", node.GetName())
+	log.Info("node created")
+
+	if !isWorkerNode(node) {


I'm guessing that the intent is to skip masters in regular setups but not in converged setups where we always have !isWorkerNode(node) == false (IIUC the outcome of your experiments) , right ?

Actually, no. It's a bit tricky. This is how a converged cluster looks like:

oc get nodes NAME STATUS ROLES AGE VERSION converged-412-ctlplane-0.converged412.com Ready control-plane,master,worker 45h v1.25.4+18eadca converged-412-ctlplane-1.converged412.com Ready control-plane,master,worker 45h v1.25.4+18eadca converged-412-ctlplane-2.converged412.com Ready control-plane,master,worker 45h v1.25.4+18eadca

Note how all masters are also workers. So this code still works on converged clusters although much of it is not really needed in that environment. Since kataConfigPoolSelector cannot be supported on converged clusters, the only operations possible are basically just install kata on all nodes and uninstall kata from all nodes. The only useful part of node watching on converged clusters is triggering reconciliation on machineconfiguration.openshift.io/state changes, and that works.

Actually you're right, I misread the condition in your comment, I apologize. We do want to avoid masters in normal clusters but not in converged ones, and we use the fact that all masters in converged clusters also have the "worker" node role label to implement that.

gkurz · 2023-03-09T10:41:02Z

controllers/openshift_controller.go

+			if err != nil {
+				r.Log.Error(err, "Error when removing labels from node", "node", node)
+				return err
+			}


I don't quite understand the reason for not calling unlabelNode() actually...

I considered that but it doesn't really seem all that beneficial. unlabelNode() has two lines of substantial code, the rest is error handling. But since it can still return an error, the error handling here, at the call site, would need to remain. So using unlabelNode() would save a single line here but would require us to keep the whole unlabelNode() function.

Additionally, given that unlabelNode() will have no other callers, I really think that doing the unlabeling here and removing unlabelNode() in the final commit of this PR is a better solution.

I agree that unlabelNode() doesn't bring much and could have been open-coded from the start. I guess I would just say something along those lines in the commit message : useless helper, better done open-coded, will be removed later.

I think it is mentioned in the commit message, isn't it? In the final paragraph I mean.

Well not exactly but nevermind. Let's go forward ! 😉

gkurz · 2023-03-09T13:12:41Z

controllers/openshift_controller.go

@@ -620,6 +620,17 @@ func (r *KataConfigOpenShiftReconciler) getNodeSelectorAsMap() map[string]string
 	return nodeSelector
 }

+func (r *KataConfigOpenShiftReconciler) getNodeSelectorAsMapNoKataOc() map[string]string {


Not sure to understand the semantics of this function and thus the last sentence of the commit message...

It's basically getNodeSelectorAsMap() without the "kata-oc" requirement. With this PR we're moving away from "combined" node selectors that contain both the "canonical" label requirements coming from KataConfig (mostly kataConfigPoolSelector) and the "kata-oc" node role derived from them. This is a temporary function that returns just the "canonical" part to bridge the gap and keep the controller compilable until a proper implementation is in place.

Don't worry, this will be removed soon (I have a subsequent PR pretty much ready that will do that, pending on merge of this one).

gkurz

LGTM. Thanks @pmores !

bpradipt

/lgtm
Thanks @pmores for the detail commit messages and also patiently addressing the queries.

As it turns out we cannot reliably handle Node labelling changes without watching Nodes explicitly. Notably, labelling a new Node to add it to a kata-enabled nodes pool will not work (KATA-1928[*]) since this controller will never learn about it. The immediate reason is that the controller only watches MCPs but no MCP is changed. In particular the "kata-oc" MCP doesn't change since just adding labels so that a Node matches KataConfig.spec.kataConfigPoolSelector doesn't yet make it a member of the "kata-oc" MCP because the MCP membership also requires the "kata-oc" node role label on the Node. The node role is supposed is to added (and managed) by this controller yet it cannot do that since there's nothing to trigger the appropriate reconciliation. To fix this, this change introduces Node resource watching. This enables us to trigger reconciliation on relevant Node resource changes. More specifically we only trigger reconciliation when the changed Node is a worker. On creation we reconcile if the new Node matches the kataConfigPoolSelector on the assumption that non-matching Nodes are of no interest to us. We ignore Node deletion since it's not clear what good a reconciliation could do with a Node resource that's already unavailable. On update we check for relevant changes in the Node resource, namely the machineconfiguration.openshift.io/state annotation (which is used by reconciliation code) and labelling changes that change the Node's membership in kataConfigPoolSelector. The addition of getNodeSelectorAsMapNoKataOc() should be considered temporary, pending a future refactoring of kata node selector handling. [*] https://issues.redhat.com/browse/KATA-1928 Signed-off-by: Pavel Mores <pmores@redhat.com>

This commit just prepares the ground, introducing new functionality but actually changing nothing yet. updateNodeLabels() is, in a way, a generalisation of the existing helper labelNodes(). In addition to what labelNodes() already does, i.e. put the "kata-oc" node role on every Node that matches kataConfigPoolSelector, the new updateNodeLabels() function also ensures that Nodes which do not match the kata selector (anymore) do not have the "kata-oc" node role either. It thus adds a critical piece of functionality that was missing from the controller code and whose lack made it impossible to manage the "kata-oc" node role label correctly. unlabelNodes() does what unlabelNode() has been doing but for all pertinent Nodes. Due to the way Golang error handling works, it unfortunately seems impractical for unlabelNodes() to employ unlabelNode() as its own helper so long as unlabelNode() can return an error. Signed-off-by: Pavel Mores <pmores@redhat.com>

This commit changes the "kata-oc" node role label handling as follows. The "kata-oc" MCP's nodeSelector now only ever contains just a single requirement - that Node have the "kata-oc" node role label. Installation procedure now updates node role labels to match the current kataConfigPoolSelector as the first step. Label update was pulled out of newMCPforCR() into processKataConfigInstallRequest() since it's a first-class concept of the installation algorithm. It also handles removing the "kata-oc" role from Nodes that don't match the kata node selector anymore (see also previous commit). This is an improvement on the previous state which only ever added labels during installation. That was wrong since kata node selector updates are handled as installations. This is true even if they end up narrowing the kata-enabled Node pool and thus actually uninstalling kata from (a set of) Nodes. Since the "kata-oc" MCP's nodeSelector is now immutable, code to update it was removed accordingly from installation handling. Uninstallation procedure now removes the "kata-oc" node role label from all Nodes as its first step. This doesn't really change the semantics of the algorithm, strictly speaking, but makes for a much more streamlined and tractable sequence of uninstallation steps. Unlabeling first drains the "kata-oc" pool immediately and the rest of uninstallation is just waiting for individual Nodes to become Updated/Ready in "worker". The steps used so far started with removal of the kata extension MC which sent Nodes updating to a new machine configuration that now matched that of "worker" but still left them members of the "kata-oc" pool. Only after individual Nodes finished updating they were unlabeled one by one and became members of "worker". While the end result is the same AFAICT this was a rather roundabout way and quite a bit harder to follow/make sense of. This commit also makes the uninstallation symmetrical to installation from the MCO standpoint, just with the roles of the source and target pool swapped between "kata-oc" and "worker". Signed-off-by: Pavel Mores <pmores@redhat.com> Fixes: https://issues.redhat.com/browse/KATA-1928

newMCPforCR() now cannot fail so the error return value was removed. labelNodes() and unlabelNode() are not called anymore and thus obsolete so they were removed, too. Signed-off-by: Pavel Mores <pmores@redhat.com>

openshift-ci · 2023-03-09T16:22:14Z

New changes are detected. LGTM label has been removed.

pmores · 2023-03-09T16:24:40Z

Force-push just to fix the typo pointed out by @littlejawa .

Thanks a lot to reviewers!

openshift-ci bot requested review from cpmeadors and littlejawa March 2, 2023 11:50

littlejawa reviewed Mar 2, 2023

View reviewed changes

bpradipt reviewed Mar 9, 2023

View reviewed changes

gkurz reviewed Mar 9, 2023

View reviewed changes

gkurz approved these changes Mar 9, 2023

View reviewed changes

bpradipt approved these changes Mar 9, 2023

View reviewed changes

openshift-ci bot assigned bpradipt Mar 9, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2023

Pavel Mores and others added 4 commits March 9, 2023 17:18

clean-up of node role handling changes

a36f8a8

newMCPforCR() now cannot fail so the error return value was removed. labelNodes() and unlabelNode() are not called anymore and thus obsolete so they were removed, too. Signed-off-by: Pavel Mores <pmores@redhat.com>

pmores force-pushed the fix-node-label-handling branch from f02b80f to a36f8a8 Compare March 9, 2023 16:22

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2023

pmores merged commit 3695336 into openshift:main Mar 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix "kata-oc" node role handling #271

Fix "kata-oc" node role handling #271

pmores commented Mar 2, 2023 •

edited by openshift-ci bot

littlejawa Mar 2, 2023

pmores Mar 2, 2023

littlejawa Mar 2, 2023

pmores Mar 2, 2023

pmores Mar 9, 2023

bpradipt Mar 9, 2023

pmores Mar 9, 2023

bpradipt Mar 9, 2023

bpradipt Mar 9, 2023

pmores Mar 9, 2023

gkurz Mar 9, 2023

pmores Mar 9, 2023 •

edited

pmores Mar 9, 2023

gkurz Mar 9, 2023

pmores Mar 9, 2023

gkurz Mar 9, 2023

pmores Mar 9, 2023

gkurz Mar 9, 2023

gkurz Mar 9, 2023

pmores Mar 9, 2023

gkurz left a comment

bpradipt left a comment

openshift-ci bot commented Mar 9, 2023

pmores commented Mar 9, 2023

	// consistent with error-handling sematics of earlier similar code.
	// consistent with error-handling semantics of earlier similar code.

Fix "kata-oc" node role handling #271

Fix "kata-oc" node role handling #271

Conversation

pmores commented Mar 2, 2023 • edited by openshift-ci bot

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmores Mar 9, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gkurz left a comment

Choose a reason for hiding this comment

bpradipt left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Mar 9, 2023

pmores commented Mar 9, 2023

pmores commented Mar 2, 2023 •

edited by openshift-ci bot

pmores Mar 9, 2023 •

edited