New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "kata-oc" node role handling #271
Conversation
stateOld := nodeOld.GetAnnotations()["machineconfiguration.openshift.io/state"] | ||
stateNew := nodeNew.GetAnnotations()["machineconfiguration.openshift.io/state"] | ||
if stateOld != stateNew { | ||
foundRelevantChange = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, you can just queue the reconcile request and return right?
There is no need to compare the labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For bare functionality, yes you're right. But we want the complete logs, trust me. ;-D I found it extremely helpful to know for a fact what the states of individual nodes are when trying to understand or debug something, esp. from an incomplete log file where you cannot check the earlier messages for information that you're missing.
controllers/openshift_controller.go
Outdated
r.Log.Info("couldn't get kata node selector", "err", err) | ||
// If we cannot find out whether a Node matches assuming that it | ||
// doesn't seems to be the safer assumption. This also seems | ||
// consistent with error-handling sematics of earlier similar code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// consistent with error-handling sematics of earlier similar code. | |
// consistent with error-handling semantics of earlier similar code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if !reflect.DeepEqual(labelsOld, labelsNew) { | ||
|
||
log.Info("labels changed", "old", labelsOld, "new", labelsNew) | ||
added, modified, removed := getStringMapDiff(labelsOld, labelsNew) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like getStringMapDiff() is only used for logging? Can we just log the old and new labels and avoid the diff ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, it's only used for logging. I've added it because I was initially only logging the old and new labels and as it turns out, that's really hard to read. In practice, this is how an actual old and new labels log message can look like:
2023-03-07T09:25:04Z INFO controllers.KataConfig.NodeUpdate labels changed {"node name": "kata-412-stable-worker-1.stable412.com", "old": {"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"kata-412-stable-worker-1.stable412.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}, "new": {"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","custom-kata1":"test","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"kata-412-stable-worker-1.stable412.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}}
Now spot the difference. ;-) With the diff you get:
2023-03-07T09:25:04Z INFO controllers.KataConfig.NodeUpdate labels diff {"node name": "kata-412-stable-worker-1.stable412.com", "added": {"custom-kata1":"test"}, "modified": {}, "removed": {}}
I understand that including such a low-level generic function here can feel dissonant. I blame it to the general weakness of golang std lib though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth the effort :-) . Thanks for sharing the example..
err = r.unlabelNodes(&metav1.LabelSelector{MatchLabels: r.getNodeSelectorAsMapNoKataOc()}) | ||
if err != nil { | ||
if k8serrors.IsConflict(err) { | ||
return ctrl.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How about having the requeueAfter timeout as a constant for consistent usage across the code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be dealt with explicitly and systematically across the controller code as part of KATA-1968. At the moment though, I'd probably be in favour of using timeouts based on a particular use-case.
E.g. when requeueing to wait for the MCO 10-15 seconds might be reasonable (the MCO usually takes about 5 s to react).
In this particular case, I'd be actually inclined to use a much shorter wait, like a second or two since this is caused by the often-occurring "the object has been modified" error where long wait times don't seem to have much benefit (in fact, retrying right away, without deferring to the requeueing mechanism, would probably succeed AFAICT).
We should properly discuss this before work on KATA-1968 starts. I do agree that 10 s is kind of arbitrary but it's going to change.
log := eh.reconciler.Log.WithName("NodeCreate").WithValues("node name", node.GetName()) | ||
log.Info("node created") | ||
|
||
if !isWorkerNode(node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that the intent is to skip masters in regular setups but not in converged setups where we always have !isWorkerNode(node) == false
(IIUC the outcome of your experiments) , right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no. It's a bit tricky. This is how a converged cluster looks like:
oc get nodes
NAME STATUS ROLES AGE VERSION
converged-412-ctlplane-0.converged412.com Ready control-plane,master,worker 45h v1.25.4+18eadca
converged-412-ctlplane-1.converged412.com Ready control-plane,master,worker 45h v1.25.4+18eadca
converged-412-ctlplane-2.converged412.com Ready control-plane,master,worker 45h v1.25.4+18eadca
Note how all masters are also workers. So this code still works on converged clusters although much of it is not really needed in that environment. Since kataConfigPoolSelector
cannot be supported on converged clusters, the only operations possible are basically just install kata on all nodes and uninstall kata from all nodes. The only useful part of node watching on converged clusters is triggering reconciliation on machineconfiguration.openshift.io/state
changes, and that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you're right, I misread the condition in your comment, I apologize. We do want to avoid masters in normal clusters but not in converged ones, and we use the fact that all masters in converged clusters also have the "worker" node role label to implement that.
if err != nil { | ||
r.Log.Error(err, "Error when removing labels from node", "node", node) | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the reason for not calling unlabelNode()
actually...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that but it doesn't really seem all that beneficial. unlabelNode()
has two lines of substantial code, the rest is error handling. But since it can still return an error, the error handling here, at the call site, would need to remain. So using unlabelNode()
would save a single line here but would require us to keep the whole unlabelNode()
function.
Additionally, given that unlabelNode()
will have no other callers, I really think that doing the unlabeling here and removing unlabelNode()
in the final commit of this PR is a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that unlabelNode()
doesn't bring much and could have been open-coded from the start. I guess I would just say something along those lines in the commit message : useless helper, better done open-coded, will be removed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is mentioned in the commit message, isn't it? In the final paragraph I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well not exactly but nevermind. Let's go forward ! 😉
@@ -620,6 +620,17 @@ func (r *KataConfigOpenShiftReconciler) getNodeSelectorAsMap() map[string]string | |||
return nodeSelector | |||
} | |||
|
|||
func (r *KataConfigOpenShiftReconciler) getNodeSelectorAsMapNoKataOc() map[string]string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure to understand the semantics of this function and thus the last sentence of the commit message...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically getNodeSelectorAsMap()
without the "kata-oc" requirement. With this PR we're moving away from "combined" node selectors that contain both the "canonical" label requirements coming from KataConfig (mostly kataConfigPoolSelector
) and the "kata-oc" node role derived from them. This is a temporary function that returns just the "canonical" part to bridge the gap and keep the controller compilable until a proper implementation is in place.
Don't worry, this will be removed soon (I have a subsequent PR pretty much ready that will do that, pending on merge of this one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @pmores !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @pmores for the detail commit messages and also patiently addressing the queries.
As it turns out we cannot reliably handle Node labelling changes without watching Nodes explicitly. Notably, labelling a new Node to add it to a kata-enabled nodes pool will not work (KATA-1928[*]) since this controller will never learn about it. The immediate reason is that the controller only watches MCPs but no MCP is changed. In particular the "kata-oc" MCP doesn't change since just adding labels so that a Node matches KataConfig.spec.kataConfigPoolSelector doesn't yet make it a member of the "kata-oc" MCP because the MCP membership also requires the "kata-oc" node role label on the Node. The node role is supposed is to added (and managed) by this controller yet it cannot do that since there's nothing to trigger the appropriate reconciliation. To fix this, this change introduces Node resource watching. This enables us to trigger reconciliation on relevant Node resource changes. More specifically we only trigger reconciliation when the changed Node is a worker. On creation we reconcile if the new Node matches the kataConfigPoolSelector on the assumption that non-matching Nodes are of no interest to us. We ignore Node deletion since it's not clear what good a reconciliation could do with a Node resource that's already unavailable. On update we check for relevant changes in the Node resource, namely the machineconfiguration.openshift.io/state annotation (which is used by reconciliation code) and labelling changes that change the Node's membership in kataConfigPoolSelector. The addition of getNodeSelectorAsMapNoKataOc() should be considered temporary, pending a future refactoring of kata node selector handling. [*] https://issues.redhat.com/browse/KATA-1928 Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit just prepares the ground, introducing new functionality but actually changing nothing yet. updateNodeLabels() is, in a way, a generalisation of the existing helper labelNodes(). In addition to what labelNodes() already does, i.e. put the "kata-oc" node role on every Node that matches kataConfigPoolSelector, the new updateNodeLabels() function also ensures that Nodes which do not match the kata selector (anymore) do not have the "kata-oc" node role either. It thus adds a critical piece of functionality that was missing from the controller code and whose lack made it impossible to manage the "kata-oc" node role label correctly. unlabelNodes() does what unlabelNode() has been doing but for all pertinent Nodes. Due to the way Golang error handling works, it unfortunately seems impractical for unlabelNodes() to employ unlabelNode() as its own helper so long as unlabelNode() can return an error. Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit changes the "kata-oc" node role label handling as follows. The "kata-oc" MCP's nodeSelector now only ever contains just a single requirement - that Node have the "kata-oc" node role label. Installation procedure now updates node role labels to match the current kataConfigPoolSelector as the first step. Label update was pulled out of newMCPforCR() into processKataConfigInstallRequest() since it's a first-class concept of the installation algorithm. It also handles removing the "kata-oc" role from Nodes that don't match the kata node selector anymore (see also previous commit). This is an improvement on the previous state which only ever added labels during installation. That was wrong since kata node selector updates are handled as installations. This is true even if they end up narrowing the kata-enabled Node pool and thus actually uninstalling kata from (a set of) Nodes. Since the "kata-oc" MCP's nodeSelector is now immutable, code to update it was removed accordingly from installation handling. Uninstallation procedure now removes the "kata-oc" node role label from all Nodes as its first step. This doesn't really change the semantics of the algorithm, strictly speaking, but makes for a much more streamlined and tractable sequence of uninstallation steps. Unlabeling first drains the "kata-oc" pool immediately and the rest of uninstallation is just waiting for individual Nodes to become Updated/Ready in "worker". The steps used so far started with removal of the kata extension MC which sent Nodes updating to a new machine configuration that now matched that of "worker" but still left them members of the "kata-oc" pool. Only after individual Nodes finished updating they were unlabeled one by one and became members of "worker". While the end result is the same AFAICT this was a rather roundabout way and quite a bit harder to follow/make sense of. This commit also makes the uninstallation symmetrical to installation from the MCO standpoint, just with the roles of the source and target pool swapped between "kata-oc" and "worker". Signed-off-by: Pavel Mores <pmores@redhat.com> Fixes: https://issues.redhat.com/browse/KATA-1928
newMCPforCR() now cannot fail so the error return value was removed. labelNodes() and unlabelNode() are not called anymore and thus obsolete so they were removed, too. Signed-off-by: Pavel Mores <pmores@redhat.com>
f02b80f
to
a36f8a8
Compare
New changes are detected. LGTM label has been removed. |
Force-push just to fix the typo pointed out by @littlejawa . Thanks a lot to reviewers! |
This PR modifies handling of the "kata-oc" node label so that changes of the pool of kata-enabled workers can be handled correctly.
The OSC controller now processes user-supplied label requirements internally and doesn't "publish" them in its "kata-oc" MCP
nodeSelector
. Instead, if the new internal label processing results in a match for a Node the node is labelled with the "kata-oc" node role, and similarly on mismatch the controller ensures the pertinent Node does not have the node role. The "kata-oc" node role is the single unambiguous externally-visible sign whether a Node labels did or did not matchKataConfig.spec.kataConfigPoolSelector
.This, with an introduction of Node resource watching, allows us to handle correctly any changes in Node labeling that affect whether a Node question should or should not have kata installed on it.
Fixes: KATA-1928