Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "kata-oc" node role handling #271

Merged
merged 4 commits into from Mar 9, 2023

Conversation

pmores
Copy link
Contributor

@pmores pmores commented Mar 2, 2023

This PR modifies handling of the "kata-oc" node label so that changes of the pool of kata-enabled workers can be handled correctly.

The OSC controller now processes user-supplied label requirements internally and doesn't "publish" them in its "kata-oc" MCP nodeSelector. Instead, if the new internal label processing results in a match for a Node the node is labelled with the "kata-oc" node role, and similarly on mismatch the controller ensures the pertinent Node does not have the node role. The "kata-oc" node role is the single unambiguous externally-visible sign whether a Node labels did or did not match KataConfig.spec.kataConfigPoolSelector.

This, with an introduction of Node resource watching, allows us to handle correctly any changes in Node labeling that affect whether a Node question should or should not have kata installed on it.

Fixes: KATA-1928

stateOld := nodeOld.GetAnnotations()["machineconfiguration.openshift.io/state"]
stateNew := nodeNew.GetAnnotations()["machineconfiguration.openshift.io/state"]
if stateOld != stateNew {
foundRelevantChange = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, you can just queue the reconcile request and return right?
There is no need to compare the labels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For bare functionality, yes you're right. But we want the complete logs, trust me. ;-D I found it extremely helpful to know for a fact what the states of individual nodes are when trying to understand or debug something, esp. from an incomplete log file where you cannot check the earlier messages for information that you're missing.

r.Log.Info("couldn't get kata node selector", "err", err)
// If we cannot find out whether a Node matches assuming that it
// doesn't seems to be the safer assumption. This also seems
// consistent with error-handling sematics of earlier similar code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// consistent with error-handling sematics of earlier similar code.
// consistent with error-handling semantics of earlier similar code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if !reflect.DeepEqual(labelsOld, labelsNew) {

log.Info("labels changed", "old", labelsOld, "new", labelsNew)
added, modified, removed := getStringMapDiff(labelsOld, labelsNew)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like getStringMapDiff() is only used for logging? Can we just log the old and new labels and avoid the diff ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, it's only used for logging. I've added it because I was initially only logging the old and new labels and as it turns out, that's really hard to read. In practice, this is how an actual old and new labels log message can look like:

2023-03-07T09:25:04Z INFO controllers.KataConfig.NodeUpdate labels changed {"node name": "kata-412-stable-worker-1.stable412.com", "old": {"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"kata-412-stable-worker-1.stable412.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}, "new": {"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","custom-kata1":"test","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"kata-412-stable-worker-1.stable412.com","kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}}

Now spot the difference. ;-) With the diff you get:

2023-03-07T09:25:04Z INFO controllers.KataConfig.NodeUpdate labels diff {"node name": "kata-412-stable-worker-1.stable412.com", "added": {"custom-kata1":"test"}, "modified": {}, "removed": {}}

I understand that including such a low-level generic function here can feel dissonant. I blame it to the general weakness of golang std lib though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth the effort :-) . Thanks for sharing the example..

err = r.unlabelNodes(&metav1.LabelSelector{MatchLabels: r.getNodeSelectorAsMapNoKataOc()})
if err != nil {
if k8serrors.IsConflict(err) {
return ctrl.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: How about having the requeueAfter timeout as a constant for consistent usage across the code ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be dealt with explicitly and systematically across the controller code as part of KATA-1968. At the moment though, I'd probably be in favour of using timeouts based on a particular use-case.

E.g. when requeueing to wait for the MCO 10-15 seconds might be reasonable (the MCO usually takes about 5 s to react).

In this particular case, I'd be actually inclined to use a much shorter wait, like a second or two since this is caused by the often-occurring "the object has been modified" error where long wait times don't seem to have much benefit (in fact, retrying right away, without deferring to the requeueing mechanism, would probably succeed AFAICT).

We should properly discuss this before work on KATA-1968 starts. I do agree that 10 s is kind of arbitrary but it's going to change.

log := eh.reconciler.Log.WithName("NodeCreate").WithValues("node name", node.GetName())
log.Info("node created")

if !isWorkerNode(node) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that the intent is to skip masters in regular setups but not in converged setups where we always have !isWorkerNode(node) == false (IIUC the outcome of your experiments) , right ?

Copy link
Contributor Author

@pmores pmores Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no. It's a bit tricky. This is how a converged cluster looks like:

oc get nodes
NAME                                        STATUS   ROLES                         AGE   VERSION
converged-412-ctlplane-0.converged412.com   Ready    control-plane,master,worker   45h   v1.25.4+18eadca
converged-412-ctlplane-1.converged412.com   Ready    control-plane,master,worker   45h   v1.25.4+18eadca
converged-412-ctlplane-2.converged412.com   Ready    control-plane,master,worker   45h   v1.25.4+18eadca

Note how all masters are also workers. So this code still works on converged clusters although much of it is not really needed in that environment. Since kataConfigPoolSelector cannot be supported on converged clusters, the only operations possible are basically just install kata on all nodes and uninstall kata from all nodes. The only useful part of node watching on converged clusters is triggering reconciliation on machineconfiguration.openshift.io/state changes, and that works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you're right, I misread the condition in your comment, I apologize. We do want to avoid masters in normal clusters but not in converged ones, and we use the fact that all masters in converged clusters also have the "worker" node role label to implement that.

if err != nil {
r.Log.Error(err, "Error when removing labels from node", "node", node)
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the reason for not calling unlabelNode() actually...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that but it doesn't really seem all that beneficial. unlabelNode() has two lines of substantial code, the rest is error handling. But since it can still return an error, the error handling here, at the call site, would need to remain. So using unlabelNode() would save a single line here but would require us to keep the whole unlabelNode() function.

Additionally, given that unlabelNode() will have no other callers, I really think that doing the unlabeling here and removing unlabelNode() in the final commit of this PR is a better solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that unlabelNode() doesn't bring much and could have been open-coded from the start. I guess I would just say something along those lines in the commit message : useless helper, better done open-coded, will be removed later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is mentioned in the commit message, isn't it? In the final paragraph I mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well not exactly but nevermind. Let's go forward ! 😉

@@ -620,6 +620,17 @@ func (r *KataConfigOpenShiftReconciler) getNodeSelectorAsMap() map[string]string
return nodeSelector
}

func (r *KataConfigOpenShiftReconciler) getNodeSelectorAsMapNoKataOc() map[string]string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to understand the semantics of this function and thus the last sentence of the commit message...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically getNodeSelectorAsMap() without the "kata-oc" requirement. With this PR we're moving away from "combined" node selectors that contain both the "canonical" label requirements coming from KataConfig (mostly kataConfigPoolSelector) and the "kata-oc" node role derived from them. This is a temporary function that returns just the "canonical" part to bridge the gap and keep the controller compilable until a proper implementation is in place.

Don't worry, this will be removed soon (I have a subsequent PR pretty much ready that will do that, pending on merge of this one).

Copy link
Member

@gkurz gkurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @pmores !

Copy link
Contributor

@bpradipt bpradipt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Thanks @pmores for the detail commit messages and also patiently addressing the queries.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2023
Pavel Mores and others added 4 commits March 9, 2023 17:18
As it turns out we cannot reliably handle Node labelling changes without
watching Nodes explicitly.  Notably, labelling a new Node to add it to a
kata-enabled nodes pool will not work (KATA-1928[*]) since this controller
will never learn about it.

The immediate reason is that the controller only watches MCPs but no MCP
is changed.  In particular the "kata-oc" MCP doesn't change since just
adding labels so that a Node matches KataConfig.spec.kataConfigPoolSelector
doesn't yet make it a member of the "kata-oc" MCP because the MCP
membership also requires the "kata-oc" node role label on the Node.  The
node role is supposed is to added (and managed) by this controller yet it
cannot do that since there's nothing to trigger the appropriate
reconciliation.

To fix this, this change introduces Node resource watching.  This enables
us to trigger reconciliation on relevant Node resource changes.  More
specifically we only trigger reconciliation when the changed Node is a
worker.

On creation we reconcile if the new Node matches the kataConfigPoolSelector
on the assumption that non-matching Nodes are of no interest to us.

We ignore Node deletion since it's not clear what good a reconciliation
could do with a Node resource that's already unavailable.

On update we check for relevant changes in the Node resource, namely the
machineconfiguration.openshift.io/state annotation (which is used by
reconciliation code) and labelling changes that change the Node's
membership in kataConfigPoolSelector.

The addition of getNodeSelectorAsMapNoKataOc() should be considered
temporary, pending a future refactoring of kata node selector handling.

[*] https://issues.redhat.com/browse/KATA-1928

Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit just prepares the ground, introducing new functionality but
actually changing nothing yet.

updateNodeLabels() is, in a way, a generalisation of the existing helper
labelNodes().  In addition to what labelNodes() already does, i.e. put
the "kata-oc" node role on every Node that matches
kataConfigPoolSelector, the new updateNodeLabels() function also ensures
that Nodes which do not match the kata selector (anymore) do not have
the "kata-oc" node role either.  It thus adds a critical piece of
functionality that was missing from the controller code and whose lack
made it impossible to manage the "kata-oc" node role label correctly.

unlabelNodes() does what unlabelNode() has been doing but for all
pertinent Nodes.  Due to the way Golang error handling works, it
unfortunately seems impractical for unlabelNodes() to employ
unlabelNode() as its own helper so long as unlabelNode() can return an
error.

Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit changes the "kata-oc" node role label handling as follows.

The "kata-oc" MCP's nodeSelector now only ever contains just a single
requirement - that Node have the "kata-oc" node role label.

Installation procedure now updates node role labels to match the current
kataConfigPoolSelector as the first step.  Label update was pulled out of
newMCPforCR() into processKataConfigInstallRequest() since it's a
first-class concept of the installation algorithm.  It also handles
removing the "kata-oc" role from Nodes that don't match the kata node
selector anymore (see also previous commit).  This is an improvement on the
previous state which only ever added labels during installation.  That was
wrong since kata node selector updates are handled as installations.  This
is true even if they end up narrowing the kata-enabled Node pool and thus
actually uninstalling kata from (a set of) Nodes.

Since the "kata-oc" MCP's nodeSelector is now immutable, code to update it
was removed accordingly from installation handling.

Uninstallation procedure now removes the "kata-oc" node role label from all
Nodes as its first step.  This doesn't really change the semantics of the
algorithm, strictly speaking, but makes for a much more streamlined and
tractable sequence of uninstallation steps.  Unlabeling first drains the
"kata-oc" pool immediately and the rest of uninstallation is just waiting
for individual Nodes to become Updated/Ready in "worker".  The steps used
so far started with removal of the kata extension MC which sent Nodes
updating to a new machine configuration that now matched that of "worker"
but still left them members of the "kata-oc" pool.  Only after individual
Nodes finished updating they were unlabeled one by one and became members
of "worker".  While the end result is the same AFAICT this was a rather
roundabout way and quite a bit harder to follow/make sense of.  This commit
also makes the uninstallation symmetrical to installation from the MCO
standpoint, just with the roles of the source and target pool swapped
between "kata-oc" and "worker".

Signed-off-by: Pavel Mores <pmores@redhat.com>
Fixes: https://issues.redhat.com/browse/KATA-1928
newMCPforCR() now cannot fail so the error return value was removed.
labelNodes() and unlabelNode() are not called anymore and thus obsolete
so they were removed, too.

Signed-off-by: Pavel Mores <pmores@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2023
@openshift-ci
Copy link

openshift-ci bot commented Mar 9, 2023

New changes are detected. LGTM label has been removed.

@pmores
Copy link
Contributor Author

pmores commented Mar 9, 2023

Force-push just to fix the typo pointed out by @littlejawa .

Thanks a lot to reviewers!

@pmores pmores merged commit 3695336 into openshift:main Mar 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants