Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix kataconfig status handling to support installation updates #327

Conversation

pmores
Copy link
Contributor

@pmores pmores commented Jun 23, 2023

This PR aims to fix KataConfig.status updating to handle installation modifications (adding and removing kata-enabled Nodes) to the extent I believe possible without changing the KataConfig.status structure itself.

Some not so intuitive aspects of the result, apart from names dictated by the current form of KataConfig.status, include:

  • all Done Nodes will appear on one of the Completed lists, not just the ones affected by the ongoing or most recent operation. This is because I'm not aware of any way of telling apart Nodes that had arrived in their state earlier as opposed to as a result of the most recent operation.
  • Nodes scheduled to change state (=install/uninstall kata) as part of an ongoing operation but still waiting to do so are not listed anywhere since there doesn't seem to be a good place for them in the current KataConfig.status. They will properly appear though on an InProgress list once they actually start updating and on a Completed list once they finish updating.

This is a simpler and more transparent alternative to getMcp(). It will be
useful in a subsequent commit.

Signed-off-by: Pavel Mores <pmores@redhat.com>
If the MCO considers a Node Done, we check if its current and target
MachineConfigs are the same.

If they are then the Node is actually done and we append it either
to the installation completed list or uninstallation completed list based
on whether installation or uninstallation is in progress and whether
the Node is a member of "worker" or "kata-oc".

If they aren't it means that the Node is waiting for its MC to be changed
as part of this ongoing installation or uninstallation.  We don't append
such Nodes to any list since there doesn't seem to be a good place for
them in the current KataConfig.status structure.

Signed-off-by: Pavel Mores <pmores@redhat.com>
@openshift-ci openshift-ci bot requested review from cpmeadors and gkurz June 23, 2023 14:16
Copy link
Member

@gkurz gkurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My overall impression is that this PR really improves things.
Most of my remarks, except maybe the one on the ignored error, can still be addressed in a followup, so I'm approving 😃

Thanks @pmores !

@pmores pmores force-pushed the fix-kataconfig-status-handling-to-support-installation-updates branch from 75060c9 to 3a7ac8d Compare June 28, 2023 16:16
@pmores
Copy link
Contributor Author

pmores commented Jun 28, 2023

Force-pushed a fix to the inappropriate string literals usage.

Working and Degraded Nodes are handled structurally similarly so they are
handled by the same function.

Working Nodes are put on an InProgress list, Degraded Nodes are put on
a Failed list.

If uninstallation is in progress Nodes go to uninstallation lists.
If installation is in progress they go to installation lists on converged
clusters or if they are labeled for kata, otherwise they go to
uninstallation lists (this handles the case when kata is removed from
a Node as part of kata installation modification on a cluster).

Signed-off-by: Pavel Mores <pmores@redhat.com>
This puts together functionality added by the previous commits.

The approach to KataConfig.status updating introduced here fundamentally
differs from the existing updateStatus() function.

The existing function only handled kata installation to and uninstallation
from a cluster with no support for modifications (adding/removing
kata-enabled Nodes) in the meantime.  Thus it made sense to split status
updating at the top level into two largely independent branches, one for
installation and the other for uninstallation, each of which then iterated
over Nodes and updated status based on the Nodes' MCO 'state' and other
information.

This model however breaks when kata installation modifications are
introduced since they mean that kata may actually be uninstalled from Nodes
as part of an operation that is nominally an installation.

The approach introduced here kind of turns the existing one inside out.
It iterates over *Nodes* at the top level and only then it figures out
for each Node individually whether kata is being installed on it,
uninstalled from it, or if the Node is left alone by the ongoing operation.

Signed-off-by: Pavel Mores <pmores@redhat.com>
Any errors from the new status updating are just logged, status updating
thus cannot mess up installation/uninstallation flow anymore.  This is
appropriate as status updating now does just that, update status, so any
failures should not divert the main flow.

This also removes the last opportunity for races like the recent
uninstallation wedging due to updateStatus() failing to retrieve the
"kata-oc" MCP which has already been deleted the uninstallation flow,
and thus forcing the uninstallation flow to return early with an error and
never to be able to finish.

Signed-off-by: Pavel Mores <pmores@redhat.com>
With removal of the legacy updateStatus(), updateStatusNew() can be renamed
to the original name.

With removal of the legacy updateStatus() infrastructure getMcp() is left
with no callers and can be removed.

With removal of getMcp() the last remaining caller of
getMcpNameIfMcpExists() is removed so this function can finally be removed
too.  This concludes a refactor started in commit 3adfbd5.

Signed-off-by: Pavel Mores <pmores@redhat.com>
@pmores pmores force-pushed the fix-kataconfig-status-handling-to-support-installation-updates branch from 3a7ac8d to b8a7235 Compare June 28, 2023 16:51
@gkurz
Copy link
Member

gkurz commented Jun 29, 2023

Force-pushed a fix to the inappropriate string literals usage.

Works much better 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants