New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix kataconfig status handling to support installation updates #327
Fix kataconfig status handling to support installation updates #327
Conversation
This is a simpler and more transparent alternative to getMcp(). It will be useful in a subsequent commit. Signed-off-by: Pavel Mores <pmores@redhat.com>
If the MCO considers a Node Done, we check if its current and target MachineConfigs are the same. If they are then the Node is actually done and we append it either to the installation completed list or uninstallation completed list based on whether installation or uninstallation is in progress and whether the Node is a member of "worker" or "kata-oc". If they aren't it means that the Node is waiting for its MC to be changed as part of this ongoing installation or uninstallation. We don't append such Nodes to any list since there doesn't seem to be a good place for them in the current KataConfig.status structure. Signed-off-by: Pavel Mores <pmores@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My overall impression is that this PR really improves things.
Most of my remarks, except maybe the one on the ignored error, can still be addressed in a followup, so I'm approving 😃
Thanks @pmores !
75060c9
to
3a7ac8d
Compare
Force-pushed a fix to the inappropriate string literals usage. |
Working and Degraded Nodes are handled structurally similarly so they are handled by the same function. Working Nodes are put on an InProgress list, Degraded Nodes are put on a Failed list. If uninstallation is in progress Nodes go to uninstallation lists. If installation is in progress they go to installation lists on converged clusters or if they are labeled for kata, otherwise they go to uninstallation lists (this handles the case when kata is removed from a Node as part of kata installation modification on a cluster). Signed-off-by: Pavel Mores <pmores@redhat.com>
This puts together functionality added by the previous commits. The approach to KataConfig.status updating introduced here fundamentally differs from the existing updateStatus() function. The existing function only handled kata installation to and uninstallation from a cluster with no support for modifications (adding/removing kata-enabled Nodes) in the meantime. Thus it made sense to split status updating at the top level into two largely independent branches, one for installation and the other for uninstallation, each of which then iterated over Nodes and updated status based on the Nodes' MCO 'state' and other information. This model however breaks when kata installation modifications are introduced since they mean that kata may actually be uninstalled from Nodes as part of an operation that is nominally an installation. The approach introduced here kind of turns the existing one inside out. It iterates over *Nodes* at the top level and only then it figures out for each Node individually whether kata is being installed on it, uninstalled from it, or if the Node is left alone by the ongoing operation. Signed-off-by: Pavel Mores <pmores@redhat.com>
Any errors from the new status updating are just logged, status updating thus cannot mess up installation/uninstallation flow anymore. This is appropriate as status updating now does just that, update status, so any failures should not divert the main flow. This also removes the last opportunity for races like the recent uninstallation wedging due to updateStatus() failing to retrieve the "kata-oc" MCP which has already been deleted the uninstallation flow, and thus forcing the uninstallation flow to return early with an error and never to be able to finish. Signed-off-by: Pavel Mores <pmores@redhat.com>
With removal of the legacy updateStatus(), updateStatusNew() can be renamed to the original name. With removal of the legacy updateStatus() infrastructure getMcp() is left with no callers and can be removed. With removal of getMcp() the last remaining caller of getMcpNameIfMcpExists() is removed so this function can finally be removed too. This concludes a refactor started in commit 3adfbd5. Signed-off-by: Pavel Mores <pmores@redhat.com>
3a7ac8d
to
b8a7235
Compare
Works much better 😃 |
This PR aims to fix KataConfig.status updating to handle installation modifications (adding and removing kata-enabled Nodes) to the extent I believe possible without changing the KataConfig.status structure itself.
Some not so intuitive aspects of the result, apart from names dictated by the current form of KataConfig.status, include: