New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove legacy kata config status parts #333
Remove legacy kata config status parts #333
Conversation
As it turns out, without this it would be rather hard (likely impossible) to keep backwards compatibility for the "Completed" AdditionalPrinterColumn in the KataConfig CRD. As simple as the semantics of readyNodeCount are (literally just the length of status.kataNodes.installed), the syntax of crd.spec.versions.additionalPrinterColumns only allows jsonPath to define the value of a column, and k8s' JSON Path implementation doesn't seem to support getting a length of an array. (The go-template output format of kubectl/oc would make this trivial but go-template doesn't seem to be supported by CRD at all.) Signed-off-by: Pavel Mores <pmores@redhat.com>
This implicitly improves the behaviour of the InProgress and Completed columns which ignored uninstallation so far. Signed-off-by: Pavel Mores <pmores@redhat.com>
Handling of uninstallation being blocked by running kata pods turned out to fit the InProgress Condition neatly by introducing a new "BlockedByExistingKataPods" reason for _not_ being in-progress. The rest of the migration is pretty straightforward - setting the legacy IsInProgress boolean to True corresponds to setInProgressConditionToUninstalling(), setting it to False corresponds to resetInProgressCondition(). One place that set the boolean to True was spurious (it was set to True already) so it's removed without replacement. Signed-off-by: Pavel Mores <pmores@redhat.com>
While the idea is simple and the same as with the uninstallation flow in the previous commit, installation actually did use InstallationStatus.IsInProgress in its implementation so the replacement is a bit more tricky. It's solved by observing that the legacy field was only set if the MCO was updating "kata-oc" and/or "worker". We can do the same for a newly introduced isInstallationInProgress variable whose semantics are basically the same as InstallationStatus.IsInProgress', just without persisting it in the KataConfig resource. Signed-off-by: Pavel Mores <pmores@redhat.com>
Looks good with kata/peer-pods workloads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
Signed-off-by: Pavel Mores <pmores@redhat.com>
The "is in progress" check at the top of updateStatus() is now simplified, kata-enabled node count can now be assigned directly to Status.KataNodes.NodeCount. Updating the legacy structures is just turned off and proper error handling is added to the putNodeOnStatusList() call as this function now bears the brunt of updateStatus(). Signed-off-by: Pavel Mores <pmores@redhat.com>
They are not used by anything anymore. Signed-off-by: Pavel Mores <pmores@redhat.com>
8e75a72
to
6a0a6de
Compare
Force-push to add a commit switching envtest to the new KataConfig.status (thanks @tbuskey for pointing this out) |
} | ||
e := r.putNodeOnStatusList(&node) | ||
if e != nil { | ||
err = e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of multiple errors, we're keeping only the last one right?
Can we at least log the error here before we potentially lose it?
Or as an alternative: if we meet an error, is it useful to continue the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a comment at the top of the functions makes that point.
As for error handling, yes, my idea was that it's still useful to continue. The way I see it, if there's an error our status will always end up imprecise for the moment but it would be even more imprecise if we broke off on the first error. That said, I'm not sure at the moment how rigorous we should be here, I'm guessing some practical experience will help. IIRC the legacy status update behaved the same way and I don't remember any practical problems with that.
Errors that can come up here will likely tend to be transient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed the comment at the top.
Thanks for the answer. Makes sense
@@ -85,9 +70,9 @@ type KataConfigStatus struct { | |||
// +kubebuilder:object:root=true | |||
// +kubebuilder:subresource:status | |||
// +kubebuilder:resource:path=kataconfigs,scope=Cluster | |||
// +kubebuilder:printcolumn:name="InProgress",type=string,JSONPath=".status.installationStatus.IsInProgress",description="Status of Kata runtime installation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not mark these fields as deprecated and leave them in? We don't know what tools people have written that rely on these fields, i.e. the .status section to me is part of the API.
We could even add logging for the case that any of the fields are non-empty (unlikely that another software has the rights to update it, but possible). After a while we can bump the version and remove the fields. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's generally the correct way to handle this.
Admittedly I'd still tend to remove the fields now. I suspect there's little automation in use that relies on the old status. The combined status with new and old parts is fairly confusing. Migration from old to new status is fairly easy and mechanical (migrations so far were apparently easy and without problems), the migration guide is included in the #330 cover letter and it's been a couple of weeks since the need to migrate has been announced.
If folks feel more comfortable taking the slow path I wouldn't block it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can change our automation to check both.
-o=jsonpath='{.status.conditions[?(@.type=="InProgress")].status}'
is the new one
-o=jsonpath={.status.installationStatus.IsInProgress}{.status.unInstallationStatus.inProgress.status}
is the current one
As long as ToLower()
!= false, I know things are not in progress
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbuskey I think that checking the new one should be enough by now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the delay. I think we don't know who's using it and in what way (we were surprised a few times already. We may break users that we are not even aware of. To be honest I think we should take the slow path. What do others think? @bpradipt @snir911 @cpmeadors ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm considering these aspects when providing my view on this PR.
We are adding/improving new features - peer-pods, CoCo which will increase the adoption and thereby make it relativey more harder to make this sort of a change.
Further, this change will only be available in newer versions, and is only affecting the install/uninstall of the operator with no impact to the workloads. Aso it'll be documented so that users relying on earlier status field can make suitable changes to use the newer fields.
Given the above aspects I'm in favour of getting this merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmores I don't have any comments on the code.
lgtm on the code front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a more general note, although it seems in this case that we can opt to have both the old and new stuff running alongside, this doesn't seem to be true in general. Consider the upcoming change of KataConfig.status.runtimeClass
to an array to support runtimeclasses other than plain kata. How would we go about it, add the array as runtimeClassNew
first and then rename it back to runtimeClass
? What about users who'd ignore the runtimeClassNew
addition and continued to use the single-value runtimeClass
?
I'm afraid there's no avoiding breaking users who don't pay attention, unfortunately. For users who do, they've had a month now to adapt to the change.
Also, I think we should be pragmatic. As @bpradipt says, changes will be harder when we have a lot of users. If we know already we need to change something, we should consider doing it as fast as possible since it will only get harder in the future I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QE automation has already been changed to look for .status.conditions and will use it instead if it exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @pmores !
I was able to unsubscribe & resubscribe while running a kata workload
The kataNodes statuses should be 3:
Uninstalling kataconfig did not update the node counts, but the |
When running with
Deleting kataconfig puts the |
This PR removes parts of
KataConfig.status
obsoleted by #330 (and #329 before it),totalNodesCount
,installationStatus
,unInstallationStatus
(and the stub ofupgradeStatus
).