Merge to main for 1.5 #357
This fixes an omission in PR openshift#300 where the special case of KataConfig deletion while there are no kata nodes on a cluster wasn't handled properly and uninstallation got stuck. This happened because the uninstallation flow incorrectly assumed that the MCO would always need to reconcile, which is not the case. The installation flow (PR openshift#291) got this right, and while the idea of PR openshift#300 was basically to make the uninstallation flow analogous to installation, this aspect was omitted by mistake. Signed-off-by: Pavel Mores <pmores@redhat.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Merge main into devel after v1.4.0 release
image-job: fix payload url
The DaemonSet launched by the controller doesn't put tolerations on the monitor pods. This prevents the monitor pods from running on tainted nodes and breaks metrics for sandboxed containers. The monitor pods must run on every node where kata is deployed, no matter what taints the user might have set. Enforce a toleration that matches all possible taints. Fixes https://issues.redhat.com/browse/KATA-2121 Signed-off-by: Greg Kurz <groug@kaod.org>
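The "matches all possible taints" toleration described here is, in Kubernetes terms, a toleration with operator `Exists`, an empty key, and an empty effect. A minimal self-contained sketch of the matching rule, using simplified local stand-ins for the `corev1.Taint`/`corev1.Toleration` types (names and fields here are illustrative, not the operator's actual code):

```go
package main

import "fmt"

// Simplified stand-ins for corev1.Taint and corev1.Toleration.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

type Toleration struct {
	Key      string // empty key with Operator "Exists" matches all taint keys
	Operator string // "Exists" or "Equal"
	Effect   string // empty effect matches all effects
}

// tolerates reports whether the toleration matches the taint, following
// the Kubernetes matching rules for the "Exists" operator.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Operator == "Exists" {
		return tol.Key == "" || tol.Key == taint.Key
	}
	return false // "Equal" handling omitted in this sketch
}

func main() {
	// The catch-all toleration the monitor DaemonSet needs.
	catchAll := Toleration{Operator: "Exists"}
	fmt.Println(tolerates(catchAll, Taint{Key: "node-role.kubernetes.io/master", Effect: "NoSchedule"}))
}
```

In pod-spec terms this corresponds to a single `tolerations` entry with only `operator: Exists` set.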
This is a simpler and more transparent alternative to getMcp(). It will be useful in a subsequent commit. Signed-off-by: Pavel Mores <pmores@redhat.com>
If the MCO considers a Node Done, we check whether its current and target MachineConfigs are the same. If they are, the Node is actually done and we append it to either the installation-completed or the uninstallation-completed list, based on whether installation or uninstallation is in progress and whether the Node is a member of "worker" or "kata-oc". If they aren't, it means the Node is waiting for its MC to be changed as part of the ongoing installation or uninstallation. We don't append such Nodes to any list, since there doesn't seem to be a good place for them in the current KataConfig.status structure. Signed-off-by: Pavel Mores <pmores@redhat.com>
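The "Done and current == target" check described above can be sketched against the node annotations the Machine Config Operator maintains. This is a self-contained illustration, not the operator's actual code; the annotation keys are the standard MCO ones, but treat the exact logic as an assumption:

```go
package main

import "fmt"

// Node annotation keys maintained by the Machine Config Operator.
const (
	currentConfigAnnotation = "machineconfiguration.openshift.io/currentConfig"
	desiredConfigAnnotation = "machineconfiguration.openshift.io/desiredConfig"
	stateAnnotation         = "machineconfiguration.openshift.io/state"
)

// isNodeFullyUpdated reports whether the MCO considers the node Done
// *and* its current MachineConfig already matches the desired one.
// Only then is the node really finished; a "Done" node whose desired
// config differs is still waiting for the ongoing (un)installation.
func isNodeFullyUpdated(annotations map[string]string) bool {
	if annotations[stateAnnotation] != "Done" {
		return false
	}
	current, desired := annotations[currentConfigAnnotation], annotations[desiredConfigAnnotation]
	return current != "" && current == desired
}

func main() {
	fmt.Println(isNodeFullyUpdated(map[string]string{
		stateAnnotation:         "Done",
		currentConfigAnnotation: "rendered-kata-oc-abc123",
		desiredConfigAnnotation: "rendered-kata-oc-abc123",
	}))
}
```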
Working and Degraded Nodes are structurally similar to handle, so the same function handles both. Working Nodes are put on an InProgress list, Degraded Nodes on a Failed list. If uninstallation is in progress, Nodes go to the uninstallation lists. If installation is in progress, they go to the installation lists on converged clusters or if they are labeled for kata; otherwise they go to the uninstallation lists (this handles the case where kata is removed from a Node as part of a kata installation modification on a cluster). Signed-off-by: Pavel Mores <pmores@redhat.com>
This puts together the functionality added by the previous commits. The approach to KataConfig.status updating introduced here fundamentally differs from the existing updateStatus() function. The existing function only handled kata installation to and uninstallation from a cluster, with no support for modifications (adding/removing kata-enabled Nodes) in the meantime. Thus it made sense to split status updating at the top level into two largely independent branches, one for installation and the other for uninstallation, each of which then iterated over Nodes and updated status based on the Nodes' MCO 'state' and other information. This model, however, breaks when kata installation modifications are introduced, since they mean that kata may actually be uninstalled from Nodes as part of an operation that is nominally an installation. The approach introduced here essentially turns the existing one inside out. It iterates over *Nodes* at the top level and only then figures out, for each Node individually, whether kata is being installed on it, uninstalled from it, or whether the Node is left alone by the ongoing operation. Signed-off-by: Pavel Mores <pmores@redhat.com>
Any errors from the new status updating are just logged; status updating thus cannot mess up the installation/uninstallation flow anymore. This is appropriate, as status updating now does just that, update status, so any failures should not divert the main flow. This also removes the last opportunity for races like the recent uninstallation wedging, caused by updateStatus() failing to retrieve the "kata-oc" MCP which had already been deleted by the uninstallation flow, thus forcing the uninstallation flow to return early with an error and never be able to finish. Signed-off-by: Pavel Mores <pmores@redhat.com>
With removal of the legacy updateStatus(), updateStatusNew() can be renamed to the original name. With removal of the legacy updateStatus() infrastructure getMcp() is left with no callers and can be removed. With removal of getMcp() the last remaining caller of getMcpNameIfMcpExists() is removed so this function can finally be removed too. This concludes a refactor started in commit 3adfbd5. Signed-off-by: Pavel Mores <pmores@redhat.com>
All predicates take the same arguments describing a node and figure out if kata on the node is Installed, Installing, WaitingToInstall, FailedToInstall, NotInstalled, Uninstalling, WaitingToUninstall or FailedToUninstall. Signed-off-by: Pavel Mores <pmores@redhat.com>
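A condensed, hypothetical form of these predicates can be sketched as a single classification function over the same per-node inputs. The input names and the exact decision order below are assumptions for illustration; the real code exposes one predicate per status:

```go
package main

import "fmt"

// classifyNode is a hypothetical condensed form of the per-node status
// predicates. All predicates take the same inputs describing a node:
//   wantsKata -- the node is selected/labeled for kata
//   hasKata   -- kata is currently installed on the node
//   working   -- the MCO is actively updating the node
//   degraded  -- the MCO reports the node as degraded
func classifyNode(wantsKata, hasKata, working, degraded bool) string {
	switch {
	case wantsKata && degraded:
		return "FailedToInstall"
	case wantsKata && working:
		return "Installing"
	case wantsKata && hasKata:
		return "Installed"
	case wantsKata:
		return "WaitingToInstall"
	case degraded:
		return "FailedToUninstall"
	case working:
		return "Uninstalling"
	case hasKata:
		return "WaitingToUninstall"
	default:
		return "NotInstalled"
	}
}

func main() {
	fmt.Println(classifyNode(true, true, false, false))  // a settled kata node
	fmt.Println(classifyNode(false, true, true, false))  // kata being removed
}
```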
This is a preparation for the next commit where this functionality will be needed. Signed-off-by: Pavel Mores <pmores@redhat.com>
The first part computes inputs for the kata status predicates, the rest runs the predicates. For now, the result of classification is just logged. Signed-off-by: Pavel Mores <pmores@redhat.com>
A list of nodes is added to KataConfig.status for each of the Installed, Installing, WaitingToInstall, FailedToInstall, Uninstalling, WaitingToUninstall and FailedToUninstall kata installation statuses. There's no list for NotInstalled as that's left implicit (the NotInstalled status is still logged, though). putNodeOnStatusList() is also changed to actually fill the lists. Signed-off-by: Pavel Mores <pmores@redhat.com>
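A plausible shape for these lists and for putNodeOnStatusList() looks like the following. This is a self-contained sketch under the assumptions stated in the commit message (field names and the string-keyed dispatch are hypothetical):

```go
package main

import "fmt"

// Hypothetical shape of the per-status node lists added to
// KataConfig.status. There is deliberately no NotInstalled list --
// that status is left implicit.
type KataNodesStatus struct {
	Installed          []string
	Installing         []string
	WaitingToInstall   []string
	FailedToInstall    []string
	Uninstalling       []string
	WaitingToUninstall []string
	FailedToUninstall  []string
}

// putNodeOnStatusList appends a node name to the list matching its
// classified status; NotInstalled nodes are only logged, never listed.
func (s *KataNodesStatus) putNodeOnStatusList(node, status string) {
	switch status {
	case "Installed":
		s.Installed = append(s.Installed, node)
	case "Installing":
		s.Installing = append(s.Installing, node)
	case "WaitingToInstall":
		s.WaitingToInstall = append(s.WaitingToInstall, node)
	case "FailedToInstall":
		s.FailedToInstall = append(s.FailedToInstall, node)
	case "Uninstalling":
		s.Uninstalling = append(s.Uninstalling, node)
	case "WaitingToUninstall":
		s.WaitingToUninstall = append(s.WaitingToUninstall, node)
	case "FailedToUninstall":
		s.FailedToUninstall = append(s.FailedToUninstall, node)
	}
}

func main() {
	var s KataNodesStatus
	s.putNodeOnStatusList("worker-0", "Installed")
	s.putNodeOnStatusList("worker-1", "Installing")
	fmt.Println(len(s.Installed), len(s.Installing))
}
```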
This is basically a counterpart of putNodeOnStatusList(), to be called before the lists can start being populated by putNodeOnStatusList(). Signed-off-by: Pavel Mores <pmores@redhat.com>
The new functionality runs alongside the existing status reporting for the time being. Signed-off-by: Pavel Mores <pmores@redhat.com>
Although 1.4.0 was released about a month ago, there are still some mentions of 1.3.x in the tree:

```
$ git grep -e '1\.3\.[0123]' -- ':!go.*'
config/manager/kustomization.yaml:  newTag: 1.3.1
config/samples/deploy.yaml:    image: quay.io/openshift_sandboxed_containers/openshift-sandboxed-containers-operator-catalog:v1.3.0
config/samples/deploy.yaml:    startingCSV: sandboxed-containers-operator.v1.3.0
```

This is essentially because we don't do automated upstream builds. Let's bump the versions to the latest public release: 1.4.0. Signed-off-by: Greg Kurz <groug@kaod.org>
The data structures and semantics largely correspond to recommendations in k8s api-conventions document. The handling added is just what will be needed, with no attempt to implement a complete set of conventional list operations (i.e. no remove). Signed-off-by: Pavel Mores <pmores@redhat.com>
A getter for InProgress Condition status is added along with a bunch of setters to ease setting the Condition to various states. Signed-off-by: Pavel Mores <pmores@redhat.com>
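The getter/setter pattern described here can be sketched with a local stand-in for `metav1.Condition` (the real code would use the Kubernetes `meta` helpers; everything below, including the function names, is an illustrative assumption):

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified local stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	LastTransitionTime time.Time
}

type KataConfigStatus struct {
	Conditions []Condition
}

// getInProgressConditionValue returns the status of the InProgress
// Condition, or "Unknown" if the Condition hasn't been set yet.
func (s *KataConfigStatus) getInProgressConditionValue() string {
	for _, c := range s.Conditions {
		if c.Type == "InProgress" {
			return c.Status
		}
	}
	return "Unknown"
}

// setInProgressCondition updates the InProgress Condition in place, or
// appends it if missing; LastTransitionTime only changes when the
// status value actually flips.
func (s *KataConfigStatus) setInProgressCondition(status, reason string) {
	for i := range s.Conditions {
		if s.Conditions[i].Type == "InProgress" {
			if s.Conditions[i].Status != status {
				s.Conditions[i].LastTransitionTime = time.Now()
			}
			s.Conditions[i].Status = status
			s.Conditions[i].Reason = reason
			return
		}
	}
	s.Conditions = append(s.Conditions, Condition{
		Type: "InProgress", Status: status, Reason: reason,
		LastTransitionTime: time.Now(),
	})
}

func main() {
	var st KataConfigStatus
	st.setInProgressCondition("True", "Installing")
	fmt.Println(st.getInProgressConditionValue())
}
```

The "bunch of setters" in the commit would then be thin wrappers such as a hypothetical `setInProgressConditionInstalling()` calling `setInProgressCondition("True", "Installing")`.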
In uninstallation the InProgress Condition handling corresponds to the legacy UnInstallationStatus.InProgress.IsInProgress boolean handling pretty much one-to-one. In installation the handling diverges slightly since InProgress offers a bit more granularity, making a difference between an initial installation and a later update, so the handling has to be a tiny bit different. Resetting however again corresponds to the legacy InstallationStatus.IsInProgress boolean handling. The Failed state is detected and reported from KataConfig.status updating code as before. Signed-off-by: Pavel Mores <pmores@redhat.com>
This effectively moves the legacy KataConfig.status.totalNodesCount field to a more suitable place created by addition of KataNodesStatus. Signed-off-by: Pavel Mores <pmores@redhat.com>
Bump outdated versions of OSC components
This reverts commit 489e6a0.
The config file name is configuration-remote.toml. Hence the kata-remote.conf file is renamed to configuration.toml. Also, the butane config used to create the machineconfig YAML that applies the Kata configuration is added. Example butane run to create the machineconfig:

```
butane -d ./local mc-remote-butane.conf > mc-40-kata-remote-config.yaml
```

Fixes: #KATA-2268 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
This updates the machineconfig to apply the new Kata configuration, which enables the following annotations:

- default_vcpus
- default_memory
- machine_type

Fixes: #KATA-2268 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
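In the kata runtime's configuration file, annotation pass-through is gated by the `enable_annotations` list in the hypervisor section. A fragment along these lines would allow the three annotations above (the section name shown is an assumption; it depends on the hypervisor in use):

```toml
# Hypothetical fragment of the kata configuration.toml applied by the
# machineconfig; the exact [hypervisor.*] section depends on the setup.
[hypervisor.remote]
enable_annotations = ["default_vcpus", "default_memory", "machine_type"]
```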
Update machineconfig to enable required annotations for flexible instance types
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
and update the peer-pods-cm ConfigMap with the created image value. Also sets BOOT_FIPS=true if the controller is running on a FIPS-enabled system. Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
caa repo as it's too fragile Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
podvm: run image jobs from the controller automatically
…ogress
When KataConfig deletion processing is allowed to start while a previous installation or update is still running, both processes will interfere with each other, resulting in a variety of possible cluster states, most of them undesirable (most often, uninstallation blocks indefinitely). This commit alleviates the situation by not allowing uninstallation to start while an installation or update is still underway. Once the current operation finishes, uninstallation should start immediately. Signed-off-by: Pavel Mores <pmores@redhat.com>
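The guard described above boils down to a small decision at the top of the reconcile loop: if deletion was requested but an installation or update is still in progress, don't start uninstalling, just requeue. A self-contained sketch (names are hypothetical; in the real controller the in-progress check would read the KataConfig InProgress condition):

```go
package main

import "fmt"

// shouldStartUninstall decides whether uninstallation may begin.
// It returns (start, requeue): start is true only once no install or
// update is running; requeue asks the reconciler to retry later so
// that uninstallation starts immediately after the current operation.
func shouldStartUninstall(deletionRequested, installOrUpdateInProgress bool) (start bool, requeue bool) {
	if !deletionRequested {
		return false, false
	}
	if installOrUpdateInProgress {
		// Don't interleave uninstallation with the running operation.
		return false, true
	}
	return true, false
}

func main() {
	start, requeue := shouldStartUninstall(true, true)
	fmt.Println(start, requeue)
}
```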
We usually fetch the cloud-provider name either from the peer-pods CM CLOUD_PROVIDER value or from the infra object. As the cloud provider is not expected to change at runtime, this commit moves the fetching into the image-generator initialization. Additionally, the cloud provider is now fetched solely from the infra object to avoid relying on user input (from the peer-pods CM). Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
In the peer-pods ConfigMap or Secret, it's assumed that essential values will not be modified once they are set and the KataConfig has been created. Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
peer-pods: validate CM and Secret are set
As a preparatory step for the upcoming release, this bumps the version to 1.5.0 in Makefile, the CSV and images. Note that the updated images aren't available at this stage. They will be added later. Until then the image links aren't valid. Signed-off-by: Greg Kurz <groug@kaod.org>
Bump OSC to 1.5.0
Skipping CI for Draft Pull Request.
lgtm thanks @gkurz !
/test
@gkurz: The following commands are available to trigger optional jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/override ci/prow/check
lgtm
Thanks @gkurz !
@gkurz: Overrode contexts on behalf of gkurz: ci/prow/check
@gkurz: The following test failed.
Executed the git diff and tests, lgtm.
- Description of the problem which is fixed/What is the use case
This is merging the devel branch into main in order to prepare the v1.5.0 release.
- What I did
`merge-to-main-for-1.5` (see below for justification)
- How to verify it
`git diff ${clone_of_this_PR} devel` should be empty.
- Description for the changelog
Goal of this PR is to merge `devel` into `main`. This required some care, as a plain `git merge devel` was failing to begin with. This was the consequence of `git merge` not being able to correlate the cherry-picks in #339 with the original commits on `devel`. This was worked around by reverting #339 (backport of #327) and proceeding with the merge, which brought back the original changes from #327.