Merge to main for 1.5 #357

gkurz · 2023-11-14T12:29:20Z

- Description of the problem which is fixed/What is the use case

This is merging the devel branch into main in order to prepare the v1.5.0 release.

- What I did

Cloned the main branch locally as merge-to-main-for-1.5
Reverted Fix kataconfig status handling to support installation updates (port to main) #339 from merge-to-main-for-1.5 (see below for justification)
Merged the devel branch into merge-to-main-for-1.5

- How to verify it

git diff ${clone_of_this_PR} devel should be empty.
Run QE tests

- Description for the changelog

The goal of this PR is to merge devel into main. This required some care because a plain git merge devel failed from the start: git merge could not correlate the cherry-picks in #339 with the original commits on devel. This was worked around by reverting #339 (backport of #327) and then proceeding with the merge, which brought back the original changes from #327.

This fixes an omission in PR openshift#300 where the special case of KataConfig deletion while there are no kata nodes on a cluster wasn't handled properly and uninstallation got stuck. This happened because the uninstallation flow incorrectly assumed that it would always be necessary for the MCO to reconcile, which is not the case. The installation flow (PR openshift#291) got this right, and while the idea of PR openshift#300 was basically to make the uninstallation flow analogous to installation, this aspect was omitted by mistake. Signed-off-by: Pavel Mores <pmores@redhat.com>

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Merge main into devel after v1.4.0 release

image-job: fix payload url

The DaemonSet launched by the controller doesn't put tolerations on the monitor pods. This prevents the monitor pods from running on tainted nodes and breaks metrics for sandboxed containers. The monitor pods must run on every node where kata is deployed, regardless of any taint the user might have set. Enforce a toleration that matches all possible taints. Fixes https://issues.redhat.com/browse/KATA-2121 Signed-off-by: Greg Kurz <groug@kaod.org>
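As a rough illustration of why a single toleration can match every taint, the sketch below re-implements the documented Kubernetes matching rules on simplified local types. `Taint`, `Toleration`, `tolerates` and `matchAll` are hypothetical stand-ins written for this example, not the operator's code or the real `k8s.io/api/core/v1` types.

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// Taint and Toleration types of the Kubernetes core/v1 API.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

type Toleration struct {
	Key      string
	Operator string // "Exists" or "Equal" (empty means "Equal")
	Value    string
	Effect   string
}

// tolerates follows the documented matching rules: an empty Effect matches
// every effect, and an empty Key with the Exists operator matches every key
// and value.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

// matchAll is the catch-all toleration: no key, no effect, operator Exists.
var matchAll = Toleration{Operator: "Exists"}

func main() {
	taints := []Taint{
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
		{Key: "node.kubernetes.io/unreachable", Effect: "NoExecute"},
	}
	for _, t := range taints {
		fmt.Printf("%s tolerated: %v\n", t.Key, tolerates(matchAll, t))
	}
}
```

In a real DaemonSet pod spec the equivalent would presumably be a single toleration entry with only the operator set to Exists.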

This is a simpler and more transparent alternative to getMcp(). It will be useful in a subsequent commit. Signed-off-by: Pavel Mores <pmores@redhat.com>

If the MCO considers a Node Done, we check if its current and target MachineConfigs are the same. If they are then the Node is actually done and we append it either to the installation completed list or uninstallation completed list based on whether installation or uninstallation is in progress and whether the Node is a member of "worker" or "kata-oc". If they aren't it means that the Node is waiting for its MC to be changed as part of this ongoing installation or uninstallation. We don't append such Nodes to any list since there doesn't seem to be a good place for them in the current KataConfig.status structure. Signed-off-by: Pavel Mores <pmores@redhat.com>

Working and Degraded Nodes are handled structurally similarly so they are handled by the same function. Working Nodes are put on an InProgress list, Degraded Nodes are put on a Failed list. If uninstallation is in progress Nodes go to uninstallation lists. If installation is in progress they go to installation lists on converged clusters or if they are labeled for kata, otherwise they go to uninstallation lists (this handles the case when kata is removed from a Node as part of kata installation modification on a cluster). Signed-off-by: Pavel Mores <pmores@redhat.com>
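The placement rules for Working and Degraded Nodes can be sketched as a small decision function. This is only an illustration under assumptions: the `nodeInfo` and `clusterState` types, the `statusListFor` function and the list names it returns are hypothetical and are not taken from the controller.

```go
package main

import "fmt"

// nodeInfo holds the hypothetical per-Node inputs the rules need.
type nodeInfo struct {
	Name           string
	Degraded       bool // false means the MCO reports the Node as Working
	LabeledForKata bool
}

// clusterState describes the ongoing operation (hypothetical names).
type clusterState struct {
	Uninstalling bool // false means an installation is in progress
	Converged    bool
}

// statusListFor returns the (made-up) name of the list that a Working or
// Degraded Node would be put on under the rules described above.
func statusListFor(n nodeInfo, c clusterState) string {
	// A Node goes to the installation lists only while an installation is in
	// progress, and only on converged clusters or if it is labeled for kata.
	installing := !c.Uninstalling && (c.Converged || n.LabeledForKata)

	switch {
	case installing && n.Degraded:
		return "installation.failed"
	case installing:
		return "installation.inProgress"
	case n.Degraded:
		return "uninstallation.failed"
	default:
		return "uninstallation.inProgress"
	}
}

func main() {
	c := clusterState{Converged: false}
	nodes := []nodeInfo{
		{Name: "worker-0", LabeledForKata: true},
		{Name: "worker-1", Degraded: true},
	}
	for _, n := range nodes {
		fmt.Printf("%s -> %s\n", n.Name, statusListFor(n, c))
	}
}
```

The last two branches cover the case mentioned above where kata is removed from a Node while the overall operation is nominally an installation.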

This puts together functionality added by the previous commits. The approach to KataConfig.status updating introduced here fundamentally differs from the existing updateStatus() function. The existing function only handled kata installation to and uninstallation from a cluster with no support for modifications (adding/removing kata-enabled Nodes) in the meantime. Thus it made sense to split status updating at the top level into two largely independent branches, one for installation and the other for uninstallation, each of which then iterated over Nodes and updated status based on the Nodes' MCO 'state' and other information. This model however breaks when kata installation modifications are introduced since they mean that kata may actually be uninstalled from Nodes as part of an operation that is nominally an installation. The approach introduced here kind of turns the existing one inside out. It iterates over *Nodes* at the top level and only then it figures out for each Node individually whether kata is being installed on it, uninstalled from it, or if the Node is left alone by the ongoing operation. Signed-off-by: Pavel Mores <pmores@redhat.com>

Any errors from the new status updating are just logged, so status updating can no longer disrupt the installation/uninstallation flow. This is appropriate because status updating now does just that, update status, so any failures should not divert the main flow. This also removes the last opportunity for races like the recent uninstallation wedging, where updateStatus() failed to retrieve the "kata-oc" MCP that had already been deleted by the uninstallation flow, forcing the uninstallation flow to return early with an error and never be able to finish. Signed-off-by: Pavel Mores <pmores@redhat.com>
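The log-and-continue pattern described above might look roughly like the following. `updateStatus` here is a hypothetical stub that always fails, and `processUninstall` is an invented name for a step of the main flow; neither reflects the controller's real signatures.

```go
package main

import (
	"errors"
	"log"
)

// updateStatus is a hypothetical stub for the status updating code. It always
// fails here, to mimic the "kata-oc" MCP having already been deleted.
func updateStatus() error {
	return errors.New(`MachineConfigPool "kata-oc" not found`)
}

// processUninstall sketches one step of the main flow. A status update failure
// is only logged, so it can never make the step return early with an error.
func processUninstall(update func() error) (done bool, err error) {
	if uerr := update(); uerr != nil {
		log.Printf("status update failed (ignored): %v", uerr)
	}
	// ... the actual uninstallation work would continue here ...
	return true, nil
}

func main() {
	done, err := processUninstall(updateStatus)
	log.Printf("done=%v err=%v", done, err)
}
```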

With removal of the legacy updateStatus(), updateStatusNew() can be renamed to the original name. With removal of the legacy updateStatus() infrastructure getMcp() is left with no callers and can be removed. With removal of getMcp() the last remaining caller of getMcpNameIfMcpExists() is removed so this function can finally be removed too. This concludes a refactor started in commit 3adfbd5. Signed-off-by: Pavel Mores <pmores@redhat.com>

All predicates take the same arguments describing a node and figure out if kata on the node is Installed, Installing, WaitingToInstall, FailedToInstall, NotInstalled, Uninstalling, WaitingToUninstall or FailedToUninstall. Signed-off-by: Pavel Mores <pmores@redhat.com>

This is a preparation for the next commit where this functionality will be needed. Signed-off-by: Pavel Mores <pmores@redhat.com>

The first part computes inputs for the kata status predicates, the rest runs the predicates. For now, the result of classification is just logged. Signed-off-by: Pavel Mores <pmores@redhat.com>

A list of nodes is added to KataConfig.status for each of the Installed, Installing, WaitingToInstall, FailedToInstall, Uninstalling, WaitingToUninstall and FailedToUninstall kata installation statuses. There's no list for NotInstalled as that's left implicit (the NotInstalled status is still logged though). putNodeOnStatusList() is also changed to actually fill the lists. Signed-off-by: Pavel Mores <pmores@redhat.com>

This is basically a counterpart of putNodeOnStatusList(), to be called before the lists can start being populated by putNodeOnStatusList(). Signed-off-by: Pavel Mores <pmores@redhat.com>
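A minimal sketch of how the clear-then-populate pairing could work is shown below. Only the name putNodeOnStatusList() comes from the commit messages; the `kataStatus` enum, the `nodeLists` type and the `clear`/`put` methods are assumptions made for this example.

```go
package main

import "fmt"

// kataStatus enumerates the per-node statuses named in the commit messages.
type kataStatus int

const (
	NotInstalled kataStatus = iota
	Installed
	Installing
	WaitingToInstall
	FailedToInstall
	Uninstalling
	WaitingToUninstall
	FailedToUninstall
)

// nodeLists is a hypothetical stand-in for the per-status node lists kept in
// KataConfig.status.
type nodeLists map[kataStatus][]string

// clear empties every list; it is meant to run before a fresh pass over Nodes.
func (l nodeLists) clear() {
	for s := range l {
		delete(l, s)
	}
}

// put appends a node to the list for its status. NotInstalled has no list
// because that status is left implicit.
func (l nodeLists) put(node string, s kataStatus) {
	if s == NotInstalled {
		return
	}
	l[s] = append(l[s], node)
}

func main() {
	lists := nodeLists{}
	lists.clear() // start every status pass from empty lists
	lists.put("worker-0", Installed)
	lists.put("worker-1", Installing)
	lists.put("worker-2", NotInstalled)
	fmt.Println(len(lists[Installed]), len(lists[Installing]), len(lists))
}
```

Clearing first keeps a node from lingering on a stale list when its status changes between two passes.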

The new functionality runs alongside of the existing status reporting for the time being. Signed-off-by: Pavel Mores <pmores@redhat.com>

Although 1.4.0 was released about a month ago, there are still some mentions of 1.3.x in the tree:

    $ git grep -e '1\.3\.[0123]' -- ':!go.*'
    config/manager/kustomization.yaml: newTag: 1.3.1
    config/samples/deploy.yaml: image: quay.io/openshift_sandboxed_containers/openshift-sandboxed-containers-operator-catalog:v1.3.0
    config/samples/deploy.yaml: startingCSV: sandboxed-containers-operator.v1.3.0

This is essentially because we don't do automated upstream builds. Let's bump the versions to the latest public release: 1.4.0. Signed-off-by: Greg Kurz <groug@kaod.org>

The data structures and semantics largely correspond to recommendations in k8s api-conventions document. The handling added is just what will be needed, with no attempt to implement a complete set of conventional list operations (i.e. no remove). Signed-off-by: Pavel Mores <pmores@redhat.com>
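For readers unfamiliar with the convention, a condition list with add-or-update semantics (and no remove) can be sketched as follows. The `Condition` type and the `findCondition`/`setCondition` helpers are hypothetical simplifications, assuming the usual rule that the last transition time changes only when the status value changes.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified, hypothetical version of a Kubernetes-style
// status condition.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// findCondition returns a pointer to the condition of the given type, or nil.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// setCondition adds the condition or updates the existing one of the same
// type. LastTransitionTime is touched only when Status actually changes.
func setCondition(conds []Condition, c Condition, now time.Time) []Condition {
	existing := findCondition(conds, c.Type)
	if existing == nil {
		c.LastTransitionTime = now
		return append(conds, c)
	}
	if existing.Status != c.Status {
		existing.Status = c.Status
		existing.LastTransitionTime = now
	}
	existing.Reason = c.Reason
	existing.Message = c.Message
	return conds
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "InProgress", Status: "True", Reason: "Installing"}, time.Now())
	conds = setCondition(conds, Condition{Type: "InProgress", Status: "False", Reason: "Installed"}, time.Now())
	fmt.Println(len(conds), conds[0].Status, conds[0].Reason)
}
```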

A getter for InProgress Condition status is added along with a bunch of setters to ease setting the Condition to various states. Signed-off-by: Pavel Mores <pmores@redhat.com>

In uninstallation the InProgress Condition handling corresponds to the legacy UnInstallationStatus.InProgress.IsInProgress boolean handling pretty much one-to-one. In installation the handling diverges slightly since InProgress offers a bit more granularity, making a difference between an initial installation and a later update, so the handling has to be a tiny bit different. Resetting however again corresponds to the legacy InstallationStatus.IsInProgress boolean handling. The Failed state is detected and reported from KataConfig.status updating code as before. Signed-off-by: Pavel Mores <pmores@redhat.com>

This effectively moves the legacy KataConfig.status.totalNodesCount field to a more suitable place created by addition of KataNodesStatus. Signed-off-by: Pavel Mores <pmores@redhat.com>

Bump outdated versions of OSC components

This reverts commit 489e6a0.

The config file name is configuration-remote.toml. Hence renaming kata-remote.conf file to configuration.toml. Also the butane config used to create machineconfig yaml to apply the Kata configuration is added.

Example butane run to create the machineconfig:
```
butane -d ./local mc-remote-butane.conf > mc-40-kata-remote-config.yaml
```
Fixes: #KATA-2268
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>

This updates the machineconfig to apply the new Kata configuration which enables the following annotations:
- default_vcpus
- default_memory
- machine_type

Fixes: #KATA-2268
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>

Update machineconfig to enable required annotations for flexible instance types

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

and update the peer-pods-cm ConfigMap with the created image value. Also sets BOOT_FIPS=true if the controller is running on a FIPS-enabled system. Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

caa repo as it's too fragile Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

podvm: run image jobs from the controller automatically

…ogress When KataConfig deletion processing is allowed to start while a previous installation or update is still running both processes will interfere with each other, resulting in a variety of possible cluster states, most of them undesirable (most often, uninstallation blocks indefinitely). This commit alleviates the situation by not allowing uninstallation to start while installation or an update is still underway. Once the current operation finishes uninstallation should start immediately. Signed-off-by: Pavel Mores <pmores@redhat.com>
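The gating described here boils down to a small decision, sketched below. The `action` type and the `nextAction` function are hypothetical names for illustration; how the controller actually detects an ongoing operation and requeues is not shown in this commit message.

```go
package main

import "fmt"

// action is a hypothetical description of what the reconcile loop does next.
type action string

const (
	actionContinue  action = "continue current operation"
	actionWait      action = "requeue and wait"
	actionUninstall action = "start uninstallation"
)

// nextAction defers uninstallation while an installation or update is still
// underway, and starts it as soon as nothing else is in progress.
func nextAction(deletionRequested, inProgress bool) action {
	switch {
	case !deletionRequested:
		return actionContinue
	case inProgress:
		return actionWait
	default:
		return actionUninstall
	}
}

func main() {
	fmt.Println(nextAction(true, true))
	fmt.Println(nextAction(true, false))
}
```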

We usually fetch the cloud-provider name from either the peer-pods CM CLOUD_PROVIDER value or from the infra object. As the cloud provider is not expected to change at runtime, this commit moves its fetching into the image-generator initialization. Additionally, the cloud provider is now fetched solely from the infra object to avoid relying on user input (from the peer-pods CM). Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

in peer-pods ConfigMap or Secret. It's assumed that essential values will not be modified once they are set and the kataConfig was created. Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

peer-pods: validate CM and Secret are set

As a preparatory step for the upcoming release, this bumps the version to 1.5.0 in Makefile, the CSV and images. Note that the updated images aren't available at this stage. They will be added later. Until then the image links aren't valid. Signed-off-by: Greg Kurz <groug@kaod.org>

Bump OSC to 1.5.0

This reverts commit 5617c1b, reversing changes made to 1a660d8. Signed-off-by: Greg Kurz <groug@kaod.org>

openshift-ci · 2023-11-14T12:29:43Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

pmores

lgtm thanks @gkurz !

gkurz · 2023-11-14T12:55:02Z

/test

openshift-ci · 2023-11-14T12:56:00Z

@gkurz: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test check
/test ci-index-openshift-sandboxed-containers-operator-bundle
/test images

The following commands are available to trigger optional jobs:

/test sandboxed-containers-operator-e2e

Use /test all to run all jobs.

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

gkurz · 2023-11-14T14:05:45Z

make test is broken as explained in #310 (comment).

/override ci/prow/check

littlejawa

lgtm
Thanks @gkurz !

openshift-ci · 2023-11-14T14:09:24Z

@gkurz: Overrode contexts on behalf of gkurz: ci/prow/check

In response to this:

make test is broken as explained in #310 (comment).

/override ci/prow/check


openshift-ci · 2023-11-14T14:42:05Z

@gkurz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/sandboxed-containers-operator-e2e	`22c683d`	link	false	`/test sandboxed-containers-operator-e2e`

Full PR test history. Your PR dashboard.


beraldoleal

Executed the git diff and tests, lgtm.

pmores and others added 30 commits May 17, 2023 13:00

image-job: fix payload url

3b8a810

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Merge pull request openshift#325 from openshift/main

576c225

Merge main into devel after v1.4.0 release

Merge pull request openshift#324 from snir911/fix-reg-devel

bd0db5a

image-job: fix payload url

add function to just get an MCP, without hidden assumptions

ed94f2f

This is a simpler and more transparent alternative to getMcp(). It will be useful in a subsequent commit. Signed-off-by: Pavel Mores <pmores@redhat.com>

factor getting kata coreos extension name out to a function

c0dde6b

This is a preparation for the next commit where this functionality will be needed. Signed-off-by: Pavel Mores <pmores@redhat.com>

add function to take Node and determine its kata status

65ba7f2

The first part computes inputs for the kata status predicates, the rest runs the predicates. For now, the result of classification is just logged. Signed-off-by: Pavel Mores <pmores@redhat.com>

add function to clear node kata installation status lists

2908911

This is basically a counterpart of putNodeOnStatusList(), to be called before the lists can start being populated by putNodeOnStatusList(). Signed-off-by: Pavel Mores <pmores@redhat.com>

plug the new functionality in to updateStatus()

93c5d1a

The new functionality runs alongside of the existing status reporting for the time being. Signed-off-by: Pavel Mores <pmores@redhat.com>

implement InProgress Condition interface for general controller code

7eebde2

A getter for InProgress Condition status is added along with a bunch of setters to ease setting the Condition to various states. Signed-off-by: Pavel Mores <pmores@redhat.com>

add NodeCount member into KataNodesStatus

9d3cc99

This effectively moves the legacy KataConfig.status.totalNodesCount field to a more suitable place created by addition of KataNodesStatus. Signed-off-by: Pavel Mores <pmores@redhat.com>

Merge pull request openshift#332 from gkurz/fix-old-versions

fc6dc21

Bump outdated versions of OSC components

Add snir911 to reviewers and approvers

489e6a0

Revert "Add snir911 to reviewers and approvers"

3517426

This reverts commit 489e6a0.

Add snir911 to reviewers and approvers

ecc2af2

Remove unnecessary spaces from the example config

ca6ab21

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>

Merge pull request openshift#335 from bpradipt/KATA-2268

8a491a9

Update machineconfig to enable required annotations for flexible instance types

snir911 and others added 14 commits October 17, 2023 14:43

peerpods: add AWS image creation & deletion jobs

bc3f194

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

peerpods: add Azure image creation & deletion jobs

ea94505

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

peerpods: add logic to execute image creation jobs

ee96b0e

and update the peer-pods-cm ConfigMap with the created image value. Also sets BOOT_FIPS=true if the controller is running on a FIPS-enabled system. Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

peerpods: avoid caa latest container from upstream

68ea4be

caa repo as it's too fragile Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Merge pull request openshift#343 from snir911/embed_jobs3

f504623

podvm: run image jobs from the controller automatically

peer-pods: validate essential peer-pods values are set

e45dd98

in peer-pods ConfigMap or Secret. It's assumed that essential values will not be modified once they are set and the kataConfig was created. Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

image-generation: fix azure login

7de0bf7

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Merge pull request openshift#353 from snir911/2557-devel-fix

7c3e212

peer-pods: validate CM and Secret are set

Merge pull request openshift#355 from gkurz/bump-to-1.5.0

1505215

Bump OSC to 1.5.0

Revert "Merge pull request openshift#339 from gkurz/fix-2240"

85a2e64

This reverts commit 5617c1b, reversing changes made to 1a660d8. Signed-off-by: Greg Kurz <groug@kaod.org>

Merge branch 'devel' into merge-to-main-for-1.5

22c683d

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023

pmores approved these changes Nov 14, 2023


gkurz marked this pull request as ready for review November 14, 2023 12:54

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023

openshift-ci bot requested review from littlejawa and pmores November 14, 2023 12:59

littlejawa approved these changes Nov 14, 2023


beraldoleal approved these changes Nov 14, 2023


cpmeadors approved these changes Nov 14, 2023


gkurz merged commit 67adbe1 into openshift:main Nov 14, 2023
3 of 4 checks passed
