OCPNODE-4494: Testcase to test runc Upgrade case#31266
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@asahay19: This pull request references OCPNODE-4494 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
WalkthroughAdds a disruptive Ginkgo e2e that verifies MCO blocks RHCOS 9→10 OSImageStream moves when CRI-O default runtime is Changesrunc RHCOS 10 upgrade guard test
Sequence Diagram(s)sequenceDiagram
participant TestRunner
participant API_Server
participant MachineConfigOperator
participant Node
participant ClusterVersion
TestRunner->>API_Server: create MachineConfigPool (rhel-9) & MachineConfig (runc drop-in)
API_Server->>MachineConfigOperator: notify new MCP/MC
MachineConfigOperator->>Node: render and apply ignition + drop-in (runc)
Node-->>MachineConfigOperator: node reports installed config and OS version (rhel-9)
TestRunner->>API_Server: patch MCP OSImageStream -> rhel-10
API_Server->>MachineConfigOperator: new desired OSImageStream (rhel-10)
MachineConfigOperator->>MachineConfigOperator: detect runc + rhel-10 -> set Degraded & RenderDegraded with message
MachineConfigOperator->>ClusterVersion: no change (CV remains Available=true)
TestRunner->>API_Server: patch MCP OSImageStream -> rhel-9 (recovery)
MachineConfigOperator->>Node: reconcile to rhel-9 stream, clear degraded
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: asahay19 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/node/runc_upgrade_cases.go`:
- Around line 59-63: The test fails on SingleReplica (SNO) clusters because it
requires a “pure worker”; update the preflight topology check that calls
exutil.GetControlPlaneTopology (variable controlPlaneTopology) to also detect
configv1.SingleReplicaTopologyMode and call g.Skip("Skipping on single-replica
(SNO) cluster") before attempting to select a pure worker. Make the same change
in the other preflight/topology-check sites that use
exutil.GetControlPlaneTopology or perform pure-worker selection (the other
occurrences referenced in the review) so the test is skipped early on
SingleReplica clusters.
- Around line 99-107: The AfterEach currently ignores all error returns from
cleanup calls (removeNodeLabel, waitForMCPMachineCount, deleteMachineConfig,
deleteMachineConfigPool) which can leave test state dirty; update AfterEach to
capture each error into a variable and assert failure instead of swallowing it
(e.g., err := removeNodeLabel(...); Expect(err).NotTo(HaveOccurred())) for each
call that uses nodeName, oc, mcClient, runcRHCOS10GuardPool, and runcGuardMCName
(and similarly for
waitForMCPMachineCount/deleteMachineConfig/deleteMachineConfigPool) so any
cleanup failure fails the test and surfaces the underlying error.
In `@test/extended/node/runc_upgrade_cases.md`:
- Around line 51-53: Add a language tag to the fenced code block containing the
test declaration g.It("blocks upgrade of RHCOS 9 to 10 when default runtime is
runc") — replace the opening triple backticks with a language-tagged fence
(e.g., ```go) so the block reads as a Go snippet and satisfies MD040.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 02321dc4-fc08-4828-8271-c00e13720ac4
📒 Files selected for processing (2)
test/extended/node/runc_upgrade_cases.gotest/extended/node/runc_upgrade_cases.md
|
@bitoku: This PR was included in a payload test run from openshift/machine-config-operator#5891 |
|
@bitoku: This PR was included in a payload test run from openshift/machine-config-operator#5891
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/de265370-633b-11f1-9c47-870e6f6dcba6-0 |
|
@bitoku: This PR was included in a payload test run from openshift/machine-config-operator#5891
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/67ece2a0-634f-11f1-8aab-94564898c66c-0 |
|
@bitoku: This PR was included in a payload test run from openshift/machine-config-operator#5891
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b9a6bab0-63dd-11f1-828a-621d5ba8e722-0 |
|
@bitoku: This PR was included in a payload test run from openshift/machine-config-operator#5891
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c1104ff0-63dd-11f1-9f15-5905ea882eb2-0 |
07d64fb to
2db0de0
Compare
| o.Expect(err).NotTo(o.HaveOccurred(), "need a worker node for the custom pool") | ||
|
|
||
| g.By("Creating custom MachineConfigPool pinned to rhel-9 and runc MachineConfig") | ||
| o.Expect(createRuncGuardMachineConfig(ctx, mcClient)).To(o.Succeed()) |
There was a problem hiding this comment.
I want test cases to align with the real usecases.
Drop-in MC is not supported in openshift.
What we will see is either 1. CRC or 2. MC that was created during the past update.
I want the test case name to explicitly describe which.
If you test 1, it should have CRC with runc config.
If you test 2, I want it to use the same MC name and explicitly say that it's that MC.
There was a problem hiding this comment.
As of now ,I will take test 1 should have CRC with runc config.
| g.By("Verifying ClusterVersion remains stable while pool guard is active") | ||
| o.Expect(verifyClusterVersionStable(ctx, oc)).To(o.Succeed()) |
There was a problem hiding this comment.
This should fail. See openshift/machine-config-operator#5891 (comment)
| g.By("Verifying node remains on RHCOS 9 with runc after guard blocks rollout") | ||
| rhelMajor, err = nodeRHELMajorVersion(oc, nodeName) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| o.Expect(rhelMajor).To(o.Equal("9")) | ||
| o.Expect(nodeUsesRuncRuntime(oc, nodeName)).To(o.BeTrue()) |
There was a problem hiding this comment.
We also want to check the node readiness, and not being rolled out by checking mco labels.
| func requireOpenShift5OrNewer(ctx context.Context, oc *exutil.CLI) { | ||
| cv, err := oc.AdminConfigClient().ConfigV1().ClusterVersions().Get(ctx, "version", metav1.GetOptions{}) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| version := cv.Status.Desired.Version | ||
| major, err := strconv.Atoi(strings.SplitN(version, ".", 2)[0]) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| if major < 5 { | ||
| g.Skip(fmt.Sprintf("cluster version %q is below OCP 5.0; runc RHCOS10 guard applies to 5.0+", version)) | ||
| } | ||
| framework.Logf("Cluster version %q satisfies OCP 5.0+ requirement", version) | ||
| } |
There was a problem hiding this comment.
Is this required? I think checking OSImageStream FG is enough.
| _, err := mcClient.MachineconfigurationV1().OSImageStreams().Get(ctx, "cluster", metav1.GetOptions{}) | ||
| if apierrors.IsNotFound(err) { | ||
| g.Skip("OSImageStream API is not available; enable TechPreviewNoUpgrade / OSStreams on the cluster") | ||
| } | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| osi, err := mcClient.MachineconfigurationV1().OSImageStreams().Get(ctx, "cluster", metav1.GetOptions{}) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) |
There was a problem hiding this comment.
This checks the same cluster OSImageStream twice.
|
@bitoku: This PR was included in a payload test run from openshift/machine-config-operator#5891
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8921abf0-6407-11f1-9686-c8cdaefe3b6e-0 |
|
|
||
| // verifyClusterVersionUnaffectedByIsolatedPoolGuard checks that a render failure on an isolated | ||
| // custom MCP does not degrade the cluster-wide machine-config operator or ClusterVersion. | ||
| // The guard is pool-scoped; worker/master pools remain healthy. |
There was a problem hiding this comment.
Did you confirm it? If so we may want to do some additional propagation.
This PR adds an end-to-end test that validates MCO's guard blocking a RHCOS 9→10 OS stream transition when a MachineConfigPool has runc configured as the default container runtime. Also this PR contains runc_upgrade_cases.md file which contains the Test Plan with all the required meta data.
What the test does
MCO change PR: openshift/machine-config-operator#5891
Locally tested with my custom mco image against the above mco PR and it got passed:
./openshift-tests run-test "[Suite:openshift/disruptive-longrunning][sig-node][Serial][Disruptive] runc RHCOS 10 upgrade guard blocks upgrade of RHCOS 9 to 10 when default runtime is runc"Summary by CodeRabbit
Summary by CodeRabbit
Release Notes
Tests
Documentation