Fix test: e2e-aws-ovn-upgrade-paused by hongkailiu · Pull Request #77842 · openshift/release

hongkailiu · 2026-04-15T13:51:25Z

The binary of the target release 4.y is used for every upgrade in the
e2e-aws-ovn-upgrade-paused e2e test. This limitation of the test
configuration may cause unexpected job failures, e.g.:

- The fix for a bug is shipped in 4.y but not backported to 4.y-1.
- The exception for the bug is removed in 4.y. We could wait longer to
  do so, but that is unintuitive in my opinion.

Then the 4.y test that upgrades from 4.y-2 might fail because the fix is not
available in 4.y-1.

Ideally, we should use the 4.y-1 binary for the first upgrade. Until that is
implemented, we use this workaround to fix the test nightly-4.22-upgrade-from-stable-4.20.

Many monitors are disabled in the first upgrade to avoid failures.
This should not impact product quality because there are tests in release
4.y-1 that perform only one upgrade, from 4.y-2 to 4.y-1.

Summary by CodeRabbit

Chores
- Added a configurable TEST_ARGS variable to supply additional disable flags for selected monitors/analyzers during paused upgrade test runs.
- Updated paused-upgrade test invocation to propagate this configured argument to the initial control-plane upgrade step.
- No other test sequencing, observer enablement, workflow control, or upgrade steps were modified.

hongkailiu · 2026-04-15T13:52:02Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-15T13:52:05Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

coderabbitai · 2026-04-15T13:52:07Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds a TEST_ARGS env variable to the e2e upgrade-paused CI job and forwards it as an extra positional argument to the first openshift-tests run-upgrade invocation inside the upgrade_paused() shell function.

Changes

Cohort / File(s)	Summary
CI Test Config `ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml`	Add `env.TEST_ARGS` containing a `--disable-monitor=...` comma-separated list for the `e2e-aws-ovn-upgrade-paused` step.
Upgrade test script `ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`	In `upgrade_paused()`, pass `${TEST_ARGS:-}` as an extra positional argument to the first `openshift-tests run-upgrade` call.
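
The forwarding described in the walkthrough can be pictured with a minimal sketch. It is not the real `upgrade_paused()` function: `openshift_tests_stub`, `upgrade_paused_sketch`, the image names, and the shortened monitor list are hypothetical stand-ins used only to show that the first invocation receives `TEST_ARGS` while the second does not.

```shell
#!/bin/bash
# Hypothetical stand-in for the real openshift-tests binary: it only echoes its arguments.
openshift_tests_stub() {
    printf '%s\n' "$*"
}

# Simplified sketch of the forwarding; the real function does much more.
upgrade_paused_sketch() {
    local suite="all"
    # First control-plane upgrade (4.y-2 to 4.y-1): extra args are forwarded.
    # Left unquoted so that an unset or empty TEST_ARGS adds no argument.
    # shellcheck disable=SC2086
    openshift_tests_stub run-upgrade "${suite}" ${TEST_ARGS:-} --to-image "first-image"
    # Second control-plane upgrade (4.y-1 to 4.y): no extra args.
    openshift_tests_stub run-upgrade "${suite}" --to-image "second-image"
}

# Shortened, illustrative value; the real list is much longer.
TEST_ARGS="--disable-monitor=etcd-log-analyzer,node-lifecycle"
upgrade_paused_sketch
```

Only the first printed line should contain the `--disable-monitor` flag.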

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (9 passed)

Check name	Status	Explanation
Stable And Deterministic Test Names	✅ Passed	The PR modifies CI/CD configuration files with no Ginkgo test name definitions, so the check is not applicable.
Test Structure And Quality	✅ Passed	PR modifies only CI/CD configuration files and shell scripts, not Ginkgo test code subject to the custom check requirements.
Microshift Test Compatibility	✅ Passed	PR modifies CI configuration and shell scripts only, not Ginkgo test source code. No new e2e tests are being added.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR modifies CI configuration and shell script functions only, not introducing new Ginkgo e2e tests.
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies CI test infrastructure configuration and shell scripts to disable monitoring during upgrade testing. Changes do not introduce any pod scheduling constraints, affinity rules, or topology spread constraints.
Ote Binary Stdout Contract	✅ Passed	The pull request consists solely of CI/CD configuration and shell script orchestration changes. No OTE binary source code is modified, so the OTE Binary Stdout Contract is not violated.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR modifies CI configuration and shell script invocation, not adding new Ginkgo e2e tests, so the IPv6/disconnected network compatibility check does not apply.
Title check	✅ Passed	The title 'Fix test: e2e-aws-ovn-upgrade-paused' accurately describes the main change, which adds TEST_ARGS to disable monitors in the e2e-aws-ovn-upgrade-paused test step.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh (1)
291-297: Add a comment explaining why only the first upgrade receives TEST_ARGS.

The asymmetric handling of TEST_ARGS (passed to the first upgrade but not the second or worker upgrade) is intentional per the PR description, but future maintainers won't understand this without context. Consider adding a brief comment explaining this is a workaround for binary version mismatch.

Additionally, ShellCheck (SC2086) flags the unquoted ${TEST_ARGS:-}. While this follows the existing pattern on line 335, quoting would be safer if any arguments contain spaces or special characters.
Suggested documentation and optional quoting fix
     echo "Starting control-plane upgrade to ${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}"
+    # Pass TEST_ARGS only to first upgrade to work around binary version mismatch
+    # between 4.y test binary and 4.y-1 release (see PR `#77842`)
-    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} \
+    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" "${TEST_ARGS:-}" \
         --to-image "${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`
around lines 291 - 297, Add an inline comment above the first openshift-tests
run-upgrade invocation explaining that TEST_ARGS are intentionally passed only
to the initial upgrade (not the subsequent or worker upgrade) as a workaround
for binary/version mismatch between the test binary and the target cluster; then
also fix the shell warning by quoting TEST_ARGS (change ${TEST_ARGS:-} to
"${TEST_ARGS:-}") in the run-upgrade call so arguments with spaces/special chars
are handled safely—refer to the openshift-tests run-upgrade invocation lines and
ensure the comment mentions the version-mismatch/workaround rationale for future
maintainers.
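
Before applying the quoting suggestion, it may help to see how the two forms differ in what the command actually receives. The sketch below uses a hypothetical `count_args` helper and made-up flag values; it does not call the real `openshift-tests` binary.

```shell
#!/bin/bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

unset TEST_ARGS
# Unquoted: an unset or empty value expands to no argument at all.
# shellcheck disable=SC2086
unquoted_empty="$(count_args run-upgrade all ${TEST_ARGS:-})"
# Quoted: the same value becomes one empty-string argument.
quoted_empty="$(count_args run-upgrade all "${TEST_ARGS:-}")"

# Illustrative value carrying two flags.
TEST_ARGS="--disable-monitor=a,b --retry-strategy=aggressive"
# Unquoted: the two flags are split into two arguments.
# shellcheck disable=SC2086
unquoted_two="$(count_args run-upgrade all ${TEST_ARGS:-})"
# Quoted: both flags arrive glued together as a single argument.
quoted_two="$(count_args run-upgrade all "${TEST_ARGS:-}")"

echo "unset:     unquoted=${unquoted_empty} quoted=${quoted_empty}"
echo "two flags: unquoted=${unquoted_two} quoted=${quoted_two}"
```

Under these assumptions, plain quoting changes behavior in both cases: an unset variable adds an empty positional argument, and a multi-flag value is no longer split. Whether either case matters depends on how the job sets `TEST_ARGS`.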
ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml (1)
32-34: Consider tracking the workaround with a TODO or linked issue.

The PR description indicates this is a temporary workaround until the 4.y-1 binary can be used for the first upgrade. Disabling 22+ monitors substantially reduces test coverage for the 4.20→4.21 upgrade phase. Adding a TODO comment or linking to a tracking issue would help ensure this workaround is revisited.
Suggested documentation
     env:
       TEST_UPGRADE_OPTIONS: ""
+      # TODO(JIRA-XXXX): Remove TEST_ARGS once 4.y-1 binary is used for first upgrade
       TEST_ARGS: --disable-monitor=etcd-log-analyzer,node-lifecycle,...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml`
around lines 32 - 34, Add a short TODO comment or add a reference to a tracking
issue next to the temporary workaround where monitors are disabled (in the
TEST_ARGS environment value that lists the disabled monitors) so the change is
clearly flagged for revisit; update the same YAML block that sets
TEST_UPGRADE_OPTIONS/TEST_ARGS to include a one-line TODO or an issue URL
describing this is a temporary workaround until the 4.y-1 binary is available
and indicating who should follow up.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml`:
- Line 34: The TEST_ARGS contains a misspelled monitor name
"pod-network-avalibility"; update all occurrences of that string to the correct
"pod-network-availability" (e.g., in the TEST_ARGS value and other places where
"pod-network-avalibility" appears across the repo, roughly 8 locations) so
monitor names are consistent; search for the exact token
"pod-network-avalibility" and replace it with "pod-network-availability", then
run any config linter/tests that validate monitor names (the relevant token to
change is TEST_ARGS and the monitor name string).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ef0248c8-12b0-4bf8-88aa-8e8c6a8b532d

📥 Commits

Reviewing files that changed from the base of the PR and between 67862bf and aedc166.

📒 Files selected for processing (2)

ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh

coderabbitai · 2026-04-15T14:00:22Z

    cluster_profile: openshift-org-aws
    env:
      TEST_UPGRADE_OPTIONS: ""
+      TEST_ARGS: --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check for the correct spelling of the pod-network availability monitor
# Search for monitor names in the codebase
rg -n "pod-network-av" --type go --type yaml | head -30

Repository: openshift/release

Length of output: 3790

🏁 Script executed:

# Search for the correctly spelled "pod-network-availability"
rg "pod-network-availability" --type go --type yaml | head -20

Repository: openshift/release

Length of output: 43

🏁 Script executed:

# Search for monitor definitions to find authoritative names
rg "pod-network" --type go -A 2 -B 2 | head -50

Repository: openshift/release

Length of output: 43

🏁 Script executed:

# Search for monitor definitions or registry across all file types
rg "monitor.*pod-network" -i --max-count 10

Repository: openshift/release

Length of output: 4003

🏁 Script executed:

# Check if there's a monitoring or test framework file that lists valid monitors
fd -e go -e yaml -e json | xargs grep -l "monitor" | grep -i "monitor\|test\|framework" | head -10

Repository: openshift/release

Length of output: 915

🏁 Script executed:

# Look for where TEST_ARGS or disable-monitor is documented
rg "disable-monitor" -A 5 -B 5 --max-count 5 | head -40

Repository: openshift/release

Length of output: 5352

Possible typo: pod-network-avalibility should likely be pod-network-availability.

This misspelling appears consistently across at least 8 locations in the codebase, suggesting it was copied from an original source. The monitor name should use the correct English spelling "availability" rather than "avalibility".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml` at line 34, The TEST_ARGS contains a misspelled monitor name "pod-network-avalibility"; update all occurrences of that string to the correct "pod-network-availability" (e.g., in the TEST_ARGS value and other places where "pod-network-avalibility" appears across the repo, roughly 8 locations) so monitor names are consistent; search for the exact token "pod-network-avalibility" and replace it with "pod-network-availability", then run any config linter/tests that validate monitor names (the relevant token to change is TEST_ARGS and the monitor name string).

The typo comes from its definition.

https://github.com/openshift/origin/blob/2ad8693fecdc6e91a83328c310d367e9a88475be/pkg/defaultmonitortests/types.go#L130

That source should be fixed first.

hongkailiu · 2026-04-15T14:14:44Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-15T14:14:48Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-15T14:17:55Z

  steps:
    cluster_profile: openshift-org-aws
    env:
+      TEST_ARGS: --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy


If there were a disable-all knob, I would use it here.
I would also disable all other non-monitor tests, as long as the job still fails when the upgrade fails.

hongkailiu · 2026-04-15T18:06:35Z

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044433136883863552

It still failed on the first upgrade, but TEST_ARGS is working.

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044433136883863552/artifacts/e2e-aws-ovn-upgrade-paused/openshift-e2e-test/artifacts/junit/e2e-events_20260415-155730.json | jq '[.items[]|select(.locator.type=="Kind" and .locator.keys["clusterversion"]=="cluster" and .message.reason=="UpgradeStarted"  and .source=="KubeEvent")]'
[
  {
    "level": "Info",
    "source": "KubeEvent",
    "locator": {
      "type": "Kind",
      "keys": {
        "clusterversion": "cluster",
        "hmsg": "659712c7dc",
        "namespace": "openshift-cluster-version"
      }
    },
    "message": {
      "reason": "UpgradeStarted",
      "cause": "",
      "humanMessage": "version/ image/registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb",
      "annotations": {
        "firstTimestamp": "0001-01-01T00:00:00Z",
        "lastTimestamp": "0001-01-01T00:00:00Z",
        "reason": "UpgradeStarted"
      }
    },
    "from": "2026-04-15T16:03:24Z",
    "to": "2026-04-15T16:03:24Z"
  }
]

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044433136883863552/artifacts/e2e-aws-ovn-upgrade-paused/openshift-e2e-test/build-log.txt | grep 'Starting control-plane upgrade to' -A3
+ echo 'Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb'
Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb
+ wait 366
+ openshift-tests run-upgrade all --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy --retry-strategy=aggressive --to-image registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb --options '' --provider '{"type":"aws","region":"us-east-1","zone":"us-east-1d","multizone":true,"multimaster":true}' -o /logs/artifacts/e2e.log --junit-dir /logs/artifacts/junit
I0415 15:56:51.780793     366 factory.go:195] Registered Plugin "containerd"

hongkailiu · 2026-04-15T18:07:48Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-15T18:07:51Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

coderabbitai

🧹 Nitpick comments (1)

ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh (1)

291-291: Harden TEST_ARGS expansion for run-upgrade invocation

At Line 291, unquoted ${TEST_ARGS:-} can trigger globbing/accidental word splitting. Please pass parsed args via an array to preserve intended tokens safely.

Proposed change

 function upgrade_paused() {
     set -x
+    local -a first_upgrade_args=()
+    if [[ -n "${TEST_ARGS:-}" ]]; then
+        read -r -a first_upgrade_args <<< "${TEST_ARGS}"
+    fi
     unset TEST_SUITE
     TARGET_RELEASES="${OPENSHIFT_UPGRADE_RELEASE_IMAGE_OVERRIDE:-}"
@@
-    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} \
+    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" "${first_upgrade_args[@]}" \
         --to-image "${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}" \
         --options "${TEST_UPGRADE_OPTIONS-}" \
         --provider "${TEST_PROVIDER}" \

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`
at line 291, The unquoted ${TEST_ARGS:-} in the openshift-tests run-upgrade
invocation can cause word-splitting and globbing; fix by converting TEST_ARGS
into a safe array and expanding that array when calling run-upgrade.
Specifically, parse TEST_ARGS into an array (e.g., read -r -a TEST_ARGS_ARRAY
<<< "$TEST_ARGS" or otherwise build TEST_ARGS_ARRAY), then call openshift-tests
run-upgrade "${TEST_UPGRADE_SUITE}" "${TEST_ARGS_ARRAY[@]:-}" so each intended
token is preserved and no globbing occurs.
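
A minimal sketch of the array approach follows, assuming bash and using a hypothetical `count_args` helper with made-up flag values. It also shows one caveat with the `"${TEST_ARGS_ARRAY[@]:-}"` form mentioned in the prompt: for an empty array, the `:-` default yields a single empty-string argument rather than no arguments.

```shell
#!/bin/bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Illustrative value carrying two flags.
TEST_ARGS="--disable-monitor=a,b --retry-strategy=aggressive"

# Split the space-separated string into an array; read does not glob-expand.
first_upgrade_args=()
if [[ -n "${TEST_ARGS:-}" ]]; then
    read -r -a first_upgrade_args <<< "${TEST_ARGS}"
fi
with_flags="$(count_args run-upgrade all "${first_upgrade_args[@]}")"

# Empty array: the plain expansion contributes no arguments.
empty_args=()
plain_empty="$(count_args run-upgrade all "${empty_args[@]}")"
# Empty array with a ':-' default: contributes one empty-string argument.
dash_empty="$(count_args run-upgrade all "${empty_args[@]:-}")"

echo "with flags=${with_flags} plain empty=${plain_empty} ':-' empty=${dash_empty}"
```

One further point to check before adopting this: with `set -u` on bash older than 4.4, expanding an empty array as `"${empty_args[@]}"` is reported as an unbound variable, so the bash version available in the CI image matters.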

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: feb25979-748a-473b-97ae-51ddfac7ed7c

📥 Commits

Reviewing files that changed from the base of the PR and between 4978391 and 63b6df3.

📒 Files selected for processing (2)

ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh

hongkailiu · 2026-04-15T21:27:43Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-15T21:27:46Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`:
- Line 291: The unquoted expansion of TEST_ARGS in the openshift-tests
run-upgrade invocation can cause word-splitting; update the invocation that
currently uses openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}"
${TEST_ARGS:-} \ to safely expand TEST_ARGS—either quote the scalar as
"${TEST_ARGS:-}" or, preferably, convert TEST_ARGS to an array and expand it as
"${TEST_ARGS[@]:-}" so multi-flag values are preserved; adjust any places that
set or read TEST_ARGS accordingly (referencing TEST_ARGS and the openshift-tests
run-upgrade call).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7ff68b06-4fc6-41d8-8588-1e8f82ca9cef

📥 Commits

Reviewing files that changed from the base of the PR and between 63b6df3 and e033ce3.

📒 Files selected for processing (2)

ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh

✅ Files skipped from review due to trivial changes (1)

ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml

hongkailiu · 2026-04-16T03:11:21Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-16T03:11:24Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

coderabbitai

♻️ Duplicate comments (1)

ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh (1)

291-291: ⚠️ Potential issue | 🟡 Minor

Line 291: keep TEST_ARGS expansion safe (still unresolved).

${TEST_ARGS:-} is still unquoted in command context, so shell word-splitting/globbing can alter arguments unexpectedly. This is the same issue raised in prior review comments and is still present.

Proposed fix

 function upgrade_paused() {
     set -x
+    local -a first_upgrade_args=()
+    if [[ -n "${TEST_ARGS:-}" ]]; then
+        read -r -a first_upgrade_args <<< "${TEST_ARGS}"
+    fi
     unset TEST_SUITE
@@
-    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} \
+    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" "${first_upgrade_args[@]}" \
         --to-image "${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}" \

#!/bin/bash
set -euo pipefail

file="ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh"

echo "Inspecting changed block:"
nl -ba "$file" | sed -n '286,296p'

echo
echo "Finding unquoted TEST_ARGS expansions in command invocations:"
rg -n '\$\{TEST_ARGS:-\}' "$file"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`
at line 291, The command invocation openshift-tests run-upgrade
"${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} should quote the TEST_ARGS expansion to
prevent word-splitting/globbing: replace the unquoted ${TEST_ARGS:-} with a
quoted expansion (e.g. "${TEST_ARGS:-}") in the openshift-tests run-upgrade
invocation so TEST_ARGS is passed safely; if TEST_ARGS is meant to carry
multiple distinct arguments, refactor to use an array variable instead and
expand it as "${TEST_ARGS[@]}".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 094f63b9-40ea-4209-a2de-e83f9fb7fdcb

📥 Commits

Reviewing files that changed from the base of the PR and between e033ce3 and 2404cfc.

📒 Files selected for processing (2)

ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh

✅ Files skipped from review due to trivial changes (1)

ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml

hongkailiu · 2026-04-16T11:24:15Z

Progress!
The job failed on the 2nd upgrade:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044616149156499456/artifacts/e2e-aws-ovn-upgrade-paused/openshift-e2e-test/build-log.txt | grep 'Starting control-plane upgrade to' -A3
+ echo 'Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:d8c62aa4d5a312c1911468aa580dd2c404d967a784945ecee8c7ada72c0968e8'
Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:d8c62aa4d5a312c1911468aa580dd2c404d967a784945ecee8c7ada72c0968e8
+ wait 368
+ openshift-tests run-upgrade all --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy,legacy-test-framework-invariants-alerts,legacy-networking-invariants,oc-adm-upgrade-status --retry-strategy=aggressive --to-image registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:d8c62aa4d5a312c1911468aa580dd2c404d967a784945ecee8c7ada72c0968e8 --options '' --provider '{"type":"aws","region":"us-west-1","zone":"us-west-1b","multizone":true,"multimaster":true}' -o /logs/artifacts/e2e.log --junit-dir /logs/artifacts/junit
I0416 04:15:39.388667     368 factory.go:195] Registered Plugin "containerd"
--
+ echo 'Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:4f02290ff31104ecf367b8ec938f4ad0067be485db18ed1a6627817c3fd77d7e'
Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:4f02290ff31104ecf367b8ec938f4ad0067be485db18ed1a6627817c3fd77d7e
+ wait 2791
+ openshift-tests run-upgrade all --to-image registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:4f02290ff31104ecf367b8ec938f4ad0067be485db18ed1a6627817c3fd77d7e --options '' --provider '{"type":"aws","region":"us-west-1","zone":"us-west-1b","multizone":true,"multimaster":true}' -o /logs/artifacts/e2e.log --junit-dir /logs/artifacts/junit
I0416 05:36:11.072059    2791 factory.go:195] Registered Plugin "containerd"
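
The same check can be scripted rather than read by eye. The sketch below runs against an abbreviated, made-up stand-in for `build-log.txt` (the monitor and image names are placeholders), counting `run-upgrade` invocations and how many of them carry `--disable-monitor`.

```shell
#!/bin/bash
# Abbreviated, made-up stand-in for build-log.txt; real lines are much longer.
sample_log="$(cat <<'EOF'
+ openshift-tests run-upgrade all --disable-monitor=monitor-a,monitor-b --to-image image-one
+ openshift-tests run-upgrade all --to-image image-two
EOF
)"

# Count all run-upgrade invocations, then those that carry --disable-monitor.
total="$(printf '%s\n' "${sample_log}" | grep -c -- 'openshift-tests run-upgrade')"
with_disable="$(printf '%s\n' "${sample_log}" | grep 'openshift-tests run-upgrade' | grep -c -- '--disable-monitor=')"

echo "run-upgrade invocations: ${total}, with --disable-monitor: ${with_disable}"
```

For the intended behavior, exactly one of the invocations should carry the flag.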

The failure looks like a bug in the paused-pool case.

: [Monitor:oc-adm-upgrade-status][sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status snapshots reflect the cluster upgrade lifecycle
...
=> Unexpected observation 'control plane updating' in state 'control plane updated'

https://redhat.atlassian.net/browse/OTA-1977

hongkailiu · 2026-04-16T11:32:57Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-16T11:33:01Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-16T16:38:26Z

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044743214509854720

The failure is on the worker upgrade. The two master upgrades succeeded. Progress.

I am not sure if the failure is legit or not.

hongkailiu · 2026-04-16T18:19:31Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-16T18:19:34Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci · 2026-04-21T03:37:45Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign sosiouxme for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hongkailiu · 2026-04-21T03:38:22Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-21T03:38:24Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-21T10:50:32Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-21T10:50:35Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-21T14:40:59Z

The failure on the 4.20 job looks unrelated to this pull request.
Retrying ...

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-21T14:41:02Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-21T15:07:18Z

The 4.22 job failed, but our fix seems to be working.

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-21T15:07:21Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-21T15:10:28Z

The 4.22 job failed, but our fix seems to be working.

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2046543385593909248/artifacts/e2e-aws-ovn-upgrade-paused/gather-extra/artifacts/oc_cmds/nodes

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2046543385593909248/artifacts/e2e-aws-ovn-upgrade-paused/gather-extra/artifacts/oc_cmds/machineconfigpools

Workers are upgraded.

hongkailiu · 2026-04-21T18:26:30Z

The 4.20 job has two failed runs, run1 and run2. Both failed on disruption_tests: [bz-Cluster Version Operator] Verify presence of admin ack gate blocks upgrade until acknowledged.

However, we used to have successful runs such as run3.

I cannot explain what made the difference. Did the test suddenly become too flaky?

hongkailiu · 2026-04-21T19:16:50Z

The test suite is too flaky.

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-21T19:16:53Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-22T00:56:26Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-22T00:56:29Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-22T10:35:47Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-22T10:35:50Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Some ClusterOperators (COs) may go Progressing before the upgrade to 4.22 starts (i.e., while the cluster is still on 4.21), and the exception that tolerated this has been removed from the 4.22 test binary.
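
The mismatch can be sketched as follows: the verdict depends on which exceptions the test binary carries, not on the version the cluster is running when the condition is observed. The operator name, the version table, and both function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch. The operator name and the version table are invented.

# has_exception BINARY_VERSION OPERATOR
has_exception() {
  case "$1:$2" in
    4.21:example-operator) return 0 ;;  # still carried by the 4.21 test binary
    *) return 1 ;;                      # removed from the 4.22 test binary
  esac
}

# verdict BINARY_VERSION CLUSTER_VERSION OPERATOR
# The cluster version is reported but does not influence the decision.
verdict() {
  if has_exception "$1" "$3"; then
    echo "tolerated"
  else
    echo "flagged (cluster at $2, binary from $1)"
  fi
}

verdict 4.22 4.21 example-operator
```

With the 4.22 binary, a Progressing operator on a cluster that is still at 4.21 is flagged even though the 4.21 binary would have tolerated it.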

hongkailiu · 2026-04-22T15:20:31Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-22T15:20:34Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

hongkailiu · 2026-04-22T17:45:08Z

/retest

hongkailiu · 2026-04-22T17:45:22Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-22T17:45:25Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci · 2026-04-22T17:54:08Z

@hongkailiu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused | `c50517e` | link | unknown | `/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused` |
| ci/rehearse/periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused | `c50517e` | link | unknown | `/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused` |

hongkailiu · 2026-04-22T22:43:33Z

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

openshift-merge-bot · 2026-04-22T22:43:36Z

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci Bot requested review from petr-muller and stbenjam April 15, 2026 13:52

coderabbitai Bot reviewed Apr 15, 2026

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from aedc166 to 4978391 Compare April 15, 2026 14:14

hongkailiu commented Apr 15, 2026

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 4978391 to 63b6df3 Compare April 15, 2026 18:07

coderabbitai Bot reviewed Apr 15, 2026

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 63b6df3 to e033ce3 Compare April 15, 2026 21:27

coderabbitai Bot reviewed Apr 15, 2026

Comment thread ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh Outdated

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from e033ce3 to 2404cfc Compare April 16, 2026 03:10

coderabbitai Bot reviewed Apr 16, 2026

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 2404cfc to c17d3df Compare April 16, 2026 11:31

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from a4c6432 to 24fa9f8 Compare April 21, 2026 03:37

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 24fa9f8 to 3821c1b Compare April 21, 2026 10:49

Override ClusterImagePolicy openshift on 4.20

c50517e

hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 3821c1b to c50517e Compare April 21, 2026 15:06

Disable oc-adm-upgrade-status monitor

f8f01ec

