Fix test: e2e-aws-ovn-upgrade-paused #77842

Open
hongkailiu wants to merge 5 commits into openshift:main from hongkailiu:fix-e2e-aws-ovn-upgrade-paused

Conversation

@hongkailiu (Member) commented Apr 15, 2026

Each upgrade in the e2e-aws-ovn-upgrade-paused e2e test uses the test
binary of the target release 4.y. This limitation of the test
configuration can cause unexpected job failures, e.g.,

  • The fix for a bug is shipped in 4.y but not backported to 4.y-1.
  • The exception for the bug is removed in 4.y. We could wait longer to
    do so, but that is unintuitive in my opinion.

Then the 4.y test that upgrades from 4.y-2 might fail, because the fix is not
available in 4.y-1.

Ideally, we should use the 4.y-1 binary for the first upgrade. Until that is
implemented, this workaround fixes the test nightly-4.22-upgrade-from-stable-4.20.

Many monitors are disabled in the first upgrade to avoid failures.
This should not affect product quality, because release 4.y-1 has tests
that do only one upgrade, from 4.y-2 to 4.y-1.

Summary by CodeRabbit

  • Chores
    • Added a configurable TEST_ARGS variable to supply additional disable flags for selected monitors/analyzers during paused upgrade test runs.
    • Updated paused-upgrade test invocation to propagate this configured argument to the initial control-plane upgrade step.
    • No other test sequencing, observer enablement, workflow control, or upgrade steps were modified.

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Adds a TEST_ARGS env variable to the e2e upgrade-paused CI job and forwards it as extra arguments to the first openshift-tests run-upgrade invocation inside the upgrade_paused() shell function.

Changes

Cohort / File(s) and summary:

  • CI Test Config (ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml): Add env.TEST_ARGS containing a --disable-monitor=... comma-separated list for the e2e-aws-ovn-upgrade-paused step.
  • Upgrade test script (ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh): In upgrade_paused(), pass ${TEST_ARGS:-} as extra arguments to the first openshift-tests run-upgrade call.
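
The forwarding described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of openshift-e2e-test-commands.sh: `run_upgrade_stub` and `upgrade_paused_sketch` are hypothetical names, the stub only prints its arguments instead of calling openshift-tests, and the image names and the shortened monitor list are placeholders.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for `openshift-tests run-upgrade`; it only prints its arguments.
run_upgrade_stub() {
    printf '%s\n' "$*"
}

# Sketch of the paused-upgrade sequence: only the first call receives TEST_ARGS.
upgrade_paused_sketch() {
    local first_image="$1" second_image="$2"
    # Left unquoted so that an unset TEST_ARGS adds no argument at all.
    # shellcheck disable=SC2086
    run_upgrade_stub all ${TEST_ARGS:-} --to-image "${first_image}"
    run_upgrade_stub all --to-image "${second_image}"
}

# Placeholder value; the real job lists many more monitors.
TEST_ARGS="--disable-monitor=etcd-log-analyzer,node-lifecycle"
upgrade_paused_sketch image-4.y-1 image-4.y
```

In this sketch the first printed command line carries the --disable-monitor flag and the second does not, which mirrors the asymmetry the walkthrough describes.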

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (9 passed)

  • Stable And Deterministic Test Names: The PR modifies CI/CD configuration files with no Ginkgo test name definitions, so the check is not applicable.
  • Test Structure And Quality: PR modifies only CI/CD configuration files and shell scripts, not Ginkgo test code subject to the custom check requirements.
  • Microshift Test Compatibility: PR modifies CI configuration and shell scripts only, not Ginkgo test source code. No new e2e tests are being added.
  • Single Node Openshift (Sno) Test Compatibility: PR modifies CI configuration and shell script functions only, not introducing new Ginkgo e2e tests.
  • Topology-Aware Scheduling Compatibility: PR modifies CI test infrastructure configuration and shell scripts to disable monitoring during upgrade testing. Changes do not introduce any pod scheduling constraints, affinity rules, or topology spread constraints.
  • Ote Binary Stdout Contract: The pull request consists solely of CI/CD configuration and shell script orchestration changes. No OTE binary source code is modified, so the OTE Binary Stdout Contract is not violated.
  • Ipv6 And Disconnected Network Test Compatibility: PR modifies CI configuration and shell script invocation, not adding new Ginkgo e2e tests, so the IPv6/disconnected network compatibility check does not apply.
  • Title check: The title 'Fix test: e2e-aws-ovn-upgrade-paused' accurately describes the main change, which adds TEST_ARGS to disable monitors in the e2e-aws-ovn-upgrade-paused test step.
  • Description Check: Check skipped - CodeRabbit’s high-level summary is enabled.

@openshift-ci openshift-ci Bot requested review from petr-muller and stbenjam April 15, 2026 13:52
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh (1)

291-297: Add a comment explaining why only the first upgrade receives TEST_ARGS.

The asymmetric handling of TEST_ARGS (passed to the first upgrade but not the second or worker upgrade) is intentional per the PR description, but future maintainers won't understand this without context. Consider adding a brief comment explaining this is a workaround for binary version mismatch.

Additionally, the static analysis tool (SC2086) flags the unquoted ${TEST_ARGS:-}. While this follows the existing pattern on line 335, quoting would be safer if any arguments contain spaces or special characters.

Suggested documentation and optional quoting fix
     echo "Starting control-plane upgrade to ${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}"
+    # Pass TEST_ARGS only to first upgrade to work around binary version mismatch
+    # between 4.y test binary and 4.y-1 release (see PR `#77842`)
-    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} \
+    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" "${TEST_ARGS:-}" \
         --to-image "${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`
around lines 291 - 297, Add an inline comment above the first openshift-tests
run-upgrade invocation explaining that TEST_ARGS are intentionally passed only
to the initial upgrade (not the subsequent or worker upgrade) as a workaround
for binary/version mismatch between the test binary and the target cluster; then
also fix the shell warning by quoting TEST_ARGS (change ${TEST_ARGS:-} to
"${TEST_ARGS:-}") in the run-upgrade call so arguments with spaces/special chars
are handled safely—refer to the openshift-tests run-upgrade invocation lines and
ensure the comment mentions the version-mismatch/workaround rationale for future
maintainers.
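
Before applying the quoting suggestion, it may help to see how the two forms differ. The sketch below is only an illustration: `count_args` is a hypothetical helper, and the two-flag TEST_ARGS value is an assumed example rather than the job's real setting. It suggests that quoting the scalar passes several flags as one argument, and passes one empty argument when the variable is unset.

```shell
#!/bin/bash
set -euo pipefail

# Helper for illustration only: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Assumed example value carrying two flags.
TEST_ARGS="--disable-monitor=a,b --retry-strategy=aggressive"

# Unquoted: the value splits on whitespace into two arguments.
# shellcheck disable=SC2086
unquoted=$(count_args ${TEST_ARGS:-})

# Quoted: both flags arrive as a single argument that contains a space.
quoted=$(count_args "${TEST_ARGS:-}")

unset TEST_ARGS
# Quoted while unset: one empty-string argument is still passed.
quoted_unset=$(count_args "${TEST_ARGS:-}")
# Unquoted while unset: no argument is passed.
# shellcheck disable=SC2086
unquoted_unset=$(count_args ${TEST_ARGS:-})

echo "unquoted=${unquoted} quoted=${quoted} quoted_unset=${quoted_unset} unquoted_unset=${unquoted_unset}"
```

Whether the unquoted form is acceptable depends on the values the job actually sets; it is safe only while no flag value contains whitespace or glob characters.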
ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml (1)

32-34: Consider tracking the workaround with a TODO or linked issue.

The PR description indicates this is a temporary workaround until the 4.y-1 binary can be used for the first upgrade. Disabling 22+ monitors substantially reduces test coverage for the 4.20→4.21 upgrade phase. Adding a TODO comment or linking to a tracking issue would help ensure this workaround is revisited.

Suggested documentation
     env:
       TEST_UPGRADE_OPTIONS: ""
+      # TODO(JIRA-XXXX): Remove TEST_ARGS once 4.y-1 binary is used for first upgrade
       TEST_ARGS: --disable-monitor=etcd-log-analyzer,node-lifecycle,...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml`
around lines 32 - 34, Add a short TODO comment or add a reference to a tracking
issue next to the temporary workaround where monitors are disabled (in the
TEST_ARGS environment value that lists the disabled monitors) so the change is
clearly flagged for revisit; update the same YAML block that sets
TEST_UPGRADE_OPTIONS/TEST_ARGS to include a one-line TODO or an issue URL
describing this is a temporary workaround until the 4.y-1 binary is available
and indicating who should follow up.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ef0248c8-12b0-4bf8-88aa-8e8c6a8b532d

📥 Commits

Reviewing files that changed from the base of the PR and between 67862bf and aedc166.

📒 Files selected for processing (2)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
  • ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh

cluster_profile: openshift-org-aws
env:
  TEST_UPGRADE_OPTIONS: ""
  TEST_ARGS: --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy
Contributor

@coderabbitai coderabbitai Bot Apr 15, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check for the correct spelling of the pod-network availability monitor

# Search for monitor names in the codebase
rg -n "pod-network-av" --type go --type yaml | head -30

Repository: openshift/release

Length of output: 3790


🏁 Script executed:

# Search for the correctly spelled "pod-network-availability"
rg "pod-network-availability" --type go --type yaml | head -20

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Search for monitor definitions to find authoritative names
rg "pod-network" --type go -A 2 -B 2 | head -50

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Search for monitor definitions or registry across all file types
rg "monitor.*pod-network" -i --max-count 10

Repository: openshift/release

Length of output: 4003


🏁 Script executed:

# Check if there's a monitoring or test framework file that lists valid monitors
fd -e go -e yaml -e json | xargs grep -l "monitor" | grep -i "monitor\|test\|framework" | head -10

Repository: openshift/release

Length of output: 915


🏁 Script executed:

# Look for where TEST_ARGS or disable-monitor is documented
rg "disable-monitor" -A 5 -B 5 --max-count 5 | head -40

Repository: openshift/release

Length of output: 5352


Possible typo: pod-network-avalibility should likely be pod-network-availability.

This misspelling appears consistently across at least 8 locations in the codebase, suggesting it was copied from an original source. The monitor name should use the correct English spelling "availability" rather than "avalibility".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml`
at line 34, The TEST_ARGS contains a misspelled monitor name
"pod-network-avalibility"; update all occurrences of that string to the correct
"pod-network-availability" (e.g., in the TEST_ARGS value and other places where
"pod-network-avalibility" appears across the repo, roughly 8 locations) so
monitor names are consistent; search for the exact token
"pod-network-avalibility" and replace it with "pod-network-availability", then
run any config linter/tests that validate monitor names (the relevant token to
change is TEST_ARGS and the monitor name string).

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from aedc166 to 4978391 Compare April 15, 2026 14:14
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

steps:
  cluster_profile: openshift-org-aws
  env:
    TEST_ARGS: --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy
Member Author

If there were a disable-all knob, I would use it here.
I would also disable all other non-monitor tests, as long as the job still fails when the upgrade fails.

@hongkailiu
Member Author

hongkailiu commented Apr 15, 2026

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044433136883863552

The job still failed on the first upgrade, but TEST_ARGS is working.

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044433136883863552/artifacts/e2e-aws-ovn-upgrade-paused/openshift-e2e-test/artifacts/junit/e2e-events_20260415-155730.json | jq '[.items[]|select(.locator.type=="Kind" and .locator.keys["clusterversion"]=="cluster" and .message.reason=="UpgradeStarted"  and .source=="KubeEvent")]'
[
  {
    "level": "Info",
    "source": "KubeEvent",
    "locator": {
      "type": "Kind",
      "keys": {
        "clusterversion": "cluster",
        "hmsg": "659712c7dc",
        "namespace": "openshift-cluster-version"
      }
    },
    "message": {
      "reason": "UpgradeStarted",
      "cause": "",
      "humanMessage": "version/ image/registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb",
      "annotations": {
        "firstTimestamp": "0001-01-01T00:00:00Z",
        "lastTimestamp": "0001-01-01T00:00:00Z",
        "reason": "UpgradeStarted"
      }
    },
    "from": "2026-04-15T16:03:24Z",
    "to": "2026-04-15T16:03:24Z"
  }
]

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044433136883863552/artifacts/e2e-aws-ovn-upgrade-paused/openshift-e2e-test/build-log.txt | grep 'Starting control-plane upgrade to' -A 3
+ echo 'Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb'
Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb
+ wait 366
+ openshift-tests run-upgrade all --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy --retry-strategy=aggressive --to-image registry.build09.ci.openshift.org/ci-op-bhr33tp1/release@sha256:fe8c01d54f42fd4818aa0801d7087966f91304d447bc3fe9ceb1b97b9b9e40cb --options '' --provider '{"type":"aws","region":"us-east-1","zone":"us-east-1d","multizone":true,"multimaster":true}' -o /logs/artifacts/e2e.log --junit-dir /logs/artifacts/junit
I0415 15:56:51.780793     366 factory.go:195] Registered Plugin "containerd"

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 4978391 to 63b6df3 Compare April 15, 2026 18:07
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh (1)

291-291: Harden TEST_ARGS expansion for run-upgrade invocation

At Line 291, unquoted ${TEST_ARGS:-} can trigger globbing/accidental word splitting. Please pass parsed args via an array to preserve intended tokens safely.

Proposed change
 function upgrade_paused() {
     set -x
+    local -a first_upgrade_args=()
+    if [[ -n "${TEST_ARGS:-}" ]]; then
+        read -r -a first_upgrade_args <<< "${TEST_ARGS}"
+    fi
     unset TEST_SUITE
     TARGET_RELEASES="${OPENSHIFT_UPGRADE_RELEASE_IMAGE_OVERRIDE:-}"
@@
-    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} \
+    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" "${first_upgrade_args[@]}" \
         --to-image "${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}" \
         --options "${TEST_UPGRADE_OPTIONS-}" \
         --provider "${TEST_PROVIDER}" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`
at line 291, The unquoted ${TEST_ARGS:-} in the openshift-tests run-upgrade
invocation can cause word-splitting and globbing; fix by converting TEST_ARGS
into a safe array and expanding that array when calling run-upgrade.
Specifically, parse TEST_ARGS into an array (e.g., read -r -a TEST_ARGS_ARRAY
<<< "$TEST_ARGS" or otherwise build TEST_ARGS_ARRAY), then call openshift-tests
run-upgrade "${TEST_UPGRADE_SUITE}" "${TEST_ARGS_ARRAY[@]:-}" so each intended
token is preserved and no globbing occurs.
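
The array approach can be tried in isolation. The sketch below is a hedged illustration: `parse_test_args`, `parsed_args`, and `count_args` are hypothetical names, and the sample flags are assumptions. It also suggests that the `"${ARRAY[@]:-}"` form mentioned in the prompt passes one empty argument when the array is empty, whereas a `+` guard passes none.

```shell
#!/bin/bash
set -euo pipefail

# Helper for illustration only: prints how many arguments it received.
count_args() { echo "$#"; }

# Parse a whitespace-separated string into the global array parsed_args.
# read -a splits on IFS without performing pathname expansion.
parse_test_args() {
    parsed_args=()
    if [[ -n "${1:-}" ]]; then
        read -r -a parsed_args <<< "$1"
    fi
}

parse_test_args "--disable-monitor=a,b --retry-strategy=aggressive"
two_flags=$(count_args ${parsed_args[@]+"${parsed_args[@]}"})

parse_test_args ""
# The :- default substitutes an empty word, so one empty argument is passed.
with_default=$(count_args "${parsed_args[@]:-}")
# The + guard expands to nothing for an empty array, even under `set -u`.
with_guard=$(count_args ${parsed_args[@]+"${parsed_args[@]}"})

echo "two_flags=${two_flags} with_default=${with_default} with_guard=${with_guard}"
```

Note that `read -r -a` splits on whitespace only, so this sketch does not handle flag values that themselves contain quoted spaces.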

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: feb25979-748a-473b-97ae-51ddfac7ed7c

📥 Commits

Reviewing files that changed from the base of the PR and between 4978391 and 63b6df3.

📒 Files selected for processing (2)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
  • ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 63b6df3 to e033ce3 Compare April 15, 2026 21:27
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`:
- Line 291: The unquoted expansion of TEST_ARGS in the openshift-tests
run-upgrade invocation can cause word-splitting; update the invocation that
currently uses openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}"
${TEST_ARGS:-} \ to safely expand TEST_ARGS—either quote the scalar as
"${TEST_ARGS:-}" or, preferably, convert TEST_ARGS to an array and expand it as
"${TEST_ARGS[@]:-}" so multi-flag values are preserved; adjust any places that
set or read TEST_ARGS accordingly (referencing TEST_ARGS and the openshift-tests
run-upgrade call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7ff68b06-4fc6-41d8-8588-1e8f82ca9cef

📥 Commits

Reviewing files that changed from the base of the PR and between 63b6df3 and e033ce3.

📒 Files selected for processing (2)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
  • ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh
✅ Files skipped from review due to trivial changes (1)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml

Comment thread ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh Outdated
@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from e033ce3 to 2404cfc Compare April 16, 2026 03:10
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh (1)

291-291: ⚠️ Potential issue | 🟡 Minor

Line 291: keep TEST_ARGS expansion safe (still unresolved).

${TEST_ARGS:-} is still unquoted in command context, so shell word-splitting/globbing can alter arguments unexpectedly. This is the same issue raised in prior review comments and is still present.

Proposed fix
 function upgrade_paused() {
     set -x
+    local -a first_upgrade_args=()
+    if [[ -n "${TEST_ARGS:-}" ]]; then
+        read -r -a first_upgrade_args <<< "${TEST_ARGS}"
+    fi
     unset TEST_SUITE
@@
-    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} \
+    openshift-tests run-upgrade "${TEST_UPGRADE_SUITE}" "${first_upgrade_args[@]}" \
         --to-image "${OPENSHIFT_UPGRADE0_RELEASE_IMAGE_OVERRIDE}" \
#!/bin/bash
set -euo pipefail

file="ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh"

echo "Inspecting changed block:"
nl -ba "$file" | sed -n '286,296p'

echo
echo "Finding unquoted TEST_ARGS expansions in command invocations:"
rg -n '\$\{TEST_ARGS:-\}' "$file"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh`
at line 291, The command invocation openshift-tests run-upgrade
"${TEST_UPGRADE_SUITE}" ${TEST_ARGS:-} should quote the TEST_ARGS expansion to
prevent word-splitting/globbing: replace the unquoted ${TEST_ARGS:-} with a
quoted expansion (e.g. "${TEST_ARGS:-}") in the openshift-tests run-upgrade
invocation so TEST_ARGS is passed safely; if TEST_ARGS is meant to carry
multiple distinct arguments, refactor to use an array variable instead and
expand it as "${TEST_ARGS[@]}".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 094f63b9-40ea-4209-a2de-e83f9fb7fdcb

📥 Commits

Reviewing files that changed from the base of the PR and between e033ce3 and 2404cfc.

📒 Files selected for processing (2)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml
  • ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh
✅ Files skipped from review due to trivial changes (1)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22-upgrade-from-stable-4.20.yaml

@hongkailiu
Member Author

hongkailiu commented Apr 16, 2026

Progress!
The job failed on the 2nd upgrade:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/77842/rehearse-77842-periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused/2044616149156499456/artifacts/e2e-aws-ovn-upgrade-paused/openshift-e2e-test/build-log.txt | grep 'Starting control-plane upgrade to' -A3
+ echo 'Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:d8c62aa4d5a312c1911468aa580dd2c404d967a784945ecee8c7ada72c0968e8'
Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:d8c62aa4d5a312c1911468aa580dd2c404d967a784945ecee8c7ada72c0968e8
+ wait 368
+ openshift-tests run-upgrade all --disable-monitor=etcd-log-analyzer,node-lifecycle,on-prem-haproxy,on-prem-keepalived,initial-and-final-operator-log-scraper,apiserver-incluster-availability,kubelet-log-collector,audit-log-analyzer,metrics-endpoints-down,alert-summary-serializer,cpu-metric-collector,pod-network-avalibility,service-type-load-balancer-availability,ingress-availability,pathological-event-analyzer,legacy-test-framework-invariants,operator-state-analyzer,legacy-cvo-invariants,apiserver-external-availability,azure-metrics-collector,etcd-disk-metrics-intervals,termination-message-policy,legacy-test-framework-invariants-alerts,legacy-networking-invariants,oc-adm-upgrade-status --retry-strategy=aggressive --to-image registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:d8c62aa4d5a312c1911468aa580dd2c404d967a784945ecee8c7ada72c0968e8 --options '' --provider '{"type":"aws","region":"us-west-1","zone":"us-west-1b","multizone":true,"multimaster":true}' -o /logs/artifacts/e2e.log --junit-dir /logs/artifacts/junit
I0416 04:15:39.388667     368 factory.go:195] Registered Plugin "containerd"
--
+ echo 'Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:4f02290ff31104ecf367b8ec938f4ad0067be485db18ed1a6627817c3fd77d7e'
Starting control-plane upgrade to registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:4f02290ff31104ecf367b8ec938f4ad0067be485db18ed1a6627817c3fd77d7e
+ wait 2791
+ openshift-tests run-upgrade all --to-image registry.build09.ci.openshift.org/ci-op-nj99dkpv/release@sha256:4f02290ff31104ecf367b8ec938f4ad0067be485db18ed1a6627817c3fd77d7e --options '' --provider '{"type":"aws","region":"us-west-1","zone":"us-west-1b","multizone":true,"multimaster":true}' -o /logs/artifacts/e2e.log --junit-dir /logs/artifacts/junit
I0416 05:36:11.072059    2791 factory.go:195] Registered Plugin "containerd"
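
The difference between the two invocations above is easy to miss in the long command lines. As a minimal sketch, assuming the build log keeps the `+ openshift-tests run-upgrade ...` trace lines shown above, a small filter can report whether each upgrade was started with `--disable-monitor`. The function name `summarize_upgrades` is hypothetical and not part of the CI step.

```shell
#!/bin/sh
# Sketch only: summarize_upgrades is a hypothetical helper, not part of the CI step.
# It reads a build log on stdin and, for every traced run-upgrade invocation,
# reports whether a --disable-monitor flag was passed.
summarize_upgrades() {
  awk '
    /^\+ openshift-tests run-upgrade / {
      n++
      state = ($0 ~ /--disable-monitor=/) ? "set" : "not set"
      printf "upgrade %d: --disable-monitor %s\n", n, state
    }
  '
}
```

Piping the `curl` output from the command above into `summarize_upgrades` should report the flag as set for the first invocation quoted here and not set for the second.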

The failure looks like a bug in the paused-pool case.

: [Monitor:oc-adm-upgrade-status][sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status snapshots reflect the cluster upgrade lifecycle
...
=> Unexpected observation 'control plane updating' in state 'control plane updated'

https://redhat.atlassian.net/browse/OTA-1977
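
The log above shows the intended asymmetry: the first upgrade carries the extra flags and the second runs with none. A minimal dry-run sketch of that behavior follows, assuming the extra flags arrive in a `TEST_ARGS` environment variable as described in this pull request. The function `build_upgrade_cmd`, its hop-number argument, and the image names are hypothetical; the real step script may be structured differently.

```shell
#!/bin/sh
# Sketch only: build_upgrade_cmd and the image names are hypothetical.
# It prints the command line for one upgrade hop instead of running it,
# and applies TEST_ARGS (extra flags) to the first hop only.
set -eu

build_upgrade_cmd() {
  hop="$1"
  image="$2"
  extra=""
  if [ "$hop" -eq 1 ] && [ -n "${TEST_ARGS:-}" ]; then
    extra=" ${TEST_ARGS}"
  fi
  printf 'openshift-tests run-upgrade all%s --to-image %s\n' "$extra" "$image"
}

# Assumed example value; the flags themselves are taken from the log above.
TEST_ARGS='--disable-monitor=oc-adm-upgrade-status --retry-strategy=aggressive'
build_upgrade_cmd 1 example.invalid/release:first
build_upgrade_cmd 2 example.invalid/release:second
```

Printing the command rather than executing it keeps the sketch runnable without a cluster; only the first printed line contains the extra flags.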

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 2404cfc to c17d3df Compare April 16, 2026 11:31
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from a4c6432 to 24fa9f8 Compare April 21, 2026 03:37
@openshift-ci
Contributor

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign sosiouxme for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 24fa9f8 to 3821c1b Compare April 21, 2026 10:49
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu
Member Author

The failure on the 4.20 job looks unrelated to this pull request.
Retrying ...

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu hongkailiu force-pushed the fix-e2e-aws-ovn-upgrade-paused branch from 3821c1b to c50517e Compare April 21, 2026 15:06
@hongkailiu
Member Author

The 4.22 job failed, but our fix seems to be working.

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu
Member Author

hongkailiu commented Apr 21, 2026

The 4.20 job has two failures, run1 and run2. Both failed on disruption_tests: [bz-Cluster Version Operator] Verify presence of admin ack gate blocks upgrade until acknowledged.

However, we used to have successful runs such as run3.

I cannot explain what made the difference. Did the test suddenly become too flaky?

@hongkailiu
Member Author

The test suite is too flaky.

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Some cluster operators (COs) may go Progressing before the upgrade to 4.22
(i.e., while still on 4.21), and the exception for this has been removed from the 4.22 test binary.
@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@hongkailiu
Member Author

/retest

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@hongkailiu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused | c50517e | link | unknown | /pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused |
| ci/rehearse/periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused | c50517e | link | unknown | /pj-rehearse periodic-ci-openshift-release-main-nightly-4.20-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade-paused |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hongkailiu
Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused

@openshift-merge-bot
Contributor

@hongkailiu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
