
TEST: Add debug workflow for techpreview serial testing #79352

Open
weliang1 wants to merge 2 commits into openshift:main from weliang1:add-debug-workflow-techpreview-serial

Conversation

@weliang1
Contributor

@weliang1 weliang1 commented May 15, 2026

Adds openshift-e2e-aws-ovn-serial-debug workflow with:

  • cucushift-installer-wait step for extended cluster access
  • SLEEP_DURATION environment variable support (default 2h, max 72h)
  • 12h timeout for debugging scenarios
  • Same configuration as techpreview-serial (AWS + OVN + TechPreview)

This allows QE engineers to debug clusters for up to 8 hours using SLEEP_DURATION environment variable via gangway-cli.

Usage:
gangway-cli --job-name periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug \
  --initial <release-image> \
  --env SLEEP_DURATION=8h \
  --env TEST_ARGS="--dry-run"
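
The usage above could be wrapped in a small helper that checks the duration before printing the command. This is a minimal sketch and not part of the PR: `print_debug_command` is a hypothetical name, the accepted duration characters (digits plus `h`, `m`, `s`) are an assumption about what the wait step accepts, and the function only prints the command instead of invoking gangway-cli.

```shell
# Job name taken from the PR description.
JOB_NAME="periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug"

# Hypothetical helper: print the gangway-cli command for a debug run.
#   $1 = release image, $2 = SLEEP_DURATION (defaults to 2h, the documented default)
print_debug_command() (
  release_image="$1"
  sleep_duration="${2:-2h}"

  # Assumption: durations consist only of digits and the units h, m, s.
  case "$sleep_duration" in
    ''|*[!0-9hms]*)
      echo "unexpected SLEEP_DURATION: $sleep_duration" >&2
      exit 1
      ;;
  esac

  printf 'gangway-cli --job-name %s --initial %s --env SLEEP_DURATION=%s --env TEST_ARGS="--dry-run"\n' \
    "$JOB_NAME" "$release_image" "$sleep_duration"
)
```

Calling `print_debug_command <release-image> 8h` prints the same command as the usage example, on a single line, so it can be reviewed before it is run.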

Summary

This PR adds a new debug workflow to the OpenShift CI configuration in the openshift/release repository. The workflow, openshift-e2e-aws-ovn-serial-debug, provides extended cluster access for debugging TechPreview serial test runs (AWS + OVN) by keeping the cluster available after test execution.

What changed (practical impact)

  • Registered a new periodic/test job (e2e-aws-ovn-techpreview-serial-debug) in the CI-4.22 job config so the debug job can be scheduled and run under the existing CI pipelines.
  • Added a workflow YAML that is a variant of the techpreview-serial e2e workflow but includes a post-test cucushift-installer-wait step to hold the cluster for investigation, followed by gather-network, gather-core-dump, and ipi-deprovision steps.
  • Added workflow metadata and an OWNERS file to declare the workflow’s owner/approver.

These changes affect CI job definitions and step-registry workflows used by OpenShift CI; no application code or public library APIs are changed.

Key features and operational notes

  • Extended cluster access via cucushift-installer-wait in the post phase to enable QA/QE debugging after tests complete.
  • SLEEP_DURATION environment variable controls how long the cluster remains available (default 2h, max 72h).
  • Workflow timeout set to 12 hours for debugging scenarios.
  • Supports passing TEST_ARGS (example in docs uses TEST_ARGS="--dry-run") so engineers can provision clusters without executing the full test suite.
  • Workflow reuses the same TEST_SUITE and cluster configuration as the existing techpreview-serial path (AWS + OVN) but adds debugging support without changing production test semantics.

Files added/updated

  • ci-operator/config/openshift/release/openshift-release-main__ci-4.22.yaml — registers the new test job with a 12h timeout.
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.yaml — debug workflow definition and embedded usage documentation.
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.metadata.json — workflow metadata (owners).
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/OWNERS — approver/reviewer entries.

Risk and review effort

  • Low risk to production test logic; this is an additive debug workflow.
  • Estimated review effort: Medium (workflow and CI job configuration).

Adds openshift-e2e-aws-ovn-serial-debug workflow with:
- cucushift-installer-wait step for extended cluster access
- SLEEP_DURATION environment variable support (default 2h, max 72h)
- 12h timeout for debugging scenarios
- Same configuration as techpreview-serial (AWS + OVN + TechPreview)

This allows QE engineers to debug clusters for up to 8 hours
using SLEEP_DURATION environment variable via gangway-cli.

Usage:
  gangway-cli --job-name periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug \
    --initial <release-image> \
    --env SLEEP_DURATION=8h \
    --env TEST_ARGS="--dry-run"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c5223af1-b146-4b27-bdf0-76fc7c2b9710

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6df83 and 9b8c1d8.

📒 Files selected for processing (2)
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/OWNERS
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.metadata.json
✅ Files skipped from review due to trivial changes (1)
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/OWNERS
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.metadata.json

Walkthrough

Adds a new OpenShift e2e AWS OVN serial-debug CI workflow, its metadata and OWNERS, and registers the e2e-aws-ovn-techpreview-serial-debug test in the 4.22 release CI with TechPreviewNoUpgrade and a 12h timeout.

Changes

AWS OVN Serial Debug Workflow

Layer: Workflow definition, metadata, OWNERS, and CI integration

File(s):
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.yaml
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.metadata.json
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/OWNERS
  • ci-operator/config/openshift/release/openshift-release-main__ci-4.22.yaml

Summary: New workflow openshift-e2e-aws-ovn-serial-debug with ordered pre/test/post steps (including installer wait, network/core-dump gathers, deprovision), metadata mapping and OWNERS, and CI job e2e-aws-ovn-techpreview-serial-debug added to 4.22 with FEATURE_SET: TechPreviewNoUpgrade, observers enabled, and timeout: 12h0m0s.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • dgoodwin
  • enxebre
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'TEST: Add debug workflow for techpreview serial testing' directly and clearly summarizes the main change: adding a new debug workflow for techpreview serial testing.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | PR adds only CI/CD configuration files (YAML, JSON, OWNERS), not Ginkgo test code. No test names with dynamic information found. Check is not applicable to this PR.
Test Structure And Quality | ✅ Passed | PR contains no Ginkgo test code. Changes are purely CI/CD configuration (YAML workflow definition, metadata, and OWNERS files). Custom check for Ginkgo test quality is not applicable.
Microshift Test Compatibility | ✅ Passed | PR adds CI workflow configuration and job definitions only, not new Ginkgo e2e test code. The check for MicroShift compatibility applies only to new test definitions, which are not present in this PR.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds only CI workflow configuration (YAML, JSON, OWNERS files). No new Ginkgo test code is added, so SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds only CI workflow configuration files that orchestrate existing test infrastructure steps. No deployment manifests, operator code, or scheduling constraints are introduced.
Ote Binary Stdout Contract | ✅ Passed | PR contains only YAML/JSON CI configuration files with no Go source code. The OTE Binary Stdout Contract check applies only to executable code in main/init/suite-level functions.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds CI workflow configuration, not new Ginkgo e2e tests. Check applies only to new test code (It(), Describe(), etc.). No test code added, so check is not applicable.


@openshift-ci openshift-ci Bot requested review from jcaamano and smg247 May 15, 2026 17:54
@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: weliang1
Once this PR has been reviewed and has the lgtm label, please assign neisw for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci-operator/config/openshift/release/openshift-release-main__ci-4.22.yaml`:
- Around line 147-157: The CI config defines a new test entry
e2e-aws-ovn-techpreview-serial-debug (workflow:
openshift-e2e-aws-ovn-serial-debug) but the generated Prow job manifests are
missing; run make update locally to regenerate downstream job configs (which
will create/modify files under ci-operator/jobs/) and commit those generated
changes along with your YAML edit so the new test exists in both
ci-operator/config and the generated ci-operator/jobs outputs.

In
`@ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.yaml`:
- Around line 31-32: Update the workflow documentation stanza that describes
SLEEP_DURATION to reflect the effective cap enforced by this workflow: change
the "max 72h" wording to state the 12h enforced timeout (or explicitly state
"max 72h, but capped to 12h in this workflow") so users aren't misled; reference
the SLEEP_DURATION environment variable and the cucushift-installer-wait usage
in the openshift-e2e-aws-ovn-serial-debug-workflow.yaml so it's clear the 12h
wait timeout applies to this workflow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8094d9fc-3eb0-428f-8509-deb1148c4d8a

📥 Commits

Reviewing files that changed from the base of the PR and between a28f9e7 and 5b6df83.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/release/openshift-release-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (3)
  • ci-operator/config/openshift/release/openshift-release-main__ci-4.22.yaml
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.metadata.json
  • ci-operator/step-registry/openshift/e2e/aws/ovn/serial-debug/openshift-e2e-aws-ovn-serial-debug-workflow.yaml

Comment on lines +147 to +157
- as: e2e-aws-ovn-techpreview-serial-debug
  interval: 168h
  steps:
    cluster_profile: openshift-org-aws
    env:
      FEATURE_SET: TechPreviewNoUpgrade
    observers:
      enable:
      - observers-resource-watch
    workflow: openshift-e2e-aws-ovn-serial-debug
  timeout: 12h0m0s

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Confirm the new test stanza exists in ci-operator config
rg -n --iglob '*.yaml' 'as:\s*e2e-aws-ovn-techpreview-serial-debug' ci-operator/config

# Confirm generated Prow job manifests reference the new test/job name
if [ -d ci-operator/jobs ]; then
  rg -n --iglob '*.ya?ml' \
    'periodic-ci-openshift-release-main-ci-4\.22-e2e-aws-ovn-techpreview-serial-debug|e2e-aws-ovn-techpreview-serial-debug' \
    ci-operator/jobs
else
  echo "ci-operator/jobs directory is not present in this checkout."
fi

Repository: openshift/release

Length of output: 182


Run make update to generate downstream Prow job configuration for this new test.

The test entry e2e-aws-ovn-techpreview-serial-debug exists in the CI configuration, but the corresponding generated Prow job config was not found. Per the coding guidelines for ci-operator/config/**/*.yaml, after editing the configuration file, you must run make update to generate the downstream Prow job manifests in ci-operator/jobs/.
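
After `make update`, the presence of the generated job can be checked with a plain text search. The following is a minimal sketch, assuming the generated Prow job files carry the job name in a `name:` field; `job_is_generated` is a hypothetical helper and not part of the repository tooling. Note that the review info above lists ci-operator/jobs/openshift/release/openshift-release-main-periodics.yaml as ignored by path filters, so the finding may already be addressed in the PR.

```shell
# Hypothetical helper: succeed if any file under the jobs directory
# contains "name: <job name>".
#   $1 = job name, $2 = jobs directory (for example ci-operator/jobs)
# This is a substring match, so a longer job name that starts with $1
# would also match.
job_is_generated() (
  job_name="$1"
  jobs_dir="$2"

  # Fail early when the directory is missing from the checkout.
  [ -d "$jobs_dir" ] || exit 1

  grep -rqF -- "name: ${job_name}" "$jobs_dir"
)
```

For example, `job_is_generated periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug ci-operator/jobs && echo present` run from the repository root reports whether the periodic was generated.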


Comment on lines +31 to +32
- Includes cucushift-installer-wait step for extended debugging (up to 12 hours)
- Supports SLEEP_DURATION environment variable (default 2h, max 72h)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align debug-duration docs with the enforced 12h limit.

Line 32 says SLEEP_DURATION supports up to 72h, but Line 15 enforces a 12h wait timeout in this workflow. Please document the effective cap for this workflow to avoid failed expectations.

Suggested doc fix
-    - Includes cucushift-installer-wait step for extended debugging (up to 12 hours)
-    - Supports SLEEP_DURATION environment variable (default 2h, max 72h)
+    - Includes cucushift-installer-wait step for extended debugging (up to 12 hours in this workflow)
+    - Supports SLEEP_DURATION environment variable (default 2h; effective max 12h due to workflow timeout)
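
The "effective max" in this suggestion is the smaller of the requested SLEEP_DURATION and the timeout that applies to the workflow. A minimal sketch of that arithmetic follows, assuming single-unit durations such as `8h` or `90m`; `to_seconds` and `effective_sleep` are hypothetical names, and compound Go-style values like `12h0m0s` are not handled. If the 12h limit is the overall job timeout, it also has to cover installation and the tests, so the time actually left for debugging would be shorter than this upper bound.

```shell
# Convert a single-unit duration ("8h", "90m", "45s") to seconds.
to_seconds() (
  unit="${1#"${1%?}"}"   # last character of the argument
  value="${1%?}"         # everything before the last character

  # Reject empty or non-numeric values.
  case "$value" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; exit 1 ;;
  esac

  case "$unit" in
    h) echo $(( ${value} * 3600 )) ;;
    m) echo $(( ${value} * 60 )) ;;
    s) echo "$value" ;;
    *) echo "unsupported duration: $1" >&2; exit 1 ;;
  esac
)

# Print the effective wait in seconds: min(requested, timeout).
#   $1 = requested SLEEP_DURATION, $2 = timeout
effective_sleep() (
  requested="$(to_seconds "$1")" || exit 1
  cap="$(to_seconds "$2")" || exit 1

  if [ "$requested" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$requested"
  fi
)
```

With these definitions, `effective_sleep 72h 12h` and `effective_sleep 8h 12h` show the difference between a request that is capped and one that fits inside the timeout.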

- Add OWNERS file required by step-registry-metadata check
- Update metadata.json format (auto-generated by make update)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@weliang1: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name | Repo | Type | Reason
periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug | N/A | periodic | Periodic changed

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@weliang1
Contributor Author

/pj-rehearse periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@weliang1
Contributor Author

/cancle

@openshift-ci
Contributor

openshift-ci Bot commented May 16, 2026

@weliang1: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/rehearse/periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug | 9b8c1d8 | link | unknown | /pj-rehearse periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-debug

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


1 participant