Skip to content

CNTRLPLANE-3509: Remove n2/n3/n4 minor release NodePool skew e2e tests#8570

Draft
jparrill wants to merge 1 commit into
openshift:mainfrom
jparrill:CNTRLPLANE-3509
Draft

CNTRLPLANE-3509: Remove n2/n3/n4 minor release NodePool skew e2e tests#8570
jparrill wants to merge 1 commit into
openshift:mainfrom
jparrill:CNTRLPLANE-3509

Conversation

@jparrill
Copy link
Copy Markdown
Contributor

@jparrill jparrill commented May 21, 2026

Summary

  • Remove TestNodePoolPrevReleaseN2, TestNodePoolPrevReleaseN3, and TestNodePoolPrevReleaseN4 e2e test cases along with their CLI flags and option fields
  • Keep only TestNodePoolPrevReleaseN1 (n-1) as the single e2e validation for NodePool version skew

Why

The e2e-aws job defines 6+ releases (latest, initial, n1minor through n4minor). ci-operator imports them concurrently, creating CLI extraction pods in parallel. This triggers a race condition in openshift/ci-tools (pkg/steps/release/import_release.go) where pods disappear or hit UID mismatches before their termination messages can be read. This accounts for ~40-45% of all e2e-aws failures — tests never even execute.

The removed tests are either redundant or already covered by unit tests:

  • N4 (n-4, unsupported skew): Only validates a condition is set to False. Already covered by TestValidateVersionSkew in support/supportedversion/version_test.go
  • N3 (n-3): Creates a real node to validate labels/taints — identical to what N1 already covers
  • N2 (n-2): Same code path as N1 with a different version number

Impact

  • Reduces release imports from 6 to 3, cutting CI time by ~15-30 min per run
  • Significantly reduces exposure to the ci-tools race condition
  • No loss of test coverage: version skew validation is covered by unit tests, labels/taints by N1

A companion PR in openshift/release will remove the corresponding n2minor, n3minor, and n4minor release definitions and dependencies.

Jira: https://issues.redhat.com/browse/CNTRLPLANE-3509

Test plan

  • Verify go build ./test/e2e/... compiles
  • Verify TestValidateVersionSkew unit tests pass (21 cases covering n-3 boundary and n-4 rejection)
  • Verify no remaining references to N2/N3/N4 in test code
  • e2e-aws job passes with reduced release set (after companion openshift/release PR)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Simplified end-to-end test suite by removing test cases that targeted older minor releases (N2–N4).
    • Removed corresponding CLI/options for those older minor-release image selections, leaving only the latest and N1 minor-release paths.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 21, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 21, 2026

@jparrill: This pull request references CNTRLPLANE-3509 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Remove TestNodePoolPrevReleaseN2, TestNodePoolPrevReleaseN3, and TestNodePoolPrevReleaseN4 e2e test cases along with their CLI flags and option fields
  • Keep only TestNodePoolPrevReleaseN1 (n-1) as the single e2e validation for NodePool version skew

Why

The e2e-aws job defines 6+ releases (latest, initial, n1minor through n4minor). ci-operator imports them concurrently, creating CLI extraction pods in parallel. This triggers a race condition in openshift/ci-tools (pkg/steps/release/import_release.go) where pods disappear or hit UID mismatches before their termination messages can be read. This accounts for ~40-45% of all e2e-aws failures — tests never even execute.

The removed tests are either redundant or already covered by unit tests:

  • N4 (n-4, unsupported skew): Only validates a condition is set to False. Already covered by TestValidateVersionSkew in support/supportedversion/version_test.go
  • N3 (n-3): Creates a real node to validate labels/taints — identical to what N1 already covers
  • N2 (n-2): Same code path as N1 with a different version number

Impact

  • Reduces release imports from 6 to 3, cutting CI time by ~15-30 min per run
  • Significantly reduces exposure to the ci-tools race condition
  • No loss of test coverage: version skew validation is covered by unit tests, labels/taints by N1

A companion PR in openshift/release will remove the corresponding n2minor, n3minor, and n4minor release definitions and dependencies.

Jira: https://issues.redhat.com/browse/CNTRLPLANE-3509

Test plan

  • Verify go build ./test/e2e/... compiles
  • Verify TestValidateVersionSkew unit tests pass (21 cases covering n-3 boundary and n-4 rejection)
  • Verify no remaining references to N2/N3/N4 in test code
  • e2e-aws job passes with reduced release set (after companion openshift/release PR)

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1b2a830e-884d-4c46-9c4d-c12783dde4ca

📥 Commits

Reviewing files that changed from the base of the PR and between 8ba8dee and 5d681ab.

📒 Files selected for processing (3)
  • test/e2e/e2e_test.go
  • test/e2e/nodepool_test.go
  • test/e2e/util/options.go
💤 Files with no reviewable changes (3)
  • test/e2e/nodepool_test.go
  • test/e2e/util/options.go
  • test/e2e/e2e_test.go

📝 Walkthrough

Walkthrough

This PR removes test support for older minor releases (N2, N3, N4) across the e2e test infrastructure. The Options struct is updated to remove N2MinorReleaseImage, N3MinorReleaseImage, and N4MinorReleaseImage fields. The TestMain function is updated to remove CLI flag registrations for these three older release images and adds new flags for pull secret and SSH key files. Related test cases (TestNodePoolPrevReleaseN2, TestNodePoolPrevReleaseN3, TestNodePoolPrevReleaseN4) are removed from the test case list, keeping only TestNodePoolPrevReleaseN1.

Suggested reviewers

  • muraee
  • cblecker
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: removing n2/n3/n4 minor release NodePool skew e2e tests, which aligns with all three files modified.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names use static string literals without dynamic values. Removed tests (N2/N3/N4) and retained tests have stable, deterministic names that don't change between runs.
Test Structure And Quality ✅ Passed Remaining tests meet all quality criteria: single responsibility, proper cleanup with defer, timeouts via EventuallyObject/wait.Poll, descriptive assertions, and code consistency across test suite.
Microshift Test Compatibility ✅ Passed PR removes test cases and configurations, not adding new Ginkgo e2e tests. Modified files (test/e2e/*.go) use custom test patterns, not Ginkgo. Check does not apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR removes existing test cases, does not add new Ginkgo e2e tests; uses Go testing.T framework, not Ginkgo. Custom check not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only test files and test utilities (test/e2e/*), not deployment manifests, operator code, or controllers. No scheduling constraints introduced.
Ote Binary Stdout Contract ✅ Passed The PR contains test removals only with no new stdout writes in process-level code. Logging is correctly configured via zap to stderr, with no fmt.Print/klog violations detected in TestMain/main/init.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR removes test cases and configurations, not adding any new Ginkgo e2e tests. The custom check applies only to newly added tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added do-not-merge/needs-area do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 21, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jparrill
Copy link
Copy Markdown
Contributor Author

/area ci-tooling

@jparrill
Copy link
Copy Markdown
Contributor Author

/area testing

@openshift-ci openshift-ci Bot added area/testing Indicates the PR includes changes for e2e testing area/ci-tooling Indicates the PR includes changes for CI or tooling and removed do-not-merge/needs-area labels May 21, 2026
The e2e-aws job defines 6+ releases (latest, initial, n1minor through
n4minor). ci-operator imports them concurrently, creating CLI extraction
pods in parallel. This triggers a race condition in ci-tools where pods
disappear or hit UID mismatches, causing ~40-45% of e2e-aws runs to
fail before any test executes.

Remove the n2, n3, and n4 minor release tests because:

- N4 (n-4, unsupported skew): only validates that the
  SupportedVersionSkew condition is set to False. This is already
  fully covered by TestValidateVersionSkew unit tests in
  support/supportedversion/version_test.go.

- N3 (n-3): creates a real AWS node to validate labels/taints, but
  this is identical to what N1 already covers. Labels and taints
  depend on the NodePool spec, not the release version.

- N2 (n-2): same code path as N1, just a different version number.
  No hypershift logic behaves differently for n-1 vs n-2.

Keeping only N1 (n-1) is sufficient to validate that nodes with an
older release image can provision and join the cluster. This reduces
the job from 6 to 3 release imports, cutting CI time by ~15-30 min
and significantly reducing exposure to the ci-tools race condition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@hypershift-jira-solve-ci
Copy link
Copy Markdown

I have all the information needed. The root cause is clear.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

1: CT1 Title does not start with one of fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build: "e2e: remove n2/n3/n4 minor release NodePool skew tests"
make: *** [Makefile:599: run-gitlint] Error 1

Summary

The gitlint check failed because the commit message title e2e: remove n2/n3/n4 minor release NodePool skew tests uses the prefix e2e:, which is not in the repository's allowed list of Conventional Commits type prefixes. The .gitlint config enforces the contrib-title-conventional-commits rule, which requires the title to start with one of: fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build.

Root Cause

The commit message title uses the non-standard prefix e2e: instead of a Conventional Commits-compliant prefix. The repository's .gitlint configuration (at repo root) enables the contrib-title-conventional-commits rule with an explicit allow-list of types: fix,feat,chore,docs,style,refactor,perf,test,revert,ci,build. The prefix e2e is not on this list.

The correct prefix for this change would be test: (since the PR removes e2e test code) or chore: (since it removes obsolete tests). The Conventional Commits spec requires the format <type>[optional scope]: <description>, so a scoped form like test(e2e): would also be valid if the author wants to retain the e2e context.

Recommendations

Amend the commit message to use a valid Conventional Commits prefix. Suggested options:

  1. test: remove n2/n3/n4 minor release NodePool skew tests — simplest fix, uses the test type since this PR removes test code.
  2. test(e2e): remove n2/n3/n4 minor release NodePool skew tests — preserves the e2e scope context while complying with the format.
  3. chore: remove n2/n3/n4 minor release NodePool skew e2e tests — if the author considers this a maintenance/cleanup task.

To fix:

git commit --amend -m "test: remove n2/n3/n4 minor release NodePool skew tests"
git push --force-with-lease
Evidence
Evidence Detail
Commit title e2e: remove n2/n3/n4 minor release NodePool skew tests
Gitlint rule violated CT1 (contrib-title-conventional-commits)
Allowed prefixes fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build
Config file .gitlint at repo root enables contrib=contrib-title-conventional-commits
Prefix used e2e: — not in the allowed list
Job log make run-gitlint exited with code 1 → GitHub Actions exit code 2

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jparrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.40%. Comparing base (36dfb1b) to head (5d681ab).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8570   +/-   ##
=======================================
  Coverage   40.40%   40.40%           
=======================================
  Files         755      755           
  Lines       93235    93235           
=======================================
  Hits        37675    37675           
  Misses      52858    52858           
  Partials     2702     2702           
Flag Coverage Δ
cmd-support 34.44% <ø> (ø)
cpo-hostedcontrolplane 41.76% <ø> (ø)
cpo-other 40.31% <ø> (ø)
hypershift-operator 50.72% <ø> (ø)
other 31.54% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ci-tooling Indicates the PR includes changes for CI or tooling area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants