
OCPBUGS-83389: fix(supportedversion): normalize OCP 5.x versions for skew and release validation #8225

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:main from bryan-cox:fix-version-skew-5x
Apr 14, 2026

Conversation

@bryan-cox
Member

@bryan-cox bryan-cox commented Apr 13, 2026

What this PR does / why we need it:

OCP 5.0 is equivalent to OCP 4.23 (dual versioning). ValidateVersionSkew rejects cross-major-version skew (e.g., HC=5.0 with NP=4.22) because it compares major versions directly, and IsValidReleaseVersion rejects 5.x releases when bounds are expressed in 4.x.

This PR adds normalizeToV4/denormalizeFromV4 helpers that map 5.x → 4.(23+x) before comparison and back for error messages. Both ValidateVersionSkew and IsValidReleaseVersion now use normalization so the n-3 skew policy and release bounds work correctly across the 4.x/5.x boundary.
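
A minimal sketch of the mapping described above, assuming a simplified `Version` struct in place of the semver type the real package uses. The names `normalizeToV4`, `denormalizeFromV4`, and `v5MinorOffset` come from this PR, but the bodies here are illustrative guesses and not the merged implementation.

```go
package main

import "fmt"

// Version is a hypothetical stand-in for the semver type used by the real code.
type Version struct {
	Major, Minor, Patch uint64
}

// String renders the version as MAJOR.MINOR.PATCH.
func (v Version) String() string {
	return fmt.Sprintf("%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// v5MinorOffset encodes the 5.0 == 4.23 mapping named in the PR description.
const v5MinorOffset = 23

// normalizeToV4 maps 5.x to 4.(23+x); any other version is returned unchanged.
func normalizeToV4(v Version) Version {
	if v.Major == 5 {
		return Version{Major: 4, Minor: v.Minor + v5MinorOffset, Patch: v.Patch}
	}
	return v
}

// denormalizeFromV4 maps 4.(23+x) back to 5.x, e.g. for error messages.
// Assumption: every 4.x minor at or above the offset is displayed as 5.x.
func denormalizeFromV4(v Version) Version {
	if v.Major == 4 && v.Minor >= v5MinorOffset {
		return Version{Major: 5, Minor: v.Minor - v5MinorOffset, Patch: v.Patch}
	}
	return v
}

func main() {
	fmt.Println(normalizeToV4(Version{5, 0, 0}))      // 4.23.0
	fmt.Println(normalizeToV4(Version{4, 22, 1}))     // 4.22.1
	fmt.Println(denormalizeFromV4(Version{4, 24, 3})) // 5.1.3
}
```

With both sides in the same 4.x minor space, a HostedCluster at 5.0 and a NodePool at 4.22 differ by a single minor, so a comparison of minors no longer needs to look at the major version.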

Which issue(s) this PR fixes:

Fixes https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/77738/rehearse-77738-pull-ci-openshift-hypershift-main-e2e-aws/2043772424360562688

TestNodePoolPrevReleaseN1 fails with:

SupportedVersionSkew=True, got SupportedVersionSkew=False: UnsupportedSkew(NodePool major version 4 must match HostedCluster major version 5)

Special notes for your reviewer:

  • normalizeToV4 maps 5.x → 4.(23+x); denormalizeFromV4 converts back for error messages
  • The v5MinorOffset constant (23) encodes the 5.0 == 4.23 mapping
  • IsValidReleaseVersion is also normalized to handle 5.x release images against 4.x bounds

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Bug Fixes

    • Version validation now normalizes certain 5.x releases into 4.x equivalents for range and skew checks, relaxes strict major-mismatch rejection between HostedCluster and NodePool, and yields clearer, more specific version-compare and "latest supported" messages; validations also handle absent/current-version cases more consistently.
  • Tests

    • Added and expanded tests for normalization/denormalization, cross-major and patch/pre-release scenarios; updated assertions and expected error-message text to match clarified messages.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Removed pointer-based clamping and the maxInt64 helper; added normalization helpers normalizeToV4 and denormalizeFromV4 (using v5MinorOffset = 23) to map 5.x into a 4.x minor space. IsValidReleaseVersion, LookupLatestSupportedRelease, and ValidateVersionSkew now normalize versions before comparisons (including latest/minimum supported checks, pre-4.8 gate, y-stream and CNI-specific constraints, and n-3 skew), and report denormalized display values. Tests were extended for normalization/denormalization, cross-major equivalence, and parallel execution. getReleaseImage now passes a *semver.Version (possibly nil) for the current version.

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (9 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Stable And Deterministic Test Names | ✅ Passed | All test names in modified files are static and deterministic with no dynamic values, UUIDs, timestamps, or generated identifiers. |
| Test Structure And Quality | ✅ Passed | Tests follow standard Go testing best practices with single responsibilities, proper setup/cleanup, appropriate timeouts, and consistent patterns. |
| Microshift Test Compatibility | ✅ Passed | PR modifies standard Go unit tests using testing.T framework, not Ginkgo e2e tests. MicroShift compatibility check applies only to Ginkgo e2e tests. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests were added in this PR; all test files use standard Go testing.T framework unit tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR contains only version validation and normalization logic changes with no topology-aware or scheduling-related modifications. |
| Ote Binary Stdout Contract | ✅ Passed | PR modifications do not introduce stdout writes in process-level code; all changes confined to business logic, helper utilities, and test code. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR modifies only standard Go unit tests with testing.T functions, not Ginkgo e2e tests. No IPv4 assumptions or external connectivity requirements found. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: normalizing OCP 5.x versions for version skew and release validation logic, which aligns with the primary refactoring across multiple files. |

@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Apr 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
@bryan-cox bryan-cox marked this pull request as ready for review April 13, 2026 22:12
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2026
@openshift-ci openshift-ci bot requested review from enxebre and jparrill April 13, 2026 22:13
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
support/supportedversion/version_test.go (1)

424-517: Add caller-level coverage for IsValidReleaseVersion.

These additions exercise the helper mapping and skew path, but the other changed normalization consumer still has no 5.x boundary cases. A few TestIsValidReleaseVersion scenarios for 5.0 ↔ 4.23, clamped maxSupportedVersion, and an unsupported future major would make this change much safer.

Also applies to: 519-682

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/supportedversion/version_test.go` around lines 424 - 517, Add unit
tests for IsValidReleaseVersion that cover the 5.x ↔ 4.23 normalization mapping
and boundary/skew behavior: create cases where input is "5.0.0" (expect valid
via normalizeToV4 → "4.23.0"), a 5.x with patch/prerelease to ensure
preservation, a case where maxSupportedVersion is clamped (set a lower max and
assert IsValidReleaseVersion respects it), and a case with an unsupported future
major (e.g., major > supported range) to assert it returns false; reference the
IsValidReleaseVersion function and the helper functions normalizeToV4 and
denormalizeFromV4 when adding these tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@support/supportedversion/version.go`:
- Around line 130-132: The error messages still include the original
maxSupportedVersion even after you clamp it, causing a mismatch between
validation and message; update the error paths to reference the clamped variable
normalizedMax (which is adjusted against normalizedLatest) instead of
maxSupportedVersion wherever the error string is constructed (the occurrences
around the normalizedMax.GT(normalizedLatest) block and the similar block at
lines ~158-160) so the returned error reflects the actual capped maximum.
- Around line 355-370: normalizeToV4 currently silently returns unknown majors
unchanged which can misclassify inputs like 6.0; change normalizeToV4 to return
(semver.Version, error) and validate majors: if Major==5 map to 4.(23+minor) as
before, if Major==4 return the version, otherwise return a zero value and a
descriptive error (e.g., "unsupported major version") so callers (e.g., places
that compare against minSupportedVersion) must handle the error instead of
assuming a 4.x outcome; update all call sites to propagate or handle this error
and remove the implicit major assumption in downstream comparisons.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 06c22e12-1f6f-47c1-ba4d-caec94a75fea

📥 Commits

Reviewing files that changed from the base of the PR and between dd2dae7 and f80df4d.

📒 Files selected for processing (2)
  • support/supportedversion/version.go
  • support/supportedversion/version_test.go

Comment thread support/supportedversion/version.go
Comment thread support/supportedversion/version.go Outdated
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
support/supportedversion/version_test.go (1)

477-477: Minor: Inconsistent Gomega initialization style.

The new tests use NewWithT(t) while the rest of the file uses NewGomegaWithT(t). Both are functionally equivalent aliases, but consistency would improve readability.

♻️ Optional: Align with existing style
-			g := NewWithT(t)
+			g := NewGomegaWithT(t)

Apply to lines 477, 527, and 687.

Also applies to: 527-527, 687-687

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/supportedversion/version_test.go` at line 477, Replace the
inconsistent Gomega initialization calls: locate uses of NewWithT(t) in the test
file (they appear near other Gomega setup code) and change them to
NewGomegaWithT(t) so they match the existing style; update each occurrence of
NewWithT(t) to NewGomegaWithT(t) (e.g., in the test functions that call
NewWithT) to keep the file consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 8930884b-9b68-4f93-a101-cc4bb5c5e1c5

📥 Commits

Reviewing files that changed from the base of the PR and between f80df4d and b63d6ab.

📒 Files selected for processing (2)
  • support/supportedversion/version.go
  • support/supportedversion/version_test.go

@openshift-ci openshift-ci bot added the area/cli Indicates the PR includes changes for CLI label Apr 14, 2026
@bryan-cox bryan-cox force-pushed the fix-version-skew-5x branch from a351710 to d76616e Compare April 14, 2026 00:12
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
support/supportedversion/version.go (1)

249-258: Normalize the lower bound here as well.

The upper bound now lives in the normalized 4.x space, but the lower bound still formats minSupportedVersion directly. That works for today’s 4.x minimums, yet the next time the minimum supported release moves into 5.x, this query will silently fall back to >=4.0 unless the lower side is normalized too. A small special-case for the existing 0.0.0 skip-validation sentinel keeps the current behavior intact.

♻️ Suggested direction
 	minSupportedVersion := GetMinSupportedVersion(hc)
+	normalizedMin := minSupportedVersion
+	if minSupportedVersion.Major != 0 {
+		var err error
+		normalizedMin, err = normalizeToV4(minSupportedVersion)
+		if err != nil {
+			return "", fmt.Errorf("failed to normalize minimum supported version: %w", err)
+		}
+	}
 
 	// Normalize LatestSupportedVersion to 4.x so the filter range is correct
 	// even when LatestSupportedVersion is 5.x (e.g. 5.0 -> 4.23).
 	normalizedLatest, err := normalizeToV4(LatestSupportedVersion)
 	if err != nil {
 		return "", fmt.Errorf("failed to normalize latest supported version: %w", err)
 	}
 
 	prefix := "https://multi.ocp.releases.ci.openshift.org/api/v1/releasestream/4-stable-multi/latest"
 	filter := fmt.Sprintf("in=>4.%d.%d+<+4.%d.0-a",
-		minSupportedVersion.Minor, minSupportedVersion.Patch, normalizedLatest.Minor+1)
+		normalizedMin.Minor, normalizedMin.Patch, normalizedLatest.Minor+1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/supportedversion/version.go` around lines 249 - 258, The lower bound
should be normalized to 4.x like the upper bound: call normalizeToV4 on
minSupportedVersion (e.g., produce normalizedMin via
normalizeToV4(minSupportedVersion)) and use normalizedMin.Minor and
normalizedMin.Patch when building filter; preserve the existing sentinel
behavior by bypassing normalization when minSupportedVersion equals the
skip-validation sentinel (0.0.0) so the old >=4.0 behavior remains. Update the
filter construction to reference normalizedMin instead of minSupportedVersion
and reuse normalizedLatest as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@support/supportedversion/version.go`:
- Around line 378-400: normalizeToV4 adds 5.x -> 4.(23+x) mapping but
GetKubeVersionForSupportedVersion still does an exact lookup against
ocpVersionToKubeVersion, so LatestSupportedVersion (5.0.0 → 4.23.0) is reported
as unknown; update ocpVersionToKubeVersion to include the 4.23.0 boundary
mapping to the correct Kubernetes version (add the 4.23.0 entry) so
GetKubeVersionForSupportedVersion can resolve normalized 5.0/4.23 targets;
verify behavior using LatestSupportedVersion and the
GetKubeVersionForSupportedVersion function after the table update.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: fa16514e-c2ed-4a84-aae6-4f02ff345490

📥 Commits

Reviewing files that changed from the base of the PR and between d76616e and d64fef1.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • support/supportedversion/version.go
  • support/supportedversion/version_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • support/supportedversion/version_test.go

Comment thread support/supportedversion/version.go
@codecov

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 64.21053% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.65%. Comparing base (3dbb066) to head (55c6fca).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| support/supportedversion/version.go | 62.63% | 26 Missing and 8 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8225      +/-   ##
==========================================
+ Coverage   34.63%   34.65%   +0.01%     
==========================================
  Files         767      767              
  Lines       93200    93256      +56     
==========================================
+ Hits        32282    32318      +36     
- Misses      58243    58259      +16     
- Partials     2675     2679       +4     
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...erator/controllers/nodepool/nodepool_controller.go | 39.65% <100.00%> (+0.07%) ⬆️ |
| support/supportedversion/version.go | 60.96% <62.63%> (+0.52%) ⬆️ |

@bryan-cox
Member Author

/test e2e-aws

@sjenning
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2026
@openshift-merge-bot
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@cwbotbot

cwbotbot commented Apr 14, 2026

Test Results

e2e-aws

e2e-aks

Failed Tests

Total failed tests: 9

  • TestAutoscaling
  • TestAzureScheduler
  • TestCreateCluster
  • TestCreateClusterCustomConfig
  • TestCreateClusterDefaultSecurityContextUID

... and 4 more failed tests

@sjenning
Contributor

/pipeline required

@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@sjenning
Contributor

/verified by e2e tests

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2043963489055150080 | Cost: $1.6794513499999995

Failed tests
  • TestAzureScheduler
  • TestUpgradeControlPlane
  • TestAutoscaling
  • TestCreateCluster
  • TestCreateClusterCustomConfig

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2026
…dation

Update TestValidMinorVersionCompatibility to expect the new error
format from ValidateVersionSkew when a 5.x NodePool is created
against a 4.x HostedCluster.

Signed-off-by: Bryan Cox <brcox@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
…ease validation

OCP uses dual versioning where 5.0 == 4.23 (both succeed 4.22).
ValidateVersionSkew previously rejected cross-major-version skew
(e.g. HC=5.0, NP=4.22) with a major-version-mismatch error,
breaking CI tests like TestNodePoolPrevReleaseN1.

- Add normalizeToV4/denormalizeFromV4 helpers that map 5.x to
  4.(23+x) for consistent comparison across the 4.x/5.x boundary
- Rewrite ValidateVersionSkew and IsValidReleaseVersion to normalize
  before comparing, preserving original version numbers in errors
- Fix NodePool caller to pass nil instead of &semver.Version{} when
  no current version exists, matching hostedcluster and karpenter
- Normalize input in GetKubeVersionForSupportedVersion so 5.0
  resolves via the 4.23 lookup table; add 4.23.0 -> 1.36.0 mapping
- Fix LookupLatestSupportedRelease filter for 5.x by normalizing
  LatestSupportedVersion before building URL bounds
- Fix uint64 underflow in subtractMinor, remove dead maxInt64 helper
- Hoist repeated semver.MustParse constants to package-level vars
- Add comprehensive tests covering cross-major-version scenarios,
  normalization edge cases, and IsValidReleaseVersion 5.x boundary

Signed-off-by: Bryan Cox <brcox@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
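
The skew-related items in the commit message above can be sketched as follows. This is an illustration under stated assumptions: `Version` is a simplified stand-in for the semver type, the signature of `subtractMinor` is guessed, and `skewSupported` and `maxMinorSkew` are hypothetical names. It also assumes a NodePool newer than its HostedCluster is rejected, which the source implies but does not spell out.

```go
package main

import "fmt"

// Version is a hypothetical stand-in for the semver type used by the real code.
type Version struct {
	Major, Minor, Patch uint64
}

const (
	v5MinorOffset = 23 // 5.0 == 4.23, as stated in the PR
	maxMinorSkew  = 3  // the n-3 policy named in the PR description
)

// normalizeToV4 maps 5.x to 4.(23+x); any other version is returned unchanged.
func normalizeToV4(v Version) Version {
	if v.Major == 5 {
		return Version{Major: 4, Minor: v.Minor + v5MinorOffset, Patch: v.Patch}
	}
	return v
}

// subtractMinor returns minor-n, clamped at zero. Plain uint64 subtraction
// would wrap around to a huge value when n > minor.
func subtractMinor(minor, n uint64) uint64 {
	if n > minor {
		return 0
	}
	return minor - n
}

// skewSupported reports whether the NodePool version is no newer than the
// HostedCluster version and at most maxMinorSkew minors behind it, comparing
// both in the normalized 4.x space.
func skewSupported(hostedCluster, nodePool Version) bool {
	hc, np := normalizeToV4(hostedCluster), normalizeToV4(nodePool)
	if hc.Major != np.Major || np.Minor > hc.Minor {
		return false
	}
	return np.Minor >= subtractMinor(hc.Minor, maxMinorSkew)
}

func main() {
	fmt.Println(skewSupported(Version{5, 0, 0}, Version{4, 22, 0})) // true
	fmt.Println(skewSupported(Version{5, 0, 0}, Version{4, 19, 0})) // false
	fmt.Println(skewSupported(Version{4, 22, 0}, Version{5, 0, 0})) // false
	fmt.Println(subtractMinor(2, 3))                                // 0
}
```

The first call mirrors the failing TestNodePoolPrevReleaseN1 scenario (HC=5.0, NP=4.22): after normalization the pair is 4.23 and 4.22, which is within the allowed window.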
@bryan-cox bryan-cox force-pushed the fix-version-skew-5x branch from 057042f to 55c6fca Compare April 14, 2026 09:39
@enxebre
Member

enxebre commented Apr 14, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@bryan-cox
Member Author

/verified by e2e

@openshift-ci-robot

@bryan-cox: This PR has been marked as verified by e2e.

Details

In response to this:

/verified by e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bryan-cox bryan-cox changed the title fix(supportedversion): normalize OCP 5.x versions for skew and release validation OCPBUGS-83389: fix(supportedversion): normalize OCP 5.x versions for skew and release validation Apr 14, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 14, 2026
@openshift-ci-robot

@bryan-cox: This pull request references Jira Issue OCPBUGS-83389, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@bryan-cox
Member Author

/test e2e-aks

1 similar comment
@bryan-cox
Member Author

/test e2e-aks

@bryan-cox
Member Author

/test e2e-aks-4-22

Hoping it was a flake because the issue that happened should not happen

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2043998830898188288 | Cost: $1.7423763000000003 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

I now have all the evidence needed. Both failures are CI infrastructure issues completely unrelated to the PR. Let me produce the final report.

Test Failure Analysis Complete

Job Information

Job 1: e2e-aks-4-22

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-aks-4-22
  • Build ID: 2044013315859943424
  • Target: e2e-aks-4-22
  • Result: CI infrastructure failure (release import)

Job 2: e2e-aws-4-22

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-aws-4-22
  • Build ID: 2043989175996256256
  • Target: e2e-aws-4-22
  • Result: 1 test failure out of 426 (AWS API throttling)

Test Failure Analysis

Error

Job 1 (e2e-aks-4-22):
step [release:latest-418] failed: failed to get CLI image: unable to wait for the 'cli'
image in the stable stream to populate: failed to import tag(s) [cli] on image stream
ci-op-gxjkzd9y/stable-latest-418 because of missing definition in the spec

Job 2 (e2e-aws-4-22):
TestCreateCluster — failed to create cluster, tearing down: failed to create infra:
cannot create VPC S3 endpoint: operation error EC2: CreateVpcEndpoint, exceeded maximum
number of attempts, 11, https response error StatusCode: 503, RequestID: 2013a3d8-...,
api error RequestLimitExceeded: Request limit exceeded. Account 820196288204 has been
throttled on ec2:CreateVpcEndpoint because it exceeded its request rate limit.

Summary

Both failures are CI infrastructure issues completely unrelated to the PR's code changes. Job 1 (e2e-aks-4-22) failed during release import setup — ci-operator could not populate the stable-latest-418 imagestream for OCP 4.18. The full release mirror pod (release-images-latest-418) was never created, leaving only 1 tag (cli) vs the 194–197 tags successfully imported for OCP 4.19–4.22. No test code from the PR was ever executed. Job 2 (e2e-aws-4-22) ran 426 tests with 400 passing and 25 skipped; the sole failure was TestCreateCluster, which failed during AWS infrastructure provisioning when the CI account hit EC2 API rate limits on CreateVpcEndpoint (HTTP 503 after 11 retries). The PR modifies version normalization logic in supportedversion, which has no relationship to release image imports or AWS VPC endpoint creation.

Root Cause

Job 1 (e2e-aks-4-22) — OCP 4.18 Release Import Failure:

ci-operator failed to fully import the OCP 4.18 release payload into the stable-latest-418 imagestream. The sequence was:

  1. ci-operator resolved latest-418 to registry.ci.openshift.org/ocp/release:4.18.0-0.ci-2026-04-13-223642
  2. It imported the release tag into the release:latest-418 imagestream (with retry conflicts — 4 attempts before success at 11:22:44)
  3. It ran the CLI extraction pod release-images-latest-418-cli, which succeeded in 2 seconds
  4. However, the full release mirror pod release-images-latest-418 was never created — unlike 4.19/4.20/4.21/4.22 which all had their mirror pods run successfully
  5. As a result, stable-latest-418 had only 1 spec tag (cli) vs 194–197 tags for all other versions
  6. ci-operator then failed after ~21 minutes waiting for the cli tag to populate correctly in the stable stream

The fact that ci-operator encountered "Unable to create image stream import up to conflicts" 3 times before succeeding suggests contention or an issue with the OCP 4.18 release image. The full mirror pod was never scheduled, indicating ci-operator's step graph determined it couldn't proceed after the conflicts. This is a ci-operator infrastructure issue, likely related to the OCP 4.18 CI release stream health.

Job 2 (e2e-aws-4-22) — AWS API Rate Limiting:

TestCreateCluster failed at hypershift_framework.go:501 during infrastructure creation. The AWS account 820196288204 was throttled on ec2:CreateVpcEndpoint; the SDK gave up after its maximum of 11 attempts, with the final response an HTTP 503 (RequestLimitExceeded). This is a transient AWS rate-limiting issue affecting the shared CI account, unrelated to any code changes. Of the remaining 425 tests, 400 passed and 25 were skipped.

Recommendations
  1. Retest both jobs — Both failures are transient infrastructure issues. Use /retest or individually trigger /test e2e-aks-4-22 and /test e2e-aws-4-22.

  2. No code changes needed — The PR's supportedversion normalization changes are not exercised by either failure path. The AKS job never reached test execution, and the AWS job failure occurred during AWS infrastructure provisioning.

  3. If e2e-aks-4-22 fails again on 4.18 import — This may indicate a systemic issue with the OCP 4.18 CI release stream. Consider filing a ticket with the Test Platform team or checking if OCP 4.18 CI releases are still being produced (4.18 may be approaching or past EOL).

Evidence
| Evidence | Detail |
| --- | --- |
| AKS job failure step | [release:latest-418]: Import the release payload "latest-418" from an external source |
| AKS stable-latest-418 tags | 1 spec tag (cli only) vs 194–197 for 4.19/4.20/4.21/4.22 |
| AKS full mirror pod | release-images-latest-418 never created; release-images-latest-419/420/421/422 all ran successfully |
| AKS import conflicts | 3 "Unable to create image stream import up to conflicts" errors on the release tag import before it succeeded |
| AKS failure timing | Created 11:22:34 → error at 11:44:14 (21 min timeout) |
| AWS failed test | TestCreateCluster, 1 of 426 tests (400 passed, 25 skipped) |
| AWS error source | hypershift_framework.go:501 → AWS EC2 API CreateVpcEndpoint |
| AWS HTTP status | 503 Service Unavailable after 11 attempts |
| AWS throttle reason | RequestLimitExceeded on account 820196288204 |
| PR scope | supportedversion package, version normalization logic for OCP 5.x |
| PR relevance | None: neither failure path exercises version normalization code |

@sjenning
Contributor

/override ci/prow/e2e-aks ci/prow/e2e-aks-4-22 ci/prow/e2e-aws-4-22

@openshift-ci
Contributor

openshift-ci bot commented Apr 14, 2026

@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aks, ci/prow/e2e-aks-4-22, ci/prow/e2e-aws-4-22

Details

In response to this:

/override ci/prow/e2e-aks ci/prow/e2e-aks-4-22 ci/prow/e2e-aws-4-22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Apr 14, 2026

@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

@sjenning
Contributor

/jira refresh

@openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Apr 14, 2026.
@openshift-ci-robot

@sjenning: This pull request references Jira Issue OCPBUGS-83389, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot merged commit f2f826b into openshift:main on Apr 14, 2026
31 checks passed
@openshift-ci-robot

@bryan-cox: Jira Issue Verification Checks: Jira Issue OCPBUGS-83389
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-83389 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Details

In response to this:

What this PR does / why we need it:

OCP 5.0 is equivalent to OCP 4.23 (dual versioning). ValidateVersionSkew rejects cross-major-version skew (e.g., HC=5.0 with NP=4.22) because it compares major versions directly, and IsValidReleaseVersion rejects 5.x releases when bounds are expressed in 4.x.

This PR adds normalizeToV4/denormalizeFromV4 helpers that map 5.x → 4.(23+x) before comparison and back for error messages. Both ValidateVersionSkew and IsValidReleaseVersion now use normalization so the n-3 skew policy and release bounds work correctly across the 4.x/5.x boundary.

Which issue(s) this PR fixes:

Fixes https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/77738/rehearse-77738-pull-ci-openshift-hypershift-main-e2e-aws/2043772424360562688

TestNodePoolPrevReleaseN1 fails with:

SupportedVersionSkew=True, got SupportedVersionSkew=False: UnsupportedSkew(NodePool major version 4 must match HostedCluster major version 5)

Special notes for your reviewer:

  • normalizeToV4 maps 5.x → 4.(23+x); denormalizeFromV4 converts back for error messages
  • The v5MinorOffset constant (23) encodes the 5.0 == 4.23 mapping
  • IsValidReleaseVersion is also normalized to handle 5.x release images against 4.x bounds

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Bug Fixes

  • Version validation now normalizes certain 5.x releases into 4.x equivalents for range and skew checks, relaxes strict major-mismatch rejection between HostedCluster and NodePool, and yields clearer, more specific version-compare and "latest supported" messages; validations also handle absent/current-version cases more consistently.

  • Tests

  • Added and expanded tests for normalization/denormalization, cross-major and patch/pre-release scenarios; updated assertions and expected error-message text to match clarified messages.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/cli: Indicates the PR includes changes for CLI.
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator (in an OCP release).
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API (outside an OCP release).
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria.

5 participants