feat(gcp): GCP-504 NodePool Platform Conditions#8917

Open
thetechnick wants to merge 1 commit into
openshift:mainfrom
thetechnick:gcp-504-nodepool-platform-conditions

Conversation

@thetechnick thetechnick commented Jul 3, 2026


What this PR does / why we need it:

Add GCP-specific conditions to NodePool status, to allow users to diagnose image resolution failures.

This functionality is modeled after the AWS implementation.

Which issue(s) this PR fixes:

Fixes #GCP-504

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • Enhanced status reporting for GCP-based node pools, indicating whether the platform image type is valid and ready.
  • Bug Fixes

    • Improved reporting for GCP node pool validation failures when required platform configuration or image resolution is unavailable.
    • Refined network CIDR conflict messaging to present overlapping details more clearly.
  • Tests

    • Added coverage for GCP node pool platform image condition behavior across success and failure scenarios.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2026

openshift-ci Bot commented Jul 3, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/gcp PR/issue for GCP (GCPPlatform) platform labels Jul 3, 2026

openshift-ci Bot commented Jul 3, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: thetechnick
Once this PR has been reviewed and has the lgtm label, please assign devguyio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented Jul 3, 2026

Contributor
📝 Walkthrough

Walkthrough

This change adds GCP-specific NodePool condition handling. setPlatformConditions now routes GCP NodePools to a new setGCPConditions helper, which checks the HostedCluster GCP platform config, resolves the GCP machine image, and sets NodePoolValidPlatformImageType to true or false with the corresponding reason and message. A small CIDR conflict message aggregation loop was also adjusted. Unit tests cover the new GCP condition behavior.
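
The flow described in the walkthrough can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the `Condition`, `NodePool`, and `HostedCluster` types, the `setCondition` helper, the `resolveImage` callback, the `errImageNotFound` value, and the literal condition strings are simplified, hypothetical stand-ins for the HyperShift API types, `SetStatusCondition`, and `resolveGCPImage`.

```go
package main

import (
	"errors"
	"fmt"
)

// Condition is a simplified stand-in for hyperv1.NodePoolCondition.
type Condition struct {
	Type, Status, Reason, Message string
}

// NodePool and HostedCluster are hypothetical, trimmed-down models of the API types.
type NodePool struct {
	ReleaseImage string
	Conditions   []Condition
}

type HostedCluster struct {
	HasGCPConfig bool // stands in for Spec.Platform.GCP != nil
}

// Illustrative values only; the real constants live in the hyperv1 package.
const validPlatformImage = "ValidPlatformImage"

var errImageNotFound = errors.New("no image found for release")

// setCondition replaces a condition of the same type or appends a new one.
func setCondition(conds *[]Condition, c Condition) {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			(*conds)[i] = c
			return
		}
	}
	*conds = append(*conds, c)
}

// setGCPConditions follows the described flow: check the HostedCluster config,
// resolve the image, then record the outcome as a condition.
func setGCPConditions(np *NodePool, hc *HostedCluster, resolveImage func(*NodePool) (string, error)) error {
	if !hc.HasGCPConfig {
		// Hypothetical error text; the PR's wording is not shown in the thread.
		return errors.New("hosted cluster GCP platform config is missing")
	}
	img, err := resolveImage(np)
	if err != nil {
		setCondition(&np.Conditions, Condition{
			Type: validPlatformImage, Status: "False", Reason: "ValidationFailed",
			Message: fmt.Sprintf("Couldn't discover a GCP machine image for release image %q: %s", np.ReleaseImage, err),
		})
		return fmt.Errorf("couldn't discover a GCP machine image for release image: %w", err)
	}
	setCondition(&np.Conditions, Condition{
		Type: validPlatformImage, Status: "True", Reason: "AsExpected",
		Message: fmt.Sprintf("Bootstrap GCP machine image is %q", img),
	})
	return nil
}

func main() {
	hc := &HostedCluster{HasGCPConfig: true}
	np := &NodePool{ReleaseImage: "example.test/release:1"}

	found := func(*NodePool) (string, error) { return "example-image", nil }
	missing := func(*NodePool) (string, error) { return "", errImageNotFound }

	fmt.Println(setGCPConditions(np, hc, found), np.Conditions)
	fmt.Println(setGCPConditions(np, hc, missing), np.Conditions)
}
```

Passing the resolver as a callback is only a convenience for this sketch: it lets both the success and failure branches be exercised without a real release image.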

Sequence Diagram(s)

sequenceDiagram
  participant NodePoolReconciler
  participant HostedCluster
  participant resolveGCPImage

  NodePoolReconciler->>NodePoolReconciler: setPlatformConditions(GCPPlatform)
  NodePoolReconciler->>HostedCluster: check Spec.Platform.GCP
  alt HostedCluster GCP config missing
    NodePoolReconciler-->>NodePoolReconciler: return error
  else HostedCluster GCP config present
    NodePoolReconciler->>resolveGCPImage: resolveGCPImage(...)
    alt resolution fails
      resolveGCPImage-->>NodePoolReconciler: error
      NodePoolReconciler->>NodePoolReconciler: set NodePoolValidPlatformImageType=False
    else resolution succeeds
      resolveGCPImage-->>NodePoolReconciler: image
      NodePoolReconciler->>NodePoolReconciler: set NodePoolValidPlatformImageType=True
    end
  end

Compact metadata

  • Type: Feature addition, minor logic adjustment
  • Files changed: 3
  • Lines changed: +156/-1

Related issues: None specified

Related PRs: None specified

Suggested labels: gcp, nodepool, conditions, tests

Suggested reviewers: None specified

Poem
GCP paths now light the way,
Conditions speak in true or nay,
CIDR text is trimmed with care,
Tests confirm the state is there.

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | The new/updated test titles in gcp_test.go are static strings; no dynamic IDs, timestamps, node/namespace names, or other changing values were found. |
| Test Structure And Quality | ✅ Passed | The new table-driven unit test is single-purpose, has no cluster resources or waits, and follows existing nodepool test patterns. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Changed files only add GCP NodePool condition handling and machine-template validation; no pod affinity, nodeSelector, replica, PDB, or topology-spread logic. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR only adds unit tests and controller logic; no new Ginkgo e2e tests, IPv4 hardcodes, or external connectivity requirements were added. |
| No-Weak-Crypto | ✅ Passed | Touched GCP condition code and tests contain no MD5/SHA1/DES/RC4/ECB, custom crypto, or secret comparisons. |
| Container-Privileges | ✅ Passed | No privileged/container-manifest settings were added; the touched files are NodePool Go logic/tests, and repo search found no matching privilege flags. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No new logging was added; the GCP path only sets status conditions and returns errors, and the existing logs in conditions.go don't emit secrets or tokens. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the PR’s main change: adding GCP-specific NodePool platform conditions. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/gcp.go (1)

32-49: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optional: avoid if/else after an early return.

Since the if branch already returns, the else block can be flattened for readability (common Go idiom / lint preference, e.g. revive's indent-error-flow).

♻️ Proposed refactor
-	if img, err := resolveGCPImage(nodePool, releaseImage); err != nil {
-		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
-			Type:               hyperv1.NodePoolValidPlatformImageType,
-			Status:             corev1.ConditionFalse,
-			Reason:             hyperv1.NodePoolValidationFailedReason,
-			Message:            fmt.Sprintf("Couldn't discover a GCP machine image for release image %q: %s", nodePool.Spec.Release.Image, err.Error()),
-			ObservedGeneration: nodePool.Generation,
-		})
-		return fmt.Errorf("couldn't discover a GCP machine image for release image: %w", err)
-	} else {
-		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
-			Type:               hyperv1.NodePoolValidPlatformImageType,
-			Status:             corev1.ConditionTrue,
-			Reason:             hyperv1.AsExpectedReason,
-			Message:            fmt.Sprintf("Bootstrap GCP machine image is %q", img),
-			ObservedGeneration: nodePool.Generation,
-		})
-	}
-	return nil
+	img, err := resolveGCPImage(nodePool, releaseImage)
+	if err != nil {
+		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
+			Type:               hyperv1.NodePoolValidPlatformImageType,
+			Status:             corev1.ConditionFalse,
+			Reason:             hyperv1.NodePoolValidationFailedReason,
+			Message:            fmt.Sprintf("Couldn't discover a GCP machine image for release image %q: %s", nodePool.Spec.Release.Image, err.Error()),
+			ObservedGeneration: nodePool.Generation,
+		})
+		return fmt.Errorf("couldn't discover a GCP machine image for release image: %w", err)
+	}
+	SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
+		Type:               hyperv1.NodePoolValidPlatformImageType,
+		Status:             corev1.ConditionTrue,
+		Reason:             hyperv1.AsExpectedReason,
+		Message:            fmt.Sprintf("Bootstrap GCP machine image is %q", img),
+		ObservedGeneration: nodePool.Generation,
+	})
+	return nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/gcp.go` around lines 32 - 49,
Flatten the conditional in the GCP image resolution flow by removing the
unnecessary else after the early return in the resolveGCPImage/nodePool status
update block. Keep the error-handling branch in the existing if, return
immediately on failure, and move the successful SetStatusCondition call for
NodePoolValidPlatformImageType out of the else so the logic is easier to read
and matches Go idioms.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/nodepool/gcp_test.go`:
- Around line 791-907: Add a test case in
TestNodePoolReconciler_setGCPConditions that covers a GCP NodePool where
nodePool.Spec.Platform.Type is GCPPlatform but nodePool.Spec.Platform.GCP is nil
while HostedCluster.Spec.Platform.GCP is present. Update the test table and
check function to verify setGCPConditions does not panic and returns the
expected error/condition behavior for this nil GCP config path. Use the existing
NodePoolReconciler.setGCPConditions, NodePoolValidPlatformImageType, and
HostedCluster/NodePool fixture patterns to keep the case aligned with the other
scenarios.

In `@hypershift-operator/controllers/nodepool/gcp.go`:
- Around line 20-51: The setGCPConditions flow in NodePoolReconciler only guards
hcluster.Spec.Platform.GCP, but resolveGCPImage assumes
nodePool.Spec.Platform.GCP is present and can panic when it is nil. Add a nil
check for nodePool.Spec.Platform.GCP before calling resolveGCPImage, mirroring
the HostedCluster validation, and return/set a NodePoolValidPlatformImageType
failure condition with a clear message if the GCP platform config is missing.

---

Nitpick comments:
In `@hypershift-operator/controllers/nodepool/gcp.go`:
- Around line 32-49: Flatten the conditional in the GCP image resolution flow by
removing the unnecessary else after the early return in the
resolveGCPImage/nodePool status update block. Keep the error-handling branch in
the existing if, return immediately on failure, and move the successful
SetStatusCondition call for NodePoolValidPlatformImageType out of the else so
the logic is easier to read and matches Go idioms.
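
The nil guard suggested in the inline comment for gcp.go could look something like the sketch below. It assumes simplified, hypothetical types (`GCPNodePoolPlatform`, `NodePool`) and a hypothetical helper name (`validateGCPConfig`) with made-up error text; the real code would operate on `nodePool.Spec.Platform.GCP` and would presumably also set a `NodePoolValidPlatformImageType` failure condition.

```go
package main

import (
	"errors"
	"fmt"
)

// GCPNodePoolPlatform is a simplified stand-in for the NodePool GCP platform config.
type GCPNodePoolPlatform struct {
	Image string
}

// NodePool is a hypothetical, trimmed-down model of the real API type.
type NodePool struct {
	GCP *GCPNodePoolPlatform // stands in for Spec.Platform.GCP
}

// validateGCPConfig returns a descriptive error up front, so that later code
// can dereference np.GCP without risking a nil-pointer panic.
func validateGCPConfig(np *NodePool) error {
	if np == nil || np.GCP == nil {
		return errors.New("NodePool has platform type GCP but no GCP platform config")
	}
	return nil
}

func main() {
	// Missing config: the guard reports an error instead of panicking later.
	fmt.Println(validateGCPConfig(&NodePool{}))
	// Present config: the guard passes and image resolution could proceed.
	fmt.Println(validateGCPConfig(&NodePool{GCP: &GCPNodePoolPlatform{}}))
}
```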

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e04c8126-9e2d-4471-9b24-d70185edc364

📥 Commits

Reviewing files that changed from the base of the PR and between df4e94a and 36ef0b4.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/gcp.go
  • hypershift-operator/controllers/nodepool/gcp_test.go

Comment thread hypershift-operator/controllers/nodepool/gcp_test.go
Comment thread hypershift-operator/controllers/nodepool/gcp.go

codecov Bot commented Jul 3, 2026


Codecov Report

❌ Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.30%. Comparing base (df4e94a) to head (f141100).
⚠️ Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rshift-operator/controllers/nodepool/conditions.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8917      +/-   ##
==========================================
+ Coverage   43.28%   43.30%   +0.02%     
==========================================
  Files         771      771              
  Lines       95503    95531      +28     
==========================================
+ Hits        41335    41367      +32     
+ Misses      51284    51282       -2     
+ Partials     2884     2882       -2     

| Files with missing lines | Coverage Δ |
| --- | --- |
| hypershift-operator/controllers/nodepool/gcp.go | 71.37% <100.00%> (+4.85%) ⬆️ |
| ...rshift-operator/controllers/nodepool/conditions.go | 53.87% <0.00%> (-0.19%) ⬇️ |

... and 1 file with indirect coverage changes

| Flag | Coverage Δ |
| --- | --- |
| cmd-support | 36.67% <ø> (ø) |
| cpo-hostedcontrolplane | 45.31% <ø> (ø) |
| cpo-other | 45.10% <ø> (ø) |
| hypershift-operator | 53.66% <93.10%> (+0.06%) ⬆️ |
| other | 31.69% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.


@thetechnick thetechnick force-pushed the gcp-504-nodepool-platform-conditions branch from 36ef0b4 to 89c2751 Compare July 3, 2026 09:46
Add GCP-specific conditions to NodePool status, to allow users to
diagnose image resolution failures.

This functionality is modeled after the AWS implementation.
@thetechnick thetechnick force-pushed the gcp-504-nodepool-platform-conditions branch from 89c2751 to f141100 Compare July 3, 2026 09:47
@thetechnick thetechnick marked this pull request as ready for review July 3, 2026 10:56
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2026
@openshift-ci openshift-ci Bot requested review from bryan-cox and muraee July 3, 2026 10:57

openshift-ci Bot commented Jul 3, 2026

Contributor

@thetechnick: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +42 to +49
	} else {
		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
			Type:               hyperv1.NodePoolValidPlatformImageType,
			Status:             corev1.ConditionTrue,
			Reason:             hyperv1.AsExpectedReason,
			Message:            fmt.Sprintf("Bootstrap GCP machine image is %q", img),
			ObservedGeneration: nodePool.Generation,
		})

Contributor

We could probably drop the else statement here.

Author

Could do, but then I have to move img, err := resolveGCPImage(nodePool, releaseImage) out of the scope of the if condition, since right now the img variable is not available outside of the if/else statement.

I personally have no preference for either way here.
Do you prefer it like this?

	img, err := resolveGCPImage(nodePool, releaseImage)
	if err != nil {
		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
			Type:               hyperv1.NodePoolValidPlatformImageType,
			Status:             corev1.ConditionFalse,
			Reason:             hyperv1.NodePoolValidationFailedReason,
			Message:            fmt.Sprintf("Couldn't discover a GCP machine image for release image %q: %s", nodePool.Spec.Release.Image, err.Error()),
			ObservedGeneration: nodePool.Generation,
		})
		return fmt.Errorf("couldn't discover a GCP machine image for release image: %w", err)
	}
	SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
		Type:               hyperv1.NodePoolValidPlatformImageType,
		Status:             corev1.ConditionTrue,
		Reason:             hyperv1.AsExpectedReason,
		Message:            fmt.Sprintf("Bootstrap GCP machine image is %q", img),
		ObservedGeneration: nodePool.Generation,
	})

Contributor

yeah, I personally like this version better, but at this point it's just a styling issue/preference.

Comment on lines +33 to +50
	if img, err := resolveGCPImage(nodePool, releaseImage); err != nil {
		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
			Type:               hyperv1.NodePoolValidPlatformImageType,
			Status:             corev1.ConditionFalse,
			Reason:             hyperv1.NodePoolValidationFailedReason,
			Message:            fmt.Sprintf("Couldn't discover a GCP machine image for release image %q: %s", nodePool.Spec.Release.Image, err.Error()),
			ObservedGeneration: nodePool.Generation,
		})
		return fmt.Errorf("couldn't discover a GCP machine image for release image: %w", err)
	} else {
		SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
			Type:               hyperv1.NodePoolValidPlatformImageType,
			Status:             corev1.ConditionTrue,
			Reason:             hyperv1.AsExpectedReason,
			Message:            fmt.Sprintf("Bootstrap GCP machine image is %q", img),
			ObservedGeneration: nodePool.Generation,
		})
	}

Contributor

Minor behavioral inconsistency with AWS worth considering: when a user pins their own image via nodePool.Spec.Platform.GCP.Image, resolveGCPImage returns it directly with no error, so this condition is always set to ConditionTrue - including for user-defined images that HyperShift never actually validated.

The AWS equivalent (setAWSConditions) handles this explicitly by calling removeStatusCondition for custom AMIs, avoiding any implied validation claim.

Not a blocker, but worth either aligning the behavior or adding a comment documenting the intentional difference.

Member

+1 on this. The condition type's own documentation in nodepool_conditions.go (line 9) explicitly says "If the image is direct user input then this condition is meaningless." Setting ConditionTrue for a user-pinned image contradicts that — aligning with the AWS removeStatusCondition pattern would be more accurate.
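
One way the alignment with AWS discussed above might look is sketched below: when the image is user-pinned, the condition is removed rather than set to true. Everything here is a simplified assumption for illustration: the `Condition` and `NodePool` types, the `PinnedImage` field (standing in for `nodePool.Spec.Platform.GCP.Image`), the `reportImage` function, and the local `setCondition`/`removeCondition` helpers (standing in for `SetStatusCondition` and `removeStatusCondition`).

```go
package main

import "fmt"

// Condition is a simplified stand-in for hyperv1.NodePoolCondition.
type Condition struct {
	Type, Status, Message string
}

// NodePool is a hypothetical, trimmed-down model of the real API type.
type NodePool struct {
	PinnedImage string // stands in for Spec.Platform.GCP.Image; empty means not set
	Conditions  []Condition
}

// Illustrative value only; the real constant lives in the hyperv1 package.
const validPlatformImage = "ValidPlatformImage"

// setCondition replaces a condition of the same type or appends a new one.
func setCondition(conds *[]Condition, c Condition) {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			(*conds)[i] = c
			return
		}
	}
	*conds = append(*conds, c)
}

// removeCondition drops every condition of the given type, filtering in place.
func removeCondition(conds *[]Condition, condType string) {
	kept := (*conds)[:0]
	for _, c := range *conds {
		if c.Type != condType {
			kept = append(kept, c)
		}
	}
	*conds = kept
}

// reportImage only claims a valid platform image when the image was
// discovered from the release, not when the user supplied it directly.
func reportImage(np *NodePool, resolvedImage string) {
	if np.PinnedImage != "" {
		// Direct user input was never validated, so make no claim about it.
		removeCondition(&np.Conditions, validPlatformImage)
		return
	}
	setCondition(&np.Conditions, Condition{
		Type:    validPlatformImage,
		Status:  "True",
		Message: fmt.Sprintf("Bootstrap GCP machine image is %q", resolvedImage),
	})
}

func main() {
	discovered := &NodePool{}
	reportImage(discovered, "example-image")
	fmt.Println(len(discovered.Conditions)) // condition recorded for a discovered image

	pinned := &NodePool{PinnedImage: "custom-image", Conditions: discovered.Conditions}
	reportImage(pinned, "custom-image")
	fmt.Println(len(pinned.Conditions)) // stale condition removed for a pinned image
}
```

Removing the condition also clears a stale `True` left over from before the user pinned an image, which a simple early return would not do.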

		releaseImage *releaseinfo.ReleaseImage
		check        func(t *testing.T, nodePool *hyperv1.NodePool, err error)
	}{
		{

Member

nit: test names here don't follow the "When...it should..." format that every other test in this file and the AWS equivalent use. e.g. "Not a GCP NodePool" -> "When NodePool platform type is not GCP, it should not set platform image condition", "success" -> "When GCP NodePool has a valid image, it should set ValidPlatformImage to true", etc.

Labels

area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/gcp PR/issue for GCP (GCPPlatform) platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants