Bug 86133: validate duplicate Nutanix failureDomain names and topology #10564
Adds two checks to Nutanix failure domain validation:

1. Duplicate name detection: rejects configurations where two failure domains share the same name, which previously went unvalidated.
2. Duplicate topology detection: rejects failure domains that have identical Prism Element UUID and subnet UUIDs. Copy-pasted failure domains that differ only in name provide no additional fault tolerance and can cause subtle scheduling issues.

Bug: https://redhat.atlassian.net/browse/OCPBUGS-86073

Co-authored-by: Cursor <cursoragent@cursor.com>
Walkthrough
The PR enhances ValidatePlatform to detect duplicate Nutanix failure domains by name and by topology (Prism Element UUID plus the set of subnet UUIDs). It adds deterministic topology-key construction using sorted subnet UUIDs and updates tests to cover duplicate-name and duplicate-topology cases.
Changes
Failure-domain duplication validation
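The walkthrough's two checks can be illustrated with a minimal, self-contained sketch. It is not the PR's code: the `failureDomain` struct, `topologyKey`, and `findDuplicates` are hypothetical stand-ins for `nutanix.FailureDomain`, `nutanixFailureDomainTopologyKey`, and the logic inside `ValidatePlatform`, and the error strings are simplified. It assumes the key is built from the Prism Element UUID plus a sorted copy of the subnet UUIDs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// failureDomain is a simplified, hypothetical stand-in for nutanix.FailureDomain.
type failureDomain struct {
	Name             string
	PrismElementUUID string
	SubnetUUIDs      []string
}

// topologyKey builds an order-independent key from the Prism Element UUID
// and the subnet UUIDs. The subnets are copied before sorting so the
// caller's slice is not reordered.
func topologyKey(fd failureDomain) string {
	subnets := append([]string(nil), fd.SubnetUUIDs...)
	sort.Strings(subnets)
	return fd.PrismElementUUID + "\x00" + strings.Join(subnets, "\x00")
}

// findDuplicates returns one message per failure domain that reuses an
// earlier name or an earlier topology.
func findDuplicates(fds []failureDomain) []string {
	var errs []string
	names := map[string]bool{}
	topologies := map[string]string{} // topology key -> first name seen
	for i, fd := range fds {
		if names[fd.Name] {
			errs = append(errs, fmt.Sprintf("failureDomains[%d]: duplicate name %q", i, fd.Name))
		}
		names[fd.Name] = true

		key := topologyKey(fd)
		if prev, ok := topologies[key]; ok {
			errs = append(errs, fmt.Sprintf("failureDomains[%d]: %q has identical topology as %q", i, fd.Name, prev))
		} else {
			topologies[key] = fd.Name
		}
	}
	return errs
}

func main() {
	fds := []failureDomain{
		{Name: "fd-1", PrismElementUUID: "fd-pe-uuid", SubnetUUIDs: []string{"subnet-a", "subnet-b"}},
		{Name: "fd-2", PrismElementUUID: "fd-pe-uuid", SubnetUUIDs: []string{"subnet-b", "subnet-a"}},
	}
	for _, e := range findDuplicates(fds) {
		fmt.Println(e)
	}
}
```

Sorting a copy rather than the original slice keeps validation free of side effects on the user's configuration, which is one plausible reason to build the key this way.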
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (12 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.2)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
Hi @chdeshpa-hue. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step.
[APPROVALNOTIFIER] This PR is NOT APPROVED
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/types/nutanix/validation/platform_test.go (1)
433-451: ⚡ Quick win: Add a regression case for subnet-order invariance.
Since topology comparison depends on sorted subnet UUIDs, add a case where two failure domains have identical subnet sets in different orders and still fail as duplicate topology. This protects the core normalization behavior from regressions.
Example test case to add
+		{
+			name: "failureDomain duplicate topology with same subnets in different order",
+			platform: func() *nutanix.Platform {
+				p := validPlatform()
+				p.FailureDomains = []nutanix.FailureDomain{
+					{
+						Name:         "fd-1",
+						PrismElement: nutanix.PrismElement{UUID: "fd-pe-uuid", Endpoint: nutanix.PrismEndpoint{Address: "fd-pe", Port: 9440}},
+						SubnetUUIDs:  []string{"subnet-a", "subnet-b"},
+					},
+					{
+						Name:         "fd-2",
+						PrismElement: nutanix.PrismElement{UUID: "fd-pe-uuid", Endpoint: nutanix.PrismEndpoint{Address: "fd-pe", Port: 9440}},
+						SubnetUUIDs:  []string{"subnet-b", "subnet-a"},
+					},
+				}
+				return p
+			}(),
+			expectedError: `test-path\.failureDomains\[1\]: Invalid value: "fd-2": failure domain "fd-2" has identical topology`,
+		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/types/nutanix/validation/platform_test.go` around lines 433 - 451, Add a regression test variant to the existing failure-domain duplicate-topology case that verifies subnet-order invariance: modify the test case that uses nutanix.Platform / FailureDomain (the case with Name "fd-1" and "fd-2") so one FailureDomain has SubnetUUIDs in a different order than the other (e.g., {"a","b"} vs {"b","a"}) while keeping the same PrismElement (PrismElement and PrismEndpoint fields identical), and assert the same expected error string about identical topology; this ensures the topology comparison logic (normalization/sorting of SubnetUUIDs) in the validation still treats differently ordered but identical subnet sets as duplicates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/types/nutanix/validation/platform.go`:
- Around line 122-128: The duplicate-topology check using
topoKey/nutanixFailureDomainTopologyKey should be skipped for malformed failure
domains: ensure you only compute topoKey and consult/update fdTopologies after
basic validation that fd.PrismElement UUID is non-empty/valid and fd.Subnets is
non-empty and each subnet is valid; alternatively move the entire
topoKey/fdTopologies block to run after the existing required-field/subnet
validations so that field.Invalid additions for empty/invalid PrismElement or
Subnets occur instead of spurious "identical topology" errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6c9bd4ef-5036-40cd-a0f4-1223dd900de4
📒 Files selected for processing (2)
pkg/types/nutanix/validation/platform.go
pkg/types/nutanix/validation/platform_test.go
/ok-to-test
cc: @abhay-nutanix
- Move duplicate topology detection after PrismElement UUID and subnet validation so users see actionable "required field" errors instead of misleading "identical topology" when fields are simply empty
- Use \x00 as subnet separator to eliminate join ambiguity
- Add regression test confirming subnet-order invariance: identical subnet sets in different order are correctly detected as duplicates

Bug: https://redhat.atlassian.net/browse/OCPBUGS-86133

Co-authored-by: Cursor <cursoragent@cursor.com>
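The join ambiguity mentioned in the commit message can be shown with a small sketch. It assumes, purely for illustration, that the alternative was a printable separator such as a comma; the commit does not say what separator was used before. `joinKey` is a hypothetical helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// joinKey joins already-sorted subnet identifiers with the given separator.
// It is a hypothetical helper used only to compare separators.
func joinKey(subnets []string, sep string) string {
	return strings.Join(subnets, sep)
}

func main() {
	one := []string{"a,b"}    // a single (malformed) identifier containing a comma
	two := []string{"a", "b"} // two distinct identifiers

	// With a comma separator both lists collapse to the same key "a,b".
	fmt.Println(joinKey(one, ",") == joinKey(two, ",")) // true

	// With a NUL separator the keys stay distinct: "a,b" vs "a\x00b".
	fmt.Println(joinKey(one, "\x00") == joinKey(two, "\x00")) // false
}
```

Because validation runs on user-supplied strings that may not be well-formed UUIDs, a separator that is very unlikely to appear in the input avoids treating different subnet lists as the same topology.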
♻️ Duplicate comments (1)
pkg/types/nutanix/validation/platform.go (1)
134-142: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Skip topology dedupe when subnet list is syntactically present but effectively empty.
Line 134 still allows duplicate-topology checks for malformed `SubnetUUIDs` like `[]string{""}`, which can reintroduce misleading “identical topology” errors alongside required-subnet errors. Tighten the gate to match `validateSubnets`’ required check semantics.
Suggested fix
-		if fd.PrismElement.UUID != "" && len(fd.SubnetUUIDs) > 0 {
+		if fd.PrismElement.UUID != "" && len(fd.SubnetUUIDs) > 0 && fd.SubnetUUIDs[0] != "" {
 			topoKey := nutanixFailureDomainTopologyKey(fd)
 			if prevName, exists := fdTopologies[topoKey]; exists {
 				allErrs = append(allErrs, field.Invalid(fldPath.Child("failureDomains").Index(idx), fd.Name, fmt.Sprintf("failure domain %q has identical topology (same prismElement and subnets) as %q; this provides no additional fault tolerance", fd.Name, prevName)))
 			} else {
 				fdTopologies[topoKey] = fd.Name
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/types/nutanix/validation/platform.go` around lines 134 - 142, The topology dedupe currently runs when fd.SubnetUUIDs is non-nil or non-zero-length (e.g., []string{""}), which can falsely detect identical topologies; change the guard in the block that references nutanixFailureDomainTopologyKey(fd) so it only proceeds when SubnetUUIDs contains at least one non-empty UUID (for example: replace len(fd.SubnetUUIDs) > 0 with a helper-style check that scans for any s != "" or reuse the same required-subnet semantics from validateSubnets), then keep the rest of the logic (fdTopologies lookup and assignment) unchanged so malformed empty-string subnet slices are skipped and won't trigger the duplicate-topology error.
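The helper-style check described in this prompt might look like the sketch below. `hasNonEmptySubnet` and `shouldCheckTopology` are hypothetical names, not functions from the PR, and the sketch implements the "scan for any non-empty UUID" variant rather than the `fd.SubnetUUIDs[0] != ""` variant from the suggested fix.

```go
package main

import "fmt"

// hasNonEmptySubnet reports whether at least one subnet UUID is a
// non-empty string. A nil slice, an empty slice, and []string{""} all
// return false.
func hasNonEmptySubnet(subnets []string) bool {
	for _, s := range subnets {
		if s != "" {
			return true
		}
	}
	return false
}

// shouldCheckTopology is a hypothetical gate: the duplicate-topology check
// only runs when both the Prism Element UUID and the subnets are usable.
func shouldCheckTopology(prismElementUUID string, subnets []string) bool {
	return prismElementUUID != "" && hasNonEmptySubnet(subnets)
}

func main() {
	fmt.Println(shouldCheckTopology("pe-1", []string{""}))         // false
	fmt.Println(shouldCheckTopology("pe-1", []string{"subnet-a"})) // true
	fmt.Println(shouldCheckTopology("", []string{"subnet-a"}))     // false
}
```

The two variants differ for a list such as `[]string{"", "subnet-a"}`: the scan treats it as usable, while the first-element check skips it. Which behavior matches `validateSubnets` depends on that function's actual semantics, which are not shown in this conversation.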
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d5fcaf96-c9bf-42f3-b7f1-70141eaf16da
📒 Files selected for processing (2)
pkg/types/nutanix/validation/platform.go
pkg/types/nutanix/validation/platform_test.go
/retest
@chdeshpa-hue: The following tests failed, say
Summary
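Adds two missing validation checks for Nutanix failure domains:
- Duplicate name detection: rejects configurations where two failure domains share the same name.
- Duplicate topology detection: rejects failure domains that have identical Prism Element UUID and subnet UUIDs.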
Split from #10561 per reviewer feedback — this PR contains the Nutanix portion only.
Bug: https://redhat.atlassian.net/browse/OCPBUGS-86133
Manual Test Results
Tested with a custom-built `openshift-install` binary. When two Nutanix failure domains share the same name or identical topology, the installer now correctly rejects:
Duplicate name:
Duplicate topology:
Test Plan
- `go test ./pkg/types/nutanix/validation/`
- `failureDomain with duplicate name`, `failureDomain with duplicate topology same prismElement and subnet`, `valid failureDomain with different prismElements`

Made with Cursor