USHIFT-6794: Config for Cluster 2 Cluster Communication#6545
Conversation
|
@pmtk: This pull request references USHIFT-6794 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds C2CC remote-cluster types and validation, integrates C2CC into Config (defaults, merge, validation), introduces an overridable host-IP discovery hook, updates OpenAPI schema, docs, packaging, and adds tests covering enablement, sanitization, and extensive validation rules. Changes
Sequence Diagram(s)sequenceDiagram
participant User as rgba(52,152,219,0.5) User
participant Config as rgba(46,204,113,0.5) Config
participant C2CC as rgba(155,89,182,0.5) C2CC.validate
participant Host as rgba(241,196,15,0.5) HostIPs
participant Checker as rgba(231,76,60,0.5) Checks
User->>Config: provide c2cc settings
Config->>C2CC: call validate(cfg)
C2CC->>Host: getHostIPs() (overridable)
Host-->>C2CC: host interface IPs
C2CC->>Checker: validate NextHop, CIDRs, domains, overlaps, families
Checker-->>C2CC: errors / success
C2CC-->>Config: return aggregated error or nil
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/config/c2cc.go`:
- Around line 89-99: The duplicate NextHop detection is inconsistent because it
compares raw rc.NextHop strings while the routing-loop check uses
ip.Equal(nodeIP) (which canonicalizes IPv6); normalize NextHop by parsing
rc.NextHop to net.ParseIP and using its canonical form (ip.String()) as the key
in seenNextHops and for comparisons, i.e., compute a normalizedNextHop :=
ip.String() after successful ParseIP and use normalizedNextHop for the nodeIP
equality check and for lookup/storage in seenNextHops (replace uses of
rc.NextHop as the map key and in duplicate error messages with the canonical
form).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 368a244e-5ca5-459f-8bd3-66477daae1ef
📒 Files selected for processing (3)
pkg/config/c2cc.gopkg/config/c2cc_test.gopkg/config/config.go
4f90a2a to
a0bec55
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/config/c2cc.go (1)
46-49: Avoid silently dropping host-interface lookup errors.Ignoring
AddrListfailures can miss host IPs and weaken management-traffic conflict checks. Consider aggregating and returning these errors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/config/c2cc.go` around lines 46 - 49, The code currently ignores errors from netlink.AddrList(link, netlink.FAMILY_ALL); change it to collect per-link errors instead of silently continuing: when netlink.AddrList returns err, append a contextualized error (include link.Name or link.Attrs().Name and the err) to a local []error slice (or a multierror), continue processing other links, and after the loop if the slice is non-empty return an aggregated error (use errors.Join if available or a multierror) alongside any successful results so callers are aware of lookup failures; keep the variables addrs, err and netlink.AddrList as the error source so reviewers can find the change easily.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/config/c2cc.go`:
- Around line 111-120: The loops validate each CIDR against prior remotes
(seenCIDRs) but miss overlaps inside the same remote (rc) between
rc.ClusterNetwork and rc.ServiceNetwork; before appending rc.ClusterNetwork and
rc.ServiceNetwork to seenCIDRs, run intra-remote conflict checks by invoking
checkCIDRConflicts for each CIDR in rc.ClusterNetwork against rc.ServiceNetwork
and vice versa (and optionally check within rc.ClusterNetwork and within
rc.ServiceNetwork for duplicates) so that validateRemoteCIDR and
checkCIDRConflicts catch overlaps within the same remote cluster using the
existing variables rc, label, seenCIDRs, and hostIPs.
---
Nitpick comments:
In `@pkg/config/c2cc.go`:
- Around line 46-49: The code currently ignores errors from
netlink.AddrList(link, netlink.FAMILY_ALL); change it to collect per-link errors
instead of silently continuing: when netlink.AddrList returns err, append a
contextualized error (include link.Name or link.Attrs().Name and the err) to a
local []error slice (or a multierror), continue processing other links, and
after the loop if the slice is non-empty return an aggregated error (use
errors.Join if available or a multierror) alongside any successful results so
callers are aware of lookup failures; keep the variables addrs, err and
netlink.AddrList as the error source so reviewers can find the change easily.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: fbc7a9a0-303f-4bcd-9b2c-53dacbe6b3a3
📒 Files selected for processing (3)
pkg/config/c2cc.gopkg/config/c2cc_test.gopkg/config/config.go
✅ Files skipped from review due to trivial changes (1)
- pkg/config/c2cc_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/config/config.go
1dba25a to
bc38b57
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/config/c2cc.go (2)
213-220: Overlap check silently returns false on parse errors.If either CIDR fails to parse, the function returns
false(no overlap). This is safe becausevalidateRemoteCIDRis called first and will catch malformed CIDRs, but consider documenting this assumption.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/config/c2cc.go` around lines 213 - 220, The cidrsOverlap function currently returns false when net.ParseCIDR fails, which can hide parse errors; update the code by adding a clear comment above func cidrsOverlap(a, b string) bool that states the function assumes inputs have been validated (e.g., by validateRemoteCIDR) and therefore intentionally returns false on parse errors, and mention that callers must validate CIDRs first; ensure the comment references cidrsOverlap and validateRemoteCIDR so future maintainers understand the contract.
45-52: Silent error suppression when enumerating interface addresses.If
netlink.AddrListfails for a link, the error is silently ignored. This is likely intentional for resilience (e.g., transient interface issues), but could mask systematic failures. Consider logging at debug level.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/config/c2cc.go` around lines 45 - 52, The loop that calls netlink.AddrList on each link currently swallows errors (in the snippet using variables links, link, addrs, ips) which can hide systemic failures; modify the error handling in the block that calls netlink.AddrList(link, netlink.FAMILY_ALL) to log the returned error at debug (or trace) level instead of silently continuing—use the existing logger (or inject one if not present) and include the link identifier and err in the log message, then continue as before so resilience is preserved while making failures visible for debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/config/c2cc.go`:
- Around line 213-220: The cidrsOverlap function currently returns false when
net.ParseCIDR fails, which can hide parse errors; update the code by adding a
clear comment above func cidrsOverlap(a, b string) bool that states the function
assumes inputs have been validated (e.g., by validateRemoteCIDR) and therefore
intentionally returns false on parse errors, and mention that callers must
validate CIDRs first; ensure the comment references cidrsOverlap and
validateRemoteCIDR so future maintainers understand the contract.
- Around line 45-52: The loop that calls netlink.AddrList on each link currently
swallows errors (in the snippet using variables links, link, addrs, ips) which
can hide systemic failures; modify the error handling in the block that calls
netlink.AddrList(link, netlink.FAMILY_ALL) to log the returned error at debug
(or trace) level instead of silently continuing—use the existing logger (or
inject one if not present) and include the link identifier and err in the log
message, then continue as before so resilience is preserved while making
failures visible for debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 82df0843-e348-4f54-b465-a95468d190dc
📒 Files selected for processing (6)
cmd/generate-config/config/config-openapi-spec.jsondocs/user/howto_config.mdpackaging/microshift/config.yamlpkg/config/c2cc.gopkg/config/c2cc_test.gopkg/config/config.go
✅ Files skipped from review due to trivial changes (1)
- pkg/config/c2cc_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/config/config.go
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/config/c2cc.go`:
- Around line 60-62: The loop currently swallows errors from
netlink.AddrList(link, netlink.FAMILY_ALL) by using continue; change this to
fail-closed: surface the error to the caller instead of skipping the interface
(e.g., return the error or wrap it with context and return), so the host-IP
overlap guard cannot ignore unreadable interfaces; update the code paths that
call this function to handle the returned error accordingly and reference
netlink.AddrList and the loop iterating over link to locate the change.
- Around line 147-150: Add a check that enforces clusterNetwork and
serviceNetwork on a remote have matching cardinality and IP-family alignment
(same number of CIDRs and corresponding v4/v6 family per index) before accepting
the remote; implement this either by adding a new helper (e.g.,
validateRemoteClusterServiceShape(rc.ClusterNetwork, rc.ServiceNetwork, label))
or by extending validateRemoteIPFamilyCompatibility to accept both lists and
compare lengths and per-index families, and call it alongside
validateIPFamilyConsistency for rc.ClusterNetwork and rc.ServiceNetwork so
remotes follow the same invariant as the local cluster.
In `@pkg/config/config.go`:
- Around line 458-460: The merge logic currently ignores an explicitly disabled
C2CC because it only copies when len(u.C2CC.RemoteClusters) > 0; change the
condition to check presence of the C2CC value itself (e.g. if u.C2CC != nil) and
assign c.C2CC = u.C2CC so that an empty RemoteClusters slice from the incoming
config overwrites/clears the inherited value (follow the nil-vs-empty pattern
used for manifests.kustomizePaths).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 750f88b5-1fc9-42aa-8b45-3681186160f3
📒 Files selected for processing (3)
pkg/config/c2cc.gopkg/config/c2cc_test.gopkg/config/config.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/config/c2cc_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/config/c2cc.go`:
- Around line 81-84: The validation currently only parses and compares
cfg.Node.NodeIP against nextHop; update the logic to also parse
cfg.Node.NodeIPV6 (when non-empty) and ensure nextHop is compared against both
local IPs to detect local-node loop in dual-stack mode. Specifically, extend the
parsing block that creates nodeIP (and the similar block at lines ~108-110) to
parse cfg.Node.NodeIPV6 into a separate variable (e.g., nodeIPV6), handle
nil/invalid parsing appropriately, and add a conditional that treats nextHop
equal to either nodeIP or nodeIPV6 as a loop error; use the existing nextHop
identifier and cfg.Node.NodeIP/NodeIPV6 names to locate and update the checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 3095f427-acfc-4db4-a1e3-830845888359
📒 Files selected for processing (6)
cmd/generate-config/config/config-openapi-spec.jsondocs/user/howto_config.mdpackaging/microshift/config.yamlpkg/config/c2cc.gopkg/config/c2cc_test.gopkg/config/config.go
✅ Files skipped from review due to trivial changes (1)
- docs/user/howto_config.md
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/config/config.go
- packaging/microshift/config.yaml
| func cidrsOverlap(a, b string) bool { | ||
| _, netA, errA := net.ParseCIDR(a) | ||
| _, netB, errB := net.ParseCIDR(b) | ||
| if errA != nil || errB != nil { |
There was a problem hiding this comment.
should we add a check before this one to report an error if errA or errB are nil?
for example, we can silently return false when errA is nil
There was a problem hiding this comment.
those CIDRs are validated in validateRemoteCIDR:
_, ipNet, err := net.ParseCIDR(cidr)
if err != nil {
return fmt.Errorf("%s %q is not a valid CIDR: %w", field, cidr, err)
}
| return cfg | ||
| }(), | ||
| expectErr: true, | ||
| errMsg: "requires OVN-Kubernetes", |
There was a problem hiding this comment.
why this is valid error if CNI can be unset? see this error message
There was a problem hiding this comment.
But this is CniPluginNone not Unset, right?
(I think) we use None in upstream to deploy Kindnet
There was a problem hiding this comment.
I see, so if we use CniPluginNone and Unset I suggest we add a unit test to validate the unset as a valid option.
|
/hold |
|
/unhold |
|
/retest |
| # to serve from the API server. Allowed values: VersionTLS12, VersionTLS13. | ||
| # Defaults to VersionTLS12. | ||
| minVersion: VersionTLS12 | ||
| c2cc: |
There was a problem hiding this comment.
The c2cc name in the configuration file may seem a bit cryptic for some users.
Can we make it more explicit, like clusterToCluster, etc.?
There was a problem hiding this comment.
That's a good point. Changed to clusterToCluster
|
/retest |
|
|
||
| GenericDevicePlugin GenericDevicePlugin `json:"genericDevicePlugin"` | ||
|
|
||
| C2CC C2CC `json:"clusterToCluster"` |
There was a problem hiding this comment.
Should this be optional instead of remoteClusters inside c2cc? It is required now because it doesnt have an omitempty. Then the remoteClusters inside is optional, which creates a confusing setup?
There was a problem hiding this comment.
None of fields in Config has omitempty and they still work as expected.
As for remoteClusters - I guess it only matters for microshift config --effective or whatever the command is?
| clusterNetwork: [] | ||
| # DNS domain suffix for the remote cluster (e.g., "cluster-b.remote"). | ||
| # Services are reachable as <svc>.<ns>.svc.<domain>. | ||
| # Optional — if empty, no DNS forwarding is configured for this remote. | ||
| domain: "" | ||
| # IP address of the remote cluster's node, used as next-hop for routing. | ||
| nextHop: "" | ||
| # Service CIDRs of the remote cluster. Must not overlap with local cluster or other remotes. | ||
| serviceNetwork: [] |
There was a problem hiding this comment.
Are these defaults ok? Some might fail to validate?
There was a problem hiding this comment.
There's a code to remove empty objects like this to make sure default config works (which we have test for)
|
/lgtm |
|
@pmtk: you cannot LGTM your own PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/verified by @pmtk |
|
@pmtk: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pacevedom, pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@pmtk: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit
New Features
Tests
Documentation
Chores