fix: OCPBUGS-78575: create virt-launcher NetworkPolicy on external infra cluster #8056

dpateriya wants to merge 1 commit into openshift:main from
Conversation
📝 Walkthrough

KubeVirt NetworkPolicy reconciliation now branches on whether external infra credentials exist. If external credentials are present, the controller discovers an infra-cluster client, attempts to GET the cluster-scoped configv1.Network "cluster", and updates the HostedCluster condition.

Sequence Diagram

```mermaid
sequenceDiagram
    participant MC as Management Cluster Controller
    participant Creds as KubeVirt Credentials
    participant IC as Infrastructure Cluster API
    participant NCfg as configv1.Network ("cluster")
    participant Status as HostedCluster Status
    participant Event as Infra Warning Event
    participant NP as NetworkPolicy Resource
    rect rgba(70, 130, 180, 0.5)
    Note over MC,NP: External Infrastructure Flow
    MC->>Creds: Check if Credentials != nil
    alt External Infrastructure
        Creds-->>MC: Credentials present
        MC->>IC: Discover infra cluster client
        IC->>NCfg: GET configv1.Network "cluster"
        NCfg-->>IC: Return network CIDRs or error
        IC-->>MC: Provide network config or error
        MC->>Status: Update ValidKubeVirtInfraNetworkPolicyRBAC condition
        MC->>Event: Emit infra-cluster RBAC warning (on read failure)
        MC->>NP: Create/Update virt-launcher NetworkPolicy in infra namespace (use CIDRs if available)
    else Centralized Infrastructure
        Creds-->>MC: No credentials
        MC->>NP: Create/Update virt-launcher NetworkPolicy in control-plane namespace
    end
    end
```
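The branching shown in the diagram can be sketched in Go with simplified stand-in types. This is an illustration only: the real controller uses the HyperShift `HostedCluster` API and controller-runtime clients, and `KubevirtPlatformSpec`, `pickTargetNamespace`, and the namespaces below are hypothetical names, not code from this PR.

```go
package main

import "fmt"

// KubevirtCredentials is a simplified stand-in for the external infra
// credentials field on the KubeVirt platform spec (illustrative only).
type KubevirtCredentials struct{ InfraKubeConfigSecret string }

// KubevirtPlatformSpec mirrors just the field the branch depends on.
type KubevirtPlatformSpec struct {
	Credentials *KubevirtCredentials // non-nil => external infra cluster
}

// pickTargetNamespace mirrors the walkthrough's branch: with external
// credentials the policy goes to the infra namespace on the infra cluster;
// otherwise to the hosted control-plane namespace on the management cluster.
func pickTargetNamespace(kv KubevirtPlatformSpec, infraNS, controlPlaneNS string) (namespace string, external bool) {
	if kv.Credentials != nil {
		return infraNS, true
	}
	return controlPlaneNS, false
}

func main() {
	ns, ext := pickTargetNamespace(KubevirtPlatformSpec{Credentials: &KubevirtCredentials{}}, "guest-infra", "clusters-hc")
	fmt.Println(ns, ext) // external infra path

	ns, ext = pickTargetNamespace(KubevirtPlatformSpec{}, "guest-infra", "clusters-hc")
	fmt.Println(ns, ext) // centralized path
}
```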
Force-pushed: adca61c → b3411d4
Actionable comments posted: 1
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)
719-842: Please extract the shared virt-launcher policy builder before these two paths drift.

This function is nearly a copy of `reconcileVirtLauncherNetworkPolicy`; only the blocked CIDR source and a few egress peers differ. Pull the common ingress/egress construction into a helper and pass the external-infra deltas in; otherwise future policy changes are easy to apply to one path and miss in the other.
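One possible shape for the extraction this comment asks for, using simplified stand-in types rather than the real `networkingv1` API. The helper name `buildVirtLauncherPolicyBase` and the field names are hypothetical, sketched from the review's description of what the two reconcilers share:

```go
package main

import "fmt"

// EgressPeer and PolicySpec are simplified stand-ins for networkingv1 types.
type EgressPeer struct{ CIDR string }

type PolicySpec struct {
	PodSelector  map[string]string
	BlockedCIDRs []string
	EgressPeers  []EgressPeer
}

// buildVirtLauncherPolicyBase builds everything the two reconcilers share:
// the virt-launcher pod selector and the blocked-CIDR list. Callers append
// only their path-specific egress peers afterwards, so policy changes land
// in one place instead of two near-identical functions.
func buildVirtLauncherPolicyBase(infraID string, blockedCIDRs []string) PolicySpec {
	return PolicySpec{
		PodSelector: map[string]string{
			"kubevirt.io": "virt-launcher",
			"infra-id":    infraID, // stands in for hyperv1.InfraIDLabel
		},
		BlockedCIDRs: blockedCIDRs,
	}
}

func main() {
	// External-infra path: shared base plus one infra-specific peer.
	spec := buildVirtLauncherPolicyBase("hc-1234", []string{"10.128.0.0/14", "172.30.0.0/16"})
	spec.EgressPeers = append(spec.EgressPeers, EgressPeer{CIDR: "192.168.1.0/24"})
	fmt.Println(len(spec.BlockedCIDRs), len(spec.EgressPeers))
}
```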
Inline comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around lines 166-168: The code performs a cluster-scoped lookup via `infraClient.Get` on `infraClusterNetwork` (`configv1.Network{Name: "cluster"}`), which requires cluster-level RBAC. Either remove the live cluster-scoped lookup (avoid calling `infraClient.Get` in the network policy reconciliation path and use a safer local/default value, or accept a passed-in configuration), or explicitly require and document a ClusterRole/ClusterRoleBinding granting `get` on `networks.config.openshift.io` for the controller. If you keep the live lookup, add an RBAC manifest entry (ClusterRole + ClusterRoleBinding) granting `verbs=get` on `resource=networks.config.openshift.io`.
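If the live lookup is kept, the grant the comment describes would look roughly like this. This is a sketch, not a manifest from the PR: the ClusterRole name, ServiceAccount name, and namespace are placeholders.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hypershift-infra-network-reader   # placeholder name
rules:
- apiGroups: ["config.openshift.io"]
  resources: ["networks"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hypershift-infra-network-reader   # placeholder name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hypershift-infra-network-reader
subjects:
- kind: ServiceAccount
  name: kubevirt-infra        # placeholder: the SA behind the external infra kubeconfig
  namespace: guest-infra-ns   # placeholder
```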
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 72ef2682-9485-4808-b0a3-ac41022f2c6d
📒 Files selected for processing (2)
- docs/content/how-to/kubevirt/external-infrastructure.md
- hypershift-operator/controllers/hostedcluster/network_policies.go
Force-pushed: 6b6911d → a599aa8
@dpateriya: This pull request references Jira Issue OCPBUGS-78575, which is invalid:

The bug has been updated to refer to the pull request using the external bug tracker.
Actionable comments posted: 1
🧹 Nitpick comments (2)
hypershift-operator/controllers/hostedcluster/network_policies.go (2)
181-183: Reuse the injected `createOrUpdate` in the external-infra branch.

Constructing a fresh upsert helper here bypasses the caller-supplied `createOrUpdate`, so this path no longer shares the same wrapper/test seam as the rest of `reconcileNetworkPolicies`.

Suggested change

```diff
-	infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate
-	if _, err := infraCreateOrUpdate(ctx, infraClient, policy, func() error {
+	if _, err := createOrUpdate(ctx, infraClient, policy, func() error {
 		return reconcileVirtLauncherNetworkPolicyExternalInfra(log, policy, hcluster, infraClusterNetwork)
 	}); err != nil {
```
733-857: Extract the shared virt-launcher policy builder.

This helper copies most of `reconcileVirtLauncherNetworkPolicy`: selector setup, ingress rules, CIDR blocking, and NodePort exception handling. Only a small subset of egress peers differs, so keeping two near-identical implementations will drift quickly.
Inline comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around lines 170-179: The infra cluster Network lookup currently swallows errors from `infraClient.Get`, leaving `infraClusterNetwork` nil and falling back to unrestricted egress. Instead, when `infraClient.Get` fails, either return a reconciliation error from the surrounding reconcile function (so the controller retries) or mark the HostedCluster as degraded via the existing status/condition helper, with a clear message referencing the failed infra network lookup. Update the code paths around `infraClusterNetwork`, `networkObj`, and the virt-launcher NetworkPolicy creation to rely on that error flow, so the controller does not silently emit `0.0.0.0/0` and `::/0` rules when the CIDR cannot be retrieved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 37a29eba-4fda-4644-8e36-3c43d3834bdc
📒 Files selected for processing (2)
- docs/content/how-to/kubevirt/external-infrastructure.md
- hypershift-operator/controllers/hostedcluster/network_policies.go
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/content/how-to/kubevirt/external-infrastructure.md
Force-pushed: a599aa8 → 5c389e3
♻️ Duplicate comments (1)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)
172-177: ⚠️ Potential issue | 🟠 Major — Don't fall back on every infra-network lookup error.

This path currently treats transient API/auth/connectivity failures the same as missing RBAC, and silently creates the weaker no-CIDR-blocking policy instead of retrying. Reserve the fallback for the expected permission / unsupported-resource cases and return the rest.

Possible fix

```diff
-	if err := infraClient.Get(ctx, client.ObjectKeyFromObject(networkObj), networkObj); err != nil {
-		log.Info("unable to read networks.config.openshift.io/cluster from the infrastructure cluster; "+
-			"virt-launcher NetworkPolicy will be created without CIDR-based egress restrictions. "+
-			"Grant get permission on networks.config.openshift.io via a ClusterRole for full network isolation",
-			"error", err)
-	} else {
-		infraClusterNetwork = networkObj
-	}
+	if err := infraClient.Get(ctx, client.ObjectKeyFromObject(networkObj), networkObj); err != nil {
+		if apierrors.IsForbidden(err) || apierrors.IsNotFound(err) || meta.IsNoMatchError(err) {
+			log.Info("unable to read networks.config.openshift.io/cluster from the infrastructure cluster; "+
+				"virt-launcher NetworkPolicy will be created without CIDR-based egress restrictions. "+
+				"Grant get permission on networks.config.openshift.io via a ClusterRole for full network isolation",
+				"error", err)
+		} else {
+			return fmt.Errorf("failed to get infrastructure cluster network config: %w", err)
+		}
+	} else {
+		infraClusterNetwork = networkObj
+	}
```

This needs `k8s.io/apimachinery/pkg/api/errors` and `k8s.io/apimachinery/pkg/api/meta` imports above.
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)
181-183: Reuse the injected `createOrUpdate` here.

`reconcileNetworkPolicies` already receives a `CreateOrUpdateFN` that takes the target client. Spinning up a fresh upserter only for this branch bypasses that seam and makes tests/observability wrappers easy to miss on the external-infra path.

Small cleanup

```diff
-	infraCreateOrUpdate := upsert.New(r.EnableCIDebugOutput).CreateOrUpdate
-	if _, err := infraCreateOrUpdate(ctx, infraClient, policy, func() error {
+	if _, err := createOrUpdate(ctx, infraClient, policy, func() error {
 		return reconcileVirtLauncherNetworkPolicyExternalInfra(log, policy, hcluster, infraClusterNetwork)
 	}); err != nil {
```

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: eae1fb84-55eb-47d1-a51c-1034c5cf8414
📒 Files selected for processing (2)
- docs/content/how-to/kubevirt/external-infrastructure.md
- hypershift-operator/controllers/hostedcluster/network_policies.go
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/content/how-to/kubevirt/external-infrastructure.md
Force-pushed: 5c389e3 → f33bbc9
Actionable comments posted: 3
Inline comments:
In `@api/hypershift/v1beta1/hostedcluster_conditions.go`:

- Around lines 125-132: The new condition type `ValidKubeVirtInfraNetworkPolicyRBAC` is defined but not registered in the `ExpectedHCConditions` initialization. Add `ValidKubeVirtInfraNetworkPolicyRBAC` to the `KubevirtPlatform` case of `ExpectedHCConditions` in `support/conditions/conditions.go` (the same place where `ValidKubeVirtInfraNetworkMTU` and `KubeVirtNodesLiveMigratable` are listed) so the reconciliation logic will initialize and maintain this condition.
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:

- Around lines 2015-2022: The `ValidKubeVirtInfraNetworkPolicyRBAC` condition set by `reconcileNetworkPolicies` must be persisted immediately instead of waiting until after `reconcileKubevirtCSIClusterRBAC`. Update the HostedCluster status (call `r.Client.Status().Update(ctx, hcluster)` and handle `apierrors.IsConflict` the same way) right after the condition is set, or at least before any subsequent paths that may return an error (such as `reconcileKubevirtCSIClusterRBAC`), so the infra-network RBAC condition is never lost or left stale even if later KubeVirt steps fail.
In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:

- Around lines 721-730: Update the `NamespaceSelector` for the peer that targets ingress router pods so it uses the standard namespace label key `kubernetes.io/metadata.name` instead of `name`. Locate the block whose `PodSelector` matches `"ingresscontroller.operator.openshift.io/deployment-ingresscontroller": "default"` and change the `NamespaceSelector` `MatchLabels` key to `"kubernetes.io/metadata.name": "openshift-ingress"` so the selector actually matches the openshift-ingress namespace.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: c57503d0-c095-4f5e-8422-9ea2faf6a6e3
⛔ Files ignored due to path filters (1)
- vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by `!vendor/**`, `!**/vendor/**`
📒 Files selected for processing (4)
- api/hypershift/v1beta1/hostedcluster_conditions.go
- docs/content/how-to/kubevirt/external-infrastructure.md
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- hypershift-operator/controllers/hostedcluster/network_policies.go
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/content/how-to/kubevirt/external-infrastructure.md
Force-pushed: f33bbc9 → 04deba2
♻️ Duplicate comments (2)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)
721-730: ⚠️ Potential issue | 🟠 Major — Use the namespace label key that this router peer can actually match.

The selector at line 729 still depends on `name=openshift-ingress`, while this file otherwise matches standard namespaces via `kubernetes.io/metadata.name`. If the namespace only has the standard label, this peer never matches and virt-launcher egress to router pods stays blocked.

Suggested fix

```diff
 	NamespaceSelector: &metav1.LabelSelector{
 		MatchLabels: map[string]string{
-			"name": "openshift-ingress",
+			"kubernetes.io/metadata.name": "openshift-ingress",
 		},
 	},
```
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)
2011-2022: ⚠️ Potential issue | 🟠 Major — Persist `ValidKubeVirtInfraNetworkPolicyRBAC` before the CSI RBAC reconcile.

The extra status write at line 2016 still runs only after `reconcileKubevirtCSIClusterRBAC`. If that call fails, the new infra-network RBAC condition can still be left stale or missing, which removes the main diagnostic signal for the external-infra path.

Suggested fix

```diff
 	case hyperv1.KubevirtPlatform:
+		if hcluster.Spec.Platform.Kubevirt != nil && hcluster.Spec.Platform.Kubevirt.Credentials != nil {
+			if err := r.Client.Status().Update(ctx, hcluster); err != nil {
+				if apierrors.IsConflict(err) {
+					return ctrl.Result{Requeue: true}, nil
+				}
+				return ctrl.Result{}, fmt.Errorf("failed to update status after network policy RBAC check: %w", err)
+			}
+		}
 		err = r.reconcileKubevirtCSIClusterRBAC(ctx, createOrUpdate, hcluster)
 		if err != nil {
 			return ctrl.Result{}, fmt.Errorf("failed to reconcile kubevirt CSI cluster wide RBAC: %w", err)
 		}
-		if hcluster.Spec.Platform.Kubevirt != nil && hcluster.Spec.Platform.Kubevirt.Credentials != nil {
-			if err := r.Client.Status().Update(ctx, hcluster); err != nil {
-				if apierrors.IsConflict(err) {
-					return ctrl.Result{Requeue: true}, nil
-				}
-				return ctrl.Result{}, fmt.Errorf("failed to update status after network policy RBAC check: %w", err)
-			}
-		}
```
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: e6fa90e7-dc22-429b-9381-bf3a998f498e
⛔ Files ignored due to path filters (3)
- docs/content/reference/aggregated-docs.md is excluded by `!docs/content/reference/aggregated-docs.md`
- docs/content/reference/api.md is excluded by `!docs/content/reference/api.md`
- vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by `!vendor/**`, `!**/vendor/**`
📒 Files selected for processing (6)
- api/hypershift/v1beta1/hostedcluster_conditions.go
- docs/content/how-to/kubevirt/external-infrastructure.md
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- hypershift-operator/controllers/hostedcluster/network_policies.go
- support/conditions/conditions.go
- test/e2e/util/util.go
🚧 Files skipped from review as they are similar to previous changes (2)
- api/hypershift/v1beta1/hostedcluster_conditions.go
- docs/content/how-to/kubevirt/external-infrastructure.md
Force-pushed: 04deba2 → d7f90ca
/retest

/lgtm
Scheduling tests matching the

/test e2e-aws

/jira refresh
@dpateriya: This pull request references Jira Issue OCPBUGS-78575, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
@enxebre @sjenning could one of you add /approve when you have a moment? @orenc1 has already /lgtm, CI is green, and @csrwng is on PTO until mid-April, so we need another approver to satisfy OWNERS + api/OWNERS. This change adds the virt-launcher NetworkPolicy on the external KubeVirt infra cluster (OCPBUGS-78575), including the follow-up fixes for ingress namespace labels and persisting the RBAC status before the CSI reconcile. CC: @celebdor
/approve

Changes under api LGTM

/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dpateriya, JoelSpeed, qinqon. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
csrwng left a comment:

My main concern is the impact of this change on existing kubevirt deployments that are already using external infra and don't have the updated RBAC to allow creating/updating network policies.

Other than that, looks good.
@csrwng The ingress namespace selector change from `"name": "openshift-ingress"` to `kubernetes.io/metadata.name` was a CodeRabbit-driven fix for matching real namespace labels. Regarding the existing hostedclusters using external infra, we can

If you have any other suggestion, let me know.
Of the options above, I prefer 1 (nobody reads docs). Also, for the ingress label, if we're going to change it from what it was in the past, we should use what the networking team recommends:
Force-pushed: d7f90ca → 1bb7f9f
New changes are detected. LGTM label has been removed.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@  Coverage Diff  @@
##            main   #8056   +/-  ##
=======================================
  Coverage       ?  35.60%
=======================================
  Files          ?     767
  Lines          ?   93459
  Branches       ?       0
=======================================
  Hits           ?   33280
  Misses         ?   57484
  Partials       ?    2695
```
Comment on:

```go
meta.SetStatusCondition(&hcluster.Status.Conditions, metav1.Condition{
	Type:   string(hyperv1.ValidKubeVirtInfraNetworkPolicyRBAC),
	Status: metav1.ConditionFalse,
	Reason: hyperv1.InfraClusterNetworkReadFailedReason,
```

The reason should be different.
Comment on:

```go
ValidKubeVirtInfraNetworkMTU ConditionType = "ValidKubeVirtInfraNetworkMTU"

// ValidKubeVirtInfraNetworkPolicyRBAC indicates whether the external infra
// kubeconfig has sufficient permissions to read the infrastructure cluster's
```

This description should be updated to reflect that the condition now applies to insufficient RBAC for networkpolicies in addition to the network configuration.
Force-pushed: 1bb7f9f → 2cfbc6c
…fra cluster

When deploying HCP KubeVirt with external infrastructure (workers on a separate cluster), the virt-launcher NetworkPolicy was never created on the infrastructure cluster. The reconcileNetworkPolicies function explicitly skipped it when Credentials != nil, leaving guest VMs with unrestricted network access to all pods and services on the infra cluster.

This patch adds an else branch that uses the existing infra cluster client (from KubevirtInfraClientMap) to create the virt-launcher NetworkPolicy in the infra namespace on the infrastructure cluster. A new reconcileVirtLauncherNetworkPolicyExternalInfra function builds the policy adapted for external infra:

- Blocks infra cluster's clusterNetwork/serviceNetwork CIDRs
- Allows inter-VM, DNS, and ingress controller traffic
- Omits control-plane pod selectors (kube-apiserver, oauth, ignition-server-proxy) since those pods run on the management cluster and are reached via external IPs

The Network config lookup is best-effort: if the infra kubeconfig lacks cluster-scoped get on networks.config.openshift.io, the policy is still created but without CIDR-based egress blocking.

Also updates the documented minimum RBAC role for external infra to include networkpolicies (networking.k8s.io) and documents the optional ClusterRole for full network isolation.

Made-with: Cursor
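The best-effort CIDR blocking the commit message describes can be sketched with stdlib types only. `NetworkConfig` and `blockedCIDRs` below are simplified, hypothetical stand-ins for `configv1.Network` and the real derivation logic, not code from the patch:

```go
package main

import (
	"fmt"
	"net/netip"
)

// NetworkConfig is a simplified stand-in for the configv1.Network fields
// the reconciler reads from the infrastructure cluster.
type NetworkConfig struct {
	ClusterNetwork []string
	ServiceNetwork []string
}

// blockedCIDRs merges the infra cluster's pod and service networks into the
// egress-deny list, split by IP family. A nil config models the best-effort
// fallback: no RBAC to read the Network object means nothing is blocked,
// but the policy is still created.
func blockedCIDRs(cfg *NetworkConfig) (v4, v6 []string, err error) {
	if cfg == nil {
		return nil, nil, nil // fallback: no CIDR-based egress blocking
	}
	all := append(append([]string{}, cfg.ClusterNetwork...), cfg.ServiceNetwork...)
	for _, c := range all {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		if p.Addr().Is4() {
			v4 = append(v4, p.String())
		} else {
			v6 = append(v6, p.String())
		}
	}
	return v4, v6, nil
}

func main() {
	v4, v6, _ := blockedCIDRs(&NetworkConfig{
		ClusterNetwork: []string{"10.128.0.0/14"},
		ServiceNetwork: []string{"172.30.0.0/16", "fd02::/112"},
	})
	fmt.Println(v4, v6)
}
```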
Force-pushed: 2cfbc6c → 7b03ca8
Now I have the complete root cause analysis. Here's my final report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Summary

This is a known flaky test unrelated to PR #8056. The

Root Cause

The root cause is a race condition in

When the subsequent per-suite test (

The non-deterministic nature (different Kube versions fail on different runs) is because the race depends on timing of the API server's CRD deletion controller, which varies with envtest server startup time and runner load. An open fix exists: PR #8261 (OCPBUGS-83585) adds a wait-for-removal loop after

Recommendations

Evidence
/retest

/retest-required

@dpateriya: all tests passed!

Full PR test history. Your PR dashboard.
Summary
Problem
When deploying a Hosted Control Plane with KubeVirt platform using external infrastructure (workers on a separate cluster from the control plane), the virt-launcher NetworkPolicy is never created on the infrastructure cluster. The code in reconcileNetworkPolicies explicitly skips it when hcluster.Spec.Platform.Kubevirt.Credentials != nil. This leaves guest VMs with unrestricted network access to all pods and services on the infrastructure cluster, breaking tenant isolation in multi-tenant environments.
Fix
Uses the existing infra cluster client (KubevirtInfraClientMap) -- the same client already used for VM creation, route management, and version validation -- to also create the NetworkPolicy on the infra cluster. The policy:
Test plan
Made with Cursor