
NetworkPolicy blocks OLM catalog connectivity and bundle unpacking jobs #3676

@m4dm4rtig4n

Description


Environment

  • OLM Version: v0.36.0 and the master branch (image: quay.io/operator-framework/olm:master)
  • Kubernetes: Rancher Desktop (K3s)
  • Catalog Source: quay.io/operatorhubio/catalog:latest
  • Installation method: Helm chart (./deploy/chart)

Issue Summary

I encountered this issue on both v0.36.0 and the master branch. After troubleshooting with Claude Code, we identified two NetworkPolicy misconfigurations that prevent OLM from functioning properly. Below are the findings and the workarounds that successfully resolved the issues.

Problem Description

When OLM is deployed with NetworkPolicies enabled (specifically the chart's default-deny-all-traffic policy), two critical connectivity issues arise:

Issue 1: Catalog gRPC connectivity blocked

The operatorhubio-catalog pod cannot receive incoming connections on port 50051, causing persistent connection failures:

failed to populate resolver cache from source operatorhubio-catalog/operator-lifecycle-manager: 
failed to list bundles: rpc error: code = Unavailable desc = connection error: 
desc = "transport: Error while dialing dial tcp 10.43.119.62:50051: connect: connection refused"

Status observed:

status:
  connectionState:
    lastObservedState: TRANSIENT_FAILURE
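
For reference, the observed state can be read directly off the CatalogSource (using the names from this report):

kubectl -n operator-lifecycle-manager get catalogsource operatorhubio-catalog \
  -o jsonpath='{.status.connectionState.lastObservedState}'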

The Helm chart creates a default-deny-all-traffic NetworkPolicy that blocks all ingress by default, but doesn't include a corresponding NetworkPolicy to allow ingress to the catalog pod.
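
For context, the deny-all policy in question follows the standard shape sketched below (the exact manifest shipped by the chart may differ slightly): it selects every pod and declares both policy types with no allow rules, so all ingress and egress is denied.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all-traffic
  namespace: operator-lifecycle-manager
spec:
  podSelector: {}   # applies to every pod in the namespace
  policyTypes:
  - Ingress         # no ingress rules listed -> all ingress denied
  - Egress          # no egress rules listed -> all egress denied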

Issue 2: Bundle unpacking jobs cannot access Kubernetes API

When a subscription attempts to install an operator, the bundle unpacking jobs fail to access the Kubernetes API server:

Error: error loading manifests from directory: 
Get "https://10.43.0.1:443/api/v1/namespaces/operator-lifecycle-manager/configmaps/...": 
dial tcp 10.43.0.1:443: connect: connection refused

This causes InstallPlans to fail with:

status:
  phase: Failed
  conditions:
  - type: BundleLookupFailed
    reason: BackoffLimitExceeded
    message: Job has reached the specified backoff limit
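
The failing unpack Jobs can be inspected directly; job names are hash-suffixed and vary per bundle, and in this setup the unpack Jobs run in the OLM namespace:

# list the bundle unpack Jobs created by the catalog operator
kubectl -n operator-lifecycle-manager get jobs
# read the error from a failing unpack pod (substitute the actual job name)
kubectl -n operator-lifecycle-manager logs job/<unpack-job-name>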

The default-deny-all-traffic NetworkPolicy blocks egress traffic, preventing jobs from reaching the Kubernetes API server.

Expected Behavior

When NetworkPolicies are enabled, OLM should include appropriate NetworkPolicies to allow:

  1. Ingress to catalog pods on port 50051 for gRPC communication
  2. Egress from all pods to access the Kubernetes API server and external registries

Reproduction Steps

  1. Deploy OLM using the Helm chart with default NetworkPolicies
  2. Create a CatalogSource pointing to quay.io/operatorhubio/catalog:latest
  3. Create a Subscription for any operator (e.g., cloudnative-pg); example manifests for steps 2 and 3 are shown after this list
  4. Observe catalog connection failures and InstallPlan failures
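
For convenience, manifests along these lines reproduce steps 2 and 3. The Subscription's channel and target namespace are assumptions and may need adjusting for your cluster:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: operator-lifecycle-manager
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  displayName: OperatorHub.io Catalog
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cloudnative-pg
  namespace: operators              # upstream OLM's global operator namespace; adjust if yours differs
spec:
  channel: stable-v1                # assumed channel name; check the package for the current one
  name: cloudnative-pg
  source: operatorhubio-catalog
  sourceNamespace: operator-lifecycle-manager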

Workaround Applied

Two NetworkPolicy changes were required to fix the issues:

1. Allow ingress to catalog pods

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: operatorhubio-catalog
  namespace: operator-lifecycle-manager
spec:
  podSelector:
    matchLabels:
      olm.catalogSource: operatorhubio-catalog
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 50051
  egress:
  - {}

2. Modify default-deny-all-traffic to allow egress

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all-traffic
  namespace: operator-lifecycle-manager
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  egress:
  - {}  # Allow all egress traffic

Result: After applying these NetworkPolicies, OLM became fully operational with catalog connectivity working (lastObservedState: READY) and operators installing successfully.
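
In addition to the connection-state check shown earlier, the installations themselves can be confirmed; resources are listed cluster-wide here since the exact namespace depends on where the Subscription was created:

# InstallPlans should reach phase Complete
kubectl get installplan -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase
# the installed operator's CSV should reach phase Succeeded
kubectl get csv -A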

Proposed Solution

The OLM Helm chart should include:

  1. A catalog NetworkPolicy template that automatically creates ingress rules for any CatalogSource pods, using label selectors like olm.catalogSource

  2. A modified default NetworkPolicy that includes egress rules for:

    • Kubernetes API access (port 443)
    • DNS resolution (port 53 TCP/UDP)
    • Container registry access (port 443 for HTTPS registries)
    • gRPC catalog communication (port 50051)

This would allow OLM to work out-of-the-box in environments with strict NetworkPolicy enforcement, which is a common security requirement in production Kubernetes clusters.
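
As a rough sketch of what such a default egress policy could look like (the policy name here is illustrative and the port list mirrors the bullets above; chart maintainers may prefer per-component rules):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-olm-required-egress   # illustrative name, not an existing chart resource
  namespace: operator-lifecycle-manager
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:                          # DNS resolution
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  - ports:                          # Kubernetes API and HTTPS container registries
    - protocol: TCP
      port: 443
  - ports:                          # gRPC catalog communication
    - protocol: TCP
      port: 50051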

Additional Context

The existing Helm chart includes NetworkPolicies for catalog-operator, olm-operator, and packageserver, which do include appropriate egress rules. However:

  • There's no NetworkPolicy for catalog pods themselves
  • The default-deny-all-traffic policy is too restrictive for OLM's operational requirements

Files to check:

  • deploy/chart/templates/networkpolicy.yaml
  • Catalog operator NetworkPolicy definitions

Labels

kind/bug
