k8s NetworkPolicy: restrict BoJ pod ingress to HCG pod only (HCG tier-2 defence-in-depth) #135

@hyperpolymath

Summary

Optional defence-in-depth follow-up to the HCG tier-2 channel (hyperpolymath/standards#100, #91). Not acceptance-critical — Phase E acceptance is satisfied by the ClusterIP Service (PR #131) + the Elixir Cowboy loopback bind (PR #130) + the Zig adapter loopback bind (PR #132). This adds a fourth, finer-grained layer.

Background

Phase E's consumer-side audit (action item #8) named ClusterIP as the controlling lever for k8s ingress isolation. The audit also noted:

Optional: add a NetworkPolicy restricting BoJ pod ingress to only the HCG pod. Defence-in-depth beyond ClusterIP. Not strictly required for Phase E acceptance — Service-type is the controlling lever — but worth filing as a follow-up issue.

This issue is that filing.

Motivation

ClusterIP makes BoJ unreachable from outside the cluster. A NetworkPolicy additionally makes BoJ unreachable from inside the cluster, except from pods carrying the HCG label. It covers three threat models that ClusterIP alone doesn't:

  1. Compromised neighbour pod. Any pod in the same cluster that knows BoJ's ClusterIP can talk to it today. A NetworkPolicy restricts ingress to a labelled subset (e.g., app: http-capability-gateway), so a compromised pod elsewhere in the cluster cannot pivot to BoJ.
  2. Operator misconfiguration. If a future kustomize overlay re-introduces type: NodePort or type: LoadBalancer, the NetworkPolicy still blocks external ingress at the pod-network layer. Belt-and-braces against the Service spec drifting.
  3. Defence-in-depth for the §3 invariant. ADR-0004 §1 (and the Phase A contract §3 invariant 4) require BoJ's back-side bind to be "not externally routable". Today that's enforced by (a) loopback bind in code/container env and (b) ClusterIP. NetworkPolicy adds (c) pod-network ingress restriction, so all three layers must fail before the invariant is violated.

Proposed shape

New file: k8s/networkpolicy.yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: boj-server-ingress
  labels:
    app: boj-server
spec:
  podSelector:
    matchLabels:
      app: boj-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: http-capability-gateway
      ports:
        - protocol: TCP
          port: 7700
        # Reserve forward-compat ports — once gRPC/GraphQL/SSE land,
        # they will be reached via the same HCG-pod allow-list.
        - protocol: TCP
          port: 7701
        - protocol: TCP
          port: 7702
        - protocol: TCP
          port: 7703

Caveats to address in the PR body:

  • Requires a CNI plugin that enforces NetworkPolicy (Calico, Cilium, Weave Net, etc.). Some bare-metal / dev clusters use a CNI without NetworkPolicy support (e.g., flannel on its own, which does not implement NetworkPolicy regardless of backend); in those clusters this manifest is silently a no-op. Document this in the file header.
  • Cluster operators using a non-HCG-fronted deployment will need to either skip applying this manifest or override the from.podSelector.matchLabels to allow their own ingress source. Same override pattern as the ClusterIP migration (kustomize/helm overlay).
  • Health probes from kubelet on the node IP are NOT pod-to-pod and may need a separate allow rule. Worth testing in staging before production.
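
The override mentioned in the second caveat could look roughly like the kustomize overlay below. This is a minimal sketch, not a tested recipe: the base path `../../base` and the replacement label `app: my-edge-proxy` are hypothetical placeholders, and it assumes the base kustomization already includes the `boj-server-ingress` NetworkPolicy. A JSON 6902 patch is used because a strategic-merge patch would replace the whole `ingress` list rather than a single selector.

```yaml
# overlays/non-hcg/kustomization.yaml  (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base   # assumption: a base that includes the boj-server-ingress NetworkPolicy

patches:
  # Swap the allowed ingress source for deployments not fronted by HCG.
  - target:
      group: networking.k8s.io
      version: v1
      kind: NetworkPolicy
      name: boj-server-ingress
    patch: |-
      - op: replace
        path: /spec/ingress/0/from/0/podSelector/matchLabels
        value:
          app: my-edge-proxy   # hypothetical label of the operator's own ingress source
```

Operators who want no policy at all can simply leave the manifest out of their overlay's resources instead of patching it.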

Acceptance criteria (when filed as a PR)

  • k8s/networkpolicy.yaml lints clean with kubectl apply --dry-run=client -f k8s/networkpolicy.yaml.
  • Header comment cross-references this issue + ADR-0004 + docs/integration/hcg-tier2-rollout-runbook.md.
  • Override recipe documented (kustomize/helm overlay for non-HCG-fronted deployments).
  • CHANGELOG entry under ### Added.
  • Staging test: with the policy applied, curl from another pod (not labelled app: http-capability-gateway) to BoJ's ClusterIP times out. From an HCG-labelled pod, the same request succeeds.
  • Refs hyperpolymath/standards#100 / #91, not Closes.
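
One way to run the staging check is a throwaway probe pod, sketched below. Everything not named in this issue is an assumption: the pod name, the `curlimages/curl` image, the Service DNS name `boj-server`, and the `/` path are placeholders to be replaced with whatever the staging cluster actually uses.

```yaml
# Hypothetical probe pod for the "denied" half of the staging test.
apiVersion: v1
kind: Pod
metadata:
  name: netpol-probe-denied        # hypothetical name
  labels:
    app: netpol-probe              # deliberately NOT app: http-capability-gateway
spec:
  restartPolicy: Never             # run once; a non-zero curl exit leaves the pod in phase Failed
  containers:
    - name: curl
      image: curlimages/curl       # assumption: any image that ships curl will do
      command:
        - curl
        - --silent
        - --show-error
        - --max-time
        - "5"                      # give up after 5 seconds instead of hanging
        - http://boj-server:7700/  # assumption: BoJ's Service name (or use its ClusterIP); port from the manifest
```

With the policy enforced, this pod would be expected to end in phase Failed, typically with curl exit code 28 (operation timed out) if the CNI drops the packets; a CNI that rejects rather than drops may produce a different non-zero code. For the "allowed" half, the same pod with the label changed to app: http-capability-gateway should complete successfully, provided it runs in the same namespace as the policy, since a bare `from.podSelector` only matches pods in the policy's own namespace.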

Priority

Low. Phase E can land and close without this. File it now so the design isn't lost; pick up after Phase E exit unless an incident raises the priority.

Refs

🤖 Generated with Claude Code
