Skip to content

feat(snd/p2p): SEI_NLB_TARGET_TYPE env var for per-cluster target-type selection#372

Merged
bdchatham merged 1 commit into
mainfrom
feat/nlb-target-type-env
May 30, 2026
Merged

feat(snd/p2p): SEI_NLB_TARGET_TYPE env var for per-cluster target-type selection#372
bdchatham merged 1 commit into
mainfrom
feat/nlb-target-type-env

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

Summary

The per-pod P2P endpoint Service (PR #365) previously hardcoded service.beta.kubernetes.io/aws-load-balancer-nlb-target-type=ip. That's the right call on VPC-CNI clusters (prod, dev) where pod IPs are VPC-routable — but it's a hard blocker on harbor, where Cilium cluster-pool puts pods in 100.64.0.0/14 (CGNAT, unreachable from any NLB in a VPC subnet).

This PR adds a controller-level env var (SEI_NLB_TARGET_TYPE) so each overlay picks the right mode without baking cluster-specific logic into the controller.

Behaviour

  • SEI_NLB_TARGET_TYPE=ip (default) — pod IPs registered directly with the target group; AllocateLoadBalancerNodePorts: false so the limited 30000-32767 NodePort range is preserved.
  • SEI_NLB_TARGET_TYPE=instance — NLB targets the EC2 node at its NodePort; kube-proxy / Cilium socket-LB does the NodePort→pod rewrite locally. AllocateLoadBalancerNodePorts left nil so kube allocates one.
  • Anything else → controller logs Invalid SEI_NLB_TARGET_TYPE and exits 1 at startup.

Per-cluster value lives in each overlay's manager-patch.yaml, mirroring how SEI_P2P_ENDPOINT_DOMAIN, SEI_GATEWAY_DOMAIN, etc. are wired today.

Validation + default placement

  • Validation and default-resolution both happen at controller construction in cmd/main.go. The reconciler's NLBTargetType field is canonical — by the time any reconcile runs, the field is always either ip or instance.
  • No per-reconcile branching, no "if empty" fallback in the hot path.

Files

  • cmd/main.go — env read + validation + default fill-in.
  • internal/controller/nodedeployment/controller.go — new NLBTargetType field on SeiNodeDeploymentReconciler.
  • internal/controller/nodedeployment/p2p_endpoint.go — exported constants NLBTargetTypeIP, NLBTargetTypeInstance, DefaultNLBTargetType. generateP2PEndpointService takes target-type as a parameter; AllocateLoadBalancerNodePorts gated on it.
  • internal/controller/nodedeployment/p2p_endpoint_test.go — existing ip test asserts NodePort allocation disabled; new TestGeneratePublishableService_InstanceTargetType asserts the instance shape (annotation + nil-NodePort-allocation).

Downstream

  • harbor overlay (platform repo) — separate PR to add SEI_NLB_TARGET_TYPE: instance to clusters/harbor/sei-k8s-controller/manager-patch.yaml. After that lands and Flux applies, harbor's publishable-P2P Services will emit target-type: instance and the per-pod NLB → node:NodePort → Cilium-socket-LB → pod path becomes viable.
  • prod, dev — no change; defaults to ip (current behaviour).

One-way doors

  • Env var name SEI_NLB_TARGET_TYPE. Once any overlay sets it, renaming is painful. Convention-matched to existing SEI_* env vars.

Test plan

  • Unit + lint pass on CI.
  • Manual smoke after harbor's overlay PR: harbor syncer-0-0-p2p Service annotation = instance, NodePort allocated.
  • Prod/dev unaffected (no overlay change → default ip → existing behaviour).

🤖 Generated with Claude Code

…e selection

Per-pod P2P endpoint Services previously hardcoded
aws-load-balancer-nlb-target-type=ip. That's correct on VPC-CNI clusters
(prod, dev) where pod IPs are VPC-routable, but breaks on harbor where
Cilium cluster-pool uses 100.64.0.0/14 — AWS has no route for that CIDR
so an NLB with target-type=ip can never reach the pod.

Adds SEI_NLB_TARGET_TYPE (values: "ip" or "instance") on the controller.
Read once in cmd/main.go: defaults to "ip" when unset, validated at
startup (fails loudly on typo), then passed to the reconciler. The
reconciler stamps the value verbatim into the Service annotation and
also gates AllocateLoadBalancerNodePorts on it — ip keeps the NodePort
range untouched (no NodePort hop), instance leaves the field nil so
kube allocates one (the NLB→node→pod hop needs it).

Harbor's overlay will set value: instance in a separate platform PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented May 30, 2026

PR Summary

Medium Risk
Changes how internet-facing P2P NLBs register targets and NodePorts; misconfiguration breaks external P2P reachability on affected clusters, though defaults preserve prod/dev behavior.

Overview
Per-pod P2P LoadBalancer Services no longer hardcode AWS NLB target-type=ip. The controller reads SEI_NLB_TARGET_TYPE at startup (ip default, or instance), validates it in cmd/main.go, and passes it into SeiNodeDeploymentReconciler so each cluster overlay can pick the right mode without cluster-specific code in the reconciler.

ip keeps prior behavior: annotation aws-load-balancer-nlb-target-type=ip and AllocateLoadBalancerNodePorts: false. instance sets target-type=instance and leaves NodePort allocation unset so the NLB can reach pods via node NodePort (for non–VPC-routable pod CIDRs). Unit tests cover both shapes.

Reviewed by Cursor Bugbot for commit dfc998a. Bugbot is set up for automated code reviews on this repo. Configure here.

@bdchatham bdchatham merged commit 6a4334e into main May 30, 2026
4 of 5 checks passed
@bdchatham bdchatham deleted the feat/nlb-target-type-env branch May 30, 2026 01:27
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit dfc998a. Configure here.

desiredNames := make(map[string]struct{}, group.Spec.Replicas)
for i := range int(group.Spec.Replicas) {
desired := generateP2PEndpointService(group, i, r.p2pEndpointHostname(group, i))
desired := generateP2PEndpointService(group, i, r.p2pEndpointHostname(group, i), r.NLBTargetType)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envtest reconciler missing NLBTargetType causes test failure

High Severity

The envtest reconciler in suite_test.go (line 202) does not initialize NLBTargetType, so it defaults to "". When reconcileP2PEndpoints passes r.NLBTargetType to generateP2PEndpointService, the annotation aws-load-balancer-nlb-target-type becomes empty string, and the AllocateLoadBalancerNodePorts logic skips the == NLBTargetTypeIP branch (producing nil instead of *false). The envtest at p2p_endpoint_test.go:98-99 asserts the annotation equals "ip" — this assertion will fail, breaking CI.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit dfc998a. Configure here.

bdchatham added a commit that referenced this pull request May 30, 2026
PR #372 moved SEI_NLB_TARGET_TYPE default-resolution to cmd/main.go so
the reconciler treats Spec.NLBTargetType as canonical. The envtest
suite_test.go is a parallel "main" that constructs its own reconciler
and never reads the env var — it was left with NLBTargetType="" after
the rebase, which stamped an empty target-type annotation on the
generated Service and broke
TestP2PEndpointP2P_CreateWithTCP_ChildHasAddressAndServiceExists.

Apply the same default in the envtest suite so it mirrors cmd/main.go's
construction-site invariant.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant