feat: add multi-ALB load balancing for scope ingress #164

Merged: fedemaleh merged 4 commits into beta from feature/resolve-balancer on Apr 9, 2026

@fedemaleh
Collaborator

Summary

  • Extracts ALB name resolution from k8s/scope/build_context into a dedicated k8s/scope/networking/resolve_balancer script
  • Adds support for multiple ALBs via new provider configuration properties additional_public_balancers and additional_private_balancers, distributing scopes across ALBs by selecting the one with the fewest HTTPS listener rules
  • Ensures deployment-time consistency between DNS (Route53) and ingress routing — deployments look up the ALB already assigned to the scope instead of recalculating

Problem

ALBs have a limit of roughly 100 rules per listener, so a cluster that hosts many scopes on a single ALB eventually runs out of rule capacity. In addition, the ALB is resolved independently at scope creation (for the Route53 record) and at deployment (for the ingress). If rule counts change between the two, they can resolve to different balancers, leaving DNS pointing to ALB-A while the ingress is on ALB-B.

Solution

Multi-ALB load balancing (scope creation)

When additional_public_balancers or additional_private_balancers are configured in the scope-configurations provider, the script:

  1. Builds a candidate list: base ALB + additional balancers
  2. Queries AWS ELBv2 for each candidate's HTTPS (443) listener rule count (excluding the default rule)
  3. Selects the ALB with the fewest rules
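
The selection steps above can be sketched roughly as follows. This is a minimal illustration, not the actual resolve_balancer code: `get_rule_count` and `select_least_loaded` are hypothetical names, and the rule counts are hard-coded where the real script would query ELBv2. The skip-and-warn behavior, the "first candidate wins" tie-break, and keeping the default when every lookup fails are taken from the test plan in this PR.

```shell
#!/usr/bin/env bash

# Hypothetical stub: stands in for the ELBv2 queries (DescribeLoadBalancers,
# DescribeListeners, DescribeRules) and prints the number of non-default rules
# on the HTTPS (443) listener. A non-zero return simulates an AWS API failure.
get_rule_count() {
  case "$1" in
    alb-public)   echo 87 ;;
    alb-public-2) echo 12 ;;
    alb-public-3) echo 12 ;;
    *)            return 1 ;;
  esac
}

# Prints the candidate with the fewest rules. The first argument is the base
# ALB, which is also the result when every lookup fails.
select_least_loaded() {
  local best="$1" best_count="" candidate count
  for candidate in "$@"; do
    if ! count=$(get_rule_count "$candidate"); then
      echo "warning: could not count rules for $candidate, skipping" >&2
      continue
    fi
    # Strict "less than" keeps the earliest candidate on a tie.
    if [ -z "$best_count" ] || [ "$count" -lt "$best_count" ]; then
      best="$candidate"
      best_count="$count"
    fi
  done
  echo "$best"
}

select_least_loaded alb-public alb-public-2 alb-public-3   # prints alb-public-2
```

Because the comparison is strict, alb-public-3 does not displace alb-public-2 even though both have 12 rules.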

Deployment-time consistency

When running in a deployment context (vs scope creation), the script skips recalculation and instead looks up the ALB already in use:

  1. K8s ingress lookup (fastest): reads the alb.ingress.kubernetes.io/load-balancer-name annotation from the scope's existing ingress
  2. Route53 lookup (first deployment): finds the A-record alias for the scope domain and resolves the ALB name from the alias target's DNS name
  3. Fallback: if neither lookup succeeds, recalculates as a safety net
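
The lookup order above amounts to a short fallback chain. The sketch below illustrates that control flow only. `get_alb_from_route53` is named in this PR, but `get_alb_from_ingress`, `calculate_balancer`, `resolve_balancer` as a function, and the `INGRESS_ALB` / `ROUTE53_ALB` variables are hypothetical stand-ins for the real kubectl and AWS calls.

```shell
#!/usr/bin/env bash

# Hypothetical stubs. The real lookups would read the ingress annotation via
# the Kubernetes API and the A-record alias via Route53.
get_alb_from_ingress() { [ -n "${INGRESS_ALB:-}" ] && echo "$INGRESS_ALB"; }
get_alb_from_route53() { [ -n "${ROUTE53_ALB:-}" ] && echo "$ROUTE53_ALB"; }
calculate_balancer()   { echo "alb-public-2"; }

# $1 is the execution context: "deployment" or anything else (scope creation).
resolve_balancer() {
  local context="$1" alb
  if [ "$context" = "deployment" ]; then
    # 1. Prefer the ALB recorded on the existing ingress.
    if alb=$(get_alb_from_ingress) && [ -n "$alb" ]; then
      echo "$alb"
      return 0
    fi
    # 2. On a first deployment there is no ingress yet, so ask Route53.
    if alb=$(get_alb_from_route53) && [ -n "$alb" ]; then
      echo "$alb"
      return 0
    fi
    echo "warning: no existing ALB found, recalculating" >&2
  fi
  # 3. Scope creation, or the safety-net fallback for deployments.
  calculate_balancer
}

INGRESS_ALB=""
ROUTE53_ALB="alb-public-3"
resolve_balancer deployment   # prints alb-public-3
resolve_balancer scope        # prints alb-public-2 (always recalculates)
```

Scope creation never consults the ingress or Route53, which matches the "never uses stale ingress" case in the test plan.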

Provider configuration

{
  "scope-configurations": {
    "networking": {
      "additional_public_balancers": ["alb-public-2", "alb-public-3"],
      "additional_private_balancers": ["alb-private-2", "alb-private-3"]
    }
  }
}

When these arrays are not configured, the script behaves exactly as before (single ALB, no AWS API calls for rule counts).
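
As a rough illustration of that short-circuit, the sketch below builds the candidate list and decides whether rule counts are needed at all. `candidate_list` and `uses_multi_alb` are hypothetical names, and the additional balancers are assumed to have already been extracted from the provider JSON and passed as arguments.

```shell
#!/usr/bin/env bash

# $1 is the base ALB; the remaining arguments are the additional balancers.
# Prints one candidate per line, base first, dropping blank entries.
candidate_list() {
  local base="$1" name
  shift
  printf '%s\n' "$base"
  for name in "$@"; do
    if [ -n "$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Succeeds only when at least one non-empty additional balancer is configured,
# i.e. when querying rule counts is worthwhile.
uses_multi_alb() {
  [ "$(candidate_list "$@" | grep -c .)" -gt 1 ]
}

if uses_multi_alb alb-public alb-public-2 alb-public-3; then
  echo "multi-ALB: compare rule counts"
fi
if ! uses_multi_alb alb-public; then
  echo "single ALB: skip AWS rule-count calls"
fi
```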

AWS Permissions

New permission required (only when using additional balancers at deployment time):

| Permission | Used by | Purpose |
| --- | --- | --- |
| route53:ListResourceRecordSets | get_alb_from_route53 | Look up existing A-record alias to determine which ALB was assigned during scope creation |
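
One detail worth keeping in mind for this lookup: Route53 returns alias targets as fully qualified names with a trailing dot, often with a `dualstack.` prefix, so they do not compare equal to the DNS name ELBv2 reports without normalization. The sketch below shows one way the matching could work. It is an assumption about the approach, not the code in get_alb_from_route53. `normalize_dns` and `alb_name_for_alias` are hypothetical helpers, and the DNS names are made up.

```shell
#!/usr/bin/env bash

# Lowercases a DNS name and strips the trailing dot and "dualstack." prefix.
normalize_dns() {
  local dns
  dns=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  dns="${dns%.}"
  dns="${dns#dualstack.}"
  printf '%s\n' "$dns"
}

# Reads "name dns" pairs on stdin (as could be derived from ELBv2
# DescribeLoadBalancers output) and prints the name of the ALB whose DNS name
# matches the alias target in $1. Returns 1 when nothing matches.
alb_name_for_alias() {
  local target name dns
  target=$(normalize_dns "$1")
  while read -r name dns; do
    if [ "$(normalize_dns "$dns")" = "$target" ]; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

printf '%s\n' \
  "alb-public alb-public-111.us-east-1.elb.amazonaws.com" \
  "alb-public-2 alb-public-2-222.us-east-1.elb.amazonaws.com" |
  alb_name_for_alias "dualstack.alb-public-2-222.us-east-1.elb.amazonaws.com."
# prints alb-public-2
```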

The following permissions are already required by existing scripts (verify_ingress_reconciliation, route53/manage_route):

| Permission | Already used by |
| --- | --- |
| elasticloadbalancing:DescribeLoadBalancers | verify_ingress_reconciliation, route53/manage_route |
| elasticloadbalancing:DescribeListeners | verify_ingress_reconciliation |
| elasticloadbalancing:DescribeRules | verify_ingress_reconciliation |

Files changed

| File | Change |
| --- | --- |
| k8s/scope/networking/resolve_balancer | New: dedicated ALB resolution script with load balancing and deployment-time lookup |
| k8s/scope/build_context | Replaced 15 lines of inline ALB logic with a source call |
| k8s/scope/tests/networking/resolve_balancer.bats | New: 25 BATS tests |
| CHANGELOG.md | Updated with feature description and new permission |

Test plan

  • 25 new BATS tests for resolve_balancer covering:
    • Default ALB names (public/private)
    • Provider override priority (scope-configurations > container-orchestration)
    • Least-loaded ALB selection with multiple candidates
    • Deployment-time lookup from K8s ingress
    • Deployment-time fallback to Route53
    • Deployment-time fallback to recalculation
    • AWS API failure handling (skip + warn)
    • All candidates failing (keeps default)
    • Empty array handling
    • Tie-breaking (first candidate wins)
    • Scope creation always calculates (never uses stale ingress)
  • 13 existing build_context tests pass (no regression)
  • 29 existing deployment/build_context tests pass (no regression)
  • Total: 67/67 tests passing

@fedemaleh fedemaleh changed the base branch from feature/instance-tests to beta April 9, 2026 16:55
@fedemaleh fedemaleh changed the base branch from beta to feature/alb-capacity-validation April 9, 2026 16:56
@fedemaleh fedemaleh force-pushed the feature/resolve-balancer branch from 4b4fac7 to 65e580a Compare April 9, 2026 17:00
@fedemaleh fedemaleh changed the base branch from feature/alb-capacity-validation to beta April 9, 2026 17:20
javi-null previously approved these changes Apr 9, 2026
@fedemaleh fedemaleh dismissed javi-null’s stale review April 9, 2026 19:09

The merge-base changed after approval.

@fedemaleh fedemaleh force-pushed the feature/resolve-balancer branch from 0b85c26 to 5dceeb1 Compare April 9, 2026 19:10
@javi-null javi-null self-requested a review April 9, 2026 19:10
@fedemaleh fedemaleh merged commit d643516 into beta Apr 9, 2026
3 checks passed
@fedemaleh fedemaleh deleted the feature/resolve-balancer branch April 9, 2026 20:43

3 participants