
fix(mesh): VIP MASQUERADE deadlock when all control-plane nodes share a location #495

@Arsolitt

Description

When all control-plane nodes are assigned to the same kilo.squat.ai/location with cross-mesh granularity, a floating VIP (e.g., the Talos VIP used as the kube-apiserver endpoint) becomes unreachable, causing a permanent cluster deadlock.

Root cause

pkg/mesh/routes.go:405-408 creates, for each allowedLocationIPs CIDR, a POSTROUTING rule that jumps to KILO-NAT when t.location == s.location:

rules.AddToPrepend(iptables.NewRule(..., "nat", "POSTROUTING", "-d", alip.String(), ..., "-j", "KILO-NAT"))

The KILO-NAT chain has RETURN rules for individual node IPs (discovered as peers) but not for floating/virtual IPs. Traffic destined for the VIP therefore falls through to MASQUERADE, which rewrites the source IP and breaks etcd coordination. Because every control-plane node carries the same rule, the VIP cannot stabilize on any of them.

This creates a permanent deadlock:

  • VIP unreachable → kube-apiserver down
  • kube-apiserver down → Kilo cannot read node annotations → cannot reconcile
  • iptables rules persist in broken state → VIP stays unreachable

Steps to reproduce

  1. Deploy a cluster with 3+ control-plane nodes using a floating VIP as the API endpoint
  2. Set kilo.squat.ai/granularity: cross and kilo.squat.ai/allowed-location-ips: <node-subnet> on all nodes
  3. Annotate control-plane nodes one by one with kilo.squat.ai/location=<same-value>
  4. After the last node is annotated, the API server becomes unreachable

Proposed fix

Add a source negation to the POSTROUTING rule so that same-subnet L2 traffic is not sent to KILO-NAT:

- rules.AddToPrepend(iptables.NewRule(iptables.GetProtocol(alip.IP), "nat", "POSTROUTING", "-d", alip.String(), "-m", "comment", "--comment", "Kilo: jump to NAT chain", "-j", "KILO-NAT"))
+ rules.AddToPrepend(iptables.NewRule(iptables.GetProtocol(alip.IP), "nat", "POSTROUTING", "-d", alip.String(), "!", "-s", alip.String(), "-m", "comment", "--comment", "Kilo: jump to NAT chain", "-j", "KILO-NAT"))

Cross-mesh traffic (sourced from a remote subnet and arriving via WireGuard) is still MASQUERADEd. Local L2 traffic between nodes on the same subnet is excluded.

Known limitation

If a location has multiple disjoint allowedLocationIPs CIDRs, traffic between those CIDRs within the same location will still be MASQUERADEd. The ! -s exclusion only covers the matching CIDR, not the full set of local CIDRs. This limitation predates the fix and is not introduced by it.
