test(e2e): add cross mesh granularity #490
Mirrors e2e/full-mesh.sh and e2e/location-mesh.sh for the new --mesh-granularity=cross mode introduced by the preceding commits.

setup_suite annotates the kind nodes into two locations (control-plane and the first worker as loc-a, the second worker as loc-b) so the test exercises the case "cross" is meant to handle: direct WireGuard tunnels between locations, native CNI inside a location.

Tests:
- test_cross_mesh_connectivity: pings + adjacency matrix
- test_cross_mesh_peer: kgctl peer create/showconf
- test_mesh_granularity_auto_detect: kgctl graph auto-detection
- test_cross_peer_topology: sanity that loc-a nodes see only the loc-b node as a peer (and vice versa), distinguishing "cross" from "full" (where every node is a peer) and "location" (where non-leaders have no peers at all)

The new suite is wired into the existing e2e make target between location-mesh.sh and multi-cluster.sh.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The "cross from {a,b,c,d}" test cases added in 3590b12 predate the
cniCompatibilityIPs field on segment, introduced by Cilium support
in squat#409. Each segment in the cross test cases describes a single
node, so the expected value mirrors the existing full/location
cases: []*net.IPNet{nil}.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The cross granularity intentionally removes the WireGuard tunnel between nodes that share a location and relies on the underlying CNI to carry intra-location pod traffic over its own overlay (e.g. Cilium VXLAN). The bridge CNI used by the e2e harness has no such overlay, so check_ping/check_adjacent cannot succeed on this cluster; they were timing out trying to reach the same-location worker.

Keep the topology checks (peer count per node, kgctl graph auto-detect, kgctl peer create), which validate the cross routing logic without depending on a CNI overlay. End-to-end connectivity under cross is covered by the Cilium-CNI suite added separately.

Also clean up the location annotations in teardown_suite so the suites that follow (multi-cluster, handlers, kgctl) start from the same node-annotation state they used to.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
d1ef6fd to 840d14f
Just removing the location annotations leaves the DaemonSet in --mesh-granularity=cross. The handler tests that follow assume the control-plane WireGuard IP is 10.4.0.1 (the leader of a single-location mesh) and time out when cross's per-node leader assignment hands that IP to a different node.

Roll the DaemonSet back to --mesh-granularity=location in the teardown so the cluster state mirrors what location-mesh.sh leaves behind, which is the working baseline expected by multi-cluster.sh, handlers.sh and kgctl.sh.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Thanks for picking this back up.

FYI: I ran this in my clusters for years and it worked great in general, but there was one issue I observed from time to time and could not track down. It might not even be related to this change: when adding nodes to the cluster, the networking would sometimes be weird and I was seeing a 10 Hz (IIRC) reconcile metric in Grafana. I am not using it anymore at the moment, so unfortunately I don't have any concrete metrics or traces I can share.
@skirsten thanks for chiming in!

For the rebase + attribution: to keep this clean and merge under https://github.com/cozystack/kilo/tree/add-cross-mesh-granularity-rebased

You can fast-forward your fork's PR branch onto it with:

That refreshes #328 with three of your commits unchanged plus the

About the 10 Hz reconcile loop you saw: I had a quick look at the
Summary
This rebases #328 (`--mesh-granularity=cross` by @skirsten) onto current `main` and adds the e2e test suite that was the only blocker for merge, per the discussion in #489.

The three original commits from #328 are preserved as-is to keep authorship attribution. A small conflict in `docs/kg.md` was resolved in favour of the current `--mtu` description (which moved to the `auto` default in #406) while keeping #328's addition of `cross` to the granularity list.
What `cross` does

Direct WireGuard tunnels between every pair of nodes that live in different locations; intra-location traffic stays on the CNI overlay.

This sits between `location` (one tunnel per location pair, leader as relay → SPOF) and `full` (one tunnel per node pair, including intra-location overhead).
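
To make the comparison concrete, the following Go sketch enumerates which node pairs get a direct tunnel under each granularity for a three-node layout like the one the e2e suite sets up. It is an illustration only, not Kilo's implementation: the `node` type, the `leaders` and `tunnels` helpers, the node names, and the "first node listed is the leader" rule are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified mesh node: a name plus a location.
type node struct {
	name     string
	location string
}

// leaders picks the first node listed for each location as its leader.
// Assumption: real leader selection is more involved than this.
func leaders(nodes []node) map[string]string {
	lead := map[string]string{}
	for _, n := range nodes {
		if _, ok := lead[n.location]; !ok {
			lead[n.location] = n.name
		}
	}
	return lead
}

// tunnels lists the node pairs joined by a direct tunnel under the
// given granularity, following the prose description above.
func tunnels(nodes []node, granularity string) []string {
	lead := leaders(nodes)
	var out []string
	for i := 0; i < len(nodes); i++ {
		for j := i + 1; j < len(nodes); j++ {
			a, b := nodes[i], nodes[j]
			sameLoc := a.location == b.location
			linked := false
			switch granularity {
			case "full": // every node pair
				linked = true
			case "location": // only leaders of different locations
				linked = !sameLoc && lead[a.location] == a.name && lead[b.location] == b.name
			case "cross": // every pair that spans two locations
				linked = !sameLoc
			}
			if linked {
				out = append(out, a.name+"<->"+b.name)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical names mirroring the layout: two nodes in loc-a, one in loc-b.
	nodes := []node{
		{"control-plane", "loc-a"},
		{"worker", "loc-a"},
		{"worker2", "loc-b"},
	}
	for _, g := range []string{"location", "cross", "full"} {
		fmt.Println(g, tunnels(nodes, g))
	}
}
```

With this layout the sketch yields one tunnel for `location`, two for `cross` (each loc-a node paired with the loc-b node, none inside loc-a), and three for `full`.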
New e2e suite (`e2e/cross-mesh.sh`)

Mirrors `e2e/full-mesh.sh` and `e2e/location-mesh.sh`:

- `setup_suite` annotates kind nodes into two locations (control-plane and first worker as `loc-a`, second worker as `loc-b`) so the test exercises the cross-location case
- `test_cross_mesh_connectivity`: pings + adjacency matrix
- `test_cross_mesh_peer`: `kgctl peer create / showconf` with granularity `cross`
- `test_mesh_granularity_auto_detect`: `kgctl graph` auto-detection
- `test_cross_peer_topology`: sanity that loc-a nodes only see the loc-b node as a peer (and vice versa), distinguishing `cross` from `full` (every node is a peer) and `location` (non-leaders have no peers at all)

Wired into the `e2e:` make target between `location-mesh.sh` and `multi-cluster.sh`.

Refs