test(e2e): add Cilium-CNI e2e suite #491

Open
kvaps wants to merge 9 commits into squat:main from cozystack:feat/cilium-e2e-tests

Conversation

@kvaps
Contributor

@kvaps kvaps commented Apr 28, 2026

Summary

Adds an e2e suite that runs Kilo with --compatibility=cilium on top of
a kind cluster where Cilium is the CNI. The existing e2e suites only
cover the Kilo bridge CNI path, so the --compatibility=cilium mode
(used by downstream platforms shipping Kilo + Cilium) currently has no
end-to-end coverage in upstream CI.

Depends on

This PR is stacked on top of #490 (which adds the cross granularity
and its bridge-CNI e2e tests). Once #490 is merged this branch will
rebase down to just the Cilium-specific commit.

What's added

  • e2e/kilo-kind-cilium.yaml — Kilo DaemonSet for the Cilium-CNI
    cluster: no kilo CNI ConfigMap, no install-cni init container,
    --cni=false --compatibility=cilium
  • e2e/lib.sh — install_cilium() (Helm, Cilium 1.16.5, VXLAN
    tunnelProtocol, ipam.mode=kubernetes, host firewall off) and
    create_cilium_cluster() mirroring create_cluster()
  • e2e/cilium-setup.sh, cilium-cross-mesh.sh, cilium-teardown.sh
    — three-stage suite mirroring the bridge setup/...mesh/teardown
    pattern
  • Makefile — new e2e-cilium target, kept separate from e2e
    because the Cilium cluster is incompatible with Kilo's bridge CNI
    used by the default suite
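A minimal sketch of what the install_cilium() helper described above might look like; the flag spellings and the use of helm upgrade --install are assumptions based on the bullet list, not the actual e2e/lib.sh contents:

```shell
# Hypothetical sketch of install_cilium(), based on the description above
# (Helm, Cilium 1.16.5, VXLAN tunnelProtocol, kubernetes IPAM, host
# firewall off); the real e2e/lib.sh may spell these differently.
install_cilium() {
    helm repo add cilium https://helm.cilium.io/ >/dev/null
    helm upgrade --install cilium cilium/cilium \
        --version 1.16.5 \
        --namespace kube-system \
        --set tunnelProtocol=vxlan \
        --set ipam.mode=kubernetes \
        --set hostFirewall.enabled=false \
        --wait
}
```

The --wait flag makes the Helm release block until the Cilium pods are ready, which is usually what a setup stage wants before applying the Kilo DaemonSet.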

Scope

This is a baseline. Coverage is intentionally narrow:

  • Only cross granularity is covered; full and location with
    Cilium CNI are reasonable follow-ups in the same harness.
  • kubeProxyReplacement is off, so this exercises Kilo's
    Cilium-overlay handling without entangling Cilium's eBPF service LB.
    KPR coverage can be added with a values flag in a follow-up.

Validation

The CI runner needs helm available. Most GitHub-hosted Linux runners
have it preinstalled; if not, a setup-helm step will need to be added
to the workflow that triggers the new make target. I couldn't run
make e2e-cilium locally (Docker Desktop on macOS doesn't reproduce
the Linux kind+Cilium path well), so behaviour will first be observed
in upstream CI; tuning (Cilium version pin, MTU, timeouts) may be
needed once it runs there.

Refs

@kvaps kvaps closed this Apr 28, 2026
@kvaps kvaps reopened this Apr 28, 2026
@kvaps kvaps force-pushed the feat/cilium-e2e-tests branch from fbfd0c0 to dfe4661 on April 28, 2026 at 12:32
squat pushed a commit that referenced this pull request Apr 28, 2026
Adds my GitHub username to the breakpoint authorized-users so I can
SSH into the runner when the e2e job fails on PRs I'm involved in
(currently #490 / #491). Per maintainer's suggestion in #489.

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
skirsten and others added 6 commits April 28, 2026 14:35
Mirrors e2e/full-mesh.sh and e2e/location-mesh.sh for the new
--mesh-granularity=cross mode introduced by the preceding commits.

setup_suite annotates the kind nodes into two locations (control-plane
and the first worker as loc-a, the second worker as loc-b) so the test
exercises the case "cross" is meant to handle: direct WireGuard
tunnels between locations, native CNI inside a location.

Tests:
- test_cross_mesh_connectivity: pings + adjacency matrix
- test_cross_mesh_peer: kgctl peer create/showconf
- test_mesh_granularity_auto_detect: kgctl graph auto-detection
- test_cross_peer_topology: sanity that loc-a nodes see only the
  loc-b node as a peer (and vice versa), distinguishing "cross"
  from "full" (where every node is a peer) and "location" (where
  non-leaders have no peers at all)

The new suite is wired into the existing e2e make target between
location-mesh.sh and multi-cluster.sh.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
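The location split that setup_suite performs could be sketched roughly as below; the node names assume kind's default naming scheme and the annotation key is Kilo's standard kilo.squat.ai/location, so treat the exact identifiers as assumptions:

```shell
# Hypothetical sketch of the setup_suite location split described above:
# control-plane + first worker form loc-a, the second worker forms loc-b.
# Node names assume kind's default naming for a cluster named "kind".
annotate_locations() {
    kubectl annotate node kind-control-plane kind-worker \
        kilo.squat.ai/location=loc-a --overwrite
    kubectl annotate node kind-worker2 \
        kilo.squat.ai/location=loc-b --overwrite
}
```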
The "cross from {a,b,c,d}" test cases added in 3590b12 predate the
cniCompatibilityIPs field on segment, introduced by Cilium support
in squat#409. Each segment in the cross test cases describes a single
node, so the expected value mirrors the existing full/location
cases: []*net.IPNet{nil}.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The cross granularity intentionally removes the WireGuard tunnel
between nodes that share a location and relies on the underlying CNI
to carry intra-location pod traffic over its own overlay (e.g. Cilium
VXLAN). The bridge CNI used by the e2e harness has no such overlay,
so check_ping/check_adjacent cannot succeed on this cluster — they
were timing out trying to reach the same-location worker.

Keep the topology checks (peer count per node, kgctl graph
auto-detect, kgctl peer create), which validate the cross routing
logic without depending on a CNI overlay. End-to-end connectivity
under cross is covered by the Cilium-CNI suite added separately.

Also clean up the location annotations in teardown_suite so the
suites that follow (multi-cluster, handlers, kgctl) start from the
same node-annotation state they used to.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps force-pushed the feat/cilium-e2e-tests branch 2 times, most recently from 5e22206 to 82e994d on April 28, 2026 at 12:36
kvaps and others added 2 commits April 28, 2026 15:19
Just removing the location annotations leaves the DaemonSet in
--mesh-granularity=cross. The handler tests that follow assume the
control-plane WireGuard IP is 10.4.0.1 (the leader of a single-
location mesh) and time out when cross's per-node leader assignment
hands that IP to a different node.

Roll the DaemonSet back to --mesh-granularity=location in the
teardown so the cluster state mirrors what location-mesh.sh leaves
behind, which is the working baseline expected by multi-cluster.sh,
handlers.sh and kgctl.sh.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
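The teardown behaviour described in this commit could look roughly like the sketch below; the manifest path kilo-kind.yaml is a placeholder for whatever location-granularity manifest the suite originally applied, not a path confirmed by the PR:

```shell
# Hypothetical sketch of the teardown_suite cleanup described above:
# drop the location annotations, then roll the DaemonSet back to
# --mesh-granularity=location by re-applying the original manifest.
# The manifest path below is a placeholder.
teardown_cross() {
    kubectl annotate nodes --all kilo.squat.ai/location-
    kubectl apply -f kilo-kind.yaml
}
```

Re-applying the original manifest (rather than patching a single arg) is one simple way to guarantee the cluster state matches what location-mesh.sh leaves behind.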
Adds a new e2e harness that brings up a kind cluster with Cilium as
the CNI (VXLAN overlay, default kube-proxy) and runs Kilo on top
with --cni=false --compatibility=cilium. This validates the cross
granularity + Cilium combination, which is the configuration shipped
by downstream platforms but not exercised by the existing bridge-CNI
suite.

Files:
- e2e/kilo-kind-cilium.yaml: Kilo DaemonSet for the Cilium-CNI
  cluster (no kilo CNI ConfigMap, no install-cni init container,
  --cni=false, --compatibility=cilium)
- e2e/lib.sh: install_cilium() helper (Helm, Cilium 1.16.5, VXLAN
  tunnelProtocol, IPAM kubernetes, host firewall off) and
  create_cilium_cluster()
- e2e/cilium-setup.sh, cilium-cross-mesh.sh, cilium-teardown.sh:
  three-stage suite mirroring the existing setup/...mesh/teardown
  pattern
- Makefile: new `e2e-cilium` target, kept separate from `e2e`
  because the Cilium cluster is incompatible with the Kilo bridge
  CNI used by the default suite

Kube-proxy replacement is intentionally left at the default (off)
for this baseline; KPR coverage can be added in a follow-up.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
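create_cilium_cluster() could plausibly look like the sketch below; the inline kind config and three-node layout are assumptions, and the key point is disableDefaultCNI so that Cilium (installed afterwards, per lib.sh) becomes the cluster's CNI:

```shell
# Hypothetical sketch of create_cilium_cluster(): a kind cluster with the
# default CNI disabled so Cilium can take over. The node layout and config
# shape are assumptions, not the real lib.sh; install_cilium is the helper
# this PR adds to lib.sh.
create_cilium_cluster() {
    kind create cluster --name "$1" --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
    install_cilium
}
```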
@kvaps kvaps force-pushed the feat/cilium-e2e-tests branch from 82e994d to 72eabe0 on April 28, 2026 at 13:20
Adds a new `e2e-cilium` job that mirrors the existing `e2e` job but
runs `make e2e-cilium` against the Cilium-CNI test cluster. Helm is
installed via azure/setup-helm because nscloud runners do not have
it preinstalled and lib.sh's install_cilium uses the Helm Cilium
chart.

Without this job the new Cilium e2e harness added in the previous
commit is never executed in CI.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps force-pushed the feat/cilium-e2e-tests branch from 72eabe0 to 2864940 on April 28, 2026 at 13:53
@kvaps kvaps marked this pull request as ready for review April 28, 2026 15:17