daemon-0.2.2: trace flag, XDP mode + structured attach log, distinct link-add codes by KRuskowski · Pull Request #21 · hyper-derp/Hyper-DERP

KRuskowski · 2026-05-04T18:03:35Z

Daemon-side changes from the 0.2.2 brief from the benchmark agent. Each P0 item is one commit; all four pass local repro.

P0.1 — diagnostic surface for t1-integrity (`8a13935`)

New --trace-forward-hashes daemon flag (off by default). When enabled, every wg-relay forward logs a SHA-256(payload) hash + length + source/destination peer pubkey prefix at two points: ingress (after the source peer matches and after optional MAC1 verification) and egress (just before the sendto to the destination). The same SHA on both lines proves the relay didn't mutate the frame; divergence pinpoints a corrupting code path. Per-frame log on the hot path — start warns the operator and the docs note it must not be left on in production.
wg peer list now surfaces per-peer drop counters peer.<i>.drop_no_link and peer.<i>.drop_pubkey_mismatch. These are the pair-attributable subset of the existing aggregate counters. Aggregate counters are unchanged; the other drop classes (drop_unknown_src, drop_not_wg_shaped, drop_handshake_no_pubkey_match) are by definition unattributable to a known peer and stay aggregate-only.

Operator-observable behaviour: with the runner patched to pass --trace-forward-hashes for the integrity test specifically, the relay logs ingress + egress SHA pairs the runner can compare end-to-end. wg peer list shows where each known peer's drops are landing without diffing aggregate counters between bursts.

P0.2 — XDP attach (`acb29ec`)

Structured xdp_attached iface=<nic> ifindex=N mode={drv,skb} driver=<drv> kernel=<release> log line per attached NIC.
New --xdp-mode={drv,skb,auto,off} flag, default drv. The previous code unconditionally tried DRV then fell back to SKB — the silent fallback that on cloud-gcp-c4 labelled rows "xdp" that were really running on the userspace recv loop. auto preserves the historical behaviour explicitly; off skips XDP attach even when --xdp-interface is set.
Attach failure is FATAL. XdpAttach failure logs xdp_attach_failed iface=<nic> ifindex=N mode=<mode> driver=<drv> kernel=<release> reason=<errno-string> and WgRelayStart returns nullptr → main.cc exits non-zero. The runner sees a clear failure signal rather than a silently-degraded relay.
gVNIC / vmxnet3 / virtio-net advertise XDP only in generic mode on stock kernels through 6.12 — operators on those NICs need --xdp-mode=auto (or =skb) to opt in to the generic fallback explicitly. Documented in --help and in the 0.2.2 CHANGELOG note.

Operator-observable behaviour: a fresh daemon launch on the cloud fleet either attaches XDP and logs the achieved mode, or fails fast with the structured xdp_attach_failed line. Userspace path is no longer reached implicitly.

P0.3 — link-per-peer limit (`8137a2c`)

Decision: option (b) from the brief. The "each peer is in at most one link" invariant is load-bearing — it's the iter-1 design contract that lets the relay forward purely on the source 4-tuple. Multi-link mesh routing is future work (per-link UDP ports or in-packet introspection); not on the path for 0.2.2.

New WgRelayLinkAddResult enum (kWgLinkOk / UnknownPeer / SelfLink / LimitExceeded / Duplicate) plus WgRelayLinkAddDetail. Bool WgRelayLinkAdd kept as a thin wrapper for callers that only need success/failure.
WgLinkAdd einheit handler maps each non-OK result onto a distinct error code: wg_peer_unknown, wg_link_self, link_limit_exceeded, wg_link_duplicate. The runner now has a discriminator to switch on; star-topology configs that hit the limit can fall back to disjoint pairs without parsing free-form error text.
wg link add verb description in the registry calls out the iter-1 limit so wg link add --help documents it.

Operator-observable behaviour: a wg link add that exceeds the one-link-per-peer limit returns error.code = link_limit_exceeded with a one-line message explaining the rule. Other failure modes (typo'd peer, duplicate pair, self-link) return distinct codes too.

P1 — verb registration (verified, no code change)

einheit_channel.cc:2191-2254 registers all 11 wg_* verbs (peer add/pubkey/nic/remove/list, link add/remove/list, show, show config, blocklist list) with handler, role, dotted path, description, and arg schema. The round-1 oneshot: no matching command was the einheit CLI's hd_relay adapter (separate repo, out of scope).

Test coverage

tests/test_wg_relay_trace.cc (5 cases): trace on/off, invalid --xdp-mode, --xdp-mode=off, per-peer drop counter increment.
tests/test_wg_relay_links.cc (7 cases): all five outcome codes, the brief's star topology, the runner's expected fallback (disjoint pairs).

12/12 local pass.

Out of scope (confirmed)

No changes in ~/dev/HD.Benchmark, ~/dev/einheit, DERP mode, HD-Protocol mode, or the wg-relay steady-state forwarding hot path. Trace logging is gated behind a single bool check; per-peer counter bumps sit next to existing aggregate increments. The 0.2.1 24 h T2 soak signal stands.

Daemon-side diagnostic surface for the t1-integrity failure on the 0.2.1 cloud-gcp-c4 benchmark. The brief is to ship diagnostics first, fix second — until we can see what diverges, any "fix" is a guess. What the daemon does now: * New `--trace-forward-hashes` flag (off by default). When enabled, every wg-relay forward logs a SHA-256(payload) hash + frame length + source/destination peer pubkey prefix at two points: ingress (after the source peer matches and after optional MAC1 verification) and egress (just before the sendto to the destination). The same SHA on both lines proves the relay didn't mutate the frame; divergence pinpoints a corrupting code path. Per-frame log on the hot path — start warns the operator and the docs note it must not be left on in production. * `wg peer list` now surfaces per-peer drop counters `peer.<i>.drop_no_link` and `peer.<i>.drop_pubkey_mismatch`. These are the pair-attributable subset of the existing aggregate counters, populated at the same drop sites where `src_peer` is already known. Aggregate counters are unchanged; the other drop classes (`drop_unknown_src`, `drop_not_wg_shaped`, `drop_handshake_no_pubkey_match`) are by definition unattributable to a known peer and stay aggregate-only. What the operator should observe on next run: with the runner patched to pass `--trace-forward-hashes` for the integrity test specifically, the relay logs ingress + egress SHA pairs the runner can compare end-to-end. `wg peer list` shows where each known peer's drops are landing without the runner having to diff aggregate counters between bursts. Smoke test: tests/test_wg_relay_trace.cc drives the wg-relay forwarder in-process with two peers + one link, exercises both trace=on / trace=off paths, and verifies the per-peer drop_no_link counter increments on an unlinked peer's traffic. Steady-state forwarding path is unchanged when the trace flag is off — the per-frame log is gated behind a single bool check, and the per-peer counter bumps are next to existing aggregate increments. The 24-h soak signal from 0.2.1 stands.

Daemon-side answer to the t1-xdp-attach failure on the 0.2.1 cloud-gcp-c4 benchmark, where gVNIC didn't support XDP_DRV_MODE and the relay silently fell back to SKB mode — the runner labelled subsequent rows "xdp" that were really running on the userspace recv loop. What the daemon does now: * `xdp_attached iface=<nic> ifindex=N mode={drv,skb} driver=<drv> kernel=<release>` is emitted per attached NIC on success. The achieved mode + driver/kernel pair are legible from a single grep instead of having to infer the state from older free-form text. * `--xdp-mode={drv,skb,auto,off}` (default `drv`). Replaces the previous hardcoded "try DRV, fall back to SKB" loop that was the source of the silent fallback. `auto` preserves the old behaviour; `skb` forces generic; `off` skips XDP attach entirely so the operator can keep `--xdp-interface` in the unit file and toggle at runtime. * Attach failure is FATAL. `XdpAttach` returning false logs `xdp_attach_failed iface=<nic> ifindex=N mode=<mode> driver=<drv> kernel=<release> reason=<errno-string>` and WgRelayStart tears the relay down; main.cc exits non-zero. The runner sees a clear failure signal rather than a silently-degraded relay. * Driver / kernel detection helpers: `ReadNicDriver` reads `/sys/class/net/<iface>/device/driver` and returns its basename; `ReadKernelRelease` calls `uname()`. Both safe when sysfs paths are absent (loopback, bridges) — they return "?" and the log line still parses. What the operator should observe on next run: a fresh daemon launch on the cloud fleet either attaches XDP and logs the achieved mode, or fails fast with the structured `xdp_attach_failed` line. Userspace path is no longer reached implicitly. gVNIC / vmxnet3 / virtio-net advertise XDP only in generic mode on stock kernels through 6.12 — operators on those NICs need `--xdp-mode=auto` (or `=skb`) to opt into the generic fallback explicitly. Documented in --help and in the 0.2.2 CHANGELOG note. Smoke tests in tests/test_wg_relay_trace.cc: * RejectsUnknownXdpMode — invalid mode → WgRelayStart fails. * XdpModeOffSkipsAttach — `off` skips attach even when an interface is configured. Steady-state forwarding path is untouched. The 24-h soak signal from 0.2.1 stands.

Decision: option (b) from the brief. The "each peer is in at most one link" invariant is load-bearing — it's the iter-1 design contract that lets the relay forward purely on the source 4-tuple. The relay's design memory captures the rationale, and the comment at LinkAddLocked already calls it out. Multi-link mesh routing is future work (per-link UDP ports or in-packet introspection); not on the path for 0.2.2. Daemon-side fix: differentiate the einheit-channel error codes so the benchmark runner can detect "you hit the one-link-per- peer limit" specifically rather than the generic `wg_link_failed`. What changed: * New `WgRelayLinkAddResult` enum (kWgLinkOk / UnknownPeer / SelfLink / LimitExceeded / Duplicate) plus a `WgRelayLinkAddDetail` accessor that returns it. Bool `WgRelayLinkAdd` is preserved as a thin wrapper for the many call sites that only need success/failure. * `WgLinkAdd` einheit handler maps each non-OK result onto a distinct error code: `wg_peer_unknown`, `wg_link_self`, `link_limit_exceeded`, `wg_link_duplicate`. The runner now has a discriminator to switch on; star-topology configs that land on the limit can fall back to disjoint pairs without parsing free-form error text. * `wg link add` verb description in the registry calls out the iter-1 limit so `wg link add --help` documents it. What the operator should observe on next run: a `wg link add` that exceeds the one-link-per-peer limit returns `error.code = link_limit_exceeded` with a one-line message explaining the rule. Other failure modes (typo'd peer, duplicate pair, self-link) return distinct codes too. No silent drops — every rejection is named and surfaced. Regression: tests/test_wg_relay_links.cc covers all five outcome codes (Ok / UnknownPeer / SelfLink / LimitExceeded / Duplicate), the star topology from the brief, and the runner's expected fallback (disjoint pairs). The bool wrapper is exercised separately to confirm any non-OK detail still maps to false. Steady-state forwarding path is untouched.

Two flake sources in the test, both observed across 15 back-to-back runs of the full binary: * SetUp picked the relay's UDP port via bind-then-close on loopback. A different process grabbing the port between the test's close() and the relay's bind() left WgRelayStart returning nullptr. Replaced with StartRelay, which retries up to 5 times with fresh ephemeral ports. * The test read fwd_packets immediately after recv() on bob's socket returned, but the relay loop bumps the counter AFTER sendto, so RecvOnBob can return while the relay thread is still between sendto and fetch_add. The trace egress log fires in the same window, which is why the failing run shows the egress log but stats=0. Added WaitFwdPackets that polls up to ~500 ms. Trace path is unchanged. The egress log line still fires in the order ingress → sendto → egress → counter — same as production.

Followup on the daemon-0.2.2 review: the three drop classes the brief asked for that I'd skipped as "unattributable to a known peer" still have an apparent source IP, and that's exactly what the runner needs to triage the c4_gcp internal- vs-external NAT bug — `drop_handshake_no_pubkey_match` aggregate is the failure mode but says nothing about which IP is producing it. Added a small bounded `drop_by_src` map (host-byte-order uint32 → struct of three counters + last_seen_ns), capped at 256 source IPs with FIFO eviction on overflow. Bumped at every userspace site that increments `drop_unknown_src`, the userspace `drop_not_wg_shaped` shape-filter site, and the `drop_handshake_no_pubkey_match` site at the bottom of `HandleUnknownSrcHandshakeLocked`. Surfaced via a new `wg show drop_sources` verb that emits one row per source IP with the three counters plus last-seen timestamp. v6 sources are skipped — the brief's diagnostic targets are v4 NAT mismatches. The XDP-path `drop_not_wg_shaped` (in the BPF program) stays aggregate-only — populating a per-source map from XDP would need new BPF map plumbing and isn't on the runner's diagnostic path. The userspace shape-filter path (cold; runs when XDP isn't attached or when XDP_PASSes for shape) is covered. Smoke test in test_wg_relay_trace.cc: send a 1-byte non-WG-shaped UDP datagram, verify the histogram populates with `drop_not_wg_shaped >= 1` for the source IP. The aggregate counters in `wg show` are unchanged; this is strictly an additive observability surface.

KRuskowski · 2026-05-04T18:12:58Z

Thanks for the review — addressing each point.

1. P0.3 — option (b) rationale, with file:line

The iter-1 limit is load-bearing in the data plane, not incidental scaffolding:

bpf/wg_relay.bpf.c:43-59 — the XDP ep_val is a single egress endpoint per source 4-tuple (one IP + one port + one ifindex). Not a list, not a tagged variant.
bpf/wg_relay.bpf.c:74-79 — the wg_peers BPF map is BPF_MAP_TYPE_HASH with key=ep_key (src 4-tuple) → value=ep_val (single partner endpoint). Multiple links per peer would have multiple values for the same source key — the map type doesn't allow it.
bpf/wg_relay.bpf.c:307-309 — the XDP fast path looks up exactly one partner: bpf_map_lookup_elem(&wg_peers, &src_key) returns one ep_val and forwards there. There's no disambiguation logic; with multiple partners possible, the kernel would silently arbitrarily pick one.
src/wg_relay.cc:710-724 — userspace FindLinkPartnerLocked mirrors the same "first match wins" behaviour by linear-scanning r->links. With multi-link peers it'd silently pick whichever link is first in the vector.

Relaxing to N≥4 (option a) would require: (1) BPF map type change to a list-valued map or a different key scheme, (2) some way to disambiguate the destination from the packet — the relay has nothing other than the source 4-tuple to go on, since it deliberately ignores WG protocol contents. That's a multi-week design change for iteration 2. Option (b) — distinct error code so the runner sees the rejection cleanly — is the right call for 0.2.2.

The iter-1 invariant is documented in docs/design/wg_relay_design.md and in the project memory; I should have cited it in the commit message originally.

2. P0.1 — counter set was incomplete; fixed in `7b3deb0`

You're right. drop_handshake_no_pubkey_match always has a source IP available, even though it isn't attributable to a known peer, and it's specifically the failure mode the c4_gcp internal/external NAT bug produces.

Added a per-source-IP histogram in 7b3deb0 covering the three classes that fall outside per-peer attribution: drop_unknown_src, drop_not_wg_shaped (userspace path), and drop_handshake_no_pubkey_match. Surfaced via a new wg show drop_sources verb that emits one row per source IP with the three counters + last-seen timestamp. Bounded at 256 IPs with FIFO eviction so a spoofed-source flood can't grow the map without bound.

XDP-path drop_not_wg_shaped stays aggregate-only — populating a per-source map from BPF would need new map plumbing and isn't on the diagnostic path the brief flagged. The userspace shape filter (which runs when XDP isn't attached or when packets XDP_PASS for shape reasons) is covered.

3. P0.1 — t1-integrity reproduction status

P0.1 ships diagnostics. Integrity reproduction is blocked on the runner agent enabling --trace-forward-hashes for the integrity test specifically.

The trace flag's own smoke tests pass (the 5 test_wg_relay_trace cases), proving the new code paths are live. I did not run the integrity test against a 0.2.1-equivalent build with trace on — that requires the runner-side change to flip the flag for that test, which the brief explicitly puts out of scope ("HD.Benchmark/tooling — different agent, different repo").

Recommend that as the next runner-side change. Once the runner enables --trace-forward-hashes (and now optionally wg show drop_sources for the per-source breakdown), the daemon-side diagnostic surface is sufficient to triage what's currently a one-log-line failure.

4. P0.1 — flake fix nature: test-only, no daemon change

Both fixes in b4b9b76 are test-side. No production race.

Port-bind retry: the test harness binds a UDP socket on port 0, reads the assigned ephemeral port, closes the fd, and passes the port to WgRelayStart to rebind. There's a TOCTOU window between close and rebind where another process can grab the port. Pure harness issue; fixed by retrying SetUp up to 5 times with fresh ephemeral ports.

Counter-poll: the test calls recv() on bob's socket then immediately reads WgRelayGetStats. The relay loop's order is sendto → trace egress log → fwd_packets.fetch_add, and sendto is what makes the kernel deliver bytes to bob. So recv() can return in the test while the relay thread is still microseconds away from fetch_add on the counter — read it too soon and fwd_packets is still 0 even though the packet was forwarded. This is a benign read-after-write skew, not a correctness issue: in production wg show is polled on operator timescale and the lag is invisible. Fixed by polling the counter up to ~500 ms before asserting.

The trace-log line firing without the counter incrementing is the same skew window — the egress log is emitted between sendto and fetch_add. That ordering is intentional (we want to log "we sent it" before claiming the forward succeeded in counters); the test just needs to acknowledge the window.

KRuskowski added 5 commits May 4, 2026 19:30

KRuskowski merged commit 6137cb3 into master May 4, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

daemon-0.2.2: trace flag, XDP mode + structured attach log, distinct link-add codes#21

daemon-0.2.2: trace flag, XDP mode + structured attach log, distinct link-add codes#21
KRuskowski merged 5 commits into
masterfrom
daemon-0.2.2

KRuskowski commented May 4, 2026

Uh oh!

KRuskowski commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

KRuskowski commented May 4, 2026

P0.1 — diagnostic surface for t1-integrity (8a13935)

P0.2 — XDP attach (acb29ec)

P0.3 — link-per-peer limit (8137a2c)

P1 — verb registration (verified, no code change)

Test coverage

Out of scope (confirmed)

Uh oh!

KRuskowski commented May 4, 2026

1. P0.3 — option (b) rationale, with file:line

2. P0.1 — counter set was incomplete; fixed in 7b3deb0

3. P0.1 — t1-integrity reproduction status

4. P0.1 — flake fix nature: test-only, no daemon change

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

P0.1 — diagnostic surface for t1-integrity (`8a13935`)

P0.2 — XDP attach (`acb29ec`)

P0.3 — link-per-peer limit (`8137a2c`)

2. P0.1 — counter set was incomplete; fixed in `7b3deb0`