evpn: ADR-0059 slice 1 - RemoteMacEntry alias_group_key + projection AF guard #84
Merged
…AF guard

Foundation slice for the EVPN aliasing dataplane. Pure-logic / portable-intent change; the existing diff/apply layer ignores the new field by construction, so this ships independently green with no dataplane behaviour change.

- Add `alias_group_key: Option<(EthernetSegmentIdentifier, EthernetTagId)>` to `RemoteMacEntry` (crates/evpn/src/mac.rs). Populated by the projection layer when the originating Type 2 carries a non-zero ESI and at least one EAD-per-EVI alias has been observed. Empty `alias_vtep_ips` ⇔ `alias_group_key.is_none()`.
- Add `aliasing::group_members(entry)` helper returning the canonical FDB nexthop group membership (`remote_vtep_ip` ∪ `alias_vtep_ips`, sorted + deduplicated via BTreeSet). Two entries with the same member set produce the same canonical group; the dataplane will key "membership unchanged" on this to avoid spurious NLM_F_REPLACE traffic in slice 3.
- Enforce same-address-family-per-`(ESI, EthernetTag)` invariant on aliasing resolution in `project_evpn_routes_with_aliases`. Mixed-family EAD-per-EVI aliases under a same-segment primary are treated as operator misconfiguration: the mismatched VTEP is dropped from `alias_vtep_ips` and a `warn!` fires with the full (VNI, MAC, ESI, EthernetTag, primary, dropped) tuple. Backs ADR-0059's cross-family out-of-scope clause with code from day one.
- 4 new projection unit tests covering: ESI==0 → key None; key populated when aliases observed; key None when non-zero ESI has no aliases; mixed-family aliases dropped (and the survivor carries the group key).
- 3 new aliasing unit tests covering: primary-only group membership; sort order under unsorted insertion; defensive dedup of primary appearing in alias_vtep_ips.

Foundation for ADR-0059 slice 2 (`nexthop_raw` raw-netlink module) and slice 3 (reconcile actor + owned-state lifecycle).
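The canonical-membership rule from the commit message can be illustrated with a minimal sketch. The `RemoteMacEntry` struct below is a hypothetical, trimmed-down stand-in that models only the two fields named above; the real type in crates/evpn/src/mac.rs carries more fields, and the real helper may differ in detail.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Hypothetical stand-in for `RemoteMacEntry`: only the primary VTEP and the
/// alias VTEP list are modelled, which is all the helper needs.
#[derive(Debug, Clone)]
pub struct RemoteMacEntry {
    pub remote_vtep_ip: IpAddr,
    pub alias_vtep_ips: Vec<IpAddr>,
}

/// Canonical group membership: primary ∪ aliases, sorted and deduplicated.
/// A `BTreeSet` gives both properties, so insertion order and a primary that
/// also appears in the alias list do not change the result.
pub fn group_members(entry: &RemoteMacEntry) -> Vec<IpAddr> {
    let mut set: BTreeSet<IpAddr> = entry.alias_vtep_ips.iter().copied().collect();
    set.insert(entry.remote_vtep_ip);
    set.into_iter().collect()
}
```

Because the output depends only on the member set, two entries whose aliases were observed in a different order compare equal, which is the property a later "membership unchanged" check would rely on.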
The module-level docs predated PR #84. They claimed aliasing "does not mutate projection or dataplane intent" and referenced a future `RemoteMacEntry::alias_vtep_ips` field — but slice 1 now wires both `alias_vtep_ips` and `alias_group_key` through the projection. The "FDB + ip route ECMP, or per-CE nexthop group" wording also predated ADR-0059's settled FDB-nexthop-group direction. Refresh the "What this module does" section to describe the three current surfaces (AliasIndex, alias_resolved_next_hops, group_members) and link to the projection consumer. Tighten the "does NOT do" bullet to "doesn't touch the kernel — that's crates/evpn-linux behind ADR-0059 slices 2-4" so the boundary stays clear without claiming the intent shape is untouched.
lance0 added a commit that referenced this pull request on May 12, 2026
… + CVE guard (#87)

* evpn-linux: ADR-0059 slice 3a - FDB-NHG state types + apply primitive + CVE guard

Foundation infrastructure for the aliasing-ECMP dataplane. **Zero operational behavior change in this PR.** Slice 3b lands the diff Pass 1b, the reconcile actor's apply coordinator, the startup NHID adoption pass, the `NexthopOps` impls on `LinuxDataplane` and `InMemoryDataplane`, and the Docker-runnable netns integration test that brings the surface onto the apply path. Slice 3a is the review-friendly half: pure-logic state types + the kernel-touching FDB-NHG row primitive (gated behind the CVE-2025-39851 inline guard) + a clean compile / clippy / test sweep.

What lands:

- `crates/evpn-linux/src/nh_id_alloc.rs` (NEW): `NhIdAllocator` over a `[1, 0x4000]` bitmap with tag bits `0x3000_0000` (per-VTEP FDB nexthop) / `0x4000_0000` (FDB nexthop group), deliberately offset from FRR's `0x1000` / `0x2000` reservations so concurrent FRR + rustbgpd installs never collide on NLM_F_REPLACE (ADR-0059 §5 invariant 6). `reserve()` API for slice 3b's startup adoption pass. 13 unit tests.
- `crates/evpn-linux/src/group_state.rs` (NEW): `GroupOwnedMap` with per-group `(VNI, ESI, EthernetTag)` refcount across referring `(VNI, MAC)` rows, plus a per-VTEP-NH refcount across referring groups. `RefDelta` return value tells the apply coordinator when a group should be torn down on `RemoveFdbNhg`. ADR-0059 §7's `share_l2_nhg` defaults off, so the Linux-owned key includes VNI even though the portable `RemoteMacEntry::alias_group_key` from slice 1 stays VNI-less. 9 unit tests covering single-MAC install/remove, multi-MAC group sharing, N->1 drain (group stays alive), VTEP NH shared across groups, and member-set change isolation.
- `crates/evpn-linux/src/linux/fdb_nhg.rs` (NEW): `apply_install_fdb_nhg_row` / `apply_remove_fdb_nhg_row` building bridge FDB rows with NDA_NH_ID (= 13, via `NeighbourAttribute::Other(DefaultNla)` since `netlink-packet-route 0.30` has no typed variant). Inline **CVE-2025-39851 guard** rejects any install whose target L2VXLAN does not have `learning_disabled == Some(true)` (mainline fix `6ead38147ebb` "vxlan: Fix NPD when refreshing an FDB entry with a nexthop object"; the readiness probe is the upstream guard, this is belt-and-suspenders). 4 unit tests on the CVE guard.
- `KernelFdbEntry::nh_id: Option<u32>` (snapshot.rs) + `linux/fdb.rs` parse path extracts `NDA_NH_ID` from the kernel dump via the `Nla` trait accessor on the `Other` escape hatch. `merge_fdb_rows` preserves `nh_id` across the self/master row split so the merged entry surfaces both halves. 1 new merge test.
- `OwnedEntry` enum refactor: replaced the field-soup `{ last_applied_dst, last_applied_seq }` shape with an explicit `OwnedEntryKind { SingleDst { dst, mobility_seq }, FdbNhg { group_key } }`. Invalid states like "dst set AND group_key set" are structurally impossible. All existing call sites updated to use the `OwnedEntry::single_dst()` constructor or accessor methods (`last_applied_dst()`, `last_applied_seq()`, `group_key()`).
- `DataplaneOp::InstallFdbNhg` / `UpdateFdbNhgMembers` / `RemoveFdbNhg` variants declared on `dataplane.rs`. Slice 3b will emit them from `compute_diff` Pass 1b and route them through the coordinator. In slice 3a they are explicitly **not yet emitted** (`compute_diff` still produces only single-dst ops), and `Dataplane::apply` returns `InvalidArgument("FDB-NHG ops must be applied via the reconcile-actor coordinator, not Dataplane::apply")` if one slips through.
- `NexthopOps` trait declared on `dataplane.rs` with the six low-level methods the slice 3b coordinator will call (`add_nexthop_member`, `add_nexthop_group`, `del_nexthop`, `install_fdb_nhg_row`, `remove_fdb_nhg_row`, `dump_owned_nexthops`). `KernelNexthop` + `KernelNexthopKind` for startup adoption. No impls yet; slice 3b lands `LinuxDataplane` + `InMemoryDataplane` impls.

Stats: +1400 / -100 lines across 14 files. 213 lib tests passing (13 nh_id_alloc + 9 group_state + 4 fdb_nhg + 1 merge-nh_id + 186 pre-existing); 21 doctest / integration / smoke buckets unchanged.

Docs touched:

- CHANGELOG `[Unreleased] ### Added` entry describing the 3a/3b split + the foundation pieces.
- ROADMAP entry rewritten to show slice 1 + 2 shipped on `main`, slice 3 in flight split as 3a (this PR) + 3b (wiring), slice 4 (M40 interop) still ahead.

The "aliasing dataplane ECMP" line in the Not-yet-supported row of `README.md` stays accurate: 3a doesn't ship the capability, 3b does.

No netns test in slice 3a; the end-to-end Docker-runnable round trip lands with slice 3b's coordinator (per ~/.claude/projects/-home-lance-projects-rustbgpd/memory/feedback_docker_for_privileged_kernel_work.md the autonomy path is `docker run --cap-add=NET_ADMIN`).

References:

- ADR-0059 (merged in #83): design.
- Slice 1 (merged #84): `RemoteMacEntry::alias_group_key` portable intent + projection AF guard.
- Slice 2 (merged #86): `nexthop_raw` netlink primitive.
- CVE-2025-39851 mainline fix `6ead38147ebb` + stable backports.

* evpn-linux: review fixups for slice 3a - idempotent remove + permanent reject + explicit FDB-NHG kinds

Five review findings (one from Lance, four from Copilot) addressed in one batch:

1. `apply_remove_fdb_nhg_row` was not idempotent on `ENOENT`. The `NexthopOps::remove_fdb_nhg_row` doc comment promises idempotent semantics (matching the slice-2 `NexthopSocket::del`) but the implementation mapped delete failures through `classify_apply_error`, which classifies `ENOENT` as transient. Switched to `classify_remove_apply_error` (the existing sibling that the single-dst delete path uses) so an already-deleted row returns Ok rather than wedging the slice 3b teardown / adoption flow on a phantom retry.

2-3. `Dataplane::apply` rejection for FDB-NHG ops used `DataplaneError::Other` in both `LinuxDataplane` and `InMemoryDataplane`. `Other` classifies as `Transient`, so a misrouted op would backoff-retry forever instead of surfacing as a programming bug. Switched both to `InvalidArgument`, which `FailureClass::class()` maps to `Permanent`; the actor suppresses the op via its permanent-failures map until intent changes.

4. CHANGELOG referenced a pre-3a `last_applied_group_key` field that never existed. Reworded to describe the actual pre-3a shape (`last_applied_dst` + `last_applied_seq`) with a note that the alternative was a naïve forward-compat extension; the enum refactor obviates it.

5. `op_to_kind` previously mapped `UpdateFdbNhgMembers` to `DataplaneOpKind::UpdateRemoteFdb { mac: [0; 6], dst: 0.0.0.0 }` (and `InstallFdbNhg` to a sentinel-dst `AddRemoteFdb`). The sentinel MAC collided with any other FDB-NHG group failure in `permanent_failures` (keyed by `(VNI, MAC)`), making operator debugging confusing. Added three explicit variants to `rustbgpd_evpn::DataplaneOpKind` (`InstallFdbNhg { mac }`, `UpdateFdbNhgMembers { esi }`, `RemoveFdbNhg { mac }`), and `op_to_kind` now uses them directly. `DataplaneOpKind` isn't surfaced over the gRPC/proto layer (verified), so the addition is internal-only.

All gates green: fmt, clippy `-D warnings`, full workspace tests (213 lib tests + 21 integration buckets), doc with `RUSTDOCFLAGS="-D warnings"`, MSRV 1.92 check.

* evpn: add ethernet_tag to DataplaneOpKind::UpdateFdbNhgMembers

Per review feedback on PR #87: `UpdateFdbNhgMembers { esi }` was slightly under-identified: a single ESI can appear across multiple Ethernet Tags within the same VNI. AppliedOp carries VNI separately, so adding `ethernet_tag` to the kind payload completes the triplet that uniquely identifies a Linux-owned `AliasGroupKey`. Cheap + useful future-proofing if the kind ever escapes logs/reports into a user-facing surface. DataplaneOpKind is still report-internal (no proto/gRPC consumer), so the addition is non-breaking.
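The tagged NHID scheme described for `nh_id_alloc.rs` can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's code: it uses a `BTreeSet` per kind instead of the bitmap the commit message mentions, and the names `NhKind`, `alloc`, `free`, and `split` are hypothetical. Only the tag values, the `[1, 0x4000]` index range, and the existence of a `reserve()` entry point come from the text above.

```rust
use std::collections::BTreeSet;

/// Tag bits taken from the commit message.
pub const TAG_VTEP_NH: u32 = 0x3000_0000; // per-VTEP FDB nexthop
pub const TAG_GROUP: u32 = 0x4000_0000; // FDB nexthop group
const MAX_INDEX: u32 = 0x4000; // indices live in [1, 0x4000]

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NhKind {
    VtepNexthop,
    Group,
}

/// Hypothetical allocator; a set per kind stands in for the real bitmap.
#[derive(Debug, Default)]
pub struct NhIdAllocator {
    used_vtep: BTreeSet<u32>,
    used_group: BTreeSet<u32>,
}

/// Split an NHID into (kind, index); `None` if it is outside our tag ranges.
fn split(id: u32) -> Option<(NhKind, u32)> {
    let kind = match id & 0xF000_0000 {
        TAG_VTEP_NH => NhKind::VtepNexthop,
        TAG_GROUP => NhKind::Group,
        _ => return None,
    };
    let idx = id & 0x0FFF_FFFF;
    (1..=MAX_INDEX).contains(&idx).then_some((kind, idx))
}

impl NhIdAllocator {
    fn used_mut(&mut self, kind: NhKind) -> &mut BTreeSet<u32> {
        match kind {
            NhKind::VtepNexthop => &mut self.used_vtep,
            NhKind::Group => &mut self.used_group,
        }
    }

    /// Hand out the lowest free index for `kind`, with the tag OR-ed in.
    pub fn alloc(&mut self, kind: NhKind) -> Option<u32> {
        let tag = match kind {
            NhKind::VtepNexthop => TAG_VTEP_NH,
            NhKind::Group => TAG_GROUP,
        };
        let used = self.used_mut(kind);
        let idx = (1..=MAX_INDEX).find(|i| !used.contains(i))?;
        used.insert(idx);
        Some(tag | idx)
    }

    /// Mark an ID found in a kernel dump as taken (startup adoption).
    /// Returns false for foreign IDs or IDs that are already in use.
    pub fn reserve(&mut self, id: u32) -> bool {
        match split(id) {
            Some((kind, idx)) => self.used_mut(kind).insert(idx),
            None => false,
        }
    }

    /// Release an ID; returns false if it was not allocated by us.
    pub fn free(&mut self, id: u32) -> bool {
        match split(id) {
            Some((kind, idx)) => self.used_mut(kind).remove(&idx),
            None => false,
        }
    }
}
```

Keeping the tag in the high bits means an ID alone is enough to tell which kind of kernel object it names and whether this daemon owns it, which is what lets an adoption pass ignore IDs installed by other software.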
lance0 added a commit that referenced this pull request on May 13, 2026
First slice in the ADR-0059 chain with operational behavior change. Multi-homed Type 2 routes now program FDB nexthop groups via `NDA_NH_ID` in the kernel; traffic to a multi-homed MAC fans out across every observed alias VTEP on the encap path. Slice 4 (M40 manual FRR smoke) is the remaining piece before production-default aliasing ECMP.

## What's in

- `compute_diff` Pass 1b emitting InstallFdbNhg / UpdateFdbNhgMembers / RemoveFdbNhg with IPv6 fallback to single-dst.
- Reconcile actor coordinator orchestrating per ADR-0059 §5 (members → group → FDB row install order; reverse on teardown).
- `NexthopOps` impls on LinuxDataplane + InMemoryDataplane with CVE-2025-39851 guard.
- Startup NHID adoption: dump_owned_nexthops scan, allocator reservation, deferred stale cleanup with snapshot-aware retention.
- `NhIdAllocator` (0x3000_/0x4000_ tag scheme), GroupOwnedMap with refcounting, partial-install rollback (orphan-aware).
- Three retry/permanent-failure key spaces: FDB-MAC, BUM-ifindex, AliasGroupKey for UpdateFdbNhgMembers.
- Per-errno error classification mirroring linux::fdb to avoid trapping transient kernel errors as permanent.
- pending_deletes retry queue for steady-state GC failures.
- Drain branches on OwnedEntryKind so FdbNhg-owned MACs use the RemoveFdbNhg teardown sequence, not RemoveRemoteFdb.
- 4 actor-level tests + 6 diff transition tests + 7 dump-parser unit tests + 2 Docker-runnable netns tests.

## Review trail

13 Copilot review rounds + 1 deep human review. The deep review caught the load-bearing bug Copilot missed: adoption cleanup could delete still-live adopted NHIDs after FDB-NHG installs hit permanent suppression. Fixed with a suppression gate + snapshot-aware retention set, both regression-tested via the new actor-level tests.

## What's deferred

- Slice 4: M40 manual containerlab smoke against FRR.
- Slice 3.5: periodic RTM_GETNEXTHOP drift recovery, apply_aliasing_ecmp operator toggle, IPv6 alias members.

## References

- ADR-0059 (merged #83)
- Slice 1 (#84): portable intent
- Slice 2 (#86): nexthop_raw netlink primitive
- Slice 3a (#87): state types + apply primitive + CVE guard
- CVE-2025-39851 mainline fix: torvalds/linux@6ead38147ebb
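The "drain branches on OwnedEntryKind" item can be pictured with a small sketch. The variant names `SingleDst { dst, mobility_seq }` and `FdbNhg { group_key }` and the op names `RemoveRemoteFdb` / `RemoveFdbNhg` come from the commit messages; everything else here (the field types, the `AliasGroupKey` tuple layout, the `TeardownOp` enum, and the `drain_op` function) is an assumption made for illustration.

```rust
use std::net::IpAddr;

/// Hypothetical layout of the Linux-owned key: (VNI, ESI, EthernetTag).
pub type AliasGroupKey = (u32, [u8; 10], u32);

/// What the dataplane last installed for a (VNI, MAC) row. Because this is
/// an enum, "dst set AND group_key set" cannot be represented at all.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OwnedEntryKind {
    SingleDst { dst: IpAddr, mobility_seq: u32 },
    FdbNhg { group_key: AliasGroupKey },
}

/// Hypothetical subset of removal ops, named after the ops in the PR text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TeardownOp {
    RemoveRemoteFdb { vni: u32, mac: [u8; 6] },
    RemoveFdbNhg { vni: u32, mac: [u8; 6] },
}

/// Pick the teardown op that matches how the row was installed, so a row
/// backed by a nexthop group is never removed via the single-dst path.
pub fn drain_op(vni: u32, mac: [u8; 6], kind: &OwnedEntryKind) -> TeardownOp {
    match kind {
        OwnedEntryKind::SingleDst { .. } => TeardownOp::RemoveRemoteFdb { vni, mac },
        OwnedEntryKind::FdbNhg { .. } => TeardownOp::RemoveFdbNhg { vni, mac },
    }
}
```

An exhaustive `match` also means that adding a third ownership kind later would fail to compile until the drain path handles it.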
Summary
Foundation slice for the EVPN aliasing dataplane (ADR-0059). Pure-logic / portable-intent change; the existing diff/apply layer ignores the new field by construction, so this ships independently green with no dataplane behaviour change.
The slice extends `RemoteMacEntry` with a group identity so the FDB nexthop group dataplane (slices 2-4) can key one kernel resource per `(ESI, EthernetTag)` instead of re-deriving it at apply time. It also backs the ADR's cross-family out-of-scope clause with code from day one.

What's in this PR

- `RemoteMacEntry::alias_group_key: Option<(EthernetSegmentIdentifier, EthernetTagId)>` (`crates/evpn/src/mac.rs`): populated by the projection layer when the originating Type 2 carries a non-zero ESI and at least one EAD-per-EVI alias has been observed. Empty `alias_vtep_ips` ⇔ `alias_group_key.is_none()`.
- `aliasing::group_members(entry) -> Vec<IpAddr>` (`crates/evpn/src/aliasing.rs`): returns the canonical FDB-NHG membership (`remote_vtep_ip` ∪ `alias_vtep_ips`, sorted + deduplicated via `BTreeSet`). Two entries with the same member set produce the same canonical group, which slice 3 will key "membership unchanged" on to avoid spurious `NLM_F_REPLACE`.
- Same-address-family-per-`(ESI, EthernetTag)` invariant in `project_evpn_routes_with_aliases` (`crates/evpn/src/projection.rs`): mixed-family EAD-per-EVI aliases under a same-segment primary are treated as operator misconfiguration. The mismatched VTEP is dropped from `alias_vtep_ips` and a `warn!` fires with the full `(VNI, MAC, ESI, EthernetTag, primary, dropped)` tuple. The dataplane therefore never sees a mixed-family FDB-NHG member list.
- 4 new unit tests in `projection.rs` (`alias_group_key_*`, `mixed_family_aliases_dropped`); 3 in `aliasing.rs` (`group_members_*`).

Non-goals (slice 2-4)

- `nexthop_raw` module (slice 2).
- `AddRemoteFdbNhg` diff op + apply path (slice 3).

The full slicing rationale lives in ADR-0059 §6.
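The address-family guard and the key-population rule from this summary could be sketched along these lines. The function names and the raw `[u8; 10]` / `u32` stand-ins for `EthernetSegmentIdentifier` and `EthernetTagId` are assumptions for illustration; the real projection code logs the dropped VTEPs with `warn!`, whereas this sketch simply returns them so a caller could do the logging.

```rust
use std::net::IpAddr;

/// True when both addresses are IPv4 or both are IPv6.
fn same_family(a: &IpAddr, b: &IpAddr) -> bool {
    a.is_ipv4() == b.is_ipv4()
}

/// Split alias VTEPs into (kept, dropped) relative to the primary's family.
/// Hypothetical helper: the dropped list is what a caller would warn about.
pub fn filter_aliases_by_family(
    primary: IpAddr,
    aliases: &[IpAddr],
) -> (Vec<IpAddr>, Vec<IpAddr>) {
    aliases
        .iter()
        .copied()
        .partition(|alias| same_family(&primary, alias))
}

/// The key is populated only for a non-zero ESI with at least one surviving
/// alias, mirroring "empty alias list ⇔ key is None".
pub fn alias_group_key(
    esi: [u8; 10],
    ethernet_tag: u32,
    kept_aliases: &[IpAddr],
) -> Option<([u8; 10], u32)> {
    if esi == [0u8; 10] || kept_aliases.is_empty() {
        None
    } else {
        Some((esi, ethernet_tag))
    }
}
```

Filtering before deriving the key means an entry whose only aliases were in the wrong family ends up with no key at all, so it falls back to plain single-destination handling.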
Test plan
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo test --workspace --no-fail-fast` (full sweep)
- `cargo test -p rustbgpd-evpn projection`: 34 passed (4 new)
- `cargo test -p rustbgpd-evpn aliasing`: 12 passed (3 new)
- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps`
- `cargo +1.92 check --workspace --all-targets --locked` (MSRV)