
evpn: ADR-0059 slice 1 — RemoteMacEntry alias_group_key + projection AF guard #84

Merged
lance0 merged 2 commits into main from evpn/adr-0059-slice-1-alias-group-key
May 12, 2026

@lance0 lance0 commented May 12, 2026

Summary

Foundation slice for the EVPN aliasing dataplane (ADR-0059). Pure-logic / portable-intent change; the existing diff/apply layer ignores the new field by construction, so this ships independently green with no dataplane behaviour change.

The slice extends RemoteMacEntry with a group identity so the FDB nexthop group dataplane (slices 2-4) can key one kernel resource per (ESI, EthernetTag) instead of re-deriving it at apply time. It also backs the ADR's cross-family out-of-scope clause with code from day one.

What's in this PR

  • RemoteMacEntry::alias_group_key: Option<(EthernetSegmentIdentifier, EthernetTagId)> (crates/evpn/src/mac.rs): populated by the projection layer when the originating Type 2 carries a non-zero ESI and at least one EAD-per-EVI alias has been observed. Empty alias_vtep_ips ⇔ alias_group_key.is_none().
  • aliasing::group_members(entry) -> Vec<IpAddr> (crates/evpn/src/aliasing.rs): returns the canonical FDB-NHG membership (remote_vtep_ip ∪ alias_vtep_ips, sorted + deduplicated via BTreeSet). Two entries with the same member set produce the same canonical group, which slice 3 will key "membership unchanged" on to avoid spurious NLM_F_REPLACE.
  • Same-AF-per-(ESI, EthernetTag) invariant in project_evpn_routes_with_aliases (crates/evpn/src/projection.rs): mixed-family EAD-per-EVI aliases under a same-segment primary are treated as operator misconfiguration. The mismatched VTEP is dropped from alias_vtep_ips and a warn! fires with the full (VNI, MAC, ESI, EthernetTag, primary, dropped) tuple. The dataplane therefore never sees a mixed-family FDB-NHG member list.
  • 7 new unit tests: 4 in projection.rs (alias_group_key_*, mixed_family_aliases_dropped) and 3 in aliasing.rs (group_members_*).
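
To make the canonical-membership rule concrete, here is a minimal sketch of what a `group_members`-style helper could look like. The `RemoteMacEntry` below is a hypothetical two-field stand-in, not the crate's real struct, and the body is an assumption based only on the description above (primary ∪ aliases, sorted and deduplicated via `BTreeSet`).

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Hypothetical, trimmed-down stand-in for `RemoteMacEntry`; only the two
/// fields that matter for group membership are modelled here.
pub struct RemoteMacEntry {
    pub remote_vtep_ip: IpAddr,
    pub alias_vtep_ips: Vec<IpAddr>,
}

/// Canonical membership: primary VTEP plus alias VTEPs, sorted and
/// deduplicated. Collecting into a `BTreeSet` gives both properties at once.
pub fn group_members(entry: &RemoteMacEntry) -> Vec<IpAddr> {
    let mut members: BTreeSet<IpAddr> = entry.alias_vtep_ips.iter().copied().collect();
    members.insert(entry.remote_vtep_ip);
    members.into_iter().collect()
}
```

Because the output order depends only on the set of addresses, two entries whose aliases were observed in a different order still compare equal, which is the property a later "membership unchanged" check would rely on.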

Non-goals (slices 2-4)

  • Raw-netlink nexthop_raw module (slice 2).
  • AddRemoteFdbNhg diff op + apply path (slice 3).
  • M40 interop smoke (slice 4).

The full slicing rationale lives in ADR-0059 §6.

Test plan

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace --no-fail-fast (full sweep)
  • cargo test -p rustbgpd-evpn projection — 34 passed (4 new)
  • cargo test -p rustbgpd-evpn aliasing — 12 passed (3 new)
  • RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps
  • cargo +1.92 check --workspace --all-targets --locked (MSRV)
  • CI: build matrix + interop pass on push.

lance0 added 2 commits May 12, 2026 11:43
…AF guard

Foundation slice for the EVPN aliasing dataplane. Pure-logic /
portable-intent change; the existing diff/apply layer ignores the
new field by construction, so this ships independently green with
no dataplane behaviour change.

- Add `alias_group_key: Option<(EthernetSegmentIdentifier,
  EthernetTagId)>` to `RemoteMacEntry` (crates/evpn/src/mac.rs).
  Populated by the projection layer when the originating Type 2
  carries a non-zero ESI and at least one EAD-per-EVI alias has
  been observed. Empty `alias_vtep_ips` ⇔ `alias_group_key.is_none()`.
- Add `aliasing::group_members(entry)` helper returning the
  canonical FDB nexthop group membership (`remote_vtep_ip` ∪
  `alias_vtep_ips`, sorted + deduplicated via BTreeSet). Two
  entries with the same member set produce the same canonical
  group — the dataplane will key "membership unchanged" on this
  to avoid spurious NLM_F_REPLACE traffic in slice 3.
- Enforce same-address-family-per-`(ESI, EthernetTag)` invariant
  on aliasing resolution in `project_evpn_routes_with_aliases`.
  Mixed-family EAD-per-EVI aliases under a same-segment primary
  are treated as operator misconfiguration: the mismatched VTEP
  is dropped from `alias_vtep_ips` and a `warn!` fires with the
  full (VNI, MAC, ESI, EthernetTag, primary, dropped) tuple.
  Backs ADR-0059's cross-family out-of-scope clause with code
  from day one.
- 4 new projection unit tests covering: ESI==0 → key None; key
  populated when aliases observed; key None when non-zero ESI
  has no aliases; mixed-family aliases dropped (and the survivor
  carries the group key).
- 3 new aliasing unit tests covering: primary-only group
  membership; sort order under unsorted insertion; defensive
  dedup of primary appearing in alias_vtep_ips.
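
The same-address-family guard described above can be sketched as a simple partition. The function name and the return shape are hypothetical; the real check lives inside `project_evpn_routes_with_aliases` and emits a `warn!` for each dropped VTEP, which this sketch leaves to the caller.

```rust
use std::net::IpAddr;

/// Returns true when both addresses belong to the same address family.
fn same_family(a: &IpAddr, b: &IpAddr) -> bool {
    a.is_ipv4() == b.is_ipv4()
}

/// Hypothetical helper: splits alias candidates into `(kept, dropped)`
/// relative to the primary VTEP's address family. A caller would log each
/// dropped address and store only `kept` as the alias list.
pub fn filter_aliases_by_family(
    primary: IpAddr,
    candidates: &[IpAddr],
) -> (Vec<IpAddr>, Vec<IpAddr>) {
    candidates
        .iter()
        .copied()
        .partition(|ip| same_family(&primary, ip))
}
```

Returning the dropped addresses instead of logging inside the helper keeps the function pure, so it can be unit-tested without a logging subscriber.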

Foundation for ADR-0059 slice 2 (`nexthop_raw` raw-netlink
module) and slice 3 (reconcile actor + owned-state lifecycle).

The module-level docs predated PR #84. They claimed aliasing
"does not mutate projection or dataplane intent" and referenced
a future `RemoteMacEntry::alias_vtep_ips` field — but slice 1
now wires both `alias_vtep_ips` and `alias_group_key` through
the projection. The "FDB + ip route ECMP, or per-CE nexthop
group" wording also predated ADR-0059's settled
FDB-nexthop-group direction.

Refresh the "What this module does" section to describe the
three current surfaces (AliasIndex, alias_resolved_next_hops,
group_members) and link to the projection consumer. Tighten
the "does NOT do" bullet to "doesn't touch the kernel — that's
crates/evpn-linux behind ADR-0059 slices 2-4" so the boundary
stays clear without claiming the intent shape is untouched.
lance0 merged commit ce4f7ed into main on May 12, 2026
21 of 22 checks passed
lance0 added a commit that referenced this pull request May 12, 2026
… + CVE guard (#87)

* evpn-linux: ADR-0059 slice 3a — FDB-NHG state types + apply primitive + CVE guard

Foundation infrastructure for the aliasing-ECMP dataplane.
**Zero operational behavior change in this PR.** Slice 3b lands the
diff Pass 1b, the reconcile actor's apply coordinator, the startup
NHID adoption pass, the `NexthopOps` impls on `LinuxDataplane` and
`InMemoryDataplane`, and the Docker-runnable netns integration test
that brings the surface onto the apply path. Slice 3a is the
review-friendly half: pure-logic state types + the kernel-touching
FDB-NHG row primitive (gated behind the CVE-2025-39851 inline
guard) + a clean compile / clippy / test sweep.

What lands:

- `crates/evpn-linux/src/nh_id_alloc.rs` (NEW) — `NhIdAllocator`
  over a `[1, 0x4000]` bitmap with tag bits `0x3000_0000`
  (per-VTEP FDB nexthop) / `0x4000_0000` (FDB nexthop group),
  deliberately offset from FRR's `0x1000` / `0x2000` reservations
  so concurrent FRR + rustbgpd installs never collide on
  NLM_F_REPLACE (ADR-0059 §5 invariant 6). `reserve()` API for
  slice 3b's startup adoption pass. 13 unit tests.

- `crates/evpn-linux/src/group_state.rs` (NEW) — `GroupOwnedMap`
  with per-group `(VNI, ESI, EthernetTag)` refcount across
  referring `(VNI, MAC)` rows, plus a per-VTEP-NH refcount across
  referring groups. `RefDelta` return value tells the apply
  coordinator when a group should be torn down on `RemoveFdbNhg`.
  ADR-0059 §7's `share_l2_nhg` defaults off, so the Linux-owned
  key includes VNI even though the portable
  `RemoteMacEntry::alias_group_key` from slice 1 stays VNI-less.
  9 unit tests covering single-MAC install/remove, multi-MAC
  group sharing, N->1 drain (group stays alive), VTEP NH shared
  across groups, and member-set change isolation.

- `crates/evpn-linux/src/linux/fdb_nhg.rs` (NEW) —
  `apply_install_fdb_nhg_row` / `apply_remove_fdb_nhg_row`
  building bridge FDB rows with NDA_NH_ID (= 13, via
  `NeighbourAttribute::Other(DefaultNla)` since
  `netlink-packet-route 0.30` has no typed variant). Inline
  **CVE-2025-39851 guard** rejects any install whose target
  L2VXLAN does not have `learning_disabled == Some(true)`
  (mainline fix `6ead38147ebb` "vxlan: Fix NPD when refreshing
  an FDB entry with a nexthop object"; the readiness probe is the
  upstream guard, this is belt-and-suspenders). 4 unit tests on
  the CVE guard.

- `KernelFdbEntry::nh_id: Option<u32>` (snapshot.rs) +
  `linux/fdb.rs` parse path extracts `NDA_NH_ID` from the kernel
  dump via the `Nla` trait accessor on the `Other` escape hatch.
  `merge_fdb_rows` preserves `nh_id` across the self/master row
  split so the merged entry surfaces both halves. 1 new merge test.

- `OwnedEntry` enum refactor — replaced the field-soup
  `{ last_applied_dst, last_applied_seq }` shape with an explicit
  `OwnedEntryKind { SingleDst { dst, mobility_seq }, FdbNhg { group_key } }`.
  Invalid states like "dst set AND group_key set" are structurally
  impossible. All existing call sites updated to use the
  `OwnedEntry::single_dst()` constructor or accessor methods
  (`last_applied_dst()`, `last_applied_seq()`, `group_key()`).

- `DataplaneOp::InstallFdbNhg` / `UpdateFdbNhgMembers` /
  `RemoveFdbNhg` variants declared on `dataplane.rs`. Slice 3b
  will emit them from `compute_diff` Pass 1b and route them
  through the coordinator. In slice 3a they are explicitly **not
  yet emitted** — `compute_diff` still produces only single-dst
  ops — and `Dataplane::apply` returns
  `InvalidArgument("FDB-NHG ops must be applied via the
  reconcile-actor coordinator, not Dataplane::apply")` if one
  slips through.

- `NexthopOps` trait declared on `dataplane.rs` with the six
  low-level methods the slice 3b coordinator will call
  (`add_nexthop_member`, `add_nexthop_group`, `del_nexthop`,
  `install_fdb_nhg_row`, `remove_fdb_nhg_row`,
  `dump_owned_nexthops`). `KernelNexthop` + `KernelNexthopKind`
  for startup adoption. No impls yet — slice 3b lands
  `LinuxDataplane` + `InMemoryDataplane` impls.
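
As an illustration of the tagged-ID scheme described for `NhIdAllocator`, the sketch below combines a small index in `[1, 0x4000]` with a kind tag in the high bits. Only the tag values and the index range come from the text above. The type and method shapes, the separate index space per kind, and the use of `BTreeSet` in place of a bitmap are all assumptions made for brevity.

```rust
use std::collections::BTreeSet;

const TAG_VTEP_NH: u32 = 0x3000_0000; // per-VTEP FDB nexthop
const TAG_NH_GROUP: u32 = 0x4000_0000; // FDB nexthop group
const TAG_MASK: u32 = 0xF000_0000;
const MAX_INDEX: u32 = 0x4000;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NhKind {
    VtepNexthop,
    Group,
}

/// Hypothetical allocator: one set of used indices per kind.
#[derive(Default)]
pub struct NhIdAllocator {
    used_vtep: BTreeSet<u32>,
    used_group: BTreeSet<u32>,
}

impl NhIdAllocator {
    fn used_mut(&mut self, kind: NhKind) -> &mut BTreeSet<u32> {
        match kind {
            NhKind::VtepNexthop => &mut self.used_vtep,
            NhKind::Group => &mut self.used_group,
        }
    }

    /// Splits a kernel NHID into (kind, index); None if it is not ours.
    fn decode(nh_id: u32) -> Option<(NhKind, u32)> {
        let index = nh_id & !TAG_MASK;
        if !(1..=MAX_INDEX).contains(&index) {
            return None;
        }
        match nh_id & TAG_MASK {
            TAG_VTEP_NH => Some((NhKind::VtepNexthop, index)),
            TAG_NH_GROUP => Some((NhKind::Group, index)),
            _ => None,
        }
    }

    /// Hands out the lowest free index for `kind`, tagged in the high bits.
    pub fn allocate(&mut self, kind: NhKind) -> Option<u32> {
        let tag = match kind {
            NhKind::VtepNexthop => TAG_VTEP_NH,
            NhKind::Group => TAG_NH_GROUP,
        };
        let used = self.used_mut(kind);
        let index = (1..=MAX_INDEX).find(|i| !used.contains(i))?;
        used.insert(index);
        Some(tag | index)
    }

    /// Marks an ID found in a kernel dump as taken; false if foreign or already held.
    pub fn reserve(&mut self, nh_id: u32) -> bool {
        match Self::decode(nh_id) {
            Some((kind, index)) => self.used_mut(kind).insert(index),
            None => false,
        }
    }

    /// Returns an ID to the pool; false if it was not held.
    pub fn release(&mut self, nh_id: u32) -> bool {
        match Self::decode(nh_id) {
            Some((kind, index)) => self.used_mut(kind).remove(&index),
            None => false,
        }
    }
}
```

With this layout, an ID carrying another daemon's tag (for example `0x1000_0001`) is simply rejected by `reserve`, so a startup adoption pass never claims nexthops it did not create.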

Stats: +1400 / -100 lines across 14 files. 213 lib tests passing
(13 nh_id_alloc + 9 group_state + 4 fdb_nhg + 1 merge-nh_id + 186
pre-existing); 21 doctest / integration / smoke buckets unchanged.

Docs touched:
- CHANGELOG `[Unreleased] ### Added` entry describing the 3a/3b
  split + the foundation pieces.
- ROADMAP entry rewritten to show slice 1 + 2 shipped on `main`,
  slice 3 in flight split as 3a (this PR) + 3b (wiring), slice 4
  (M40 interop) still ahead. The "aliasing dataplane ECMP" line in
  the Not-yet-supported row of `README.md` stays accurate — 3a
  doesn't ship the capability, 3b does.

No netns test in slice 3a; the end-to-end Docker-runnable round
trip lands with slice 3b's coordinator (per
~/.claude/projects/-home-lance-projects-rustbgpd/memory/feedback_docker_for_privileged_kernel_work.md
the autonomy path is `docker run --cap-add=NET_ADMIN`).

References:
- ADR-0059 (merged in #83): design.
- Slice 1 (merged #84): `RemoteMacEntry::alias_group_key`
  portable intent + projection AF guard.
- Slice 2 (merged #86): `nexthop_raw` netlink primitive.
- CVE-2025-39851 mainline fix `6ead38147ebb` + stable backports.

* evpn-linux: review fixups for slice 3a — idempotent remove + permanent reject + explicit FDB-NHG kinds

Five review findings (one from Lance, four from Copilot) addressed in
one batch:

1. `apply_remove_fdb_nhg_row` was not idempotent on `ENOENT`. The
   `NexthopOps::remove_fdb_nhg_row` doc comment promises idempotent
   semantics (matching the slice-2 `NexthopSocket::del`) but the
   implementation mapped delete failures through
   `classify_apply_error`, which classifies `ENOENT` as transient.
   Switched to `classify_remove_apply_error` (the existing
   sibling that the single-dst delete path uses) so an
   already-deleted row returns Ok rather than wedging the slice 3b
   teardown / adoption flow on a phantom retry.

2-3. `Dataplane::apply` rejection for FDB-NHG ops used
   `DataplaneError::Other` in both `LinuxDataplane` and
   `InMemoryDataplane`. `Other` classifies as `Transient`, so a
   misrouted op would backoff-retry forever instead of surfacing
   as a programming bug. Switched both to `InvalidArgument`,
   which `FailureClass::class()` maps to `Permanent` — the actor
   suppresses the op via its permanent-failures map until intent
   changes.

4. CHANGELOG referenced a pre-3a `last_applied_group_key` field
   that never existed. Reworded to describe the actual pre-3a
   shape (`last_applied_dst` + `last_applied_seq`) with a note
   that the alternative was a naïve forward-compat extension; the
   enum refactor obviates it.

5. `op_to_kind` previously mapped `UpdateFdbNhgMembers` to
   `DataplaneOpKind::UpdateRemoteFdb { mac: [0; 6], dst: 0.0.0.0 }`
   (and `InstallFdbNhg` to a sentinel-dst `AddRemoteFdb`). The
   sentinel MAC collided with any other FDB-NHG group failure in
   `permanent_failures` (keyed by `(VNI, MAC)`), making operator
   debugging confusing. Added three explicit variants to
   `rustbgpd_evpn::DataplaneOpKind` — `InstallFdbNhg { mac }`,
   `UpdateFdbNhgMembers { esi }`, `RemoveFdbNhg { mac }` — and
   `op_to_kind` now uses them directly. `DataplaneOpKind` isn't
   surfaced over the gRPC/proto layer (verified), so the addition
   is internal-only.
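
Findings 1 through 3 both come down to which failure class an error lands in. The sketch below shows the idea for the remove path under stated assumptions: the function name, the `Result<(), i32>` input, and the `EINVAL` to `Permanent` mapping are hypothetical; only "`ENOENT` on remove is success" and "invalid argument is permanent, everything else retries" are taken from the text.

```rust
/// Linux errno values, hard-coded to keep the sketch dependency-free.
const ENOENT: i32 = 2;
const EINVAL: i32 = 22;

#[derive(Debug, PartialEq, Eq)]
pub enum FailureClass {
    /// Retry with backoff.
    Transient,
    /// Suppress until intent changes.
    Permanent,
}

/// Hypothetical remove-path classifier: an already-deleted row counts as
/// success, so teardown never retries a delete that has nothing left to do.
pub fn classify_remove(result: Result<(), i32>) -> Result<(), FailureClass> {
    match result {
        Ok(()) => Ok(()),
        Err(ENOENT) => Ok(()),
        Err(EINVAL) => Err(FailureClass::Permanent),
        Err(_) => Err(FailureClass::Transient),
    }
}
```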

All gates green: fmt, clippy `-D warnings`, full workspace tests
(213 lib tests + 21 integration buckets), doc with
`RUSTDOCFLAGS="-D warnings"`, MSRV 1.92 check.

* evpn: add ethernet_tag to DataplaneOpKind::UpdateFdbNhgMembers

Per review feedback on PR #87: `UpdateFdbNhgMembers { esi }` was
slightly under-identified — a single ESI can appear across multiple
Ethernet Tags within the same VNI. AppliedOp carries VNI separately,
so adding `ethernet_tag` to the kind payload completes the triplet
that uniquely identifies a Linux-owned `AliasGroupKey`. Cheap +
useful future-proofing if the kind ever escapes logs/reports into a
user-facing surface.

DataplaneOpKind is still report-internal (no proto/gRPC consumer),
so the addition is non-breaking.
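
A rough sketch of the resulting kind payloads, using raw byte arrays and `u32` as stand-ins for the crate's `EthernetSegmentIdentifier` and `EthernetTagId` types. The helper `alias_group_identity` is hypothetical and only illustrates how the VNI carried alongside the kind completes the triplet.

```rust
/// Raw stand-ins for the crate's real identifier types (assumption).
pub type Esi = [u8; 10];
pub type EthernetTag = u32;
pub type Mac = [u8; 6];

/// Only the three FDB-NHG variants named in the commit are modelled here.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DataplaneOpKind {
    InstallFdbNhg { mac: Mac },
    UpdateFdbNhgMembers { esi: Esi, ethernet_tag: EthernetTag },
    RemoveFdbNhg { mac: Mac },
}

/// Hypothetical helper: combines the separately carried VNI with the kind
/// payload to recover the (VNI, ESI, EthernetTag) group identity.
pub fn alias_group_identity(
    vni: u32,
    kind: &DataplaneOpKind,
) -> Option<(u32, Esi, EthernetTag)> {
    match kind {
        DataplaneOpKind::UpdateFdbNhgMembers { esi, ethernet_tag } => {
            Some((vni, *esi, *ethernet_tag))
        }
        _ => None,
    }
}
```

Two member updates for the same ESI under different Ethernet Tags now produce distinct kinds, so they no longer share one entry in a report or failure map.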
lance0 added a commit that referenced this pull request May 13, 2026
First slice in the ADR-0059 chain with operational behavior change.
Multi-homed Type 2 routes now program FDB nexthop groups via
`NDA_NH_ID` in the kernel; traffic to a multi-homed MAC fans out
across every observed alias VTEP on the encap path. Slice 4 (M40
manual FRR smoke) is the remaining piece before production-default
aliasing ECMP.

## What's in

- `compute_diff` Pass 1b emitting InstallFdbNhg / UpdateFdbNhgMembers /
  RemoveFdbNhg with IPv6 fallback to single-dst.
- Reconcile actor coordinator orchestrating per ADR-0059 §5
  (members → group → FDB row install order; reverse on teardown).
- `NexthopOps` impls on LinuxDataplane + InMemoryDataplane with
  CVE-2025-39851 guard.
- Startup NHID adoption: dump_owned_nexthops scan, allocator
  reservation, deferred stale cleanup with snapshot-aware retention.
- `NhIdAllocator` (0x3000_/0x4000_ tag scheme), GroupOwnedMap with
  refcounting, partial-install rollback (orphan-aware).
- Three retry/permanent-failure key spaces: FDB-MAC, BUM-ifindex,
  AliasGroupKey for UpdateFdbNhgMembers.
- Per-errno error classification mirroring linux::fdb to avoid
  trapping transient kernel errors as permanent.
- pending_deletes retry queue for steady-state GC failures.
- Drain branches on OwnedEntryKind so FdbNhg-owned MACs use the
  RemoveFdbNhg teardown sequence, not RemoveRemoteFdb.
- 4 actor-level tests + 6 diff transition tests + 7 dump-parser
  unit tests + 2 Docker-runnable netns tests.
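
The group refcounting mentioned above can be pictured with a small map from group key to the set of referring `(VNI, MAC)` rows. This is a sketch under assumptions: the method names and the `RefDelta` variants are hypothetical, and the real `GroupOwnedMap` also tracks a per-VTEP nexthop refcount that is omitted here.

```rust
use std::collections::{HashMap, HashSet};

/// (VNI, ESI, EthernetTag), using raw stand-in types.
pub type GroupKey = (u32, [u8; 10], u32);
/// (VNI, MAC) row that points at a group.
pub type MacRow = (u32, [u8; 6]);

#[derive(Debug, PartialEq, Eq)]
pub enum RefDelta {
    /// First row referencing the group: the group must be installed.
    GroupCreated,
    /// Reference set changed but the group stays as it is.
    Unchanged,
    /// Last row went away: the group can be torn down.
    GroupDrained,
}

#[derive(Default)]
pub struct GroupOwnedMap {
    groups: HashMap<GroupKey, HashSet<MacRow>>,
}

impl GroupOwnedMap {
    pub fn add_ref(&mut self, group: GroupKey, row: MacRow) -> RefDelta {
        let rows = self.groups.entry(group).or_default();
        let was_empty = rows.is_empty();
        rows.insert(row);
        if was_empty {
            RefDelta::GroupCreated
        } else {
            RefDelta::Unchanged
        }
    }

    pub fn drop_ref(&mut self, group: &GroupKey, row: &MacRow) -> RefDelta {
        let Some(rows) = self.groups.get_mut(group) else {
            return RefDelta::Unchanged;
        };
        rows.remove(row);
        if rows.is_empty() {
            // Forget drained groups so a later add_ref reports GroupCreated again.
            self.groups.remove(group);
            RefDelta::GroupDrained
        } else {
            RefDelta::Unchanged
        }
    }
}
```

Draining from N rows down to one keeps the group alive; only the final `drop_ref` reports `GroupDrained`, which is the signal a coordinator would use to start the reverse teardown order.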

## Review trail

13 Copilot review rounds + 1 deep human review. The deep review
caught the load-bearing bug Copilot missed: adoption cleanup could
delete still-live adopted NHIDs after FDB-NHG installs hit permanent
suppression. Fixed with a suppression gate + snapshot-aware retention
set, both regression-tested via the new actor-level tests.

## What's deferred

- Slice 4: M40 manual containerlab smoke against FRR.
- Slice 3.5: periodic RTM_GETNEXTHOP drift recovery,
  apply_aliasing_ecmp operator toggle, IPv6 alias members.

## References

- ADR-0059 (merged #83)
- Slice 1 (#84): portable intent
- Slice 2 (#86): nexthop_raw netlink primitive
- Slice 3a (#87): state types + apply primitive + CVE guard
- CVE-2025-39851 mainline fix: torvalds/linux@6ead38147ebb