telemetry/bgpstatus: collect BGP state from global VRF for multicast users #3606

Merged
juan-malbeclabs merged 7 commits into main from jo/bgp_status_multicast
Apr 29, 2026

@juan-malbeclabs
Contributor

Summary of Changes

  • Multicast users' GRE tunnels live in the global VRF on Arista devices (no per-tenant vrf qualifier), mapping to Linux namespace ns-vrf0. The BGP status submitter was never collecting ns-vrf0 because vrfNamespaces skipped VrfId == 0 tenant entries, leaving multicast users' onchain BGP status permanently stale.
  • vrfNamespaces now accepts the slice of activated device users in addition to tenants; if any user has UserTypeMulticast, ns-vrf0 is appended to the namespace list.
  • tick() pre-filters activated users for this device before calling vrfNamespaces, and reuses that slice in the per-user status loop (eliminating a redundant pass).
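
A minimal sketch of the namespace derivation described in the summary above, for readers who do not have the diff open. The `User`, `Tenant`, and `UserTypeOther` definitions are hypothetical stand-ins for the real serviceability types, and the `ns-vrf<id>` naming for tenant VRFs is an assumption extrapolated from the `ns-vrf0` name in the description. It models the behavior as first described; a later commit on this branch replaces `ns-vrf0` with an empty-string sentinel for the root namespace.

```go
package main

import "fmt"

// Hypothetical stand-ins for the serviceability types; the real
// definitions live in the repository and are not shown on this page.
type UserType int

const (
	UserTypeOther     UserType = iota // placeholder for any non-multicast type
	UserTypeMulticast                 // name taken from the PR description
)

type User struct {
	UserType UserType
}

type Tenant struct {
	VrfId uint16
}

// vrfNamespaces returns one namespace per tenant VRF (still skipping
// VrfId == 0 tenant entries) and appends ns-vrf0 once if any user is a
// multicast user.
func vrfNamespaces(tenants []Tenant, users []User) []string {
	seen := map[string]bool{}
	var out []string
	add := func(ns string) {
		if !seen[ns] {
			seen[ns] = true
			out = append(out, ns)
		}
	}
	for _, t := range tenants {
		if t.VrfId == 0 {
			continue // the global VRF is only added on behalf of multicast users
		}
		add(fmt.Sprintf("ns-vrf%d", t.VrfId)) // assumed naming scheme
	}
	for _, u := range users {
		if u.UserType == UserTypeMulticast {
			add("ns-vrf0")
			break
		}
	}
	return out
}

func main() {
	tenants := []Tenant{{VrfId: 0}, {VrfId: 1}}
	users := []User{{UserType: UserTypeOther}, {UserType: UserTypeMulticast}}
	fmt.Println(vrfNamespaces(tenants, users)) // prints [ns-vrf1 ns-vrf0]
}
```

Passing `nil` for `users` leaves the result identical to the tenant-only behavior, which is why the existing tests only needed the extra argument.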

Diff Breakdown

Category     Files   Lines (+/-)   Net
Core logic   2       +35 / -15     +20
Tests        3       +298 / -5     +293
Total        5       +333 / -20    +313

Small fix, heavy test coverage — the change itself is ~20 net lines across two files.

Key files
  • controlplane/telemetry/internal/bgpstatus/bgpstatus.go: vrfNamespaces gains a users []serviceability.User parameter; appends ns-vrf0 when any user has UserTypeMulticast
  • controlplane/telemetry/internal/bgpstatus/submitter.go: tick() pre-filters device users before namespace derivation and reuses the slice in the status loop
  • e2e/user_bgp_status_test.go — new TestE2E_UserBGPStatus_MulticastUser: creates a multicast group, connects a subscriber, and verifies Up → Down BGP status transitions onchain; BGP session is verified in the global VRF ("default" key)
  • controlplane/telemetry/internal/bgpstatus/submitter_linux_test.go — new TestTick_MulticastUser_UsesVrf0: verifies that a multicast user's tunnel in ns-vrf0 is found and reported Up
  • controlplane/telemetry/internal/bgpstatus/submitter_test.go — three new vrfNamespaces unit tests covering multicast-only, non-multicast, and mixed multicast+tenant VRF scenarios; existing tests updated to pass nil users
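
The submitter.go change listed above (filter once, use twice) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the diff: the `User` fields, the `UserStatus` constants, and the helper name `activatedUsersForDevice` are all hypothetical.

```go
package main

import "fmt"

// Hypothetical user model; field and constant names are illustrative only.
type UserStatus int

const (
	UserStatusPending UserStatus = iota
	UserStatusActivated
)

type User struct {
	Name      string
	DevicePK  string
	Status    UserStatus
	Multicast bool
}

// activatedUsersForDevice keeps only activated users attached to devicePK.
// The result is meant to be computed once per tick and shared by every
// later step that needs the device's users.
func activatedUsersForDevice(all []User, devicePK string) []User {
	var out []User
	for _, u := range all {
		if u.DevicePK == devicePK && u.Status == UserStatusActivated {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	all := []User{
		{Name: "a", DevicePK: "dev-1", Status: UserStatusActivated, Multicast: true},
		{Name: "b", DevicePK: "dev-1", Status: UserStatusPending},
		{Name: "c", DevicePK: "dev-2", Status: UserStatusActivated},
	}

	deviceUsers := activatedUsersForDevice(all, "dev-1")

	// First use: decide which namespaces need to be collected.
	needGlobalVRF := false
	for _, u := range deviceUsers {
		if u.Multicast {
			needGlobalVRF = true
			break
		}
	}
	fmt.Println("collect global VRF:", needGlobalVRF)

	// Second use: the per-user status loop walks the same slice, so the
	// full user list is not filtered a second time.
	for _, u := range deviceUsers {
		fmt.Println("evaluate BGP status for user", u.Name)
	}
}
```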

Testing Verification

  • All existing bgpstatus unit tests pass; the only change to them is passing nil users to vrfNamespaces
  • TestVrfNamespaces_MulticastUserAddsVrf0: multicast user causes ns-vrf0 to be included
  • TestVrfNamespaces_NonMulticastUserNoVrf0: non-multicast user does not add ns-vrf0
  • TestVrfNamespaces_MulticastAndTenantVrfs: multicast user combined with tenant VRFs produces the correct namespace list
  • TestTick_MulticastUser_UsesVrf0 (Linux): tick finds the multicast user's tunnel in ns-vrf0 and enqueues an Up submission
  • TestE2E_UserBGPStatus_MulticastUser: end-to-end validation that a multicast subscriber reaches BGP status Up onchain and Down after the daemon is killed

@juan-malbeclabs juan-malbeclabs enabled auto-merge (squash) April 29, 2026 16:51
Comment thread controlplane/telemetry/internal/bgpstatus/submitter.go Outdated
Comment thread controlplane/telemetry/internal/bgpstatus/submitter.go Outdated
Comment thread controlplane/telemetry/internal/bgpstatus/submitter_linux_test.go Outdated
Comment thread controlplane/telemetry/internal/bgpstatus/submitter_test.go Outdated
…users

Multicast users' GRE tunnels live in the global VRF on Arista devices
(no per-tenant vrf qualifier), which maps to the Linux namespace ns-vrf0.
The vrfNamespaces helper was skipping VrfId == 0 tenant entries, so ns-vrf0
was never collected and multicast users' BGP status remained permanently stale.

Fix: extend vrfNamespaces to accept the slice of device users in addition
to tenants. If any user has UserTypeMulticast, ns-vrf0 is appended to the
namespace list. tick() now pre-filters activated users for this device before
calling vrfNamespaces, and reuses that slice in the per-user status loop.

Unit tests:
- TestVrfNamespaces_MulticastUserAddsVrf0: multicast user causes ns-vrf0
  to be included in the namespace list
- TestVrfNamespaces_NonMulticastUserNoVrf0: non-multicast user does not
  add ns-vrf0
- TestVrfNamespaces_MulticastAndTenantVrfs: multicast user and tenant
  VRFs produce the correct combined namespace list
- TestTick_MulticastUser_UsesVrf0: tick() finds a multicast user's tunnel
  in ns-vrf0 and enqueues an Up submission
- Updated existing TestVrfNamespaces_* to pass nil users (no behavioral change)

E2E test:
- TestE2E_UserBGPStatus_MulticastUser: end-to-end validation that a
  multicast subscriber reaches BGP status Up onchain (session checked in
  the global VRF "default") and Down after the daemon is killed

Arista EOS places the global/default VRF in the root Linux network
namespace, not a named namespace under /var/run/netns/. Attempting to
open ns-vrf0 via netns.GetFromName always fails, so multicast users'
BGP sessions were never detected and their onchain status stayed stale.

Fix vrfNamespaces to use "" (empty string) as the root namespace
sentinel for multicast users instead of "ns-vrf0". Teach RunInNamespace
to skip namespace switching when given an empty string, running the
collector function in the current (root) namespace directly.
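
A rough sketch of the empty-string sentinel described in this commit message. The real RunInNamespace opens namespaces through the netns package; to keep the example free of external dependencies, namespace entry is abstracted behind a hypothetical `enterFunc` hook, and `fakeEnter` simply mimics the reported failure to open ns-vrf0. A real implementation that switches namespaces would also need to pin the goroutine to its OS thread, which is omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// enterFunc is a hypothetical hook that switches into the named namespace
// and returns a function that switches back.
type enterFunc func(name string) (restore func(), err error)

// runInNamespace runs fn inside the named namespace. An empty name is the
// sentinel for the root namespace: no switch is attempted and fn runs in
// the current namespace directly.
func runInNamespace(name string, enter enterFunc, fn func() error) error {
	if name == "" {
		return fn()
	}
	restore, err := enter(name)
	if err != nil {
		return fmt.Errorf("enter namespace %q: %w", name, err)
	}
	defer restore()
	return fn()
}

// fakeEnter stands in for the real lookup. It fails for ns-vrf0 because,
// per the commit message, the global VRF has no named namespace on EOS.
func fakeEnter(name string) (func(), error) {
	if name == "ns-vrf0" {
		return nil, errors.New("namespace not found")
	}
	return func() {}, nil
}

func main() {
	for _, ns := range []string{"ns-vrf1", "", "ns-vrf0"} {
		err := runInNamespace(ns, fakeEnter, func() error {
			fmt.Printf("collecting BGP state in %q\n", ns)
			return nil
		})
		if err != nil {
			fmt.Println("skipped:", err)
		}
	}
}
```

With this shape, vrfNamespaces can return "" for multicast users and the collection loop needs no special case for the global VRF.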
@juan-malbeclabs juan-malbeclabs force-pushed the jo/bgp_status_multicast branch from 2d91aa0 to 4a64846 April 29, 2026 17:28
@juan-malbeclabs juan-malbeclabs merged commit 0e547b0 into main Apr 29, 2026
40 of 41 checks passed
@juan-malbeclabs juan-malbeclabs deleted the jo/bgp_status_multicast branch April 29, 2026 18:18