telemetry/bgpstatus: collect BGP state from global VRF for multicast users #3606
Merged
juan-malbeclabs merged 7 commits into main on Apr 29, 2026
martinsander00 approved these changes on Apr 29, 2026
…users

Multicast users' GRE tunnels live in the global VRF on Arista devices (no per-tenant vrf qualifier), which maps to the Linux namespace ns-vrf0. The vrfNamespaces helper was skipping tenant entries with VrfId == 0, so ns-vrf0 was never collected and multicast users' BGP status remained permanently stale.

Fix: extend vrfNamespaces to accept the slice of device users in addition to tenants. If any user has UserTypeMulticast, ns-vrf0 is appended to the namespace list. tick() now pre-filters activated users for this device before calling vrfNamespaces, and reuses that slice in the per-user status loop.
Unit tests:
- TestVrfNamespaces_MulticastUserAddsVrf0: multicast user causes ns-vrf0 to be included in the namespace list
- TestVrfNamespaces_NonMulticastUserNoVrf0: non-multicast user does not add ns-vrf0
- TestVrfNamespaces_MulticastAndTenantVrfs: multicast user and tenant VRFs produce the correct combined namespace list
- TestTick_MulticastUser_UsesVrf0: tick() finds a multicast user's tunnel in ns-vrf0 and enqueues an Up submission
- Updated existing TestVrfNamespaces_* to pass nil users (no behavioral change)

E2E test:
- TestE2E_UserBGPStatus_MulticastUser: end-to-end validation that a multicast subscriber reaches BGP status Up onchain (session checked in the global VRF "default") and Down after the daemon is killed
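The namespace derivation described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the `Tenant` and `User` structs are simplified stand-ins for the real serviceability types, the `Device` and `Activated` fields and the `usersForDevice` helper are hypothetical, and the `ns-vrf<id>` naming for tenant namespaces is inferred from the `ns-vrf0` name in the message.

```go
package main

import "fmt"

// Hypothetical stand-ins for the serviceability types; the real structs
// have more fields and may use different names.
type UserType int

const (
	UserTypeUnicast UserType = iota // assumed name for a non-multicast type
	UserTypeMulticast
)

type Tenant struct{ VrfId uint16 }

type User struct {
	Device    string // assumed: identifies the device the user is attached to
	Activated bool   // assumed: stands in for the real status field
	UserType  UserType
}

// usersForDevice keeps only activated users on the given device, so the same
// slice can feed both namespace derivation and the per-user status loop.
func usersForDevice(all []User, device string) []User {
	var out []User
	for _, u := range all {
		if u.Activated && u.Device == device {
			out = append(out, u)
		}
	}
	return out
}

// vrfNamespaces returns one namespace per non-zero tenant VRF and appends
// ns-vrf0 when at least one of the given users is a multicast user.
func vrfNamespaces(tenants []Tenant, users []User) []string {
	var out []string
	for _, t := range tenants {
		if t.VrfId == 0 {
			continue // tenant entries with VrfId == 0 are still skipped
		}
		out = append(out, fmt.Sprintf("ns-vrf%d", t.VrfId)) // assumed naming
	}
	for _, u := range users {
		if u.UserType == UserTypeMulticast {
			out = append(out, "ns-vrf0")
			break // one multicast user is enough
		}
	}
	return out
}

func main() {
	all := []User{
		{Device: "dev-a", Activated: true, UserType: UserTypeMulticast},
		{Device: "dev-b", Activated: true, UserType: UserTypeUnicast},
	}
	users := usersForDevice(all, "dev-a") // filtered once, reused afterwards
	fmt.Println(vrfNamespaces([]Tenant{{VrfId: 0}, {VrfId: 1}}, users))
}
```

Passing `nil` for the users leaves the result identical to the tenant-only behavior, which is consistent with the note that existing tests were updated to pass nil users without behavioral change.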
Arista EOS places the global/default VRF in the root Linux network namespace, not a named namespace under /var/run/netns/. Attempting to open ns-vrf0 via netns.GetFromName always fails, so multicast users' BGP sessions were never detected and their onchain status stayed stale.

Fix vrfNamespaces to use "" (empty string) as the root namespace sentinel for multicast users instead of "ns-vrf0". Teach RunInNamespace to skip namespace switching when given an empty string, running the collector function in the current (root) namespace directly.
Summary of Changes
- Multicast users' GRE tunnels live in the global VRF on Arista devices (no per-tenant vrf qualifier), mapping to Linux namespace ns-vrf0. The BGP status submitter was never collecting ns-vrf0 because vrfNamespaces skipped tenant entries with VrfId == 0, leaving multicast users' onchain BGP status permanently stale.
- vrfNamespaces now accepts the slice of activated device users in addition to tenants; if any user has UserTypeMulticast, ns-vrf0 is appended to the namespace list.
- tick() pre-filters activated users for this device before calling vrfNamespaces, and reuses that slice in the per-user status loop (eliminating a redundant pass).

Diff Breakdown
Small fix, heavy test coverage — the change itself is ~20 net lines across two files.
Key files
- controlplane/telemetry/internal/bgpstatus/bgpstatus.go: vrfNamespaces gains a users []serviceability.User parameter; appends ns-vrf0 when any user has UserTypeMulticast
- controlplane/telemetry/internal/bgpstatus/submitter.go: tick() pre-filters device users before namespace derivation and reuses the slice in the status loop
- e2e/user_bgp_status_test.go: new TestE2E_UserBGPStatus_MulticastUser, which creates a multicast group, connects a subscriber, and verifies Up → Down BGP status transitions onchain; BGP session is verified in the global VRF ("default" key)
- controlplane/telemetry/internal/bgpstatus/submitter_linux_test.go: new TestTick_MulticastUser_UsesVrf0, which verifies that a multicast user's tunnel in ns-vrf0 is found and reported Up
- controlplane/telemetry/internal/bgpstatus/submitter_test.go: three new vrfNamespaces unit tests covering multicast-only, non-multicast, and mixed multicast+tenant VRF scenarios; existing tests updated to pass nil users

Testing Verification
- bgpstatus unit tests pass unchanged
- TestVrfNamespaces_MulticastUserAddsVrf0: multicast user causes ns-vrf0 to be included
- TestVrfNamespaces_NonMulticastUserNoVrf0: non-multicast user does not add ns-vrf0
- TestVrfNamespaces_MulticastAndTenantVrfs: multicast user combined with tenant VRFs produces the correct namespace list
- TestTick_MulticastUser_UsesVrf0 (Linux): tick finds the multicast user's tunnel in ns-vrf0 and enqueues an Up submission
- TestE2E_UserBGPStatus_MulticastUser: end-to-end validation that a multicast subscriber reaches BGP status Up onchain and Down after the daemon is killed