telemetry/bgpstatus: collect BGP state from all tenant VRF namespaces #3597
Merged: juan-malbeclabs merged 4 commits into main on Apr 28, 2026.
Users whose tenant has VrfId != 1 have their GRE tunnel interface placed in ns-vrf<N> on the Arista device. The BGP status submitter was only checking ns-vrf1, which caused a persistent "tunnel not found" debug log and left those users' onchain BGP status stale.

Fix: on each tick, derive the set of Linux VRF namespaces from programData.Tenants (vrfNamespaces helper), then collect BGP socket stats and local interfaces from all of them, merging the results before the per-user loop. Tunnel IPs are globally unique (onchain-allocated), so merging across namespaces is safe.

The NamespaceCollector function type replaces LocalNet in Config, making per-namespace collection fully injectable for testing. DefaultCollector wraps the real Linux calls and is wired in main.go.

Unit tests cover vrfNamespaces (dedup, zero VrfId skip, multi-tenant) and tick() behavior (user in ns-vrf2, partial namespace failure, all-fail abort).
Adds TestE2E_UserBGPStatus_NonDefaultTenant, which exercises the multi-namespace collection path end-to-end: a tenant is created (VrfId != 1), a client connects under that tenant, and the test verifies that the BGP status submitter correctly reports Up (session established) and then Down (doublezerod killed) onchain for the user whose tunnel lives in ns-vrf<N>.
nikw9944 approved these changes on Apr 28, 2026.
Summary of Changes
- Users whose tenant has VrfId != 1 have their GRE tunnel interface placed in ns-vrf<N> on the Arista device, but the BGP status submitter was only checking ns-vrf1. This caused a persistent "tunnel not found" debug log and left those users' onchain BGP status permanently stale.
- Fix: derive the set of Linux VRF namespaces from programData.Tenants via a new vrfNamespaces helper, then collect BGP socket stats and local interfaces from all of them before the per-user loop. Tunnel IPs are globally unique (onchain-allocated), so merging across namespaces is safe.
- NamespaceCollector is added as an injectable function type in Config, replacing LocalNet. DefaultCollector wraps the real Linux calls; tests supply a mock without any Linux syscalls.

Diff Breakdown
Mostly test additions; the fix itself is compact -- ~60 net lines across two core files.
Key files
- controlplane/telemetry/internal/bgpstatus/submitter.go -- adds DefaultCollector (wraps Linux calls); rewrites tick() to loop over all VRF namespaces and merge results; aborts only if every namespace fails
- controlplane/telemetry/internal/bgpstatus/bgpstatus.go -- adds the NamespaceCollector func type and the vrfNamespaces helper (derives the namespace list from tenant VRF IDs); replaces LocalNet with Collector in Config
- controlplane/telemetry/internal/bgpstatus/submitter_linux_test.go -- four new tick() behavioral tests: user in ns-vrf2 found and reported Up; partial namespace failure continues; all-namespace failure aborts
- controlplane/telemetry/internal/bgpstatus/submitter_test.go -- migrates test helpers to NamespaceCollector; adds five vrfNamespaces unit tests (dedup, zero VrfId skip, multi-tenant)
- e2e/user_bgp_status_test.go -- adds TestE2E_UserBGPStatus_NonDefaultTenant: creates a tenant (VrfId != 1), connects a client under it, and verifies Up -> Down BGP status transitions onchain
- controlplane/telemetry/cmd/telemetry/main.go -- wires DefaultCollector(localNet) into bgpstatus.Config

Testing Verification
- bgpstatus unit tests pass unchanged
- TestVrfNamespaces_* (5 cases): deduplication, zero VrfId skip, base namespace always included, additional VRF appended
- TestTick_* (4 cases, Linux): a user in ns-vrf2 is found and reported Up; a single failing namespace is logged as a warning and skipped while the others succeed; if all namespaces fail, the tick aborts with no submissions
- TestE2E_UserBGPStatus_NonDefaultTenant: end-to-end validation that a user in a non-default tenant VRF reaches BGP status Up onchain when the session is established, and Down after the daemon is killed