controller: hoist unknown BGP peer cleanup before main `router bgp` block #3627
Merged
ben-malbeclabs merged 2 commits into `main` on Apr 30, 2026
nikw9944 approved these changes on Apr 30, 2026.
EOS silently exits address-family (AF) context when `no neighbor X` is issued for a peer that doesn't exist in that AF. Any AF-scoped command emitted after the cleanup loop then runs at the wrong level. RFC-18's `next-hop resolution ribs` line was the first to visibly hit this, failing with "Invalid input" on physical EOS.

Move the `UnknownBgpPeers` `no neighbor X` emission into a per-peer block that re-enters `router bgp 65342` before the main config, and remove the inline cleanup loops from `address-family ipv4`, `address-family vpn-ipv4`, and the per-VRF block. Each peer's removal starts from a clean `router bgp 65342` context, so any silent context exit inside the cleanup is bounded to that one iteration.
Summary of Changes
- Moves the `UnknownBgpPeers` `no neighbor X` emission into a per-peer block that re-enters `router bgp 65342` before the main config, removing the inline cleanup loops from `address-family ipv4`, `address-family vpn-ipv4`, and the per-VRF block.
- Gives each unknown peer its own `router bgp 65342` re-entry and explicitly emits `no neighbor X` at top level and in each AF/VRF, so cleanup is comprehensive and isolated per peer.
- Updates the expected-output fixtures: `unknown.peer.removal.tmpl`, `e2e.peer.removal.tmpl`, `e2e.last.user.tmpl`.

Fundamental Bug Being Fixed
The previous template emitted `no neighbor X` per unknown peer inside `address-family ipv4`, `address-family vpn-ipv4`, and each `vrf vrfN` block. EOS silently exits AF/VRF context when `no neighbor X` is issued for a peer that doesn't exist there. With a list of unknown peers, this means that once a peer absent from that AF triggers the exit, every later `no neighbor X` in the same loop runs at `router bgp` top level instead of inside the AF.

So the cleanup was order-dependent and effectively unpredictable: depending on which peer happened to iterate first, you'd get either the intended per-AF/VRF cleanup for the whole list or a partial mix where only the first peer got AF treatment and the rest got incidental top-level deletion. The bug rarely produced observable damage because top-level deletion also means "remove this peer," and `UnknownBgpPeers` are by definition peers that should be fully removed. Still, the cleanup behavior was unreliable in principle.

RFC-18 made the bug visible by adding `next-hop resolution ribs ...` (an AF-only command) immediately after the `UnknownBgpPeers` loop in `address-family vpn-ipv4`. When the loop's first peer triggered the context exit, the new command landed at `router bgp` top level and EOS rejected it as `Invalid input`. That was the surface symptom; the real defect was structural and pre-existing.

The fix: re-entering `router bgp 65342` for each unknown peer means context corruption inside one peer's cleanup is bounded to that single iteration.

Diff Breakdown
Single-file template change with three matching fixture refreshes.
Key files:
- `controlplane/controller/internal/controller/templates/tunnel.tmpl`: adds the per-peer cleanup block (top-level + per-AF + per-VRF `no neighbor`) and removes the three inline `UnknownBgpPeers` ranges.
- `controlplane/controller/internal/controller/fixtures/e2e.last.user.tmpl`: updated expected output for two unknown peers.
- `controlplane/controller/internal/controller/fixtures/e2e.peer.removal.tmpl`: updated expected output for two unknown peers.
- `controlplane/controller/internal/controller/fixtures/unknown.peer.removal.tmpl`: updated expected output for one unknown peer.

Testing Verification
- Reproduction: `no neighbor X` for a non-existent peer inside `address-family vpn-ipv4` followed by `next-hop resolution ribs tunnel-rib system-tunnel-rib` is rejected as `Invalid input`. A control test (the same command without the preceding `no neighbor`) is accepted normally, which shows the rejection is caused by the silent context exit, not by the command itself.
- Fix validation: running the new cleanup against a non-existent peer (`169.254.1.7`, an exact-match canary one digit off from the real `169.254.1.6`) produces zero diff. A sentinel command emitted after the cleanup correctly lands at `router bgp 65342` top-level context. All real neighbors are untouched. Session aborted; device state restored.
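The context-exit behavior and the lab results above can be illustrated with a toy model. This sketch is not EOS's CLI parser: the `session` type, its three modes, the `run` helper, and its rules are hypothetical simplifications that encode only what this PR reports (an unknown-peer `no neighbor X` inside an AF/VRF silently drops back to `router bgp` level, and `next-hop resolution ribs` is accepted only inside an AF). Feeding it the old inline layout, the control, and the hoisted layout mirrors the reproduction and the fix validation.

```go
package main

import (
	"fmt"
	"strings"
)

// session is a HYPOTHETICAL toy model of an EOS config session, reduced to the
// behavior this PR reports. It is NOT EOS's parser; the modes and rules are assumptions.
type session struct {
	mode  string          // "global", "router-bgp", or "sub" (inside an address-family or vrf)
	known map[string]bool // peers configured on the device (simplified: same set in every context)
}

// apply processes one config line, returning an error if the model rejects it.
func (s *session) apply(line string) error {
	cmd := strings.TrimSpace(line)
	switch {
	case strings.HasPrefix(cmd, "router bgp "):
		s.mode = "router-bgp"
	case strings.HasPrefix(cmd, "address-family "), strings.HasPrefix(cmd, "vrf "):
		if s.mode == "global" {
			return fmt.Errorf("%q: Invalid input", cmd)
		}
		s.mode = "sub"
	case strings.HasPrefix(cmd, "no neighbor "):
		peer := strings.TrimPrefix(cmd, "no neighbor ")
		if s.mode == "sub" && !s.known[peer] {
			// The silent context exit: no error, but the session drops to router bgp level.
			s.mode = "router-bgp"
		}
	case strings.HasPrefix(cmd, "next-hop resolution ribs"):
		if s.mode != "sub" {
			// Modeled as AF-only: rejected anywhere else.
			return fmt.Errorf("%q: Invalid input", cmd)
		}
	}
	return nil
}

// run applies lines in order, stops at the first rejected line, and reports
// the mode the session ended in.
func run(configured, lines []string) (string, error) {
	s := &session{mode: "global", known: map[string]bool{}}
	for _, p := range configured {
		s.known[p] = true
	}
	for _, l := range lines {
		if err := s.apply(l); err != nil {
			return s.mode, err
		}
	}
	return s.mode, nil
}

func main() {
	configured := []string{"169.254.1.6"}      // the real peer
	cleanupLine := "no neighbor 169.254.1.7" // non-existent canary peer, as in the lab test
	nh := "next-hop resolution ribs tunnel-rib system-tunnel-rib"

	layouts := []struct {
		name  string
		lines []string
	}{
		// Old template: inline cleanup, then an AF-only command in the same AF (rejected).
		{"inline", []string{"router bgp 65342", "address-family vpn-ipv4", cleanupLine, nh}},
		// Control: the same AF-only command without the preceding cleanup (accepted).
		{"control", []string{"router bgp 65342", "address-family vpn-ipv4", nh}},
		// New template: isolated cleanup block, then the main config re-enters router bgp (accepted).
		{"hoisted", []string{
			"router bgp 65342", cleanupLine, "address-family vpn-ipv4", cleanupLine,
			"router bgp 65342", "address-family vpn-ipv4", nh,
		}},
	}
	for _, l := range layouts {
		mode, err := run(configured, l.lines)
		fmt.Printf("%-8s final mode=%-10s err=%v\n", l.name, mode, err)
	}
}
```

In this model, the hoisted layout succeeds for the same reason given in the PR: the silent exit still happens inside the cleanup block, but the following `router bgp 65342` resets the context before any AF-scoped command runs.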