
bgpstatus,smartcontract: enumerate netns and submit per-user BGP RTT onchain#3716

Merged
juan-malbeclabs merged 5 commits into main from jo/bgpstatus_rtt
May 19, 2026


@juan-malbeclabs juan-malbeclabs commented May 18, 2026

Summary of Changes

  • bgpstatus discovers VRFs by enumerating /var/run/netns/ instead of synthesizing names from onchain tenant data and a hardcoded base prefix. Fixes multicast user BGP status never being collected on Arista EOS: the default VRF is exposed as /var/run/netns/default, not the agent's current Linux namespace (which is ns-management).
  • bgpstatus reports per-user smoothed BGP TCP RTT alongside every onchain status write. RTT comes from the same INET_DIAG snapshot already used to detect ESTABLISHED sessions (no new collection cost) and is converted to nanoseconds at the boundary to match Link.delay_ns / Link.jitter_ns. A Down submission clears RTT so a stale sample cannot outlive the session.
  • Smartcontract: extend SetUserBGPStatusArgs and the User account with bgp_rtt_ns: u64. Old (status-only) payloads decode with bgp_rtt_ns=0 via BorshDeserializeIncremental; old serialized User accounts decode with bgp_rtt_ns=0 via the existing append-only reader. Deploy order is unconstrained.
  • SDK (Go/Python/TypeScript): deserialize the new field; the Go executor builds the new 10-byte instruction payload. CLI: `doublezero user get/list` surface BGP RTT in ms.
  • Cleanups: drop `--bgp-namespace` wiring from the bgpstatus submitter (the flag is still used by the state collector); remove the empty-string short-circuit in `netns.RunInNamespace` that was added to support the prior (incorrect) default-VRF sentinel.
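The namespace discovery described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the merged code: the function name `listNamespaces` is borrowed from the test list in this PR, while the signature, the directory filter, and the error wrapping are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// listNamespaces returns the names of the entries under dir (for example
// /var/run/netns). The body is an assumption made for illustration; only
// the idea of enumerating the directory comes from the PR description.
func listNamespaces(dir string) ([]string, error) {
	// os.ReadDir returns entries sorted by filename, so the result is
	// deterministic without an extra sort.
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, fmt.Errorf("read netns dir %q: %w", dir, err)
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		// Named network namespaces appear as bind-mounted files, so
		// subdirectories are skipped.
		if e.IsDir() {
			continue
		}
		names = append(names, e.Name())
	}
	return names, nil
}
```

On an Arista EOS device such a listing would be expected to include `default`, which is the entry the previous name-synthesis approach never produced.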

Testing Verification

  • New Rust integration test asserts `bgp_rtt_ns` round-trips through the `SetUserBGPStatus` handler with a 7.5 ms value; new unit tests cover incremental decode of old vs new payloads and the User account append-only default.
  • New Go tests cover RTT plumbing end to end (collector to `submitTask` to `UserBGPStatusUpdate`), the Down-clears-RTT path, and the 10-byte instruction payload layout.
  • E2E `user_bgp_status_test` now asserts `BgpRttNs > 0` on Up and `BgpRttNs == 0` on Down for regular, multicast, and non-default-tenant scenarios. User-list fixtures updated for the new `rtt` column.
  • `make rust-test` clean; `bun test` (TS SDK) 144/144; `uv run pytest` (Python SDK) 121 pass / 24 skip. Broader `go test ./...` shows pre-existing environmental failures in `tools/solana/pkg/tpu-quic` and `client/doublezerod/internal/{latency,liveness,manager}` (UDP buffer sysctl, `CAP_NET_RAW`) that reproduce on clean `main`.
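As a worked example of the unit handling behind these assertions: the kernel reports smoothed TCP RTT (`tcp_info.tcpi_rtt`) in microseconds, so a 7.5 ms sample is 7500 µs, or 7,500,000 ns onchain. The sketch below shows that conversion together with the Down-clears-RTT rule. The type, constant, and function names are hypothetical, and the real status enum may have more variants.

```go
package main

// bgpStatus is a hypothetical two-value status used only for this sketch.
type bgpStatus uint8

const (
	statusDown bgpStatus = iota
	statusUp
)

// rttMicrosToNanos converts a smoothed RTT in microseconds, the unit used
// by tcp_info.tcpi_rtt, to nanoseconds.
func rttMicrosToNanos(rttUs uint32) uint64 {
	return uint64(rttUs) * 1000
}

// rttForSubmission returns the RTT to send with a status write. Any status
// other than Up yields 0 so a stale sample cannot outlive the session.
func rttForSubmission(status bgpStatus, rttUs uint32) uint64 {
	if status != statusUp {
		return 0
	}
	return rttMicrosToNanos(rttUs)
}
```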

@juan-malbeclabs juan-malbeclabs enabled auto-merge (squash) May 18, 2026 17:30

Two related changes that share files in the bgpstatus submitter and
the serviceability program. Tests follow in a separate commit.

- bgpstatus: discover VRFs by enumerating /var/run/netns/ instead of
  synthesizing names from onchain tenant data and a hardcoded base
  prefix. Fixes multicast user BGP status never being collected on
  Arista EOS: the default VRF is exposed as /var/run/netns/default,
  not the agent's current Linux namespace (which is ns-management).
  Drops --bgp-namespace wiring from the submitter (the flag is still
  used by the state collector); adds NetnsDir config.
- bgpstatus: submit smoothed BGP TCP RTT alongside every status
  write. Sourced from the same INET_DIAG snapshot already used to
  detect ESTABLISHED sessions, converted to nanoseconds at the
  boundary to match Link.delay_ns / jitter_ns. A Down submission
  clears RTT so a stale sample cannot outlive the session.
- smartcontract: extend SetUserBGPStatusArgs and the User account
  with bgp_rtt_ns: u64. Old (status-only) payloads decode with
  bgp_rtt_ns=0 via BorshDeserializeIncremental; old serialized User
  accounts decode with bgp_rtt_ns=0 via the existing append-only
  reader.
- sdk (go/python/typescript): deserialize the new field; the Go
  executor builds the new 10-byte instruction payload.
- cli: doublezero user get/list surface BGP RTT in ms.
- netns: remove the empty-string short-circuit in RunInNamespace;
  it was added to support the prior (incorrect) default-VRF sentinel
  and now masks accidental misuse.
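The backward-compatible payload handling can be illustrated with a small encoder and decoder. The byte layout used here is an assumption inferred from the 10-byte figure (1-byte instruction tag, 1-byte status, 8-byte little-endian `bgp_rtt_ns`); the PR text does not spell it out, and the function names and tag value are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Assumed layout, not confirmed by the PR text:
//   byte 0      instruction tag
//   byte 1      status
//   bytes 2..9  bgp_rtt_ns as a little-endian u64
const (
	oldPayloadLen = 2  // status-only payload
	newPayloadLen = 10 // status plus bgp_rtt_ns
)

// encodeSetUserBGPStatus builds the new 10-byte payload.
func encodeSetUserBGPStatus(tag, status uint8, rttNs uint64) []byte {
	buf := make([]byte, newPayloadLen)
	buf[0] = tag
	buf[1] = status
	binary.LittleEndian.PutUint64(buf[2:], rttNs)
	return buf
}

// decodeSetUserBGPStatus accepts both layouts. When the trailing 8 bytes
// are absent, rttNs keeps its zero value, mirroring incremental decoding.
func decodeSetUserBGPStatus(data []byte) (status uint8, rttNs uint64, err error) {
	if len(data) < oldPayloadLen {
		return 0, 0, errors.New("payload too short")
	}
	status = data[1]
	if len(data) >= newPayloadLen {
		rttNs = binary.LittleEndian.Uint64(data[2:newPayloadLen])
	}
	return status, rttNs, nil
}
```

Because the decoder tolerates the missing tail, an old submitter can talk to a new program and a new submitter's extra bytes are only read by a program that knows about them, which is the property that leaves deploy order unconstrained.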

Covers the previous commit across all test layers.

- bgpstatus: tests for listNamespaces, the new BGPPeerState collector
  shape, end-to-end RTT plumbing through tick() and submitTask, and
  Down clearing RTT. Existing tests adapted to the new collector
  signature via a staticEstablishedCollector adapter.
- netns: regression test confirming RunInNamespace("") now errors
  rather than silently no-opping.
- smartcontract: integration test for SetUserBGPStatus now asserts
  bgp_rtt_ns round-trips onto the User account; unit tests cover
  SetUserBGPStatusArgs incremental decode of old vs new payloads and
  the User append-only default for bgp_rtt_ns. Sizing assertions in
  the access pass airdrop test bumped to cover the 8-byte addition.
- sdk fixtures (go/python/typescript): new bgp_rtt_ns field
  round-trips; old-layout backward-compat tests strip the new
  trailing bytes; Go executor builds the 10-byte payload.
- e2e: doublezero user list fixtures gain the rtt column ("-" since
  no real BGP session is up in those scenarios). user_bgp_status_test
  asserts BgpRttNs > 0 on Up and BgpRttNs == 0 on Down for regular,
  multicast, and non-default-tenant scenarios.
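The old-layout compatibility tests mentioned above strip the trailing bytes and expect the field to default to zero. A minimal sketch of an append-only reader with that behaviour follows; the `reader` type and its method are hypothetical and do not reflect the SDK's actual API.

```go
package main

import "encoding/binary"

// reader walks a serialized account from front to back. It is a
// hypothetical helper for this sketch, not an SDK type.
type reader struct {
	data []byte
	off  int
}

// u64OrZero reads a little-endian u64. When fewer than 8 bytes remain, the
// field is treated as absent (the account was written before the field
// existed): the rest of the buffer is consumed and 0 is returned.
func (r *reader) u64OrZero() uint64 {
	if len(r.data)-r.off < 8 {
		r.off = len(r.data)
		return 0
	}
	v := binary.LittleEndian.Uint64(r.data[r.off : r.off+8])
	r.off += 8
	return v
}
```

Reading `bgp_rtt_ns` last with a helper like this is what lets an account serialized before the 8-byte addition decode with `bgp_rtt_ns=0`.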

The CLI renames the bgp_rtt field to a 3-char rtt column in the
user list table. The fixtures still said bgp_rtt with 7-char value
padding, so the byte-exact comparison in ibrl_with_allocated_ip_test
and the ibrl test would fail against the live output. Shorten the
column header to rtt and trim the data cell padding from "-       "
to "-   " to match.
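For illustration, the cell value behind the rtt column could be derived as below: "-" when no RTT has been recorded, otherwise the nanosecond value rendered in milliseconds. The function name, the two-decimal precision, and the "ms" suffix are assumptions; the PR only states that the CLI shows RTT in ms and that the fixtures contain "-".

```go
package main

import "strconv"

// formatRTT renders bgp_rtt_ns for a table cell. The exact output format
// is an assumption made for this sketch.
func formatRTT(rttNs uint64) string {
	if rttNs == 0 {
		// No established session, or the RTT was cleared by a Down write.
		return "-"
	}
	ms := float64(rttNs) / 1e6
	return strconv.FormatFloat(ms, 'f', 2, 64) + "ms"
}
```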

Extend RFC 19 to cover the smoothed BGP TCP RTT field added in the
same PR: new bgp_rtt_ns: u64 on the User account (same unit as
Link.delay_ns), corresponding 8 bytes in the args struct (decoded via
BorshDeserializeIncremental so deploy order is unconstrained), Down
clears the value, and the telemetry collector sources RTT from the
same INET_DIAG snapshot already used to detect ESTABLISHED sessions
so there is no new collection cost. Status flips from Approved to
Implemented now that the implementation has shipped on this branch.
@juan-malbeclabs juan-malbeclabs disabled auto-merge May 19, 2026 13:39
@juan-malbeclabs juan-malbeclabs enabled auto-merge (squash) May 19, 2026 13:40
@juan-malbeclabs juan-malbeclabs merged commit 6d24b06 into main May 19, 2026
33 checks passed
@juan-malbeclabs juan-malbeclabs deleted the jo/bgpstatus_rtt branch May 19, 2026 14:09