sentinel: set tunnel endpoint on multicast publisher create #3583

Merged
snormore merged 8 commits into main from snor/sentinel-tunnel-endpoint on Apr 24, 2026

@snormore
Contributor

Summary of Changes

  • Sentinel now picks a concrete tunnel_endpoint when creating a multicast publisher instead of sending Ipv4Addr::UNSPECIFIED. It prefers a user_tunnel_endpoint interface IP on the user's device and falls back to the device's public_ip, excluding any IP already used by another user at the same client_ip — mirroring the CLI behavior at client/doublezero/src/command/connect.rs:310.
  • Adds a pure-function tunnel_endpoint module (select_tunnel_endpoint, in_use_tunnel_endpoints, select_tunnel_endpoint_for_user) with unit tests.
  • Extends MulticastDzLedgerClient with fetch_all_device_endpoints so poll_cycle can resolve endpoints for each candidate; the RPC impl derives them from activated Device accounts.
  • DzUser now carries tunnel_endpoint and DzDeviceInfo carries public_ip + user_tunnel_endpoints; build_create_multicast_publisher_instructions takes an explicit tunnel_endpoint argument.
  • Threads the same selection into the admin CLI's create-validator-multicast-publishers command.
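
The selection order described above can be sketched roughly as follows. This is a minimal illustration, not the code from crates/sentinel/src/tunnel_endpoint.rs: the function name matches the PR, but the parameter list and return type are assumptions made for the example.

```rust
use std::net::Ipv4Addr;

/// Sketch of the selection rule: try each user_tunnel_endpoint (UTE) IP in
/// order, then the device's public_ip, skipping anything in `exclude`.
/// Hypothetical signature; the real function may take different arguments.
fn select_tunnel_endpoint(
    user_tunnel_endpoints: &[Ipv4Addr],
    public_ip: Ipv4Addr,
    exclude: &[Ipv4Addr],
) -> Ipv4Addr {
    user_tunnel_endpoints
        .iter()
        .copied()
        // public_ip is only considered after every UTE IP.
        .chain(std::iter::once(public_ip))
        // An unset address is never a usable endpoint.
        .find(|ip| !ip.is_unspecified() && !exclude.contains(ip))
        // Nothing usable: signal that with UNSPECIFIED.
        .unwrap_or(Ipv4Addr::UNSPECIFIED)
}
```

With two UTE IPs and an exclude list containing the first, this sketch returns the second, which is the case the new multicast_publisher.rs test exercises.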

Diff Breakdown

| Category | Files | Lines (+/-) | Net |
|---|---|---|---|
| Core logic | 3 | +393 / -15 | +378 |
| Scaffolding | 5 | +32 / -1 | +31 |
| Total | 8 | +425 / -16 | +409 |

Overwhelmingly additive: new selection module + wiring, plus small field additions to existing data types.

Key files
  • crates/sentinel/src/tunnel_endpoint.rs — new module: select_tunnel_endpoint, in_use_tunnel_endpoints, select_tunnel_endpoint_for_user, and unit tests
  • crates/sentinel/src/multicast_publisher.rs — adds fetch_all_device_endpoints to the trait, implements it in RpcMulticastDzLedgerClient, wires selection into poll_cycle, updates mocks and adds a test asserting the multicast publisher skips an IBRL-occupied UTE
  • controlplane/doublezero-admin/src/cli/sentinel.rs — computes the exclude list from all_users for each candidate and passes the selected endpoint to build_create_multicast_publisher_instructions
  • crates/sentinel/src/dz_ledger_reader.rs — reads user.tunnel_endpoint into DzUser; populates public_ip + user_tunnel_endpoints on DzDeviceInfo from device interfaces
  • crates/sentinel/src/dz_ledger_writer.rs — build_create_multicast_publisher_instructions takes tunnel_endpoint: Ipv4Addr instead of hardcoding UNSPECIFIED

Testing Verification

  • Added tunnel_endpoint_is_selected_excluding_ibrl_endpoint in multicast_publisher.rs: device exposes two UTE IPs, the IBRL user occupies the first, and the asserted create_multicast_publisher call receives the second.
  • Added unit tests covering select_tunnel_endpoint across all branches (UTE preferred, UTE excluded, public_ip fallback, everything excluded, no endpoints at all) and select_tunnel_endpoint_for_user (exclude-list derivation, unknown device).
  • cargo test -p doublezero-sentinel -p doublezero-admin — 56 + 8 tests pass.

Mirror the CLI's behavior at client/doublezero/src/command/connect.rs:310:
pick a tunnel endpoint on the user's device (public_ip or a
user_tunnel_endpoint interface IP) that is not already in use by another
user at the same client_ip, instead of sending UNSPECIFIED and letting
the activator choose.
@snormore snormore force-pushed the snor/sentinel-tunnel-endpoint branch from 9184543 to 7f8495a April 24, 2026 17:29
@snormore snormore marked this pull request as ready for review April 24, 2026 17:34
Comment thread crates/sentinel/src/tunnel_endpoint.rs Outdated
Users created before the tunnel_endpoint field was populated onchain
store UNSPECIFIED, but the activator implicitly routes their tunnel
through the device's public_ip. The exclude list built for new
multicast publisher creates was dropping these users, so the sentinel
could select public_ip and collide with an existing legacy tunnel.

Resolve UNSPECIFIED users to their device's public_ip when building
the exclude list (mirrors client/doublezero/src/command/connect.rs:939).
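
A rough sketch of the exclude-list fix described in this commit message. The `User` struct, the `u32` device key, and the `device_public_ips` map are stand-ins invented for the example (the real code works with onchain account types), so treat this as an illustration of the rule rather than the actual in_use_tunnel_endpoints implementation.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical, simplified view of a user; field names follow the PR text.
struct User {
    client_ip: Ipv4Addr,
    device: u32, // stand-in for the device's account key
    tunnel_endpoint: Ipv4Addr,
}

/// Collect the endpoints already used by users at `client_ip`.
/// Legacy users that store UNSPECIFIED are resolved to their device's
/// public_ip, since that is where their tunnel is implicitly routed.
fn in_use_tunnel_endpoints(
    users: &[User],
    client_ip: Ipv4Addr,
    device_public_ips: &HashMap<u32, Ipv4Addr>,
) -> Vec<Ipv4Addr> {
    users
        .iter()
        .filter(|u| u.client_ip == client_ip)
        .filter_map(|u| {
            if u.tunnel_endpoint.is_unspecified() {
                // Legacy user: fall back to the device's public_ip, if known.
                device_public_ips.get(&u.device).copied()
            } else {
                Some(u.tunnel_endpoint)
            }
        })
        .collect()
}
```

In this sketch a legacy user on an unknown device contributes nothing to the list; whether the real code handles that case the same way is not stated in the PR.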
@snormore snormore enabled auto-merge (squash) April 24, 2026 17:58
@snormore snormore disabled auto-merge April 24, 2026 18:00
@snormore snormore enabled auto-merge (squash) April 24, 2026 18:03
@snormore snormore disabled auto-merge April 24, 2026 18:03
…ilable

When all of a device's tunnel endpoints (public_ip + UTEs) are already
in use by other users at the candidate's client_ip, selection returns
UNSPECIFIED. Previously the sentinel forwarded that to
create_multicast_publisher and the activator rejected the user, burning
a create tx for a guaranteed rejection. Skip the candidate instead,
emitting an error log and the
doublezero_sentinel_multicast_pub_no_endpoint metric; the next poll
cycle retries if state changes.

The admin CLI's create command does the same (writes to stderr).
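
The skip behavior could look something like the sketch below. The `Action` enum, the `plan_candidate` helper, and the plain counter are hypothetical simplifications: the real sentinel records the doublezero_sentinel_multicast_pub_no_endpoint metric through its own metrics plumbing, which is not shown in this PR description.

```rust
use std::net::Ipv4Addr;

/// Hypothetical outcome for one candidate in a poll cycle.
#[derive(Debug, PartialEq)]
enum Action {
    Create(Ipv4Addr),
    Skip,
}

/// Decide whether to submit a create for a candidate, given the endpoint
/// that selection returned. `no_endpoint_count` stands in for the metric.
fn plan_candidate(selected: Ipv4Addr, no_endpoint_count: &mut u64) -> Action {
    if selected.is_unspecified() {
        // No free endpoint: log and count it rather than sending a tx
        // that the activator would reject.
        eprintln!("no tunnel endpoint available; skipping candidate");
        *no_endpoint_count += 1;
        Action::Skip
    } else {
        Action::Create(selected)
    }
}
```

Because nothing is written onchain for a skipped candidate, it stays eligible and is re-evaluated on the next poll cycle.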
@snormore snormore enabled auto-merge (squash) April 24, 2026 18:09
@snormore snormore merged commit 0f1f23d into main Apr 24, 2026
36 checks passed
@snormore snormore deleted the snor/sentinel-tunnel-endpoint branch April 24, 2026 18:22