Skip to content

feat(router): N-hop BFS in calculateLocalRoutes — deterministic local routing#2755

Merged
0pcom merged 1 commit into
skycoin:developfrom
0pcom:feat/local-route-calc-n-hop
May 21, 2026
Merged

feat(router): N-hop BFS in calculateLocalRoutes — deterministic local routing#2755
0pcom merged 1 commit into
skycoin:developfrom
0pcom:feat/local-route-calc-n-hop

Conversation

@0pcom
Copy link
Copy Markdown
Collaborator

@0pcom 0pcom commented May 21, 2026

Summary

Operator's observation: the route-finder service is the dominant source of variance in our mux-bw measurements (26× throughput swing across intermediates for an identical request shape). Local route calc is deterministic by construction — same input, same output. But until now `calculateLocalRoutes` only supported 1-hop (direct) and 2-hop routes; anything deeper fell through to the route-finder service (and its randomness).

This PR extends `calculateLocalRoutes` to find N-hop paths via BFS over the transport-discovery graph, bounded by `Config.MaxHops`. Combined with `--local-route` on the caller, the operator gets a deterministic selector:

```
cli visor ping mux-bw --local-route --min-hops 3
→ same 3-hop path every call, picked deterministically by
remote-PK string sort at each BFS expansion step.
```

Combined with `DialOptions.ExcludeIntermediatePKs` (#2746), the operator can run controlled N=2/4/8 disjoint-mux experiments where each route is reproducibly chosen — the per-intermediate variance becomes selectable rather than random.

Implementation

  • Drop the previous "if MinHops > 2: error out" guard (was fix(router): calculateLocalRoutes must honor MinHops > 2 #2753's plug)
  • Drop the previous 2-hop-only intermediate search loop
  • BFS state: `{ pk, path []Hop }`, level = path length
  • Seed queue with `localTps` (1-hop paths to direct neighbors), excluding any in `ExcludeIntermediatePKs`
  • At each level, check if any queued node is dst with path ≥ MinHops; return shortest-acceptable
  • Expand each node by walking `transportsByEdge[node.pk]`, skipping already-visited (no cycles)
  • Deterministic ordering: sort seed and each level's children by remote-PK string before queueing

`reverseHops` + `hopPath` helpers added for forward-route flipping and log-line rendering.

Backwards compatibility

  • Callers without `DialOptions.MinHops` set get `minHops=2` (same as previous behavior)
  • 1-hop direct fast path above the BFS is unchanged
  • Self-ping 2-hop loop above the BFS is unchanged
  • Only N≥3 path discovery is new

Test plan

  • `go build ./...` clean
  • `golangci-lint run` clean
  • `go test ./pkg/router/...` pass (37s)
  • Live: `mux-bw --local-route --min-hops 5 ` finds a 5-hop path without route-finder service
  • Live: same call repeated returns same path (determinism)
  • Live: `--local-route --routes 4 --min-hops 2` with DisjointMux yields 4 routes with disjoint intermediates

… routing

Operator's observation: route-finder service randomization is the
dominant source of variance in mux-bw measurements (26x throughput
swing across intermediates for an identical request shape). Local
route calc is deterministic by construction — same input, same
output. But until now calculateLocalRoutes only supported 1-hop
(direct) and 2-hop routes. Anything deeper fell through to the
route-finder service (and its randomness).

This PR extends calculateLocalRoutes to find N-hop paths via BFS
over the transport-discovery graph, bounded by Config.MaxHops.
Combined with --local-route (SetForceLocalRoutes(true)) on the
caller side, this gives the operator a deterministic selector
for multi-hop runs:

  cli visor ping mux-bw <peer> --local-route --min-hops 3
    → same 3-hop path every call, picked deterministically by
      remote-PK string sort at each BFS expansion step.

Combined with DialOptions.ExcludeIntermediatePKs (Gamma's skycoin#2746),
the operator can run controlled N=2/4/8 disjoint-mux experiments
where each route is reproducibly chosen — the per-intermediate
variance becomes selectable rather than random.

IMPLEMENTATION

  - Drop the previous "if MinHops > 2: error out" guard.
  - Drop the previous 2-hop-only intermediate search loop.
  - BFS state: { pk, path []Hop }, level = path length.
  - Seed queue with localTps (1-hop paths to direct neighbors),
    excluding any in ExcludeIntermediatePKs.
  - At each level, check if any queued node is dst and the path
    is >= MinHops. If yes, return that node's path (this is the
    shortest acceptable path — BFS guarantees it).
  - Expand each node by walking transportsByEdge[node.pk],
    skipping already-visited nodes (no cycles).
  - Deterministic ordering: sort seed and each level's children
    by remote-PK string before queueing.
  - Stop when a node hits MaxHops or the queue empties.

  reverseHops + hopPath helpers added for forward-route flipping
  and log-line rendering respectively. Existing 1-hop and self-
  ping fast paths above the BFS are unchanged.

EFFECT

  - mux-bw --local-route --min-hops 5 now actually finds a 5-hop
    path locally without touching the route-finder service.
  - Same call repeated returns the same path → reproducible
    cross-call measurements.
  - --min-hops 3, 4 work too (was previously erroring out per
    skycoin#2753 since the 2-hop fallback couldn't satisfy them).
  - Backwards compat for callers that don't set MinHops: minHops
    defaults to 2 (same as previous behavior); BFS finds the
    same 2-hop path the loop would have.

NOTES

  - MaxHops is taken from r.conf.MaxHops (router-wide config,
    defaults to 50 per router.go's maxHops constant — bounded
    enough to prevent runaway BFS over a large global graph).
    Future enhancement: per-call MaxHops in DialOptions if the
    operator wants tighter control.
  - The previous "2-hop only" comment in the function docstring
    is updated to reflect N-hop BFS.

Build / gofmt / golangci-lint clean. Existing router tests pass
(37s wall — full pkg/router test suite).
@0pcom 0pcom merged commit 2db0bfd into skycoin:develop May 21, 2026
4 checks passed
0pcom added a commit that referenced this pull request May 21, 2026
…2758)

#2755 introduced an N-hop BFS in calculateLocalRoutes. Empirical
re-run against Beta and Gamma exposed a bug in the visited-set
semantics:

  Gamma h=2 (--min-hops 2): 2-hop path returned ✓
  Gamma h=3 (--min-hops 3): "BFS found no path" ✗
  Gamma h=4 (--min-hops 4): same ✗
  Gamma h=5 (--min-hops 5): same ✗
  Beta h=2 (--min-hops 2): 4-hop path returned (despite shorter
                            existing — visited-set picked an
                            arbitrary suboptimal one due to
                            sort+shadow interaction)

ROOT CAUSE

The previous BFS used a GLOBAL `visited[pk]` set. When intermediate
P gets marked visited at depth K (because it sits on the path of
length K to dst), the BFS refuses to expand through P at depth K+1
— even when no length-K path to dst exists and a length-K+1 path
through P would be the only option.

This is fine semantics for "shortest path from src to dst" (regular
BFS) but WRONG for "any path from src to dst with length >= MinHops".

FIX

Replace global visited[pk] with two checks:

  1. Per-path PK-membership for loop prevention (`pkInPath`):
     the same intermediate can appear in multiple paths but never
     twice in the same path. Cycles still impossible.

  2. (pk, depth) expansion cache (`expandedAtDepth`): when multiple
     inbound paths converge on the same (node, depth) pair, only
     expand once. Children would be identical anyway. This caps the
     total work at O(|visors| × MaxHops) — same as the old global
     visited check, but parameterized by depth so longer paths
     through known intermediates aren't shadowed.

EFFECT POST-MERGE

  cli visor ping mux-bw <peer> --local-route --min-hops 3
  cli visor ping mux-bw <peer> --local-route --min-hops 4
  cli visor ping mux-bw <peer> --local-route --min-hops 5

now find paths of the requested length when they exist in the local
transport graph, instead of returning "no path" because shorter
paths exist and shadowed the search. Determinism preserved (PK-
string sort at each expansion step is unchanged).

Build / gofmt / golangci-lint clean. Existing router tests pass
(37s — full pkg/router test suite, unchanged from #2755 baseline).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant