fix(router): calculateLocalRoutes must honor MinHops > 2#2753
Merged
Conversation
The local route-calc fallback in calculateLocalRoutes only knows how to build 1-hop (direct) and 2-hop routes. When the global route-finder fails to produce a path (transient timeout, stale TPD, ErrNoSuitableTransport), DialRoutes falls back to calculateLocalRoutes — which would silently return a 2-hop route even when the caller demanded min-hops >= 3. Empirically reproduced: cli visor ping mux-bw <peer> --min-hops 3 → hops_count=2 cli visor ping mux-bw <peer> --min-hops 4 → hops_count=2 cli visor ping mux-bw <peer> --min-hops 5 → hops_count=5 (correct) The route-finder service does honor min-hops correctly when queried directly (`cli route find <peer> -n 3 -x 3` returns proper 3-hop paths). The bug is in the local fallback: it returns whatever 2-hop path it can build, regardless of opts. FIX: short-circuit calculateLocalRoutes with an error when dialOpts.MinHops > 2. The router-level retry loop above will then either re-attempt the route-finder query or return the error to the caller — both of which are correct behaviors when the caller explicitly demands a multi-hop path the local cache can't satisfy. Effect: `mux-bw --min-hops N` with N >= 3 either gets a route that actually has >= N hops (when the route-finder service is healthy) or fails cleanly with a useful error — no more silent constraint violations. Build / gofmt / golangci-lint clean. The earlier MinHops > 1 guard (suppressing the direct-1-hop probe) added in skycoin#2749 is unchanged; this PR layers a second guard for the 2-hop case.
6 tasks
0pcom
added a commit
that referenced
this pull request
May 21, 2026
… routing (#2755) Operator's observation: route-finder service randomization is the dominant source of variance in mux-bw measurements (26x throughput swing across intermediates for an identical request shape). Local route calc is deterministic by construction — same input, same output. But until now calculateLocalRoutes only supported 1-hop (direct) and 2-hop routes. Anything deeper fell through to the route-finder service (and its randomness). This PR extends calculateLocalRoutes to find N-hop paths via BFS over the transport-discovery graph, bounded by Config.MaxHops. Combined with --local-route (SetForceLocalRoutes(true)) on the caller side, this gives the operator a deterministic selector for multi-hop runs: cli visor ping mux-bw <peer> --local-route --min-hops 3 → same 3-hop path every call, picked deterministically by remote-PK string sort at each BFS expansion step. Combined with DialOptions.ExcludeIntermediatePKs (Gamma's #2746), the operator can run controlled N=2/4/8 disjoint-mux experiments where each route is reproducibly chosen — the per-intermediate variance becomes selectable rather than random. IMPLEMENTATION - Drop the previous "if MinHops > 2: error out" guard. - Drop the previous 2-hop-only intermediate search loop. - BFS state: { pk, path []Hop }, level = path length. - Seed queue with localTps (1-hop paths to direct neighbors), excluding any in ExcludeIntermediatePKs. - At each level, check if any queued node is dst and the path is >= MinHops. If yes, return that node's path (this is the shortest acceptable path — BFS guarantees it). - Expand each node by walking transportsByEdge[node.pk], skipping already-visited nodes (no cycles). - Deterministic ordering: sort seed and each level's children by remote-PK string before queueing. - Stop when a node hits MaxHops or the queue empties. reverseHops + hopPath helpers added for forward-route flipping and log-line rendering respectively. Existing 1-hop and self- ping fast paths above the BFS are unchanged. EFFECT - mux-bw --local-route --min-hops 5 now actually finds a 5-hop path locally without touching the route-finder service. - Same call repeated returns the same path → reproducible cross-call measurements. - --min-hops 3, 4 work too (was previously erroring out per #2753 since the 2-hop fallback couldn't satisfy them). - Backwards compat for callers that don't set MinHops: minHops defaults to 2 (same as previous behavior); BFS finds the same 2-hop path the loop would have. NOTES - MaxHops is taken from r.conf.MaxHops (router-wide config, defaults to 50 per router.go's maxHops constant — bounded enough to prevent runaway BFS over a large global graph). Future enhancement: per-call MaxHops in DialOptions if the operator wants tighter control. - The previous "2-hop only" comment in the function docstring is updated to reflect N-hop BFS. Build / gofmt / golangci-lint clean. Existing router tests pass (37s wall — full pkg/router test suite).
5 tasks
0pcom
added a commit
that referenced
this pull request
May 21, 2026
… + MinHops (#2757) Beta's #2756 MuxRouteFailure event surfaced the smoking gun on mux-bw --routes N --min-hops 2: one route would establish but its pump loop immediately failed with no ping connection for <pk>#0, call DialPing first while MuxRouteEstablished named route_index = 2 for the same route. The lookup PingRouteRef.Index was 0 even though the pump goroutine had RouteIndex = 2. ROOT CAUSE The rpcgrpc PingConf → visor PingConfig adapter in pkg/visor/init_apps.go's visorPingAdapter forwards RouteIndex correctly for DialPing (line 238) and PingOnce (line 249), but PingOnceWithEcho's adapter (lines 316-322) was missing the field — same for MinHops. Aux-route pumps therefore degraded to a primary-route (Index=0) lookup, finding nothing because DialPing had registered the conn at the matching aux Index. FIX Add `RouteIndex: conf.RouteIndex` and `MinHops: conf.MinHops` to the visorPingAdapter.PingOnceWithEcho conversion. Three-line adapter parity fix. DMSG adapter (DmsgPingOnceWithEcho) intentionally untouched — v.dmsgPing.conns is keyed by PK alone, not by PingRouteRef, so no DMSG path consumes RouteIndex. TEST TestPingAdapter_PingOnceWithEcho_ForwardsRouteIndex pins the adapter contract: with no ping connection registered, the visor's PingOnceWithEcho returns no ping connection for %s#%d, call DialPing first The "%d" portion is conf.RouteIndex post-adapter. The test calls the adapter with RouteIndex in {0, 1, 2, 7} and asserts the error message contains the matching `#<idx>,` — so a regression that drops RouteIndex (or stuffs in MinHops by accident) surfaces in the error string directly. No mock VisorAPI, no fixtures. EMPIRICAL CHAIN This is the third bug in the mux-bw --min-hops measurement chain to surface via the wire-event observability landed in #2746 → #2749 → #2751 → #2752 → #2750 → #2753 → #2754 → #2756. The pattern holds: each event-surface fix lights up the next bug downstream. The operator's "mux > direct" hypothesis test should now finally be measurable end-to-end once #2756 (route_failure event) + this PR auto-deploy.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The local route-calc fallback in `calculateLocalRoutes` only knows how to build 1-hop and 2-hop routes. When the global route-finder fails to produce a path (transient timeout, stale TPD, `ErrNoSuitableTransport`), `DialRoutes` falls back to `calculateLocalRoutes` — which would silently return a 2-hop route even when the caller demanded `min-hops >= 3`.
Empirical repro (against my visor)
`cli route find -n 3 -x 3` and `-n 4 -x 4` both return proper 3-hop and 4-hop paths, confirming the route-finder service itself honors min-hops. The bug is purely in the local fallback.
Fix
```go
// Skip the 2-hop fallback when the caller demands >2 hops.
// calculateLocalRoutes only knows how to build 1-hop and 2-hop
// paths; returning a 2-hop path when min-hops>=3 was asked
// would silently violate the constraint.
if dialOpts != nil && dialOpts.MinHops > 2 {
return nil, nil, fmt.Errorf("local route calc cannot satisfy min_hops=%d (supports up to 2-hop only)",
dialOpts.MinHops)
}
```
The router's retry loop above will then either re-attempt the route-finder query or return the error to the caller — both correct behaviors when the local cache genuinely can't satisfy the constraint.
Test plan