Skip to content

fix(router): calculateLocalRoutes must honor MinHops > 2#2753

Merged
0pcom merged 1 commit into
skycoin:developfrom
0pcom:fix/local-route-calc-honor-minhops
May 21, 2026
Merged

fix(router): calculateLocalRoutes must honor MinHops > 2#2753
0pcom merged 1 commit into
skycoin:developfrom
0pcom:fix/local-route-calc-honor-minhops

Conversation

@0pcom
Copy link
Copy Markdown
Collaborator

@0pcom 0pcom commented May 20, 2026

Summary

The local route-calc fallback in `calculateLocalRoutes` only knows how to build 1-hop and 2-hop routes. When the global route-finder fails to produce a path (transient timeout, stale TPD, `ErrNoSuitableTransport`), `DialRoutes` falls back to `calculateLocalRoutes` — which would silently return a 2-hop route even when the caller demanded `min-hops >= 3`.

Empirical repro (against my visor)

call requested actual hops_count
`mux-bw --min-hops 3` ≥3 2 (wrong)
`mux-bw --min-hops 4` ≥4 2 (wrong)
`mux-bw --min-hops 5` ≥5 5 (correct — RF query succeeded, no fallback)

`cli route find -n 3 -x 3` and `-n 4 -x 4` both return proper 3-hop and 4-hop paths, confirming the route-finder service itself honors min-hops. The bug is purely in the local fallback.

Fix

```go
// Skip the 2-hop fallback when the caller demands >2 hops.
// calculateLocalRoutes only knows how to build 1-hop and 2-hop
// paths; returning a 2-hop path when min-hops>=3 was asked
// would silently violate the constraint.
if dialOpts != nil && dialOpts.MinHops > 2 {
return nil, nil, fmt.Errorf("local route calc cannot satisfy min_hops=%d (supports up to 2-hop only)",
dialOpts.MinHops)
}
```

The router's retry loop above will then either re-attempt the route-finder query or return the error to the caller — both correct behaviors when the local cache genuinely can't satisfy the constraint.

Test plan

  • `go build ./...` clean
  • `golangci-lint run` clean
  • Live: `mux-bw --min-hops 3 ` reports hops_count >= 3 OR fails with a clean error message

The local route-calc fallback in calculateLocalRoutes only knows
how to build 1-hop (direct) and 2-hop routes. When the global
route-finder fails to produce a path (transient timeout, stale
TPD, ErrNoSuitableTransport), DialRoutes falls back to
calculateLocalRoutes — which would silently return a 2-hop route
even when the caller demanded min-hops >= 3.

Empirically reproduced:
  cli visor ping mux-bw <peer> --min-hops 3  → hops_count=2
  cli visor ping mux-bw <peer> --min-hops 4  → hops_count=2
  cli visor ping mux-bw <peer> --min-hops 5  → hops_count=5 (correct)

The route-finder service does honor min-hops correctly when
queried directly (`cli route find <peer> -n 3 -x 3` returns
proper 3-hop paths). The bug is in the local fallback: it
returns whatever 2-hop path it can build, regardless of opts.

FIX: short-circuit calculateLocalRoutes with an error when
dialOpts.MinHops > 2. The router-level retry loop above will then
either re-attempt the route-finder query or return the error to
the caller — both of which are correct behaviors when the caller
explicitly demands a multi-hop path the local cache can't
satisfy.

Effect: `mux-bw --min-hops N` with N >= 3 either gets a route
that actually has >= N hops (when the route-finder service is
healthy) or fails cleanly with a useful error — no more silent
constraint violations.

Build / gofmt / golangci-lint clean. The earlier MinHops > 1
guard (suppressing the direct-1-hop probe) added in skycoin#2749 is
unchanged; this PR layers a second guard for the 2-hop case.
@0pcom 0pcom merged commit e89333a into skycoin:develop May 21, 2026
3 of 4 checks passed
0pcom added a commit that referenced this pull request May 21, 2026
… routing (#2755)

Operator's observation: route-finder service randomization is the
dominant source of variance in mux-bw measurements (26x throughput
swing across intermediates for an identical request shape). Local
route calc is deterministic by construction — same input, same
output. But until now calculateLocalRoutes only supported 1-hop
(direct) and 2-hop routes. Anything deeper fell through to the
route-finder service (and its randomness).

This PR extends calculateLocalRoutes to find N-hop paths via BFS
over the transport-discovery graph, bounded by Config.MaxHops.
Combined with --local-route (SetForceLocalRoutes(true)) on the
caller side, this gives the operator a deterministic selector
for multi-hop runs:

  cli visor ping mux-bw <peer> --local-route --min-hops 3
    → same 3-hop path every call, picked deterministically by
      remote-PK string sort at each BFS expansion step.

Combined with DialOptions.ExcludeIntermediatePKs (Gamma's #2746),
the operator can run controlled N=2/4/8 disjoint-mux experiments
where each route is reproducibly chosen — the per-intermediate
variance becomes selectable rather than random.

IMPLEMENTATION

  - Drop the previous "if MinHops > 2: error out" guard.
  - Drop the previous 2-hop-only intermediate search loop.
  - BFS state: { pk, path []Hop }, level = path length.
  - Seed queue with localTps (1-hop paths to direct neighbors),
    excluding any in ExcludeIntermediatePKs.
  - At each level, check if any queued node is dst and the path
    is >= MinHops. If yes, return that node's path (this is the
    shortest acceptable path — BFS guarantees it).
  - Expand each node by walking transportsByEdge[node.pk],
    skipping already-visited nodes (no cycles).
  - Deterministic ordering: sort seed and each level's children
    by remote-PK string before queueing.
  - Stop when a node hits MaxHops or the queue empties.

  reverseHops + hopPath helpers added for forward-route flipping
  and log-line rendering respectively. Existing 1-hop and self-
  ping fast paths above the BFS are unchanged.

EFFECT

  - mux-bw --local-route --min-hops 5 now actually finds a 5-hop
    path locally without touching the route-finder service.
  - Same call repeated returns the same path → reproducible
    cross-call measurements.
  - --min-hops 3, 4 work too (was previously erroring out per
    #2753 since the 2-hop fallback couldn't satisfy them).
  - Backwards compat for callers that don't set MinHops: minHops
    defaults to 2 (same as previous behavior); BFS finds the
    same 2-hop path the loop would have.

NOTES

  - MaxHops is taken from r.conf.MaxHops (router-wide config,
    defaults to 50 per router.go's maxHops constant — bounded
    enough to prevent runaway BFS over a large global graph).
    Future enhancement: per-call MaxHops in DialOptions if the
    operator wants tighter control.
  - The previous "2-hop only" comment in the function docstring
    is updated to reflect N-hop BFS.

Build / gofmt / golangci-lint clean. Existing router tests pass
(37s wall — full pkg/router test suite).
0pcom added a commit that referenced this pull request May 21, 2026
… + MinHops (#2757)

Beta's #2756 MuxRouteFailure event surfaced the smoking gun on
mux-bw --routes N --min-hops 2: one route would establish but its
pump loop immediately failed with

  no ping connection for <pk>#0, call DialPing first

while MuxRouteEstablished named route_index = 2 for the same
route. The lookup PingRouteRef.Index was 0 even though the
pump goroutine had RouteIndex = 2.

ROOT CAUSE
The rpcgrpc PingConf → visor PingConfig adapter in
pkg/visor/init_apps.go's visorPingAdapter forwards RouteIndex
correctly for DialPing (line 238) and PingOnce (line 249), but
PingOnceWithEcho's adapter (lines 316-322) was missing the
field — same for MinHops. Aux-route pumps therefore degraded to
a primary-route (Index=0) lookup, finding nothing because
DialPing had registered the conn at the matching aux Index.

FIX
Add `RouteIndex: conf.RouteIndex` and `MinHops: conf.MinHops` to
the visorPingAdapter.PingOnceWithEcho conversion. Three-line
adapter parity fix.

DMSG adapter (DmsgPingOnceWithEcho) intentionally untouched —
v.dmsgPing.conns is keyed by PK alone, not by PingRouteRef, so
no DMSG path consumes RouteIndex.

TEST
TestPingAdapter_PingOnceWithEcho_ForwardsRouteIndex pins the
adapter contract: with no ping connection registered, the visor's
PingOnceWithEcho returns

  no ping connection for %s#%d, call DialPing first

The "%d" portion is conf.RouteIndex post-adapter. The test calls
the adapter with RouteIndex in {0, 1, 2, 7} and asserts the error
message contains the matching `#<idx>,` — so a regression that
drops RouteIndex (or stuffs in MinHops by accident) surfaces in
the error string directly. No mock VisorAPI, no fixtures.

EMPIRICAL CHAIN
This is the third bug in the mux-bw --min-hops measurement chain
to surface via the wire-event observability landed in #2746#2749#2751#2752#2750#2753#2754#2756. The pattern
holds: each event-surface fix lights up the next bug downstream.
The operator's "mux > direct" hypothesis test should now finally
be measurable end-to-end once #2756 (route_failure event) +
this PR auto-deploy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant