Skip to content

fix(visor): write deadlines on PingOnceWithEcho + probe RouteIndex passthrough#2762

Merged
0pcom merged 1 commit into
skycoin:developfrom
0pcom:fix/mux-bw-pump-stall-and-probe-route-index
May 21, 2026
Merged

fix(visor): write deadlines on PingOnceWithEcho + probe RouteIndex passthrough#2762
0pcom merged 1 commit into
skycoin:developfrom
0pcom:fix/mux-bw-pump-stall-and-probe-route-index

Conversation

@0pcom
Copy link
Copy Markdown
Collaborator

@0pcom 0pcom commented May 21, 2026

Summary

Two related fixes from the post-#2761 measurement work.

1. Silent pump stall — `pkg/visor/api_ping.go` (#127)

`PingOnceWithEcho` set read deadlines (30s) but NO write deadlines. If the underlying TCP send buffer filled (route under load, slow intermediate, etc.) `conn.Write` would block indefinitely. Empirical symptom: mux-bw runs with route_established success + 0 bytes pumped + 0 MuxRouteFailure events emitted, because the pump goroutine was stuck inside Write rather than returning with an error.

Phase 6 sweep against Gamma post-#2761: 10 of 12 cells showed exactly this — idle baseline n=40-46 probes, loaded n=0-2, sent=recv=0B for the full pump duration.

Fix: `SetWriteDeadline` mirrors the `SetReadDeadline` calls already in place (30s). On deadline, Write returns `net.ErrDeadlineExceeded` and the pump's existing error-return path fires a `MuxRouteFailure` event (#2756) — making the stall observable instead of silent.

2. Probe hardcoded RouteIndex 0 — Gamma's find at 20:20Z (#130)

`muxBwProbeLoop`'s `probeConf` didn't set `RouteIndex`, so every probe used `PingRouteRef{PK, Index: 0}`. When N>1 and the first established route was at a non-zero index (Gamma's Phase D: only R2 of N=8 came up), every probe failed with "no ping connection". Same class as #2757's adapter bug but in the probe path.

Fix: `muxBwProbeLoop` takes a `routeIndex` parameter. Both callers (idle baseline + loaded probe) pick the first established route and pass its index. The idle phase needs the same selection logic as the loaded phase — previously it ran blindly, also hitting RouteIndex 0.

Test plan

  • `go build ./...` clean
  • `golangci-lint run` clean
  • Live: `mux-bw --routes 1 --probe-rtt --duration 30s` against a peer where TCP buffers fill — pump terminates with MuxRouteFailure within 30s instead of hanging 30s with 0 bytes
  • Live: `mux-bw --routes 8` where only a non-zero index establishes — probes work against that index instead of all failing

…ssthrough

Two related fixes that emerged from the post-skycoin#2761 measurement work:

1. SILENT PUMP STALL (skycoin#127) — pkg/visor/api_ping.go

   PingOnceWithEcho set read deadlines (30s) but NO write deadlines.
   If the underlying TCP send buffer filled (route under load, slow
   intermediate, etc.) conn.Write would block indefinitely. Empirical
   symptom: mux-bw runs with route_established success + 0 bytes
   pumped + 0 MuxRouteFailure events emitted, because the pump
   goroutine was stuck inside Write rather than returning with an
   error.

   In my Phase 6 sweep against Gamma post-skycoin#2761: 10 of 12 cells
   showed exactly this pattern (idle baseline n=40-46 probes,
   loaded n=0-2, sent=recv=0B for the full pump duration).

   Fix: SetWriteDeadline mirrors the SetReadDeadline calls already
   in place (30s). On deadline, Write returns net.ErrDeadlineExceeded
   and the pump's existing error-return path fires a MuxRouteFailure
   event (skycoin#2756) — making the stall observable instead of silent.

2. PROBE HARDCODED RouteIndex 0 (skycoin#130) — Gamma's find at 20:20Z

   muxBwProbeLoop's probeConf didn't set RouteIndex, so every probe
   used PingRouteRef{PK, Index: 0}. When N>1 and the first
   established route was at a non-zero index (Gamma's Phase D:
   only R2 of N=8 came up), every probe failed with "no ping
   connection". Same class as skycoin#2757's adapter bug but in the probe
   path.

   Fix: muxBwProbeLoop takes a routeIndex parameter. Both callers
   (idle baseline + loaded probe) pick the first established route
   and pass its index. The idle phase needs the same selection
   logic as the loaded phase — previously it just ran the loop
   blindly, which would also hit RouteIndex 0 even when no
   route 0 had been established.

For skycoin#127, the deferred SetReadDeadline reset paths gain the
matching SetWriteDeadline reset so the conn returns to its
deadline-free state on all return paths.

For skycoin#130, the loaded-phase probe selection logic at line 257-265
is preserved verbatim; the only change is passing probeTarget.index
into the goroutine. The idle-phase selection gets new "find first
established route" logic that mirrors the loaded-phase pattern.

Build / gofmt / golangci-lint clean.
@0pcom 0pcom merged commit a024a24 into skycoin:develop May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant