iOS worker: TCP listener accepts SYN probes but refuses full connect() from master #79

@cjchanh

Environment

  • Master: macOS 26.3.1, Apple M1 (8GB), Cake v0.1.0 built from source (commit f2e6b1ef area, Metal feature)
  • Worker: iPad Air 13-inch M3 (iPad15,5), iPadOS 26.2, Cake iOS app built from source (KMP + Rust static lib, aarch64-apple-ios, Metal feature)
  • Network: Same WiFi LAN subnet (10.0.0.x), both devices reachable via ping

Behavior

UDP zero-config discovery works perfectly — the master discovers the iPad worker and assigns layers:

discovered worker 'My Phone' at 10.0.0.246:10128 with 1 GPU(s)
  iPad15,5 — 7.4 GiB (~3.0 TFLOPS)
discovery complete: 1 worker(s) found
master: 3.2 TFLOPS — workers: 3.0 TFLOPS — assigning 12 of 24 layers to workers
  My Phone (7.4 GiB, 3.0 TFLOPS) → 12 layers (475 MiB)
connecting to worker 'My Phone' at 10.0.0.246:10128 ...
Error: can't connect to 10.0.0.246:10128: No route to host (os error 65)

The subsequent TCP connection to the worker always fails with EHOSTUNREACH (os error 65) or ECONNREFUSED (os error 61).

Diagnostic evidence

SYN probe succeeds, full connect() fails:

# nc -z (SYN probe only) — succeeds every time
$ nc -z -w 3 10.0.0.246 10128
Connection to 10.0.0.246 port 10128 [tcp/bmc-perf-sd] succeeded!

# Python full TCP connect() — fails every time
$ python3 -c "
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
s.connect(('10.0.0.246', 10128))
"
# → [Errno 61] Connection refused

This pattern was 100% reproducible across 10+ attempts. nc -z always reports the port as open. Python socket.connect() and Cake's Rust TcpStream::connect() always fail.
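
To make the failure easier to compare across runs, the one-off Python check above can be turned into a small repeatable probe that records the symbolic errno of each attempt. This is only a sketch: `probe` and its parameters are hypothetical names, not part of Cake. Many `nc` builds implement `-z` with an ordinary connect() followed by an immediate close, so the difference between the two tools may lie in timing or socket options rather than handshake depth. That is an assumption worth checking with a packet capture.

```python
import errno
import socket
import time


def probe(host, port, attempts=10, timeout=5.0, delay=0.5):
    """Try a full TCP connect() several times and record the outcome of each attempt."""
    results = []
    for _ in range(attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            results.append("ok")
        except socket.timeout:
            results.append("timeout")
        except OSError as e:
            # errorcode maps the numeric errno to its symbolic name on this platform
            results.append(errno.errorcode.get(e.errno, str(e.errno)))
        finally:
            s.close()
        time.sleep(delay)
    return results


if __name__ == "__main__":
    # Address and port are the ones reported in this issue.
    print(probe("10.0.0.246", 10128))
```

A mix of `EHOSTUNREACH` and `ECONNREFUSED` in the returned list would suggest that the failure mode changes over time, which may help narrow down the cause.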

Ruling out network issues:

  • ping 10.0.0.246 succeeds (56ms avg)
  • route get 10.0.0.246 routes through en0 (WiFi LAN), not Tailscale
  • UDP discovery broadcast on port 10127 works both directions
  • iPad Local Network permission is granted (Settings → app → Local Network = ON)
  • iPad Developer Mode is enabled
  • iPad screen auto-lock is OFF, app is in foreground showing "waiting for master"
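
The routing check can also be repeated at the socket level by pinning the client's source address to the master's WiFi address before connecting. A minimal sketch, assuming a hypothetical helper `connect_from`; the master's LAN address is not stated in this issue, so it has to be supplied by the caller. Pinning the source address usually, but not necessarily, pins the egress interface as well.

```python
import socket


def connect_from(local_ip, host, port, timeout=5.0):
    """Open a TCP connection whose source address is pinned to local_ip.

    Returns (local address, peer address) as seen by the connected socket.
    """
    # source_address=(ip, 0) binds the client socket before connect();
    # port 0 lets the OS choose an ephemeral source port.
    sock = socket.create_connection(
        (host, port), timeout=timeout, source_address=(local_ip, 0)
    )
    try:
        return sock.getsockname()[0], sock.getpeername()[0]
    finally:
        sock.close()
```

Calling `connect_from(<master en0 address>, "10.0.0.246", 10128)` and getting the same error as before would make a Tailscale or other VPN route less likely as the cause.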

Hypothesis

The iOS Cake worker binds a TcpListener on 0.0.0.0:10128, but the iOS sandbox or network stack may be interfering with accept(). The port appears open to SYN probes (half-open scan), but full three-way handshakes are refused. This could be:

  1. iOS sandbox restricting incoming TCP connections after the initial bind
  2. The TcpListener not being polled/accepted in time (async runtime issue)
  3. An iOS network entitlement requirement for accepting incoming connections that isn't in the current entitlements
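
One way to weigh hypothesis 2 is to check what a client sees when a listener exists but never calls accept(). On macOS and Linux the kernel completes the handshake for a listening socket on its own, up to the backlog limit, so a slow accept loop by itself does not normally produce ECONNREFUSED. Whether iOS behaves the same way is an assumption. The sketch below runs on loopback only, and `connect_result` and `demo` are hypothetical names.

```python
import errno
import socket


def connect_result(port, host="127.0.0.1", timeout=2.0):
    """Return 'ok' or the errno name for one full TCP connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
        c.settimeout(timeout)
        try:
            c.connect((host, port))
            return "ok"
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e))


def demo():
    # Listener that binds and listens but never calls accept().
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(8)
    port = srv.getsockname()[1]

    while_listening = connect_result(port)  # handshake completes without accept()
    srv.close()
    after_close = connect_result(port)  # no listener any more: connection is reset
    return while_listening, after_close


if __name__ == "__main__":
    print(demo())
```

If the iPad behaves like the desktop case, an ECONNREFUSED from the worker would point to the listening socket being absent or closed at connect time rather than to a late accept().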

Steps to reproduce

  1. Build cake-mobile for iOS: RUSTC_WRAPPER="" SDKROOT=$(xcrun --sdk iphoneos --show-sdk-path) IPHONEOS_DEPLOYMENT_TARGET=16.0 cargo build --release --target=aarch64-apple-ios -p cake-mobile --features metal
  2. Build KMP framework: cd cake-mobile-app && ./gradlew :shared:linkReleaseFrameworkIosArm64
  3. Deploy to iPad via Xcode
  4. Start worker with any cluster key
  5. On macOS master: cake run <model> --cluster-key <same-key> --discovery-timeout 10
  6. Discovery succeeds, TCP connection fails

Expected

Master connects to worker on TCP 10128, streams weight shards, distributed inference begins.

Actual

Error: can't connect to 10.0.0.246:10128: No route to host (os error 65)
