feat(discovery): mDNS-SD primary-IP hints for service-order-aware dialing #423

Closed
jondkinney wants to merge 52 commits into feschber:main from jondkinney:split/04-mdns

Conversation

@jondkinney
Contributor

@jondkinney jondkinney commented May 6, 2026

Review-only focused diff (just this PR's commits, vs. split/03-network): jondkinney/lan-mouse@split/03-network...split/04-mdns

Summary

mDNS-SD service-order discovery: even with a multi-homed listener, the dialer still has to choose which of the peer's IPs to dial first, and plain hostname resolution returns every interface's IP without ranking. connect_any's parallel race picks whichever DTLS handshake completes first, which roughly tracks RTT but is not always what the user wanted. The classic symptom: Wi-Fi wins the race even when the user has Ethernet ranked higher in macOS's service order, leading to a stuttery session over Wi-Fi while a healthy wired path sits idle.

Each lan-mouse instance now publishes a _lan-mouse._udp.local. Bonjour service whose TXT record carries primary=<ipv4>, where <ipv4> is the IP of the interface that owns the default route. On macOS that reflects service order, on Linux the lowest-metric default route, and on Windows whatever GetBestRoute2 selects. The dialer continuously browses the same service type and caches peer_hostname → primary_ipv4 in an Rc<RefCell<HashMap>> shared with LanMouseConnection.
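As a rough illustration of the hint cache described above, the sketch below parses a `primary=<ipv4>` TXT entry and stores it per hostname. The helper names `parse_primary` and `record_hint` and the `PrimaryCache` alias are hypothetical, and the TXT record is modelled as plain `key=value` strings rather than the mdns-sd crate's own types.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::net::Ipv4Addr;
use std::rc::Rc;

/// Shared hint cache: peer hostname -> primary IPv4 (shape taken from the PR text;
/// the alias name is an assumption).
type PrimaryCache = Rc<RefCell<HashMap<String, Ipv4Addr>>>;

/// Hypothetical helper: pull the `primary=<ipv4>` value out of TXT key/value strings.
/// Returns None when the key is missing or the value is not a valid IPv4 address.
fn parse_primary(txt: &[&str]) -> Option<Ipv4Addr> {
    txt.iter()
        .copied()
        .find_map(|kv| kv.strip_prefix("primary="))
        .and_then(|v| v.parse().ok())
}

/// Hypothetical helper: record a hint only when the TXT record carries a usable address.
fn record_hint(cache: &PrimaryCache, hostname: &str, txt: &[&str]) {
    if let Some(ip) = parse_primary(txt) {
        cache.borrow_mut().insert(hostname.to_owned(), ip);
    }
}
```

A peer that announces no `primary` key (or a malformed one) simply leaves the cache untouched, which matches the `preferred = None` fallback described in this PR.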

connect_any is extended with a Happy Eyeballs-style head start: if a preferred address is known, dial it alone for 200ms before the rest of the candidate list joins the race. A healthy preferred path virtually always wins; a broken one only delays connect by 200ms before fallbacks kick in. (Cf. the RFC 8305 IPv6→IPv4 fallback delay.)
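The head start can be pictured as a per-address start delay. The sketch below is not the PR's `connect_any`; `dial_schedule` is a hypothetical pure function that only computes when each dial attempt would begin, leaving the actual DTLS handshakes out.

```rust
use std::net::SocketAddr;
use std::time::Duration;

/// Head start granted to the preferred address (200 ms, per the PR description).
const HEAD_START: Duration = Duration::from_millis(200);

/// Hypothetical helper: pair every candidate with the delay after which its dial
/// attempt should begin. The preferred address starts immediately and the others
/// wait for the head start; with no hint, every candidate starts at once.
fn dial_schedule(
    candidates: &[SocketAddr],
    preferred: Option<SocketAddr>,
) -> Vec<(SocketAddr, Duration)> {
    let mut plan = Vec::with_capacity(candidates.len() + 1);
    match preferred {
        Some(p) => {
            plan.push((p, Duration::ZERO));
            // Everything except the preferred address joins the race late.
            for &addr in candidates.iter().filter(|&&a| a != p) {
                plan.push((addr, HEAD_START));
            }
        }
        // No hint: the existing race runs unchanged.
        None => plan.extend(candidates.iter().map(|&a| (a, Duration::ZERO))),
    }
    plan
}
```

Expressing the policy as data keeps the fallback case visibly identical to the old behavior: without a hint, every delay is zero.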

The subsystem is gated by a new mdns_discovery config flag (default true) and a corresponding GUI switch under a new "Network Discovery" preferences group. Toggling it off unregisters the service, aborts the browse task, and shuts down the daemon, but preserves the primary_cache so already-known hints stay queryable until overwritten, which is useful on networks where mDNS multicast (224.0.0.251) is firewalled. A 30-second discovery_refresh_tick re-publishes the TXT record so it stays accurate when the OS-preferred interface changes (e.g. user toggles Wi-Fi off and Ethernet takes over).

New deps: mdns-sd (cross-platform mDNS responder, doesn't piggyback on system Avahi/Bonjour), netdev (default-route lookup), hostname (local hostname for the service instance name).

Falls back gracefully when ServiceDaemon::new fails (multicast group locked / no perms), no interface owns the default route, or the peer isn't announcing (old version or discovery disabled there) — the dialer just sees preferred = None and the existing connect_any race runs unchanged.

Test plan

  • Multi-homed Mac (Wi-Fi + Ethernet on same subnet): Linux dialer consistently selects the Ethernet path even when Wi-Fi wins a raw RTT race
  • Disable mDNS via the GUI switch on one peer — that peer's record disappears; the other peer's dialer falls back to non-preferred connection
  • Toggle the Mac's Wi-Fi off mid-session — within 30s the discovery refresh tick re-publishes with the new primary IP
  • Old peer (no _lan-mouse._udp.local. advertised) — dialer sees preferred = None and falls back to existing race behavior

Split out from #418, the umbrella PR collecting ~10 independent feature areas. This PR is the mDNS-SD discovery subset and depends on the multi-homed-listener PR. See #418 for the full picture.

Stack overview

These PRs are split out from #418 and stack in this order:

  1. feat(capture): wall-press auto-release + Bounds protocol foundation #420 — cursor sync + wall-press + host-lock + slider/UI
  2. feat: peer version exchange via Hello proto event #421 — peer version exchange
  3. fix: hostname resolution via OS resolver + multi-homed DTLS listener #422 — hostname resolver + multi-homed DTLS listener
  4. feat(discovery): mDNS-SD primary-IP hints for service-order-aware dialing #423 — mDNS-SD service-order discovery
  5. macOS: QoL bundle (LSUIElement, TCC flow, quit-unfreezable, display wake) + UI polish #424 — macOS QoL + UI polish
  6. feat(scroll): receiver natural-scroll toggle + wlroots axis_source + macOS v120 fix #425 — scroll forwarding
  7. feat(gui): cross-platform GUI singleton via dedicated socket #426 — GUI singleton

Each PR's branch builds on the previous one, so until earlier PRs are merged the cumulative diff against main includes all preceding work. Reviewing in order is easiest.

@feschber
Owner

feschber commented May 6, 2026

Similar to my comment on #422: Is this really necessary? From my testing (I will have to check again), at least on Linux, if I connect an Ethernet cable, the listener on the Wi-Fi interface becomes unreachable (because it replies via the Ethernet port), so service order is automatically correct.
Again: Not 100% sure about this, I will need to retest.

@jondkinney
Contributor Author

@feschber it was necessary in my testing because, regardless of service order, my Mac kept choosing Wi-Fi over Ethernet. I could tell immediately because it was choppy and slow on Wi-Fi and smooth on Ethernet. I then confirmed it by poking at things while they were running on both machines.

jondkinney force-pushed the split/04-mdns branch 3 times, most recently from d03f5fe to f053971, May 6, 2026 20:38
jondkinney and others added 17 commits May 6, 2026 16:01
Adds a host-side fallback that releases capture when the user
sweeps the cursor against the host-adjacent edge of the guest
and keeps pushing past a configurable threshold. Solves the
"two locked screens" case where the peer's capture backend
can't fire CaptureBegin (and therefore can't send Leave back),
leaving the host stuck capturing indefinitely until the
release-bind chord is pressed.

Algorithm lives in InputCapture::poll_next so every backend
(macOS, libei, layer-shell, x11, windows, dummy) gets it for
free — they only need to emit standard motion events through
the existing Stream interface, which they already do. The
wrapper tracks:

  virtual_pos: signed position along the entry axis, clamped at
    0 from below. No upper clamp — the wrapper can't know the
    guest's far-edge extent without protocol-level cooperation,
    and any proxy is wrong for some user's setup.
  wall_pressure: motion that overshoots the host-adjacent edge
    and would have driven virtual_pos negative. Fires
    CaptureEvent::AutoRelease when the threshold is reached;
    the capture loop then runs the same teardown path as the
    release-bind chord.

State resets on Begin (entry to capture), AutoRelease (we
self-released), and external release (chord, peer Leave,
connection error, EnterOnly fallback).
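A minimal model of the bookkeeping described above might look like the following. `WallPress` is a hypothetical standalone struct, not the actual `InputCapture` wrapper, and clearing the pressure on inward motion is an assumption rather than something this commit message states.

```rust
/// Hypothetical standalone model of the wall-press bookkeeping.
struct WallPress {
    /// Signed position along the entry axis, clamped at 0 from below.
    virtual_pos: f64,
    /// Accumulated motion that would have driven `virtual_pos` negative.
    wall_pressure: f64,
    /// Release threshold in pixels; 0 disables the feature.
    threshold_px: f64,
}

impl WallPress {
    fn new(threshold_px: f64) -> Self {
        Self { virtual_pos: 0.0, wall_pressure: 0.0, threshold_px }
    }

    /// Called on Begin, AutoRelease, and any external release.
    fn reset(&mut self) {
        self.virtual_pos = 0.0;
        self.wall_pressure = 0.0;
    }

    /// Feed one motion delta along the entry axis (positive = deeper into the guest).
    /// Returns true when the auto-release threshold has been reached.
    fn on_motion(&mut self, delta: f64) -> bool {
        let proposed = self.virtual_pos + delta;
        if proposed < 0.0 {
            // The overshoot past the host-adjacent edge becomes pressure.
            self.wall_pressure += -proposed;
            self.virtual_pos = 0.0;
        } else {
            self.virtual_pos = proposed;
            if delta > 0.0 {
                // Assumption: moving back into the guest clears accumulated pressure.
                self.wall_pressure = 0.0;
            }
        }
        self.threshold_px > 0.0 && self.wall_pressure >= self.threshold_px
    }
}
```

With a threshold of 0 the function never reports a release, which mirrors the "0 = off" default described in the Surface list.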

Surface:
- New FrontendRequest::SetReleaseThreshold + FrontendEvent::
  ReleaseThreshold IPC pair.
- New release_threshold_px field on the daemon config (0 = off,
  serialized to config.toml).
- New AdwPreferencesGroup with a 0–500px slider in the GTK
  window. Default 0 (disabled) so existing users see no
  behavior change until they opt in.
- New CaptureEvent::AutoRelease variant + handling in
  src/capture.rs's handle_capture_event (short-circuit to
  release_capture, which already synthesizes key-ups and sends
  Leave to the peer).

Known limitation: the wrapper has no way to know where the
guest's cursor actually is (the guest doesn't tell us). On
re-entry into a peer mid-session, virtual_pos resets to 0 but
the guest's cursor may still be in the middle of its screen
from the prior session, causing the threshold to fire from
the wrong reference point. A protocol-level Bounds event +
cursor-warp on Enter is needed for full correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a new ProtoEvent variant carrying the receiving device's
display geometry (in pixels). Sent by the emulation side
right after acknowledging an Enter so the capturing peer can
model the guest cursor's position along the entry axis.

Wire format: 1-byte EventType discriminator (Bounds = 11)
followed by big-endian u32 width and big-endian u32 height
— 9 bytes total, well under MAX_EVENT_SIZE (21).
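The 9-byte layout can be sketched with two free functions. These are hypothetical stand-ins for the codec in lan-mouse-proto; only the discriminator value (11) and the big-endian u32 fields come from the commit message.

```rust
/// EventType discriminator for Bounds, as stated in the commit message.
const BOUNDS_TAG: u8 = 11;

/// Hypothetical encoder: 1-byte tag, then big-endian u32 width and height.
fn encode_bounds(width: u32, height: u32) -> [u8; 9] {
    let mut buf = [0u8; 9];
    buf[0] = BOUNDS_TAG;
    buf[1..5].copy_from_slice(&width.to_be_bytes());
    buf[5..9].copy_from_slice(&height.to_be_bytes());
    buf
}

/// Hypothetical decoder: rejects anything that is not exactly a 9-byte Bounds datagram.
fn decode_bounds(buf: &[u8]) -> Option<(u32, u32)> {
    if buf.len() != 9 || buf[0] != BOUNDS_TAG {
        return None;
    }
    let width = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]);
    let height = u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]);
    Some((width, height))
}
```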

This commit only adds the protocol wiring. Senders and the
host-side cache come in subsequent commits. Old peers that
don't recognize EventType=11 will skip the datagram per the
forward-compat fix in the previous commit, so deployment is
incremental: the emulation side can start sending Bounds
without breaking older capturing peers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `display_bounds(pos)` and `warp_cursor(pos)` to the
InputEmulation trait and implement them across every backend:

  - macOS: CGDisplay APIs for bounds, CGWarpMouseCursorPosition for warp
  - x11: RandR for bounds, XWarpPointer for warp
  - wlroots: wl_output extents + virtual_pointer.motion_absolute
  - libei: region walking + ei_pointer.emit_motion_absolute
  - Windows: GetSystemMetrics + SetCursorPos
  - xdg_desktop_portal: no-op fallback (the protocol exposes neither
    bounds nor a warp primitive)

These are the prerequisites for the protocol-based wall-press
auto-release: emulation hosts now have a common API to report their
display extents to peers and to warp the cursor on Enter so the
host's modeled virtual_pos = 0 matches the guest's actual cursor.
Wire the new emulation-side capabilities into the daemon's
listener task. When a peer's Enter arrives:

  1. Reply Ack (existing behavior).
  2. Reply Bounds(width, height) using the cached display
     geometry from the active emulation backend.
  3. Warp the local cursor to the entry edge of the displayed
     position (0 for Left, width-1 for Right, etc., centered
     along the orthogonal axis).
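Step 3 can be sketched as a small pure function. The `Position` enum here is a hypothetical mirror of the project's own type, and the Top and Bottom arms are an extrapolation of the "etc." in the list above.

```rust
/// Hypothetical mirror of the position enum used for peer placement.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Position {
    Left,
    Right,
    Top,
    Bottom,
}

/// Entry-edge warp target: pinned to the edge on the entry axis and centered
/// along the orthogonal axis (0 for Left, width-1 for Right, per the list above).
fn entry_point(edge: Position, width: u32, height: u32) -> (u32, u32) {
    // saturating_sub guards against a zero-sized display report.
    let max_x = width.saturating_sub(1);
    let max_y = height.saturating_sub(1);
    match edge {
        Position::Left => (0, height / 2),
        Position::Right => (max_x, height / 2),
        Position::Top => (width / 2, 0),
        Position::Bottom => (width / 2, max_y),
    }
}
```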

The warp is the structural fix for the "cursor jumps back to
where it was" symptom: previously, on re-entry into a peer
mid-session, the cursor stayed wherever the prior capture
session left it, breaking the host's wall-press model
(virtual_pos=0 in the host's mind didn't match the guest's
actual cursor column). With the warp, the host's model is
synchronized with the guest's reality on every Enter.

EmulationProxy gains:
  - Cached display_bounds (Rc<Cell<Option<(u32, u32)>>>),
    refreshed each time the underlying InputEmulation is
    (re)created. Read by the listener task.
  - warp_cursor(x, y) fire-and-forget. Drops if emulation
    isn't currently active (no live backend to receive it).

ProxyRequest::Warp(x, y) carries the request to EmulationTask,
which dispatches to InputEmulation::warp_cursor.

If the active backend doesn't implement display_bounds — every
non-macOS backend right now — the listener skips the Bounds
reply and the warp call. The capturing peer falls back to its
existing "no upper clamp / virtual_pos = 0 on Begin" heuristic,
which is degraded but functional. Adding display_bounds /
warp_cursor to other backends unlocks correct behavior
incrementally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
InputCapture now keeps a per-position HashMap of peer display
geometry, populated when ProtoEvent::Bounds arrives from the
peer (handled in src/capture.rs's recv arm). track_wall_press
uses the cached entry-axis extent as the upper clamp for
virtual_pos:

  self.virtual_pos = proposed.clamp(0.0, peer_extent);

Eliminates the runaway-virtual_pos bug from the heuristic
fallback: when the user obliviously over-pushes their physical
mouse past the guest's actual far edge, the modeled position
clamps at the real width instead of climbing fictionally to
infinity. Now the user's "walk back" cost is bounded by the
guest's actual screen width.

When the peer hasn't sent Bounds yet (older peer running
without the protocol extension, or in the brief pre-Ack
window of a fresh connection), peer_extent returns INFINITY
and the model degrades to the prior heuristic.
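The clamp and its INFINITY fallback can be shown in isolation. In this sketch `BoundsCache`, the `Position` key type, and `clamp_virtual_pos` are hypothetical names; only the `clamp(0.0, peer_extent)` expression and the INFINITY fallback come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical key type for the per-position cache.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Position {
    Left,
    Right,
    Top,
    Bottom,
}

/// Hypothetical cache of peer display geometry, filled when Bounds arrives.
struct BoundsCache {
    peers: HashMap<Position, (u32, u32)>,
}

impl BoundsCache {
    /// Entry-axis extent: width for left/right peers, height for top/bottom peers.
    /// Falls back to INFINITY when the peer has not sent Bounds yet.
    fn peer_extent(&self, pos: Position) -> f64 {
        match (self.peers.get(&pos), pos) {
            (Some(&(w, _)), Position::Left | Position::Right) => f64::from(w),
            (Some(&(_, h)), Position::Top | Position::Bottom) => f64::from(h),
            (None, _) => f64::INFINITY,
        }
    }
}

/// Same expression as in the commit message, lifted into a free function.
fn clamp_virtual_pos(proposed: f64, peer_extent: f64) -> f64 {
    proposed.clamp(0.0, peer_extent)
}
```

Because `f64::clamp` accepts an infinite upper bound, the pre-Bounds case needs no special branch: the upper clamp simply never engages.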

Cache lifecycle:
  - Insert on ProtoEvent::Bounds.
  - Drop on CaptureRequest::Destroy(handle) so re-adding the
    same peer later starts fresh.

Combined with the previous commit (emulation warps cursor on
Enter), the host's virtual_pos = 0 at Begin now matches the
guest's actual cursor at column 0 (or width-1, etc.) on every
re-entry. The "cursor was in the middle, 200px back fires
release prematurely" bug is fixed structurally rather than
papered over.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The label "Auto-Release" reads as a global app preference; the
description's "forwarded mouse capture" was ambiguous about which
machine does the forwarding. Rename the group to
"Outgoing Auto-Release" so the scope mirrors the surrounding
"Outgoing Connections" / "Incoming Connections" groups, and lead
the description with "When this machine is capturing input for a
peer …" so a user scanning the window can tell at a glance that
this setting only matters when the local machine is the host.
GtkScale's default behavior treats a vertical scroll event as
+/- increment, which means the threshold creeps any time the
user is scrolling the window and the cursor passes over the
slider — easy to do given the slider sits in the middle of the
preferences pane.

Add an EventControllerScroll to the slider in CAPTURE phase that
returns Propagation::Stop unconditionally. The scale's own scroll
controller never sees the event, so the value doesn't change.

Trade-off: scrolling doesn't pass through to the parent
GtkScrolledWindow while the cursor is on the slider — the wheel
becomes inert there. Acceptable: prior behavior was actively
destructive (silent state corruption); this is just "no scroll
in this small region." If users start complaining about the gap,
the next step is to forward dy to the ancestor scrolled window's
vadjustment manually before returning Stop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old text described the mechanism ("releases capture automatically once
the cursor pushes past the host-adjacent edge") without explaining
when the user would actually need it. With the new peer-Leave deadline
gate (34605a7), wall-press only fires when the peer can't deliver a
Leave — i.e. when the peer's screen is locked or its capture backend
is otherwise suppressed. New text leads with that framing and trims
the description to two sentences.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture-phase scroll handler used to return Propagation::Stop to
suppress GtkScale's default scroll-to-adjust behavior, but Stop also
killed propagation to the parent — so the main window wouldn't
scroll when the cursor was over the slider. Frustrating because the
slider sits in the middle of the preferences pane and "I just want
to scroll past this" is the common interaction.

Same capture-phase handler now walks up to the ancestor
ScrolledWindow and bumps its vadjustment by `dy * step_increment`
(or 40px when step_increment is unset). Mimics what native scroll
passthrough would have done — slider value stays fixed, parent
scrolls smoothly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windows clippy flags `loop { let Some(...) = get_msg() else { break } }`
as while-let-loop. Rewrite to `while let Some(msg) = get_msg() { … }`.
The inner `break` for `RequestType::Exit` still breaks the surrounding
while-let, so semantics are unchanged.
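The shape of the rewrite, shown on a hypothetical queue-backed message pump (`RequestType::Work` and `drain` are illustrative names, not the Windows backend's code):

```rust
use std::collections::VecDeque;

/// Hypothetical request type; only `Exit` is named in the commit message.
#[derive(Debug, PartialEq)]
enum RequestType {
    Work(u32),
    Exit,
}

/// Hypothetical stand-in for the message pump: handles requests until the
/// queue is empty or an Exit request is seen.
fn drain(queue: &mut VecDeque<RequestType>) -> Vec<u32> {
    let mut handled = Vec::new();
    // Clippy's `while_let_loop` lint prefers this form over
    // `loop { let Some(msg) = queue.pop_front() else { break }; ... }`.
    while let Some(msg) = queue.pop_front() {
        match msg {
            // The inner break still exits the surrounding while-let.
            RequestType::Exit => break,
            RequestType::Work(n) => handled.push(n),
        }
    }
    handled
}
```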

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Proxy trait import needed by the wlroots backend's
`output.id()` call (introduced when the emulation side started
binding wl_output for display_bounds), and applies cargo fmt for
this split's own files.
Before: when crossing machines, the guest's cursor jumped to the
midpoint of the entry edge — a ~100 px Y-jump on typical
displays — because the guest snapped to a hardcoded
(0, h/2) / (w/2, 0) point on Enter. Visually discontinuous and
hard to follow when the user is mid-task.

After: the host's capture backend snapshots the screen-space cursor
position at the instant of the edge crossing (CGEvent.location()
on macOS — the only backend that can report this today; others
emit None and the guest falls back to the prior midpoint warp).
The capture loop scales those host coords against the cached peer
geometry and sends them as a new ProtoEvent::MotionAbsolute right
after Enter. The guest handles MotionAbsolute by warping the
cursor to (x, y), overriding the entry-edge midpoint so the user
sees visual continuity across the boundary.

Layered choices:

- New ProtoEvent::MotionAbsolute { x, y } primitive rather than
  bolting an offset onto Enter — gives a reusable
  position-setting building block for future features (snap to
  point on app launch, multi-monitor handoff, follow-host-cursor
  modes) without inventing more event variants.
- Pixel coordinates in the receiver's screen space, not normalized
  floats — host already caches peer bounds (Bounds proto event)
  for the wall-press upper clamp, so it can do the scaling and
  the guest just calls warp_cursor directly. Guest's
  warp_cursor primitive already takes pixels.
- Backwards compatibility: peers running the previous protocol
  don't recognize MotionAbsolute and skip it via the forward-
  compat decode-tolerance fix from earlier in this branch. Old
  hosts paired with new guests fall through to the entry-edge
  midpoint (current behavior); new hosts paired with old guests
  ignore MotionAbsolute and the cursor stays at the edge midpoint
  too — neither pair regresses.

Capture backend coverage in this commit: macOS only (the
CGEventTap callback has cg_ev.location() at the moment of edge
crossing). Other backends (libei, x11, layer_shell, windows,
dummy) emit Begin { cursor: None } and don't send MotionAbsolute,
so the guest falls back to the midpoint warp on Enter. Adding
cursor-position reporting to those backends is a per-backend
follow-up.

InputCapture trait grew display_bounds() (default impl returns
None; macOS implements via CGDisplay::active_displays) and a
peer_warp_target(pos, cursor) helper that combines the host's
own bounds, the cached peer bounds, and the cursor position into
a target point on the peer's screen. peer_warp_target returns
None when either bounds is unavailable, in which case the capture
loop just doesn't emit MotionAbsolute.
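The scaling step can be approximated as below. This is a simplified sketch: it omits the `pos` argument of the real helper and does plain proportional scaling on both axes, so treat the signature as an assumption.

```rust
/// Simplified, hypothetical version of peer_warp_target: scale the host's
/// cursor position proportionally into the peer's pixel space.
/// Returns None when either bounds or the cursor position is unavailable.
fn peer_warp_target(
    host_bounds: Option<(u32, u32)>,
    peer_bounds: Option<(u32, u32)>,
    cursor: Option<(f64, f64)>,
) -> Option<(i32, i32)> {
    let (hw, hh) = host_bounds?;
    let (pw, ph) = peer_bounds?;
    let (cx, cy) = cursor?;
    if hw == 0 || hh == 0 || pw == 0 || ph == 0 {
        return None; // degenerate geometry: nothing sensible to scale against
    }
    // Map a host coordinate to the peer's last valid pixel index on that axis.
    let scale = |v: f64, from: u32, to: u32| {
        let frac = (v / f64::from(from)).clamp(0.0, 1.0);
        (frac * f64::from(to - 1)).round() as i32
    };
    Some((scale(cx, hw, pw), scale(cy, hh, ph)))
}
```

When this returns None the caller would skip MotionAbsolute entirely, matching the fallback to the entry-edge midpoint described above.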
The cross-axis cursor preservation introduced in 6c1bd88 was macOS-only;
the layer-shell capture backend (Wayland/Hyprland and similar wlroots
compositors) emitted Begin { cursor: None }, so transitions where Linux
was the host fell back to the entry-edge midpoint warp on the guest —
the same 300–400 px Y-jump the macOS path was fixed to avoid.

Read surface_x / surface_y from wl_pointer::Enter and translate to
compositor screen-space using the layer-surface's anchor edge: surfaces
here are 1 px on the on-axis dimension and span the cross-axis, so the
surface-local cross-axis coord is the screen offset directly. To support
multi-output setups, store the output's compositor position+size on the
Window when it's created, and add a display_bounds() override that
returns the union rectangle of all active outputs (mirrors the macOS
impl so MotionAbsolute scaling stays consistent).

Effect: Linux→peer transitions where Linux is the source now preserve
cross-axis cursor position the same way macOS→peer transitions already
do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Counterpart to 6c1bd88's Enter-time cross-axis preservation. When the
host releases capture (release-bind chord, auto-release threshold, peer
destroyed), the visible cursor reappears at whatever point capture
started — typically the entry-edge midpoint or wherever the guest
chose to warp to. The user perceives this as a 100–400 px Y-jump even
though Mac→Linux→Mac round-trip "should" feel continuous, because
nothing in the release path tells the host where the guest's cursor
visually was at the moment of release.

Track a virtual_cursor (f64, f64) in the wrapper that mirrors the
guest's screen-space cursor: seeded on Begin from the
peer_warp_target / entry-edge midpoint (whatever the guest will
actually do on Enter), accumulated against every Motion event we
forward, clamped to peer bounds. On release, project it back to host
screen-space with host_warp_target_on_release — symmetric inverse of
peer_warp_target — and pass that as a new Option<(i32, i32)>
parameter on the Capture::release trait method. macOS threads the
target through ProducerEvent::Release and warps before show_cursor()
so the visible cursor reappears at the matching host point. Other
backends ignore the parameter (they don't hide/manage the system
cursor on the way out).

This is a no-op when peer_bounds or display_bounds is unavailable —
fallback is the previous behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-sufficient counterpart to MotionAbsolute. Carries the host's
cursor as a normalized fraction (0..1) of the host's own screen
plus the entry side from the receiver's frame. The receiver
scales nx/ny against its own display bounds and pins the on-axis
dimension to the matching edge.

The point: MotionAbsolute requires the host to know the peer's
geometry (cached via a prior `Bounds` event), which doesn't exist
on the very first crossing — `Bounds` is only sent in response to
`Enter`, so the host can't include MotionAbsolute on the same
crossing that asks for the bounds it needs. CursorPos sidesteps
the round-trip dependency entirely; the receiver does the
scaling locally with its own bounds.
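On the receiving side, the scaling and edge pinning might look like this. `Side` and `cursor_pos_target` are hypothetical names; the sketch assumes that "entry side from the receiver's frame" means the edge of the receiver's screen where the cursor enters.

```rust
/// Hypothetical entry side, expressed in the receiver's frame.
#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
    Top,
    Bottom,
}

/// Receiver-side handling sketch: scale the normalized fraction against the
/// receiver's own bounds, then pin the on-axis coordinate to the entry edge.
fn cursor_pos_target(nx: f32, ny: f32, side: Side, width: u32, height: u32) -> (i32, i32) {
    let max_x = width.saturating_sub(1) as f32;
    let max_y = height.saturating_sub(1) as f32;
    // Clamp first so an out-of-range fraction cannot land off-screen.
    let x = (nx.clamp(0.0, 1.0) * max_x).round() as i32;
    let y = (ny.clamp(0.0, 1.0) * max_y).round() as i32;
    match side {
        Side::Left => (0, y),
        Side::Right => (max_x as i32, y),
        Side::Top => (x, 0),
        Side::Bottom => (x, max_y as i32),
    }
}
```

Note that no peer geometry appears in the signature, which is the point of CursorPos: it works on the first crossing, before any Bounds exchange.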

Wire format adds f32 codec impl alongside existing u8/u32/i32/f64.
Old peers don't know the new EventType tag and skip the event via
the proto forward-compat decode-tolerance path; they continue to
warp to the entry-edge midpoint as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to peer_warp_target for the bounds-free CursorPos path.
Normalizes the host's screen-space cursor against the host's own
display bounds — no peer geometry consulted, so a return value
of Some is independent of whether the peer has sent Bounds yet.

The capture loop will emit this fraction as ProtoEvent::CursorPos
right after Enter so the guest can warp on the very first
crossing instead of falling through to the entry-edge midpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jondkinney and others added 26 commits May 6, 2026 16:07
When the peer takes over (sends Enter+CursorPos), the host was
also releasing capture and warping its local cursor based on the
last-known peer virtual_cursor. The two warps fired on the same
shared cursor and raced — the host's stale warp frequently won,
clobbering the peer's authoritative proportional landing and
making the cursor appear at whatever position the host *thought*
the peer cursor was, regardless of where the user actually
crossed.

Split the release path: ReleaseForHandover skips the host
warp_target so CursorPos is the only warp on remote-takeover.
The release-bind chord and backend auto-release still go through
the original release_capture path that computes a host warp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ReleaseNotify wasn't the only source of host warp races. When the
peer's local capture begins, it sends ProtoEvent::Leave to every
incoming connection (service.rs:357), which the recipient's
capture loop handles by calling release_capture — computing a
host warp from stale virtual_cursor and racing against the peer's
upcoming CursorPos warp on the shared cursor.

Route peer-Leave release through release_capture_handover so the
proportional CursorPos warp lands without competition. The rare
case where the peer released without taking over (no Enter/
CursorPos follows) just leaves our cursor where it was — fine,
since nothing else is moving it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Enter handler unconditionally warped the local cursor to the
midpoint of the entry edge, intending to seat virtual_pos=0 at
column 0 before the host's stream of relative motion arrived.
But the host now sends CursorPos right after Enter, which carries
the proportional landing point AND pins the on-axis dimension to
the matching edge — making the midpoint warp redundant.

Worse, the midpoint warp races against fast handovers: when the
user crosses, then crosses back within ~100ms, the local
CGEventTap (or layer-shell equivalent) reads the cursor's
location field at the new crossing while the cursor is still
sitting at the midpoint from the previous Enter — never
advancing to the proportional CursorPos warp that would have
followed. The opposite-direction CursorPos then encodes
ny=0.500 ("middle of source") and the receiver dutifully warps
its cursor to its own middle, producing the persistent
"always lands in the middle" symptom even after suppressing the
host-warp races on both sides.

Trust the host: if it can compute a proportional point (which it
can in every case where Begin.cursor was populated), CursorPos
seats the cursor correctly. If it can't, the cursor stays where
it was — preferable to a forced midpoint that masquerades as a
mid-screen crossing on subsequent re-crosses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wall-press fallback previously fired the moment the cursor pressed
the host-adjacent edge of the peer for `release_threshold_px` worth of
unabsorbed motion — racing the peer's layer-shell `Leave` (the
authoritative handover signal) on every normal cross. In practice the
network round-trip beats 200px of physical motion easily, so layer-shell
won the race and wall-press only visibly fired on the lock screen where
the peer has no layer-shell. The right outcome, by accident.

Make it explicit. When wall_pressure crosses the threshold, set
`wall_press_pending_at` and arm a 150ms timer instead of firing.
`release_no_host_warp` (the path peer-Leave already routes through)
clears the pending flag via `reset_wall_press_state`, so a healthy
handover cancels the deferred AutoRelease before it can fire. The timer
itself is polled in `poll_next` so the deadline elapses even when the
user pinned the cursor at the wall and stopped moving.

Result:
- Normal operation: peer Leave arrives in <50ms → wall-press cancelled,
  no race against the proportional CursorPos warp the handover path
  uses to position the host's cursor.
- Lock screen / dead peer / network down: no Leave arrives → 150ms
  past threshold → fire AutoRelease as the original fallback intended.

Costs +150ms of latency to the genuine fallback case (lock screen),
which is imperceptible on top of the 200px of cursor "stickiness" the
user already sees while the threshold accumulates.

Also retreating into the interior now cancels a pending fire — a brief
bump against the wall followed by motion deeper into the guest no
longer leaves a primed timer waiting to misfire.
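The arm, cancel, and poll cycle can be modelled without a real timer by passing the current time in. `DeferredRelease` is a hypothetical struct; the actual code uses `wall_press_pending_at` and a tokio timer inside `poll_next`.

```rust
use std::time::{Duration, Instant};

/// Grace period between crossing the threshold and firing (150 ms per the commit).
const GRACE: Duration = Duration::from_millis(150);

/// Hypothetical model of the deferred AutoRelease.
#[derive(Default)]
struct DeferredRelease {
    deadline: Option<Instant>,
}

impl DeferredRelease {
    /// Threshold crossed: arm the timer unless it is already armed.
    fn arm(&mut self, now: Instant) {
        if self.deadline.is_none() {
            self.deadline = Some(now + GRACE);
        }
    }

    /// Peer Leave arrived, or the cursor retreated into the interior.
    fn cancel(&mut self) {
        self.deadline = None;
    }

    /// Polled from the event loop; returns true exactly once, after the deadline.
    fn poll(&mut self, now: Instant) -> bool {
        match self.deadline {
            Some(d) if now >= d => {
                self.deadline = None;
                true
            }
            _ => false,
        }
    }
}
```

Injecting `now` keeps the logic testable; it also makes it clear that the deadline elapses on its own, even if no further motion events arrive.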

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Instant value was stored but never read — only `is_some()` /
`is_none()` / `take()`. `tokio::time::Instant::now()` already
gives us the deadline base for the timer reset, so the std::time
import drops too. No behavior change.
Wayland's compositor revokes input on layer-shell surfaces while the
screen is locked, so Linux-as-host gets this behavior for free. macOS
and Windows do not — CGEventTap and WH_MOUSE_LL hooks both keep
firing under the lock screen — leaving a half-broken state where the
mouse can move to the peer but the keyboard can't follow (the lock
screen consumes keys before any tap/hook sees them).

Match Wayland's behavior on the other two platforms by detecting
lock state and gating barrier crossings on it.

macOS:
- Register CFNotificationCenter distributed-notification observers
  for `com.apple.screenIsLocked` / `com.apple.screenIsUnlocked` on
  the same CFRunLoop thread that hosts the event tap.
- Add `host_locked: bool` to InputCaptureState; the lock callback
  flips it via blocking_lock and synthesizes `AutoRelease` upward
  via the event channel if a capture was already in flight.
- Gate the cross-detection branch in event_tap_callback on
  `!state.host_locked`. The mutex serializes against the callback so
  events delivered after the lock-state flip see the new value.

Windows:
- Add `Win32_System_RemoteDesktop` to the `windows` crate features
  for `WTSRegisterSessionNotification` / `WTSUnRegisterSessionNotification`.
- Register the existing message-only window for
  `NOTIFY_FOR_THIS_SESSION` so it receives `WM_WTSSESSION_CHANGE`.
- Add `HOST_LOCKED: Cell<bool>` thread-local; window_proc updates
  it on `WTS_SESSION_LOCK` / `WTS_SESSION_UNLOCK` and synthesizes
  `AutoRelease` via the event channel if a capture was active.
- Gate the cross-detection in `check_client_activation` on
  `!HOST_LOCKED.get()`.

Linux X11 backend is currently `NotImplemented` so there's nothing
to gate; whoever wires up the X11 capture path can add the same
check using their preferred lock-state source (D-Bus
org.freedesktop.ScreenSaver, xss XScreenSaverQueryInfo, etc).

Known limitation: distributed notifications / WM_WTSSESSION_CHANGE
fire only on transitions — if the daemon starts while the host is
already locked, host_locked stays false until the next lock cycle.
Acceptable for now since the daemon normally starts before lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous attempt to gate cursor crossings while the host's
screen is locked tried `CFNotificationCenterGetDistributedCenter`
for `com.apple.screenIsLocked` / `Unlocked`. Empirically, the
callback never fires when the daemon is non-Cocoa: the distnoted
mach port is attached to the main thread's CFRunLoop regardless
of which thread called AddObserver, and lan-mouse's main thread
runs the GLib main loop instead of a CFRunLoop, so the port is
never serviced. A dedicated worker thread with its own CFRunLoop
doesn't help (port still attaches to main). `notify_register_check`
against the same names is also a dead end — `loginwindow` doesn't
post on notify(3) for these keys (verified with `notifyutil -1`).

Replace the entire observer machinery with a direct poll of
`CGSessionCopyCurrentDictionary["CGSSessionScreenIsLocked"]` on
each `MouseMoved` event in the tap callback. ~10-50us per call
(XPC to WindowServer); negligible at typical mouse rates. On the
unlocked → locked transition, synthesize an `AutoRelease` so the
cursor returns to the host. On Sequoia 15+ the key is absent
(not `kCFBooleanFalse`) when unlocked — treat missing-or-nil as
unlocked.

Verified: with macOS as host and Linux as guest, locking the Mac
prevents the cursor from crossing to the Linux peer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously polled CGSession on every MouseMoved tap callback (~1000Hz
worst case = 1-5% CPU on the XPC). The only forwarding decision that
actually consults the lock state is the cross-detection commit point
inside `state.crossed(cg_ev)` returning Some — fire-once-per-cross,
not fire-once-per-twitch. Move the `is_screen_locked()` call there
and drop the per-event polling, the `host_locked` cached field, and
the transition-detection logic.

Tradeoff: mid-capture lock (cursor on peer when Mac auto-locks via
idle timeout) no longer auto-releases the cursor back to the Mac.
The user can release-bind (Ctrl+Shift+Cmd+Alt) to bring the cursor
back. Acceptable: cursor stuck on peer while screen locked is mildly
annoying, not dangerous; auto-lock-during-capture is rare in practice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply rustfmt to the host-lock suppression code (input-capture macOS
+ Windows event_thread).
The DTLS recv loops in src/listen.rs and src/connect.rs each read
one full datagram per call. A failed `try_into::<ProtoEvent>()`
means the datagram's leading EventType byte didn't match any
known variant — a misalignment is impossible because DTLS is
message-framed, not stream-framed.

Previously, src/listen.rs would `break` out of the loop on parse
failure (tearing down the connection) and src/connect.rs would
silently swallow the error with no log. Both are wrong as
forward-compat behavior: any future protocol addition (e.g. a
new event variant) would force every existing peer to disconnect
rather than gracefully ignoring the unknown event.

Skip-and-continue on both sides, with a debug-level log so the
behavior is observable. Pre-requisite for any future ProtoEvent
variant to land without forcing a coordinated upgrade across
every peer in a deployment.
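
A minimal sketch of the skip-and-continue pattern, assuming a simplified wire format where the first byte of each datagram is the event ID. `Event`, `ParseError`, and `decode_all` are hypothetical stand-ins, not the real `ProtoEvent` code:

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the wire event type (not the real ProtoEvent).
#[derive(Debug, PartialEq)]
enum Event {
    Ping,
    Pong,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    UnknownId(u8),
}

impl TryFrom<&[u8]> for Event {
    type Error = ParseError;

    fn try_from(datagram: &[u8]) -> Result<Self, Self::Error> {
        // One datagram carries one event; the leading byte selects the variant.
        match datagram.first().copied() {
            Some(0) => Ok(Event::Ping),
            Some(1) => Ok(Event::Pong),
            Some(id) => Err(ParseError::UnknownId(id)),
            None => Err(ParseError::Empty),
        }
    }
}

/// Decode every datagram, skipping the ones that fail to parse
/// instead of aborting the whole loop.
fn decode_all(datagrams: &[Vec<u8>]) -> Vec<Event> {
    let mut events = Vec::new();
    for datagram in datagrams {
        match Event::try_from(datagram.as_slice()) {
            Ok(event) => events.push(event),
            // Real code would emit a debug-level log here.
            // The key point is `continue`, not `break`.
            Err(_unknown) => continue,
        }
    }
    events
}
```

With this shape, a datagram carrying an ID from a newer build is dropped on its own and the events around it are still delivered.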

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a one-shot Hello message to the lan-mouse wire protocol so each
peer can display the other end's build commit hash and warn on
version mismatch. Soft-warn only — mismatched versions never refuse
traffic.

Wire change (lan-mouse-proto)
* `ProtoEvent::Hello { commit: [u8; 8] }` carries the 8-byte ASCII
  short commit from shadow_rs's `SHORT_COMMIT`. Encoded/decoded
  alongside the existing event variants.
* `EventType::Hello` is appended to the enum so existing IDs are
  untouched. Old peers receive the event, hit `InvalidEventId`, and
  silently skip it via the forward-compat handler in
  `connect.rs::receive_loop` — the connection is unaffected.
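
As a rough illustration of the wire change, the sketch below encodes a Hello as one ID byte followed by the 8-byte commit. The ID value `HELLO_ID` and the exact byte layout are assumptions for illustration; the real values live in lan-mouse-proto:

```rust
use std::convert::TryInto;

/// Hypothetical ID. The real one is whatever follows the last
/// existing `EventType` variant.
const HELLO_ID: u8 = 0x7f;

#[derive(Debug, PartialEq)]
struct Hello {
    /// 8-byte ASCII short commit.
    commit: [u8; 8],
}

impl Hello {
    /// Serialize as `[id, commit[0], ..., commit[7]]`.
    fn encode(&self) -> [u8; 9] {
        let mut buf = [0u8; 9];
        buf[0] = HELLO_ID;
        buf[1..].copy_from_slice(&self.commit);
        buf
    }

    /// Returns `None` for any datagram that is not a well-formed Hello,
    /// which lets the caller skip it rather than tear down the connection.
    fn decode(datagram: &[u8]) -> Option<Hello> {
        match datagram {
            [HELLO_ID, rest @ ..] => {
                // Fails (and yields None) unless exactly 8 bytes remain.
                let commit: [u8; 8] = rest.try_into().ok()?;
                Some(Hello { commit })
            }
            _ => None,
        }
    }
}
```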

Daemon
* Connect side sends one Hello immediately after the DTLS handshake
  authenticates and before the ping_pong loop starts. Best-effort,
  fire-and-forget — `log::debug!` on send error.
* Listen side mirrors the peer's Hello with its own (same shape as
  the existing Ping → Pong reply), so the peer's connect-side
  receive_loop populates `ClientState::peer_commit` for that
  handle.
* The disconnect path clears `peer_commit` so a stale hash isn't
  shown after the connection drops.

IPC
* `ClientState::peer_commit: Option<[u8; 8]>`. `None` means the
  peer hasn't sent Hello yet — either fresh connection or older
  build that predates the event.

GTK
* `ClientObject` exposes `peer-commit` as an `Option<String>`
  property; `peer_commit_to_string` converts the wire `[u8; 8]` to
  the displayable hex.
* `lan_mouse_gtk::run` now takes the local commit and stashes it in
  a `OnceLock` so per-row UI can compare against each peer's hash.
* `ClientRow::refresh_version_status` re-renders the collapsed
  subtitle with Pango markup whenever the property changes:
   - matched → green   "peer version: <hex> · matched"
   - mismatch → orange "peer version: <hex> · ours: <hex>"
   - unknown → orange  "peer version: unknown · ours: <hex>"
* Window invokes `refresh_version_status` from
  `update_client_state` after writing the new property, and
  `bind` calls it once on row construction so the initial
  subtitle isn't blank.
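
The three subtitle states can be sketched as a pure function. This is an illustration only: `VersionStatus` and `version_status` are hypothetical names, and the Pango colour markup applied by `refresh_version_status` is left out:

```rust
#[derive(Debug, PartialEq)]
enum VersionStatus {
    Matched,
    Mismatch,
    Unknown,
}

/// Compare our commit with the peer's (if known) and build the subtitle text.
fn version_status(ours: &str, peer: Option<&str>) -> (VersionStatus, String) {
    match peer {
        Some(p) if p == ours => (
            VersionStatus::Matched,
            format!("peer version: {p} · matched"),
        ),
        Some(p) => (
            VersionStatus::Mismatch,
            format!("peer version: {p} · ours: {ours}"),
        ),
        // No Hello received yet: fresh connection or an older peer build.
        None => (
            VersionStatus::Unknown,
            format!("peer version: unknown · ours: {ours}"),
        ),
    }
}
```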

Known limitation: state-change broadcasts from the network side
(set_alive / set_active_addr / set_peer_commit) don't currently
trigger a `FrontendEvent::State` directly; the UI picks up the
latest values on the next user-driven broadcast. Same pre-existing
behavior as the alive/active_addr fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These are user-visible labels in the version-status subtitle, so
sentence-case reads better than the lowercase first-pass. "matched"
stays lowercase since it's a status descriptor, not a label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the Hello handler in `ListenTask` echoed our local commit
back but deliberately threw away the peer's, on the assumption that
the outgoing connect-side path (`connect.rs:278-279` →
`set_peer_commit`) would always populate the visible state for any
bidirectionally-configured peer.

That assumption breaks any time the *outgoing* UDP/DTLS direction is
broken even though the inbound direction is fine — happened just now
when the peer Mac's daemon stopped listening on 4242 (DHCP-renewed
IP, daemon crashed, asymmetric NAT, …). Mac was still happily
connecting in the other direction and sending events, including the
initial Hello, but Linux silently displayed "peer version unknown"
because the listen side dropped Mac's commit on the floor.

Add a `PeerHello { addr, commit }` EmulationEvent variant fired from
the listen-side Hello handler. The service maps `addr → ClientHandle`
via `client_manager.get_client(addr)` and calls `set_peer_commit` +
`broadcast_client` exactly like the connect path does. The connect
path remains the primary source for symmetric setups; this is the
defensive fallback so version visibility doesn't depend on outbound
reachability.

Skips silently when no outgoing client is configured for the peer's
addr (incoming-only setup) — there's no UI row to update in that
case anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Lan Mouse window previously couldn't scroll its preference
groups when the window height was reduced below the natural
content height — content was simply clipped, with no way to
reach the lower groups. AdwStatusPage doesn't include built-in
scrolling.

Wrap the AdwStatusPage in a GtkScrolledWindow inside the
existing AdwToastOverlay, with vertical scroll on demand and
horizontal scroll disabled (we use AdwClamp for horizontal
sizing). propagate-natural-height keeps the window's preferred
size identical when content fits, so existing layout behavior
on tall windows is unchanged.

Effect: when the user resizes the window shorter than the
natural content height (or has a small display), all preference
groups remain reachable via vertical scroll.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hickory_resolver's TokioResolver only consults /etc/resolv.conf and
queries upstream DNS servers — which means it can't see /etc/hosts,
mDNS (Avahi/Bonjour), NetBIOS, or anything else in the system's full
name-resolution stack. On a typical home LAN there's no DNS server
that knows about peer machine names, so users had to fall back to
typing IP addresses, which broke the moment they moved their setup
to a different network.

Swap to tokio::net::lookup_host, which calls getaddrinfo (or
GetAddrInfoEx on Windows). That walks /etc/nsswitch.conf on Linux
(picking up Avahi-resolved .local names, /etc/hosts, and DNS), uses
Bonjour for .local on macOS, and the full Windows resolver on
Windows. A Bonjour hostname like "JKMBP-M4-Max.local" now resolves
on every modern network without explicit configuration; the user
can carry their two machines between LANs and the connection still
finds them.

Drop the hickory-resolver dependency entirely — it's no longer
needed. ServiceError::Dns also goes away; lookup failures surface as
io::Error which is already covered by ServiceError::Io.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a host has two interfaces on the same subnet (e.g. macOS with
Wi-Fi en0 and a USB-C dock en7 both on 192.168.1.0/24), a single
0.0.0.0:port DTLS listener silently breaks for peers that dial the
non-routed IP: the kernel sources its reply from the routing
table's preferred interface, so the reply's src-IP doesn't match
the 4-tuple the peer expects, and webrtc-dtls drops the packet.

Replace the single 0.0.0.0 bind with one Listener per local IPv4
address (loopback + link-local skipped). Each listener's reply
socket is bound to a specific IP, so the kernel uses *that* IP as
source — symmetric replies guaranteed regardless of the routing
table.

A supervisor task watches if-watch (Network.framework on macOS,
netlink on Linux) for interface up/down events and adds/drops
listener slots dynamically: plugging a dock or toggling Wi-Fi no
longer requires a lan-mouse restart. Port-change rebuilds all
slots together.

Falls back to a single 0.0.0.0 bind only if interface enumeration
or every per-IP bind fails — preserves single-NIC behavior and
ensures we never silently fail to listen.
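
The per-address bind and the wildcard fallback described above can be sketched with plain `std::net` sockets. This is a simplified illustration: the real code wraps DTLS listeners and enumerates interfaces itself, and `is_listenable` and `bind_listeners` are hypothetical names:

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddrV4, UdpSocket};

/// Loopback and link-local (169.254.0.0/16) addresses are skipped.
fn is_listenable(ip: &Ipv4Addr) -> bool {
    !ip.is_loopback() && !ip.is_link_local()
}

/// Bind one socket per listenable local address. If nothing could be
/// bound, fall back to a single wildcard (0.0.0.0) socket.
fn bind_listeners(local_ips: &[Ipv4Addr], port: u16) -> io::Result<Vec<UdpSocket>> {
    let sockets: Vec<UdpSocket> = local_ips
        .iter()
        .filter(|ip| is_listenable(ip))
        // A failed bind on one address does not prevent the others.
        .filter_map(|ip| UdpSocket::bind(SocketAddrV4::new(*ip, port)).ok())
        .collect();

    if sockets.is_empty() {
        let wildcard = UdpSocket::bind(SocketAddrV4::new(Ipv4Addr::UNSPECIFIED, port))?;
        return Ok(vec![wildcard]);
    }
    Ok(sockets)
}
```

Because each socket is bound to a concrete address, replies sent from it carry that address as their source.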

Removes the previous user-facing workaround of forcing
`ips = ["192.168.1.88"]` on the peer; with this change `ips = []`
+ hostname resolution Just Works on multi-homed hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The if-watch supervisor added in 2c7ce2e already had a Down-event
handler — but `if_watch` on macOS uses Network.framework, which
doesn't reliably fire `IfEvent::Down` when an interface is
administratively disabled (e.g. the user toggles Wi-Fi off in System
Settings). The Up event is reliable; the Down event is not.

Result: when the user toggled Wi-Fi off mid-session, the Wi-Fi IP's
listener slot stayed live in the HashMap, bound to a vanished IP.
Harmless in isolation (no traffic can reach an unbound IP), but it
defeats 2c7ce2e's "no restart needed when interface state changes"
promise — the user has to restart lan-mouse to clean up.

Add a 30-second polling reconciliation arm to the supervisor's
select! loop:
- Enumerate currently-present IPv4 addresses (same logic as startup
  via `enumerate_listenable_ipv4`).
- Diff against the listeners HashMap. Drop slots whose IP is no
  longer present (catches missed Down events). Add slots for new
  IPs that appeared without an Up event (defensive, symmetric).
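
A minimal sketch of the diff step, assuming the listener map is keyed by `Ipv4Addr`. The `reconcile` function is a hypothetical helper, and the results are sorted only to make the output deterministic:

```rust
use std::collections::{HashMap, HashSet};
use std::net::Ipv4Addr;

/// Compare the live listener slots with the addresses currently present.
/// Returns `(to_drop, to_add)`.
fn reconcile<L>(
    listeners: &HashMap<Ipv4Addr, L>,
    present: &HashSet<Ipv4Addr>,
) -> (Vec<Ipv4Addr>, Vec<Ipv4Addr>) {
    // Slots whose address has vanished (a missed Down event).
    let mut to_drop: Vec<Ipv4Addr> = listeners
        .keys()
        .filter(|ip| !present.contains(*ip))
        .copied()
        .collect();

    // Addresses that appeared without a matching Up event.
    let mut to_add: Vec<Ipv4Addr> = present
        .iter()
        .filter(|ip| !listeners.contains_key(*ip))
        .copied()
        .collect();

    to_drop.sort();
    to_add.sort();
    (to_drop, to_add)
}
```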

Polling cost is negligible (`getifaddrs` is a single cheap libc call) and the
30-second cadence is fast enough that "I just toggled Wi-Fi" feels
prompt without spamming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`if !listeners.contains_key(&ip) { ... insert(ip, ...) }` plus
`#[allow(clippy::map_entry)]` works but is uglier than just using
the `Entry::Vacant` slot up front. The Vacant arm handles both the
existence check and the subsequent insert in a single hash lookup —
which is the exact rewrite clippy was suggesting, just expressed
without forcing `or_insert_with` (which doesn't fit because
`try_bind_listener` is async + fallible).
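
A sketch of the `Entry::Vacant` shape, using a synchronous stand-in for the fallible bind. The `try_bind` function here is hypothetical and returns a `String` instead of a real listener; the actual `try_bind_listener` is async:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical fallible bind. It fails for the unspecified address
/// purely so the error path can be exercised.
fn try_bind(ip: Ipv4Addr) -> Result<String, String> {
    if ip.is_unspecified() {
        Err(format!("cannot bind {ip}"))
    } else {
        Ok(format!("listener@{ip}"))
    }
}

/// Insert a listener for `ip` unless one already exists.
/// Returns `Ok(true)` when a new slot was created.
fn ensure_listener(
    listeners: &mut HashMap<Ipv4Addr, String>,
    ip: Ipv4Addr,
) -> Result<bool, String> {
    // One hash lookup covers both the existence check and the insert.
    if let Entry::Vacant(slot) = listeners.entry(ip) {
        // `or_insert_with` cannot propagate this error, hence the explicit arm.
        let listener = try_bind(ip)?;
        slot.insert(listener);
        return Ok(true);
    }
    Ok(false)
}
```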

Brings the combined branch in line with the equivalent fix on the
split stack so both express the same behavior in the same shape.
Two small cleanups on the reconcile loop:

- The `if listeners.remove(&ip).is_some()` check was redundant —
  `to_drop` is collected from `listeners.keys()` and we run
  single-threaded, so `remove()` is guaranteed to return Some.

- `reconcile_tick` (30s) now uses `MissedTickBehavior::Skip`. The
  default `Burst` would replay backlog ticks back-to-back when
  resuming from a long suspend (laptop closed for hours), each
  triggering a redundant interface enumeration. `Skip` collapses
  the backlog to a single fire on resume.
…ling

Adds Bonjour service registration + browsing under
`_lan-mouse._udp.local.`. Each instance's TXT record carries a
`primary=<ipv4>` field whose value is the IP of the interface that
owns the default route — which on macOS reflects the user's
service-order ranking, on Linux the lowest-metric default route, on
Windows the route GetBestRoute2 selects.

The dialer reads peer announcements via a continuous browse and
caches `peer_hostname → primary_ipv4` in a `Rc<RefCell<HashMap>>`
shared with `LanMouseConnection`. On each connection attempt,
`connect_to_handle` looks up the peer's hostname and (when found)
hands the resulting `SocketAddr` to `connect_any` as a "preferred"
address that gets a 200ms head start over the rest of the candidate
list — modeled on RFC 8305 happy-eyeballs. A healthy preferred path
virtually always wins; a broken one only delays connect by 200ms
before the rest of the IPs join the race.
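
The head-start race can be sketched with threads and a channel. This illustrates the idea only: the real `connect_any` is async and performs DTLS handshakes, whereas here `dial` is any blocking function supplied by the caller, and both function names are hypothetical:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run one dial attempt on its own thread. Only successes report back.
fn spawn_dial<A, T>(tx: &mpsc::Sender<T>, dial: fn(A) -> Option<T>, addr: A)
where
    A: Send + 'static,
    T: Send + 'static,
{
    let tx = tx.clone();
    thread::spawn(move || {
        if let Some(conn) = dial(addr) {
            // The receiver may already be gone if another dial won.
            let _ = tx.send(conn);
        }
    });
}

/// Race all candidates, letting `preferred` run alone for `head_start`.
fn connect_with_head_start<A, T>(
    preferred: Option<A>,
    rest: Vec<A>,
    head_start: Duration,
    dial: fn(A) -> Option<T>,
) -> Option<T>
where
    A: Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    if let Some(addr) = preferred {
        spawn_dial(&tx, dial, addr);
        // A healthy preferred path answers within the head start.
        // A failed or slow one costs at most `head_start` here.
        if let Ok(conn) = rx.recv_timeout(head_start) {
            return Some(conn);
        }
    }

    // The remaining candidates join the race. A slow preferred dial is
    // still running and can still win.
    for addr in rest {
        spawn_dial(&tx, dial, addr);
    }
    // Drop our sender so `recv` ends once every dial thread has finished.
    drop(tx);
    rx.recv().ok()
}
```

In this sketch a preferred dial that fails quickly still waits out the full head start before the fallbacks start, matching the "only delays connect by 200ms" behaviour described above.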

Subsystem is gated by a new config flag `mdns_discovery` (default
true). Toggling off unregisters our service, aborts the browse task,
and shuts the daemon, but preserves the `primary_cache` so any
already-known hints stay queryable until overwritten on re-enable.
Useful on networks where mDNS multicast (224.0.0.251) is firewalled.

GUI exposes the toggle as a new "Network Discovery" preferences
group with an mDNS Discovery switch row, mirroring the existing
natural-scroll switch's plumbing (block/unblock signal handler when
the daemon pushes the value via Sync, etc).

Service-side wiring:
- Service owns Discovery; clones the shared cache into
  LanMouseConnection on construction.
- A 30-second `tokio::time::interval` calls `Discovery::refresh()`
  so the TXT record stays accurate when the OS-preferred interface
  changes (e.g. user toggles Wi-Fi off and Ethernet takes over).
- Port-change events forward through `Discovery::set_port` so the
  re-published TXT/SRV records reflect the new listen port.

New deps: `mdns-sd = "0.19"` (cross-platform mDNS responder, doesn't
piggyback on system Avahi/Bonjour), `netdev = "0.43"` for default-
route lookup, `hostname = "0.4"` for the local hostname.

Falls back gracefully when:
- `ServiceDaemon::new` fails (multicast group locked / no perms)
- No interface owns the default route
- Peer isn't announcing (old version or discovery disabled there)
…in all cases the dialer just returns `preferred = None` and the
existing connect_any race runs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re bypass

Previously each call into LanMouseConnection::send spawned a fresh
connect_to_handle, even when the prior attempt had failed because the
peer was unreachable. With ips=[] in the client config and the peer
offline, this produced dozens of attempts per second — every mouse
event near the boundary triggered another DNS lookup and another round
of "client (N) connecting ... (ips: [], preferred: None)" log spam.
Hours of that may have contributed to mDNS state corruption observed
when the peer eventually returned to the LAN.

Gate the spawn at the call site:

  - Per-handle RetryState tracks next_attempt_at + a doubling backoff
    capped at 30s.
  - signature_of(ips, primary_hint) hashes the candidate set; when the
    signature changes between attempts (mDNS browse populates a primary,
    DNS resolves new IPs) the gate is bypassed and the next send tries
    immediately.
  - Successful connect drops the retry entry; failure (or empty
    candidate set) records a new backoff floor.

Net effect: no retry storm during outages, and a peer reappearing via
mDNS reconnects on the very next mouse event without waiting on the
backoff to expire.
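
A rough sketch of the gate, under stated assumptions: the 1-second initial backoff is a guess (only the 30s cap is given above), the signature uses `DefaultHasher`, and the method names are illustrative rather than the actual `connect.rs` API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Assumed starting value. Only the cap is stated in the commit message.
const INITIAL_BACKOFF: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(30);

/// Hash the candidate set so a change in IPs or hint is cheap to detect.
fn signature_of(ips: &[IpAddr], primary_hint: Option<IpAddr>) -> u64 {
    let mut sorted = ips.to_vec();
    sorted.sort(); // order of candidates should not affect the signature
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    primary_hint.hash(&mut hasher);
    hasher.finish()
}

struct RetryState {
    next_attempt_at: Instant,
    backoff: Duration,
    signature: u64,
}

impl RetryState {
    /// Created on the first failed attempt for a handle.
    fn new(now: Instant, signature: u64) -> Self {
        RetryState {
            next_attempt_at: now + INITIAL_BACKOFF,
            backoff: INITIAL_BACKOFF,
            signature,
        }
    }

    /// Attempt if the backoff expired or the candidate set changed.
    fn should_attempt(&self, now: Instant, signature: u64) -> bool {
        signature != self.signature || now >= self.next_attempt_at
    }

    /// Double the backoff (capped) and remember what was tried.
    fn record_failure(&mut self, now: Instant, signature: u64) {
        self.backoff = (self.backoff * 2).min(MAX_BACKOFF);
        self.next_attempt_at = now + self.backoff;
        self.signature = signature;
    }
}
```

On a successful connect the caller would simply remove the handle's `RetryState`, so the next outage starts again from the initial backoff.
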
The dialer's `primary_hints` lookup keys on the configured `hostname`
("JKMBP-M4-Max.local"), but the cache was being populated with the
SRV target hostname returned by `ServiceInfo::get_hostname()`. macOS
will sometimes appear in mDNS-SD with a suffixed system hostname
("JKMBP-M4-Max-2.local") for the SRV record while the service-instance
label keeps the user-visible identifier ("JKMBP-M4-Max.local") — those
two names are advertised together but mdns-sd resolves only one
SRV target into the event, so the cache key drifted to a name the
config never references and `preferred` came back None.

Switch the cache key to the service-instance label, parsed off the
fullname's `.<SERVICE_TYPE>` suffix. The label is what users put in
their config (the announcer derives it from the same `local_hostname()`
on registration) and it's stable across SRV-target variations.
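
Extracting the label amounts to stripping the service-type suffix from the fullname. A minimal sketch, assuming the fullname has the form `<instance>.<SERVICE_TYPE>`; `instance_label` is a hypothetical helper name:

```rust
const SERVICE_TYPE: &str = "_lan-mouse._udp.local.";

/// Return the service-instance label, or `None` if `fullname`
/// does not end in `.<SERVICE_TYPE>` or the label would be empty.
fn instance_label(fullname: &str) -> Option<&str> {
    let suffix = format!(".{SERVICE_TYPE}");
    fullname
        .strip_suffix(suffix.as_str())
        .filter(|label| !label.is_empty())
}
```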

Log line now shows both fields so future hostname/target mismatches
are visible without a packet capture:
  mdns: peer instance=jkmbp-m4-max.local (target=jkmbp-m4-max-2.local) ...
Discovery now caches by service-instance label, but the announcer's
choice of label is platform-dependent: macOS's `hostname::get()`
returns the FQDN (`Foo.local`) while Linux's returns the short name
(`omarchy`). Without normalization this works asymmetrically — a
config of `omarchy.local` for a Linux peer wouldn't match the cached
`omarchy` key.

Add `normalize_mdns_name` (lower-case, drop trailing `.`, drop
`.local` suffix) and apply it on both insert (start_browse) and
lookup (`peer_primary_ip`, `should_attempt`, `connect_to_handle`).
The `.local` domain is implied for everything mDNS-SD touches, so
collapsing it on both sides is lossless and matches how `dns-sd`
and Bonjour APIs treat instance labels in their wire form.
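
The three normalization steps can be sketched as follows. ASCII lower-casing is an assumption here, chosen because DNS name comparison is ASCII case-insensitive; the real function may differ in detail:

```rust
/// Lower-case, drop one trailing `.`, then drop a `.local` suffix.
fn normalize_mdns_name(name: &str) -> String {
    let lower = name.to_ascii_lowercase();
    let no_dot = lower.strip_suffix('.').unwrap_or(&lower);
    let short = no_dot.strip_suffix(".local").unwrap_or(no_dot);
    short.to_string()
}
```

Applied on both insert and lookup, `omarchy`, `omarchy.local`, and `Omarchy.local.` all map to the same cache key.
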
- src/connect.rs: insert blank line in `should_attempt` doc-comment
  before the "Otherwise returns false" continuation. Clippy's
  `doc_lazy_continuation` (Rust 1.94+) treats text immediately after
  a list item without a blank line or indent as continuing the last
  bullet, which it isn't.
- src/discovery.rs: cargo fmt collapsed a wrapped `let instance = …`
  binding onto one line now that it fits within the column limit.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- discovery.rs: drop the unused `peer_primary_ip` method. It was
  kept as a "canonical lookup entry point" with `#[allow(dead_code)]`,
  but the dialer reads `primary_cache` directly via the shared
  `Rc<RefCell>` in `connect.rs` and nothing else calls it.

- discovery.rs: fix the `refresh()` doc — said "call from the
  if-watch supervisor" but it's actually driven by the service's
  periodic tick.

- service.rs: set `MissedTickBehavior::Skip` on
  `discovery_refresh_tick` (30s). The default `Burst` would
  replay backlog ticks back-to-back when resuming from a long
  suspend, each triggering a redundant interface enumeration and
  TXT republish.
@jondkinney jondkinney closed this May 6, 2026
@jondkinney jondkinney deleted the split/04-mdns branch May 6, 2026 21:33
@jondkinney
Contributor Author

Superseded by #433 (same content; the branch was renumbered as part of restructuring the stack from 7 PRs into 9, which auto-closed this cross-fork PR).
