
Garmr/radiance daemon refactor and clean-up #8578

Merged
myleshorton merged 106 commits into main from garmr/radiance-daemon-refactor
Apr 28, 2026


@garmr-ulfr garmr-ulfr commented Mar 25, 2026

Depends on

  • radiance#370 (restructure codebase around LocalBackend with clear data ownership): this PR migrates lantern onto the new ipc.Client architecture introduced there. Reviewing or merging this PR without radiance#370 in place will not make sense. For AI code review (/ultrareview etc.), run the radiance review separately, because cross-repo dependencies are not followed automatically.

Summary

  • Migrate lantern to radiance's new IPC Client architecture: All client-related functionality now goes through radiance/ipc.Client instead of directly calling radiance internals (radiance.Radiance, vpn.* package-level functions, api.APIClient, etc.)
  • Rewrite lantern-core/core.go: LanternCore now wraps *ipc.Client with platform-specific initialization via build tags (init_mobile.go for iOS/Android/macOS, init_desktop.go for Linux). All VPN, account, server, split tunnel, and settings operations delegate to the IPC client.
  • Update iOS/macOS network extensions: Start the IPC server (MobileStartIPCServer) before VPN operations in startTunnel, and close it (MobileCloseIPCServer) in stopTunnel.
  • Update Linux packaging: systemd service management updated for the lanternd daemon; package scripts use lanternd install/lanternd uninstall.
  • Adopt radiance's lanternd for Windows service management: lanternd self-installs via lanternd install, handling service setup and lifecycle. This removes the need for the separate lanternsvc binary, wintunmgr pipe server, Dart pipe client, and PowerShell installer scripts.
  • Migrate Flutter from cached app settings to on-demand radiance queries: Settings previously cached locally in Flutter (smart routing, ad block, split tunneling, etc.) are now fetched from radiance on demand via FFI, eliminating stale-state bugs and the AppSettingNotifier class.
  • Flatten server model to match radiance: The Server / ServerGroup hierarchy is replaced with a flat Server model matching radiance's representation, simplifying server selection and available-server plumbing across Go and Dart.
  • Unify CGo safety with RunOffCgoStack: All gomobile-exported functions in mobile.go and all FFI-exported functions in ffi.go now run on real Go goroutines via utils.RunOffCgoStack, preventing bulkBarrierPreWrite panics from CGo callback stacks.
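
The RunOffCgoStack idea in the last bullet can be sketched roughly as follows. This is an illustration only, not the actual utils.RunOffCgoStack: the name runOffCgoStack and its generic signature are assumptions. The work is handed to a freshly spawned goroutine and the caller blocks on a channel, so the body runs on an ordinary goroutine stack instead of the CGo callback stack.

```go
package main

import "fmt"

// runOffCgoStack is a hypothetical stand-in for utils.RunOffCgoStack.
// It runs fn on a newly spawned goroutine and blocks until fn returns.
func runOffCgoStack[T any](fn func() T) T {
	done := make(chan T, 1) // buffered so the worker never blocks on send
	go func() { done <- fn() }()
	return <-done
}

func main() {
	// An exported FFI or gomobile function would wrap its body like this.
	got := runOffCgoStack(func() string { return "ok" })
	fmt.Println(got)
}
```

The caller still waits synchronously, so the exported function's contract is unchanged; only the stack the body executes on differs.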

closes getlantern/engineering/issues/3047

garmr-ulfr changed the title from "Garmr/radiance daemon refactor" to "Garmr/radiance daemon refactor and clean-up" on Apr 6, 2026
Server tags are determined by URL content, not caller-supplied names.
addServerBasedOnURLs now returns the tags of added servers so callers
can connect using the actual tag. Also sends VPN status updates from
connectToServer on Linux so the UI reflects connection state changes.
jigar-f and others added 16 commits April 23, 2026 21:46
* android: make restartService block until restart completes

Two bugs in the platformIfce restart path that together let the tunnel
wedge in Restarting forever on Android, triggering the "Error in VPN
operation" on every subsequent Connect attempt
(getlantern/engineering#3297, Freshdesk #173681).

1. restartService() used serviceScope.launch { ... } and returned
   immediately. Radiance's Restart() treats the sync return as "restart
   succeeded" and leaves the tunnel at status=Restarting, expecting the
   platform coroutine to drive it through stopVPN → startVPN and
   transition status via Mobile.* side-effects. If the service is torn
   down before the coroutine completes (onDestroy, process pressure),
   nothing ever transitions the tunnel out of Restarting.

   Switch to runBlocking(Dispatchers.IO) so the return actually
   reflects completion. c.mu is released on the Go side before
   RestartService is invoked, so synchronous Mobile.* callbacks on
   this thread don't deadlock.

2. stopVPNTunnel() skipped Mobile.stopVPN() when Mobile.isVPNConnected()
   returned false. isVPNConnected is status == Connected — but at the
   point stopVPNTunnel is called from restartService, radiance has
   already set status=Restarting, so the guard always skips and the
   tunnel is never actually closed.

   Swap the guard for Mobile.isRadianceConnected() — i.e. only skip
   when the IPC server itself isn't up. Mobile.stopVPN() is a no-op
   when c.tunnel is nil on the Go side, so the original guard was
   redundant even for the Connected == true case.

Evidence from Freshdesk #173681 logs for the broken path:
- 15:17:34.826 Restart → 15:17:34.828 "Tunnel restarted successfully"
  (2ms total — consistent with fire-and-forget, not real teardown)
- No subsequent tunnel.init / Tunnel connection established
- 15:19:10 onDestroy logs "Skipping stopVPN — VPN tunnel was never
  started" (same isVPNConnected() check)
- 15:21:48 next Connect fails within 2ms of the IPC request with
  "tunnel is currently Restarting"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: drop isVPNConnected guard in onDestroy too

Same shape as the restart-path fix: if c.tunnel is non-nil on the Go
side but the tunnel status is anything other than Connected (Restarting
after a failed restart, Connecting mid-startup, Error from a prior
failure), isVPNConnected() returns false and the old guard skipped
Mobile.stopVPN(). That left the radiance tunnel state dangling across
service destroy.

Observed in Freshdesk #173681: "onDestroy — radianceConnected=true
vpnConnected=false, Skipping stopVPN — VPN tunnel was never started"
while the tunnel was actually alive at status=Restarting.

Swap the second guard for an unconditional call. Mobile.stopVPN() is a
no-op when c.tunnel is nil, so the guard was always redundant — it just
happened to also hide the non-Connected-but-non-nil case that's
load-bearing during restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: verify restart postcondition before returning to Go

launchVPN wraps its body in runCatching { ... }.onFailure { ... } and
returns normally regardless of whether Mobile.startVPN() threw — so a
nil return from startVPN() does not mean the restart succeeded. Without
a postcondition check, restartService would log "completed" and return
to radiance as if everything worked, even though the tunnel is still
stuck in Restarting, which defeats the whole point of making this
function block.

Check Mobile.isVPNConnected() at the end of the runBlocking block and
throw IllegalStateException if false. The exception propagates through
runBlocking → restartService → radiance's platformIfce.RestartService()
as a non-nil error, so Restart() hits the ErrorStatus branch and the
caller sees the failure.
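
For reviewers who want the Go-side view of this, here is a heavily simplified model of the contract described above. All names (tunnel, restart, simulateRestart, errNotConnected) are hypothetical and do not come from radiance. It assumes only what the commit message states: a non-nil error from the platform's restart hook moves the tunnel to an error status, while a nil return leaves the status to the platform's own side effects.

```go
package main

import (
	"errors"
	"fmt"
)

type status string

const (
	statusRestarting status = "Restarting"
	statusConnected  status = "Connected"
	statusError      status = "Error"
)

// tunnel is a hypothetical, simplified model of the Go-side state.
type tunnel struct{ status status }

// errNotConnected models the exception raised by the platform-side
// postcondition check when the VPN is still down after the restart.
var errNotConnected = errors.New("vpn not connected after restart")

// restart marks the tunnel Restarting and blocks on restartService, which
// stands in for the platform's RestartService hook. A non-nil error moves
// the tunnel to the error status so it cannot stay stuck in Restarting.
func (t *tunnel) restart(restartService func() error) error {
	t.status = statusRestarting
	if err := restartService(); err != nil {
		t.status = statusError
		return fmt.Errorf("restart service: %w", err)
	}
	return nil
}

// simulateRestart runs one restart. platformOK says whether the platform
// side actually brought the tunnel back up before returning.
func simulateRestart(platformOK bool) (status, error) {
	t := &tunnel{status: statusConnected}
	err := t.restart(func() error {
		if !platformOK {
			return errNotConnected
		}
		t.status = statusConnected // platform side effect on success
		return nil
	})
	return t.status, err
}

func main() {
	st, err := simulateRestart(false)
	fmt.Println(st, err)
}
```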

Addresses Copilot review feedback on PR #8697.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Adam Fisk <afisk@mini.local>
The PacketTunnelExtension hosts the IPC server, so cancelTunnelWithError
tears down the daemon along with the tunnel. Inline MobileStartVPN in
restartService so a failed restart leaves the extension (and IPC socket)
alive; radiance's status events surface the failure for retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: atavism <atavism@users.noreply.github.com>
* main: don't block first paint on Updater.init()

Moving Updater.init() off the critical path to runApp. Investigating a
one-shot black-screen-on-startup report on a local macOS dev build
(9.0.29 build 487): flutter.log stopped at the last pre-runApp log line
with no Dart exception and no crash, while the Go side kept running
normally. The only awaited call between that last log and runApp is
Updater.init().

Inside init(), the actual update check is already deferred 45 s via
Future.delayed + unawaited. But setFeedURL and setScheduledCheckInterval
are awaited — both bridge into Sparkle via the auto_updater Flutter
plugin, and both can stall on first launch: feed URL resolution,
keychain access, or a previous launch's background worker still holding
a lock. Any of those becomes a main-isolate hang that prevents runApp,
which exactly matches the observed symptom.

Fix: drop the await so Updater.init() runs concurrently with the rest
of startup. All errors are already handled inside init() itself, so
unawaited is safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: guard sl<Updater>() lookup against failed service injection

Copilot flagged that if injectServices() throws above (caught at
main.dart:45), Updater is never registered (it's registered at
injection_container.dart:40, after storage init), and sl<Updater>()
throws synchronously. unawaited() doesn't help — the throw happens
before the Future is constructed, so it propagates out of main and
prevents runApp.

Wrap the call in try/catch + sl.isRegistered<Updater>() so any failure
to look up or start Updater.init logs and continues to runApp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the FFI path to radiance's ipc.Client.TailLogs and merges in-app
flutter.log records so the diagnostic logs view shows both sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up:
- refactor(vpn): own VPN status on the client so restarts span tunnels
- vpn: instrument tunnel.start phases + VPNClient.Restart (#443)

The VPN-status-ownership refactor moves setStatus calls out of
tunnel and onto VPNClient so a restart transitions Restarting →
Disconnecting → Disconnected → Connecting → Connected cleanly.

The instrumentation PR adds child spans around libbox.Setup,
libbox.NewServiceWithContext, libbox.BoxService.Start, and
newMutableGroupManager so SigNoz can attribute the 10s+ tail
on /service/start observed in Freshdesk #173696.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nnel (#8702)

* lantern-core: dispatch ConnectVPN to SelectServer on live tunnel

When the Flutter UI triggers an auto-select on a live tunnel — most
visibly Jigar's rewrite of onSmartLocation (server_selection.dart), which
routes "switch back to Smart" through startVPN(force: true) → Dart
lantern.startVPN() → ffi.go:startVPN → c.ConnectVPN("") — radiance's
/vpn/connect endpoint rejects the request with ErrTunnelAlreadyConnected
(radiance/vpn/vpn.go:126 in VPNClient.Connect). The error is returned to
the Dart UI as a snackbar, the tunnel stays pinned to the previously
selected manual server, and lantern.log is silent because neither
LocalBackend.ConnectVPN nor VPNClient.Connect slog the
ErrTunnelAlreadyConnected path.

Observed on 9.0.30 beta (internal tester, Freshdesk #173763, build from
commit 4054689 which includes Jigar's 2895072). After manually
picking Bogotá, clicking "Smart" at the top of the server-selection
screen surfaces the snackbar and the tunnel keeps routing traffic
through the Bogotá samizdat outbound.

Fix: when Status() == Connected, LanternCore.ConnectVPN dispatches the
request to /server/selected (the live-tunnel outbound swap) instead of
/vpn/connect. Empty tag normalizes to vpn.AutoSelectTag — Dart sends ""
for Smart, radiance recognizes only the literal "auto" and otherwise
falls into the manual-outbound branch of SelectServer, stranding Clash
in manual mode with an empty selector. The mapping is centralized in a
small normalizeAutoTag helper used by both ConnectVPN and SelectServer.

This puts the same dispatch logic that lives in ffi.go:connectToServer
onto every caller of LanternCore.ConnectVPN — including ffi.go:startVPN
(which Jigar's rewrite now funnels through) and any future FFI/mobile
entry point.
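
A rough sketch of the dispatch described in this commit, for illustration only. The ipcClient interface, connectOrSelect, the fake client, and the "bogota" tag are assumptions made up for the example; only the behavior (swap the outbound when already connected, otherwise start a fresh connection, with "" mapped to the literal "auto") is taken from the text above.

```go
package main

import (
	"context"
	"fmt"
)

type vpnStatus string

const (
	statusConnected    vpnStatus = "connected"
	statusDisconnected vpnStatus = "disconnected"
)

// autoSelectTag mirrors the literal "auto" that the commit message says
// radiance recognizes.
const autoSelectTag = "auto"

// ipcClient is a hypothetical slice of the IPC client surface.
type ipcClient interface {
	Status(ctx context.Context) (vpnStatus, error)
	ConnectVPN(ctx context.Context, tag string) error
	SelectServer(ctx context.Context, tag string) error
}

// normalizeAutoTag maps the empty tag sent by Dart for "Smart" to "auto".
func normalizeAutoTag(tag string) string {
	if tag == "" {
		return autoSelectTag
	}
	return tag
}

// connectOrSelect swaps the outbound on a live tunnel and otherwise
// starts a fresh connection.
func connectOrSelect(ctx context.Context, c ipcClient, tag string) error {
	tag = normalizeAutoTag(tag)
	st, err := c.Status(ctx)
	if err != nil {
		return fmt.Errorf("query status: %w", err)
	}
	if st == statusConnected {
		return c.SelectServer(ctx, tag)
	}
	return c.ConnectVPN(ctx, tag)
}

// fakeClient records which endpoint the dispatch chose.
type fakeClient struct {
	status vpnStatus
	calls  []string
}

func (f *fakeClient) Status(context.Context) (vpnStatus, error) { return f.status, nil }

func (f *fakeClient) ConnectVPN(_ context.Context, tag string) error {
	f.calls = append(f.calls, "connect:"+tag)
	return nil
}

func (f *fakeClient) SelectServer(_ context.Context, tag string) error {
	f.calls = append(f.calls, "select:"+tag)
	return nil
}

// dispatchOnce returns the single call recorded for a status and tag.
func dispatchOnce(st vpnStatus, tag string) string {
	f := &fakeClient{status: st}
	if err := connectOrSelect(context.Background(), f, tag); err != nil {
		return "error: " + err.Error()
	}
	return f.calls[0]
}

func main() {
	fmt.Println(dispatchOnce(statusConnected, ""))          // live tunnel
	fmt.Println(dispatchOnce(statusDisconnected, "bogota")) // fresh connect
}
```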

getlantern/engineering#3291 issue 3. Supersedes earlier work on
fisk/connect-dispatch-select-when-connected (485bf5a), which was
scoped to this same dispatch but predated the current refactor branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* vpn_tunnel: dispatch StartVPN to SelectServer on live tunnel (mobile path)

Mobile.StartVPN (the gomobile entry point for Android MainActivity and
iOS VPNManager) routes through vpn_tunnel.StartVPN(client), which calls
client.ConnectVPN(ctx, vpn.AutoSelectTag) directly — bypassing
lanterncore.Core. Jigar's onSmartLocation rewrite dispatches "switch
back to Smart" through startVPN(force: true), which on Android/iOS
lands here. Same ErrTunnelAlreadyConnected bug as the FFI path fixed in
the previous commit.

Mirror the VPNStatus dispatch pattern garmr already added to
vpn_tunnel.ConnectToServer in 4054689: when Status() == Connected,
swap outbound via /server/selected; otherwise fall through to the
existing /vpn/connect start.

Together with the LanternCore.ConnectVPN dispatch, this closes the
Smart-from-connected bug on every platform (Windows FFI, Android/iOS
gomobile).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ffi: drop now-redundant VPNStatus dispatch in connectToServer

LanternCore.ConnectVPN already routes to /server/selected when the
tunnel is live (added earlier in this PR), so ffi.go:connectToServer's
own VPNStatus check is duplicate work. Collapse to a single c.ConnectVPN
call — both the live-tunnel-swap and fresh-connect paths flow through
the dispatch one layer down.

Behavior unchanged. The "start service failed" error wrapper is kept
for Dart-side snackbar stability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lantern-core: collapse dispatch to a single implementation in vpn_tunnel

Three functions had independent VPNStatus → SelectServer-vs-ConnectVPN
dispatches after the earlier commits: LanternCore.ConnectVPN,
vpn_tunnel.StartVPN (both added in this PR), and vpn_tunnel.ConnectToServer
(pre-existing from 4054689). Consolidate so vpn_tunnel.ConnectToServer
is the authoritative dispatch and the other two delegate.

- LanternCore.ConnectVPN → vpn_tunnel.ConnectToServer(lc.client, tag)
- vpn_tunnel.StartVPN → ConnectToServer(client, vpn.AutoSelectTag)

LanternCore.SelectServer keeps its own empty-tag normalization since its
scope is the one-shot SelectServer IPC, not the dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…089) (#8703)

Patrick's radiance fac9089 ("fix(vpn): treat the empty string as
AutoSelect in SelectServer") is now pinned on this branch via
72a6c62. Radiance normalizes tag == "" → AutoSelectTag on both
ConnectVPN and SelectServer, so the client-side normalizations we
added earlier (normalizeAutoTag helper in core.go, `if tag == ""` in
vpn_tunnel.ConnectToServer) are redundant — radiance handles the Dart
"" convention uniformly.

Remove:
- LanternCore.normalizeAutoTag helper + its use in SelectServer
- `if tag == "" { tag = vpn.AutoSelectTag }` branch in
  vpn_tunnel.ConnectToServer
- lantern-core/core_test.go (only tested the removed helper)

Behavior unchanged end-to-end: empty tag still means auto-select on
every path (FFI, gomobile, connectToServer, startVPN).

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Server empty-tag fix (#8705)

radiance@d5a1872 completes fac9089's empty-string → AutoSelectTag
normalization by extending it to LocalBackend.SelectServer, which
previously only matched the literal "auto" and fell through to the
srvManager lookup for tag == "" — producing "no server found with tag"
(HTTP 500, snackbar) on Smart-from-connected flows after the client-
side normalization was removed in this branch's 6de3c9a.

Reported on Lantern 9.0.30 beta via Freshdesk #173773.

go.mod + go.sum bump only; no lantern code changes. Pinned commit:
getlantern/radiance@d5a18726afbc (#444).

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(logs): stream diagnostic logs via ipc TailLogs on mobile

Adds a mobile gomobile binding for ipc.Client.TailLogs (TailLogs +
LogSubscription) and switches Android and iOS to consume it, replacing
the per-platform log-file tailers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(logs): stream diagnostic logs via ipc TailLogs on macos

Switches the macOS log stream to MobileTailLogs, matching iOS. Removes
the file-watching LogTailer (no remaining callers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(logs): harden TailLogs against nil, panics, and listener leaks

- Reject nil listener in mobile.TailLogs; recover from panics crossing
  the gomobile bridge so the stream survives unexpected bridge errors.
- Retain the Kotlin LogListener in a field so the Go side's reference
  stays strongly rooted on the JVM.
- On iOS/macOS, cancel any pre-existing subscription before starting a
  new one and clear the stored listener when MobileTailLogs errors.
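
The first bullet (reject nil, survive panics) might look roughly like the sketch below. The LogListener interface, tailLogs, safeDeliver, and recordingListener are hypothetical names for illustration and are not the real mobile.TailLogs binding. The nil check covers a nil interface value only, and the loop over a slice stands in for the real log stream.

```go
package main

import (
	"errors"
	"fmt"
)

// LogListener is a hypothetical callback interface of the kind gomobile
// would bind to Kotlin or Swift.
type LogListener interface {
	OnLog(line string)
}

var errNilListener = errors.New("log listener must not be nil")

// safeDeliver invokes the listener and converts a panic into an error so
// that one bad callback does not kill the stream.
func safeDeliver(l LogListener, line string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("listener panicked: %v", r)
		}
	}()
	l.OnLog(line)
	return nil
}

// tailLogs rejects a nil listener up front, then delivers each line and
// keeps going when a delivery panics. It returns how many lines landed.
func tailLogs(l LogListener, lines []string) (int, error) {
	if l == nil {
		return 0, errNilListener
	}
	delivered := 0
	for _, line := range lines {
		if err := safeDeliver(l, line); err != nil {
			continue // skip the failed line, keep the stream alive
		}
		delivered++
	}
	return delivered, nil
}

// recordingListener panics on one chosen line to simulate a bridge error.
type recordingListener struct {
	panicOn string
	got     []string
}

func (r *recordingListener) OnLog(line string) {
	if line == r.panicOn {
		panic("simulated bridge failure")
	}
	r.got = append(r.got, line)
}

func main() {
	l := &recordingListener{panicOn: "boom"}
	n, err := tailLogs(l, []string{"a", "boom", "b"})
	fmt.Println(n, err, l.got)
}
```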

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(logs): share TailLogs plumbing across mobile and ffi

Adds lantern-core/logs.Subscribe wrapping ipc.Client.TailLogs so the
mobile and desktop integrations go through one helper. Drops the iOS
LogTailer dead code and the unused lantern-core/logging package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update log formatting

* Fix issue with ios

* Fix macos logs issue

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jigar-f <jigar@getlantern.org>
myleshorton and others added 8 commits April 27, 2026 15:40
…r-operation timeouts (#8707)

* ffi: skip the daemon-reachability preflight on Windows / macOS / mobile

The 300 ms preflight in lantern-core/core.go's CheckDaemonReachable
was originally tuned for the Linux flow (PR #8494 by atavism, commit
bf054f4), where the failure path falls back to `systemctl is-active
lanternd.service` for a rich diagnostic error. The 300 ms cap made
sense as "fast probe → systemd-rich-error", with the systemd query
adding the actual user-facing context.

Subsequent refactors (commit bd89bea Apr 7, then PR #8578 commit
4d4e06d Apr 16) generalized that preflight to all platforms but
the systemd fallback only survived in ffi_linux.go. On Windows /
macOS / mobile, ffi_nonlinux.go ended up running the same 300 ms
probe with no fallback — just an artificial guillotine in front of
ConnectVPN, which has its own "lanternd not reachable" error path
with equivalent precision.

Cold-start IPC on Windows regularly exceeds 300 ms (named-pipe dial
+ winio impersonation token dance + H2c connection preface +
goroutine scheduling on a 96-second-idle daemon), so the first VPN
toggle after launch reliably trips the timeout and shows the user a
"lanternd not reachable" error. Clicking again 10 seconds later
silently succeeds. Reproduced on the same Windows machine across
9.0.29 (Freshdesk #173696) and 9.0.30 (#173932).

Make the preflight a no-op on non-Linux. Linux keeps the original
fast-probe-then-systemdDiag flow unchanged. If we add Windows
(`sc query LanternSvc`) or macOS (`launchctl list`) diagnostics
later, restore the preflight and call them from here.
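
As a single-file illustration of the resulting behavior: the sketch below branches on the OS name at runtime. That is a substitute technique chosen for the example; per the message above, the actual change lives in the build-specific files ffi_linux.go and ffi_nonlinux.go. The names needsPreflight and checkDaemonReachable (as written here) and the probe parameter are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// needsPreflight reports whether the fast reachability probe should run.
// Only Linux keeps it, because only Linux has the systemd fallback.
func needsPreflight(goos string) bool {
	return goos == "linux"
}

// checkDaemonReachable runs probe on Linux and is a no-op elsewhere.
// probe stands in for the 300 ms reachability check.
func checkDaemonReachable(goos string, probe func() error) error {
	if !needsPreflight(goos) {
		return nil
	}
	return probe()
}

func main() {
	err := checkDaemonReachable(runtime.GOOS, func() error { return nil })
	fmt.Println(runtime.GOOS, needsPreflight(runtime.GOOS), err)
}
```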

See getlantern/engineering#3382 for the full archaeology + design
discussion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ffi + lantern-core: bound IPC calls with per-operation timeouts

Companion to dropping the non-Linux daemon-reachability preflight in this
same PR. The preflight (ffi_nonlinux.go's `checkDaemonReachable`) was
introduced in commit bd89bea along with the *removal* of per-call
timeouts that used to live on the FFI layer:

    -    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    -    if err := c.Client().DisconnectVPN(ctx); err != nil { ... }
    +    if err := c.DisconnectVPN(); err != nil { ... }

After that change, the only IPC call with any deadline at all was the
300 ms preflight. Every other operation flowed lc.ctx (
context.WithCancel(context.Background())) straight through, meaning a
hung lanternd would freeze the UI indefinitely. Dropping the preflight
without restoring per-call timeouts removes the only line of defense.

Restore them at the LanternCore layer where they belong, with values
sized for the inherent work each operation does (state changes can run
into multi-second territory; status queries should be near-instant):

    ipcConnectTimeout     = 60 * time.Second   // ConnectVPN
    ipcStateChangeTimeout = 30 * time.Second   // SelectServer, DisconnectVPN
    ipcStatusTimeout      = 10 * time.Second   // VPNStatus, IsVPNRunning

These bound the worst case (hung daemon → user sees a clear error within
a minute, no indefinite spinner) without firing during normal slow paths.
The dialer's 10 s connect timeout (radiance/ipc/conn_windows.go) already
covers the lanternd-crashed case; these guard the lanternd-hung case.
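
A minimal sketch of how such per-operation bounds can be applied, assuming a generic wrapper. The constant names and values are the ones quoted above; withTimeout, hungDaemon, and timesOut are hypothetical helpers written for this example, and the demo uses a millisecond deadline so it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Values quoted from the commit message.
const (
	ipcConnectTimeout     = 60 * time.Second // ConnectVPN
	ipcStateChangeTimeout = 30 * time.Second // SelectServer, DisconnectVPN
	ipcStatusTimeout      = 10 * time.Second // VPNStatus, IsVPNRunning
)

// withTimeout runs op under a child of parent that expires after d.
func withTimeout(parent context.Context, d time.Duration, op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()
	return op(ctx)
}

// hungDaemon simulates an IPC call that never returns on its own.
func hungDaemon(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// timesOut reports whether a hung call is cut off after ms milliseconds.
func timesOut(ms int) bool {
	d := time.Duration(ms) * time.Millisecond
	err := withTimeout(context.Background(), d, hungDaemon)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(ipcConnectTimeout, ipcStateChangeTimeout, ipcStatusTimeout)
	fmt.Println(timesOut(20))
}
```

Because the deadline is attached to a child context, cancelling the long-lived parent context still aborts the call early.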

vpn_tunnel.{StartVPN, StopVPN, ConnectToServer} take the ctx through
their signatures instead of building their own context.Background()
internally, so callers stay in charge of their own deadlines. mobile/
mobile.go updated to set 60 s / 30 s / 60 s contexts on its three
gomobile entry points.

CheckDaemonReachable's 300 ms timeout is kept untouched — Linux still
calls it from ffi_linux.go for the systemctl is-active fallback that's
the whole point of the fast probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ogging (#8709)

Two narrow fixes that together resolve Freshdesk #173774 / #173778 /
#173826 (Derek's "Failed to fetch installed apps" empty list on Windows
split tunneling). Split out from #8706 so they can land independently
of the broader app-discovery rework that PR also contained.

1. **GetEnabledApps returns []string{} instead of nil.**
   When no apps are split-tunneled, the previous code returned nil,
   which json.Marshal serialized as "null". Dart's jsonDecode("null")
   returns null; the receiving code does `as List`, which throws and
   the UI shows "Failed to fetch installed apps". Initializing as an
   empty slice serializes to "[]" — Dart parses that as an empty list,
   no exception, no error UI. THIS is the actual root cause of the
   empty-list reports we've been chasing; the apps-discovery scanner
   work was investigating a different (also-real but secondary) issue.

2. **UI-process slog wired up via common.Init.**
   On the refactor branch, the UI process never called common.Init.
   slog wrote to stderr (= nowhere on a GUI host), settings were
   uninitialized, no lantern.log was produced outside the daemon.
   Patrick caught this — it was a one-line miss in the refactor.

   Platform-aware so we don't double-init on platforms where the
   backend embeds in-process:
     - windows/linux: full common.Init (separate UI + daemon procs)
     - darwin/ios:    setupAppLogging into a distinct lantern-app.log
                      so the main-app slog doesn't race the tunnel
                      extension's lantern.log on lumberjack rotation
     - android:       Mobile.SetupRadiance already ran common.Init
                      upstream — fall through

3. **Auto-attach UI-process *.log to ReportIssue (windows/linux only).**
   Without it the daemon's archive glob only sees the daemon's logDir;
   UI-side lantern.log + flutter.log never reach the issue bundle. The
   daemon runs as SYSTEM on Windows; we keep UI logDir at
   %PUBLIC%\Lantern\logs so SYSTEM can read it.
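
The nil-versus-empty-slice behavior behind item 1 is easy to reproduce with the standard library alone. marshalApps is a hypothetical helper for this demo, not the real GetEnabledApps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalApps serializes the app list the way an FFI layer might before
// handing it to Dart.
func marshalApps(apps []string) string {
	b, err := json.Marshal(apps)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	var nilApps []string    // declared but never initialized
	emptyApps := []string{} // initialized, zero length

	fmt.Println(marshalApps(nilApps))   // prints: null
	fmt.Println(marshalApps(emptyApps)) // prints: []
}
```

On the Dart side, "null" decodes to null and fails the `as List` cast, while "[]" decodes to an empty list.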

The broader Windows app-discovery work from #8706 (App Paths scan, Run
keys, Squirrel pattern, isAppPathsNoise heuristic filters) is being
held in a separate PR for independent review.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…en't lost (#8711)

On Android the entire app runs in a single process, so once common.Init
runs slog.SetDefault covers everything. But common.Init only runs deep
inside SetupRadiance / StartIPCServer, which LanternVpnService launches
asynchronously from an intent fired by MainActivity.startLanternService.
Any slog call emitted in the gap — including any of the wide MethodHandler
surface that Flutter can reach before the VPN service is up — falls
through to the stdlib default (text → stderr → logcat at INFO), so DEBUG
logs vanish and the format diverges from what we use everywhere else.

Add Mobile.InitLogging as a thin gomobile-exposed wrapper around
common.Init, and call it from MainActivity.configureFlutterEngine before
startLanternService. common.Init is guarded by an atomic.Bool, so the
later call from backend.NewLocalBackend is a no-op.
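
The guard that makes the second call harmless can be sketched as below. This is an assumption about the shape of the guard, not a copy of common.Init: initGuard and its Init method are hypothetical names, and only the use of an atomic.Bool comes from the message above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// initGuard is a hypothetical model of an init function protected by an
// atomic.Bool.
type initGuard struct {
	done atomic.Bool
}

// Init runs setup only for the first caller and reports whether it ran.
// CompareAndSwap flips false to true exactly once, even under concurrent
// calls.
func (g *initGuard) Init(setup func()) bool {
	if !g.done.CompareAndSwap(false, true) {
		return false // someone already initialized; no-op
	}
	setup()
	return true
}

func main() {
	var g initGuard
	early := g.Init(func() { fmt.Println("logging configured") })
	late := g.Init(func() { fmt.Println("logging configured again") })
	fmt.Println(early, late)
}
```

One caveat of this shape: a concurrent second caller returns immediately and does not wait for the first caller's setup to finish. sync.Once would give that guarantee if it were needed.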

Mirrors PR #8709 (Windows). Reported on Slack by Jigar.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-refactor

# Conflicts:
#	lib/core/windows/pipe_client.dart
#	lib/lantern/lantern_ffi_service.dart
#	lib/lantern/lantern_windows_service.dart
PR getlantern/radiance#370 merged the long-lived refactor branch into
radiance main. This branch was previously pinned at the refactor tip
(5643163d8d70); repinning to main (e312570c7aea) so all downstream
work consumes the merged code.

Transitive: getlantern/kindling auto-bumps to 6143132aaf40 to match
the kindling.TransportName + domainfront API radiance now uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@myleshorton myleshorton merged commit c0871a7 into main Apr 28, 2026
9 checks passed
@myleshorton myleshorton deleted the garmr/radiance-daemon-refactor branch April 28, 2026 18:20
atavism added a commit that referenced this pull request Apr 29, 2026
…ise filter (#8710)

* migrate to new ipc.Client api, first-pass

* pull in a couple of fixes, update linux vpn status poller

* start ipc server ios/macos

* update radiance, fix linux daemon build

* start ipc server windows service

* fix datacap stream

* decode user response data to json

* gofmt

* update ipc request path check for linux smoke test

* Fixed issue with user apis

* redo linux packaging changes undone by merge

* move RunOffCgoStack from radiance to here, small cleanup

* fetch radiance-owned settings on demand instead of caching locally

* add missing smart-routing, ad-block, oauth calls

* clean up

* fix ref async issue for IPC calls

* gofmt

* fix test, linux package verification

* update radiance, remove server groups

* fix: return added server tags from AddServersByURL

Server tags are determined by URL content, not caller-supplied names.
addServerBasedOnURLs now returns the tags of added servers so callers
can connect using the actual tag. Also sends VPN status updates from
connectToServer on Linux so the UI reflects connection state changes.

* wrap ffi calls in runOnGoStack, update win service

* add explicit not linux build tag

* update radiance

* use RADIANCE_REPO in lanternd src

* flatten server model to match radiance, fix tests

* use loopback ipc client for mobile

* update radiance, log service install error in smoke test

* retrieve selected server from radiance instead of caching

* stop lantern before uninstall, revert accidental service name change

* remove allow override

* fix name reference and misplaced stop call

* fix several issues

* code review

* fix toggles not registering and fetching plans

* always refetch server list when view opens

* fix crash in server select screen

* fix split tunnel website view not loading websites

* sync vpn status from system on launch

* fix stale onboarding marker persisting reinstall

* Revert "fix stale onboarding marker persisting reinstall"

This reverts commit a21a218eac7df90d678ce5d35d27892bbe893da2.

* fix vpn prompt displaying when quitting

* Macos system extension updates #2 (#8637)

* if system extension is in uninstall state do not block new installation.

* update macos system extension test

* do not cache dart_tool

* Set the default status as unknown.

* code review updates

* Filter system apps from Windows split tunneling (#8641)

* Add split tunneling e2e test

* Fix split tunneling website smoke assertion

* Fix split tunneling smoke navigation

* code review updates

* code review updates

* code review updates

* Filter Windows system apps in split tunneling list

* code review updates

* code review updates

* Update system apps filter

* code review updates

* fix: upload and notify for nightlies even when some platforms fail (#8649)

The upload-s3 and upload-release-artifacts jobs required ALL platform
builds to succeed or be skipped. When a matrix entry failed (e.g.,
Linux arm64), the entire build-linux job reported as 'failure', which
caused both upload jobs to skip entirely — even though macOS, Android,
iOS, and Linux amd64 all succeeded.

Simplify the condition: run uploads if at least one platform build
succeeded. The upload steps already handle missing artifacts gracefully
(upload_if_exists checks for file existence).

This ensures the Slack notification goes out with download links for
whatever platforms did build successfully.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add arch to releases (#8652)

* feat: add arch to releases

* Update linux/packaging/usr/lib/systemd/system/lanternd.service

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: remove committed lanternd.service file

Agent-Logs-Url: https://github.com/getlantern/lantern/sessions/15085485-3c6a-4e1e-93ea-6e9bf0623d09

Co-authored-by: reflog <109876+reflog@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: reflog <109876+reflog@users.noreply.github.com>

* fix issues from 3173

* Refactor and fixed multiple bugs

* cache selected server location locally to avoid UI flash

* Fix tunnel issue in android

* App event issue and auto server location fixes

* added logs

* mobile: return string instead of []byte + update Swift callers (#8663)

* mobile: return string instead of []byte from gomobile-exported funcs

The gomobile wrapper copies Go pointer-containing return values to the C
thread stack using runtime.wbMove. When a GC cycle runs during the copy,
bulkBarrierPreWrite panics because the destination isn't GC-tracked.
Returning string avoids this — gomobile marshals strings via C heap
allocation rather than leaving them as Go slice headers.

See getlantern/engineering#3175 for the full crash analysis (from
Freshdesk #172640 — Derek reporting "Lantern Crash" on macOS 26.3.1).

Go changes:
  AvailableFeatures, UserData, FetchUserData, GetAvailableServers,
  GetSelectedServerJSON, OAuthLoginCallback, AcknowledgeGooglePurchase,
  AcknowledgeApplePurchase, Login, Logout, DeleteAccount

Swift changes (macos + ios): preserve Flutter contract by converting
the string back to Data for methods whose Dart side reads `bytes` via
utf8.decode (getUserData, fetchUserData, oauthLoginCallback, login,
logout, deleteAccount, acknowledgeInAppPurchase). For methods whose Dart
side expects String (featureFlags, getLanternAvailableServers,
getSelectedServerJSON), just pass the gomobile string directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* android: update MethodHandler for string-returning gomobile bindings

The gomobile-exported funcs in lantern-core/mobile/mobile.go now return
string instead of []byte. The generated Android binding will therefore
return String where it used to return ByteArray.

For each affected method, match what the iOS handler does so the Flutter
platform-channel contract stays stable:

  * Methods whose Dart callers expect bytes (Uint8List) — login,
    logout, deleteAccount, userData, fetchUserData, oauthLoginCallback,
    acknowledgeGooglePurchase — convert the String result via
    `.toByteArray(Charsets.UTF_8)` before calling success() (mirrors
    Swift's `.data(using: .utf8)`).

  * Methods whose Dart callers expect a String — availableFeatures,
    getAvailableServers, getSelectedServerJSON — drop the
    `String(byteArray)` constructor and use the return value directly,
    with the same "{}" / "[]" empty-default that iOS uses.

Addresses Copilot review on PR #8663.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* encapsulate ipc.Client behind LanternCore interface

Route all IPC operations through LanternCore methods instead of
exposing Client() to callers. Add GetSelectedServerTag,
GetAutoLocationJSON, CheckDaemonReachable, PatchSettings, and
VPNStatusEvents to the Core interface. Update FFI and mobile layers
to use them, and remove now-unused vpn_tunnel helper functions.

Also includes Flutter-side fixes: device-removal sign-in race
condition, plans fetch retry logic, and private server setup
improvements.

* ios/macos: drop invalid optional-chaining on non-optional String (#8671)

The gomobile-exported functions in lantern-core/mobile/mobile.go were
migrated from ([]byte, error) to (string, error). gomobile renders the
new signatures with a non-optional Swift String return (Data was
optional; String is not), so `json?.data(using: .utf8)` and
`payload?.data(using: .utf8)` now fail to compile:

    error: cannot use optional chaining on non-optional value of type
    'String'

Drop the `?` on all 14 call sites (7 each in ios/ and macos/). The
resulting `json.data(using: .utf8)` returns Data? anyway — an empty
Go string still produces a non-nil empty Data, which preserves the
Flutter contract the comment on these lines describes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add android-test + android-reproduce for emulator testing and ticket reproduction (#8672)

* add android-test script for quick emulator testing with env overrides

Usage:
  scripts/android/android-test <apk> [ENV_KEY=VALUE ...]

Example:
  scripts/android/android-test lantern.apk RADIANCE_COUNTRY=BG RADIANCE_FEATURE_OVERRIDES=dns_ruleset_host_bypass

Starts an emulator, installs the APK, pushes a .env file with overrides
to the app's data dir (via adb root on Google APIs images, run-as on
debug APKs, or su on rooted devices), restarts the app, and streams
filtered logcat.

Prefers the "lantern-test" AVD if it exists (create with Google APIs
image for root access):
  sdkmanager "system-images;android-35;google_apis;arm64-v8a"
  avdmanager create avd -n lantern-test -k "system-images;android-35;google_apis;arm64-v8a"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* address review: serial targeting, su quoting, trap cleanup, fix comment

- Use -s <serial> throughout so multiple devices don't break adb
- Fix su -c quoting so $(stat ...) expands on-device
- Add trap to clean up temp .env on EXIT/INT/TERM
- Fix header comment (no /sdcard/ fallback)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* android-test: push .env to .lantern data dir (not app root)

The Go env package reads .env from the data directory (via
env.LoadFromDir called from common.Init), not from the app's root
data dir. Push to /data/data/$PKG/.lantern/.env so radiance finds it.

Companion: getlantern/radiance#421 (env.LoadFromDir)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* android-test: auto-install system image and create AVD if none exists

If no AVDs are found, the script now automatically:
1. Detects host arch (arm64 vs x86_64)
2. Installs the Google APIs system image via sdkmanager
3. Creates a "lantern-test" AVD via avdmanager

This means running android-test on a fresh machine with just the
Android SDK installed works out of the box — no manual AVD setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* address review: array for ADB_CMD, timeouts, remove unused PID

- Use bash array for ADB_CMD so paths with spaces work correctly
- Add configurable timeouts for emulator appear (120s) and boot (300s)
- Remove unused EMULATOR_PID — emulator intentionally left running
  between invocations so subsequent runs don't pay boot cost

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* add android-reproduce: reproduce Freshdesk tickets on emulator

Usage:
  android-reproduce /tmp/ticket-172722              # auto-downloads APK
  android-reproduce /tmp/ticket-172722 lantern.apk  # uses provided APK

After running /analyze-ticket, this script:
1. Extracts country + version from the ticket's config/logs
2. Downloads the matching APK from GitHub releases (gh CLI)
3. Pushes the user's exact config.json, servers.json, split-tunnel.json
   to the emulator so it gets the same proxies, DNS rules, rule sets
4. Sets RADIANCE_COUNTRY to match the user's region
5. Installs, restarts, and streams filtered logcat

This gives near-exact reproduction of Android-specific issues by
replicating the user's proxy assignments, country routing, and
sing-box config on a local emulator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* android-reproduce: match user's Android API level from ticket logs

Extracts sdkInt, osVersion, and model from flutter.log's "Device info"
line. Creates an AVD with the matching API level (e.g. "lantern-api36"
for a user on Android 16/SDK 36). Falls back to API 35 if the target
image isn't available.

Example for ticket #172722 (Android 16, SM-A556B):
  Creates lantern-api35 (API 36 clamped to 35), installs matching APK,
  pushes user's exact config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* android-reproduce: dynamically find closest available API image

Instead of hardcoding a fallback to API 35, step down from the user's
sdkInt until we find an installable Google APIs image. Each API level
gets its own AVD (lantern-api29, lantern-api34, etc.) that persists
across runs, building up a catalog over time.
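
The step-down search could look roughly like the following. This is a Go rendering of the idea only (the script itself is shell); `closestAPILevel`, the `floor` parameter, and the `installable` callback are hypothetical names standing in for whatever check the script runs against sdkmanager.

```go
package main

import "fmt"

// closestAPILevel walks down from target and returns the first level for
// which installable reports true. The bool is false when nothing at or
// above floor is installable.
func closestAPILevel(target, floor int, installable func(level int) bool) (int, bool) {
	for level := target; level >= floor; level-- {
		if installable(level) {
			return level, true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical set of images that can be installed on this host.
	available := map[int]bool{29: true, 34: true, 35: true}
	check := func(level int) bool { return available[level] }

	if level, ok := closestAPILevel(36, 21, check); ok {
		fmt.Printf("lantern-api%d\n", level)
	}
}
```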

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* address review: install-before-push, fix eval injection, f-string, file search

- Install APK + launch once before pushing configs (so data dir exists)
- Replace eval with mapfile for device info extraction (no shell injection)
- Fix f-string syntax error in locations display
- Search both ticket-dir and config-dir for servers.json/split-tunnel.json
- Remove unused SCRIPT_DIR
- Update android-test header to document auto-AVD-creation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix private server navigation issue

* deps: bump sing-box-minimal to v1.12.21-lantern on refactor (#8679)

Companion to #8678. The refactor branch still pins v1.12.19-lantern,
which is missing the non-fatal-rule-set-fetch fix (sing-box-minimal
9c79c311, shipped in v1.12.21-lantern). Without it, Android builds
from this branch hit the same bootstrap deadlock.

* Add IPC starter in android

* macos, ios and android cleanup

* lantern-core: wire config events through IPC (#8673)

* lantern-core: subscribe to config events over IPC (/config/events)

The refactor branch removed listenConfigEvents when it was discovered
that the in-process events.SubscribeContext no longer worked — the
extension's radiance process is where config.NewConfigEvent is emitted,
and the host's subscription never fires across processes.

Now that the companion radiance PR adds a /config/events SSE endpoint,
restore the listener using lc.client.ConfigEvents with the same
reconnect-with-backoff pattern listenAutoSelectedEvents uses. Each
frame fires notifyFlutter(EventTypeConfig, "") so Flutter's
app_event_notifier "config" case resumes driving
availableServersProvider.forceFetchAvailableServers() and
homeProvider.fetchUserDataIfNeeded() on every config change.
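
A rough sketch of a reconnect-with-backoff listener of the kind described above. `subscribeFunc` is a hypothetical stand-in for a blocking stream call such as `lc.client.ConfigEvents`, whose real signature is not shown here; the delays and helper names are also assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// subscribeFunc is a hypothetical blocking stream call: it invokes onEvent
// for each frame and returns when the stream drops.
type subscribeFunc func(ctx context.Context, onEvent func()) error

// listenWithBackoff resubscribes until ctx is cancelled, doubling the wait
// between attempts up to maxDelay and resetting it after a healthy stream.
func listenWithBackoff(ctx context.Context, subscribe subscribeFunc, onEvent func(), base, maxDelay time.Duration) {
	delay := base
	for ctx.Err() == nil {
		sawEvent := false
		_ = subscribe(ctx, func() {
			sawEvent = true
			onEvent()
		})
		if sawEvent {
			delay = base
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

// demoReconnect uses a fake stream that fails twice, then delivers two
// events and cancels the context.
func demoReconnect() (attempts, events int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	flaky := func(ctx context.Context, onEvent func()) error {
		attempts++
		if attempts < 3 {
			return errors.New("stream dropped")
		}
		onEvent()
		onEvent()
		cancel()
		return nil
	}
	listenWithBackoff(ctx, flaky, func() { events++ }, time.Millisecond, 4*time.Millisecond)
	return attempts, events
}

func main() {
	attempts, events := demoReconnect()
	fmt.Println("attempts:", attempts, "events:", events)
}
```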

Also bumps the radiance pin to the commit that adds the endpoint.

Addresses the config-events half of getlantern/engineering#3182.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lantern-core: update StartBackgroundListeners comment to include config

Reflects that listenConfigEvents also starts automatically from
initialize, addressing Copilot review on PR #8673.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Adam Fisk <afisk@mini.local>

* windows ffi cleanup

* Update bindings

* point to radiance refactor branch

* feat(dev-mode): hidden 5-tap unlock on support view + expanded dev screen

Show Build number alongside Lantern version on the support view. Tapping
the Build row 5× within 3s toggles developer mode (gated to nightly/debug
builds for enabling; disabling works anywhere). The developer entry in
settings now hides unless dev mode is enabled.
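
One way to express the tap rule, sketched in Go rather than Dart. The type, field names, and the use of plain millisecond timestamps are assumptions for illustration; `canEnable` stands in for the nightly/debug gate.

```go
package main

import "fmt"

const (
	requiredTaps = 5
	windowMillis = 3000
)

// devModeToggle tracks recent taps as millisecond timestamps.
type devModeToggle struct {
	canEnable bool // hypothetical stand-in for "nightly or debug build"
	enabled   bool
	taps      []int64
}

// tap records a tap and returns the resulting developer-mode state.
func (d *devModeToggle) tap(nowMillis int64) bool {
	// Keep only taps that are still inside the window.
	recent := d.taps[:0]
	for _, ts := range d.taps {
		if nowMillis-ts <= windowMillis {
			recent = append(recent, ts)
		}
	}
	d.taps = append(recent, nowMillis)
	if len(d.taps) < requiredTaps {
		return d.enabled
	}
	d.taps = d.taps[:0]
	// Disabling is always allowed; enabling only when the gate permits it.
	if d.enabled || d.canEnable {
		d.enabled = !d.enabled
	}
	return d.enabled
}

func main() {
	d := &devModeToggle{canEnable: true}
	for _, t := range []int64{0, 500, 1000, 1500, 2000} {
		d.tap(t)
	}
	fmt.Println("developer mode:", d.enabled)
}
```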

Developer screen adds radiance env-var overrides (country, version,
feature overrides), a log-level dropdown, a config-fetch toggle, and
buttons to send a config request, run URL tests, show live settings/env,
and disable dev mode. Pins qpack to v0.5.1 via replace directive to match
radiance's own pin so sing-box-minimal's quic-go HTTP/3 continues to
build.

Wires radiance ipc.Client.PatchSettings / PatchEnvVars / RunOfflineURLTests
/ UpdateConfig through lantern-core and exports them via FFI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* bump radiance - limit config fetch to 1 at a time

* feat(dev-mode): show spinner on in-flight action tiles

Tapping Send config request / Run URL tests / Show settings & env vars
now disables the tile and shows a spinner until the IPC call returns, so
users don't assume the button is broken during the latency before the
snackbar appears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Developer mode refactor

* Sync garmr/radiance-daemon-refactor with origin/main (#8684)

* deps: update radiance to fix outbound removal breaking config refresh (#8639)

Picks up radiance PR #405 which fixes removeOutbounds failing when
extra outbounds (non-smart Pro locations) aren't in the URL test group.
This was causing every config refresh IPC to return 500, preventing
SetURLOverrides and CheckOutbounds from running — resulting in ~50%
of bandit probe callbacks never firing.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Smart location country fix (#8638)

* Do not reset a smart location.

* code review updates

* Fix website split-tunneling reliability and CI validation (#8640)

* Add split tunneling e2e test

* Fix split tunneling website smoke assertion

* Fix split tunneling smoke navigation

* code review updates

* code review updates

* code review updates

* code review updates

* code review updates

* Macos system extension updates #2 (#8637)

* if system extension is in uninstall state do not block new installation.

* update macos system extension test

* do not cache dart_tool

* Set the default status as unknown.

* code review updates

* Filter system apps from Windows split tunneling (#8641)

* Add split tunneling e2e test

* Fix split tunneling website smoke assertion

* Fix split tunneling smoke navigation

* code review updates

* code review updates

* code review updates

* Filter Windows system apps in split tunneling list

* code review updates

* code review updates

* Update system apps filter

* code review updates

* deps: update radiance + lantern-box to fix ~20% callback failure (#8642)

Picks up:
- radiance PR #406 → lantern-box PR #231: clear URL test history
  when SetURLOverrides is called so outbounds are re-tested with
  new callback URLs
- radiance PR #405: best-effort URL test group removal (already in
  previous update, carried forward)
- lantern-box v0.0.61: includes CA cert install + history fix

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: update radiance + lantern-box for callback-all-outbounds (#8644)

- radiance: removes URL test filtering, all outbounds tested (PR #407)
- lantern-box v0.0.62: 6-worker URL test pool + client delay reporting (PR #232)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Hide system apps without dropping user apps on Windows (#8643)

* code review updates

* code review updates

* code review updates

* chore: update radiance for async IPC outbound handlers (#8645)

Picks up getlantern/radiance#410: IPC outbound update/add/remove
handlers return 202 immediately and process asynchronously, fixing
the EOF errors on every config refresh.
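
The accept-then-process pattern can be sketched with the standard library as below. This is not radiance's handler: the `/outbounds` path, the job queue, and the helper names are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// acceptAsync queues the request body for a background worker and replies
// 202 Accepted without waiting for the work to finish.
func acceptAsync(jobs chan<- string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		select {
		case jobs <- string(body):
			w.WriteHeader(http.StatusAccepted)
		default:
			// Queue full: report it instead of blocking the caller.
			http.Error(w, "busy", http.StatusServiceUnavailable)
		}
	}
}

// demoAsync drives the handler once and returns the status code plus the
// payload the worker eventually processed.
func demoAsync() (int, string) {
	jobs := make(chan string, 1)
	processed := make(chan string, 1)
	go func() {
		for job := range jobs {
			processed <- job
		}
	}()
	defer close(jobs)

	req := httptest.NewRequest(http.MethodPost, "/outbounds", strings.NewReader("update"))
	rec := httptest.NewRecorder()
	acceptAsync(jobs)(rec, req)
	return rec.Code, <-processed
}

func main() {
	code, job := demoAsync()
	fmt.Println(code, job)
}
```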

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update radiance for split tunnel persistence fix (#8646)

Picks up getlantern/radiance#411: fixes split tunnel filters silently
not persisting due to dangling slice pointers in initRuleMap.
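
For readers unfamiliar with this class of bug, here is a generic Go illustration. It is not the code from `initRuleMap` (which is not shown here); the `rule` type and function name are made up. It shows how a pointer taken into a slice can go stale once `append` reallocates the backing array.

```go
package main

import "fmt"

// rule is a hypothetical record type for this illustration.
type rule struct {
	Domains []string
}

// demoDangling returns how many domains rules[0] holds after a write
// through a stale pointer, and then after a write through the index.
func demoDangling() (viaPointer, viaIndex int) {
	rules := make([]rule, 1, 1)
	stale := &rules[0] // points into the current backing array

	// Capacity is exhausted, so append copies the elements to a new array.
	rules = append(rules, rule{})

	// This write lands in the old array and is invisible through rules.
	stale.Domains = append(stale.Domains, "example.com")
	viaPointer = len(rules[0].Domains)

	// Indexing at the time of use always hits the live array.
	rules[0].Domains = append(rules[0].Domains, "example.com")
	viaIndex = len(rules[0].Domains)
	return viaPointer, viaIndex
}

func main() {
	p, i := demoDangling()
	fmt.Println("via stale pointer:", p, "via index:", i)
}
```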

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: upload and notify for nightlies even when some platforms fail (#8649)

The upload-s3 and upload-release-artifacts jobs required ALL platform
builds to succeed or be skipped. When a matrix entry failed (e.g.,
Linux arm64), the entire build-linux job reported as 'failure', which
caused both upload jobs to skip entirely — even though macOS, Android,
iOS, and Linux amd64 all succeeded.

Simplify the condition: run uploads if at least one platform build
succeeded. The upload steps already handle missing artifacts gracefully
(upload_if_exists checks for file existence).

This ensures the Slack notification goes out with download links for
whatever platforms did build successfully.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Stabilize nightly smoke checks and platform release publishing (#8651)

* Stabilize nightly smoke checks and platform release publishing

* code review updates

* code review updates

* chore: bump radiance to latest main (lantern-box v0.0.65) (#8654)

Picks up:
- Reflex active-probe resistance: silence-timeout + masquerade
  fallback (getlantern/lantern-box#237 via radiance#413)
- TLS 1.3 minimum enforcement for Reflex
  (getlantern/lantern-box#236)
- radiance split-tunnel filter persistence fix (#411)

No Flutter / client-side behavior changes required — the Reflex
hardening is server-side.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add arch to releases (#8652)

* feat: add arch to releases

* Update linux/packaging/usr/lib/systemd/system/lanternd.service

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: remove committed lanternd.service file

Agent-Logs-Url: https://github.com/getlantern/lantern/sessions/15085485-3c6a-4e1e-93ea-6e9bf0623d09

Co-authored-by: reflog <109876+reflog@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: reflog <109876+reflog@users.noreply.github.com>

* ran go mod tidy

* Improve Windows app discovery for shortcut wrappers (#8653)

* code review updates

* Improve Windows app discovery for shortcut wrappers

* code review updates

* code review updates

* code review updates

* Fix the radiance device-limit flow. (#8659)

* only use permalinks (#8658)

Co-authored-by: atavism <paul@getlantern.org>

* Add auth E2E tests and wire Linux/Windows CI (#8607)

* auth flow test updates

* auth flow test updates

* auth flow test updates

* code review updates

* code review updates

* code review updates

* code review updates

* deps: update sing-box-minimal to v1.12.21-lantern (#8660)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Show vpn conflict dialog on smart location (#8661)

* Show vpn conflict dialog on smart location

* code review updates

* chore: bump radiance and lantern-box to latest (#8664)

- radiance: f1c425231e41 → 4241e6c5a9c6 (main HEAD)
- lantern-box: v0.0.65 → v0.0.67

Ran go mod tidy.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Windows installer cleanup, improve app discovery and icon loading (#8666)

* code review updates

* Add comment

* code review updates

* remove sentry (#8665)

* Save last server location (#8655)

* save server location

* update radiance.

* Forbid AutoConnect if connect fails.

* update radiance

* code review updates

* update radiance

* code review updates (#8675)

* deps: restore sing-box-minimal v1.12.21-lantern (#8678)

PR #8655 ("Save last server location") accidentally downgraded
sing-box-minimal from v1.12.21-lantern back to v1.12.19-lantern in
go.mod during review churn. v1.12.21-lantern contains commit 9c79c311
("fix: make initial remote rule-set fetch non-fatal"), which turns the
Android bootstrap deadlock ("no available network interface" during
initial rule-set fetch) from a fatal libbox startup error into a
WARN + retry-after-start. Without it, nightly builds from main fail
to connect on any smart-routing country (Macao, Bulgaria, etc.).

Confirmed by comparing Freshdesk #172722 (broken, rule_set_remote.go:235,
v1.12.19-lantern) with #172795 (working, rule_set_remote.go:113,
v1.12.21-lantern). Same user, same device, same 9.0.25 version, same
smart-routing-bg-common-direct fetch failure — only the sing-box-minimal
version differs. The v9.0.25-beta-android tag was cut before #8655
merged, which is why Alexander's beta works while the nightly doesn't.

`go mod tidy` also dropped stale go.sum entries for superseded radiance
and lantern-box pseudo-versions and removed the unused getsentry/sentry-go
indirect (left behind after #8665).

* Makefile: fix empty common.Version on Windows CI (missing app version 400) (#8677)

* Makefile: use env-provided APP_VERSION so Windows CI populates version ldflag

common.Version in radiance was being linked as an empty string on Windows
CI builds. The `-X .../common.Version=$(APP_VERSION_PUBSPEC)` ldflag
depended on `$(shell grep ... | sed ...)` or a PowerShell fallback, and
the Windows path was producing an empty value. With common.Version empty,
backend.NewRequestWithHeaders sets X-Lantern-App-Version to "", and
lantern-cloud's /v1/config-new handler rejects the request with
400 "missing app version" — no config is returned, so the client falls
back to the embedded server list with no bandit tracks. Observed on
Freshdesk #172794 (Windows 9.0.26 nightly, radiance 400s on every retry).

Use the APP_VERSION already exported to GITHUB_ENV by build-windows.yml's
"Read app version from pubspec.yaml" step, and compute APP_VERSION_PUBSPEC
with Make built-ins ($(firstword $(subst +, ,...))) so no shell tools are
required. Drops the Windows_NT branch; local dev on Mac/Linux still uses
the grep/sed fallback (APP_VERSION ?=).

* Makefile: restore Windows local-dev fallback for APP_VERSION

The previous commit removed the Windows_NT branch under the assumption
that APP_VERSION would always come from the environment. That's true on
CI (build-windows.yml exports it to GITHUB_ENV), but local Windows
developers running `make windows-release` directly don't set the env
var, and the grep/sed fallback runs under cmd.exe where Unix-style
quoting fails silently.

Add back the Windows PowerShell branch, but only as the fallback when
APP_VERSION isn't in the environment (`?=` on both branches). CI keeps
working via the env override; local Mac/Linux uses grep/sed; local
Windows uses PowerShell Select-String. The `+`-splitting stays in
Make built-ins so it works no matter which branch produced APP_VERSION.

* Makefile: fail the build when APP_VERSION_PUBSPEC ends up empty

Adds a parse-time guard so an unresolvable version fails loudly rather
than producing a binary with empty common.Version — which is what caused
this whole bug in the first place. Addresses Copilot review feedback on
PR #8677.

$ APP_VERSION="" make
Makefile:36: *** APP_VERSION_PUBSPEC is empty; export APP_VERSION ...
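
The same two rules (take the part before `+`, refuse an empty result) can be sketched in Go for illustration. The function name and the sample input are hypothetical; the actual guard lives in the Makefile.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pubspecVersion strips the build suffix after '+' and rejects an empty
// version instead of letting it propagate into the binary.
func pubspecVersion(appVersion string) (string, error) {
	v, _, _ := strings.Cut(strings.TrimSpace(appVersion), "+")
	if v == "" {
		return "", errors.New("app version is empty")
	}
	return v, nil
}

func main() {
	v, err := pubspecVersion("9.0.29+487") // illustrative input
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```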

* Roll in #8676: PowerShell quoting + Windows service startup log

Incorporates the non-overlapping pieces of @atavism's PR #8676 so we
can close it in favor of this PR:

- Swap the Windows APP_VERSION fallback's PowerShell invocation to
  outer-single / inner-double quoting. The previous outer-double /
  inner-single form gets mangled when Make expands $$ and cmd.exe
  passes the resulting string to powershell, even in the local-dev
  fallback path.
- Same fix for GO_VERSION's PowerShell shell-out further down in the
  Makefile (separate variable, same root cause).
- Log the Windows service startup (name, version, mode) so it's
  visible when triaging issues. Matches the log line from #8676.

* Fix data cap issue (#8668)

* Report an Issue screen fixes (#8670)

* updates to report issue screen

* updates to report issue screen

* rename report issue

* rename report issue

* code review updates

* ffi: add missing base64 import for app icon encoding

* code review updates

* code review updates

* code review updates

* code review updates

* code review updates

* code review updates

* code review updates

* code review updates

---------

Co-authored-by: Myles Horton <afisk@getlantern.org>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: jigar-f <132374182+jigar-f@users.noreply.github.com>
Co-authored-by: Ilya Yakelzon <reflog@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: reflog <109876+reflog@users.noreply.github.com>
Co-authored-by: Jay <110402935+jay-418@users.noreply.github.com>

* code review updates

* bump radiance

* bump go to v1.26.2

Go v1.26.2 includes a patch to CGo that addresses some of
the bulkBarrierPreWrite panics.

* bump radiance - fix event streams on mobile

* fix sign-up issue by pointing to the new radiance.

* split tunneling: treat FFI "ok" response as success, not error (#8691)

* split tunneling: treat FFI "ok" response as success, not error

_runSplitTunnelCall was checking `result != nullptr` and treating any
non-null return as an error message. But the Go FFI
(lantern-core/ffi/ffi.go) returns C.CString("ok") on success for both
addSplitTunnelItem and removeSplitTunnelItem — a non-null C string.

As a result, every successful add/remove was being reported to the UI as
a failure with message "ok". Symptoms:

- Adding a website in split tunneling showed an unstyled default
  snackbar reading "OK" (the default Material SnackBar rendering
  failure.localizedErrorMessage).
- The website appeared to not be saved — but it actually was; the
  provider's `reloaded` flag was never set, so the on-screen list never
  re-fetched from the backend.
- Re-clicking "Add" with the same domain created a duplicate entry on
  disk (visible as repeated items in split-tunnel.json) because the
  provider's local "already-added" check worked against a stale copy
  that had never been refreshed.

Fix: mirror the checkAPIError convention — treat literal "ok" as
success, parse JSON {"error": "..."} bodies for the error message, and
fall back to the raw string otherwise.

Reported in getlantern/engineering#3291 against Windows 9.0.29 build 481
(Freshdesk #173656).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* split tunneling: reuse _ffiOkResults for success-string check

Rather than hardcoding 'ok', use the existing _ffiOkResults set
({'ok', 'true'}) defined at the top of this file so the split-tunnel
path stays in sync with the other FFI success checks (e.g.
_setupRadiance at line 201).
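
The success/error convention described in this change can be sketched as follows. The real check is in Dart; this Go version is only an illustration, and treating an empty string like a null result is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ffiOKResults mirrors the {'ok', 'true'} success set mentioned above.
var ffiOKResults = map[string]bool{"ok": true, "true": true}

// parseFFIResult returns nil for a success string, the "error" field of a
// JSON body when present, and the raw string as the error otherwise.
func parseFFIResult(raw string) error {
	s := strings.TrimSpace(raw)
	if s == "" || ffiOKResults[s] { // empty stands in for a null result (assumption)
		return nil
	}
	var body struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal([]byte(s), &body); err == nil && body.Error != "" {
		return errors.New(body.Error)
	}
	return errors.New(s)
}

func main() {
	for _, raw := range []string{"ok", `{"error":"domain already added"}`, "backend unreachable"} {
		fmt.Printf("%q -> %v\n", raw, parseFFIResult(raw))
	}
}
```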

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* split tunneling: use design-system error snackbar on add (#8692)

The local showSnackbar helper in website_domain_input was using
Material's default ScaffoldMessenger.showSnackBar(SnackBar(content:
Text(message))) — producing an unstyled grey/dark snackbar that the rest
of the app doesn't use. Every call site in this file is an error path
(empty input, invalid domain, already-added, backend failure), so route
them through context.showSnackBarError which applies the app's rounded,
floating, red-background error style.

Follow-up to #8691. Addresses the "unstyled snackbar" symptom in
getlantern/engineering#3291 issue 3 for any remaining error surface
after the FFI "ok" fix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(vpn): use SelectServer when switching servers on a live tunnel

connectToServer previously always called ConnectVPN, which radiance
rejects with ErrTunnelAlreadyConnected when the tunnel is up. Check
VPNStatus first and route to SelectServer when Connected, falling
back to ConnectVPN otherwise.
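
A sketch of that routing, using a hypothetical `vpnClient` interface. The method names follow the commit message, but their signatures, the status string, and the error handling on a failed status query are assumptions.

```go
package main

import "fmt"

// vpnClient is a hypothetical subset of the client; signatures are assumed.
type vpnClient interface {
	VPNStatus() (string, error)
	ConnectVPN(tag string) error
	SelectServer(tag string) error
}

const statusConnected = "connected" // assumed status value

// connectToServer switches servers on a live tunnel and otherwise starts a
// new connection.
func connectToServer(c vpnClient, tag string) error {
	status, err := c.VPNStatus()
	if err != nil {
		return err
	}
	if status == statusConnected {
		return c.SelectServer(tag)
	}
	return c.ConnectVPN(tag)
}

// fakeClient records which call was made, for demonstration only.
type fakeClient struct {
	status string
	calls  []string
}

func (f *fakeClient) VPNStatus() (string, error) { return f.status, nil }
func (f *fakeClient) ConnectVPN(tag string) error {
	f.calls = append(f.calls, "ConnectVPN:"+tag)
	return nil
}
func (f *fakeClient) SelectServer(tag string) error {
	f.calls = append(f.calls, "SelectServer:"+tag)
	return nil
}

func main() {
	live := &fakeClient{status: statusConnected}
	idle := &fakeClient{status: "disconnected"}
	_ = connectToServer(live, "server-a")
	_ = connectToServer(idle, "server-a")
	fmt.Println(live.calls, idle.calls)
}
```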

* android: detach connect() scope so withTimeout actually unblocks the UI (#8689)

* review: detach connect() scope so timeout actually unblocks the UI

Copilot flagged on #8689 that the existing coroutineScope { ... } still
hangs in exactly the scenario this change is meant to protect against.
Structured coroutineScope cancels its children on exception but then
waits for them to complete — so when withTimeout fires, we cancel the
deferred (which the JNI call ignores, since it has no suspension
points) and then block on it finishing anyway. Net effect: the UI is
still frozen, which is the symptom we're trying to prevent.

Switch to a DETACHED CoroutineScope(SupervisorJob() + Dispatchers.IO).
Its Job is not a child of the enclosing coroutine, so cancelling it
doesn't join — the orphan coroutine keeps running the JNI call in the
background until Go returns or the process exits, but the caller is
unblocked and the runCatching.onFailure path fires the timeout error
state for the UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: add single-flight gate to prevent orphan accumulation

Copilot correctly pointed out on #8689 that the detached-scope approach
can accumulate orphan coroutines if the user retries while a previous
connect() is still stuck in JNI. Each orphan pins a Dispatchers.IO
thread; enough retries against a truly deadlocked Go side could
pressure the IO pool.

Their suggested fix (Dispatchers.IO.limitedParallelism(1)) would
serialize retries behind the orphan, turning the 2nd retry into
another 60s hang. A simple single-flight AtomicBoolean gate with fast
rejection is the cleaner mitigation:

- compareAndSet rejects concurrent attempts with IllegalStateException
  (surfaces via the existing runCatching.onFailure → error state).
- The flag clears in a try/finally inside the async block, which runs
  when the JNI call eventually returns — cancellation alone can't
  break it out, but once Go completes the finally runs and a future
  retry is admitted.
- Process death (reboot, force-stop) resets the flag naturally.
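
The detached call plus single-flight gate can be sketched in Go as an analogy to the Kotlin change; this is not the Android code. `connectGate`, the error values, and the timeouts are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

var (
	errInFlight = errors.New("connect already in progress")
	errTimeout  = errors.New("connect timed out")
)

// connectGate admits one uncancellable call at a time.
type connectGate struct {
	inFlight atomic.Bool
}

// connect runs call in a detached goroutine. The caller is unblocked on
// timeout, and the gate reopens only once call actually returns.
func (g *connectGate) connect(timeout time.Duration, call func() error) error {
	if !g.inFlight.CompareAndSwap(false, true) {
		return errInFlight // fast rejection instead of queueing behind the orphan
	}
	done := make(chan error, 1)
	go func() {
		err := call()
		g.inFlight.Store(false)
		done <- err
	}()
	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		return errTimeout
	}
}

// demoGate exercises timeout, rejection, and re-admission in order.
func demoGate() (first, second, third error) {
	var g connectGate
	release := make(chan struct{})
	stuck := func() error { <-release; return nil }

	first = g.connect(10*time.Millisecond, stuck)
	second = g.connect(10*time.Millisecond, stuck)
	close(release)
	for g.inFlight.Load() {
		time.Sleep(time.Millisecond)
	}
	third = g.connect(time.Second, func() error { return nil })
	return first, second, third
}

func main() {
	fmt.Println(demoGate())
}
```

The flag is cleared by the background goroutine rather than by the caller, which matches the point above that cancellation alone cannot reopen the gate.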

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Show the fastest location on smart location.

* android: make restartService block until restart completes (#8697)

* android: make restartService block until restart completes

Two bugs in the platformIfce restart path that together let the tunnel
wedge in Restarting forever on Android, triggering the "Error in VPN
operation" on every subsequent Connect attempt
(getlantern/engineering#3297, Freshdesk #173681).

1. restartService() used serviceScope.launch { ... } and returned
   immediately. Radiance's Restart() treats the sync return as "restart
   succeeded" and leaves the tunnel at status=Restarting, expecting the
   platform coroutine to drive it through stopVPN → startVPN and
   transition status via Mobile.* side-effects. If the service is torn
   down before the coroutine completes (onDestroy, process pressure),
   nothing ever transitions the tunnel out of Restarting.

   Switch to runBlocking(Dispatchers.IO) so the return actually
   reflects completion. c.mu is released on the Go side before
   RestartService is invoked, so synchronous Mobile.* callbacks on
   this thread don't deadlock.

2. stopVPNTunnel() skipped Mobile.stopVPN() when Mobile.isVPNConnected()
   returned false. isVPNConnected is status == Connected — but at the
   point stopVPNTunnel is called from restartService, radiance has
   already set status=Restarting, so the guard always skips and the
   tunnel is never actually closed.

   Swap the guard for Mobile.isRadianceConnected() — i.e. only skip
   when the IPC server itself isn't up. Mobile.stopVPN() is a no-op
   when c.tunnel is nil on the Go side, so the original guard was
   redundant even for the Connected == true case.

Evidence from Freshdesk #173681 logs for the broken path:
- 15:17:34.826 Restart → 15:17:34.828 "Tunnel restarted successfully"
  (2ms total — consistent with fire-and-forget, not real teardown)
- No subsequent tunnel.init / Tunnel connection established
- 15:19:10 onDestroy logs "Skipping stopVPN — VPN tunnel was never
  started" (same isVPNConnected() check)
- 15:21:48 next Connect fails within 2ms of the IPC request with
  "tunnel is currently Restarting"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: drop isVPNConnected guard in onDestroy too

Same shape as the restart-path fix: if c.tunnel is non-nil on the Go
side but the tunnel status is anything other than Connected (Restarting
after a failed restart, Connecting mid-startup, Error from a prior
failure), isVPNConnected() returns false and the old guard skipped
Mobile.stopVPN(). That left the radiance tunnel state dangling across
service destroy.

Observed in Freshdesk #173681: "onDestroy — radianceConnected=true
vpnConnected=false, Skipping stopVPN — VPN tunnel was never started"
while the tunnel was actually alive at status=Restarting.

Swap the second guard for an unconditional call. Mobile.stopVPN() is a
no-op when c.tunnel is nil, so the guard was always redundant — it just
happened to also hide the non-Connected-but-non-nil case that's
load-bearing during restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: verify restart postcondition before returning to Go

launchVPN wraps its body in runCatching { ... }.onFailure { ... } and
returns normally regardless of whether Mobile.startVPN() threw — so a
nil return from startVPN() does not mean the restart succeeded. Without
a postcondition check, restartService would log "completed" and return
to radiance as if everything worked, even though the tunnel is still
stuck in Restarting, which defeats the whole point of making this
function block.

Check Mobile.isVPNConnected() at the end of the runBlocking block and
throw IllegalStateException if false. The exception propagates through
runBlocking → restartService → radiance's platformIfce.RestartService()
as a non-nil error, so Restart() hits the ErrorStatus branch and the
caller sees the failure.

Addresses Copilot review feedback on PR #8697.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Adam Fisk <afisk@mini.local>

* fix(vpn): don't cancel tunnel when restart's start phase fails

The PacketTunnelExtension hosts the IPC server, so cancelTunnelWithError
tears down the daemon along with the tunnel. Inline MobileStartVPN in
restartService so a failed restart leaves the extension (and IPC socket)
alive; radiance's status events surface the failure for retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix copilot issue (#8696) (#8698)

Co-authored-by: atavism <atavism@users.noreply.github.com>

* main: don't block first paint on Updater.init() (#8699)

* main: don't block first paint on Updater.init()

Moving Updater.init() off the critical path to runApp. Investigating a
one-shot black-screen-on-startup report on a local macOS dev build
(9.0.29 build 487): flutter.log stopped at the last pre-runApp log line
with no Dart exception and no crash, while the Go side kept running
normally. The only awaited call between that last log and runApp is
Updater.init().

Inside init(), the actual update check is already deferred 45 s via
Future.delayed + unawaited. But setFeedURL and setScheduledCheckInterval
are awaited — both bridge into Sparkle via the auto_updater Flutter
plugin, and both can stall on first launch: feed URL resolution,
keychain access, or a previous launch's background worker still holding
a lock. Any of those becomes a main-isolate hang that prevents runApp,
which exactly matches the observed symptom.

Fix: drop the await so Updater.init() runs concurrently with the rest
of startup. All errors are already handled inside init() itself, so
unawaited is safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: guard sl<Updater>() lookup against failed service injection

Copilot flagged that if injectServices() throws above (caught at
main.dart:45), Updater is never registered (it's registered at
injection_container.dart:40, after storage init), and sl<Updater>()
throws synchronously. unawaited() doesn't help — the throw happens
before the Future is constructed, so it propagates out of main and
prevents runApp.

Wrap the call in try/catch + sl.isRegistered<Updater>() so any failure
to look up or start Updater.init logs and continues to runApp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(logs): stream diagnostic logs via ipc TailLogs on desktop

Wires the FFI path to radiance's ipc.Client.TailLogs and merges in-app
flutter.log records so the diagnostic logs view shows both sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deps: bump radiance to refactor tip (9703bcf) (#8700)

Picks up:
- refactor(vpn): own VPN status on the client so restarts span tunnels
- vpn: instrument tunnel.start phases + VPNClient.Restart (#443)

The VPN-status-ownership refactor moves setStatus calls out of
tunnel and onto VPNClient so a restart transitions Restarting →
Disconnecting → Disconnected → Connecting → Connected cleanly.

The instrumentation PR adds child spans around libbox.Setup,
libbox.NewServiceWithContext, libbox.BoxService.Start, and
newMutableGroupManager so SigNoz can attribute the 10s+ tail
on /service/start observed in Freshdesk #173696.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix server auto issue

* More fixes to server selection.

* server selection changes for iOS/macOS

* Use select server if VPN is active.

* bump radiance - pull in empty tag fix

* lantern-core: dispatch ConnectVPN/StartVPN to SelectServer on live tunnel (#8702)

* lantern-core: dispatch ConnectVPN to SelectServer on live tunnel

When the Flutter UI triggers an auto-select on a live tunnel — most
visibly Jigar's rewrite of onSmartLocation (server_selection.dart), which
routes "switch back to Smart" through startVPN(force: true) → Dart
lantern.startVPN() → ffi.go:startVPN → c.ConnectVPN("") — radiance's
/vpn/connect endpoint rejects the request with ErrTunnelAlreadyConnected
(radiance/vpn/vpn.go:126 in VPNClient.Connect). The error is returned to
the Dart UI as a snackbar, the tunnel stays pinned to the previously
selected manual server, and lantern.log is silent because neither
LocalBackend.ConnectVPN nor VPNClient.Connect logs the
ErrTunnelAlreadyConnected path via slog.

Observed on 9.0.30 beta (internal tester, Freshdesk #173763, build from
commit 405468954 which includes Jigar's 289507280). After manually
picking Bogotá, clicking "Smart" at the top of the server-selection
screen surfaces the snackbar and the tunnel keeps routing traffic
through the Bogotá samizdat outbound.

Fix: when Status() == Connected, LanternCore.ConnectVPN dispatches the
request to /server/selected (the live-tunnel outbound swap) instead of
/vpn/connect. Empty tag normalizes to vpn.AutoSelectTag — Dart sends ""
for Smart, radiance recognizes only the literal "auto" and otherwise
falls into the manual-outbound branch of SelectServer, stranding Clash
in manual mode with an empty selector. The mapping is centralized in a
small normalizeAutoTag helper used by both ConnectVPN and SelectServer.

This puts the same dispatch logic that lives in ffi.go:connectToServer
onto every caller of LanternCore.ConnectVPN — including ffi.go:startVPN
(which Jigar's rewrite now funnels through) and any future FFI/mobile
entry point.
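
A minimal sketch of the dispatch described above, assuming a simplified
client interface. The `vpnClient` interface, `fakeClient`, and the
"bogota" tag are hypothetical stand-ins for illustration; they are not
the real ipc.Client API, and the real helper may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
)

// autoSelectTag stands in for vpn.AutoSelectTag; the commit message says
// radiance recognizes only the literal "auto".
const autoSelectTag = "auto"

// errTunnelAlreadyConnected stands in for radiance's ErrTunnelAlreadyConnected.
var errTunnelAlreadyConnected = errors.New("tunnel already connected")

// vpnClient is a hypothetical interface used only for this sketch.
type vpnClient interface {
	Connected() bool
	Connect(tag string) error      // models /vpn/connect
	SelectServer(tag string) error // models /server/selected
}

// normalizeAutoTag maps the Dart "" convention for Smart onto the auto tag.
func normalizeAutoTag(tag string) string {
	if tag == "" {
		return autoSelectTag
	}
	return tag
}

// connectVPN swaps the outbound on a live tunnel and starts a tunnel otherwise.
func connectVPN(c vpnClient, tag string) error {
	tag = normalizeAutoTag(tag)
	if c.Connected() {
		return c.SelectServer(tag)
	}
	return c.Connect(tag)
}

// fakeClient records the selected tag and rejects Connect on a live tunnel.
type fakeClient struct {
	connected bool
	selected  string
}

func (f *fakeClient) Connected() bool { return f.connected }

func (f *fakeClient) Connect(tag string) error {
	if f.connected {
		return errTunnelAlreadyConnected
	}
	f.connected, f.selected = true, tag
	return nil
}

func (f *fakeClient) SelectServer(tag string) error {
	f.selected = tag
	return nil
}

func main() {
	c := &fakeClient{connected: true, selected: "bogota"}
	err := connectVPN(c, "") // "Smart" from a connected state
	fmt.Println(err, c.selected)
}
```

With the dispatch in place, the Smart request on a live tunnel goes to
the select path rather than hitting the already-connected error.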

getlantern/engineering#3291 issue 3. Supersedes earlier work on
fisk/connect-dispatch-select-when-connected (485bf5a00), which was
scoped to this same dispatch but predated the current refactor branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* vpn_tunnel: dispatch StartVPN to SelectServer on live tunnel (mobile path)

Mobile.StartVPN (the gomobile entry point for Android MainActivity and
iOS VPNManager) routes through vpn_tunnel.StartVPN(client), which calls
client.ConnectVPN(ctx, vpn.AutoSelectTag) directly — bypassing
lanterncore.Core. Jigar's onSmartLocation rewrite dispatches "switch
back to Smart" through startVPN(force: true), which on Android/iOS
lands here. Same ErrTunnelAlreadyConnected bug as the FFI path fixed in
the previous commit.

Mirror the VPNStatus dispatch pattern garmr already added to
vpn_tunnel.ConnectToServer in 405468954: when Status() == Connected,
swap outbound via /server/selected; otherwise fall through to the
existing /vpn/connect start.

Together with the LanternCore.ConnectVPN dispatch, this closes the
Smart-from-connected bug on every platform (Windows FFI, Android/iOS
gomobile).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ffi: drop now-redundant VPNStatus dispatch in connectToServer

LanternCore.ConnectVPN already routes to /server/selected when the
tunnel is live (added earlier in this PR), so ffi.go:connectToServer's
own VPNStatus check is duplicate work. Collapse to a single c.ConnectVPN
call — both the live-tunnel-swap and fresh-connect paths flow through
the dispatch one layer down.

Behavior unchanged. The "start service failed" error wrapper is kept
for Dart-side snackbar stability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lantern-core: collapse dispatch to a single implementation in vpn_tunnel

Three functions had independent VPNStatus → SelectServer-vs-ConnectVPN
dispatches after the earlier commits: LanternCore.ConnectVPN,
vpn_tunnel.StartVPN (both added in this PR), and vpn_tunnel.ConnectToServer
(pre-existing from 405468954). Consolidate so vpn_tunnel.ConnectToServer
is the authoritative dispatch and the other two delegate.

- LanternCore.ConnectVPN → vpn_tunnel.ConnectToServer(lc.client, tag)
- vpn_tunnel.StartVPN → ConnectToServer(client, vpn.AutoSelectTag)

LanternCore.SelectServer keeps its own empty-tag normalization since its
scope is the one-shot SelectServer IPC, not the dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lantern-core: drop client-side empty-tag normalization (radiance fac9089) (#8703)

Patrick's radiance fac9089 ("fix(vpn): treat the empty string as
AutoSelect in SelectServer") is now pinned on this branch via
72a6c6282. Radiance normalizes tag == "" → AutoSelectTag on both
ConnectVPN and SelectServer, so the client-side normalizations we
added earlier (normalizeAutoTag helper in core.go, `if tag == ""` in
vpn_tunnel.ConnectToServer) are redundant — radiance handles the Dart
"" convention uniformly.

Remove:
- LanternCore.normalizeAutoTag helper + its use in SelectServer
- `if tag == "" { tag = vpn.AutoSelectTag }` branch in
  vpn_tunnel.ConnectToServer
- lantern-core/core_test.go (only tested the removed helper)

Behavior unchanged end-to-end: empty tag still means auto-select on
every path (FFI, gomobile, connectToServer, startVPN).

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* bump radiance to refactor tip (d5a1872) — pull in LocalBackend.SelectServer empty-tag fix (#8705)

radiance@d5a1872 completes fac9089's empty-string → AutoSelectTag
normalization by extending it to LocalBackend.SelectServer, which
previously only matched the literal "auto" and fell through to the
srvManager lookup for tag == "" — producing "no server found with tag"
(HTTP 500, snackbar) on Smart-from-connected flows after the client-
side normalization was removed in this branch's 6de3c9aa9.

Reported on Lantern 9.0.30 beta via Freshdesk #173773.

go.mod + go.sum bump only; no lantern code changes. Pinned commit:
getlantern/radiance@d5a18726afbc (#444).

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Garmr/refactor mobile logstream (#8701)

* feat(logs): stream diagnostic logs via ipc TailLogs on mobile

Adds a mobile gomobile binding for ipc.Client.TailLogs (TailLogs +
LogSubscription) and switches Android and iOS to consume it, replacing
the per-platform log-file tailers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(logs): stream diagnostic logs via ipc TailLogs on macos

Switches the macOS log stream to MobileTailLogs, matching iOS. Removes
the file-watching LogTailer (no remaining callers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(logs): harden TailLogs against nil, panics, and listener leaks

- Reject nil listener in mobile.TailLogs; recover from panics crossing
  the gomobile bridge so the stream survives unexpected bridge errors.
- Retain the Kotlin LogListener in a field so the Go side's reference
  stays strongly rooted on the JVM.
- On iOS/macOS, cancel any pre-existing subscription before starting a
  new one and clear the stored listener when MobileTailLogs errors.
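
The Go-side hardening in the first bullet can be sketched roughly as
follows. `LogListener`, `deliver`, and `tailLogs` are hypothetical names
for illustration and do not reproduce the real gomobile binding; the
sketch only shows the nil check and the recover-per-callback pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// LogListener is a hypothetical callback interface for this sketch.
type LogListener interface {
	OnLog(line string)
}

// deliver invokes the listener and converts a panic into an error so that
// one failing callback does not tear down the whole stream.
func deliver(l LogListener, line string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("log listener panicked: %v", r)
		}
	}()
	l.OnLog(line)
	return nil
}

// tailLogs rejects a nil listener up front, then feeds it each line.
// It returns how many lines were delivered without a panic.
func tailLogs(lines []string, l LogListener) (int, error) {
	if l == nil {
		return 0, errors.New("nil log listener")
	}
	delivered := 0
	for _, line := range lines {
		if err := deliver(l, line); err != nil {
			continue // skip the failed line and keep streaming
		}
		delivered++
	}
	return delivered, nil
}

// flaky panics on one specific line to exercise the recover path.
type flaky struct{ got []string }

func (f *flaky) OnLog(line string) {
	if line == "boom" {
		panic("listener failure")
	}
	f.got = append(f.got, line)
}

func main() {
	f := &flaky{}
	n, err := tailLogs([]string{"a", "boom", "b"}, f)
	fmt.Println(n, err, f.got)
}
```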

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(logs): share TailLogs plumbing across mobile and ffi

Adds lantern-core/logs.Subscribe wrapping ipc.Client.TailLogs so the
mobile and desktop integrations go through one helper. Drops the iOS
LogTailer dead code and the unused lantern-core/logging package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update log formatting

* Fix issue with ios

* Fix macos logs issue

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jigar-f <jigar@getlantern.org>

* ffi + lantern-core: drop non-Linux preflight; bound IPC calls with per-operation timeouts (#8707)

* ffi: skip the daemon-reachability preflight on Windows / macOS / mobile

The 300 ms preflight in lantern-core/core.go's CheckDaemonReachable
was originally tuned for the Linux flow (PR #8494 by atavism, commit
bf054f4ea), where the failure path falls back to `systemctl is-active
lanternd.service` for a rich diagnostic error. The 300 ms cap made
sense as "fast probe → systemd-rich-error", with the systemd query
adding the actual user-facing context.

Subsequent refactors (commit bd89bea7e Apr 7, then PR #8578 commit
4d4e06d9d Apr 16) generalized that preflight to all platforms but
the systemd fallback only survived in ffi_linux.go. On Windows /
macOS / mobile, ffi_nonlinux.go ended up running the same 300 ms
probe with no fallback — just an artificial guillotine in front of
ConnectVPN, which has its own "lanternd not reachable" error path
with equivalent precision.

Cold-start IPC on Windows regularly exceeds 300 ms (named-pipe dial
+ winio impersonation token dance + H2c connection preface +
goroutine scheduling on a 96-second-idle daemon), so the first VPN
toggle after launch reliably trips the timeout and shows the user a
"lanternd not reachable" error. Clicking again 10 seconds later
silently succeeds. Reproduced on the same Windows machine across
9.0.29 (Freshdesk #173696) and 9.0.30 (#173932).

Make the preflight a no-op on non-Linux. Linux keeps the original
fast-probe-then-systemdDiag flow unchanged. If we add Windows
(`sc query LanternSvc`) or macOS (`launchctl list`) diagnostics
later, restore the preflight and call them from here.

See getlantern/engineering#3382 for the full archaeology + design
discussion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ffi + lantern-core: bound IPC calls with per-operation timeouts

Companion to dropping the non-Linux daemon-reachability preflight in this
same PR. The preflight (ffi_nonlinux.go's `checkDaemonReachable`) was
introduced in commit bd89bea7e along with the *removal* of per-call
timeouts that used to live on the FFI layer:

    -    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    -    if err := c.Client().DisconnectVPN(ctx); err != nil { ... }
    +    if err := c.DisconnectVPN(); err != nil { ... }

After that change, the only IPC call with any deadline at all was the
300 ms preflight. Every other operation flowed lc.ctx (
context.WithCancel(context.Background())) straight through, meaning a
hung lanternd would freeze the UI indefinitely. Dropping the preflight
without restoring per-call timeouts removes the only line of defense.

Restore them at the LanternCore layer where they belong, with values
sized for the inherent work each operation does (state changes can run
into multi-second territory; status queries should be near-instant):

    ipcConnectTimeout     = 60 * time.Second   // ConnectVPN
    ipcStateChangeTimeout = 30 * time.Second   // SelectServer, DisconnectVPN
    ipcStatusTimeout      = 10 * time.Second   // VPNStatus, IsVPNRunning

These bound the worst case (hung daemon → user sees a clear error within
a minute, no indefinite spinner) without firing during normal slow paths.
The dialer's 10 s connect timeout (radiance/ipc/conn_windows.go) already
covers the lanternd-crashed case; these guard the lanternd-hung case.

vpn_tunnel.{StartVPN, StopVPN, ConnectToServer} take the ctx through
their signatures instead of building their own context.Background()
internally, so callers stay in charge of their own deadlines.
mobile/mobile.go is updated to set 60 s / 30 s / 60 s contexts on its
three gomobile entry points.

CheckDaemonReachable's 300 ms timeout is kept untouched — Linux still
calls it from ffi_linux.go for the systemctl is-active fallback that's
the whole point of the fast probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* bump radiance

* apps_windows: additional Windows app-discovery sources + heuristic noise filter

Split out from #8706 — held back from the base-bug fix branch
(fisk/fix-empty-apps-base-bug) because this is the heuristic part of
that PR, not the actual cause of Freshdesk #173774 / #173778 / #173826.
The base bug was GetEnabledApps returning nil-as-"null" instead of
empty-as-"[]" (fixed in the other branch); this PR is the broader
investigation that grew alongside the diagnosis but stands on its own
merits.

What's in here:

- **HKLM\Software\Microsoft\Windows\CurrentVersion\App Paths scanner.**
  Apps that register here so they're runnable via Win+R / shellexecute.
  Catches browsers, IDEs, Office, and most third-party apps that don't
  go through Squirrel / Start Menu.

- **HKLM and HKCU \Software\Microsoft\Windows\CurrentVersion\Run.**
  Squirrel-managed apps (Slack, Discord, VS Code Insiders) register for
  auto-start with command lines pointing at Update.exe --processStart.
  Same parser as Start Menu .lnk targets, including --processStart.

- **%LOCALAPPDATA%\<App>\Update.exe pattern.** Belt-and-suspenders for
  Squirrel installs that don't show in the registry yet but exist on
  disk under the well-known layout.

- **isAppPathsNoise filter.** App Paths is heavily polluted by
  Microsoft-bundled tooling (IE relics, Office helpers, vestigial Mail
  + tablet apps) and UWP package plumbing (winget,
  WindowsPackageManagerServer). Drops entries under known system
  paths, .NET helper assemblies, anything under \Microsoft Office\
  except primary product exes, helper-named basenames (substring
  match: "browsersupport", "lastpassexporter", "updater", "helper",
  "diagnostic", "diagcmd"), and basenames suffixed with the generic
  helper words (update, service, agent, sync, broker).

- **ParentKeyName filter removal in isNonUserFacingUninstallEntry.**
  ParentKeyName != "" was introduced in #8641 and was over-aggressive
  — plenty of legitimate end-user apps set it (Squirrel installs,
  winget packages, MSI bundle children). SystemComponent=1 and
  NoDisplay=1 are the documented signals; those stay. Drops the
  now-unused parentKeyName field on uninstallEntryMetadata too.

- **COM-failure logs in Start Menu scanner promoted Debug → Warn**
  (CoInitializeEx, WScript.Shell CreateObject, QueryInterface). These
  paths previously failed silently; now an empty result tells us
  which call broke.

- **Per-filter scan-summary tallies** at the end of each scanner with
  scanned/kept/per-drop-reason counts and a sample of kept apps for
  triage. Sample paths are redacted to filepath.Base to avoid
  including user PII (full Windows paths typically embed the
  username) in scan logs that get bundled into Report Issue tickets.

- **Hot-path constant data hoisted to package scope.**
  isAppPathsNoise's systemPaths / primaryOfficeExes / helperHints /
  suffix lists used to be reallocated on every call (runs hundreds-
  to-thousands of times per scan). Now package-level vars.

- **+152 lines of test coverage** for the new heuristics in
  apps_windows_test.go.
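
A simplified sketch of the basename part of the noise filter, with the
hint and suffix lists hoisted to package scope as described. It covers
only the substring and suffix rules listed above (not the system-path,
.NET, or Office rules), and `baseName`, `looksLikeHelper`, and the
example paths are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Package-level lists, allocated once rather than on every call.
// The values are the ones named in the commit message.
var (
	helperHints    = []string{"browsersupport", "lastpassexporter", "updater", "helper", "diagnostic", "diagcmd"}
	helperSuffixes = []string{"update", "service", "agent", "sync", "broker"}
)

// baseName returns the final path element, splitting on both \ and / so the
// sketch behaves the same on any host OS.
func baseName(path string) string {
	if i := strings.LastIndexAny(path, `\/`); i >= 0 {
		return path[i+1:]
	}
	return path
}

// looksLikeHelper reports whether the executable's basename matches a helper
// hint (substring) or ends in a generic helper word (suffix).
func looksLikeHelper(path string) bool {
	name := strings.TrimSuffix(strings.ToLower(baseName(path)), ".exe")
	for _, h := range helperHints {
		if strings.Contains(name, h) {
			return true
		}
	}
	for _, s := range helperSuffixes {
		if strings.HasSuffix(name, s) {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{
		`C:\Program Files\Example\example.exe`,
		`C:\Program Files\Example\ExampleUpdater.exe`,
		`C:\Program Files\Example\examplesync.exe`,
	}
	for _, p := range paths {
		// Only the basename is printed, mirroring the log redaction above.
		fmt.Println(baseName(p), looksLikeHelper(p))
	}
}
```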

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lantern-core: fix empty Windows split-tunnel apps list + UI-process logging (#8709)

Three narrow changes that together resolve Freshdesk #173774 / #173778 /
#173826 (Derek's "Failed to fetch installed apps" empty list on Windows
split tunneling). Split out from #8706 so they can land independently
of the broader app-discovery rework that PR also contained.

1. **GetEnabledApps returns []string{} instead of nil.**
   When no apps are split-tunneled, the previous code returned nil,
   which json.Marshal serialized as "null". Dart's jsonDecode("null")
   returns null; the receiving code does `as List`, which throws and
   the UI shows "Failed to fetch installed apps". Initializing as an
   empty slice serializes to "[]" — Dart parses that as an empty list,
   no exception, no error UI. THIS is the actual root cause of the
   empty-list reports we've been chasing; the apps-discovery scanner
   work was investigating a different (also-real but secondary) issue.

2. **UI-process slog wired up via common.Init.**
   On the refactor branch, the UI process never called common.Init.
   slog wrote to stderr (= nowhere on a GUI host), settings were
   uninitialized, no lantern.log was produced outside the daemon.
   Patrick caught this — it was a one-line miss in the refactor.

   Platform-aware so we don't double-init on platforms where the
   backend embeds in-process:
     - windows/linux: full common.Init (separate UI + daemon procs)
     - darwin/ios:    setupAppLogging into a distinct lantern-app.log
                      so the main-app slog doesn't race the tunnel
                      extension's lantern.log on lumberjack rotation
     - android:       Mobile.SetupRadiance already ran common.Init
                      upstream — fall through

3. **Auto-attach UI-process *.log to ReportIssue (windows/linux only).**
   Without it the daemon's archive glob only sees the daemon's logDir;
   UI-side lantern.log + flutter.log never reach the issue bundle. The
   daemon runs as SYSTEM on Windows; we keep UI logDir at
   %PUBLIC%\Lantern\logs so SYSTEM can read it.
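
The nil-versus-empty distinction in item 1 is standard encoding/json
behavior and can be reproduced in isolation. `encodeApps` is a
hypothetical helper for this sketch, not the real GetEnabledApps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeApps marshals the slice exactly as given, without normalizing nil.
func encodeApps(apps []string) string {
	b, err := json.Marshal(apps)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	var nilApps []string    // what the old code returned when nothing was enabled
	emptyApps := []string{} // what the fix returns instead

	fmt.Println(encodeApps(nilApps))   // null
	fmt.Println(encodeApps(emptyApps)) // []
}
```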

The broader Windows app-discovery work from #8706 (App Paths scan, Run
keys, Squirrel pattern, isAppPathsNoise heuristic filters) is being
held in a separate PR for independent review.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: init Go logging early so lantern-core/radiance debug logs aren't lost (#8711)

On Android the entire app runs in a single process, so once common.Init
runs slog.SetDefault covers everything. But common.Init only runs deep
inside SetupRadiance / StartIPCServer, which LanternVpnService launches
asynchronously from an intent fired by MainActivity.startLanternService.
Any slog call emitted in the gap — including any of the wide MethodHandler
surface that Flutter can reach before the VPN service is up — falls
through to the stdlib default (text → stderr → logcat at INFO), so DEBUG
logs vanish and the format diverges from what we use everywhere else.

Add Mobile.InitLogging as a thin gomobile-exposed wrapper around
common.Init, and call it from MainActivity.configureFlutterEngine before
startLanternService. common.Init is guarded by an atomic.Bool, so the
later call from backend.NewLocalBackend is a no-op.
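
The atomic.Bool guard can be pictured with a sketch like the one below.
`initLogging` and `initCount` are hypothetical; the real common.Init may
be structured differently, and atomic.Bool requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	initialized atomic.Bool
	initCount   int // counts how many times the real setup work ran
)

// initLogging performs its setup only on the first call. It reports whether
// this call was the one that did the work.
func initLogging() bool {
	if !initialized.CompareAndSwap(false, true) {
		return false // already initialized: later calls are no-ops
	}
	initCount++
	return true
}

func main() {
	first := initLogging()  // early call, e.g. from the activity
	second := initLogging() // later call, e.g. from backend construction
	fmt.Println(first, second, initCount)
}
```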

Mirrors PR #8709 (Windows). Reported on Slack by Jigar.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* FetchUserData at startup.

* Map the correct type to avoid a crash.

---------

Co-authored-by: garmr <pdixon117@gmail.com>
Co-authored-by: Jigar-f <jigar@getlantern.org>
Co-authored-by: jigar-f <132374182+jigar-f@users.noreply.github.com>
Co-authored-by: atavism <atavism@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ilya Yakelzon <reflog@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: reflog <109876+reflog@users.noreply.github.com>
Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Jay <110402935+jay-418@users.noreply.github.com>
Co-authored-by: atavism <paul@getlantern.org>
Co-authored-by: garmr-ulfr <104022054+garmr-ulfr@users.noreply.github.com>
myleshorton added a commit that referenced this pull request Apr 29, 2026
Already pinned at a153918 (post-merge main); go mod tidy drops two
stale go.sum lines for the prior pseudo-version (e312570c) carried
over from PR #8578.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6 participants