android: bound Mobile.startVPN with withTimeout(60s)#8688
Merged
myleshorton merged 2 commits intomainfrom Apr 22, 2026
Merged
Conversation
In Freshdesk #173507, a Russian user's second VPN start attempt hung for 20 s before Go returned EOF from /service/start. The Kotlin coroutine had no timeout, so every time the Go side deadlocked the UI button would stay in its \"Connecting…\" state indefinitely and the app became impossible to quit — the user had to reboot the phone. Wrap the connect() call in launchVPN with withTimeout(60s) so a hung Mobile.startVPN / connectToServer always surfaces a clear error instead of freezing the UI. 60 s is generous: normal radiance + TUN establish finishes well under 15 s; anything past 60 s is indistinguishable from a hang. TimeoutCancellationException falls through runCatching to the same onFailure branch we already use for other errors; the user sees an \"Error starting VPN\" state and can retry. The sing-box-minimal fix in v1.12.22-lantern already reduces the likelihood of this deadlock, but this keeps the UX safe if any future regression reintroduces it.
Contributor
There was a problem hiding this comment.
Pull request overview
Adds an Android-side safety net to prevent Mobile.startVPN() / Mobile.connectToServer() from leaving the app in a “stuck connecting” state by enforcing a 60s timeout and emitting timeout-specific telemetry/error details.
Changes:
- Introduces a
VPN_START_TIMEOUT_MSconstant (60s) as a ceiling for VPN start/connect operations. - Wraps the VPN connect step in
withTimeout(...)so timeouts flow through the existingrunCatching { ... }.onFailure { ... }path. - Emits a distinct timeout error code (
${errorCode}_timeout) and message/logging for operator visibility.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… calls Copilot correctly flagged that wrapping Mobile.startVPN / connectToServer directly in withTimeout wouldn't work: those are blocking JNI calls with no suspension points, so the cooperative cancellation never triggers. Run the call inside async(Dispatchers.IO) and await it — await IS a suspension point, so withTimeout cancels it correctly. On timeout we abandon the Deferred (cancel marks it, but the JNI call keeps running in the background until Go returns); the caller gets a clear error and the UI unfreezes. Also generalize the timeout error message from 'VPN start timed out' to 'VPN operation timed out' since the same path also handles connect_to_server.
This was referenced Apr 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Safety net for Go-side VPN start deadlocks. Freshdesk #173507's user had to reboot their phone because the UI button stayed frozen for 20+ seconds after Go returned EOF. Wrapping
connect()inwithTimeout(60_000)surfaces a clear error + allows retry instead.Change
Mobile.startVPN()/connectToServer()viawithTimeoutTimeoutCancellationExceptionroutes through existingrunCatching.onFailurepath${errorCode}_timeout+ message so operators can tell it apart from other failuresContext
The sing-box-minimal #41 (in v1.12.22-lantern) already fixes the root cause of #173507's deadlock. This PR is defense-in-depth — if any future regression reintroduces a Go-side hang, users see an error instead of a frozen button.
Test plan