android: detach connect() scope so withTimeout actually unblocks the UI #8689
Merged
myleshorton merged 2 commits into garmr/radiance-daemon-refactor on Apr 23, 2026
Pull request overview
Cherry-picks Android service changes to add a wall-clock timeout around Go/JNI VPN start/connect calls, aiming to prevent UI freezes when Mobile.startVPN() / Mobile.connectToServer() deadlock.
Changes:
- Introduces a 60s VPN_START_TIMEOUT_MS ceiling for VPN start/connect operations.
- Wraps the blocking Go/JNI call in async + withTimeout { await() } and propagates timeout failures through the existing error path.
- Adds timeout-specific logging and errorCode suffixing (*_timeout) for clearer operator visibility.
myleshorton added a commit that referenced this pull request Apr 23, 2026
Copilot flagged on #8689 that the existing coroutineScope { ... } still hangs in exactly the scenario this change is meant to protect against. Structured coroutineScope cancels its children on exception but then waits for them to complete, so when withTimeout fires, we cancel the deferred (which the JNI call ignores, since it has no suspension points) and then block on it finishing anyway. Net effect: the UI is still frozen, which is the symptom we're trying to prevent.

Switch to a DETACHED CoroutineScope(SupervisorJob() + Dispatchers.IO). Its Job is not a child of the enclosing coroutine, so cancelling it doesn't join: the orphan coroutine keeps running the JNI call in the background until Go returns or the process exits, but the caller is unblocked and the runCatching.onFailure path fires the timeout error state for the UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
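The detached-scope behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's actual code: it assumes kotlinx.coroutines is on the classpath, `blockingConnect` and `connectWithTimeout` are hypothetical names, and `Thread.sleep` stands in for the blocking Go/JNI call.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for the blocking Go/JNI call. It has no suspension
// points, so coroutine cancellation cannot break it out early.
fun blockingConnect(blockMs: Long) {
    Thread.sleep(blockMs)
}

// Runs `block` on a detached scope. The async Job is not a child of the
// caller's Job, so on timeout the caller rethrows without joining it.
suspend fun connectWithTimeout(timeoutMs: Long, block: () -> Unit) {
    val detached = CoroutineScope(SupervisorJob() + Dispatchers.IO)
    val deferred = detached.async { block() }
    try {
        withTimeout(timeoutMs) { deferred.await() }
    } catch (e: TimeoutCancellationException) {
        deferred.cancel() // signal only; the blocked call keeps running
        throw e
    }
}

fun main() = runBlocking {
    var timedOut = false
    val elapsedMs = measureTimeMillis {
        try {
            // 200 ms ceiling around a call that blocks for 2 s.
            connectWithTimeout(timeoutMs = 200) { blockingConnect(2_000) }
        } catch (e: TimeoutCancellationException) {
            timedOut = true
        }
    }
    println("timedOut=$timedOut, caller unblocked after about ${elapsedMs}ms")
}
```

Wrapping the same `async` in `coroutineScope { ... }` instead would keep the caller suspended until `blockingConnect` returns, which is the hang this commit removes.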
79e3b9f to 4c583e6
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Copilot correctly pointed out on #8689 that the detached-scope approach can accumulate orphan coroutines if the user retries while a previous connect() is still stuck in JNI. Each orphan pins a Dispatchers.IO thread; enough retries against a truly deadlocked Go side could pressure the IO pool.

Their suggested fix (Dispatchers.IO.limitedParallelism(1)) would serialize retries behind the orphan, turning the 2nd retry into another 60s hang. A simple single-flight AtomicBoolean gate with fast rejection is the cleaner mitigation:

- compareAndSet rejects concurrent attempts with IllegalStateException (surfaces via the existing runCatching.onFailure → error state).
- The flag clears in a try/finally inside the async block, which runs when the JNI call eventually returns. Cancellation alone can't break it out, but once Go completes the finally runs and a future retry is admitted.
- Process death (reboot, force-stop) resets the flag naturally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
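The single-flight gate described in this commit message might look roughly like the sketch below. It assumes kotlinx.coroutines is available; `connectInFlight` and `guardedConnect` are hypothetical names chosen for illustration, and `Thread.sleep` again stands in for the stuck JNI call.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout

// Hypothetical gate: true while a connect attempt (or its orphan) is running.
private val connectInFlight = AtomicBoolean(false)

suspend fun guardedConnect(timeoutMs: Long, block: () -> Unit) {
    // Fast rejection: a concurrent attempt fails with IllegalStateException
    // instead of queuing behind a stuck call.
    check(connectInFlight.compareAndSet(false, true)) { "connect already in progress" }

    val detached = CoroutineScope(SupervisorJob() + Dispatchers.IO)
    val deferred = detached.async {
        try {
            block()
        } finally {
            // Runs only once the blocking call actually returns.
            connectInFlight.set(false)
        }
    }
    try {
        withTimeout(timeoutMs) { deferred.await() }
    } catch (e: TimeoutCancellationException) {
        deferred.cancel() // signal only; the gate stays closed until block() returns
        throw e
    }
}

fun main() = runBlocking {
    // First attempt times out while the stand-in call is still blocked.
    val first = runCatching { guardedConnect(100) { Thread.sleep(1_000) } }
    // An immediate retry is rejected rather than waiting on the orphan.
    val second = runCatching { guardedConnect(100) { } }
    println("first:  ${first.exceptionOrNull()}")
    println("second: ${second.exceptionOrNull()}")

    delay(1_200) // let the stand-in call return so the finally clears the flag
    val third = runCatching { guardedConnect(100) { } }
    println("third succeeded: ${third.isSuccess}")
}
```

One edge this sketch does not cover: a coroutine cancelled before it is ever dispatched would not run its body, so the `finally` would not clear the flag. With a 60s ceiling that window is unlikely to matter, but it is an assumption of the sketch rather than something the commit message addresses.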
This was referenced Apr 23, 2026
Follow-up to #8688 (which was merged to main and has since landed on garmr/radiance-daemon-refactor via the main-merge, making the cherry-pick commits this PR originally carried redundant; those were dropped during rebase).

Problem

#8688's final shape wraps connect() in:

    coroutineScope {
        val deferred = async(Dispatchers.IO) { connect() }
        try {
            withTimeout(VPN_START_TIMEOUT_MS) { deferred.await() }
        } catch (e: TimeoutCancellationException) {
            deferred.cancel()
            throw e
        }
    }

Copilot correctly flagged that this still hangs in the exact scenario the timeout is meant to guard against. coroutineScope { } is a structured scope: when the block throws, it cancels its children and then waits for them to finish. deferred.cancel() only signals cancellation cooperatively, but connect() is a blocking JNI call with no suspension points, so it ignores the signal. So coroutineScope blocks on the still-running child, the caller stays blocked, and the UI freezes just like before (Freshdesk #173507).

Fix

Switch to a detached CoroutineScope(SupervisorJob() + Dispatchers.IO). Its Job is not a child of the enclosing coroutine, so cancel() signals but doesn't join. The orphan coroutine keeps running the JNI call in the background until Go returns (or the process exits), but the caller is unblocked and runCatching.onFailure fires the ${errorCode}_timeout state for the UI.

Test plan

- time.Sleep(90 * time.Second) inside the Go /service/start handler: UI unfreezes after 60s with a *_timeout error instead of staying stuck
- VPN operation (...) timed out after 60000ms when the timeout fires

Related

- main: has the same issue; worth a follow-up PR against main

🤖 Generated with Claude Code