android: detach connect() scope so withTimeout actually unblocks the UI by myleshorton · Pull Request #8689 · getlantern/lantern

myleshorton · 2026-04-22T17:11:21Z

Follow-up to #8688 (which was merged to main and has since landed on garmr/radiance-daemon-refactor via the main-merge, making the cherry-pick commits this PR originally carried redundant — those were dropped during rebase).

Problem

#8688's final shape wraps connect() in:

coroutineScope {
    val deferred = async(Dispatchers.IO) { connect() }
    try {
        withTimeout(VPN_START_TIMEOUT_MS) { deferred.await() }
    } catch (e: TimeoutCancellationException) {
        deferred.cancel()
        throw e
    }
}

Copilot correctly flagged that this still hangs in the exact scenario the timeout is meant to guard against. coroutineScope { } is a structured scope: when the block throws, it cancels its children and then waits for them to finish. deferred.cancel() only signals cancellation cooperatively, but connect() is a blocking JNI call with no suspension points — it ignores the signal. So coroutineScope blocks on the still-running child, the caller stays blocked, and the UI freezes just like before (Freshdesk #173507).

Fix

Switch to a detached CoroutineScope(SupervisorJob() + Dispatchers.IO). Its Job is not a child of the enclosing coroutine, so cancel() signals but doesn't join. The orphan coroutine keeps running the JNI call in the background until Go returns (or the process exits), but the caller is unblocked and runCatching.onFailure fires the ${errorCode}_timeout state for the UI.

Test plan

Normal VPN start: timeout never fires, connect completes, status goes to Connected
Simulated deadlock (e.g. time.Sleep(90 * time.Second) inside the Go /service/start handler): UI unfreezes after 60s with a *_timeout error instead of staying stuck
Logs show VPN operation (...) timed out after 60000ms when the timeout fires

Pull request overview

Cherry-picks Android service changes to add a wall-clock timeout around Go/JNI VPN start/connect calls, aiming to prevent UI freezes when Mobile.startVPN() / Mobile.connectToServer() deadlock.

Changes:

Introduces a 60s VPN_START_TIMEOUT_MS ceiling for VPN start/connect operations.
Wraps the blocking Go/JNI call in async + withTimeout { await() } and propagates timeout failures through the existing error path.
Adds timeout-specific logging and errorCode suffixing (*_timeout) for clearer operator visibility.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot flagged on #8689 that the existing coroutineScope { ... } still hangs in exactly the scenario this change is meant to protect against. Structured coroutineScope cancels its children on exception but then waits for them to complete — so when withTimeout fires, we cancel the deferred (which the JNI call ignores, since it has no suspension points) and then block on it finishing anyway. Net effect: the UI is still frozen, which is the symptom we're trying to prevent. Switch to a DETACHED CoroutineScope(SupervisorJob() + Dispatchers.IO). Its Job is not a child of the enclosing coroutine, so cancelling it doesn't join — the orphan coroutine keeps running the JNI call in the background until Go returns or the process exits, but the caller is unblocked and the runCatching.onFailure path fires the timeout error state for the UI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot correctly pointed out on #8689 that the detached-scope approach can accumulate orphan coroutines if the user retries while a previous connect() is still stuck in JNI. Each orphan pins a Dispatchers.IO thread; enough retries against a truly deadlocked Go side could pressure the IO pool. Their suggested fix (Dispatchers.IO.limitedParallelism(1)) would serialize retries behind the orphan, turning the 2nd retry into another 60s hang. A simple single-flight AtomicBoolean gate with fast rejection is the cleaner mitigation: - compareAndSet rejects concurrent attempts with IllegalStateException (surfaces via the existing runCatching.onFailure → error state). - The flag clears in a try/finally inside the async block, which runs when the JNI call eventually returns — cancellation alone can't break it out, but once Go completes the finally runs and a future retry is admitted. - Process death (reboot, force-stop) resets the flag naturally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 22, 2026 17:11

Copilot started reviewing on behalf of myleshorton April 22, 2026 17:11 View session

Copilot AI reviewed Apr 22, 2026

View reviewed changes

Comment thread android/app/src/main/kotlin/org/getlantern/lantern/service/LanternVpnService.kt

myleshorton force-pushed the fisk/vpn-start-timeout-refactor branch from 79e3b9f to 4c583e6 Compare April 23, 2026 15:11

myleshorton changed the title ~~android: bound Mobile.startVPN with withTimeout (cherry-pick of #8688 to refactor)~~ android: detach connect() scope so withTimeout actually unblocks the UI Apr 23, 2026

myleshorton requested a review from Copilot April 23, 2026 15:13

Copilot started reviewing on behalf of myleshorton April 23, 2026 15:19 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread android/app/src/main/kotlin/org/getlantern/lantern/service/LanternVpnService.kt

myleshorton merged commit 0540dc9 into garmr/radiance-daemon-refactor Apr 23, 2026

myleshorton deleted the fisk/vpn-start-timeout-refactor branch April 23, 2026 15:45

This was referenced Apr 23, 2026

android: detach connect() scope + single-flight gate (port of #8689) #8695

Closed

android: harden detached connect timeout gate (refactor branch port of #8696) #8698

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

android: detach connect() scope so withTimeout actually unblocks the UI#8689

android: detach connect() scope so withTimeout actually unblocks the UI#8689
myleshorton merged 2 commits intogarmr/radiance-daemon-refactorfrom
fisk/vpn-start-timeout-refactor

myleshorton commented Apr 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

myleshorton commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Test plan

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

myleshorton commented Apr 22, 2026 •

edited

Loading