
rpc: avoid ForkJoinPool compensation in send polling loop #7617

Merged

jkschneider merged 1 commit into main from rpc-avoid-fjp-compensation on May 9, 2026
Conversation

jkschneider (Member) commented on May 9, 2026

What's changed?

RewriteRpc.send's polling loop now uses future.getNow(null) + Thread.sleep(1ms) instead of future.get(checkIntervalMs, TimeUnit.MILLISECONDS). The liveness check is decoupled from the polling cadence and fires every 500ms.
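
In sketch form, assuming illustrative names (PollingSketch, awaitResponse, and checkLiveness are not the actual RewriteRpc code; only the getNow(null) + sleep(1ms) shape and the 500ms cadence come from the change):

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the new polling shape. Everything named here is illustrative
// except getNow(null), Thread.sleep(1), and the 500ms liveness cadence.
final class PollingSketch {
    private static final long LIVENESS_INTERVAL_MS = 500;

    <T> T awaitResponse(CompletableFuture<T> future) throws InterruptedException {
        long lastLivenessCheck = System.currentTimeMillis();
        while (!future.isDone()) {
            long now = System.currentTimeMillis();
            if (now - lastLivenessCheck >= LIVENESS_INTERVAL_MS) {
                checkLiveness(); // hypothetical hook: verify the rpc subprocess is alive
                lastLivenessCheck = now;
            }
            Thread.sleep(1); // parks via LockSupport; no ManagedBlocker, no compensation
        }
        return future.getNow(null); // non-blocking: the future is already complete
    }

    private void checkLiveness() {
        // hypothetical: in RewriteRpc this would fail fast if the subprocess died
    }
}
```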

What's your motivation?

Calling future.get(long, TimeUnit) from a ForkJoinPool worker goes through CompletableFuture.Signaller.block, which is a ForkJoinPool.ManagedBlocker. ManagedBlocker is the explicit hook FJP watches for and reacts to by spawning a compensation worker: a fresh thread added to keep parallelism up while the original is parked.
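
For context, ManagedBlocker is a public API that any blocking primitive can opt into; a simplified analogue of what Signaller does internally looks like this (a sketch, not the JDK source):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Simplified analogue of CompletableFuture.Signaller: a ManagedBlocker that
// parks until a future completes. When a ForkJoinPool worker enters
// ForkJoinPool.managedBlock with one of these, the pool may first spawn a
// compensation worker to preserve its target parallelism.
final class FutureBlocker<T> implements ForkJoinPool.ManagedBlocker {
    private final CompletableFuture<T> future;

    FutureBlocker(CompletableFuture<T> future) {
        this.future = future;
    }

    @Override
    public boolean isReleasable() {
        return future.isDone(); // don't park if the result is already here
    }

    @Override
    public boolean block() throws InterruptedException {
        try {
            future.get(100, TimeUnit.MILLISECONDS); // parks this worker thread
        } catch (Exception e) {
            // timeout or completion failure; managedBlock re-checks isReleasable()
        }
        return future.isDone();
    }
}

// From inside a ForkJoinPool task:
//   ForkJoinPool.managedBlock(new FutureBlocker<>(future));
```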

That compensation worker can pick up other queued work (recipe loads, printer invocations, etc.) and call PythonRewriteRpc.getOrStart() / JavaScriptRewriteRpc.getOrStart() / etc., spawning a fresh rpc OS subprocess and caching it in its ThreadLocal. When FJP later terminates the idle compensation worker, its ThreadLocal entry is GC'd, but the OS process is independent of the JVM heap and survives. The finally block in the dispatching worker's RunTask.execute() only runs shutdownCurrent() on the dispatching thread's ThreadLocal, never on the dead compensation worker's.
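
The leak shape, as a minimal sketch (getOrStart/shutdownCurrent are the names above; the ThreadLocal<Process> handling and launch command are illustrative):

```java
import java.io.IOException;

// Sketch of a per-thread rpc cache like the one described above. The
// ThreadLocal entry dies with the thread; the OS process it points at does not.
final class RpcHolder {
    private static final ThreadLocal<Process> RPC = new ThreadLocal<>();

    static Process getOrStart() throws IOException {
        Process p = RPC.get();
        if (p == null || !p.isAlive()) {
            // Hypothetical command; the real rpc launches a language-specific subprocess.
            p = new ProcessBuilder("python", "-m", "rewrite.rpc").start();
            RPC.set(p);
        }
        return p;
    }

    static void shutdownCurrent() {
        Process p = RPC.get();
        if (p != null) {
            p.destroy(); // only ever reaches the *calling* thread's process
            RPC.remove();
        }
    }
}

// The leak: a compensation worker calls getOrStart() and caches a Process in
// its ThreadLocal. When the pool retires that idle worker, the ThreadLocal map
// is collected along with the thread, but Process is just a handle; nothing
// calls destroy(), so the OS subprocess survives.
```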

On a moderne-cli mod run against ~448 Python repos, this accumulated 127+ live python rpc processes per long-running JVM, each leaked rpc carrying a different past repo's log path in its argv. Every compensation-worker spawn leaked one OS process. The same mechanism applies to the JS / C# / Go rpcs.

Thread.sleep parks the thread via LockSupport directly, not through ManagedBlocker, so FJP doesn't compensate. Parallelism temporarily drops by one while a worker is in rpc.send, which is exactly the resource-bounded behavior --parallel is supposed to mean.
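
The difference is easy to observe with a small standalone demo (hypothetical code, not part of this PR): park workers both ways in a parallelism-4 pool and compare how many threads the pool ends up creating.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Demo: future.get() inside a ForkJoinPool triggers compensation threads;
// Thread.sleep() does not. Expect a pool size well above 4 in the first
// case and exactly 4 in the second (numbers are timing-dependent).
public class CompensationDemo {
    public static void main(String[] args) throws Exception {
        demo("future.get  ", () -> {
            try {
                new CompletableFuture<Void>().get(200, TimeUnit.MILLISECONDS);
            } catch (Exception ignored) {
                // TimeoutException is expected; only the parking behavior matters
            }
        });
        demo("Thread.sleep", () -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
            }
        });
    }

    static void demo(String label, Runnable blocker) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(4);
        for (int i = 0; i < 16; i++) {
            pool.submit(blocker);
        }
        Thread.sleep(100); // let workers park in their blocking call
        System.out.println(label + " pool size: " + pool.getPoolSize());
        pool.shutdownNow();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```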

Verified locally with a 20-repo Python LST sample running UpgradeToPython314 at --parallel=14.


The send polling loop called future.get(checkIntervalMs, TimeUnit.MILLISECONDS), which from a ForkJoinPool worker goes through CompletableFuture.Signaller.block, a ForkJoinPool.ManagedBlocker. ManagedBlocker is the explicit hook FJP watches for to spawn a compensation worker (a fresh thread added to keep parallelism while the original is parked).

The compensation worker can pick up other queued recipe-scheduler work and call PythonRewriteRpc.getOrStart() (e.g. from a printer or the LazyRecipeBundleResolver supplier), spawning a fresh python rpc OS process and caching it in its ThreadLocal. When FJP later terminates the idle compensation worker, its ThreadLocal entry is GC'd, but the OS python process is independent of the JVM heap and survives. The finally block in RunTask.execute() only runs shutdownCurrent on the dispatching worker, never on the dead compensation worker. Each compensation-worker spawn leaked one OS process; long-running recipe-worker JVMs accumulated 100+ live rpcs within a single run.

Replace future.get(timeout) with future.getNow(null) + Thread.sleep(1ms) polling. Thread.sleep parks via LockSupport directly, not through ManagedBlocker, so FJP doesn't compensate. Parallelism temporarily drops by one until the rpc response arrives, which is exactly the resource-bounded behavior we want from --parallel. The liveness check is decoupled and fires every 500ms to preserve the existing failure-detection cadence.

Pairs with #7616 (JVM-exit shutdown hook): #7616 catches survivors at JVM exit; this change prevents the leak from occurring during the run.
jkschneider merged commit ccba1ca into main on May 9, 2026
1 check passed
jkschneider deleted the rpc-avoid-fjp-compensation branch on May 9, 2026 at 19:44