Merged
@@ -47,6 +47,7 @@
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.LogRecord;
@@ -203,6 +204,8 @@ public class FlowExecutionListTest {
}

@Test public void stepExecutionIteratorDoesNotLeakBuildsWhenCpsVmIsStuck() throws Throwable {
// Make sure ForkJoinPool is not initialized on a thread where its inheritedAccessControlContext would point to a CpsGroovyShell$CleanGroovyClassLoader
ForkJoinPool.commonPool().execute(() -> {});
Comment on lines +207 to +208
Member Author

@dwnusbaum Sep 30, 2025

It's a bit interesting that I keep running into cases where first-time initialization of some object in the context of a Pipeline build causes problems. In practice it's probably OK, since at worst each case would only leak a single build, but I wonder how many such leaks are actually possible.

Compare jenkinsci/pipeline-groovy-lib-plugin#199 (comment).
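
To illustrate the general mechanism behind this kind of one-build leak, here is a minimal sketch using only the JDK. It is an analogy, not a reproduction of the leak in this test: it assumes a plain fixed thread pool and uses the thread's context class loader as the captured state, and `LazyWorkerCapture`, `buildLoader`, and `workerCapturesBuildLoader` are hypothetical names. The idea is that a lazily created worker thread inherits state from whichever thread happens to submit the first task, so submitting a no-op from a known-clean thread first keeps per-build state out of the worker.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class LazyWorkerCapture {

    /** Asks the pool's worker thread which context class loader it holds. */
    static ClassLoader workerLoader(ExecutorService pool) {
        return CompletableFuture.supplyAsync(
                () -> Thread.currentThread().getContextClassLoader(), pool).join();
    }

    /**
     * Returns true if the pool's worker ended up holding the per-build loader.
     * With prewarm, the calling thread creates the worker before the "build" runs.
     */
    static boolean workerCapturesBuildLoader(boolean prewarm) {
        // Hypothetical stand-in for a class loader that belongs to one build.
        ClassLoader buildLoader = new URLClassLoader(new URL[0]);
        // The single worker thread is created lazily, by the first submitter.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            if (prewarm) {
                pool.execute(() -> {}); // same trick as the no-op submitted in the test
            }
            AtomicReference<ClassLoader> seen = new AtomicReference<>();
            Thread buildThread = new Thread(() -> seen.set(workerLoader(pool)));
            buildThread.setContextClassLoader(buildLoader);
            buildThread.start();
            buildThread.join();
            return seen.get() == buildLoader;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("captured without pre-warm: " + workerCapturesBuildLoader(false));
        System.out.println("captured with pre-warm:    " + workerCapturesBuildLoader(true));
    }
}
```

Without the pre-warm the worker keeps a reference to the build's loader for as long as the pool lives; with it, the worker only ever sees the loader of the thread that warmed the pool.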

Member

There is a class loader sanity executor service wrapper in Jenkins core which should probably be used.
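
As a rough idea of what such a wrapper does, the following sketch resets the context class loader of every thread a factory creates. It is a hypothetical stand-in written against the JDK only, not the Jenkins core class, and the names `SanitizingThreadFactory`, `safeLoader`, and `workerSeesSafeLoader` are assumptions made for illustration. Note that it only addresses `Thread.contextClassLoader`; any other state a new thread inherits from its creator is left untouched.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical wrapper: forces a fixed context class loader on every new thread. */
public class SanitizingThreadFactory implements ThreadFactory {
    private final ThreadFactory delegate;
    private final ClassLoader safeLoader;

    public SanitizingThreadFactory(ThreadFactory delegate, ClassLoader safeLoader) {
        this.delegate = delegate;
        this.safeLoader = safeLoader;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r);
        // Overwrite whatever loader was inherited from the creating thread.
        t.setContextClassLoader(safeLoader);
        return t;
    }

    /** Submits the first task from a thread with a throwaway loader and checks what the worker sees. */
    static boolean workerSeesSafeLoader() {
        ClassLoader safe = ClassLoader.getSystemClassLoader();
        ClassLoader buildLoader = new URLClassLoader(new URL[0]); // stand-in for a per-build loader
        ExecutorService pool = Executors.newFixedThreadPool(
                1, new SanitizingThreadFactory(Executors.defaultThreadFactory(), safe));
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        Thread buildThread = new Thread(() -> seen.set(
                CompletableFuture.supplyAsync(
                        () -> Thread.currentThread().getContextClassLoader(), pool).join()));
        buildThread.setContextClassLoader(buildLoader);
        try {
            buildThread.start();
            buildThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return seen.get() == safe;
    }
}
```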

Member Author

@dwnusbaum Oct 1, 2025

Maybe, but this is the common pool, which seems to be unconfigurable. Also, the reference path was not via Thread.contextClassLoader as is typical (that was AppClassLoader in this case) but via Thread.inheritedAccessControlContext, which holds a reference to CpsGroovyShell$CleanGroovyClassLoader through a ProtectionDomain.classloader, constructed, I think, via AccessController.getContext().

So whatever the exact problem is, it will go away once we pick up JEP 486 (Java 24+), which deletes Thread.inheritedAccessControlContext.

FWIW, I also think the critical usage path of the common pool is via the default executor in Caffeine caches, specifically two caches in SimpleXStreamFlowNodeStorage and one in EnumeratingWhitelist. We could perhaps set their executors to Computer.threadPoolForRemoting or some manually constructed thread pool to avoid the issue in the meantime.
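
A manually constructed pool along those lines might look like the sketch below. It uses only the JDK so that it runs on its own; the class name `DedicatedCachePool`, the `cache-maintenance-` thread-name prefix, and the helper methods are hypothetical. With Caffeine the pool would be passed to the builder's `executor(...)` method, which is mentioned only in a comment here rather than compiled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class DedicatedCachePool {

    /** Builds a daemon thread pool with recognizable thread names (the prefix is illustrative). */
    static ExecutorService newPool() {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "cache-maintenance-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive for maintenance work
            return t;
        });
    }

    /** Runs a trivial async task on the given executor and returns the worker's thread name. */
    static String workerName(Executor executor) {
        return CompletableFuture.supplyAsync(
                () -> Thread.currentThread().getName(), executor).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = newPool();
        // With Caffeine, the same pool would be handed to the cache builder,
        // e.g. Caffeine.newBuilder().executor(pool) (not compiled in this sketch).
        System.out.println(workerName(pool));
        pool.shutdown();
    }
}
```

One caveat with this design: the pool's threads are still created lazily by whichever thread submits first, so a dedicated pool only helps if it is started from a clean thread or combined with a thread factory that resets inherited state.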

Member

Ah yes, we ought to define a specific pool for use from Caffeine if possible. Ultimately we could use virtual threads (21+), though we might need to wait for monitor support in 24, and some diagnosability issues remain.

Member Author

The notable thing in these cases is that we are not obviously using any async-related methods or types with the Caffeine caches. We only use weakKeys, strongValues, and LoadingCache, so the fact that an async task is being submitted to the common pool is not immediately obvious. Perhaps it has to do with the internal maintenance referred to in the Javadoc.

sessions.then(r -> {
var notStuck = r.createProject(WorkflowJob.class, "not-stuck");
notStuck.setDefinition(new CpsFlowDefinition("semaphore 'wait'", true));
@@ -220,17 +223,20 @@ public class FlowExecutionListTest {
// Let notStuckBuild complete and clean up all references.
SemaphoreStep.success("wait/1", null);
r.waitForCompletion(notStuckBuild);
notStuck.getLazyBuildMixIn().removeRun(notStuckBuild);
notStuckBuild = null; // Clear out the local variable in this thread.
Jenkins.get().getQueue().clearLeftItems(); // Otherwise we'd have to wait 5 minutes for the cache to be cleared.
// Make sure that the reference can be GC'd.
- MemoryAssert.assertGC(notStuckBuildRef, true);
+ MemoryAssert.assertGC(notStuckBuildRef, false);
// Allow stuck #1 to complete so the test can be cleaned up promptly.
SynchronousBlockingStep.unblock("stuck");
r.waitForCompletion(stuckBuild);
});
}

@Test public void stepExecutionIteratorDoesNotLeakBuildsWhenProgramPromiseIsStuck() throws Throwable {
// Make sure ForkJoinPool is not initialized on a thread where its inheritedAccessControlContext would point to a CpsGroovyShell$CleanGroovyClassLoader
ForkJoinPool.commonPool().execute(() -> {});
sessions.then(r -> {
var stuck = r.createProject(WorkflowJob.class, "stuck");
stuck.setDefinition(new CpsFlowDefinition(
@@ -254,10 +260,11 @@ public class FlowExecutionListTest {
// Let notStuckBuild complete and clean up all references.
SemaphoreStep.success("wait/1", null);
r.waitForCompletion(notStuckBuild);
notStuck.getLazyBuildMixIn().removeRun(notStuckBuild);
notStuckBuild = null; // Clear out the local variable in this thread.
Jenkins.get().getQueue().clearLeftItems(); // Otherwise we'd have to wait 5 minutes for the cache to be cleared.
// Make sure that the reference can be GC'd.
- MemoryAssert.assertGC(notStuckBuildRef, true);
+ MemoryAssert.assertGC(notStuckBuildRef, false);
// Allow stuck #1 to complete so the test can be cleaned up promptly.
r.waitForMessage("Still trying to load StuckPickle for", stuckBuild);
ExtensionList.lookupSingleton(StuckPickle.Factory.class).resolved = new StuckPickle.Marker();