
Task stuck in 100% CPU loop within FinalizerGroup/DefaultExecutionPlan.finalizePlan on Gradle 7.6+ #25361

Closed · chadlwilson opened this issue Jun 12, 2023 · 6 comments · Fixed by #25468
Labels: a:regression (This used to work) · has:reproducer (Indicates the issue has a confirmed reproducer) · in:scheduler (execution plan, task graph, work lease, project lock)
Milestone: 8.3 RC1


chadlwilson commented Jun 12, 2023

Expected Behavior

./gradlew server:fastUnitTest of https://github.com/gocd/gocd should execute fine on Gradle 7.6+, as it does on Gradle 7.5.1.

This task is defined here, and I cannot identify anything special about it that would suggest a root cause or a workaround.

Current Behavior

On Gradle 7.6, 7.6.1, and 8.1.1 (and possibly other versions), running this task just gets stuck CONFIGURING and never terminates.

Debug logs end with

2023-06-12T21:23:16.180+0800 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationRunner] Completing Build operation 'Calculate task graph'
2023-06-12T21:23:16.180+0800 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationRunner] Build operation 'Calculate task graph' completed
2023-06-12T21:23:24.474+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2023-06-12T21:23:24.476+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2023-06-12T21:23:24.476+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2023-06-12T21:23:24.477+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2023-06-12T21:23:24.477+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2023-06-12T21:23:24.477+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2023-06-12T21:23:34.473+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2023-06-12T21:23:34.474+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2023-06-12T21:23:34.474+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2023-06-12T21:23:34.475+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2023-06-12T21:23:34.475+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2023-06-12T21:23:34.475+0800 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
... and so on, every 10 seconds

Profiler samples always look something like this:

[Profiler screenshots: virtually all CPU time spent inside DefaultExecutionPlan.finalizePlan / FinalizerGroup]

Context (optional)

The intent here is just to run a custom test task against an alternate Java source set defined earlier in a custom configuration. I have not found a workaround for this.
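
The shape of the setup, roughly (a minimal sketch only; the directory layout and wiring here are illustrative, not the actual gocd build):

```groovy
// Sketch: a custom Test task wired to an alternate Java source set.
// Paths and wiring are illustrative, not copied from the gocd build.
sourceSets {
    fastUnitTest {
        java.srcDir 'src/test-fast/java'
        compileClasspath += sourceSets.main.output
        runtimeClasspath += sourceSets.main.output
    }
}

tasks.register('fastUnitTest', Test) {
    testClassesDirs = sourceSets.fastUnitTest.output.classesDirs
    classpath = sourceSets.fastUnitTest.runtimeClasspath
}
```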

Steps to Reproduce

Requires JDK 17 on the PATH or locatable via a Gradle toolchain.

git clone https://github.com/gocd/gocd
git checkout 23d5610cee3682e8e6e3375b58a6f240416954ec
./gradlew server:fastUnitTest --tests GoConfigDaoBasicTest
# See that it is stuck

Downgrade and see that it gets past the configuration phase:

./gradlew wrapper --gradle-version=7.5.1 --gradle-distribution-sha256-sum=db9c8211ed63f61f60292c69e80d89196f9eb36665e369e7f00ac4cc841c2219 --distribution-type=ALL
./gradlew server:fastUnitTest --tests GoConfigDaoBasicTest
# See that configuration completes and things start compiling/executing

Gradle version

8.1.1

Gradle version that used to work

7.5.1

Build scan URL (optional)

No response

Your Environment (optional)

  • Apple MacBook M1 Pro
  • macOS Ventura 13.4
  • Java 17.0.7
@chadlwilson added the a:regression and to-triage labels on Jun 12, 2023
chadlwilson (Author) commented:

I note various work in this broad area, e.g. in #21116, #21997, #21542, #21975.

It has broadly the same observable behaviour (livelock) as #20455, but I understand that ticket describes a generic issue and is not really resolvable as such.

chadlwilson added a commit to chadlwilson/gocd that referenced this issue Jun 12, 2023
Can't find a workaround to gradle/gradle#25361 right now.
@eskatos added the in:scheduler, 👋 team-triage, and has:reproducer labels and removed to-triage on Jun 14, 2023
eskatos (Member) commented Jun 14, 2023

Thank you for your interest in Gradle!

This issue needs a decision from the team responsible for that area. They have been informed. Response time may vary.

ghale (Member) commented Jun 15, 2023

I've taken a look at this and, in fact, it is not a hang, just some really inefficient code. It will actually finish calculating the task graph after about an hour and a half (depending on the horsepower of your machine). About 1400 tasks get executed with fastUnitTest, and it appears that we have some logic that runs in at least O(n^2) time (maybe even O(n^3), I'm not sure) relative to the number of tasks.

In any event, we need to optimize this. In the meantime, the following workaround will get you around the problem (the issue stems from the fact that the test tasks are both dependencies of and finalized by the report task):

@@ -679,7 +679,7 @@ task allTestsJunitHtmlReport(type: TestReport) { TestReport thisTask ->
   description = "Run all api tests"

   forAllTask { Test eachTestTask ->
-    testResults.from(eachTestTask)
+    testResults.from { eachTestTask.binaryResultsDirectory.get().asFile }
     eachTestTask.finalizedBy(thisTask)
   }
   destinationDirectory = file("${project.buildDir}/reports/tests/allTests")

ghale (Member) commented Jun 15, 2023

Actually, a simpler way to remove the task dependency is to use:

testResults.from(eachTestTask.binaryResultsDirectory.locationOnly)
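
In context, reusing the forAllTask block from the diff above, that becomes:

```groovy
forAllTask { Test eachTestTask ->
    // locationOnly yields a Provider of the directory's location without
    // carrying the producing task, so no dependency edge is created from
    // the report task to each test task; finalizedBy alone orders them.
    testResults.from(eachTestTask.binaryResultsDirectory.locationOnly)
    eachTestTask.finalizedBy(thisTask)
}
```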

chadlwilson (Author) commented Jun 16, 2023

Thank you very much! Will give that a go. (edit: seems to work fine, thank you!)

Do you think this might be something to do with the changes to finalizedBy and mustRunAfter in Gradle 7.6 or so? #10549 (we have some mustRunAfter relationships I don't quite understand)

I don't quite understand why the same behavior isn't observed when running ./gradlew allTests or other individual test tasks, i.e. what is special about requesting this one single task that triggers the O(n^2+) processing.

Perhaps I could make allTestsJunitHtmlReport apply only when running allTests rather than when running an individual test task; I think it was changed to always finalize all tasks for some reason. Perhaps the issue is triggered by the weird afterEvaluate stuff?

I do note, however, that we have:

  • a whole lot of warnings about needing to use tasks.register and tasks.withType(..).configureEach to defer configuration, which may relate to why this is painful (see the sketch after this list)
  • some hacks still lingering that seem to stem from long-removed use of gradle-testsets-plugin, including the forAllTask afterEvaluate
  • other problems with some of the custom stuff being done here, like filtering test tasks with --tests not working properly and running tests for dependent tasks (!!), so I probably need to review/simplify everything in this "test task" area
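
For the first point, the deferred style would look like this (a minimal sketch; the task name is hypothetical):

```groovy
// Deferred creation: the configuration block runs only if/when the task
// is actually required, instead of eagerly at configuration time.
tasks.register('someTest', Test) {
    useJUnitPlatform()
}

// Deferred bulk configuration: applied lazily to each Test task as it is
// realized, unlike tasks.withType(Test) { ... }, which realizes them all.
tasks.withType(Test).configureEach {
    maxParallelForks = Runtime.runtime.availableProcessors().intdiv(2) ?: 1
}
```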

chadlwilson added a commit to gocd/gocd that referenced this issue Jun 16, 2023
Work around dependencies created by the multi-module test reports causing issues with Gradle 7.6+ task configuration/finalization, based on the suggestion at gradle/gradle#25361 (comment)

Seems to cause no ill side effects. To review these dependencies later.
ghale (Member) commented Jun 20, 2023

There was definitely a change in 7.6 that attempted to handle finalizer tasks more properly and introduced the problem you've found. In the task graph, mustRunAfter relationships and dependsOn relationships are extremely similar, so it's likely the same problem as discussed in the issue you linked.

The reason it's not an issue with allTests is that allTests pulls all of the finalized Test tasks into the graph. The problematic code comes into play when we are looking at the finalizer task and searching for unscheduled dependencies that might also need to be pulled into the graph. Since allTests causes all of the finalizer's dependencies to be scheduled, we never actually descend into the problematic logic; it's only when you request some subset of those tasks that we trigger the problem. By removing the task dependency between the finalizer and the test tasks, we also avoid it, as we just rely on the finalization relationship to ensure that the test tasks run first. It does mean that running allTestsJunitHtmlReport by itself will not cause any of the test tasks to execute, but I don't get the impression that's something you would need.
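
For illustration, the triggering pattern is roughly this (a sketch, not the exact gocd build logic):

```groovy
// One report task that both depends on and finalizes every Test task.
def reportTask = tasks.named('allTestsJunitHtmlReport', TestReport)
tasks.withType(Test).configureEach { testTask ->
    reportTask.configure {
        testResults.from(testTask)   // dependency edge: report depends on test
    }
    testTask.finalizedBy(reportTask) // finalizer edge: test finalized by report
}
// Requesting allTests schedules every dependency of the finalizer up front;
// requesting a subset (e.g. server:fastUnitTest) leaves unscheduled
// dependencies to search for, which is where the O(n^2+) logic kicks in.
```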

One option to think about with respect to test report aggregation might be the test report aggregation plugin introduced in Gradle 7.4. It's a little bit different from your current configuration in that it uses test suites and aggregates by test type (which can be any arbitrary string). You'd have to convert your tests to use test suites, but your use case is pretty much what test suites were designed to address. It would likely simplify the configuration a good bit, I think, though not without some effort during conversion.
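
As a sketch, based on that plugin's documented usage (the project path here is hypothetical):

```groovy
// In an aggregating project's build.gradle:
plugins {
    id 'test-report-aggregation'
}

dependencies {
    // Projects whose test results should be rolled up.
    testReportAggregation project(':server')
}

reporting {
    reports {
        // Aggregates results from all test suites of the given type.
        testAggregateTestReport(AggregateTestReport) {
            testType = TestSuiteType.UNIT_TEST
        }
    }
}
```

Running ./gradlew testAggregateTestReport would then produce the combined HTML report.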

ghale added a commit that referenced this issue Jun 21, 2023
When a finalizer finalizes multiple tasks, and also has dependencies, performance
for calculating the task graph can be exceedingly poor for large task graphs.  This
addresses the problem in two ways:

- Only schedule the members of a finalizer group once.
- Instead of iterating over every member and deciding if it is a dependency of a finalized node, iterate over the finalized nodes and calculate their dependencies that are also members.

This dramatically reduces the time to calculate the task graph in large builds that exhibit this behavior.

Fixes #25361
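
Illustratively (pseudocode only, not the actual Gradle internals; the helper names are hypothetical), the change in shape described by those bullets is:

```groovy
// Before: for every group member, search the finalized nodes' dependencies
// to decide whether that member must be scheduled.
members.each { member ->
    if (finalizedNodes.any { node -> dependenciesOf(node).contains(member) }) {
        schedule(member)
    }
}

// After: traverse each finalized node's dependencies once, scheduling those
// that are also group members; one walk instead of one per member.
finalizedNodes.each { node ->
    dependenciesOf(node).findAll { members.contains(it) }.each { schedule(it) }
}
```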
ghale added a commit that referenced this issue Jun 26, 2023
ghale added a commit that referenced this issue Jun 26, 2023
(both with the same commit message as above)
bot-gradle added a commit that referenced this issue Jul 1, 2023
…set of finalized tasks

When a finalizer finalizes multiple tasks, and also has dependencies, performance for calculating the task graph can be exceedingly poor for large task graphs when a subset of the finalized tasks are executed.  For instance, if a task finalizes all of the test tasks in a build, and also sets them as dependencies, and then only one test task is requested on the command line, it will trigger this behavior (oddly, requesting all test tasks avoids the problem).

This addresses the problem in two ways:

- Only schedule the members of a finalizer group once.
- Instead of iterating over every member and deciding if it is a dependency of a finalized node, iterate over the finalized nodes and calculate their dependencies that are also members.

This dramatically reduces the time to calculate the task graph in large builds that exhibit this behavior.  The performance is still not great, but it's at least manageable even for very large builds.  We can improve this more over time.

Fixes #25361

Co-authored-by: Gary Hale <gary@gradle.com>
@bot-gradle added this to the 8.3 RC1 milestone on Jul 1, 2023
@ov7a removed the to-triage and 👋 team-triage labels on Jul 3, 2023
ljacomet pushed a commit that referenced this issue Sep 29, 2023
ljacomet pushed a commit that referenced this issue Sep 29, 2023
(both with the same commit message as the ghale commits above)