BuildDashboard causes "Unable to make progress running work" error #25951

Closed
3ll3d00d opened this issue Jul 29, 2023 · 12 comments · Fixed by #26172
Labels
a:bug affects-version:8.2 has:reproducer Indicates the issue has a confirmed reproducer in:scheduler execution plan, task graph, work lease, project lock
Milestone

Comments

@3ll3d00d
Contributor

3ll3d00d commented Jul 29, 2023

Expected Behavior

Using Gradle 8.2.1, the build should execute to completion whether the buildDashboard plugin is applied or not and whether some test tasks are enabled or not.

Current Behavior

When the following conditions are in place

  • the buildDashboard plugin is applied
  • an extra test task is added but will not execute due to the presence of an onlyIf conditional returning false in a subset of child projects
  • the extra test task is explicitly specified in the command line
  • gradle 8.2.1 is used

The build fails with the following error: "Unable to make progress running work. The following items are queued for execution but none of them can be started:"

The build completes normally using the same setup on 7.3.3

Manually removing the buildDashboard task from the finalizers of the disabled test task also enables the build to complete

Removing the extra test task from the command line also enables it to complete
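
For readers who want a starting point, the conditions listed above might look roughly like the following root build script. This is only a sketch under assumptions: the task name `integrationTest` is borrowed from later in this thread, the subproject name and the `runsHere` flag are hypothetical, and a `settings.gradle` including the subprojects is assumed. As the rest of the thread shows, this setup on its own is not expected to be sufficient to trigger the stall.

```groovy
// build.gradle (root project): illustrative sketch, not the reporter's actual build
plugins {
    id 'build-dashboard'   // adds the buildDashboard task that finalizes reporting tasks
}

subprojects {
    apply plugin: 'java'

    // Hypothetical switch: only one subproject actually runs the extra tests
    def runsHere = project.name == 'my-project1'

    // Extra test task, requested explicitly on the command line
    tasks.register('integrationTest', Test) {
        testClassesDirs = sourceSets.test.output.classesDirs
        classpath = sourceSets.test.runtimeClasspath
        // In the other subprojects the task is skipped at execution time
        onlyIf { runsHere }
    }
}
```

With such a setup the build would be invoked with the extra task named explicitly, for example `./gradlew build integrationTest`.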

Context (optional)

Builds that have conditional execution of certain tests are unable to complete

NB: the workaround of "not specifying the extra test task on the command line" is impractical in our build environment

Steps to Reproduce

Company policy does not allow uploading our actual build, and I have not yet been able to create a simple cut-down reproducer, so there is some relatively complex chain of dependencies that produces this issue. The same type of failure has been seen a few times in our build system since the 8.2.1 upgrade, and each time it shows the build is blocked on something of this form:

destroyer locations for task group 1 (state=SHOULD_RUN, dependencies=NOT_COMPLETE, group=task group 1, dependencies=[Resolve mutations for :my-project3:some-other-task (SHOULD_RUN)

i.e. the "Resolve mutations for" operation is the thing that is blocking execution

The main problem with coming up with a reproducer is that I do not know what causes the above to occur.

In this specific buildDashboard problem, the build is ultimately blocked on

 - Resolve mutations for :buildDashboard (state=SHOULD_RUN, dependencies=NOT_COMPLETE, group=default group, no dependencies )
 - producer locations for task group 0 (state=SHOULD_RUN, dependencies=NOT_COMPLETE, group=task group 0, dependencies=[Resolve mutations for :buildDashboard (SHOULD_RUN), Resolve mutations for :my-project3:test (EXECUTED), Resolve mutations for :my-project3:processTestResources (EXECUTED), Resolve mutations for :my-project3:compileTestJava (EXECUTED), Resolve mutations for :my-project3:sourcesJar (EXECUTED), Resolve mutations for :my-project3:jar (EXECUTED), Resolve mutations for :my-project3:processResources (EXECUTED), Resolve mutations for :my-project3:generateGitProperties (EXECUTED), Resolve mutations for :my-project3:compileJava (EXECUTED), Resolve mutations for :my-project2:test (EXECUTED), Resolve mutations for :my-project2:processTestResources (EXECUTED), Resolve mutations for :my-project2:compileTestJava (EXECUTED), Resolve mutations for :my-project2:sourcesJar (EXECUTED), Resolve mutations for :my-project2:jar (EXECUTED), Resolve mutations for :my-project2:processResources (EXECUTED), Resolve mutations for :my-project2:generateGitProperties (EXECUTED), Resolve mutations for :my-project2:compileJava (EXECUTED), Resolve mutations for :my-project1:test (EXECUTED), Resolve mutations for :my-project1:processTestResources (EXECUTED), Resolve mutations for :my-project1:compileTestJava (EXECUTED), Resolve mutations for :my-project1:testFixturesJar (EXECUTED), Resolve mutations for :my-project1:processTestFixturesResources (EXECUTED), Resolve mutations for :my-project1:compileTestFixturesJava (EXECUTED), Resolve mutations for :my-project1:sourcesJar (EXECUTED), Resolve mutations for :my-project1:jar (EXECUTED), Resolve mutations for :my-project1:processResources (EXECUTED), Resolve mutations for :my-project1:generateGitProperties (EXECUTED), Resolve mutations for :my-project1:compileJava (EXECUTED)], waiting-for=[Resolve mutations for :buildDashboard (SHOULD_RUN)], has-failed-dependency=false )

in this example, we have 3 subprojects, 2 of which have integrationTest tasks that are disabled and hence are not expected to execute.

Gradle version

8.2.1

Build scan URL (optional)

No response

Your Environment (optional)

No response

@3ll3d00d
Contributor Author

3ll3d00d commented Jul 29, 2023

Symptoms appear to be similar to those fixed by #21163

Particularly this one

When a node is a dependency of a finalizer via multiple paths and one of those paths includes an entry point task

@ov7a
Member

ov7a commented Aug 2, 2023

Sorry that you're having trouble with Gradle!

We appreciate the effort that went into filing this issue, but we must ask for more information.

As stated in our issue template, a minimal reproducible example is a must for us to be able to track down and fix your problem efficiently. Our available resources are severely limited, and we must be sure we are looking at the exact problem you are facing.

If we have a reproducer, we may also be able to suggest workarounds or ways to avoid the problem.

The ideal way to provide a reproducer is to leverage our reproducer template. You can also use Gradle Project Replicator to reproduce the structure of your project.

This issue will be closed after 7 days unless you provide more information.

@ov7a ov7a added in:scheduler execution plan, task graph, work lease, project lock pending:reproducer Indicates that the issue requires a reproducer or will be closed after 7 days and removed to-triage labels Aug 2, 2023
@3ll3d00d
Contributor Author

3ll3d00d commented Aug 3, 2023

@ov7a I understand that you need a reproducible example but pls note this comment from the original post

i.e. the "Resolve mutations for" operation is the thing that is blocking execution
The main problem with coming up with a reproducer is that I do not know what causes the above to occur.

if someone with knowledge of gradle internals can elaborate on what this operation is then I stand a chance of working out what exactly triggers the bug and hence provide a reproducible example

@github-actions github-actions bot removed the pending:reproducer Indicates that the issue requires a reproducer or will be closed after 7 days label Aug 3, 2023
@ov7a ov7a added the to-triage label Aug 3, 2023
@ghale
Member

ghale commented Aug 7, 2023

Hey @3ll3d00d, "resolve mutations" nodes are what we use to represent all of the things that a task changes (i.e. the things that it produces and the things that it deletes). We can't know all of these things until we know all of the task's inputs, which means all of the dependencies of that task have to be completed before we can say we "know" all of its mutations. We need to know these things in order to make smart decisions about what tasks to execute next. For instance, we don't want to run a task that deletes some directory if it's going to run at the same time as a task that populates that directory, etc. So for each task, there is a node in the task graph that basically represents "I know everything about this task now because all of its dependencies are complete" so even if the task hasn't started executing, we can make sane decisions about what can execute in parallel with it.

You get this message about being unable to make progress because the scheduler has gone through the task graph and found no tasks that are ready to execute and no tasks that are currently executing. In other words, nothing can run and nothing is currently running that will change that state.

Sooo, what appears to be happening above is that something is preventing the outputs of :buildDashboard from being resolved. This likely has something to do with the disabled tasks, but if reproducing it is not as simple as "apply the build dashboard plugin, add some test tasks and disable them" then it's likely some sort of additional relationship that is tripping it up. For instance a mustRunAfter or a finalizedBy relationship or something similar that is hanging up the scheduler. So I would look for other relationships in the build and start adding those to the reproducer until you can cause the issue to occur.
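
As a rough illustration of the kinds of relationships worth adding to a candidate reproducer one at a time, the sketch below wires a hard dependency, a finalizer, and an ordering-only rule onto standard `java` plugin tasks. The `startFixture` and `stopFixture` tasks are hypothetical names invented for this example; nothing here is taken from the reporter's build.

```groovy
// build.gradle: illustrative sketch of extra task relationships to experiment with
plugins {
    id 'java'
    id 'build-dashboard'
}

// Hypothetical helper tasks; the names are made up for this example
def startFixture = tasks.register('startFixture') {
    doLast { logger.lifecycle('fixture started') }
}
def stopFixture = tasks.register('stopFixture') {
    doLast { logger.lifecycle('fixture stopped') }
}

tasks.named('test') {
    dependsOn startFixture     // hard dependency: must complete before test
    finalizedBy stopFixture    // finalizer: scheduled after test whenever test is scheduled
}

// Ordering-only relationship: applies only if both tasks are in the task graph
tasks.named('javadoc') {
    mustRunAfter tasks.named('test')
}
```

Adding one relationship at a time and re-running the failing command line makes it easier to see which one changes the scheduler's behaviour.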

@ov7a ov7a added the pending:reproducer Indicates that the issue requires a reproducer or will be closed after 7 days label Aug 7, 2023
@3ll3d00d
Contributor Author

3ll3d00d commented Aug 8, 2023

I've narrowed it down in our actual setup depending on the following conditions in gradle 8.2.1

  • build-dashboard plugin is applied
  • maven-publish is applied
  • project has multiple publications
  • project has multiple test tasks
  • test tasks have some additional tasks that execute both before and after the tests (assorted setup, cleanup)
  • one of the publications has an optional artifact which contains some of the output created by those tasks surrounding the test task
  • that optional artifact is disabled (as a result of some assessment of what the dev has configured in that project)
  • build is executed with publishAllPublicationsToMyRepository task included at cli

build now gets stuck when it didn't get stuck in 7.3.3

@github-actions github-actions bot removed the pending:reproducer Indicates that the issue requires a reproducer or will be closed after 7 days label Aug 8, 2023
@3ll3d00d
Contributor Author

3ll3d00d commented Aug 8, 2023

@3ll3d00d
Contributor Author

3ll3d00d commented Aug 8, 2023

build scan using 7.3.3 to demonstrate that this issue has been introduced in a later gradle version

https://scans.gradle.com/s/zm4lx3z3enm2q

@ov7a
Member

ov7a commented Aug 9, 2023

Thank you for providing a valid report.

The issue is in the backlog of the relevant team, but this area of Gradle is not currently a focus area, so it might take a while before a fix is made.

@ov7a ov7a removed the to-triage label Aug 9, 2023
@3ll3d00d
Contributor Author

3ll3d00d commented Aug 9, 2023

I was able to slim the reproducer down further (see latest commit in the repo and https://gradle.com/s/syfl2h6gs4d3w)

for this particular reproducer, it seems to require

  • build-dashboard plugin is applied
  • maven-publish is applied
  • project has multiple publications
  • project has multiple test tasks
  • the command line has 3 task groups in a specific order such that buildDashboard is in group 0, the publication is in group 1 and the additional test task is in group 2
  • one of the publications is produced by a task that depends on a Delete task

This setup means there is a test task in group 2 that is finalised by a task in group 0 & the presence of a Delete task in group 1 then triggers the bug even though there is absolutely no relationship (on the file system via inputs/outputs or the task graph) between that Delete task and the tasks in group 0 & 2.

While this is rather a specific set of conditions, I have seen this type of stall in multiple scenarios since upgrading (but have been able to work around previous issues), so I think it might be a more common problem than this reproducer suggests

@3ll3d00d
Contributor Author

3ll3d00d commented Aug 9, 2023

one further bit of info

The stall is avoided in the earlier variant (https://github.com/3ll3d00d/gradle-resolve-mutations-bug/tree/b36cffdfecb48a9242add2593252e630b6a744de) by forcing the optional artifact to be included and built. This feature isn't present in the slimmed-down version in the above post, and the mechanism by which it works around the problem is unclear (NB: not an actual workaround for my situation, as that artifact is indeed optional).

@ov7a ov7a added the has:reproducer Indicates the issue has a confirmed reproducer label Aug 10, 2023
@ghale
Member

ghale commented Aug 11, 2023

Ok, that's good enough to understand what's happening. Here's the cycle:

  • :buildDashboard ==> :disabled:integrationTest (SHOULD_RUN), :disabled:test (EXECUTED), :enabled:integrationTest (SHOULD_RUN)

  • :enabled:integrationTest ==> destroyer locations for task group 1 (SHOULD_RUN)

  • :disabled:integrationTest ==> destroyer locations for task group 1 (SHOULD_RUN)

  • destroyer locations for task group 1 ==> Resolve mutations for :enabled:cleanExtra_enabled (SHOULD_RUN), Resolve mutations for :disabled:cleanExtra_disabled (SHOULD_RUN)

  • :enabled:cleanExtra_enabled ==> producer locations for task group 0 (SHOULD_RUN)

  • :disabled:cleanExtra_disabled ==> producer locations for task group 0 (SHOULD_RUN)

  • producer locations for task group 0 ==> Resolve mutations for :buildDashboard (SHOULD_RUN)

In other words, because task group 1 (i.e. publishToMavenLocal) introduces a destroyer task (i.e. cleanExtra via publishToMavenLocal) we end up with some implicit ordering that creates the cycle. The producer task in group 2 (integrationTest) can't resolve its mutations until the destroyers in task group 1 (cleanExtra tasks) resolve theirs, which can't resolve until the producer in task group 0 (buildDashboard via build) resolves its mutations, but it can't because it finalizes the producer in task group 2 which creates the cycle.
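
The "waits for" edges listed above can be checked mechanically. The plain Groovy script below is a simplified sketch, not Gradle's scheduler: it collapses each task and its "Resolve mutations for" node into a single entry, leaves out the already executed `:disabled:test`, and uses a hypothetical `findCycle` helper to walk the edges depth-first.

```groovy
// Edges copied from the list above: node -> nodes it is waiting for.
// Simplification: a task and its "Resolve mutations" node share one entry.
def waitsFor = [
    'buildDashboard'                      : ['disabled:integrationTest', 'enabled:integrationTest'],
    'enabled:integrationTest'             : ['destroyer locations for task group 1'],
    'disabled:integrationTest'            : ['destroyer locations for task group 1'],
    'destroyer locations for task group 1': ['enabled:cleanExtra_enabled', 'disabled:cleanExtra_disabled'],
    'enabled:cleanExtra_enabled'          : ['producer locations for task group 0'],
    'disabled:cleanExtra_disabled'        : ['producer locations for task group 0'],
    'producer locations for task group 0' : ['buildDashboard']
]

// Depth-first search: returns the first cycle reachable from the given node, or null.
def findCycle
findCycle = { String node, List<String> path ->
    if (path.contains(node)) {
        // Revisited a node on the current path: report the looping part
        return path.drop(path.indexOf(node)) + node
    }
    def successors = waitsFor[node] ?: []
    for (next in successors) {
        def cycle = findCycle(next, path + node)
        if (cycle) {
            return cycle
        }
    }
    return null
}

def cycle = findCycle('buildDashboard', [])
println(cycle ? cycle.join(' ==> ') : 'no cycle found')
```

Run with `groovy cycle.groovy`, this reports a path that starts and ends at `buildDashboard`, which matches the description of every node in the loop waiting on another node in the loop.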

So, one workaround for now is to move integrationTest before publishToMavenLocal:

$ ./gradlew build integrationTest publishToMavenLocal

In this case, there are only producers in task group 0 (build) and task group 1 (integrationTest), which won't have an implicit relationship to each other (i.e. there are only implicit relationships between producers and destroyers in different groups, not between producers and producers). The destroyers are all safely in task group 2 (publishToMavenLocal), which can sort them out within the group.

Another workaround would be to remove the dependsOn cleanOutput in makeExtraZip. Needing something like this is generally a smell, and it's probably better to understand why this is needed and figure out why normal incremental build capabilities won't work here.

Boiling this down, a simpler reproducer is the following:

// Finalizer shared by both producers (stands in for buildDashboard)
def dashboard = tasks.register("dashboard") {
  outputs.file "${buildDir}/dashboard.txt"
  doLast { file("${buildDir}/dashboard.txt").text = "foo" }
}

// Producer in the first task group, finalized by dashboard
def test = tasks.register("test") {
  outputs.file "${buildDir}/test.txt"
  doLast { file("${buildDir}/test.txt").text = "foo" }
  finalizedBy dashboard
}

// Destroyer pulled into the middle task group via zip
def cleanZip = tasks.register("cleanZip", Delete) {
  delete "${buildDir}/zip.txt"
}

def zip = tasks.register("zip") {
  outputs.file "${buildDir}/zip.txt"
  doLast { file("${buildDir}/zip.txt").text = "bar" }
  dependsOn cleanZip
}

// Producer in the last task group, finalized by the same dashboard task
def integTest = tasks.register("integTest") {
  outputs.file "${buildDir}/integTest.txt"
  doLast { file("${buildDir}/integTest.txt").text = "baz" }
  finalizedBy dashboard
}

// Consumes the zip output, which brings cleanZip into its task group
tasks.register("publish") {
  inputs.files zip
}

Running gradle test publish integTest will reproduce the issue. Either of the previously mentioned workarounds will work, too.

Obviously, there's something not quite right here with some of the recent changes to how finalization works. Fundamentally, the failing case can be summarized as: there is a finalizer task that finalizes producers in task groups that are sandwiched around a group with a destroyer. Thanks for providing a reproducer. We'll fix this.

@3ll3d00d
Contributor Author

thanks for summarising and I look forward to the fix.

FWIW the workaround I actually used in practice is to convert the Delete task to a DefaultTask with a doLast which uses project.delete.
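
Applied to the `cleanZip` task from the simplified reproducer earlier in the thread, that conversion might look roughly like this. It is a sketch under assumptions, not the code from the actual build: the task and file names come from the reproducer, and the idea is that an untyped task no longer declares the file as a destroyable the way the `Delete` task type does.

```groovy
// Before (sketch): a typed Delete task declares its targets as destroyables
// tasks.register("cleanZip", Delete) {
//   delete "${buildDir}/zip.txt"
// }

// After (sketch): a plain task that deletes inside its action instead
tasks.register("cleanZip") {
  def target = file("${buildDir}/zip.txt")
  doLast {
    // project.delete removes the file if it exists; no destroyable is declared
    project.delete(target)
  }
}
```

One trade-off to be aware of: because the task no longer tells Gradle what it deletes, the scheduler cannot protect against it running in parallel with a task that writes the same location, and calling `project` inside a task action is not compatible with the configuration cache.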

Another workaround would be to remove the dependsOn cleanOutput in makeExtraZip. Needing something like this is generally a smell, and it's probably better to understand why this is needed and figure out why normal incremental build capabilities won't work here.

It is exactly this, I even have a TODO next to it describing this :)

gradle doesn't, as far as I am aware, provide a way to tell it to delete no longer relevant output files. This is custom compilation of a deployment artifact so the problem here is of the form

run build
generates some set of files
change something so that 1 of those files is renamed
run build
generates new set of files in same output dir, renamed file is not deleted

I haven't encountered a way for gradle to handle this automatically. My TODO references TaskOutputsInternal.previousOutputFiles as a possibility to investigate but that tends to end in a rabbit hole so I chose to go with a brute force solution.

one workaround for now is to move integrationTest before publishToMavenLocal

That task list is itself a workaround for a number of other gradle wrinkles to do with managing complex conditional task chains where there are task dependencies + finalisers and some of those tasks are "optional" (aka conditionally disabled, either via onlyIf or explicitly), and where disabling is complicated because it operates at the task level, not at the level of a chain of tasks. I spent quite a bit of time attempting to avoid this task list, but the solutions I tried that don't involve it were extremely brittle & hard to reason about.

bot-gradle added a commit that referenced this issue Aug 25, 2023
…alized task

We now update the ordinal group for nodes to the highest instead of the latest. With that, we correctly schedule finalizers later if necessary.

Fixes #25951


Co-authored-by: Anže Sodja <asodja@gradle.com>
@bot-gradle bot-gradle added this to the 8.4 RC1 milestone Aug 25, 2023
6 participants