Remove the potential for race conditions when lazy values are calculated from dependency resolution results #12990

adammurdoch · 2020-05-01T01:11:38Z

There is a lot of complexity that arises from the current strategy Gradle uses when a worker thread needs to calculate values from the state of multiple projects. An example of this is when dependency resolution resolves a project dependency. We should rework this strategy to simplify it and to make concurrent access to project state easier to reason about.

For some background, see #12811 and #12969.

Expected Behavior

Gradle never runs more than one worker thread at a time against the state of a given project. How we might achieve this is discussed below.

This way, code that is not intended to be thread safe, such as all of Gradle's lazy types, model containers and user build logic code, does not need to deal with thread safety issues.

In addition, core Gradle code that needs to cross project boundaries can do so in a simpler way, without the author needing to deal with issues such as deadlocks or race conditions, or adding workarounds to make things "mostly" work. This can then potentially serve as a capability to expose to build logic via a public API in the future (solving that problem is not in the scope of this issue).

Current Behavior

When a worker thread is performing dependency resolution and needs to calculate a value from across a project boundary, it will attempt to lock the target project. If this isn't possible, it will release all locks that it currently holds and wait until it is possible to lock the target project. When this happens, other tasks and work from the source project can start running in other worker threads. The original worker thread still references state from the source project, though, and may mutate that state on return up the stack based on what it observed prior to dependency resolution. In the meantime, other threads may have mutated that state, so what the worker observed is no longer accurate. This is in fact what was happening in the issues referenced above.
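The hazard described above can be illustrated with a minimal Java sketch. It is not Gradle code: `StaleReadSketch`, `ProjectState` and `version` are hypothetical names, and a plain `ReentrantLock` stands in for the project lock. The worker reads a value, releases the lock while it "waits" for another project, and another thread mutates the state in that gap. The `join()` call only makes the interleaving deterministic for the sake of the example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not a Gradle type.
public class StaleReadSketch {
    // Stand-in for mutable project state guarded by a project lock.
    static final class ProjectState {
        final ReentrantLock lock = new ReentrantLock();
        int version = 0; // only read or written while holding 'lock'
    }

    // Returns { value observed before releasing the lock, value found after re-locking }.
    public static int[] run() {
        ProjectState source = new ProjectState();

        // The worker observes state while holding the source project's lock.
        int observed;
        source.lock.lock();
        try {
            observed = source.version;
        } finally {
            // The worker gives up the lock so it can wait for the target project.
            source.lock.unlock();
        }

        // Another worker is now free to run against the source project.
        Thread other = new Thread(() -> {
            source.lock.lock();
            try {
                source.version++;
            } finally {
                source.lock.unlock();
            }
        });
        other.start();
        try {
            other.join(); // forces the mutation to happen inside the gap
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // The worker re-acquires the lock and continues up the stack,
        // but 'observed' no longer matches the current state.
        source.lock.lock();
        try {
            return new int[] { observed, source.version };
        } finally {
            source.lock.unlock();
        }
    }
}
```

Any decision the worker makes from `observed` after re-acquiring the lock is based on state that has since changed, even though every individual access was correctly locked.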

In addition, TransformationStep.isolateTransformerParameters() uses withLenientState() to access the project state, meaning this work will run in parallel with other work on the same project, leading to similar problems. This was also observed in the second issue referenced above.

I think a more robust and easier to understand strategy would be for a worker thread to never release the locks that it happens to own on a project while it is running non-isolated work. Work that needs to cross project boundaries should be split off as work nodes, one for each project.
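One way to picture the "one work node per project" idea is the following Java sketch. It is an illustration under stated assumptions rather than a proposal for the actual implementation: `WorkNodeSketch`, `libProject` and `appProject` are hypothetical names, and a single-threaded executor per project stands in for the rule that at most one worker runs against a project's state at a time. The cross-project calculation is split into two nodes, and only the computed value crosses the boundary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not a Gradle type.
public class WorkNodeSketch {
    public static String resolveAcrossProjects() {
        // Assumption: one single-threaded executor per project models
        // "never more than one worker against a given project".
        ExecutorService libProject = Executors.newSingleThreadExecutor();
        ExecutorService appProject = Executors.newSingleThreadExecutor();
        try {
            // Mutable state owned by each project; each is only touched on its own executor.
            StringBuilder libState = new StringBuilder("lib.jar");
            StringBuilder appState = new StringBuilder("app classpath: ");

            // Node 1: runs against the target project only and produces a value.
            CompletableFuture<String> jarName =
                CompletableFuture.supplyAsync(libState::toString, libProject);

            // Node 2: runs against the source project only, consuming node 1's result.
            CompletableFuture<String> classpath =
                jarName.thenApplyAsync(jar -> appState.append(jar).toString(), appProject);

            return classpath.join();
        } finally {
            libProject.shutdown();
            appProject.shutdown();
        }
    }
}
```

Because neither node holds access to one project while waiting on another, there is no point at which a lock has to be released mid-calculation.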

Currently, some work isn't known about up front, such as undeclared dependency resolution. This isn't allowed by tasks when instant execution is enabled, but is possible with "vintage" execution and is also possible at configuration time. When undeclared dependency resolution does happen, we could look at having a worker that is blocked waiting to calculate some value from another project instead calculate values that other worker threads require from the project it does hold locks for. This does not address all of the complexity issues, as a thread may still reenter a given lazy type instance in order to calculate some value, such as the location for an outgoing Jar file. It does still remove a whole pile of complexity and helps with diagnostics, as all of the work would be visible on the thread's stack trace when a failure happens.


adammurdoch added a:bug from:contributor labels May 1, 2020

jjohannes removed the contributor label Apr 1, 2021

mlopatkin added the to-triage label Jul 13, 2021

ljacomet added in:dependency-resolution engine metadata in:workers and removed to-triage labels Sep 22, 2021

mglazer mentioned this issue Jun 6, 2022

Synchronize Directory and Map property to avoid multithreading issues palantir/gradle-jdks#51

Merged
