Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the potential for race conditions when lazy values are calculated from dependency resolution results #12990

Open
adammurdoch opened this issue May 1, 2020 · 0 comments

Comments

@adammurdoch
Copy link
Member

adammurdoch commented May 1, 2020

There is a lot of complexity that arises from the current strategy Gradle uses when a worker thread needs to calculate values from the state of multiple projects. An example of this is when dependency resolution resolves a project dependency. We should rework this strategy to simplify and make it easier to reason about concurrent access to project state.

For some background, see #12811 and #12969

Expected Behavior

Gradle never runs more than one worker thread at a time against the state of a given project. How we might achieve this is discussed below.

This way, code that is not intended to be thread safe, such as all of Gradle's lazy types, model containers and user build logic code, does not need to deal with thread safety issues.

In addition, core Gradle code that needs to cross project boundaries can do so in a simpler way without the author needing to deal with with issues such as dead locks or race conditions or adding workarounds to make things "mostly" work. This can then potentially serve as a capability to expose to build logic via a public API in the future (solving that problem not in the scope of this issue).

Current Behavior

When a worker thread is performing dependency resolution and needs to calculate a value from across a project boundary, it will attempt to lock the target project. If this isn't possible, it will release all locks that it currently holds and wait until it is possible to lock the target project. When this happens other tasks and work from the source project can start running in other worker threads. However, the worker thread still references state from the project and may mutate state on return up the stack based on what it observed prior to dependency resolution. However, now other threads may be mutating that state and altering that observed state. This is in fact what was happening in the issues referenced above.

In addition, TransformationStep.isolateTransformerParameters() uses withLenientState() to access the project state, meaning this work will run in parallel with other work on the same project, leading to similar problems. This was also observed in the second issue referenced above.

I think a more robust and easier to understand strategy would be for a worker thread to never release the locks that it happens to own on a project while it is running non-isolated work. Work that needs to cross project boundaries should be split off as work nodes, one for each project.

Currently, some work isn't known about up front, such as undeclared dependency resolution. This isn't allowed by tasks when instant execution is enabled, but is possible with "vintage" execution and is also possible at configuration time. When undeclared dependency resolution does happen, we could look at having a worker that is blocked waiting to calculate some value from another project calculate values for the project that it does hold locks for that are required by other worker threads. While does not address all of the complexity issues, as a thread may still reenter a given lazy type instance in order to calculate some value, such as the location for an outgoing Jar file, it does still remove a whole pile of complexity and help with diagnostics, as all of the work would be visible on the thread's stack trace when a failure happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants