-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix concurrency issue with ZipFileTree class #22809
Conversation
When multiple threads are using the getFile method, it is possible for thread 2 to begin reading the file before thread 1 has finished expanding it from the zip. This adds test to reproduce to issue reliably (I think), and fixes the issue by synchronizing the expansion of zip files using this class's inner DetailsImpl.
This adds a new AbstractArchiveFileTreeElement type as a common base class for TarFileTree and ZipFileTree, allowing both of these types to share caching logic. Also combines and removes some duplication from the tests for these types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at everything, but I realize now I may have sent you down a troublesome path.
Using the PersistentCache
introduces a sync point for all extracted archives. We can only extract one file at a time. I don't know if we're introducing a perf overhead by locking the cache, extracting a single file and unlocking. I made the suggestion to move this locking outside of the visitor, but I think this might just move the problem around so that we're now locking while extracting the entire archive vs a single file.
Since we are using a single cache for everything, the subdirectories under the cache are important. This PR re-uses the same convention as the existing code (file name + hash of file), but we could make this just the hash of the file so we don't need to extract the same contents if only the file name is different. We could also make this cache cross-version so we don't need to re-extract an archive if the Gradle version changes.
With the cache shared across builds and projects vs being inside buildDir, this could lead to a few weird things due to the shared expanded directory.
Imagine you had two projects that did something like:
// some task in project A
tree = zipTree("/path/to/the.zip")
tree.visit { file ->
if (file.name == "foo") {
file.text = "foo"
}
}
// some task in project B
tree = zipTree("/path/to/the.zip")
tree.visit { file ->
if (file.name == "foo") {
file.text = "not foo"
}
}
With existing behavior, this produces two separate expand directories where foo
has the contents "foo" in project A and "not foo" in project B.
With this PR, the contents of foo
would depend on which task ran first, I think.
There might be other things that you can do with the file visitor that might subtly break things. e.g., you can call stopVisiting()
in a file visitor and the contents of the expanded directory will be incomplete. Is there some way that someone would get the root directory of the expanded dir and expect the incomplete extraction?
.../core/src/main/java/org/gradle/api/internal/file/archive/AbstractArchiveFileTreeElement.java
Outdated
Show resolved
Hide resolved
...rojects/core/src/main/java/org/gradle/api/internal/file/archive/AbstractArchiveFileTree.java
Outdated
Show resolved
Hide resolved
...rojects/core/src/main/java/org/gradle/api/internal/file/archive/AbstractArchiveFileTree.java
Outdated
Show resolved
Hide resolved
...rojects/core/src/main/java/org/gradle/api/internal/file/archive/AbstractArchiveFileTree.java
Outdated
Show resolved
Hide resolved
Do we need to be able to extract multiple files simultaneously? This seems like it typically wouldn't offer any performance gains, as it would be limited by the file system.
OK
OK
I'll change this to use a per-project cache under the buildDir then. |
This will lock once per archive being visited, instead of reobtaining the lock for each entry in the archive. This change also moves additional common functionality between the DetailsImpl element classes of the Zip and Tar file trees into the abstract base class.
…ing one within them This uses a BuildScopedCache to begin. It also adds test utilities for creating a scoped cache using the TestInMemoryCacheFactory to back it.
…eOperation to grab zip/tar trees that work with project-level caches
…file with different file names will use the same cache entry
…ses and check that modifications of cached decompressed files are atomic
@bot-gradle test this |
OK, I've already triggered the following builds for you: |
@bot-gradle test and merge |
Your PR is queued. See the queue page for details. |
Your PR is queued. See the queue page for details. |
OK, I've already triggered a build for you. |
When multiple threads are using the getFile method, it is possible for thread 2 to begin reading the file before thread 1 has finished expanding it from the zip. This adds test to reproduce to issue reliably (I think), and fixes the issue by using a cache to mangae the expansion of zip files using this class's inner DetailsImpl. This should allow for correct operation, even if multiple Gradle processes are working with the same archive file; or if multiple tasks in different projects work with the file simultaneously. This fixes both `ZipFileTree` and `TarFileTree`. <!--- The issue this PR addresses --> Fixes #22685 Co-authored-by: Sterling Greene <sterling@gradle.com> (cherry picked from commit 48e4c43)
…etAnalyser This change fixes the performance regressions from switching the `ZipFileTree` to use a locking cache in #22809. Performance test results are [here](https://builds.gradle.org/repository/download/Gradle_Release_Util_Performance_AdHocPerformanceScenarioLinuxAmd64/59685577:id/results/performance/build/test-results-largeMonolithicJavaProjectPerformanceAdHocTest.zip!/performance-tests/report/index.html). Co-authored-by: Thomas Tresansky <ttresansky@gradle.com>
When multiple threads are using the getFile method, it is possible for thread 2 to begin reading the file before thread 1 has finished expanding it from the zip.
This adds test to reproduce to issue reliably (I think), and fixes the issue by using a cache to mangae the expansion of zip files using this class's inner DetailsImpl.
This should allow for correct operation, even if multiple Gradle processes are working with the same archive file; or if multiple tasks in different projects work with the file simultaneously.
This fixes both
ZipFileTree
andTarFileTree
.Fixes #22685