8265448: (zipfs): Reduce read contention in ZipFileSystem #3853
If the given Path represents a file, use the overload of read defined in FileChannel that accepts an explicit position and avoid serializing reads. Note: The underlying NIO implementation is not required to implement FileChannel.read(ByteBuffer, long) concurrently; Windows still appears to lock, as it returns true for NativeDispatcher.needsPositionLock.
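A minimal sketch of the positional overload mentioned above, using a plain temporary file rather than a zip archive. The class and method names (PositionalReadSketch, readAt, demo) are hypothetical and not part of zipfs. FileChannel.read(ByteBuffer, long) reads at an explicit offset and does not modify the channel's position, so no caller-side lock is needed around it:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionalReadSketch {
    // Hypothetical helper: read up to `len` bytes starting at `pos`
    // using the positional overload, which leaves the channel position alone.
    static String readAt(FileChannel fch, long pos, int len) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(len);
        long p = pos;
        while (bb.hasRemaining()) {
            int n = fch.read(bb, p);
            if (n < 0) {
                break; // end of file
            }
            p += n;
        }
        return new String(bb.array(), 0, bb.position(), StandardCharsets.US_ASCII);
    }

    // Returns { text read at offset 6, channel position after the read }.
    static String[] demo() {
        try {
            Path tmp = Files.createTempFile("positional", ".txt");
            try {
                Files.writeString(tmp, "hello world");
                try (FileChannel fch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    String text = readAt(fch, 6, 5);
                    return new String[] { text, Long.toString(fch.position()) };
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " position=" + r[1]);
    }
}
```

Whether such reads actually run in parallel depends on the platform's FileChannel implementation, as the note about Windows points out.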
} else {
    synchronized (zfch) {
        n = zfch.position(pos).read(bb);
    }
@LanceAndersen Are you planning to look at this? Do you mind checking the async close case to make sure that the synchronization isn't masking anything?
Also, just to point out that pattern matching for instanceof can be used here.
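For readers unfamiliar with the feature, here is a small generic sketch of pattern matching for instanceof (final since Java 16). It uses CharSequence as a stand-in type; the class and method names are hypothetical and unrelated to the zipfs sources:

```java
class InstanceofPatternSketch {
    // Older style: test the type, then cast explicitly.
    static int lengthWithCast(Object o) {
        if (o instanceof CharSequence) {
            CharSequence cs = (CharSequence) o;
            return cs.length();
        }
        return -1;
    }

    // Pattern matching: the test and the binding happen in one step.
    static int lengthWithPattern(Object o) {
        if (o instanceof CharSequence cs) {
            return cs.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lengthWithCast("zipfs"));
        System.out.println(lengthWithPattern("zipfs"));
        System.out.println(lengthWithPattern(42));
    }
}
```

Both methods behave the same; the pattern form only removes the separate cast.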
Yes, I plan to look at this. It would also be good to have a couple of additional reviews as well :-)
I think using the positional read on the underlying FileChannel is okay. I'm puzzled by the previous code, as I would have expected it to restore the position (which makes me wonder whether there are zipfs tests for this).
My reading of the existing code is that the only position-influenced method called on the channel (either via ZipFileSystem.ch or ZipFileSystem$EntryInputStream.zfch) is read, and this is only called in the .position(pos).read(...) idiom. The failure to reset the position therefore doesn't affect correctness. However, the synchronized block is definitely needed to avoid races.
Incidentally, regarding this comment:
private class EntryInputStream extends InputStream {
private final SeekableByteChannel zfch; // local ref to zipfs's "ch". zipfs.ch might
// point to a new channel after sync()
If the file system is writable and updated, the underlying file is deleted and replaced with a temporary file by close() / sync(), but ZipFileSystem.ch is itself final since d581e4f. I believe the comment is outdated and EntryInputStream could just access ch via the outer pointer. That change would simplify this patch marginally.
I've added the simplifying commit for now, but I'm happy to split that to a separate change if you prefer.
} else {
    synchronized(ch) {
        return ch.position(pos).read(bb);
    }
I think it's okay to include the update to EntryInputStream; that part looks fine, as does the direct use of the FileChannel positional read.
I'm still mulling over the case where ch is not a FileChannel, as I would have expected it to capture the existing position and restore it after the read. I think this is the degenerate case where the zip file is located in a custom file system that doesn't support FileChannel. In that case, the positional read has to be implemented on the most basic SeekableByteChannel. The difference would only be observed when mixing positional read ops with other ops that depend on the current position.
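To make the capture-and-restore idea concrete, here is a rough sketch of a positional read emulated on a plain SeekableByteChannel. This is not what the patch does; the names (RestoringReadSketch, readAt, demo) are hypothetical, and the demo uses a temporary file accessed only through the SeekableByteChannel interface:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RestoringReadSketch {
    // Hypothetical helper: seek, read, then put the position back,
    // all under one lock so other callers never see the temporary seek.
    static int readAt(SeekableByteChannel ch, ByteBuffer bb, long pos) throws IOException {
        synchronized (ch) {
            long saved = ch.position();
            try {
                return ch.position(pos).read(bb);
            } finally {
                ch.position(saved);
            }
        }
    }

    // Returns { text read at offset 5, channel position after the call }.
    static String[] demo() {
        try {
            Path tmp = Files.createTempFile("restore", ".txt");
            try {
                Files.writeString(tmp, "0123456789");
                try (SeekableByteChannel ch = Files.newByteChannel(tmp)) {
                    ch.position(3); // position some other caller depends on
                    ByteBuffer bb = ByteBuffer.allocate(4);
                    int n = readAt(ch, bb, 5);
                    String text = new String(bb.array(), 0, Math.max(n, 0),
                            StandardCharsets.US_ASCII);
                    return new String[] { text, Long.toString(ch.position()) };
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " position=" + r[1]);
    }
}
```

The restore only matters if some other code reads the channel's current position, which is the mixing scenario described above.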
Here are all the references to ch:
this.ch = Files.newByteChannel(zfpath, READ);
...
this.ch.close();
...
ch.close(); // close the ch just in case no update
...
if (ch instanceof FileChannel fch) {
return fch.read(bb, pos);
} else {
synchronized(ch) {
return ch.position(pos).read(bb);
}
}
...
long ziplen = ch.size();
...
ch.close();
It appears that the only position-dependent operation called is read(ByteBuffer). This is performed together with the position(pos) call within the synchronized(ch) block.
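As an illustration of why the lock is sufficient on the fallback path, here is a sketch that hammers a non-FileChannel channel from several threads. MemChannel is a hypothetical, minimal in-memory stand-in (not the real jdk.nio.zipfs.ByteArrayChannel), and readAt mirrors the dispatch quoted above under that assumption:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class LockedFallbackSketch {
    // Hypothetical read-only in-memory channel; not thread-safe on its own.
    static final class MemChannel implements SeekableByteChannel {
        private final byte[] data;
        private int pos;
        private boolean open = true;
        MemChannel(byte[] data) { this.data = data; }
        public int read(ByteBuffer dst) {
            if (pos >= data.length) return -1;
            int n = Math.min(dst.remaining(), data.length - pos);
            dst.put(data, pos, n);
            pos += n;
            return n;
        }
        public int write(ByteBuffer src) { throw new NonWritableChannelException(); }
        public long position() { return pos; }
        public SeekableByteChannel position(long p) { pos = (int) p; return this; }
        public long size() { return data.length; }
        public SeekableByteChannel truncate(long s) { throw new NonWritableChannelException(); }
        public boolean isOpen() { return open; }
        public void close() { open = false; }
    }

    // Positional read when available, otherwise seek + read under a lock.
    static int readAt(SeekableByteChannel ch, ByteBuffer bb, long pos) throws IOException {
        if (ch instanceof FileChannel fch) {
            return fch.read(bb, pos);
        }
        synchronized (ch) {
            return ch.position(pos).read(bb);
        }
    }

    // Runs concurrent single-byte reads and returns the number of wrong bytes seen.
    static int demo() {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        SeekableByteChannel ch = new MemChannel(data);
        AtomicInteger mismatches = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < 4; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < 10_000; i++) {
                        int off = i % 256;
                        ByteBuffer bb = ByteBuffer.allocate(1);
                        readAt(ch, bb, off);
                        if (bb.get(0) != (byte) off) mismatches.incrementAndGet();
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches=" + demo());
    }
}
```

Because every seek and read pair runs inside the same monitor, no thread can move the position between another thread's position(pos) and read(bb).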
I have confirmed that the non-FileChannel code path is exercised by existing tests.
test/jdk/jdk/nio/zipfs/ZipFSTester.java includes a test that forms a file system based on a JAR that is itself an entry within another ZipFileSystem.
Sample stacks:
java.lang.Throwable: readFullyAt. ch.getClass=class jdk.nio.zipfs.ByteArrayChannel
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.readFullyAt(ZipFileSystem.java:1234)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.readFullyAt(ZipFileSystem.java:1226)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$EntryInputStream.initDataPos(ZipFileSystem.java:2259)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$EntryInputStream.read(ZipFileSystem.java:2201)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$2.fill(ZipFileSystem.java:2151)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at ZipFSTester.checkEqual(ZipFSTester.java:858)
at ZipFSTester.test1(ZipFSTester.java:259)
java.lang.Throwable: readFullyAt. ch.getClass=class jdk.nio.zipfs.ByteArrayChannel
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.readFullyAt(ZipFileSystem.java:1234)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$EntryInputStream.read(ZipFileSystem.java:2214)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$2.fill(ZipFileSystem.java:2151)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at ZipFSTester.checkEqual(ZipFSTester.java:858)
at ZipFSTester.test1(ZipFSTester.java:259)
This use case is not covered by ZipFSTester.test2, a multi-threaded test.
While looking at the test I noticed false warnings in the output: read()/position() failed. These did not actually fail the test. I investigated this and a) fixed the condition to deal with the edge case of zero-length entries and b) made the test throw a "check failed" exception when the assertion fails.
This appears to have been omitted when this test was added. To avoid false error reports, the condition must deal with the edge case of zero-length entries, for which read will return -1.
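The zero-length edge case can be sketched as follows. The helper readResultOk is hypothetical and is not the condition used in ZipFSTester; it only illustrates that a read at end-of-stream returns -1 rather than 0, and it assumes a single read call fills the buffer for non-empty entries:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadCheckSketch {
    // Hypothetical check: an empty entry is expected to report end-of-stream (-1),
    // a non-empty one is expected to report the number of bytes requested.
    static boolean readResultOk(int n, int expectedLen) {
        return expectedLen == 0 ? n == -1 : n == expectedLen;
    }

    // Reads from an empty temporary file and returns what read() reported.
    static int readEmptyFile() {
        try {
            Path tmp = Files.createTempFile("empty", ".bin");
            try (SeekableByteChannel ch = Files.newByteChannel(tmp)) {
                return ch.read(ByteBuffer.allocate(8));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int n = readEmptyFile();
        System.out.println("read returned " + n + ", ok=" + readResultOk(n, 0));
    }
}
```

A check that compares the return value directly against the entry length would flag every empty entry, which matches the false warnings described above.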
Hi Jason, I have made a pass through your proposed changes and they look OK. I am in the process of running our various Mach5 tiers against your patch to see if any unforeseen issues arise.
Best
@retronym This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 94 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@AlanBateman, @LanceAndersen) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Mach5 jdk-tier1, jdk-tier2, and jdk-tier3 completed successfully.
/integrate
/sponsor
@LanceAndersen @retronym Since your change was applied there have been 109 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 0a12605. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
If the given Path represents a file, use the overload of read defined
in FileChannel that accepts an explicit position and avoid serializing
reads.
Note: The underlying NIO implementation is not required to implement
FileChannel.read(ByteBuffer, long) concurrently; Windows still appears
to lock, as it returns true for NativeDispatcher.needsPositionLock.
On MacOS X, the enclosed benchmark improves from:
To:
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3853/head:pull/3853
$ git checkout pull/3853
Update a local copy of the PR:
$ git checkout pull/3853
$ git pull https://git.openjdk.java.net/jdk pull/3853/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3853
View PR using the GUI difftool:
$ git pr show -t 3853
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3853.diff