-
Notifications
You must be signed in to change notification settings - Fork 6.1k
JDK-8273038: ChannelInputStream.transferTo() uses FileChannel.transferTo(FileChannel) #5209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JDK-8273038: ChannelInputStream.transferTo() uses FileChannel.transferTo(FileChannel) #5209
Conversation
|
👋 Welcome back mkarg! A progress list of the required criteria for merging this PR into |
…FileChannel) As suggested in 8265891 this change implements a performance optimization for ChannelInputStream.transferTo(): It uses FileChannel.transferTo() in case both wrapped channels actually are FileChannels, as in that case, the actual work of transferring bytes can potentially get offloaded to the operating system / file system. Signed-off-by: Markus Karg <markus@headcrashing.eu>
04e0aac to
bed6daf
Compare
|
@AlanBateman @bplb This PR is a spin-off for the single case of FileChannel->FileChannel only, as suggested by you. I kindly request your review. Thanks. |
Webrevs
|
| } | ||
|
|
||
| public static <T extends Throwable> void assertThrows(Thrower thrower, Class<T> throwable, String message) { | ||
| Throwable thrown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwable thrown = null; try { thrower.run();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that an OpenJDK best practice or why do you ask for this change? The code actually is a copy of existing OpenJDK code and not originally authored by me.
|
There are 3 PRs open, I'll ignore #4263 and #5179 for now. The moving of the OutputStream implementation from Channels to sun.nio.ch.ChannelOutputStream looks fine. In the class description it should be "accessible" as it visible. You can drop "but not be part of the java.base module API" too. Changing the bounds check to use Objects.checkFromIndexSize looks fine. It would be good if you could re-format 1-arg transferTo to something like this: just to to avoid the really long line as that gets annoying when looking at side-by-side diffs. The 2-arg transferTo methods look okay. An alternative, that avoids the need for "srcPos + bytesWritten" in 3 places is to just something like: @bplb You looked at the test in the first PR, have you look at this one? I suspect it would be a lot cleaner/maintainable if it were written as a TestNG test. I assume the transfer loop never has more than one iteration because the file sizes are <2GB. If it were writing this test then I would probably use Files/Arrays.mismatch to compare the contents. |
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
|
Yes please ignore them for now. The other two PRs are suspendend as they cover bigger pictures and will be resumed by me later.
Done in f35cd7d.
Never had this problem as I'm using inline-diff tools only, but done in 33bef3d.
Does that mean that I must change it? As the method is mostly I/O bound I do not see what we actually would gain. Anyways, done in 205b9b2.
Understood, but please note that this code actually was simply copied from the original tests that Brian provided as a blue-print, which neither uses TestNG nor Files/Arrays.mismatch nor tests larger files. So if you ask me to change that, in turn someone should change the original tests then to keep the code in sync. So can we keep the tests as-is or am I forced to use TestNG and File/Arrays.mismatch? |
Signed-off-by: Markus Karg <markus@headcrashing.eu>
|
@AlanBateman It does not look like this test is substantively different than the one in the first PR aside from constraining the cases tested. One thing I think is insufficient is only dealing with streams at position zero. In the test I recently added, java/io/FileInputStream/TransferTo.java, I included random initial positions for both input and output. For random positions it is better to use |
Okay. The implementation changes in the current patch (205b9b2) look good. I think we will need to iterate on the test. |
Ok so I will do as Brian said and include a test case with random initial positions. Note that my test will do that with the default case, too, so maybe someone should do that for the original test of the default implementation, too, to stay in sync. |
|
Why not just use java/io/FileInputStream/TransferTo.java as a starting point? |
Because I already invested considerable time into my test already and in fact do like my test more due to the flexibility it provides for my future PRs that will follow soon (adding more transfer cases), and very much like it as it is, so if only a small change is actually needed here then I will add that change and we're done with this PR and I will go on to re-enable other test cases. Please don't get me wrong, but while I highly appreciate everybody's comments on actual risks or problems that I have to resolve before my PR can be merged, I really like to concentrate on actual risks and problems of my PR instead of proposing to throw it away effectively. Thanks. |
Signed-off-by: Markus Karg <markus@headcrashing.eu>
e9163bf to
a043ae2
Compare
|
@bplb @AlanBateman As discussed before, I have changed the test in this PR to cover the following cases:
I deliberately did not test the following:
Wherever it made sense (without sacrifying my intended modularity needed for other channel types tested later) I tried to follow the solutions found in https://github.com/openjdk/jdk/blob/master/test/jdk/java/io/FileInputStream/TransferTo.java. Thanks Bran, for this blue print. In fact I wonder if the introduction of 10 loops over random sizes actually will improve test qualify. My assumption is that it rather reduces test stability and test repeatability: In case there actually is a bug that happens only with a specific combination of buffer size and initial positions, ten iterations will be much too few to stably reproduce the bug at each execution of the test. More often that not, that ten loops will all pass without running into the problem given the huge max size. This will driver testers nuts! The improved entropy by introducing the JDK randomizer does not help much to improve this. Having said that, I copied the solution from Brian only because he came up with it, and I can live with the statistical unpredictability it induces, but just wanted to clearly say that I (spoiler: working for a company providing QA tools like FMEA and SPC software) would say introducing random here induces more harm due to non-repeatability than it actualy improves the detection rate of failures not detected by thorough six-eye review. |
I'll try to get time to look at what you have in the next few days but just to say that the suggestion for exercising files > 2GB is because it's beyond the limit of the underlying sendfile. You should be able to check this quickly by printing the iteration count in ChannelInputStream.transferTo, it is always 1? Update with comments on commit 98dc61e: The src/ implementation changes are fine, I think we agreed those previously. It would be cleaner if this test had been created as TestNG test. That would allow us to get rid of Thrower and the assertThrowsXXX infrastructure. It would allow us to use data providers too. Maybe it can be converted and cleaned up in a future PR. I think I mentioned this previously but the test can go into test/jdk/java/nio/channels/Channels so that it's in the same directory as the other tests for the Channels API/implementation. ifOutIsNullThenNpeIsThrown is a strange method name, can we rename it to something testNullPointerException or something more obvious? "contents" is also a bit strange for the method name. This where the real test is so let's rename that to something more obvious too. There's a typo in one of the comments here it says "starting point", should be "starting position". You've got reasonable coverage for random initial positions in both source and target files, good! One other case that could be tested is where the initial position is beyond EOF. If the initial position of the source file is beyond EOF then 0 bytes will be transferred. If the initial position of the target file is beyond EOF then it will be extended. At some point we should look at adding a test for files > 2Gb as otherwise we aren't testing the loop (as we discussed). A few minor nits: The copyright date says "2014", I assume copied from another test. There are several unused imports to remove. The static modifiers on interfaces aren't needed. There are few really long lines, one is 150+ that wraps and will be annoying for side-by-side diffs when changed in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to get time to look at what you have in the next few days but just to say that the suggestion for exercising files > 2GB is because it's beyond the limit of the underlying sendfile. You should be able to check this quickly by printing the iteration count in ChannelInputStream.transferTo, it is always 1?
I understand your concerns and can dispel them. I added a counter that prints the number of iterations and yes, just as you expected, due to the Long.MAX_VALUE there is only a single iteration. But then I simply reduced Long.MAX_VALUE to 1024 and the counter reported thousands of iterations -- just as expected. Even better, the test confirmed that still the number of reported bytes is still correct just as the file contents are.
Hence in case sendfile limits the number of transferred bytes in production use, this quick check has proven that the code is correct and everything actually works fine.
| } | ||
|
|
||
| public static <T extends Throwable> void assertThrows(Thrower thrower, Class<T> throwable, String message) { | ||
| Throwable thrown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that an OpenJDK best practice or why do you ask for this change? The code actually is a copy of existing OpenJDK code and not originally authored by me.
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
… test/jdk/java/nio/channels/Channels/ Signed-off-by: Markus Karg <markus@headcrashing.eu>
Signed-off-by: Markus Karg <markus@headcrashing.eu>
Agreed.
I confirm that I will migrate to TestNG in a future PR.
Done in c886fbc.
Done in 90a176f.
Done in 6b4c34c.
Done in cb2a863.
Done in 852032d.
Agreed that at some point (i. e. in some future PR) someone could provide such a test. As explained in #5209 (comment) there is no 100% path coverage rule in OpenJDK, so while this might be a good idea, it seems it is definitively not a mandatory constraint to test that loop at all.
It actually reads "2014, 2021" which means that the original code was first published in 2014 and was last modified in 2021. Oracle once asked me to always do it exactly that way when touching any Oracle source code (which I did here, as the test originated from existing Oracle source code). If OpenJDK is an exceptional case here, then please confirm that I am officially allowed to remove the year 2014 from that list. Until then I have to keep the line exactly the way it is to not get into any troubles with Oracle's legal department.
Done in c3dbc33.
Done in f3a6551.
Done in 412fab1. The longest code line now is 115 characters which should be fine for side-by-side views as my screen shows more than 250 characters per line. I hope that with my recent changes this PR now is in a shape which allows it to get merged. As we agreed on multiple small PRs instead of a single big one, our target should be to contain only the essential changes per each PR but add more PRs ontop for non-essential improvements (which I will be happy to provide). |
The override of InputStream::transferTo is calling FileChannel::transferTo in a loop so I don't think it is unreasonable to ask if we have test that has more than one iteration. I'm okay if we defer this to a further step but I do think we should have test for this case.
This is a new test, the date should be 2021. |
Signed-off-by: Markus Karg <markus@headcrashing.eu>
I will look into testing the 2 GB+ iterations after this PR is merged and I'm done with my TestNG PR.
Done with 1804aa0. So this PR is fine for merging now? :-) |
Nearly there, it needs to be "2021," to avoid falling foul of scripts that check headers. |
Signed-off-by: Markus Karg <markus@headcrashing.eu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested the patch against the current main line (the PR is against a repo that is 4 or 5 months behind the main line). All okay. The expectation is that the test will be improved and iterated on in subsequent PRs.
|
|
@mkarg This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1268 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@AlanBateman) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
/integrate |
|
/sponsor |
|
Going to push as commit 1855574.
Your commit was automatically rebased without conflicts. |
|
@AlanBateman @mkarg Pushed as commit 1855574. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
As proposed in JDK-8265891, this PR provides an implementation for
Channels.newInputStream().transferTo()which provide superior performance compared to the current implementation. This PR is a spin-off from #4263 and #5179 and deliberately concentrates only on the case where both, source and target, are actuallyFileChannels. Other types of channels will be discussed in future PRs.Using JMH I have benchmarked both, the original implementation and this implementation, and (depending on the used hardware and use case) performance change was approx. doubled performance. A rather similar approach in different use casse was recently implemented by #5097 which was reported by even provide 5x performance (#5097 (comment)).
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5209/head:pull/5209$ git checkout pull/5209Update a local copy of the PR:
$ git checkout pull/5209$ git pull https://git.openjdk.java.net/jdk pull/5209/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5209View PR using the GUI difftool:
$ git pr show -t 5209Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5209.diff