8190753: (zipfs): Accessing a large entry (> 2^31 bytes) leads to a negative initial size for ByteArrayOutputStream #4607
Conversation
👋 Welcome back jpai! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
Webrevs
This may just be moving the problem, because writing to the BAOS will fail when the deflated size is too large to fit in a byte array. The zip provider can use a temporary file, so maybe it should use that when appending to existing zip entries that are larger than some threshold. At some point we may need deeper changes here, e.g. start out with a BAOS and spill over to a temporary file when the deflated size reaches some threshold. I didn't study the test too closely, but just to mention that tests with zip entries > 2GB can be problematic to test. The test will probably need the @requires tag to limit it to 64-bit systems and maybe some minimum memory size. It may also need testing on a wide range of systems to get some idea of run time. Test machines with spinning rust (HDDs) come to mind.
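A hedged sketch of what such a jtreg test header might look like (the property values and the test class name here are assumptions, not from this thread):

```java
/*
 * @test
 * @bug 8190753
 * @summary Illustrative jtreg header only; thresholds below are assumptions
 * @requires (sun.arch.data.model == "64" & os.maxMemory >= 8g)
 * @run testng/othervm -Xmx4g LargeEntryTest
 */
```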
Mailing list message from Jaikiran Pai on core-libs-dev: Hello Alan, On 28/06/21 1:00 pm, Alan Bateman wrote: [...] That's a good point and I had completely overlooked it. There's an [...] I'll take a look at some other existing tests to see what kind of [...] -Jaikiran
Mailing list message from Lance Andersen on core-libs-dev: Hi Jaikiran, this is on my list to look at but did not get to today. Best, Lance

> Can I please get a review for this proposed fix for the issue reported in https://bugs.openjdk.java.net/browse/JDK-8190753? The commit here checks the size of the zip entry before trying to create a ByteArrayOutputStream for that entry's content. A new jtreg test has been included in this commit to reproduce the issue and verify the fix.
> P.S.: It's still a bit arguable whether it's a good idea to create the ByteArrayOutputStream with an initial size potentially as large as MAX_ARRAY_SIZE or whether it's better to just use some smaller default value. However, I think that can be addressed separately while dealing with https://bugs.openjdk.java.net/browse/JDK-8011146

Changes: https://git.openjdk.java.net/jdk/pull/4607/files
PR: https://git.openjdk.java.net/jdk/pull/4607
Mailing list message from Jaikiran Pai on core-libs-dev: On 29/06/21 4:17 am, Lance Andersen wrote: [...] Hello Lance, please take your time. -Jaikiran
Thank you for looking into this issue.
I ran your current test 150 times and the max runtime was 25 seconds, most cases were in the 18-20 second range on our slower test boxes.
As part of looking at what happens with a file whose deflated size is > 2gb, I would add a specific manual test to validate that there is no issue when we cross the 2gb threshold.
```diff
@@ -1948,7 +1950,7 @@ private OutputStream getOutputStream(Entry e) throws IOException {
     e.file = getTempPathForEntry(null);
     os = Files.newOutputStream(e.file, WRITE);
 } else {
-    os = new ByteArrayOutputStream((e.size > 0)? (int)e.size : 8192);
+    os = new ByteArrayOutputStream((e.size > 0 && e.size <= MAX_ARRAY_SIZE)? (int)e.size : 8192);
 }
```
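For context on why the guard matters, here is a small illustrative snippet (not from the PR): e.size is a long, and narrowing a value above Integer.MAX_VALUE to int yields a negative number, which the ByteArrayOutputStream constructor rejects.

```java
long size = 1L << 31;      // a 2^31-byte entry, as in the bug title
int initial = (int) size;  // narrows to -2147483648
// new ByteArrayOutputStream(initial) would throw IllegalArgumentException;
// hence the added e.size <= MAX_ARRAY_SIZE check (in the JDK this constant
// is conventionally Integer.MAX_VALUE - 8).
```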
The proposed change will address the specific issue shown in the bug. As Alan points out, there could be an issue if the deflated size is > 2gb. It would be good to look into that as part of your proposed fix.
Mailing list message from Jaikiran Pai on core-libs-dev: Hello Lance, On 29/06/21 11:31 pm, Lance Andersen wrote: [...]

Thank you for running those tests. Do you think those timings are good enough to let that test stay as a regular automated jtreg test?

I added a (manual) test to see what happens in this case. When I run it, I consistently see a failure in LargeCompressedEntrySizeTest.testLargeCompressedSizeWithZipFS(), which to me is understandable. Is this what you and Alan wanted tested/checked? With "useTempFile=true" the test passes, and unzip -lv JTwork/scratch/8190753-test-compressed-size.zip shows the expected compressed size. I understand that Alan's suggestion holds good and we should have some logic that switches to a temp file once sizes exceed a threshold, but I guess that is something we need to do separately outside of this PR? -Jaikiran
Mailing list message from Lance Andersen on nio-dev: Hi Jaikiran,

On Jun 30, 2021, at 12:15 PM, Jaikiran Pai <jai.forums2013 at gmail.com> wrote:

> Thank you for running those tests. Do you think those timings are good enough to let that test stay as a regular automated jtreg test, in tier1? I'm guessing this falls in tier1? I haven't yet looked in detail at the tier definitions of the build.

These tests run as part of tier2. The time for the test run is reasonable.

> I added a (manual) test to see what happens in this case. I have committed the test as part of this PR just for the sake of reference. The test is named LargeCompressedEntrySizeTest. The test uses ZipFS to create a (new) zip file and attempts to write out a zip entry whose deflated/compressed size is potentially greater than 2gb. When I run this test case, I consistently run into a failure in LargeCompressedEntrySizeTest.testLargeCompressedSizeWithZipFS(), which to me is understandable. Is this what you and Alan wanted tested/checked? In its current form I don't see a way to write out an entry whose deflated size exceeds 2gb, unless the user/caller uses the "useTempFile=true" option while creating the zip filesystem. FWIW, if I do set "useTempFile=true" while creating that zip filesystem in LargeCompressedEntrySizeTest, that test passes fine and the underlying zip that is created shows the expected compressed/deflated size (unzip -lv JTwork/scratch/8190753-test-compressed-size.zip). I understand that Alan's suggestion holds good and we should have some logic in place which switches to using a temp file once we notice that the sizes we are dealing with can exceed some threshold, but I guess that is something we need to do separately outside of this PR?

Yes, the intent would be to add some logic, which might need to be under a property (for now), to specify the size for when to use a temp file vs a BAOS. Having the value configurable via a property might give us some flexibility for experimentation. I don't see why this PR could not be used for this (as it would provide a more complete solution).

Best, Lance
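As an aside, a minimal sketch of creating a zip filesystem with the "useTempFile" option discussed above (the zip path and entry name here are made up for illustration):

```java
import java.net.URI;
import java.nio.file.*;
import java.util.Map;

public class UseTempFileDemo {
    public static void main(String[] args) throws Exception {
        // "create" and "useTempFile" are zipfs env properties mentioned in
        // this thread; the file and entry names are hypothetical.
        Map<String, String> env = Map.of("create", "true", "useTempFile", "true");
        URI uri = URI.create("jar:file:/tmp/8190753-demo.zip");
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            Path entry = zipfs.getPath("large-entry.bin");
            Files.write(entry, new byte[1024]); // a real test writes far more data
        }
    }
}
```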
…rollover into a temporary file during writes, if the data size exceeds a threshold
Based on the inputs received, I've now updated this PR to enhance ZipFS to allow rolling over the output stream data into a temporary file after reaching a threshold (see the sketch after the open decisions below). Details follow:
There are still some decisions to be made:
Using just the existing "useTempFile" property to control the temp file creation semantics would avoid having to deal with two different properties ("useTempFile" and "tempFileThreshold"). It would also prevent potentially conflicting user-specified values for them; for example, we won't have to decide what to do when a user sets useTempFile=false and tempFileThreshold=1234. If we do decide to introduce this new property, then some small amount of additional work needs to be done in this implementation to make sure semantically conflicting values for these two properties are handled correctly. I decided to skip that part in this round of the PR until we reach a decision about the properties.
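A minimal sketch of the rollover idea under discussion (an illustration, not the PR's actual FileRolloverOutputStream; the threshold constant, temp-file naming, and close semantics are assumptions):

```java
import java.io.*;
import java.nio.file.*;

// Buffers writes in memory and rolls over to a temporary file once a
// threshold is crossed. Illustrative only; names and values are assumed.
class RolloverOutputStream extends OutputStream {
    private static final int TEMP_FILE_THRESHOLD = 10 * 1024 * 1024; // assumed 10 MB

    private ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private OutputStream fileOut; // non-null once rolled over
    private Path tempFile;
    private long count;

    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (fileOut == null && count + len > TEMP_FILE_THRESHOLD) {
            transferToFile();
        }
        (fileOut != null ? fileOut : baos).write(b, off, len);
        count += len;
    }

    // Open the temp file once and stream the buffered bytes into it.
    private void transferToFile() throws IOException {
        tempFile = Files.createTempFile("zipfs-entry", ".tmp");
        fileOut = new BufferedOutputStream(Files.newOutputStream(tempFile));
        baos.writeTo(fileOut); // no extra copy of the internal array
        baos = null;
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
    }
}
```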
…configuration property, instead set it internally to a specific value
Mailing list message from Jaikiran Pai on core-libs-dev: On 12/07/21 2:08 am, Lance Andersen wrote: [...] Thank you for the reviews and running the tests, Lance. I'll wait for [...] -Jaikiran
The update looks reasonable, although some of the exception handling (with UncheckedIOException) is surprising. I'll do a detailed review in the next few days.
Hello Alan,
For some context - the new [...] I'll wait for your full review.
Have you tried wrapping a BAOS rather than extending it? That might allow the exception wrapping/unwrapping to go away.
Hello Alan, I did experiment with it earlier, before going with the current approach in this PR. The disadvantage, as I see it, with wrapping a ByteArrayOutputStream is that you don't get access to its internal byte array, so transferring its contents to the temp file requires copying them out. The rollover code would then look something like:
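(The original snippet was lost in this archive; the following is a reconstruction consistent with the surrounding description. Field names are assumed; getTempPathForEntry is the helper seen in the diff above.)

```java
// Reconstructed sketch of the rollover-by-copy approach being described:
// without access to the BAOS's internal buffer, transferring requires
// toByteArray(), which allocates a full copy of the buffered data.
private void transferToFile() throws IOException {
    byte[] copy = baos.toByteArray(); // copies the entire buffered content
    tempFile = getTempPathForEntry(null);
    try (OutputStream os = Files.newOutputStream(tempFile)) {
        os.write(copy);
    }
}
```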
So although you can transfer the contents to the file without requiring access to the internal byte array, you end up creating a new copy of that array (through the use of toByteArray()). The use of [...]
Now that the mailing list integration seems to be back to normal, just adding this dummy comment to bring attention to the latest comments in this PR.
Thanks for changing it to wrap the BAOS rather than extending it; that avoids the annoying wrapping/unwrapping of exceptions. So I think the approach looks good, but I think the synchronization needs to be re-checked: it is not obvious that it is correct or needed. Are there any cases where FileRolloverOutputStream is returned to user code? I don't think so; instead, users of the zip file system will get an EntryOutputStream that wraps the FileRolloverOutputStream. The EntryOutputStream methods are synchronized, so I assume that FileRolloverOutputStream does not need it, and that would avoid the inconsistency between the write methods and the flush/close methods. One other thing to point out is that transferToFile shouldn't need to open the file twice; instead, it should be able to open the tmp file for writing once.
- remove unnecessary "synchronized"
- no need to open the temp file twice
I hadn't paid much thought to the "synchronized" part. You are right - the new FileRolloverOutputStream doesn't need it, since the EntryOutputStream that wraps it already synchronizes its methods.

The updated version of this PR now fixes this part to open the temp file just once. I had reviewed this [...] Thank you for these inputs. The updated PR continues to pass the new tests and the existing ones in [...]
The updated implementation looks okay; I don't think I have any more questions.
Thank you for the review, Alan. @LanceAndersen, I've run the tier1 tests locally with the latest PR and they have passed without any regressions. Given that we changed the implementation to wrap ByteArrayOutputStream instead of extending it, would you want to rerun some of your other tests that you had previously run, before I integrate this?
Yes, I will run additional tests and report back after they complete.
I did not notice any new issues after running tier1 through tier3.
Thank you for running the tests, Lance.
/integrate
Going to push as commit c3d8e92.
Your commit was automatically rebased without conflicts.
Mailing list message from Bernd Eckenfels on core-libs-dev: Hello,
You can avoid the copy and the additional buffer with baos.writeTo(), I think:

try (OutputStream os = Files.newOutputStream(entry.file)) { // maybe append?
    baos.writeTo(os);
}

--
On Wed, 21 Jul 2021 04:09:23 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:
> Now that the mailing list integration seems to be back to normal, just adding this dummy comment to bring attention to the latest comments in this PR.

PR: https://git.openjdk.java.net/jdk/pull/4607
Mailing list message from Jaikiran Pai on core-libs-dev: Hello Bernd, On 22/07/21 8:54 pm, Bernd Eckenfels wrote: [...]

You are absolutely right. I hadn't noticed ByteArrayOutputStream had a writeTo method. That extra copy was the only concern I had when it came to wrapping the ByteArrayOutputStream. -Jaikiran
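To make the difference concrete, a small self-contained sketch (the class and file names are made up): ByteArrayOutputStream.writeTo streams the internal buffer straight to the sink, whereas toByteArray first allocates a full copy of it.

```java
import java.io.*;
import java.nio.file.*;

public class WriteToDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(new byte[1024]);
        Path tempFile = Files.createTempFile("writeto-demo", ".bin");
        try (OutputStream os = Files.newOutputStream(tempFile)) {
            baos.writeTo(os); // streams the internal buffer, no extra copy
            // vs. os.write(baos.toByteArray()); // allocates a full copy first
        }
    }
}
```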
Can I please get a review for this proposed fix for the issue reported in https://bugs.openjdk.java.net/browse/JDK-8190753?
The commit here checks for the size of the zip entry before trying to create a ByteArrayOutputStream for that entry's content. A new jtreg test has been included in this commit to reproduce the issue and verify the fix.

P.S.: It's still a bit arguable whether it's a good idea to create the ByteArrayOutputStream with an initial size potentially as large as MAX_ARRAY_SIZE or whether it's better to just use some smaller default value. However, I think that can be addressed separately while dealing with https://bugs.openjdk.java.net/browse/JDK-8011146

Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4607/head:pull/4607
$ git checkout pull/4607
Update a local copy of the PR:
$ git checkout pull/4607
$ git pull https://git.openjdk.java.net/jdk pull/4607/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4607
View PR using the GUI difftool:
$ git pr show -t 4607
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4607.diff