
8190753: (zipfs): Accessing a large entry (> 2^31 bytes) leads to a negative initial size for ByteArrayOutputStream #4607

Closed
wants to merge 15 commits into master

Conversation

jaikiran
Member

@jaikiran jaikiran commented Jun 28, 2021

Can I please get a review for this proposed fix for the issue reported in https://bugs.openjdk.java.net/browse/JDK-8190753?

The commit here checks for the size of the zip entry before trying to create a ByteArrayOutputStream for that entry's content. A new jtreg test has been included in this commit to reproduce the issue and verify the fix.
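
For illustration only (this is not the PR's jtreg test), the failure mode reduces to narrowing a 64-bit entry size to a negative int:

import java.io.ByteArrayOutputStream;

// Demonstration of the overflow; the sizes are illustrative.
long entrySize = (1L << 31) + 1;    // 2147483649 bytes, i.e. > 2^31
int initialSize = (int) entrySize;  // narrows to -2147483647
// throws java.lang.IllegalArgumentException: Negative initial size: -2147483647
ByteArrayOutputStream baos = new ByteArrayOutputStream(initialSize);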

P.S: It's still a bit arguable whether it's a good idea to create the ByteArrayOutputStream with an initial size potentially as large as the MAX_ARRAY_SIZE or whether it's better to just use some smaller default value. However, I think that can be addressed separately while dealing with https://bugs.openjdk.java.net/browse/JDK-8011146


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8190753: (zipfs): Accessing a large entry (> 2^31 bytes) leads to a negative initial size for ByteArrayOutputStream

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4607/head:pull/4607
$ git checkout pull/4607

Update a local copy of the PR:
$ git checkout pull/4607
$ git pull https://git.openjdk.java.net/jdk pull/4607/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4607

View PR using the GUI difftool:
$ git pr show -t 4607

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4607.diff

…egative initial size for ByteArrayOutputStream
@bridgekeeper

bridgekeeper bot commented Jun 28, 2021

👋 Welcome back jpai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr (Pull request is ready for review) label Jun 28, 2021
@openjdk

openjdk bot commented Jun 28, 2021

@jaikiran The following labels will be automatically applied to this pull request:

  • core-libs
  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the nio (nio-dev@openjdk.org) and core-libs (core-libs-dev@openjdk.org) labels Jun 28, 2021
@mlbridge

mlbridge bot commented Jun 28, 2021

@AlanBateman
Contributor

This may be just moving the problem because writing to the BAOS will fail when the deflated size is too large to fit in a byte array. The zip provider can use a temporary file so maybe it should use that when appending to existing zip entries that are larger than some threshold. At some point we may need deeper changes here, e.g. start out with a BAOS and spill over to a temporary file when the deflated size reaches some threshold.

I didn't study the test too closely but just to mention that tests with zip entries > 2GB can be problematic to test. The test will probably need the @requires tag to limit it to 64-bit systems and maybe some minimum memory size. It may also need testing on a wide range of systems to get some idea of run time. Test machines with spinning rust (HDDs) come to mind.
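
For reference, a jtreg header along those lines might look like this (a sketch only; the exact tags and thresholds were still under discussion, and the memory bound here is an assumption):

/*
 * @test
 * @bug 8190753
 * @summary verify that a zip entry larger than 2^31 bytes can be written and read via zipfs
 * @requires (sun.arch.data.model == "64" & os.maxMemory >= 8g)
 * @run testng/othervm LargeEntrySizeTest
 */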

@mlbridge

mlbridge bot commented Jun 28, 2021

Mailing list message from Jaikiran Pai on core-libs-dev:

Hello Alan,

On 28/06/21 1:00 pm, Alan Bateman wrote:

I didn't study the test too closely but just to mention that tests with zip entries > 2GB can be problematic to test. The test will probably need the @requires tag to limit it to 64-bit systems and maybe some minimum memory size. It may also need testing on a wide range of systems to get some idea of run time. Test machines with spinning rust (HDDs) come to mind.

That's a good point and I had completely overlooked it. There's an existing test, test/jdk/java/util/zip/ZipFile/Zip64SizeTest.java (unrelated to this issue), which uses a 5GB sized entry in the zips. In fact, the idea of creating the zip entry in this manner was borrowed from there. That one doesn't have any @requires for it. However, taking a closer look at that existing test, it just creates these large entries but never loads (nor tries to load) them into memory, which probably explains why it doesn't need special care when it comes to running the test.

I'll take a look at some other existing tests to see what kind of @requires I can add here to make it a bit more selective about where it gets run. Thank you for that input.

-Jaikiran


@mlbridge

mlbridge bot commented Jun 28, 2021

Mailing list message from Lance Andersen on core-libs-dev:

Hi Jaikiran,

This is on my list to look at but did not get to today.

Best
Lance
On Jun 27, 2021, at 11:52 PM, Jaikiran Pai <jpai at openjdk.java.net> wrote:

Can I please get a review for this proposed fix for the issue reported in https://bugs.openjdk.java.net/browse/JDK-8190753?

The commit here checks for the size of the zip entry before trying to create a `ByteArrayOutputStream` for that entry's content. A new jtreg test has been included in this commit to reproduce the issue and verify the fix.

P.S: It's still a bit arguable whether it's a good idea to create the `ByteArrayOutputStream` with an initial size potentially as large as the `MAX_ARRAY_SIZE` or whether it's better to just use some smaller default value. However, I think that can be addressed separately while dealing with https://bugs.openjdk.java.net/browse/JDK-8011146

-------------

Commit messages:
- 8190753: (zipfs): Accessing a large entry (> 2^31 bytes) leads to a negative initial size for ByteArrayOutputStream

Changes: https://git.openjdk.java.net/jdk/pull/4607/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4607&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8190753
Stats: 139 lines in 2 files changed: 138 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/4607.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4607/head:pull/4607

PR: https://git.openjdk.java.net/jdk/pull/4607



@mlbridge

mlbridge bot commented Jun 29, 2021

Mailing list message from Jaikiran Pai on core-libs-dev:

Hello Lance,

Please take your time.

-Jaikiran



Contributor

@LanceAndersen LanceAndersen left a comment


Thank you for looking into this issue.

I ran your current test 150 times and the max runtime was 25 seconds; most cases were in the 18-20 second range on our slower test boxes.

As part of looking at what happens with a file whose deflated size is > 2GB, I would add a specific manual test to validate that there is no issue when we cross the 2GB threshold.

@@ -1948,7 +1950,7 @@ private OutputStream getOutputStream(Entry e) throws IOException {
             e.file = getTempPathForEntry(null);
             os = Files.newOutputStream(e.file, WRITE);
         } else {
-            os = new ByteArrayOutputStream((e.size > 0)? (int)e.size : 8192);
+            os = new ByteArrayOutputStream((e.size > 0 && e.size <= MAX_ARRAY_SIZE)? (int)e.size : 8192);
         }
Contributor


The proposed change will address the specific issue shown in the bug. As Alan points out, there could be an issue if the deflated size is > 2gb. It would be good to look into that as part of your proposed fix.
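
For context, here is the guarded computation as a self-contained sketch; the helper name is hypothetical, and MAX_ARRAY_SIZE is assumed to mirror the JDK's soft maximum array length (Integer.MAX_VALUE - 8):

import java.io.ByteArrayOutputStream;

class EntryBuffers {
    // Assumed value; the JDK caps array growth at Integer.MAX_VALUE - 8.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static ByteArrayOutputStream newEntryBuffer(long entrySize) {
        // Use the declared entry size as the initial capacity only when it is
        // known (> 0) and small enough to be an array length; else default to 8192.
        int initialSize = (entrySize > 0 && entrySize <= MAX_ARRAY_SIZE) ? (int) entrySize : 8192;
        return new ByteArrayOutputStream(initialSize);
    }
}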

@mlbridge

mlbridge bot commented Jun 30, 2021

Mailing list message from Jaikiran Pai on core-libs-dev:

Hello Lance,

On 29/06/21 11:31 pm, Lance Andersen wrote:

I ran your current test 150 times and the max runtime was 25 seconds, most cases were in the 18-20 second range on our slower test boxes.

Thank you for running those tests. Do you think those timings are good enough to let that test stay as a regular automated jtreg test, in tier1? I'm guessing this falls in tier1? I haven't yet looked in detail at the tier definitions of the build.

As part of looking at what happens with a file whose deflated size is > 2gb, I would add a specific test which is a manual test to validate that there is no issue when we cross the 2gb threshold.

I added a (manual) test to see what happens in this case. I have committed the test as part of this PR just for the sake of reference. The test is named LargeCompressedEntrySizeTest. It uses ZipFS to create a (new) zip file and attempts to write out a zip entry whose deflated/compressed size is potentially greater than 2GB. When I run this test case, I consistently run into the following exception:

test LargeCompressedEntrySizeTest.testLargeCompressedSizeWithZipFS(): failure
java.lang.OutOfMemoryError: Required array length 2147483639 + 419 is too large
    at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
    at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
    at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
    at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
    at java.base/java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:252)
    at java.base/java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:210)
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$DeflatingEntryOutputStream.write(ZipFileSystem.java:2016)
    at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
    at LargeCompressedEntrySizeTest.testLargeCompressedSizeWithZipFS(LargeCompressedEntrySizeTest.java:104)

which to me is understandable. Is this what you and Alan wanted tested/checked? In its current form I don't see a way to write out an entry whose deflated size exceeds 2GB, unless the user/caller uses the "useTempFile=true" option while creating the zip filesystem. FWIW - if I do set "useTempFile=true" while creating that zip filesystem in the LargeCompressedEntrySizeTest, the test passes fine and the underlying zip that is created shows a compressed/deflated size as follows:

unzip -lv JTwork/scratch/8190753-test-compressed-size.zip
Archive:  JTwork/scratch/8190753-test-compressed-size.zip
    Length  Method        Size  Cmpr        Date    Time    CRC-32  Name
----------  ------  ----------  ----  ---------- -----  --------  ----
2147483649  Defl:N  2148138719    0%  06-30-2021 21:39  52cab9f8  LargeZipEntry.txt
----------          ----------  ----                              -------
2147483649          2148138719    0%                              1 file

I understand that Alan's suggestion holds good and we should have some logic in place which switches to using a temp file once we notice that the sizes we are dealing with can exceed some threshold, but I guess that is something we need to do separately outside of this PR?

-Jaikiran
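
For reference, a minimal sketch of creating a zip filesystem with that option (the zip path and entry name are illustrative; "create" and "useTempFile" are existing zipfs provider properties):

import java.io.IOException;
import java.net.URI;
import java.nio.file.*;
import java.util.Map;

public class UseTempFileDemo {
    public static void main(String[] args) throws IOException {
        // "useTempFile" makes the provider buffer entry output in a temp file
        // instead of a ByteArrayOutputStream; "create" creates the zip if absent.
        Map<String, Object> env = Map.of("create", "true", "useTempFile", "true");
        Path zip = Path.of("8190753-test-compressed-size.zip").toAbsolutePath();
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            // writes to this entry go through the temp-file-backed output stream
            Files.writeString(zipfs.getPath("LargeZipEntry.txt"), "hello");
        }
    }
}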


@mlbridge

mlbridge bot commented Jun 30, 2021

Mailing list message from Lance Andersen on nio-dev:

Hi Jaikiran

On Jun 30, 2021, at 12:15 PM, Jaikiran Pai <jai.forums2013 at gmail.com> wrote:

Hello Lance,

On 29/06/21 11:31 pm, Lance Andersen wrote:

I ran your current test 150 times and the max runtime was 25 seconds, most cases were in the 18-20 second range on our slower test boxes.

Thank you for running those tests. Do you think those timings are good enough to let that test stay as a regular automated jtreg test, in tier1? I'm guessing this falls in tier1? I haven't yet looked in detail the tier definitions of the build.

These tests run as part of tier2.

The time for the test run is reasonable.

As part of looking at what happens with a file whose deflated size is > 2gb, I would add a specific test which is a manual test to validate that there is no issue when we cross the 2gb threshold.

I added a (manual) test to see what happens in this case. I have committed the test as part of this PR just for the sake of reference. The test is named LargeCompressedEntrySizeTest. The test uses ZipFS to create a (new) zip file and attempts to write out a zip entry whose deflated/compressed size is potentially greater than 2GB. When I run this test case, I consistently run into the following exception:

test LargeCompressedEntrySizeTest.testLargeCompressedSizeWithZipFS(): failure
java.lang.OutOfMemoryError: Required array length 2147483639 + 419 is too large
at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
at java.base/java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:252)
at java.base/java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:210)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$DeflatingEntryOutputStream.write(ZipFileSystem.java:2016)
at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
at LargeCompressedEntrySizeTest.testLargeCompressedSizeWithZipFS(LargeCompressedEntrySizeTest.java:104)

which to me is understandable. Is this what you and Alan wanted tested/checked? In its current form I don't see a way to write out an entry whose deflated size exceeds 2GB, unless the user/caller uses the "useTempFile=true" option while creating the zip filesystem. FWIW - if I do set "useTempFile=true" while creating that zip filesystem in the LargeCompressedEntrySizeTest, the test passes fine and the underlying zip that is created shows a compressed/deflated size as follows:

unzip -lv JTwork/scratch/8190753-test-compressed-size.zip
Archive:  JTwork/scratch/8190753-test-compressed-size.zip
    Length  Method        Size  Cmpr        Date    Time    CRC-32  Name
----------  ------  ----------  ----  ---------- -----  --------  ----
2147483649  Defl:N  2148138719    0%  06-30-2021 21:39  52cab9f8  LargeZipEntry.txt
----------          ----------  ----                              -------
2147483649          2148138719    0%                              1 file

I understand that Alan's suggestion holds good and we should have some logic in place which switches to using a temp file once we notice that the sizes we are dealing with can exceed some threshold, but I guess that is something we need to do separately outside of this PR?

Yes the intent would be to add some logic, which might need to be under a property (for now) to specify the size for when to use a temp file vs BAOS. Having the value configurable via a property might give us some flexibility for experimentation.

I don't see why this PR could not be used for this (as it would provide a more complete solution).

Best
Lance

-Jaikiran



…rollover into a temporary file during writes, if the data size exceeds a threshold
@jaikiran
Member Author

jaikiran commented Jul 3, 2021

Based on the inputs received, I've now updated this PR to enhance the ZipFS to allow rolling over the output stream data into a temporary file after reaching a threshold. Details follow:

  • A new "tempFileThreshold" property has been introduced for ZipFileSystem. This property can be passed like other existing properties when creating the filesystem. The value of this property is expected to be a size in bytes and represents the threshold which will be used to decide whether and when to use a temporary file for outputstreams of zip file entries returned by the ZipFileSystem.

  • The "tempFileThreshold" property is optional and if not set to any explicit value will default to 10MB. In other words, this feature is by default enabled, without the user having to do any specific configuration (not even the existing "useTempFile" property is requried to be set).

  • To disable the threshold based temp file creation feature, a value of 0 or a negative value can be passed.

  • A new (internal) FileRolloverOutputStream has been introduced in ZipFileSystem and forms the core of this enhancement. It extends ByteArrayOutputStream, and its write methods have the additional ability to roll the current and subsequently written data over into a temporary file representing that zip file entry (a rough sketch of the idea follows this list).

  • A new ZipFSOutputStreamTest has been added with the sole focus of verifying the usage of this new "tempFileThreshold" property.

  • The previously added LargeEntrySizeTest and the manual LargeCompressedEntrySizeTest continue to stay and are mainly there to test the large file sizes and deflated sizes of the zip entries and verify that the originally reported JBS issue is fixed by this enhancement. One important thing to note about these tests is that I've now removed the explicit "@requires" (memory) requirements, since after this enhancement (with "tempFileThreshold" enabled by default as in those tests), running them should no longer require specific high-memory systems.
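
As a rough illustration of the rollover idea described in the list above, here is a minimal standalone sketch. The class and helper names are hypothetical; the PR's actual FileRolloverOutputStream lives inside ZipFileSystem and, at this point in the review, extended ByteArrayOutputStream rather than wrapping one:

import java.io.*;
import java.nio.file.*;

// Hypothetical sketch of a threshold-based rollover stream, not the PR's code.
class RolloverOutputStream extends OutputStream {
    private final long threshold;       // rollover point in bytes (the PR defaults to 10MB)
    private ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private OutputStream fileOut;       // non-null once rolled over
    private Path tempFile;

    RolloverOutputStream(long threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (fileOut == null && baos.size() + (long) len > threshold) {
            rollover();
        }
        if (fileOut != null) {
            fileOut.write(b, off, len);
        } else {
            baos.write(b, off, len);
        }
    }

    // Move the bytes buffered so far into a temp file; later writes append there.
    private void rollover() throws IOException {
        tempFile = Files.createTempFile("zipfs-entry", ".tmp");
        fileOut = new BufferedOutputStream(Files.newOutputStream(tempFile));
        baos.writeTo(fileOut);  // streams the internal buffer, no toByteArray() copy
        baos = null;            // release the in-memory buffer
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
    }
}

The wrapping shape shown here (rather than extending ByteArrayOutputStream) is where the review below eventually lands: the write methods can then declare IOException, avoiding UncheckedIOException wrapping.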

There are still some decisions to be made:

  1. Should we introduce this new property or should we enhance the existing "useTempFile" property to allow a size to be passed? Specifically, can we/should we use that existing property to allow users to set the following values:

    • "true", this would imply the temp file feature is enabled always irrespective of the zip entry size. This is how the value of "true" is currently handled before the patch in this PR. So no change in behaviour.

    • a byte size, represented as a String or an integer. This would imply that the user wants to enable the temp file feature, but only when the size or compressed size of the zip entry reaches the threshold specified by this value. This would mean that for sizes less than this, the ZipFS implementation would use a ByteArrayOutputStream and would only roll over to a temp file when the threshold is reached.

    • "false", this would disable the temp file feature completely and outputstreams for zip entries of the ZipFS instance will always use ByteArrayOutputStream

    • value of "0" or "-1". This would be the same as specifying "false" value.

Using just the one existing property to control the temp file creation semantics would help avoid having to deal with 2 different properties ("useTempFile" and "tempFileThreshold"). It would also prevent any potentially conflicting user-specified values for them. For example, we won't have to decide what to do when a user sets useTempFile=false and tempFileThreshold=1234.

If we do decide to introduce this new property then some small amount of additional work needs to be done in this implementation to make sure semantically conflicting values for these 2 properties are handled correctly. I decided to skip that part in this round of the PR till we reached a decision about the properties.

  2. Given that this is a new enhancement, I believe this requires a CSR.

  3. Should this PR be now linked to https://bugs.openjdk.java.net/browse/JDK-8011146 instead of https://bugs.openjdk.java.net/browse/JDK-8190753?

  4. I've never previously created a manual test case. The LargeCompressedEntrySizeTest in this PR is expected to be a manual test case (given how long it might take to run on various different systems). The only difference between this test case and other jtreg automated tests is the absence of a @test on this one. Is this how manual tests are written or is there some other way?

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Jul 11, 2021
@mlbridge

mlbridge bot commented Jul 12, 2021

Mailing list message from Jaikiran Pai on core-libs-dev:

On 12/07/21 2:08 am, Lance Andersen wrote:

On Mon, 5 Jul 2021 07:42:26 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

Can I please get a review for this proposed fix for the issue reported in https://bugs.openjdk.java.net/browse/JDK-8190753?

The commit here checks for the size of the zip entry before trying to create a `ByteArrayOutputStream` for that entry's content. A new jtreg test has been included in this commit to reproduce the issue and verify the fix.

P.S: It's still a bit arguable whether it's a good idea to create the `ByteArrayOutputStream` with an initial size potentially as large as the `MAX_ARRAY_SIZE` or whether it's better to just use some smaller default value. However, I think that can be addressed separately while dealing with https://bugs.openjdk.java.net/browse/JDK-8011146
Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision:

reorganize the tests now that the temp file creation threshold isn't exposed as a user configurable value
I think the updates made to Zip FS look better. Alan is on vacation so I would prefer to wait until he gets back and give him a chance to provide any last thoughts on the change to Zip FS.

The manual test looks OK and is a good addition

Thank you for the reviews and running the tests, Lance. I'll wait for Alan to be back for his reviews.

-Jaikiran


@AlanBateman
Contributor

Thank you for the reviews and running the tests, Lance. I'll wait for Alan to be back for his reviews.

The update looks reasonable although some of the exception handling (with UncheckedIOException) is surprising. I'll do a detailed review in the next few days.

@jaikiran
Member Author

Hello Alan,

The update looks reasonable although some of the exception handling (with UncheckedIOException) is surprising

For some context - the new FileRolloverOutputStream extends ByteArrayOutputStream and hence cannot have a throws IOException in its overridden write methods.

I'll wait for your full review.

@AlanBateman
Contributor

For some context - the new FileRolloverOutputStream extends ByteArrayOutputStream and hence cannot have a throws IOException in its overridden write methods.

Have you tried wrapping a BAOS rather than extending it? That might allow the exception wrapping/unwrapping to go away.

@jaikiran
Member Author

jaikiran commented Jul 21, 2021

For some context - the new FileRolloverOutputStream extends ByteArrayOutputStream and hence cannot have a throws IOException in its overridden write methods.

Have you tried wrapping a BAOS rather than extending it? That might allow the exception wrapping/unwrapping to go away.

Hello Alan,

I did experiment with it earlier, before going with the current approach in this PR. The disadvantage, as I see it, with wrapping a ByteArrayOutputStream instead of extending it is that when trying to roll over the contents to a file, you don't have access to the (inner protected) byte array of the ByteArrayOutputStream.

The rollover code would then look something like:

private void transferToFile() throws IOException {
    // create a tempfile
    entry.file = getTempPathForEntry(null);
    // transfer the already written data from the byte array buffer into this tempfile
    try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(entry.file))) {
        new ByteArrayInputStream(baos.toByteArray(), 0, baos.size()).transferTo(os);
    }
    // release the underlying byte array
    baos = null;
    // append any further data to the file with buffering enabled
    tmpFileOS = new BufferedOutputStream(Files.newOutputStream(entry.file, APPEND));
}

So although you can transfer the contents to the file without requiring access to the byte array, you end up creating a new copy of that array (through the use of baos.toByteArray()), which can be at most 10MB in size. I thought avoiding a new copy of that (potentially 10MB) array during this transfer would be a good save and hence decided to settle on extending ByteArrayOutputStream instead of wrapping it.

The use of extends of course now means dealing with the UncheckedIOException as done in this PR. But if you think that the array copy isn't a concern and wrapping the ByteArrayOutputStream is a better way, then I'll go ahead and update this PR accordingly.

@jaikiran
Member Author

Now that the mailing lists integration seems to be back to normal, just adding this dummy comment to bring to attention the latest comments in this PR.

@AlanBateman
Contributor

Thanks for changing it to wrap the BAOS rather than extending it; that avoids the annoying wrapping/unwrapping of exceptions.

So I think the approach looks good, but the synchronization needs to be re-checked; it is not obvious that it is correct or needed. Are there any cases where FileRolloverOutputStream is returned to user code? I don't think so; instead, users of the zip file system will get an EntryOutputStream that wraps the FileRolloverOutputStream. The EntryOutputStream methods are synchronized, so I assume that FileRolloverOutputStream does not need it, and that would avoid the inconsistency between the write methods and the flush/close methods.

One other thing to point out is that transferToFile shouldn't need to open the file twice; instead, it should be able to open the tmp file for writing once.

 - remove unnecessary "synchronized"
 - no need to open the temp file twice
@jaikiran
Member Author

So I think the approach looks good, but the synchronization needs to be re-checked; it is not obvious that it is correct or needed. Are there any cases where FileRolloverOutputStream is returned to user code? I don't think so; instead, users of the zip file system will get an EntryOutputStream that wraps the FileRolloverOutputStream. The EntryOutputStream methods are synchronized, so I assume that FileRolloverOutputStream does not need it, and that would avoid the inconsistency between the write methods and the flush/close methods.

I hadn't given any thought to the "synchronized" part. You are right - the new FileRolloverOutputStream doesn't get sent back to the callers directly; instead either the EntryOutputStream or the DeflatingEntryOutputStream gets returned. Both of them have the necessary synchronization in place for write and close operations. The flush of the FileRolloverOutputStream calls the flush on the BufferedOutputStream, which already has the necessary synchronization. I've updated the PR to remove the use of synchronized from this new class and added a brief note about this for future maintainers, just like the existing EntryOutputStreamDef has.

One other thing to point out is that transferToFile shouldn't need to open the file twice; instead, it should be able to open the tmp file for writing once.

The updated version of this PR now fixes this part to open it just once. I had reviewed this transferTo multiple times before, but clearly I overlooked this part of the implementation.

Thank you for these inputs. The updated PR continues to pass the new tests and the existing ones in test/jdk/jdk/nio/zipfs/.

@AlanBateman
Contributor

The updated version of this PR now fixes this part to open it just once. I had reviewed this transferTo multiple times before, but clearly I overlooked this part of the implementation.

Thank you for these inputs. The updated PR continues to pass the new tests and the existing ones in test/jdk/jdk/nio/zipfs/.

The updated implementation looks okay; I don't think I have any more questions.

@jaikiran
Member Author

Thank you for the review Alan.

@LanceAndersen, I've run the tier1 tests locally with the latest PR and they have passed without any regressions. Given that we changed the implementation to wrap ByteArrayOutputStream instead of extending it, would you want to rerun some of your other tests that you had previously run, before I integrate this?

@LanceAndersen
Contributor

Thank you for the review Alan.

@LanceAndersen, I've run the tier1 tests locally with the latest PR and they have passed without any regressions. Given that we changed the implementation to wrap ByteArrayOutputStream instead of extending it, would you want to rerun some of your other tests that you had previously run, before I integrate this?

Yes, I will run additional tests and report back after they complete

@LanceAndersen
Contributor

Thank you for the review Alan.
@LanceAndersen, I've run the tier1 tests locally with the latest PR and they have passed without any regressions. Given that we changed the implementation to wrap ByteArrayOutputStream instead of extending it, would you want to rerun some of your other tests that you had previously run, before I integrate this?

Yes, I will run additional tests and report back after they complete

I did not notice any new issues after running tier1 - tier3

@jaikiran
Member Author

Thank you for running the tests, Lance.

@jaikiran
Member Author

/integrate

@openjdk

openjdk bot commented Jul 27, 2021

Going to push as commit c3d8e92.
Since your change was applied there have been 27 commits pushed to the master branch:

  • eb6da88: Merge
  • b76a838: 8269150: UnicodeReader not translating \u005c\u005d to \]
  • 7ddabbf: 8271175: runtime/jni/FindClassUtf8/FindClassUtf8.java doesn't have to be run in othervm
  • 3c27f91: 8271222: two runtime/Monitor tests don't check exit code
  • 049b2ad: 8015886: java/awt/Focus/DeiconifiedFrameLoosesFocus/DeiconifiedFrameLoosesFocus.java sometimes failed on ubuntu
  • fcc7d59: 8269342: CICrashAt=1 does not always catch first Java method
  • 8785737: 8269616: serviceability/dcmd/framework/VMVersionTest.java fails with Address already in use error
  • 3aadae2: 8271140: Fix native frame handling in vframeStream::asJavaVFrame()
  • b8f79a7: 8268873: Unnecessary Vector usage in java.base
  • 0b12e7c: 8075806: divideExact is missing in java.lang.Math
  • ... and 17 more: https://git.openjdk.java.net/jdk/compare/45abbeed2f4f2899a3c1595b0cd8e573990a16fa...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 27, 2021
@openjdk openjdk bot added the integrated (Pull request has been integrated) label and removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Jul 27, 2021
@openjdk

openjdk bot commented Jul 27, 2021

@jaikiran Pushed as commit c3d8e92.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mlbridge

mlbridge bot commented Sep 3, 2021

Mailing list message from Bernd Eckenfels on core-libs-dev:

Hello,

So although you can transfer the contents to the file without requiring access to the byte array, you end up creating a new copy of that array (through the use of `baos.toByteArray()`)

You can avoid the copy and the additional buffer with baos.writeTo() I think.

try (OutputStream os = Files.newOutputStream(entry.file)) { // maybe append?
    baos.writeTo(os);
}
// release the underlying byte array
baos = null;
// append any further data to the file with buffering enabled
tmpFileOS = new BufferedOutputStream(Files.newOutputStream(entry.file, APPEND));

--
http://bernd.eckenfels.net

@mlbridge

mlbridge bot commented Sep 3, 2021

Mailing list message from Jaikiran Pai on core-libs-dev:

Hello Bernd,

On 22/07/21 8:54 pm, Bernd Eckenfels wrote:

Hello,

So although you can transfer the contents to the file without requiring access to the byte array, you end up creating a new copy of that array (through the use of `baos.toByteArray()`)
You can avoid the copy and the additional buffer with baos.writeTo() I think.

try (OutputStream os = Files.newOutputStream(entry.file)) { // maybe append?
    baos.writeTo(os);
}

You are absolutely right. I hadn't noticed ByteArrayOutputStream had this writeTo() method. Thank you for this input.

This was the only concern I had when it came to wrapping the ByteArrayOutputStream, and with your input it no longer is a concern. I have updated this PR to go ahead with the wrapping approach, which also does away with the necessity of UncheckedIOException. Existing and the new tests continue to pass with this change.

-Jaikiran

Labels
core-libs (core-libs-dev@openjdk.org) · integrated (Pull request has been integrated) · nio (nio-dev@openjdk.org)
3 participants