Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8329623: NegativeArraySizeException encoding large String to UTF-8 #18663

Closed
wants to merge 1 commit into from

Conversation

RogerRiggs
Copy link
Contributor

@RogerRiggs RogerRiggs commented Apr 5, 2024

When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) computation of the buffer size may exceed the range of a positive 32-bit Integer.
If the estimated size for the result byte array is too large, pre-compute the exact buffer size.
If that exceeds the range, then throw OutOfMemoryError.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8329623: NegativeArraySizeException encoding large String to UTF-8 (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18663/head:pull/18663
$ git checkout pull/18663

Update a local copy of the PR:
$ git checkout pull/18663
$ git pull https://git.openjdk.org/jdk.git pull/18663/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18663

View PR using the GUI difftool:
$ git pr show -t 18663

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18663.diff

Webrev

Link to Webrev Comment

If the estimated size for the result byte array exceeds array index,
precompute the exact buffer size. If that exceeds the range, then throw OutOfMemoryError
@bridgekeeper
Copy link

bridgekeeper bot commented Apr 5, 2024

👋 Welcome back rriggs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 5, 2024

@RogerRiggs This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8329623: NegativeArraySizeException encoding large String to UTF-8

Reviewed-by: naoto, rgiulietti

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 42 new commits pushed to the master branch:

  • 6439375: 8329533: TestCDSVMCrash fails on libgraal
  • 3ebf8c9: 8329663: hs_err file event log entry for thread adding/removing should print current thread
  • be45de1: 8328627: JShell documentation should be clearer about "remote runtime system"
  • 8648890: 8329749: Obsolete the unused UseNeon flag
  • fc18201: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies
  • 7c66465: 8325088: Overloads that differ in type parameters may be lost
  • 6f087cb: 8328698: oopDesc::klass_raw() decodes without a null check
  • d1aad71: 8321204: C2: assert(false) failed: node should be in igvn hash table
  • 51b0abc: 8329340: Remove unused libawt code
  • 3a3b77d: 8329641: RISC-V: Enable some tests related to SHA-2 instrinsic
  • ... and 32 more: https://git.openjdk.org/jdk/compare/21867c929a2f2c961148f2cd1e79d672ac278d27...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 5, 2024
@openjdk
Copy link

openjdk bot commented Apr 5, 2024

@RogerRiggs The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Apr 5, 2024
@mlbridge
Copy link

mlbridge bot commented Apr 5, 2024

Webrevs

Copy link
Member

@naotoj naotoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The test case could be more thorough if it tests strings with supplementary codepoints, as the new method computes them exclusively.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 5, 2024
@RogerRiggs
Copy link
Contributor Author

LGTM. The test case could be more thorough if it tests strings with supplementary codepoints, as the new method computes them exclusively.

I considered that, but the worst case is the x3 expansion.
A 2 character high/low surrogate pair would result in 4 bytes of UTF-8, less than the 6 bytes needed for a 2 16 bit characters. The test doesn't run quickly already due to the large chunks of memory used.

@naotoj
Copy link
Member

naotoj commented Apr 5, 2024

The test doesn't run quickly already due to the large chunks of memory used.

OK, never mind then, if it would take considerable time.

Copy link
Contributor

@rgiulietti rgiulietti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code fix looks good, but see the comment in the test.


// Strings of size min+1...min+2, throw OOME
// The resulting byte array would exceed implementation limits
for (int count = min + 1; count < max; count++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case min + 1 cannot lead to a NegativeArraySizeException in the current code, since 3 * (min + 1) <= MAX_VALUE. In theory, it should succeed by returning the encoded byte[], although It throws OOME for exceeding VM limits. That is, this case does not trigger the invocation of computeSizeUTF8_UTF16() in the proposed fix.

Only min + 2 throws NegativeArraySizeException in the current code, and thus the invocation of computeSizeUTF8_UTF16() in the proposed fix.

Copy link
Contributor Author

@RogerRiggs RogerRiggs Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, different OOMEs are thrown in the two cases triggered by different limits, min +2 is due to integer overflow, while min +1 is due a VM limit on the size of byte[Integer.MAX_VALUE]. Different VM implementations may have different limits on the max size of a byte array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be some merit in lowering the threshold at which an exact size computation is triggered.
The oversized allocation "wastes" quite a bit of memory and causes extra GC work and usually triggers a second copy of the final size.
However, some guess or heuristic would be needed to choose the threshold at which extra cpu work is needed to compute the exact size vs some metric as to the "cost" of wasted memory (and saving on the copy).
Most guesses would be somewhat arbitrary; bigger than 1Mb, 1GB, etc....?
Choosing that number would be out of scope for the issue raised by this bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that the VM limit might change and suddenly accept MAX_VALUE as an allowed array size (very unlikely, I guess). The test would then fail on min + 1 because it expects a OOME which will not be thrown.

But that is very remote.

@RogerRiggs
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Apr 8, 2024

Going to push as commit 212a253.
Since your change was applied there have been 46 commits pushed to the master branch:

  • dd930c5: 8329787: Fix typo in CLDRConverter
  • 115f419: 8329659: Serial: Extract allowed_dead_ratio from ContiguousSpace
  • 9ac3b77: 8329775: Serial: Remove unused declarations in serialFullGC.hpp
  • 7475824: 8329510: Update ProblemList for JFileChooser/8194044/FileSystemRootTest.java
  • 6439375: 8329533: TestCDSVMCrash fails on libgraal
  • 3ebf8c9: 8329663: hs_err file event log entry for thread adding/removing should print current thread
  • be45de1: 8328627: JShell documentation should be clearer about "remote runtime system"
  • 8648890: 8329749: Obsolete the unused UseNeon flag
  • fc18201: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies
  • 7c66465: 8325088: Overloads that differ in type parameters may be lost
  • ... and 36 more: https://git.openjdk.org/jdk/compare/21867c929a2f2c961148f2cd1e79d672ac278d27...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 8, 2024
@openjdk openjdk bot closed this Apr 8, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 8, 2024
@openjdk
Copy link

openjdk bot commented Apr 8, 2024

@RogerRiggs Pushed as commit 212a253.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@RogerRiggs RogerRiggs deleted the 8329623-utf8-oome branch November 27, 2024 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants