8329623: NegativeArraySizeException encoding large String to UTF-8 #18663
If the estimated size for the result byte array exceeds the maximum array index, precompute the exact buffer size. If that also exceeds the range, throw OutOfMemoryError.
👋 Welcome back rriggs! A progress list of the required criteria for merging this PR into
@RogerRiggs This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 42 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
@RogerRiggs The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
LGTM. The test case could be more thorough if it tested strings with supplementary code points, as the new method handles them in a separate code path.
I considered that, but the worst case is the 3x expansion.
OK, never mind then, if it would take considerable time.
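An illustrative check of why supplementary code points do not reach the worst case (not part of the PR; the class name `Utf8Expansion` is hypothetical): a BMP character at or above U+0800 encodes to 3 UTF-8 bytes for a single UTF-16 char, while a supplementary code point is a surrogate pair of 2 chars that encodes to 4 bytes, i.e. only 2 bytes per char.

```java
import java.nio.charset.StandardCharsets;

class Utf8Expansion {
    // UTF-8 bytes produced per UTF-16 char for the given string.
    static double bytesPerChar(String s) {
        return (double) s.getBytes(StandardCharsets.UTF_8).length / s.length();
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII: 1 byte per char
            "\u00E9",       // U+00E9: 2 bytes per char
            "\u0800",       // U+0800, first 3-byte code point: 3 bytes per char
            "\uD83D\uDE00"  // U+1F600, surrogate pair: 2 chars -> 4 bytes
        };
        for (String s : samples) {
            System.out.println(s.length() + " char(s) -> "
                + s.getBytes(StandardCharsets.UTF_8).length + " byte(s), "
                + bytesPerChar(s) + " bytes/char");
        }
    }
}
```

So a string made only of chars in U+0800..U+FFFF (excluding surrogates) is what drives the estimate to its 3x maximum.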
Code fix looks good, but see the comment in the test.
// Strings of size min+1...min+2, throw OOME
// The resulting byte array would exceed implementation limits
for (int count = min + 1; count < max; count++) {
The case min + 1 cannot lead to a NegativeArraySizeException in the current code, since 3 * (min + 1) <= MAX_VALUE. In theory, it should succeed by returning the encoded byte[], although it throws OOME for exceeding VM limits. That is, this case does not trigger the invocation of computeSizeUTF8_UTF16() in the proposed fix.
Only min + 2 throws NegativeArraySizeException in the current code, and thus triggers the invocation of computeSizeUTF8_UTF16() in the proposed fix.
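A minimal sketch of the arithmetic in this comment, assuming a hypothetical bound `MIN = Integer.MAX_VALUE / 3 - 1` (chosen only so that `3 * (MIN + 1) <= Integer.MAX_VALUE` holds as stated; the actual test may define `min` differently) and a worst-case estimate of 3 bytes per char computed in `int` arithmetic. Only `MIN + 2` wraps to a negative size, which `new byte[...]` rejects with `NegativeArraySizeException`; `Math.multiplyExact` would detect the same overflow. `EstimateOverflow` and `intEstimate` are illustrative names.

```java
class EstimateOverflow {
    // Hypothetical bound: 3 * (MIN + 1) == 2147483646 still fits in an int.
    static final int MIN = Integer.MAX_VALUE / 3 - 1;

    // Worst-case UTF-8 estimate in plain int arithmetic: silently wraps on overflow.
    static int intEstimate(int chars) {
        return chars * 3;
    }

    public static void main(String[] args) {
        System.out.println(intEstimate(MIN + 1)); // 2147483646: positive, no overflow
        System.out.println(intEstimate(MIN + 2)); // -2147483647: wrapped to negative
        try {
            byte[] dst = new byte[intEstimate(MIN + 2)]; // rejected before any allocation
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
        try {
            Math.multiplyExact(MIN + 2, 3); // overflow-checked multiplication
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException from multiplyExact");
        }
    }
}
```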
Indeed, different OOMEs are thrown in the two cases, triggered by different limits: for min + 2 it is due to integer overflow, while for min + 1 it is due to a VM limit on byte array size close to Integer.MAX_VALUE. Different VM implementations may have different limits on the maximum size of a byte array.
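To illustrate the two distinct limits, a hypothetical classifier can compute the required byte count in `long` arithmetic (so it cannot wrap) and compare it first against the `int` range and then against a VM array limit. `ASSUMED_VM_MAX_ARRAY = Integer.MAX_VALUE - 8` is an assumption standing in for an implementation-specific limit, not any particular VM's actual value; the bound `min` in the usage is likewise hypothetical.

```java
class SizeLimits {
    // ASSUMPTION: stand-in for the VM's maximum byte[] length, which is
    // implementation-specific and may differ between VMs.
    static final int ASSUMED_VM_MAX_ARRAY = Integer.MAX_VALUE - 8;

    enum Outcome { FITS, EXCEEDS_VM_LIMIT, EXCEEDS_INT_RANGE }

    // Classify an exact byte count computed in long arithmetic.
    static Outcome classify(long requiredBytes) {
        if (requiredBytes > Integer.MAX_VALUE) {
            return Outcome.EXCEEDS_INT_RANGE; // int math would overflow (the "min + 2" case)
        }
        if (requiredBytes > ASSUMED_VM_MAX_ARRAY) {
            return Outcome.EXCEEDS_VM_LIMIT;  // valid int, but too large for the VM (the "min + 1" case)
        }
        return Outcome.FITS;
    }

    public static void main(String[] args) {
        long min = Integer.MAX_VALUE / 3 - 1;           // hypothetical test bound
        System.out.println(classify(3L * (min + 1)));   // EXCEEDS_VM_LIMIT
        System.out.println(classify(3L * (min + 2)));   // EXCEEDS_INT_RANGE
    }
}
```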
There might be some merit in lowering the threshold at which an exact size computation is triggered.
The oversized allocation "wastes" quite a bit of memory, causes extra GC work, and usually triggers a second copy into an array of the final size.
However, choosing that threshold would need some guess or heuristic that weighs the extra CPU work of computing the exact size against the "cost" of wasted memory (and the saved copy).
Most guesses would be somewhat arbitrary: bigger than 1 MB, 1 GB, etc.?
Choosing that number would be out of scope for the issue raised by this bug.
What I mean is that the VM limit might change and suddenly accept MAX_VALUE as an allowed array size (very unlikely, I guess). The test would then fail on min + 1 because it expects an OOME that would not be thrown.
But that is very remote.
/integrate
Going to push as commit 212a253.
Your commit was automatically rebased without conflicts.
@RogerRiggs Pushed as commit 212a253. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
When encoding a very large string with String.getBytes(StandardCharsets.UTF_8), the computation of the buffer size may exceed the range of a positive 32-bit integer.
If the estimated size for the result byte array is too large, pre-compute the exact buffer size.
If that exceeds the range, then throw OutOfMemoryError.
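The approach described above can be sketched as follows. This is a simplified, hypothetical model, not the JDK's internal `String` code (whose helper, per the review, is `computeSizeUTF8_UTF16()`): `Utf8SizeSketch`, `exactUtf8Length`, and `bufferSize` are illustrative names, and the sketch scans a `CharSequence` rather than the String's internal byte array. It assumes the worst-case estimate of 3 bytes per char and counts an unpaired surrogate as 1 byte, since `String.getBytes` replaces it with `'?'`.

```java
class Utf8SizeSketch {
    // Exact number of bytes String.getBytes(UTF_8) would produce for s,
    // accumulated in a long so the total itself cannot overflow.
    static long exactUtf8Length(CharSequence s) {
        long n = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;                      // U+0000..U+007F
            } else if (c < 0x800) {
                n += 2;                      // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                n += 4;                      // valid pair: one supplementary code point
                i++;                         // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                n += 1;                      // unpaired surrogate: replaced by '?'
            } else {
                n += 3;                      // remaining BMP chars
            }
        }
        return n;
    }

    // Cheap 3x estimate when it fits in an int; otherwise the exact size,
    // throwing OutOfMemoryError if even that exceeds the int range.
    static int bufferSize(CharSequence s) {
        long estimate = 3L * s.length();
        if (estimate <= Integer.MAX_VALUE) {
            return (int) estimate;           // may over-allocate; may still exceed the VM limit
        }
        long exact = exactUtf8Length(s);     // extra pass only for very large inputs
        if (exact > Integer.MAX_VALUE) {
            throw new OutOfMemoryError("UTF-8 length exceeds int range: " + exact);
        }
        return (int) exact;
    }

    public static void main(String[] args) {
        String s = "A\u00E9\u0800\uD83D\uDE00";
        System.out.println(exactUtf8Length(s)); // 10 (1 + 2 + 3 + 4)
        System.out.println(bufferSize(s));      // 15 (5 chars * 3)
    }
}
```

Keeping both the estimate and the exact count in `long` is what makes the range checks meaningful. Note that a value returned on the estimate path can still exceed the VM's array limit (the min + 1 case in the review), in which case the allocation itself throws OutOfMemoryError.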
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18663/head:pull/18663
$ git checkout pull/18663
Update a local copy of the PR:
$ git checkout pull/18663
$ git pull https://git.openjdk.org/jdk.git pull/18663/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18663
View PR using the GUI difftool:
$ git pr show -t 18663
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18663.diff