8311906: Race condition in String constructor #15902

Closed
wants to merge 1 commit into from

Conversation

liach
Member

@liach liach commented Sep 25, 2023

In the String constructors, user-supplied byte or char arrays are read multiple times, at many locations, with plain memory accesses. If a user wrote to one of these locations without a happens-before relationship to the constructor call, distinct plain reads of the same element may observe different, unanticipated values.

The main consequence is that a String constructor may incorrectly produce a UTF16-coded string whose characters are all LATIN1-compatible when COMPACT_STRINGS is true, which breaks the contract of String. (The error can also happen the other way around, but the resulting LATIN1 string is valid; this patch does not address that.)
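To make the failure mode concrete, here is a minimal, deterministic sketch. It is a simplified model and not the JDK code: the class `DoubleReadHazard`, its methods, the big-endian byte layout and the `betweenPasses` hook (which stands in for a racing writer thread) are all hypothetical names and assumptions chosen for illustration.

```java
// Simplified model of a two-pass compact-string encoder; NOT the JDK code.
// All names here are hypothetical and exist only for illustration.
final class DoubleReadHazard {
    static final byte LATIN1 = 0, UTF16 = 1;

    static final class Encoded {
        final byte coder;
        final byte[] value;
        Encoded(byte coder, byte[] value) { this.coder = coder; this.value = value; }
    }

    /** Reads src twice; betweenPasses simulates an unsynchronized writer. */
    static Encoded encodeNaive(char[] src, Runnable betweenPasses) {
        // Pass 1: decide whether every char fits in one byte.
        boolean latin1 = true;
        for (char c : src) {
            if (c > 0xFF) { latin1 = false; break; }
        }
        betweenPasses.run(); // the array may change here
        // Pass 2: re-read src and store it with the coder chosen in pass 1.
        if (latin1) {
            byte[] v = new byte[src.length];
            for (int i = 0; i < src.length; i++) v[i] = (byte) src[i];
            return new Encoded(LATIN1, v);
        }
        byte[] v = new byte[src.length * 2];
        for (int i = 0; i < src.length; i++) {
            v[2 * i] = (byte) (src[i] >> 8); // high byte (assumed big-endian layout)
            v[2 * i + 1] = (byte) src[i];    // low byte
        }
        return new Encoded(UTF16, v);
    }

    /** True if a UTF16-coded value holds only chars that LATIN1 could store. */
    static boolean violatesContract(Encoded e) {
        if (e.coder != UTF16) return false;
        for (int i = 0; i < e.value.length; i += 2) {
            if (e.value[i] != 0) return false; // some high byte is set
        }
        return true;
    }
}
```

Calling `encodeNaive` with a hook that overwrites the only non-LATIN1 char between the two passes yields a UTF16-coded value in which every high byte is zero, which is the invalid state described above.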

Thus, I modified the String data compression for untrusted arrays: a first LATIN1 compression pass is still done on the original array, but if it fails, a second compression pass is done on trusted data (that is, a copy of the original data), where reads are consistent. The approach slows down UTF16 string construction, but should not cost more memory.
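The two-pass idea can be sketched roughly as follows. This is a simplified model under my own assumptions, not the patch itself: the names `SnapshotEncoder`, `tryCompress` and `encode` are hypothetical, the "trusted copy" is modelled as the freshly allocated UTF16 byte array, and the `betweenPasses` hook simulates a racing writer.

```java
// Simplified model of "compress, then re-check on a private copy".
// NOT the JDK code; all names are hypothetical.
final class SnapshotEncoder {
    static final byte LATIN1 = 0, UTF16 = 1;

    static final class Encoded {
        final byte coder;
        final byte[] value;
        Encoded(byte coder, byte[] value) { this.coder = coder; this.value = value; }
    }

    /** Returns LATIN1 bytes, or null if some char does not fit in one byte. */
    static byte[] tryCompress(char[] src) {
        byte[] out = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            char c = src[i];            // each element is read exactly once
            if (c > 0xFF) return null;
            out[i] = (byte) c;
        }
        return out;
    }

    static Encoded encode(char[] untrusted, Runnable betweenPasses) {
        // First pass: optimistic compression straight from the caller's array.
        byte[] latin1 = tryCompress(untrusted);
        if (latin1 != null) return new Encoded(LATIN1, latin1);

        betweenPasses.run(); // simulated racing write to the caller's array

        // Copy into the UTF16 array; from here on only this private copy is read.
        byte[] utf16 = new byte[untrusted.length * 2];
        for (int i = 0; i < untrusted.length; i++) {
            char c = untrusted[i];
            utf16[2 * i] = (byte) (c >> 8);
            utf16[2 * i + 1] = (byte) c;
        }
        // Second pass: decide the coder from the trusted copy.
        for (int i = 0; i < utf16.length; i += 2) {
            if (utf16[i] != 0) return new Encoded(UTF16, utf16);
        }
        byte[] out = new byte[untrusted.length];
        for (int i = 0; i < out.length; i++) out[i] = utf16[2 * i + 1];
        return new Encoded(LATIN1, out);
    }
}
```

Because the coder is chosen from the private copy, a concurrent write can change which characters end up in the result, but it cannot produce a UTF16-coded value that contains only LATIN1 characters.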

A separate routine, the UTF8 decoding in the String constructors that take a byte array and an encoding, has the same multi-read problem: the old offset-- leads to a problematic double read. This is resolved by first copying the data to decode into a local array instead of reading from the user-provided byte array. This fix also costs more run time, but no extra memory.
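As a rough illustration of the copy-first approach, the sketch below decodes from a private snapshot. It is not the JDK decoder: `SnapshotUtf8.decode` is a hypothetical helper that handles only 1-byte and 2-byte sequences and substitutes U+FFFD for everything else, which is enough to show where the copy happens.

```java
import java.util.Arrays;

// Simplified UTF-8 decoder that works on a private snapshot of the input.
// NOT the JDK code; names and the reduced feature set are assumptions.
final class SnapshotUtf8 {
    /** Decodes 1- and 2-byte sequences; anything else becomes U+FFFD. */
    static String decode(byte[] untrusted, int offset, int length) {
        // Copy once; later writes to the caller's array cannot affect src.
        byte[] src = Arrays.copyOfRange(untrusted, offset, offset + length);
        char[] dst = new char[src.length];
        int sp = 0, dp = 0;
        while (sp < src.length) {
            int b1 = src[sp++];
            if (b1 >= 0) {
                dst[dp++] = (char) b1; // ASCII
            } else if ((b1 & 0xE0) == 0xC0 && sp < src.length
                    && (src[sp] & 0xC0) == 0x80) {
                // Re-reading src[sp] is harmless: no other thread can see src.
                int b2 = src[sp++];
                dst[dp++] = (char) (((b1 & 0x1F) << 6) | (b2 & 0x3F));
            } else {
                dst[dp++] = '\uFFFD'; // unsupported or malformed in this sketch
            }
        }
        return new String(dst, 0, dp);
    }
}
```

For example, `SnapshotUtf8.decode(bytes, 0, bytes.length)` on the UTF-8 bytes of "héllo" returns "héllo", regardless of what happens to `bytes` after the initial copy.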

Internal APIs such as newStringNoRepl are not guarded by this patch, as they are already trusted to be immutable and unshared.

test/jdk/java/lang/String tests pass. More testing is needed to see if there are other edge cases not covered.

Please review and don't hesitate to critique my approach and patch.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8319228 to be approved

Integration blocker

 ⚠️ Title mismatch between PR and JBS for issue JDK-8311906

Issues

  • JDK-8311906: Improve robustness of String constructors with mutable array inputs (Bug - P4) ⚠️ Title mismatch between PR and JBS.
  • JDK-8319228: Improve robustness of String constructors with mutable array inputs (CSR)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15902/head:pull/15902
$ git checkout pull/15902

Update a local copy of the PR:
$ git checkout pull/15902
$ git pull https://git.openjdk.org/jdk.git pull/15902/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15902

View PR using the GUI difftool:
$ git pr show -t 15902

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15902.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Sep 25, 2023

👋 Welcome back liach! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 25, 2023
@openjdk

openjdk bot commented Sep 25, 2023

@liach The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Sep 25, 2023
@AlanBateman
Contributor

AlanBateman commented Sep 25, 2023

The JBS issue is assigned to Roger Riggs; he has changes in progress for this, so please coordinate to avoid duplicate effort.

@mlbridge

mlbridge bot commented Sep 25, 2023

Webrevs

@RogerRiggs
Contributor

I've been working on something similar and am in the process of fine-tuning it to ensure that it is performance neutral in both allocation rate and CPU time and has little or no impact on either LATIN1 or UTF16 string codings.

@liach
Member Author

liach commented Sep 25, 2023

A prototype I thought of was to modify array compression to return the stopping index plus the offending 2-byte value at that index, packed like the lengthCoder used by String concatenation, to avoid double reads; however, I was not sure how to modify the HotSpot code to accomplish that. As long as we write a valid 2-byte value into the UTF16 value array, we can prevent constructing invalid Strings.
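A minimal sketch of such a packed return value, assuming a layout with the 2-byte char in the upper 32 bits of a long and the consumed length in the lower 32 bits. The class `PackedResult` and its methods are hypothetical; this is plain Java and says nothing about how the HotSpot intrinsic would have to change.

```java
// Hypothetical packed result for a compress routine: (char << 32) | index.
// NOT JDK code; the layout is an assumption for illustration.
final class PackedResult {
    static long pack(char twoByte, int consumedLength) {
        // Widen to long first: an int shifted by 32 is shifted by 0 in Java.
        return ((long) twoByte << 32) | (consumedLength & 0xFFFFFFFFL);
    }

    static char twoByte(long packed) { return (char) (packed >>> 32); }

    static int consumedLength(long packed) { return (int) packed; }

    /**
     * Compresses src into dst until a char above 0xFF is found.
     * Returns that char and its index, or (0, src.length) on full success.
     * Each element of src is read exactly once.
     */
    static long compress(char[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            char c = src[i];
            if (c > 0xFF) return pack(c, i);
            dst[i] = (byte) c;
        }
        return pack((char) 0, src.length);
    }
}
```

The caller can then write the returned char into the UTF16 array at the returned index without reading the source element again, so the UTF16 value is guaranteed to contain at least one genuine 2-byte char.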

@liach
Member Author

liach commented Sep 25, 2023

A side comment: I don't think it's problematic if we incorrectly create a LATIN1 String (such as by downcasting a char to byte), for such a String is valid and it's the user's fault (for not publishing their writes to the array in happens-before order). We only need to avoid writing a UTF16 value array in which no char has a nonzero high byte: we can just write any genuine 2-byte char into the UTF16 array to make it valid, and that's why I'm looking for compress to return twoByte << 32 | consumedLength. Writing that 2-byte char should be easy to accomplish.

That said, my adjustment to UTF8 parsing code is overkill: https://github.com/openjdk/jdk/pull/15902/files#diff-f8131d8a48caf7cfc908417fad241393c2ef55408172e9a28dcaa14b1d73e1fbR575-L580 plus changing initialization of dst to a copyOfRange suffices, for only decodeUTF8_UTF16 may have a chance of writing no 2-byte.

@bridgekeeper

bridgekeeper bot commented Oct 23, 2023

@liach This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Nov 1, 2023
@liach
Member Author

liach commented Nov 4, 2023

Closing in favor of #16425.

@liach liach closed this Nov 4, 2023
Labels
core-libs core-libs-dev@openjdk.org csr Pull request needs approved CSR before integration rfr Pull request is ready for review
3 participants