Skip to content

8301958: Reduce Arrays.copyOf/-Range overheads #12453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

cl4es
Copy link
Member

@cl4es cl4es commented Feb 7, 2023

This patch adds special-cases to Arrays.copyOf and Arrays.copyOfRange to copy arrays more efficiently when exactly the whole input array is to be copied. This helps eliminate range checks and has been verified to help various String operations. Example:

Baseline

Benchmark                                            (size)  Mode  Cnt   Score   Error  Units
StringConstructor.newStringFromArray                      7  avgt   15  16.817 ± 0.369  ns/op
StringConstructor.newStringFromArrayWithCharset           7  avgt   15  16.866 ± 0.449  ns/op
StringConstructor.newStringFromArrayWithCharsetName       7  avgt   15  22.198 ± 0.396  ns/op

Patch:

Benchmark                                            (size)  Mode  Cnt   Score   Error  Units
StringConstructor.newStringFromArray                      7  avgt   15  14.666 ± 0.336  ns/op
StringConstructor.newStringFromArrayWithCharset           7  avgt   15  14.582 ± 0.288  ns/op
StringConstructor.newStringFromArrayWithCharsetName       7  avgt   15  20.339 ± 0.328  ns/op

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453
$ git checkout pull/12453

Update a local copy of the PR:
$ git checkout pull/12453
$ git pull https://git.openjdk.org/jdk pull/12453/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12453

View PR using the GUI difftool:
$ git pr show -t 12453

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12453.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 7, 2023

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 7, 2023
@openjdk
Copy link

openjdk bot commented Feb 7, 2023

@cl4es The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Feb 7, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 7, 2023

@@ -4535,9 +4541,12 @@ void getBytes(byte[] dst, int srcPos, int dstBegin, byte coder, int length) {
String(AbstractStringBuilder asb, Void sig) {
byte[] val = asb.getValue();
int length = asb.length();
// To avoid surprises due to data races (which would either truncate or throw an exception)
// we should check that length <= val.length up front
checkOffset(length, val.length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree a check is needed here but I assume using checkOffset means that SB::toString could fail with SIOOBE. I wonder if Math.min(length, val.length) would be better here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. That keeps behavior consistent for most cases and removes a path where we can fail with SIOOBE in the existing code (down StringUTF16::compress).

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version looks good, I assume you've bump the copyright date on the files that still have 2022 as their last edit.

@openjdk
Copy link

openjdk bot commented Feb 7, 2023

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8301958: Reduce Arrays.copyOf/-Range overheads

Reviewed-by: alanb, smarks

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 85 new commits pushed to the master branch:

  • cb81073: 8300139: [AIX] Use pthreads to avoid JNI_createVM call from primordial thread
  • bbd8ae7: 8294194: [AArch64] Create intrinsics compress and expand
  • 4e327db: 8301499: Replace NULL with nullptr in cpu/zero
  • 0458d38: 6513512: MetalLookAndFeel.initClassDefaults does not install an entry for MetalMenuBarUI
  • f4b140b: 8296410: HttpClient throws java.io.IOException: no statuscode in response for HTTP2
  • 5d39d14: 8299970: Speed up compiler/arraycopy/TestArrayCopyConjoint.java
  • d1c87a0: 8302203: IR framework should detect non-compilable test methods early
  • 1fec6b5: 8301852: RISC-V: Optimize class atomic when order is memory_order_relaxed
  • 7c233bc: 8302114: RISC-V: Several foreign jtreg tests fail with debug build after JDK-8301818
  • 8049e59: 8301833: Add wide-ranging tests for FDLIBM porting
  • ... and 75 more: https://git.openjdk.org/jdk/compare/98433a2f6e7fe97e03ed26673c9925d7b26466bf...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 7, 2023
@@ -760,8 +760,7 @@ public static String newString(byte[] val, int index, int len) {
if (len == 0) {
return "";
}
return new String(Arrays.copyOfRange(val, index, index + len),
LATIN1);
return new String(String.copyBytes(val, index, len), LATIN1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile putting a comment at the top of this method saying that the caller is required to bounds-check the arguments. It's called from several other classes in this package, so it would be good to document this package-internal contract. I checked the callers and they seem fine.

@rose00
Copy link
Contributor

rose00 commented Feb 7, 2023

Our source code is a reference implementation, and people will look at this change as evidence that Arrays::copyOfRange should be hand-inlined by savvy coders. Surely we could also fix this small performance pothole by improving C2’s treatment of Arrays.copyOfRange. That would benefit all users as well, not just String. That is our preferred way to handle things.

On the other hand, String is an important class and worthy of every tiny tweak we give it. Do we need this fix now? If so, I suggest putting in a comment in the code which says something like “normally one would use Arrays.copyOfRange here, but we get slightly better code in this particular case”. Also, regarding the bug against the JIT, I suggest that we back out this change to String when that JIT bug is fixed. Perhaps the comment in String should reference that bug.

Copy link

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cl4es to look into this!

@@ -695,6 +695,12 @@ private String(Charset charset, byte[] bytes, int offset, int length) {
}
}

static byte[] copyBytes(byte[] bytes, int offset, int length) {
Copy link

@franz1981 franz1981 Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the stub generated for array copy seems highly dependent by the call site constrains, did you tried adding a check for offset == 0 and/or length == bytes.length?

If (offset == 0 && bytes.length == length) {
    System.arrayCopy(bytes, 0, dst, 0, bytes.length);
    // etc etc the other combinations 

This should have different generated stubs with much smaller ASM depending by the enforced constrains (and shouldn't affect terribly the code size of the method, given that the stub won't be inlined AFAIK)

Beware, as noted by others, I'm not suggesting that's the way to fix this, but it would be interesting to check how much perf we leave on the ground due to the this supposed "inefficient" stub generation (if that's the issue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some quick experiments but saw no clear win from doing anything like this here. Feel free to experiment and see if there's some particular configuration that comes out ahead.

FTR I did not intend for this RFE to solve https://bugs.openjdk.org/browse/JDK-8295496 completely, but provide a small, partial win that might possibly clear a path to solving that likely orthogonal issue.

Copy link

@franz1981 franz1981 Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a separate benchmark for this (named as your by accident - given that I've used it as a blueprint):
https://gist.github.com/franz1981/658c2bf6796aab4ae04a84bef1ef34b6
results are

Benchmark                             (offset)  (size)  Mode  Cnt   Score   Error  Units
StringConstructor.arrayCopy                  0       7  avgt   10   9.519 ± 0.131  ns/op
StringConstructor.arrayCopy                  1       7  avgt   10   9.194 ± 0.232  ns/op
StringConstructor.copyOf                     0       7  avgt   10  11.548 ± 0.133  ns/op
StringConstructor.copyOf                     1       7  avgt   10   9.812 ± 0.018  ns/op
StringConstructor.optimizedArrayCopy         0       7  avgt   10   6.854 ± 0.355  ns/op    <---- THAT'S COOL
StringConstructor.optimizedArrayCopy         1       7  avgt   10   9.088 ± 0.049  ns/op

the optimized array copy is helping C2 on stub generation.
I didn't checked yet if this applies to the String case and I didn't created a long enough dataset array to check the effects on the branch predictor with the newly introduced conditions too, but in term of generated stub, there's a difference.

@cl4es
Copy link
Member Author

cl4es commented Feb 7, 2023

Our source code is a reference implementation, and people will look at this change as evidence that Arrays::copyOfRange should be hand-inlined by savvy coders. Surely we could also fix this small performance pothole by improving C2’s treatment of Arrays.copyOfRange. That would benefit all users as well, not just String. That is our preferred way to handle things.

On the other hand, String is an important class and worthy of every tiny tweak we give it. Do we need this fix now? If so, I suggest putting in a comment in the code which says something like “normally one would use Arrays.copyOfRange here, but we get slightly better code in this particular case”. Also, regarding the bug against the JIT, I suggest that we back out this change to String when that JIT bug is fixed. Perhaps the comment in String should reference that bug.

It might be that the redundant checks in Arrays.copyOfRange would be eliminated properly with more inlining, and that the issue here is that the affected String constructor is particularly gnarly. This gnarliness is due 1) the need to construct the value and coder in one go and 2) the lack of multiple return values. Give me a value-based record type so we can split apart the constructor into charset-specific methods with zero overhead and we might be able to split up the logic into better-sized chunks.

But yes, I will add some commentary to the effect that this should ideally be handled by our JIT, along with comments that the method deliberately avoids safety checks.

@stuart-marks
Copy link
Member

It might be that the redundant checks in Arrays.copyOfRange would be eliminated properly with more inlining, and that the issue here is that the affected String constructor is particularly gnarly. This gnarliness is due 1) the need to construct the value and coder in one go and 2) the lack of multiple return values. Give me a value-based record type so we can split apart the constructor into charset-specific methods with zero overhead and we might be able to split up the logic into better-sized chunks.

I'm wondering if another contributing factor to the complexity of this code is the continued support of the non-compact-String codepaths. This means there are actually three code paths through every string computation. Do we need to continue to support the non-compact-string code paths? I'm concerned about maintainability too.

@cl4es
Copy link
Member Author

cl4es commented Feb 7, 2023

It might be that the redundant checks in Arrays.copyOfRange would be eliminated properly with more inlining, and that the issue here is that the affected String constructor is particularly gnarly. This gnarliness is due 1) the need to construct the value and coder in one go and 2) the lack of multiple return values. Give me a value-based record type so we can split apart the constructor into charset-specific methods with zero overhead and we might be able to split up the logic into better-sized chunks.

I'm wondering if another contributing factor to the complexity of this code is the continued support of the non-compact-String codepaths. This means there are actually three code paths through every string computation. Do we need to continue to support the non-compact-string code paths? I'm concerned about maintainability too.

I think most apps have sufficient ASCII/latin1-encodable data to make compact strings a net win. Especially with recent improvements to key intrinsics that has narrowed the gap.

I still think turning off compact strings might be beneficial in locales where most strings are UTF-16, but as you say there might be wins to maintainability and code complexity by ripping out -XX:-CompactStrings (which could mean a net performance win for all).

@rose00
Copy link
Contributor

rose00 commented Feb 7, 2023

It is at least possible that splitting Arrays::copyOfRange would make it inline more readily in all use cases as well as this particular one.

By splitting, in this case, I mean moving the first three lines into a helper routine called copyArrayChecks, and returning newLength as a result. That would make it inline better, I’m pretty sure.

It’s sad and true, until we get a better inliner, which of course will disrupt the ecosystem because there is no unique best answer to an inlining problem (if it is complex enough, and they are). So for now we pretend to be good O-O programmers and code separate concerns in separate methods.

@cl4es
Copy link
Member Author

cl4es commented Feb 7, 2023

Rather than splitting right down the middle isn't it more effective to factor out code that would typically not be executed, such as the exception creation + formatting? That additionally allows the JIT to outline such code altogether, allowing more aggressive inlining of the non-exceptional path(s).

@rose00
Copy link
Contributor

rose00 commented Feb 7, 2023

You could split it that way as well. It pushes the inliner a little harder, but doesn’t make it fall over.

@cl4es
Copy link
Member Author

cl4es commented Feb 7, 2023

@franz1981 idea seems to apply nicely here, and going back and applying it to Arrays.copyOfRange end up on top for the common case where copyOfRange copies the entire range:

Benchmark                                            (size)  Mode  Cnt   Score   Error  Units
StringConstructor.newStringFromArray                      7  avgt   15  14.666 ± 0.336  ns/op
StringConstructor.newStringFromArrayWithCharset           7  avgt   15  14.582 ± 0.288  ns/op
StringConstructor.newStringFromArrayWithCharsetName       7  avgt   15  20.339 ± 0.328  ns/op

We might still benefit for some cases to specialize a copyBytes method, but this solution might help more cases. If others agree I might take a step back and apply this optimization to all the copyOfRange methods and add some microbenchmarking to verify this more widely.

@cl4es cl4es changed the title 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String 8301958: Reduce Arrays.copyOfRange overheads Feb 7, 2023
Copy link

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cl4es cl4es changed the title 8301958: Reduce Arrays.copyOfRange overheads 8301958: Reduce Arrays.copyOf/-Range overheads Feb 8, 2023
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed ready Pull request is ready to be integrated labels Feb 8, 2023
@@ -3582,6 +3590,9 @@ public static short[] copyOf(short[] original, int newLength) {
* @since 1.6
*/
public static int[] copyOf(int[] original, int newLength) {
if (newLength == original.length) {
return original.clone();
Copy link
Contributor

@schlosna schlosna Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious about the use of clone for some primitive array types (short[], int[], long[], char[], float[]) and copyOf using System.arraycopy in other types (byte[], double[]). Do these types optimize differently or hit different intrinsics depending on primitive type? Is there difference in array zeroing?

From a quick JMH benchmark System.arraycopy seems slightly better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this and also saw a small win using arraycopy, but the PR ended up in an inconsistent state with some using one and some using the other. While this discrepancy seem like something we should treat as a bug, I've arranged to use copyOf helper consistently for now.

Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious if returning the new length from checkLength would be worthwhile

@cl4es cl4es requested review from stuart-marks and AlanBateman and removed request for stuart-marks and AlanBateman February 8, 2023 17:12
Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nudge from John and others to move this to Arrays.copyOf is much better. It looks good.

We may have to come back to the SB::toString issue from the original proposal.

@cl4es
Copy link
Member Author

cl4es commented Feb 13, 2023

Thanks for reviewng!

I've filed https://bugs.openjdk.org/browse/JDK-8302315 to investigate the clone/arraycopy performance discrepancy. Ideally we should be able to just do array.clone() here and get optimal performance with less code.

/integrate

@openjdk
Copy link

openjdk bot commented Feb 13, 2023

Going to push as commit 1f9c110.
Since your change was applied there have been 85 commits pushed to the master branch:

  • cb81073: 8300139: [AIX] Use pthreads to avoid JNI_createVM call from primordial thread
  • bbd8ae7: 8294194: [AArch64] Create intrinsics compress and expand
  • 4e327db: 8301499: Replace NULL with nullptr in cpu/zero
  • 0458d38: 6513512: MetalLookAndFeel.initClassDefaults does not install an entry for MetalMenuBarUI
  • f4b140b: 8296410: HttpClient throws java.io.IOException: no statuscode in response for HTTP2
  • 5d39d14: 8299970: Speed up compiler/arraycopy/TestArrayCopyConjoint.java
  • d1c87a0: 8302203: IR framework should detect non-compilable test methods early
  • 1fec6b5: 8301852: RISC-V: Optimize class atomic when order is memory_order_relaxed
  • 7c233bc: 8302114: RISC-V: Several foreign jtreg tests fail with debug build after JDK-8301818
  • 8049e59: 8301833: Add wide-ranging tests for FDLIBM porting
  • ... and 75 more: https://git.openjdk.org/jdk/compare/98433a2f6e7fe97e03ed26673c9925d7b26466bf...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 13, 2023
@openjdk openjdk bot closed this Feb 13, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 13, 2023
@openjdk
Copy link

openjdk bot commented Feb 13, 2023

@cl4es Pushed as commit 1f9c110.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

6 participants