Skip to content

Conversation

@cl4es
Copy link
Member

@cl4es cl4es commented Feb 13, 2023

We can improve various String methods such as startsWith, endsWith and regionMatches by leveraging the intrinsified mismatch methods in ArraysSupport.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8302163: Speed up various String comparison methods with ArraysSupport.mismatch

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12528/head:pull/12528
$ git checkout pull/12528

Update a local copy of the PR:
$ git checkout pull/12528
$ git pull https://git.openjdk.org/jdk pull/12528/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12528

View PR using the GUI difftool:
$ git pr show -t 12528

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12528.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 13, 2023

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@cl4es
Copy link
Member Author

cl4es commented Feb 13, 2023

Microbenchmarking shows decent improvements on small data, scaling up to some impressive gains on large inputs when vectorization kicks in (~41x on regionMatches for size = 1024 on my x64 sandybridge setup, ~34x on m1 in the same config)

macosx-aarch64, m1, 21-b9:

Benchmark                             (size)  (utf16)  Mode  Cnt     Score    Error  Units
StringComparisons.endsWith                 6     true  avgt   15     5,949 ±  0,051  ns/op
StringComparisons.endsWith                 6    false  avgt   15     4,063 ±  0,038  ns/op
StringComparisons.endsWith                15     true  avgt   15    10,758 ±  0,132  ns/op
StringComparisons.endsWith                15    false  avgt   15     6,487 ±  0,052  ns/op
StringComparisons.endsWith              1024     true  avgt   15   653,750 ±  2,835  ns/op
StringComparisons.endsWith              1024    false  avgt   15   314,219 ±  1,264  ns/op
StringComparisons.regionMatches            6     true  avgt   15    12,460 ±  0,043  ns/op
StringComparisons.regionMatches            6    false  avgt   15     6,647 ±  0,026  ns/op
StringComparisons.regionMatches           15     true  avgt   15    30,502 ±  0,193  ns/op
StringComparisons.regionMatches           15    false  avgt   15    15,073 ±  0,030  ns/op
StringComparisons.regionMatches         1024     true  avgt   15  2147,480 ±  4,622  ns/op
StringComparisons.regionMatches         1024    false  avgt   15  1068,787 ± 14,098  ns/op
StringComparisons.regionMatchesCI          6     true  avgt   15    11,680 ±  0,106  ns/op
StringComparisons.regionMatchesCI          6    false  avgt   15     4,577 ±  0,101  ns/op
StringComparisons.regionMatchesCI         15     true  avgt   15    14,422 ±  0,132  ns/op
StringComparisons.regionMatchesCI         15    false  avgt   15     6,904 ±  0,124  ns/op
StringComparisons.regionMatchesCI       1024     true  avgt   15   273,810 ±  3,446  ns/op
StringComparisons.regionMatchesCI       1024    false  avgt   15   226,040 ±  2,886  ns/op
StringComparisons.regionMatchesRange       6     true  avgt   15    11,896 ±  0,110  ns/op
StringComparisons.regionMatchesRange       6    false  avgt   15     6,044 ±  0,034  ns/op
StringComparisons.regionMatchesRange      15     true  avgt   15    29,508 ±  0,093  ns/op
StringComparisons.regionMatchesRange      15    false  avgt   15    14,336 ±  0,020  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt   15  2187,600 ± 80,285  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt   15  1105,813 ± 28,260  ns/op
StringComparisons.startsWith               6     true  avgt   15     5,315 ±  0,087  ns/op
StringComparisons.startsWith               6    false  avgt   15     3,588 ±  0,020  ns/op
StringComparisons.startsWith              15     true  avgt   15     9,823 ±  0,144  ns/op
StringComparisons.startsWith              15    false  avgt   15     5,963 ±  0,253  ns/op
StringComparisons.startsWith            1024     true  avgt   15   441,854 ±  9,295  ns/op
StringComparisons.startsWith            1024    false  avgt   15   224,386 ±  6,876  ns/op

macosx-aarch64. m1, patched:

Benchmark                             (size)  (utf16)  Mode  Cnt    Score    Error  Units
StringComparisons.endsWith                 6     true  avgt   15    3,185 ±  0,063  ns/op
StringComparisons.endsWith                 6    false  avgt   15    4,507 ±  0,447  ns/op
StringComparisons.endsWith                15     true  avgt   15    6,097 ±  0,142  ns/op
StringComparisons.endsWith                15    false  avgt   15    5,736 ±  0,025  ns/op
StringComparisons.endsWith              1024     true  avgt   15   60,283 ±  4,109  ns/op
StringComparisons.endsWith              1024    false  avgt   15   31,011 ±  0,080  ns/op
StringComparisons.regionMatches            6     true  avgt   15    3,993 ±  0,063  ns/op
StringComparisons.regionMatches            6    false  avgt   15    4,836 ±  0,474  ns/op
StringComparisons.regionMatches           15     true  avgt   15    3,641 ±  0,012  ns/op
StringComparisons.regionMatches           15    false  avgt   15    2,849 ±  0,065  ns/op
StringComparisons.regionMatches         1024     true  avgt   15   57,739 ±  0,748  ns/op
StringComparisons.regionMatches         1024    false  avgt   15   30,943 ±  0,423  ns/op
StringComparisons.regionMatchesCI          6     true  avgt   15   11,729 ±  0,142  ns/op
StringComparisons.regionMatchesCI          6    false  avgt   15    4,562 ±  0,125  ns/op
StringComparisons.regionMatchesCI         15     true  avgt   15   14,611 ±  0,227  ns/op
StringComparisons.regionMatchesCI         15    false  avgt   15    6,970 ±  0,083  ns/op
StringComparisons.regionMatchesCI       1024     true  avgt   15  273,476 ±  1,789  ns/op
StringComparisons.regionMatchesCI       1024    false  avgt   15  227,503 ±  3,547  ns/op
StringComparisons.regionMatchesRange       6     true  avgt   15    3,485 ±  0,035  ns/op
StringComparisons.regionMatchesRange       6    false  avgt   15    4,640 ±  0,050  ns/op
StringComparisons.regionMatchesRange      15     true  avgt   15    6,121 ±  0,087  ns/op
StringComparisons.regionMatchesRange      15    false  avgt   15    5,784 ±  0,080  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt   15   58,403 ±  0,876  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt   15   36,500 ± 11,264  ns/op
StringComparisons.startsWith               6     true  avgt   15    3,225 ±  0,255  ns/op
StringComparisons.startsWith               6    false  avgt   15    4,577 ±  0,684  ns/op
StringComparisons.startsWith              15     true  avgt   15    7,586 ±  4,402  ns/op
StringComparisons.startsWith              15    false  avgt   15    6,295 ±  1,402  ns/op
StringComparisons.startsWith            1024     true  avgt   15   76,536 ± 16,461  ns/op
StringComparisons.startsWith            1024    false  avgt   15   30,523 ±  0,359  ns/op

linux-x64 sandybridge, 21-b9:

Benchmark                             (size)  (utf16)  Mode  Cnt     Score    Error  Units
StringComparisons.endsWith                 6     true  avgt   15    12.382 ±  0.014  ns/op
StringComparisons.endsWith                 6    false  avgt   15     9.975 ±  0.012  ns/op
StringComparisons.endsWith                15     true  avgt   15    18.679 ±  0.028  ns/op
StringComparisons.endsWith                15    false  avgt   15    13.133 ±  0.116  ns/op
StringComparisons.endsWith              1024     true  avgt   15   819.007 ±  5.523  ns/op
StringComparisons.endsWith              1024    false  avgt   15   415.061 ±  0.962  ns/op
StringComparisons.regionMatches            6     true  avgt   15    17.363 ±  0.155  ns/op
StringComparisons.regionMatches            6    false  avgt   15    10.722 ±  0.072  ns/op
StringComparisons.regionMatches           15     true  avgt   15    37.202 ±  0.321  ns/op
StringComparisons.regionMatches           15    false  avgt   15    21.128 ±  0.164  ns/op
StringComparisons.regionMatches         1024     true  avgt   15  2265.847 ±  2.668  ns/op
StringComparisons.regionMatches         1024    false  avgt   15  1305.184 ± 11.794  ns/op
StringComparisons.regionMatchesCI          6     true  avgt   15    24.403 ±  0.367  ns/op
StringComparisons.regionMatchesCI          6    false  avgt   15     9.521 ±  0.015  ns/op
StringComparisons.regionMatchesCI         15     true  avgt   15    27.650 ±  0.033  ns/op
StringComparisons.regionMatchesCI         15    false  avgt   15    12.082 ±  0.130  ns/op
StringComparisons.regionMatchesCI       1024     true  avgt   15   514.851 ±  5.818  ns/op
StringComparisons.regionMatchesCI       1024    false  avgt   15   351.532 ±  4.147  ns/op
StringComparisons.regionMatchesRange       6     true  avgt   15    18.227 ±  0.017  ns/op
StringComparisons.regionMatchesRange       6    false  avgt   15    10.128 ±  0.015  ns/op
StringComparisons.regionMatchesRange      15     true  avgt   15    40.924 ±  0.305  ns/op
StringComparisons.regionMatchesRange      15    false  avgt   15    20.476 ±  0.177  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt   15  2267.800 ±  3.280  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt   15  1139.950 ±  1.222  ns/op
StringComparisons.startsWith               6     true  avgt   15    10.095 ±  0.044  ns/op
StringComparisons.startsWith               6    false  avgt   15     8.908 ±  0.319  ns/op
StringComparisons.startsWith              15     true  avgt   15    16.406 ±  0.017  ns/op
StringComparisons.startsWith              15    false  avgt   15    10.964 ±  0.077  ns/op
StringComparisons.startsWith            1024     true  avgt   15   670.043 ±  5.107  ns/op
StringComparisons.startsWith            1024    false  avgt   15   346.972 ±  0.373  ns/op

linux-x64 sandybridge, patched:

Benchmark                             (size)  (utf16)  Mode  Cnt    Score   Error  Units
StringComparisons.endsWith                 6     true  avgt   15    8.439 ± 0.010  ns/op
StringComparisons.endsWith                 6    false  avgt   15   11.341 ± 0.125  ns/op
StringComparisons.endsWith                15     true  avgt   15   10.932 ± 0.135  ns/op
StringComparisons.endsWith                15    false  avgt   15   10.704 ± 0.143  ns/op
StringComparisons.endsWith              1024     true  avgt   15   69.427 ± 7.376  ns/op
StringComparisons.endsWith              1024    false  avgt   15   40.248 ± 0.661  ns/op
StringComparisons.regionMatches            6     true  avgt   15    8.397 ± 0.010  ns/op
StringComparisons.regionMatches            6    false  avgt   15    9.813 ± 0.064  ns/op
StringComparisons.regionMatches           15     true  avgt   15    8.468 ± 0.105  ns/op
StringComparisons.regionMatches           15    false  avgt   15    8.082 ± 0.181  ns/op
StringComparisons.regionMatches         1024     true  avgt   15   64.316 ± 0.643  ns/op
StringComparisons.regionMatches         1024    false  avgt   15   31.500 ± 0.483  ns/op
StringComparisons.regionMatchesCI          6     true  avgt   15   24.199 ± 0.017  ns/op
StringComparisons.regionMatchesCI          6    false  avgt   15    9.521 ± 0.009  ns/op
StringComparisons.regionMatchesCI         15     true  avgt   15   27.656 ± 0.037  ns/op
StringComparisons.regionMatchesCI         15    false  avgt   15   12.047 ± 0.019  ns/op
StringComparisons.regionMatchesCI       1024     true  avgt   15  513.828 ± 4.221  ns/op
StringComparisons.regionMatchesCI       1024    false  avgt   15  349.885 ± 0.455  ns/op
StringComparisons.regionMatchesRange       6     true  avgt   15    9.771 ± 0.215  ns/op
StringComparisons.regionMatchesRange       6    false  avgt   15   10.190 ± 0.012  ns/op
StringComparisons.regionMatchesRange      15     true  avgt   15   11.352 ± 0.267  ns/op
StringComparisons.regionMatchesRange      15    false  avgt   15   10.617 ± 0.012  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt   15   66.727 ± 2.732  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt   15   43.148 ± 7.262  ns/op
StringComparisons.startsWith               6     true  avgt   15    8.525 ± 0.010  ns/op
StringComparisons.startsWith               6    false  avgt   15    8.792 ± 0.025  ns/op
StringComparisons.startsWith              15     true  avgt   15   10.600 ± 0.564  ns/op
StringComparisons.startsWith              15    false  avgt   15   10.538 ± 0.645  ns/op
StringComparisons.startsWith            1024     true  avgt   15   61.118 ± 5.143  ns/op
StringComparisons.startsWith            1024    false  avgt   15   40.171 ± 0.576  ns/op

@openjdk
Copy link

openjdk bot commented Feb 13, 2023

@cl4es The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Feb 13, 2023
@cl4es cl4es marked this pull request as ready for review February 13, 2023 10:07
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 13, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 13, 2023

Webrevs

@bourgesl
Copy link
Contributor

Amazing gains!
Congratulations

@eirbjo
Copy link
Contributor

eirbjo commented Feb 13, 2023

cat PR >> https://cl4es.github.io/

Copy link
Contributor

@stsypanov stsypanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks simple and harmless.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good.

@openjdk
Copy link

openjdk bot commented Feb 13, 2023

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8302163: Speed up various String comparison methods with ArraysSupport.mismatch

Reviewed-by: stsypanov, rriggs, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 92 new commits pushed to the master branch:

  • 101db26: 8301697: [s390] Optimized-build is broken
  • f4d4fa5: 8300255: Introduce interface for GC oop verification in the assembler
  • 99b6c0e: 8302289: RISC-V: Use bgez instruction in arraycopy_simple_check when possible
  • 57aef85: 8301838: PPC: continuation yield intrinsic: exception check not needed if yield succeeded
  • df93880: 8301942: java/net/httpclient/DigestEchoClientSSL.java fail with -Xcomp
  • 0025764: 8040793: vmTestbase/nsk/monitoring/stress/lowmem fails on calling isCollectionUsageThresholdExceeded()
  • 1f9c110: 8301958: Reduce Arrays.copyOf/-Range overheads
  • cb81073: 8300139: [AIX] Use pthreads to avoid JNI_createVM call from primordial thread
  • bbd8ae7: 8294194: [AArch64] Create intrinsics compress and expand
  • 4e327db: 8301499: Replace NULL with nullptr in cpu/zero
  • ... and 82 more: https://git.openjdk.org/jdk/compare/98433a2f6e7fe97e03ed26673c9925d7b26466bf...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 13, 2023
}
toffset <<= coder;
return ArraysSupport.mismatch(ta, toffset,
pa, 0, pc) < 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offset <<= coder is only obvious if the reader knows the value of LATIN1, maybe it would be simpler to read if you kept "int to"?

@eirbjo
Copy link
Contributor

eirbjo commented Feb 13, 2023

Case-insensitive regionMatches could be improved by using ArraysSupport.mismatch to skip over long common substrings:

PR:

Benchmark                          (size)  (utf16)  Mode  Cnt    Score   Error  Units
StringComparisons.regionMatchesCI       6    false  avgt   15    8.248 ± 0.427  ns/op
StringComparisons.regionMatchesCI      15    false  avgt   15   11.408 ± 0.429  ns/op
StringComparisons.regionMatchesCI    1024    false  avgt   15  328.796 ± 7.191  ns/op

Arrays.mismatch:

Benchmark                          (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.regionMatchesCI       6    false  avgt   15  10.608 ± 0.302  ns/op
StringComparisons.regionMatchesCI      15    false  avgt   15   8.772 ± 0.347  ns/op
StringComparisons.regionMatchesCI    1024    false  avgt   15  31.070 ± 1.783  ns/op

Patch on top of your PR:

Index: src/java.base/share/classes/java/lang/StringLatin1.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/lang/StringLatin1.java b/src/java.base/share/classes/java/lang/StringLatin1.java
--- a/src/java.base/share/classes/java/lang/StringLatin1.java	(revision 6cac333d8f9f34e16168447c60f28a6b0d31623f)
+++ b/src/java.base/share/classes/java/lang/StringLatin1.java	(date 1676305817928)
@@ -384,6 +384,14 @@
                                           byte[] other, int ooffset, int len) {
         int last = toffset + len;
         while (toffset < last) {
+            int mismatch = ArraysSupport.mismatch(value, toffset, other, ooffset, len);
+            if (mismatch == -1) {
+                return true;
+            } else {
+                toffset += mismatch;
+                ooffset += mismatch;
+                len -= mismatch + 1;
+            }
             char c1 = (char)(value[toffset++] & 0xff);
             char c2 = (char)(other[ooffset++] & 0xff);
             if (c1 == c2) {
Index: test/micro/org/openjdk/bench/java/lang/StringComparisons.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/micro/org/openjdk/bench/java/lang/StringComparisons.java b/test/micro/org/openjdk/bench/java/lang/StringComparisons.java
--- a/test/micro/org/openjdk/bench/java/lang/StringComparisons.java	(revision 6cac333d8f9f34e16168447c60f28a6b0d31623f)
+++ b/test/micro/org/openjdk/bench/java/lang/StringComparisons.java	(date 1676305817914)
@@ -49,6 +49,7 @@
     public String endsWithA;
     public String endsWithB;
     public String startsWithA;
+    public String endsWitha;
 
     @Setup
     public void setup() {
@@ -56,6 +57,7 @@
         string = c.repeat(size);
         equalString = c.repeat(size);
         endsWithA = c.repeat(size).concat("A");
+        endsWitha = c.repeat(size).concat("a");
         endsWithB = c.repeat(size).concat("B");
         startsWithA = "A" + (c.repeat(size));
     }
@@ -75,6 +77,11 @@
         return endsWithA.regionMatches(0, endsWithB, 0, endsWithB.length());
     }
 
+    @Benchmark
+    public boolean regionMatchesIC() {
+        return endsWithA.regionMatches(true,0, endsWitha, 0, endsWitha.length());
+    }
+
     @Benchmark
     public boolean regionMatchesRange() {
         return startsWithA.regionMatches(1, endsWithB, 0, endsWithB.length() - 1);

@cl4es
Copy link
Member Author

cl4es commented Feb 14, 2023

Case-insensitive regionMatches could be improved by using ArraysSupport.mismatch to skip over long common substrings:

I considered this but saw regressions similar to what you're getting for size = 6 and backed off. I think this might be a good future enhancement, with some care, but didn't want to encumber this RFE with a case that regresses small inputs (which tend to be more common in actual applications).

In similar constructs we have avoided doing the vectorized call in a loop since this could regress worst case inputs severely (an adversary example might be something like regionMatches(true, "aaaaaaaaaaaaaaaaaaaaaa", 0, "aAaAaAaAaAaAaAa", 0, 15) which will call mismatch 8 times on 15 byte of input).

@cl4es
Copy link
Member Author

cl4es commented Feb 15, 2023

Thanks for reviewing!

/integrate

@openjdk
Copy link

openjdk bot commented Feb 15, 2023

Going to push as commit 861e302.
Since your change was applied there have been 127 commits pushed to the master branch:

  • 50dcc2a: 8301460: Clean up LambdaForm to reference BasicType enums directly
  • 28f5250: 8302127: Remove unused arg in write_ref_field_post
  • 0c96584: 8301225: Replace NULL with nullptr in share/gc/shenandoah/
  • 26b111d: 8301700: Increase the default TLS Diffie-Hellman group size from 1024-bit to 2048-bit
  • 5238817: 8301463: Code in DatagramSocket still refers to resolved JDK-8237352
  • 11194e8: 8302325: Wrong comment in java.base/share/native/libjimage/imageFile.hpp
  • 33bec20: 8300808: Accelerate Base64 on x86 for AVX2
  • 46bcc49: 8302147: Speed up compiler/jvmci/compilerToVM/IterateFramesNative.java
  • a9a53f4: 8302152: Speed up tests with infinite loops, sleep less
  • 98a392c: 8302102: Disable ASan for SafeFetch and os::print_hex_dump
  • ... and 117 more: https://git.openjdk.org/jdk/compare/98433a2f6e7fe97e03ed26673c9925d7b26466bf...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 15, 2023
@openjdk openjdk bot closed this Feb 15, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 15, 2023
@openjdk
Copy link

openjdk bot commented Feb 15, 2023

@cl4es Pushed as commit 861e302.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

6 participants