8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads #14320

Closed
VladimirKempik wants to merge 7 commits

@VladimirKempik commented Jun 5, 2023

Please review this attempt to remove misaligned loads in the String.indexOf intrinsic on RISC-V.

I initially found these misaligned loads when profiling the finagle-http test from the Renaissance suite.
The majority of trp_lam events (about 66k per finagle-http round) came from line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706).
The other two produced about 100 events combined.
Later I found that this can be partially reproduced with StringIndexOf.advancedWithMediumSub.
Numbers on HiFive before and after applying the patch. Before:

Benchmark                                                  Mode  Cnt       Score      Error  Units
StringIndexOf.advancedWithMediumSub                        avgt   25   47031.406 ±  144.005  ns/op

After:

Benchmark                                                 Mode  Cnt       Score     Error  Units
StringIndexOf.advancedWithMediumSub                       avgt   25    4256.830 ±  23.075  ns/op

Testing: tier1/tier2 is clean on hifive.

/cc hotspot-compiler


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320
$ git checkout pull/14320

Update a local copy of the PR:
$ git checkout pull/14320
$ git pull https://git.openjdk.org/jdk.git pull/14320/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14320

View PR using the GUI difftool:
$ git pr show -t 14320

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14320.diff

bridgekeeper bot commented Jun 5, 2023

👋 Welcome back vkempik! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added rfr Pull request is ready for review hotspot-compiler hotspot-compiler-dev@openjdk.org labels Jun 5, 2023

openjdk bot commented Jun 5, 2023

@VladimirKempik
The hotspot-compiler label was successfully added.

@luhenry (Member) left a comment

Looks good!

@feilongjiang (Member)

Looks good. I see some load_4chr calls at [1]; could they also produce misaligned loads?

[1]
    if (needle_con_cnt == 4) {
      Label CH1_LOOP;
      (this->*load_4chr)(ch1, Address(needle), noreg);
      sub(result_tmp, haystack_len, 4);
      slli(tmp3, result_tmp, haystack_chr_shift); // result as tmp
      add(haystack, haystack, tmp3);
      neg(hlen_neg, tmp3);
      bind(CH1_LOOP);
      add(ch2, haystack, hlen_neg);
      (this->*load_4chr)(ch2, Address(ch2), noreg);
      beq(ch1, ch2, MATCH);
      add(hlen_neg, hlen_neg, haystack_chr_size);
      blez(hlen_neg, CH1_LOOP);
      j(NOMATCH);
    }

@VladimirKempik (Author)

That's hard to say. I just run some tests under perf and check the results for trp_lam events. Such analysis is a lot easier.

@RealFYang (Member) commented Jun 7, 2023

Hi, I searched and found that we have four direct callers of C2_MacroAssembler::string_indexof_linearscan: three in riscv.ad (string_indexof_conUU, string_indexof_conLL and string_indexof_conUL) and one in C2_MacroAssembler::string_indexof. Did you check which one is triggering these unaligned accesses? I am not sure, but it looks to me that the three direct callers in riscv.ad are less likely to have this issue. If that is true, we might need to distinguish among those callers for better performance. Also, it would be better to have some numbers from other vendors such as T-Head.

@VladimirKempik (Author) commented Jun 7, 2023

I'll test on T-Head.

As for the first part of the patch, at line 494: on this line we do nlen_tmp -= 7; then we compute result = haystack + nlen_tmp and read a long word from the result address, which obviously causes a misaligned load.

@VladimirKempik (Author)

Originally, when I found this misaligned load, the code (this->*load_2chr)(ch2, Address(tmp3), noreg); corresponded to lhu t1, 0(t4), so I can say the isLL variable was true.

@RealFYang (Member) commented Jun 7, 2023

Can we simply change the two conditions if (AvoidUnalignedAccesses) { added in C2_MacroAssembler::string_indexof_linearscan into something like if (needle_con_cnt == -1 && AvoidUnalignedAccesses) { and see if this could also resolve the problem?

@VladimirKempik (Author)

See, this part of the algorithm is misaligned:

    bind(CH1_LOOP);
    add(tmp3, haystack, hlen_neg);
    (this->*load_2chr)(ch2, Address(tmp3), noreg);
    beq(ch1, ch2, MATCH);
    add(hlen_neg, hlen_neg, haystack_chr_size);
    blez(hlen_neg, CH1_LOOP);

this becomes:

    CH1_LOOP: add t4, a1, a2
    lhu t1, 0(t4)
    beq t0, t1, 0xMATCH
    addi a2, a2, 1
    blez a2, CH1_LOOP

So we load a halfword on each iteration from address (a1 + a2). a1 is constant while a2 is incremented by 1 on each step, so every other load is misaligned.
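To make the alignment argument concrete, here is a small, self-contained Java sketch. It is not code from the patch: it models only the address arithmetic (a constant base plus an offset that advances by the character size) and counts how many loads of a given width land on an address that is not a multiple of that width. The class name, method name and the base address 0x1000 are made up for illustration.

```java
// Illustrative model only: counts misaligned loads for a linear scan.
// Names and the base address are assumptions, not taken from HotSpot.
final class MisalignedScan {

    /**
     * Counts loads of `width` bytes whose address is not a multiple of `width`,
     * for `count` loads starting at `base` and advancing by `step` bytes each time.
     */
    static int countMisaligned(long base, int count, int step, int width) {
        int misaligned = 0;
        for (int i = 0; i < count; i++) {
            long addr = base + (long) i * step;
            if (addr % width != 0) {
                misaligned++;
            }
        }
        return misaligned;
    }

    public static void main(String[] args) {
        // Latin-1 haystack (1-byte step) scanned with halfword loads, as in the loop above.
        System.out.println(countMisaligned(0x1000L, 8, 1, 2)); // 4 of 8 loads
        // Same scan with 4-byte loads, as a load_4chr loop would do for Latin-1.
        System.out.println(countMisaligned(0x1000L, 8, 1, 4)); // 6 of 8 loads
        // UTF-16 haystack (2-byte step) with halfword loads stays aligned.
        System.out.println(countMisaligned(0x1000L, 8, 2, 2)); // 0 of 8 loads
    }
}
```

With a 1-byte step, half of the 2-byte loads and three quarters of the 4-byte loads are misaligned in this model, which is consistent with the observation above and with the concern raised about the load_4chr loop.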

@RealFYang (Member)

I see it now. Thanks. Then I expect we could have a similar issue in the if (needle_con_cnt == 4) case of the same function, where we do load_4chr incrementally with a 1-byte step when the isLL variable is true, right?

@VladimirKempik (Author) commented Jun 7, 2023

I can't put the whole section ( https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L714 ) under AvoidUnalignedAccesses, as it defines the label DO3, which is used earlier.

Looking more at how it works:
needle is the search string; haystack is where we search for needle.

At line 627 we have the main body, which performs a linear search if the needle length is variable. This part has no misaligned accesses.
However, if the needle is short, it can go to line 675 (4-char needle), which can do misaligned accesses.
It can also go to line 692 for a 2-char needle, which can do misaligned accesses.
It can also go to line 721 for a 3-char needle, which can do misaligned accesses.
And it can go to line 761 for a 1-char needle, which can't do misaligned accesses, but it assumes the needle is 1 character long.

When the size of the needle is constant (in the [1,4] range), the code goes directly to line 675 for a 4-char needle, line 692 for a 2-char needle, line 721 for a 3-char needle and line 761 for a 1-char needle. The 4-, 3- and 2-char cases can still do misaligned loads.

I think we can't just remove the cases for 4/3/2, but we can optimize them to do fewer loads when in AvoidUnalignedAccesses mode.

@VladimirKempik (Author)

@RealFYang please check the last commit: it improves DO2 by replacing one haystack_load_1chr() with one srli, at the expense of one load before the loop (CH1_LOOP).
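The idea of trading a per-iteration load for a shift can be illustrated outside the assembler. The following Java sketch is only a model of that idea, not the patch itself: it searches a Latin-1 byte array for a 2-byte needle using single-byte reads (which are always aligned), keeps the last two bytes in a rolling 16-bit window, and updates the window with one shift and one new byte per iteration after a single read before the loop. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model only: a 2-byte search built from single-byte reads.
// Names are assumptions; the real change lives in the RISC-V macro assembler.
final class RollingPairSearch {

    /** Returns the index of the first occurrence of the bytes (n0, n1) in haystack, or -1. */
    static int indexOfPair(byte[] haystack, byte n0, byte n1) {
        if (haystack.length < 2) {
            return -1;
        }
        // Little-endian 16-bit pattern, as a halfword load of the needle would produce.
        int needle = (n0 & 0xff) | ((n1 & 0xff) << 8);
        // One read before the loop: the first byte sits in the high half of the window.
        int window = (haystack[0] & 0xff) << 8;
        for (int i = 1; i < haystack.length; i++) {
            // Shift out the oldest byte and bring in one newly read byte.
            window = (window >>> 8) | ((haystack[i] & 0xff) << 8);
            if (window == needle) {
                return i - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "00a1a".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOfPair(data, (byte) 'a', (byte) '1')); // 2
    }
}
```

Each iteration performs one byte read and one shift instead of two byte reads, which is the kind of saving described for the updated DO2 path.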

@VladimirKempik (Author)

I have made a microbenchmark that specifically tests the DO2 part of string_indexof_linearscan:

diff --git a/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java b/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java
index 57ced6d8e13..33c8d998d8d 100644
--- a/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java
+++ b/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java
@@ -46,6 +46,8 @@ public class StringIndexOf {
     private String shortSub1;
     private String data2;
     private String shortSub2;
+
+    private String shortSub3;
     private String string16Short;
     private String string16Medium;
     private String string16Long;
@@ -64,6 +66,7 @@ public class StringIndexOf {
         shortSub1 = "1";
         data2 = "00001001010100a10110101010010101110101001110110101010010101010010000010111010101010101010a100010010101110111010101101010100010010a100a0010101111111000010101010010101000010101010010101010101110a10010101010101010101010101010";
         shortSub2 = "a";
+        shortSub3 = "a1";
         searchChar = 's';
 
         string16Short = "scar\u01fe1";
@@ -246,6 +249,20 @@ public class StringIndexOf {
         return dummy;
     }
 
+    /**
+     * Benchmarks String.indexOf with a rather big String. Search repeatedly for a matched that is 2 chars but only with
+     * a few matches.
+     */
+    @Benchmark
+    public int advancedWithShortSub3() {
+        int dummy = 0;
+        int index = 0;
+        while ((index = data2.indexOf(shortSub3, index)) > -1) {
+            index++;
+            dummy += index;
+        }
+        return dummy;
+    }
     @Benchmark
     public void constantPattern() {
         String tmp = "simple-hash:SHA-1/UTF-8";

Results (v1 is the original patch in this PR, v2 is the latest update to DO2):

hifive
Benchmark                            Mode  Cnt      Score    Error  Units
Before
StringIndexOf.advancedWithShortSub3  avgt   25  37302.933 ± 80.306  ns/op
V1
StringIndexOf.advancedWithShortSub3  avgt   25   1362.159 ± 37.021  ns/op
V2
StringIndexOf.advancedWithShortSub3  avgt   25   1248.750 ± 40.432  ns/op


thead
Benchmark                            Mode  Cnt    Score    Error  Units
Before
StringIndexOf.advancedWithShortSub3  avgt   25  632.976 ± 42.601  ns/op
V1
StringIndexOf.advancedWithShortSub3  avgt   25  916.040 ± 45.086  ns/op
V2
StringIndexOf.advancedWithShortSub3  avgt   25  919.363 ± 21.977  ns/op

While HiFive benefits from the update, T-Head does not and likes the misaligned way the most.

@VladimirKempik (Author) commented Jun 7, 2023

Numbers for DO4 (comparing 4 characters at once; the substring has to be a final String of 4 characters):

hifive
Benchmark                                 Mode  Cnt      Score     Error  Units
before
StringIndexOf.advancedWithShortSub4Chars  avgt   25  69514.891 ± 128.730  ns/op
after
StringIndexOf.advancedWithShortSub4Chars  avgt   25   2481.448 ±  13.481  ns/op

thead
Benchmark                                 Mode  Cnt    Score   Error  Units
before
StringIndexOf.advancedWithShortSub4Chars  avgt   25  753.125 ± 2.859  ns/op
after
StringIndexOf.advancedWithShortSub4Chars  avgt   25  741.031 ± 9.075  ns/op

@RealFYang (Member) left a comment

Two review comments on src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp (outdated, resolved).

@RealFYang (Member)

@VladimirKempik : Thanks for the update. Would you mind one more tweak? Since needle_chr_shift and haystack_chr_shift could be 0 for the L case, I think we should guard the shift instructions at https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L634, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L637, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L679, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L700, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L724, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L775 with conditions if (needle_chr_shift) or if (haystack_chr_shift).

Ah, this won't save us any instructions. Let's change this code snippet: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#LL754-L757
into a single line: slli(tmp3, result_tmp, haystack_chr_shift);

@VladimirKempik (Author) commented Jun 8, 2023

It's questionable: mv(Xd, Xs) becomes addi(Xd, Xs, 0), and which is cheaper, addi(Xd, Xs, 0) or slli(Xd, Xs, 0), is an open question.

@RealFYang (Member) left a comment

Looks good. Thanks.

openjdk bot commented Jun 8, 2023

@VladimirKempik This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads

Reviewed-by: luhenry, fjiang, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 163 new commits pushed to the master branch:

  • 9b0baa1: 8306281: function isWsl() returns false on WSL2
  • c884862: 8309468: Remove jvmti Allocate locker test case
  • 05f896a: 8309862: Unsafe list operations in JfrStringPool
  • 4f23fc1: 8309671: Avoid using jvmci.Compiler property to determine if Graal is enabled
  • 1a9edb8: 8309838: Classfile API Util.toBinaryName and other cleanup
  • f7de726: 8295555: Primitive wrapper caches could be @Stable
  • 5d71612: 8309852: G1: Remove unnecessary assert_empty in G1ParScanThreadStateSet destructor
  • 23a54f3: 8309538: G1: Move total collection increment from Cleanup to Remark
  • 57fc9a3: 8309763: Move tests in test/jdk/sun/misc/URLClassPath directory to test/jdk/jdk/internal/loader
  • 2dca5ae: 8299052: ViewportOverlapping test fails intermittently on Win10 & Win11
  • ... and 153 more: https://git.openjdk.org/jdk/compare/dc8bc6c98ca1f9b441cf71c641675fe29dda9162...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 8, 2023
@VladimirKempik (Author)

I have asked our HW team: addi can sometimes be cheaper than slli. addi Xd, Xs, 0 can be resolved by register renaming (though not on every microarchitecture) without using an ALU, whereas slli always uses an ALU.
So it may be worth adding an umbrella for slli in the macro assembler: if the shift amount is zero, use addi instead.
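A minimal sketch of what such an umbrella could look like, written as a toy Java emitter that records instructions as text. This is an assumption-laden illustration: the class ShiftUmbrella and its methods are hypothetical and do not correspond to any existing HotSpot MacroAssembler API; only the rule itself (shift amount zero becomes addi Xd, Xs, 0) comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy emitter, for illustration only. It is not HotSpot code and the names are made up.
final class ShiftUmbrella {
    private final List<String> code = new ArrayList<>();

    /** Emits a left shift, substituting a register move (addi rd, rs, 0) when shamt is zero. */
    void slli(String rd, String rs, int shamt) {
        if (shamt < 0 || shamt > 63) {
            throw new IllegalArgumentException("shift amount out of range: " + shamt);
        }
        if (shamt == 0) {
            // A zero shift is just a move; some cores can handle this at rename time.
            code.add("addi " + rd + ", " + rs + ", 0");
        } else {
            code.add("slli " + rd + ", " + rs + ", " + shamt);
        }
    }

    /** Returns the instructions emitted so far, in order. */
    List<String> code() {
        return code;
    }

    public static void main(String[] args) {
        ShiftUmbrella asm = new ShiftUmbrella();
        asm.slli("t3", "t2", 0); // Latin-1 case: character shift is 0
        asm.slli("t3", "t2", 1); // UTF-16 case: character shift is 1
        asm.code().forEach(System.out::println);
    }
}
```

Callers would keep writing a single slli call with haystack_chr_shift or needle_chr_shift as the amount, and the zero case would be handled in one place rather than with a guard at every call site.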

@theRealAph (Contributor)

I am very concerned about the increased complexity and maintenance burden caused by these unaligned access patches. While RISC-V is not a mainstream arch at this time, it may become one, and if that happens we'll need something reasonably maintainable.
Sprinkling 'if (AvoidUnalignedAccesses)' all over the back end is disastrous for readability. I urge you to find a more abstract solution, for example by creating a memory access assembler class and subclassing it as appropriate with aligned and unaligned versions.

@VladimirKempik (Author)

Hello, do you mean things like load_XXXX_misaligned (e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1735 ) or something more complicated?

@VladimirKempik (Author) commented Jun 8, 2023

The first change, at line 496, regresses the performance of the Boyer-Moore-Horspool-based indexOf path on T-Head:
Before:

org.openjdk.bench.java.lang.StringIndexOf.advancedWithMediumSub   2790.160 ±  56.442  ns/op

After:

org.openjdk.bench.java.lang.StringIndexOf.advancedWithMediumSub  3377.943 ± 42.496 ns/op

The following idea turned out to be wrong and only made things worse:

I think this could be improved.

Currently, when we compare the needle with a region of the haystack, we first read the last 8 bytes from both regions and compare them; if they match, we compare the rest byte by byte.
The 8-byte read from the haystack is not always aligned, so we could read 4 or 2 bytes for the first comparison instead, reducing wasted reads from the haystack.
Doing it this way (reading just 2 bytes from the haystack and comparing them to the first 2 bytes of the needle) gives me this result on HiFive:
StringIndexOf.advancedWithMediumSub avgt 25 1790.703 ± 4.880 ns/op
So it is even better; I will check on T-Head later.

@theRealAph (Contributor)

That's certainly a good start, although I believe its implementation could be much improved. But everywhere you see if (AvoidUnalignedAccesses) is potentially a candidate for factoring out the parts and moving them into a misaligned memory access class.
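As a rough illustration of the factoring being discussed (and not a proposal for the actual HotSpot design), the sketch below puts the choice between a direct halfword load and a byte-wise load behind one small interface, so that the calling code contains no flag checks. It uses a Java interface with two implementations rather than the subclassing mentioned earlier, emits instructions as text, and every name in it is hypothetical. It assumes a little-endian target and that dst, base and tmp are distinct registers.

```java
import java.util.List;

// Illustrative sketch only; names are assumptions and nothing here is HotSpot API.
final class MemoryAccessSketch {

    /** Strategy for loading an unsigned halfword from 0(base) into dst. */
    interface HalfwordLoad {
        List<String> emit(String dst, String base, String tmp);
    }

    /** Single lhu: fine when misaligned accesses are cheap or cannot occur. */
    static final HalfwordLoad DIRECT = (dst, base, tmp) ->
            List.of("lhu " + dst + ", 0(" + base + ")");

    /** Two byte loads combined with a shift: never issues a misaligned access. */
    static final HalfwordLoad BYTEWISE = (dst, base, tmp) -> List.of(
            "lbu " + tmp + ", 1(" + base + ")",   // high byte (little-endian)
            "lbu " + dst + ", 0(" + base + ")",   // low byte
            "slli " + tmp + ", " + tmp + ", 8",
            "or " + dst + ", " + dst + ", " + tmp);

    /** The flag is consulted once, here, instead of at every call site. */
    static HalfwordLoad select(boolean avoidUnalignedAccesses) {
        return avoidUnalignedAccesses ? BYTEWISE : DIRECT;
    }

    public static void main(String[] args) {
        HalfwordLoad load = select(true);
        load.emit("t1", "t4", "t5").forEach(System.out::println);
    }
}
```

The intrinsic code would then call load.emit(...) wherever it needs a halfword, and the aligned and unaligned variants could be maintained and tested separately.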

@VladimirKempik (Author) commented Jun 13, 2023

JMH results for StringIndexOf.
advancedWithShortSubXChars: three new tests that measure the speed of linear search with a needle of X characters.

Hifive, before:

Benchmark                                                 (loops)  (pathCnt)  (rngSeed)  Mode  Cnt       Score     Error  Units
StringIndexOf.advancedWithMediumSub                           N/A        N/A        N/A  avgt   25   47184.829 ± 227.340  ns/op
StringIndexOf.advancedWithShortSub1                           N/A        N/A        N/A  avgt   25    4121.931 ± 194.659  ns/op
StringIndexOf.advancedWithShortSub2                           N/A        N/A        N/A  avgt   25     992.714 ±  24.638  ns/op
StringIndexOf.advancedWithShortSub2Chars                      N/A        N/A        N/A  avgt   25   37315.528 ±  61.079  ns/op
StringIndexOf.advancedWithShortSub3Chars                      N/A        N/A        N/A  avgt   25   37316.547 ±  26.079  ns/op
StringIndexOf.advancedWithShortSub4Chars                      N/A        N/A        N/A  avgt   25   69367.409 ± 117.164  ns/op
StringIndexOf.constantPattern                                 N/A        N/A        N/A  avgt   25      73.671 ±   3.551  ns/op
StringIndexOf.searchChar16LongSuccess                         N/A        N/A        N/A  avgt   25     251.513 ±   3.103  ns/op
StringIndexOf.searchChar16LongWithOffsetSuccess               N/A        N/A        N/A  avgt   25     258.523 ±   2.925  ns/op
StringIndexOf.searchChar16MediumSuccess                       N/A        N/A        N/A  avgt   25     134.067 ±   3.635  ns/op
StringIndexOf.searchChar16MediumWithOffsetSuccess             N/A        N/A        N/A  avgt   25     146.327 ±   3.257  ns/op
StringIndexOf.searchChar16ShortSuccess                        N/A        N/A        N/A  avgt   25      35.564 ±   2.468  ns/op
StringIndexOf.searchChar16ShortWithOffsetSuccess              N/A        N/A        N/A  avgt   25      40.607 ±   3.270  ns/op
StringIndexOf.searchCharLongSuccess                           N/A        N/A        N/A  avgt   25     122.948 ±   4.485  ns/op
StringIndexOf.searchCharMediumSuccess                         N/A        N/A        N/A  avgt   25      63.505 ±   2.645  ns/op
StringIndexOf.searchCharShortSuccess                          N/A        N/A        N/A  avgt   25      33.107 ±   2.404  ns/op
StringIndexOf.searchString16LongLatinSuccess                  N/A        N/A        N/A  avgt   25     504.675 ±   6.297  ns/op
StringIndexOf.searchString16LongSuccess                       N/A        N/A        N/A  avgt   25     628.733 ±   3.652  ns/op
StringIndexOf.searchString16LongWithOffsetLatinSuccess        N/A        N/A        N/A  avgt   25     325.615 ±   3.355  ns/op
StringIndexOf.searchString16LongWithOffsetSuccess             N/A        N/A        N/A  avgt   25     343.145 ±   3.068  ns/op
StringIndexOf.searchString16MediumLatinSuccess                N/A        N/A        N/A  avgt   25     226.349 ±   3.635  ns/op
StringIndexOf.searchString16MediumSuccess                     N/A        N/A        N/A  avgt   25     279.963 ±   3.536  ns/op
StringIndexOf.searchString16MediumWithOffsetLatinSuccess      N/A        N/A        N/A  avgt   25     161.672 ±   3.024  ns/op
StringIndexOf.searchString16MediumWithOffsetSuccess           N/A        N/A        N/A  avgt   25     162.722 ±   3.598  ns/op
StringIndexOf.searchString16ShortLatinSuccess                 N/A        N/A        N/A  avgt   25     322.307 ±   3.027  ns/op
StringIndexOf.searchString16ShortSuccess                      N/A        N/A        N/A  avgt   25      55.243 ±   3.518  ns/op
StringIndexOf.searchString16ShortWithOffsetLatinSuccess       N/A        N/A        N/A  avgt   25      55.825 ±   2.582  ns/op
StringIndexOf.searchString16ShortWithOffsetSuccess            N/A        N/A        N/A  avgt   25      54.268 ±   3.709  ns/op
StringIndexOf.success                                         N/A        N/A        N/A  avgt   25      80.776 ±   2.500  ns/op
StringIndexOf.successBig                                      N/A        N/A        N/A  avgt   25    6283.167 ±  11.876  ns/op
StringIndexOfChar.latin1_AVX2_String                       100000       1000       1999  avgt   25  206802.394 ± 598.649  ns/op
StringIndexOfChar.latin1_AVX2_char                         100000       1000       1999  avgt   25  103587.559 ± 214.802  ns/op
StringIndexOfChar.latin1_SSE4_String                       100000       1000       1999  avgt   25  121714.481 ± 118.594  ns/op
StringIndexOfChar.latin1_SSE4_char                         100000       1000       1999  avgt   25   75014.737 ± 178.044  ns/op
StringIndexOfChar.latin1_Short_String                      100000       1000       1999  avgt   25  116975.364 ±  90.326  ns/op
StringIndexOfChar.latin1_Short_char                        100000       1000       1999  avgt   25   81844.387 ± 230.281  ns/op
StringIndexOfChar.latin1_mixed_String                      100000       1000       1999  avgt   25  210860.343 ± 159.635  ns/op
StringIndexOfChar.latin1_mixed_char                        100000       1000       1999  avgt   25  117095.518 ± 204.476  ns/op
StringIndexOfChar.utf16_AVX2_String                        100000       1000       1999  avgt   25  100868.093 ± 136.887  ns/op
StringIndexOfChar.utf16_AVX2_char                          100000       1000       1999  avgt   25   80257.944 ± 208.123  ns/op
StringIndexOfChar.utf16_SSE4_String                        100000       1000       1999  avgt   25   74831.080 ± 284.069  ns/op
StringIndexOfChar.utf16_SSE4_char                          100000       1000       1999  avgt   25   64963.525 ± 113.680  ns/op
StringIndexOfChar.utf16_Short_String                       100000       1000       1999  avgt   25   72531.734 ± 209.899  ns/op
StringIndexOfChar.utf16_Short_char                         100000       1000       1999  avgt   25   70835.907 ± 202.187  ns/op
StringIndexOfChar.utf16_mixed_String                       100000       1000       1999  avgt   25  162457.612 ± 178.987  ns/op
StringIndexOfChar.utf16_mixed_char                         100000       1000       1999  avgt   25  149974.738 ± 320.802  ns/op

Hifive, after:

Benchmark                                                 (loops)  (pathCnt)  (rngSeed)  Mode  Cnt       Score     Error  Units
StringIndexOf.advancedWithMediumSub                           N/A        N/A        N/A  avgt   25    4276.564 ±  39.149  ns/op
StringIndexOf.advancedWithShortSub1                           N/A        N/A        N/A  avgt   25    4149.350 ± 209.233  ns/op
StringIndexOf.advancedWithShortSub2                           N/A        N/A        N/A  avgt   25    1128.838 ±  20.157  ns/op
StringIndexOf.advancedWithShortSub2Chars                      N/A        N/A        N/A  avgt   25    1277.692 ±  13.031  ns/op
StringIndexOf.advancedWithShortSub3Chars                      N/A        N/A        N/A  avgt   25    1313.186 ±   9.654  ns/op
StringIndexOf.advancedWithShortSub4Chars                      N/A        N/A        N/A  avgt   25    2488.046 ±   8.964  ns/op
StringIndexOf.constantPattern                                 N/A        N/A        N/A  avgt   25      79.567 ±   5.082  ns/op
StringIndexOf.searchChar16LongSuccess                         N/A        N/A        N/A  avgt   25     251.484 ±   3.302  ns/op
StringIndexOf.searchChar16LongWithOffsetSuccess               N/A        N/A        N/A  avgt   25     256.214 ±   3.778  ns/op
StringIndexOf.searchChar16MediumSuccess                       N/A        N/A        N/A  avgt   25     133.622 ±   3.497  ns/op
StringIndexOf.searchChar16MediumWithOffsetSuccess             N/A        N/A        N/A  avgt   25     139.377 ±   3.008  ns/op
StringIndexOf.searchChar16ShortSuccess                        N/A        N/A        N/A  avgt   25      35.788 ±   2.936  ns/op
StringIndexOf.searchChar16ShortWithOffsetSuccess              N/A        N/A        N/A  avgt   25      37.000 ±   2.983  ns/op
StringIndexOf.searchCharLongSuccess                           N/A        N/A        N/A  avgt   25     124.275 ±   4.894  ns/op
StringIndexOf.searchCharMediumSuccess                         N/A        N/A        N/A  avgt   25      65.132 ±   3.882  ns/op
StringIndexOf.searchCharShortSuccess                          N/A        N/A        N/A  avgt   25      35.020 ±   3.418  ns/op
StringIndexOf.searchString16LongLatinSuccess                  N/A        N/A        N/A  avgt   25     595.135 ±   5.635  ns/op
StringIndexOf.searchString16LongSuccess                       N/A        N/A        N/A  avgt   25     630.710 ±   3.627  ns/op
StringIndexOf.searchString16LongWithOffsetLatinSuccess        N/A        N/A        N/A  avgt   25     321.968 ±   3.086  ns/op
StringIndexOf.searchString16LongWithOffsetSuccess             N/A        N/A        N/A  avgt   25     344.868 ±   5.492  ns/op
StringIndexOf.searchString16MediumLatinSuccess                N/A        N/A        N/A  avgt   25     268.289 ±   7.435  ns/op
StringIndexOf.searchString16MediumSuccess                     N/A        N/A        N/A  avgt   25     276.393 ±   3.831  ns/op
StringIndexOf.searchString16MediumWithOffsetLatinSuccess      N/A        N/A        N/A  avgt   25     161.604 ±   2.949  ns/op
StringIndexOf.searchString16MediumWithOffsetSuccess           N/A        N/A        N/A  avgt   25     166.575 ±   3.478  ns/op
StringIndexOf.searchString16ShortLatinSuccess                 N/A        N/A        N/A  avgt   25     390.758 ±   5.794  ns/op
StringIndexOf.searchString16ShortSuccess                      N/A        N/A        N/A  avgt   25      55.287 ±   4.530  ns/op
StringIndexOf.searchString16ShortWithOffsetLatinSuccess       N/A        N/A        N/A  avgt   25      48.239 ±   1.333  ns/op
StringIndexOf.searchString16ShortWithOffsetSuccess            N/A        N/A        N/A  avgt   25      51.657 ±   2.762  ns/op
StringIndexOf.success                                         N/A        N/A        N/A  avgt   25      83.580 ±   3.200  ns/op
StringIndexOf.successBig                                      N/A        N/A        N/A  avgt   25    6253.601 ±  13.245  ns/op
StringIndexOfChar.latin1_AVX2_String                       100000       1000       1999  avgt   25  180259.333 ± 428.243  ns/op
StringIndexOfChar.latin1_AVX2_char                         100000       1000       1999  avgt   25  103301.911 ± 157.780  ns/op
StringIndexOfChar.latin1_SSE4_String                       100000       1000       1999  avgt   25  106739.090 ± 206.242  ns/op
StringIndexOfChar.latin1_SSE4_char                         100000       1000       1999  avgt   25   75027.524 ± 208.941  ns/op
StringIndexOfChar.latin1_Short_String                      100000       1000       1999  avgt   25  102724.833 ± 231.911  ns/op
StringIndexOfChar.latin1_Short_char                        100000       1000       1999  avgt   25   81018.525 ± 138.541  ns/op
StringIndexOfChar.latin1_mixed_String                      100000       1000       1999  avgt   25  184633.008 ± 209.443  ns/op
StringIndexOfChar.latin1_mixed_char                        100000       1000       1999  avgt   25  116350.746 ± 298.832  ns/op
StringIndexOfChar.utf16_AVX2_String                        100000       1000       1999  avgt   25  110819.605 ± 137.955  ns/op
StringIndexOfChar.utf16_AVX2_char                          100000       1000       1999  avgt   25   79956.001 ± 254.436  ns/op
StringIndexOfChar.utf16_SSE4_String                        100000       1000       1999  avgt   25   75500.341 ± 186.736  ns/op
StringIndexOfChar.utf16_SSE4_char                          100000       1000       1999  avgt   25   64974.675 ± 211.639  ns/op
StringIndexOfChar.utf16_Short_String                       100000       1000       1999  avgt   25   71304.026 ± 163.559  ns/op
StringIndexOfChar.utf16_Short_char                         100000       1000       1999  avgt   25   70843.242 ± 173.108  ns/op
StringIndexOfChar.utf16_mixed_String                       100000       1000       1999  avgt   25  191690.983 ± 301.041  ns/op
StringIndexOfChar.utf16_mixed_char                         100000       1000       1999  avgt   25  149988.445 ± 175.224  ns/op

Thead, before:

Benchmark                                                 (loops)  (pathCnt)  (rngSeed)  Mode  Cnt       Score      Error  Units
StringIndexOf.advancedWithMediumSub                           N/A        N/A        N/A  avgt   25    2734.898 ±   61.540  ns/op
StringIndexOf.advancedWithShortSub1                           N/A        N/A        N/A  avgt   25    2440.471 ±   90.996  ns/op
StringIndexOf.advancedWithShortSub2                           N/A        N/A        N/A  avgt   25     722.081 ±   29.674  ns/op
StringIndexOf.advancedWithShortSub2Chars                      N/A        N/A        N/A  avgt   25     679.410 ±    5.793  ns/op
StringIndexOf.advancedWithShortSub3Chars                      N/A        N/A        N/A  avgt   25     875.206 ±   26.224  ns/op
StringIndexOf.advancedWithShortSub4Chars                      N/A        N/A        N/A  avgt   25     747.692 ±    5.600  ns/op
StringIndexOf.constantPattern                                 N/A        N/A        N/A  avgt   25      69.154 ±    0.647  ns/op
StringIndexOf.searchChar16LongSuccess                         N/A        N/A        N/A  avgt   25     172.494 ±    0.754  ns/op
StringIndexOf.searchChar16LongWithOffsetSuccess               N/A        N/A        N/A  avgt   25     177.181 ±    0.126  ns/op
StringIndexOf.searchChar16MediumSuccess                       N/A        N/A        N/A  avgt   25     106.646 ±    1.143  ns/op
StringIndexOf.searchChar16MediumWithOffsetSuccess             N/A        N/A        N/A  avgt   25     109.219 ±    1.165  ns/op
StringIndexOf.searchChar16ShortSuccess                        N/A        N/A        N/A  avgt   25      40.604 ±    0.316  ns/op
StringIndexOf.searchChar16ShortWithOffsetSuccess              N/A        N/A        N/A  avgt   25      40.440 ±    0.514  ns/op
StringIndexOf.searchCharLongSuccess                           N/A        N/A        N/A  avgt   25      96.637 ±    0.335  ns/op
StringIndexOf.searchCharMediumSuccess                         N/A        N/A        N/A  avgt   25      60.237 ±    1.648  ns/op
StringIndexOf.searchCharShortSuccess                          N/A        N/A        N/A  avgt   25      37.428 ±    0.623  ns/op
StringIndexOf.searchString16LongLatinSuccess                  N/A        N/A        N/A  avgt   25     277.862 ±   12.231  ns/op
StringIndexOf.searchString16LongSuccess                       N/A        N/A        N/A  avgt   25     332.158 ±    0.254  ns/op
StringIndexOf.searchString16LongWithOffsetLatinSuccess        N/A        N/A        N/A  avgt   25     398.582 ±    0.380  ns/op
StringIndexOf.searchString16LongWithOffsetSuccess             N/A        N/A        N/A  avgt   25     422.520 ±    0.153  ns/op
StringIndexOf.searchString16MediumLatinSuccess                N/A        N/A        N/A  avgt   25     135.033 ±    2.969  ns/op
StringIndexOf.searchString16MediumSuccess                     N/A        N/A        N/A  avgt   25     157.165 ±    0.459  ns/op
StringIndexOf.searchString16MediumWithOffsetLatinSuccess      N/A        N/A        N/A  avgt   25     178.419 ±    1.152  ns/op
StringIndexOf.searchString16MediumWithOffsetSuccess           N/A        N/A        N/A  avgt   25     189.184 ±    0.507  ns/op
StringIndexOf.searchString16ShortLatinSuccess                 N/A        N/A        N/A  avgt   25     189.720 ±    5.050  ns/op
StringIndexOf.searchString16ShortSuccess                      N/A        N/A        N/A  avgt   25      48.456 ±    0.015  ns/op
StringIndexOf.searchString16ShortWithOffsetLatinSuccess       N/A        N/A        N/A  avgt   25      41.523 ±    0.261  ns/op
StringIndexOf.searchString16ShortWithOffsetSuccess            N/A        N/A        N/A  avgt   25      44.079 ±    0.142  ns/op
StringIndexOf.success                                         N/A        N/A        N/A  avgt   25      56.303 ±    0.506  ns/op
StringIndexOf.successBig                                      N/A        N/A        N/A  avgt   25     240.224 ±    0.718  ns/op
StringIndexOfChar.latin1_AVX2_String                       100000       1000       1999  avgt   25  151718.306 ±  697.735  ns/op
StringIndexOfChar.latin1_AVX2_char                         100000       1000       1999  avgt   25  101266.052 ±  975.917  ns/op
StringIndexOfChar.latin1_SSE4_String                       100000       1000       1999  avgt   25  101792.535 ±  341.851  ns/op
StringIndexOfChar.latin1_SSE4_char                         100000       1000       1999  avgt   25   55309.860 ±  154.954  ns/op
StringIndexOfChar.latin1_Short_String                      100000       1000       1999  avgt   25   94692.722 ±  413.354  ns/op
StringIndexOfChar.latin1_Short_char                        100000       1000       1999  avgt   25   60527.606 ±  534.854  ns/op
StringIndexOfChar.latin1_mixed_String                      100000       1000       1999  avgt   25  154694.070 ±  323.422  ns/op
StringIndexOfChar.latin1_mixed_char                        100000       1000       1999  avgt   25  102887.596 ±  123.646  ns/op
StringIndexOfChar.utf16_AVX2_String                        100000       1000       1999  avgt   25  102949.366 ± 2005.041  ns/op
StringIndexOfChar.utf16_AVX2_char                          100000       1000       1999  avgt   25   57791.800 ±  104.712  ns/op
StringIndexOfChar.utf16_SSE4_String                        100000       1000       1999  avgt   25   62716.138 ±  163.635  ns/op
StringIndexOfChar.utf16_SSE4_char                          100000       1000       1999  avgt   25   46677.973 ±  161.807  ns/op
StringIndexOfChar.utf16_Short_String                       100000       1000       1999  avgt   25   56375.027 ±  486.974  ns/op
StringIndexOfChar.utf16_Short_char                         100000       1000       1999  avgt   25   50512.176 ±  383.844  ns/op
StringIndexOfChar.utf16_mixed_String                       100000       1000       1999  avgt   25  145740.443 ±  484.267  ns/op
StringIndexOfChar.utf16_mixed_char                         100000       1000       1999  avgt   25  127834.969 ±  130.643  ns/op

Thead, after:

Benchmark                                                 (loops)  (pathCnt)  (rngSeed)  Mode  Cnt       Score      Error  Units
StringIndexOf.advancedWithMediumSub                           N/A        N/A        N/A  avgt   25    3377.943 ±   42.496  ns/op
StringIndexOf.advancedWithShortSub1                           N/A        N/A        N/A  avgt   25    2567.466 ±   57.557  ns/op
StringIndexOf.advancedWithShortSub2                           N/A        N/A        N/A  avgt   25     844.403 ±    6.488  ns/op
StringIndexOf.advancedWithShortSub2Chars                      N/A        N/A        N/A  avgt   25     892.346 ±   11.231  ns/op
StringIndexOf.advancedWithShortSub3Chars                      N/A        N/A        N/A  avgt   25     942.688 ±   19.306  ns/op
StringIndexOf.advancedWithShortSub4Chars                      N/A        N/A        N/A  avgt   25     761.535 ±   20.112  ns/op
StringIndexOf.constantPattern                                 N/A        N/A        N/A  avgt   25      75.172 ±    0.294  ns/op
StringIndexOf.searchChar16LongSuccess                         N/A        N/A        N/A  avgt   25     172.765 ±    1.537  ns/op
StringIndexOf.searchChar16LongWithOffsetSuccess               N/A        N/A        N/A  avgt   25     177.554 ±    0.515  ns/op
StringIndexOf.searchChar16MediumSuccess                       N/A        N/A        N/A  avgt   25     105.234 ±    1.079  ns/op
StringIndexOf.searchChar16MediumWithOffsetSuccess             N/A        N/A        N/A  avgt   25     107.671 ±    1.415  ns/op
StringIndexOf.searchChar16ShortSuccess                        N/A        N/A        N/A  avgt   25      40.933 ±    0.015  ns/op
StringIndexOf.searchChar16ShortWithOffsetSuccess              N/A        N/A        N/A  avgt   25      42.273 ±    2.262  ns/op
StringIndexOf.searchCharLongSuccess                           N/A        N/A        N/A  avgt   25      99.018 ±    1.945  ns/op
StringIndexOf.searchCharMediumSuccess                         N/A        N/A        N/A  avgt   25      62.872 ±    3.143  ns/op
StringIndexOf.searchCharShortSuccess                          N/A        N/A        N/A  avgt   25      36.762 ±    0.029  ns/op
StringIndexOf.searchString16LongLatinSuccess                  N/A        N/A        N/A  avgt   25     395.942 ±    0.239  ns/op
StringIndexOf.searchString16LongSuccess                       N/A        N/A        N/A  avgt   25     328.769 ±    0.298  ns/op
StringIndexOf.searchString16LongWithOffsetLatinSuccess        N/A        N/A        N/A  avgt   25     312.369 ±    0.601  ns/op
StringIndexOf.searchString16LongWithOffsetSuccess             N/A        N/A        N/A  avgt   25     422.857 ±    0.483  ns/op
StringIndexOf.searchString16MediumLatinSuccess                N/A        N/A        N/A  avgt   25     175.366 ±    0.034  ns/op
StringIndexOf.searchString16MediumSuccess                     N/A        N/A        N/A  avgt   25     153.542 ±    0.474  ns/op
StringIndexOf.searchString16MediumWithOffsetLatinSuccess      N/A        N/A        N/A  avgt   25     146.393 ±    0.080  ns/op
StringIndexOf.searchString16MediumWithOffsetSuccess           N/A        N/A        N/A  avgt   25     175.485 ±   12.868  ns/op
StringIndexOf.searchString16ShortLatinSuccess                 N/A        N/A        N/A  avgt   25     253.175 ±    1.237  ns/op
StringIndexOf.searchString16ShortSuccess                      N/A        N/A        N/A  avgt   25      46.278 ±    0.316  ns/op
StringIndexOf.searchString16ShortWithOffsetLatinSuccess       N/A        N/A        N/A  avgt   25      42.041 ±    0.566  ns/op
StringIndexOf.searchString16ShortWithOffsetSuccess            N/A        N/A        N/A  avgt   25      44.513 ±    0.976  ns/op
StringIndexOf.success                                         N/A        N/A        N/A  avgt   25      58.469 ±    0.017  ns/op
StringIndexOf.successBig                                      N/A        N/A        N/A  avgt   25     240.645 ±    0.649  ns/op
StringIndexOfChar.latin1_AVX2_String                       100000       1000       1999  avgt   25  137297.837 ± 1618.438  ns/op
StringIndexOfChar.latin1_AVX2_char                         100000       1000       1999  avgt   25   99919.463 ±  264.771  ns/op
StringIndexOfChar.latin1_SSE4_String                       100000       1000       1999  avgt   25   93552.042 ±  412.514  ns/op
StringIndexOfChar.latin1_SSE4_char                         100000       1000       1999  avgt   25   55130.042 ±  228.381  ns/op
StringIndexOfChar.latin1_Short_String                      100000       1000       1999  avgt   25   93682.963 ±  448.951  ns/op
StringIndexOfChar.latin1_Short_char                        100000       1000       1999  avgt   25   60450.415 ±  544.678  ns/op
StringIndexOfChar.latin1_mixed_String                      100000       1000       1999  avgt   25  139723.661 ±  656.951  ns/op
StringIndexOfChar.latin1_mixed_char                        100000       1000       1999  avgt   25  102253.415 ±  189.882  ns/op
StringIndexOfChar.utf16_AVX2_String                        100000       1000       1999  avgt   25  101267.586 ±  437.666  ns/op
StringIndexOfChar.utf16_AVX2_char                          100000       1000       1999  avgt   25   58385.242 ±  423.666  ns/op
StringIndexOfChar.utf16_SSE4_String                        100000       1000       1999  avgt   25   61231.849 ±  111.539  ns/op
StringIndexOfChar.utf16_SSE4_char                          100000       1000       1999  avgt   25   46524.978 ±  171.727  ns/op
StringIndexOfChar.utf16_Short_String                       100000       1000       1999  avgt   25   56955.300 ±  115.976  ns/op
StringIndexOfChar.utf16_Short_char                         100000       1000       1999  avgt   25   50042.089 ±  353.580  ns/op
StringIndexOfChar.utf16_mixed_String                       100000       1000       1999  avgt   25  156943.226 ±  260.089  ns/op
StringIndexOfChar.utf16_mixed_char                         100000       1000       1999  avgt   25  129073.240 ±  124.931  ns/op

@VladimirKempik

tier1/2 clean, so
/integrate

@openjdk

openjdk bot commented Jun 15, 2023

Going to push as commit 6b94289.
Since your change was applied there have been 191 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label Jun 15, 2023
@openjdk openjdk bot closed this Jun 15, 2023
@openjdk openjdk bot removed the ready and rfr labels Jun 15, 2023
@openjdk

openjdk bot commented Jun 15, 2023

@VladimirKempik Pushed as commit 6b94289.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
5 participants