-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8173585: Intrinsify StringLatin1.indexOf(char) #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @jasontatton-aws, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user jasontatton-aws" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing |
@jasontatton-aws The following labels will be automatically applied to this pull request: When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing lists. If you would like to change these labels, use the |
/covered |
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! |
Webrevs
|
Mailing list message from Andrew Haley on hotspot-dev: On 11/09/2020 11:23, Jason Tatton wrote:
Is this really a good idea? When the processor detects Intel AVX instructions, additional https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf So, if StringLatin1.indexOf(char) executes enough to make a difference
-- |
Mailing list message from Tatton, Jason on hotspot-dev: Hi Everyone, Please could some reviewers volunteer their time to have a look at this patch? It was marked as a starter bug so I don't expect it will consume a lot of time. Thanks, -----Original Message----- This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 Details of testing: - A ?short? string of < 16 characters. Hardware used for testing: - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) ? Intel i7 processor (with AVX2 support). I also ran; ?run-test-tier1? and ?run-test-tier2? for: x86_64 and aarch64. Possible future enhancements: 1. Make use of AVX-512 instructions. Benchmark results: | Benchmark | Mode | Cnt | Score | Error | Units | **With** the new StringLatin1 indexOf(char) intrinsic: | Benchmark | Mode | Cnt | Score | Error | Units | The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 ------------- Commit messages: Changes: https://git.openjdk.java.net/jdk/pull/71/files PR: https://git.openjdk.java.net/jdk/pull/71 |
Hi Andrew,
The current indexOf char intrinsics for StringUTF16 and the new one here for StringLatin1 both use the AVX2 – i.e. 256 bit instructions, these are also affected by the frequency scaling which affects the AVX-512 instructions you pointed out. Of course in a world where all the work taking place is AVX instructions this wouldn’t be an issue but in mixed mode execution this is a problem.
However, the compiler does have knowledge of the capability of the CPU upon which it’s optimizing code for and is able to decide whether to use AVX instructions if they are supported by the CPU AND if it wouldn’t be detrimental for performance. In fact, there is a flag which one can use to interact with this: -XX:UseAVX=version.
This of course made testing this patch an interesting experience as the AVX2 instructions were not enabled on the Xeon processors which I had access to at AWS, but in the end I was able to use an i7 on my corporate macbook to validate the code.
From: mlbridge[bot] <notifications@github.com>
Sent: 11 September 2020 17:01
To: openjdk/jdk <jdk@noreply.github.com>
Cc: Tatton, Jason <jptatton@amazon.com>; Mention <mention@noreply.github.com>
Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) (#71)
Mailing list message from Andrew Haley<mailto:aph@redhat.com> on hotspot-dev<mailto:hotspot-dev@openjdk.java.net>:
On 11/09/2020 11:23, Jason Tatton wrote:
For the x86 implementation there may be two further improvements we
can make in order to improve performance of both the StringUTF16 and
StringLatin1 indexOf(char) intrinsics:
1. Make use of AVX-512 instructions.
Is this really a good idea?
When the processor detects Intel AVX instructions, additional
voltage is applied to the core. With the additional voltage applied,
the processor could run hotter, requiring the operating frequency to
be reduced to maintain operations within the TDP limits. The higher
voltage is maintained for 1 millisecond after the last Intel AVX
instruction completes, and then the voltage returns to the nominal
TDP voltage level.
https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf
So, if StringLatin1.indexOf(char) executes enough to make a difference
to any real-world program, the effect may well be to slow down the clock
for all of the code that does not use AVX.
2. For ?short? Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
instructions instead of a loop.
…--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#71 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AQ44AL33DFEQ36TCHSTZKCLSFJCUJANCNFSM4Q74AKCQ>.
|
/label add hotspot-compiler |
@dholmes-ora |
Mailing list message from Andrew Haley on hotspot-dev: On 12/09/2020 00:06, Jason Tatton wrote:
Right.
Sure. But the question to be answered is this: we can use AVX for My guess is that there are no (non-FP) workloads where AVX dominates. I have to emphahsize that I don't know the answer to this question.
So: is this worth doing? Are there workloads where this helps? Where -- |
Hi everyone, This patch is just missing a couple of reviewers... Please can someone step forward? I think it's a fairly straightforward change. -Jason |
Mailing list message from Andrew Dinn on hotspot-dev: On 17/09/2020 19:22, Jason Tatton wrote:
I believe you got a review from Andrew Haley -- it was quoted in your What you did not get was license to proceed and push this change. That's regards, Andrew Dinn |
@jasontatton-aws this pull request can not be integrated into git checkout JDK-8173585
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
Hi Andrew, Thanks for coming back to me. Looking on the github PR nobody is tagged as a reviewer for this (perhaps this is a feature which is not being used).
There are two separate things here: 2). Discussion around future enhancements - concerning potential use of AVX-512 instructions and a more optimal implementation for short strings. |
Based on comment by @jatin-bhateja (Intel) frequency level switchover pointed by @theRealAph is sensitive to vector size By keeping vector size less or equal to 32 bytes we should avoid it. And as I can see this intrinsic code is using 32 bytes (chars) and 16 bytes vectors: Also we never had issues with AVX2. only with AVX512 regarding performance hit: I would like to see performance numbers for for all values of UseAVX flag : 0, 1, 2, 3 The usage is guarded UseSSE42Intrinsics in UseSSE42Intrinsics predicate in .ad file. Make sure to test with UseAVX=0 to make sure that some avx instructions are not mixed into non avx code. And also with UseSSE=2 (for example) to make sure shared code correctly recognize that intrinsics is not supported. |
@vnkozlov @mknjc @jatin-bhateja Thanks for providing the relevant details. I'm now quite content that this patch avoids any potential frequency scaling problem. I'm also glad that an explanation of why this is so is now available -- although it's not perfect that we are relying on a stackoverflow post for the full details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jason,
thanks for bringing String.indexOf() for latin strings up to date with the Unicode version.
Your changes look good except a few minor issues I've commented on right in the code.
I'd only like to ask you if you could possibly improve your test a little bit. As far as I understand, your search text is a consecutive sequence of "abc" characters, so you'll always find the character your searching for within the next three characters of the source text. This won't exercise the loops of your intrinsic. Maybe you can also add some test versions where the search character will be found beyond the first 32/64 characters after "fromIndex"?
test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java
Outdated
Show resolved
Hide resolved
Rewrite of unit test and newlines added to end of files Changes to unit test: - main test adjusted such that Strings gennerated are much longer (up to 2048 characters) and of the form: azaza, aazaazaa, aaazaaazaaa, etc with 'z' being the search character searched for. Multiple instances of the search character are included in the String in order to validate that the starting offset is correctly handleded. Results are compared to non intrinsified version of the code. Longer strings means that the looping functionality of the various paths is entered into. - Run configurations introduced such that it checks behaviour where use of SSE and AVX instructions are restricted. - Tier4InvocationThreshold adjusted so as to ensure C2 code iis invoked. Other changes: - newlines added at end of files
@simonis Thank you for the corrections, I have ammended them in the latest comit as follows: Changes to unit test:
Other changes:
@vnkozlov here are the performance numbers as requested. I have included performance of the UTF16 version of the intrinsic for reference:
I think the results are as expected, we see improvements in performance as the range of SSE and AVX instructions which can be used is expanded upon. Note that no improvement is observed with UseAVX=3 because there is no AVX-512 code in these intrinsics. |
Hi All, Just wondering if there is anything you'd like me to do in order to assist with moving this patch forward? Thanks, |
test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java
Show resolved
Hide resolved
Added new unit test: findOneItem. This will test strings of varying length ensuring that for all lengths one instance of the search char can be found. We check what happens when the search character is in each position of the search string (including first and last positions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes seems fine but you missing Copyright + GPL header in new files.
test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java
Show resolved
Hide resolved
|
@jasontatton-aws This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 349 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@simonis, @vnkozlov, @neliasso) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
/integrate |
@jasontatton-aws |
/sponsor |
@phohensee @jasontatton-aws Since your change was applied there have been 350 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit f71e8a6. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Due to the requirement for multiple reviewers, I had been waiting to add my review of the Core-Libs files until the HotSpot reviewers had approved! I see only one reviewer credited in the commit. |
This integration without testing with a current merge from the master and has caused two build failures. JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile There is a raw unicode character in the JMH test that causes a compilation error.
|
And also a failed Graal test because of the new intrinsic. And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" @phohensee don't be so quick to type |
@phohensee - @vnkozlov has determined that a new Tier2 test failure is |
This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as code for C2.
Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and java/lang/StringLatin1.java
https://bugs.openjdk.java.net/browse/JDK-8173585
Details of testing:
I have created a jtreg test “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new intrinsic. Note that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the input String. Hence the test has been designed to cover all these cases. In summary they are:
Hardware used for testing:
I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
Possible future enhancements:
For the x86 implementation there may be two further improvements we can make in order to improve performance of both the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
Benchmark results:
Without the new StringLatin1 indexOf(char) intrinsic:
With the new StringLatin1 indexOf(char) intrinsic:
The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when running on ARM.
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71
$ git checkout pull/71