8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 #5621
/label remove security |
@wangweij |
…ISOArray intrinsic (currently x86 only)
The current version (cef05f4) copies the ISO_8859_1.implEncodeISOArray intrinsic and adapts it to work on ASCII encoding, which makes the UTF_8$Encoder perform on par with (or outperform) encoding from a String. Using microbenchmarks provided by @carterkozak here: https://github.com/carterkozak/stringbuilder-encoding-performance Baseline:
Patch:
This can probably be simplified further, for example by adding a flag to the intrinsic that indicates whether we're encoding ASCII only or ISO-8859-1. It also needs to be implemented and tested on all architectures. (edit: accidentally edited rather than quote-replied; restored original comment) |
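For readers unfamiliar with the two code paths being compared, here is a minimal sketch of "encoding from a String" versus encoding through a CharsetEncoder. It is not taken from the linked benchmark repository; the class and method names (EncodePathsSketch, viaString, viaEncoder) are made up for illustration, and it only shows that both paths produce the same bytes, not how fast either one is.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of the PR or the linked benchmarks.
public class EncodePathsSketch {

    // Path 1: encode via String.getBytes(Charset).
    static byte[] viaString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Path 2: encode via a CharsetEncoder over an array-backed CharBuffer.
    static byte[] viaEncoder(char[] chars) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(chars));
            byte[] out = new byte[encoded.remaining()];
            encoded.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Only thrown for malformed input such as unpaired surrogates.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String message = "plain ASCII log line";
        byte[] a = viaString(message);
        byte[] b = viaEncoder(message.toCharArray());
        System.out.println(Arrays.equals(a, b)); // prints "true"
    }
}
```

Measuring the difference between the two paths needs a proper harness such as the microbenchmarks linked above; a hand-rolled timing loop would not be reliable.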
Done: Removed the addition of a new C2 Node, merged the macro assembler encode_iso_array and encode_ascii_array and added a predicate to select the behavior.
Implementing this on other hardware is future work. The non-x86 intrinsics for implEncodeISOArray all seem to use clever tricks rather than a simple mask that can easily be switched from detecting non-Latin-1 (0xFF00) to detecting non-ASCII (0xFF80). Those tricks make the intrinsics rather challenging to extend the way I could in the x86 code (almost all assembler is foreign to me). |
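The mask idea can be sketched in scalar Java. This is only an illustration of the predicate-selects-mask approach described above, not the macro assembler code from the patch; the class, constants and method (AsciiMaskSketch, LATIN1_MASK, ASCII_MASK, encodePrefix) are hypothetical names, and the real intrinsic checks several chars at a time with vector instructions.

```java
// Hypothetical scalar illustration; the actual intrinsic is vectorized assembler.
public class AsciiMaskSketch {
    // Any bit set under this mask means the char is above 0xFF (not Latin-1).
    static final int LATIN1_MASK = 0xFF00;
    // Any bit set under this mask means the char is above 0x7F (not ASCII).
    static final int ASCII_MASK = 0xFF80;

    // Copies the leading encodable chars of src into dst and returns how many
    // were copied. The 'ascii' flag selects which mask is applied.
    static int encodePrefix(char[] src, byte[] dst, int len, boolean ascii) {
        int mask = ascii ? ASCII_MASK : LATIN1_MASK;
        int i = 0;
        for (; i < len; i++) {
            char c = src[i];
            if ((c & mask) != 0) {
                break; // first char outside the target range stops the loop
            }
            dst[i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "abc\u00e9d".toCharArray(); // U+00E9 is Latin-1 but not ASCII
        byte[] dst = new byte[src.length];
        System.out.println(encodePrefix(src, dst, src.length, true));  // prints 3
        System.out.println(encodePrefix(src, dst, src.length, false)); // prints 5
    }
}
```

With a single flag choosing the mask, the ASCII and ISO-8859-1 variants share one loop, which is roughly the shape of the merged x86 routine described in the comment.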
… using internal APIs; remove adhoc performance tests
/label add hotspot-compiler |
@cl4es |
…ze ASCII-compatible SingleByte (e.g. ISO-8859-15)
On the JDK-included
With the proposed patch:
That is: on my system encoding 16K char ASCII data is 10x faster for UTF-8 and ASCII, and roughly 48x faster for ASCII-compatible charsets like ISO-8859-15. On 3rd party microbenchmarks we can assert that performance for non-ASCII input either doesn't change, or improves when messages have an ASCII prefix. |
Webrevs
|
@cl4es This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 107 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
x86 part of changes look good. |
Very nice. The changes look good to me, just added some minor comments.
Should we remove the "iso" part from the method/class names?
I'm open to suggestions, but I've not been able to think of anything better. |
…rms implements support for the _encodeAsciiArray intrinsic
Okay, that's fine with me. |
…II EncodeISOArrayNodes
Mailing list message from Claes Redestad on core-libs-dev: Yes, this was spotted and fixed already. Annoyingly the test I added didn't detect this so GHA was green, but it failed some tier2 tests on aarch64. I added extra safeguards by predicating matching the encode_iso_array instructions on the node being !ascii, which will cause C2 to report an error rather than silently using the iso variant for ascii-only nodes. On Tue, 28 Sep 2021 10:01:43 GMT, Claes Redestad <redestad at openjdk.org> wrote:
In principle yes, but shouldn't the condition read: if (!Matcher::match_rule_supported(Op_EncodeISOArray) || !Matcher::supports_encode_ascii_array) return false; I.e., the intrinsic is supported if both conditions are true and not supported if either one of them is false? ------------- PR: https://git.openjdk.java.net/jdk/pull/5621 |
…o where the ISO intrinsic was used in place of the ASCII-only intrinsic
Thanks for reviewing, @TobiHartmann. I also cleaned up the test and made sure it fails when there's a logic bug like the one I introduced in 9800a99, where the ISO array intrinsic would be matched for a case requiring the ASCII-only array intrinsic. The test was (in)conveniently never testing out-of-range characters in the 0x80-0xFF range, which is precisely where the two intrinsics would produce different results. I hope this doesn't require another 24 hour grace period. While the test uses randomization, implying a theoretical chance that it never generates a char in the 0x80-0xFF range that would be wrongly encoded, a typical run of this test now gets hundreds of failures when accidentally mismatching the intrinsics. |
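To see why the 0x80-0xFF range is the interesting one, the following sketch compares the public US-ASCII and ISO-8859-1 encodings char by char. It is not the test from the PR; AsciiVsLatin1Sketch and countDifferences are hypothetical names, and it goes through String.getBytes rather than the intrinsic itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not the regression test added by the PR.
public class AsciiVsLatin1Sketch {

    // Counts chars in [from, to] whose US-ASCII and ISO-8859-1 encodings differ.
    static int countDifferences(int from, int to) {
        int differences = 0;
        for (int c = from; c <= to; c++) {
            String s = String.valueOf((char) c);
            // Unmappable chars are replaced with '?' by String.getBytes.
            byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
            byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
            if (!Arrays.equals(ascii, latin1)) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        System.out.println(countDifferences(0x00, 0x7F)); // prints 0
        System.out.println(countDifferences(0x80, 0xFF)); // prints 128
    }
}
```

Every char below 0x80 encodes identically in both charsets, so a test that never feeds chars from 0x80-0xFF cannot tell the two intrinsics apart.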
/integrate |
Going to push as commit aaa36cc.
Your commit was automatically rebased without conflicts. |
This patch extends the ISO_8859_1.implEncodeISOArray intrinsic on x86 to work also for ASCII encoding, which makes for example the UTF_8$Encoder perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release.
Extending the EncodeIsoArray intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general, and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example, aarch64 seems to filter out the low bytes and then check whether any bits are set in the high bytes. Clever, but very different from the 0xFF80 2-byte mask that an ASCII test wants.
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5621/head:pull/5621
$ git checkout pull/5621
Update a local copy of the PR:
$ git checkout pull/5621
$ git pull https://git.openjdk.java.net/jdk pull/5621/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5621
View PR using the GUI difftool:
$ git pr show -t 5621
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5621.diff