Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 #5621

Closed
wants to merge 15 commits into from

Conversation

cl4es
Copy link
Member

@cl4es cl4es commented Sep 21, 2021

This patch extends the ISO_8859_1.implEncodeISOArray intrinsic on x86 to work also for ASCII encoding, which makes for example the UTF_8$Encoder perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release.

Extending the EncodeIsoArray intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5621/head:pull/5621
$ git checkout pull/5621

Update a local copy of the PR:
$ git checkout pull/5621
$ git pull https://git.openjdk.java.net/jdk pull/5621/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5621

View PR using the GUI difftool:
$ git pr show -t 5621

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5621.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Sep 21, 2021

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Sep 21, 2021

@cl4es The following labels will be automatically applied to this pull request:

  • core-libs
  • nio
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security nio core-libs labels Sep 21, 2021
@wangweij
Copy link
Contributor

@wangweij wangweij commented Sep 22, 2021

/label remove security

@openjdk openjdk bot removed the security label Sep 22, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 22, 2021

@wangweij
The security label was successfully removed.

@cl4es cl4es changed the title Enhance UTF_8.Encoder by using StringUTF16.compress for ASCII Implement fast-path for ASCII-compatible single-byte CharsetEncoders Sep 22, 2021
@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 23, 2021

The current version (cef05f4) copies the ISO_8859_1.implEncodeISOArray intrinsic and adapts it to work on ASCII encoding, which makes the UTF_8$Encoder perform on par with (or outperform) encoding from a String. Using microbenchmarks provided by @carterkozak here: https://github.com/carterkozak/stringbuilder-encoding-performance

Baseline:

Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt     Score    Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    8   270.237 ± 10.504  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode 😊                3  avgt    8   568.353 ±  2.331  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    8   324.889 ± 17.466  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode 😊                3  avgt    8   633.720 ± 22.703  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    8  1132.436 ± 30.661  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode 😊                3  avgt    8  1379.207 ± 66.982  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    8    91.253 ±  3.848  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode 😊                3  avgt    8   519.489 ± 12.516  ns/op

Patch:

Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt     Score     Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    4    82.535 ±  20.310  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode 😊                3  avgt    4   522.679 ±  13.456  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    4   127.831 ±  32.612  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode 😊                3  avgt    4   549.343 ±  59.899  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    4  1182.835 ± 153.735  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode 😊                3  avgt    4  1416.407 ± 130.551  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    4    97.770 ±  15.742  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode 😊                3  avgt    4   516.351 ±  58.580  ns/op

This can probably be simplified further, say by adding a flag to the intrinsic of whether we're encoding ASCII only or ISO-8859-1. It also needs to be implemented and tested on all architectures.

(edit: accidentally edit rather than quote-reply, restored original comment)

@cl4es cl4es changed the title Implement fast-path for ASCII-compatible single-byte CharsetEncoders Implement fast-path for ASCII-compatible single-byte CharsetEncoders on x86 Sep 23, 2021
@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 23, 2021

This can probably be simplified further, say by adding a flag to the intrinsic of whether we're encoding ASCII only or ISO-8859-1.

Done: Removed the addition of a new C2 Node, merged the macro assembler encode_iso_array and encode_ascii_array and added a predicate to select the behavior.

It also needs to be implemented and tested on all architectures.

Implementing this on other hardware is Future Work. The non-x86 intrinsics for implEncodeISOArray all seem to use clever tricks rather than a simple mask that can be easily switched from detecting non-latin-1(0xFF00) to detecting ASCII (0xFF80). Clever tricks make it rather challenging to extend this like I could easily do in the x86 code (most all assembler is foreign to me)

… using internal APIs; remove adhoc performance tests
@cl4es cl4es changed the title Implement fast-path for ASCII-compatible single-byte CharsetEncoders on x86 Implement fast-path for ASCII-compatible CharsetEncoders on x86 Sep 23, 2021
@cl4es cl4es changed the title Implement fast-path for ASCII-compatible CharsetEncoders on x86 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 Sep 23, 2021
@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 23, 2021

/label add hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler label Sep 23, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 23, 2021

@cl4es
The hotspot-compiler label was successfully added.

…ze ASCII-compatible SingleByte (e.g. ISO-8859-15)
@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 24, 2021

On the JDK-included CharsetEncodeDecode.encode microbenchmark, I get these numbers in the baseline (18-b09):

Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   39.962 ± 1.703  us/op
CharsetEncodeDecode.encode   16384         BIG5  avgt   30  153.282 ± 4.521  us/op
CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30  192.040 ± 4.543  us/op
CharsetEncodeDecode.encode   16384        ASCII  avgt   30   40.051 ± 1.210  us/op
CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  302.815 ± 9.490  us/op

With the proposed patch:

Benchmark                   (size)       (type)  Mode  Cnt    Score    Error  Units
CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    4.081 ±  0.182  us/op
CharsetEncodeDecode.encode   16384         BIG5  avgt   30  150.374 ±  3.579  us/op
CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    4.010 ±  0.179  us/op
CharsetEncodeDecode.encode   16384        ASCII  avgt   30    3.961 ±  0.176  us/op
CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  302.235 ± 11.395  us/op

That is: on my system encoding 16K char ASCII data is 10x faster for UTF-8 and ASCII, and roughly 48x faster for ASCII-compatible charsets like ISO-8859-15. On 3rd party microbenchmarks we can assert that performance for non-ASCII input either doesn't change, or improves when messages have an ASCII prefix.

@cl4es cl4es marked this pull request as ready for review Sep 24, 2021
@openjdk openjdk bot added the rfr label Sep 24, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 24, 2021

naotoj
naotoj approved these changes Sep 24, 2021
Copy link
Member

@naotoj naotoj left a comment

core library part of the changes looks good to me.

@openjdk
Copy link

@openjdk openjdk bot commented Sep 24, 2021

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86

Reviewed-by: naoto, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 107 new commits pushed to the master branch:

  • c4d1157: 8271855: [TESTBUG] Wrong weakCompareAndSet assumption in UnsafeIntrinsicsTest
  • 756d22c: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes
  • 5b0c9cc: 8274172: Convert JavadocTester to use NIO
  • 2657bcb: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running
  • 53b25bc: 8273459: Update code segment alignment to 64 bytes
  • 1a29b1e: 8274016: Replace 'for' cycles with iterator with enhanced-for in java.desktop
  • d8a278f: 8274396: Suppress more warnings on non-serializable non-transient instance fields in client libs
  • e49e5b5: 8273972: Multi-core choke point in CMM engine (LCMSTransform.doTransform)
  • 2072bc7: 8274391: Suppress more warnings on non-serializable non-transient instance fields in java.util.concurrent
  • 6a477bd: 8274415: Suppress warnings on non-serializable non-transient instance fields in java.xml
  • ... and 97 more: https://git.openjdk.java.net/jdk/compare/161fdb4afbc6e67cc7580fe753112c4d894a9b6b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Sep 24, 2021
@sviswa7
Copy link

@sviswa7 sviswa7 commented Sep 24, 2021

x86 part of changes look good.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Very nice. The changes look good to me, just added some minor comments.

Should we remove the "iso" part from the method/class names?

src/hotspot/cpu/x86/x86_32.ad Outdated Show resolved Hide resolved
src/hotspot/cpu/x86/x86_32.ad Outdated Show resolved Hide resolved
src/hotspot/share/opto/intrinsicnode.hpp Outdated Show resolved Hide resolved
@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 27, 2021

Should we remove the "iso" part from the method/class names?

I'm open to suggestions, but I've not been able to think of anything better. encodeISOOrASCII doesn't seem helpful and since ASCII is a subset of the ISO-8859-1 encoding referred to by the "iso" moniker then the ASCII-only variant is technically encoding chars to valid ISO-8859-1.

…rms implements support for the _encodeAsciiArray intrinsic
@TobiHartmann
Copy link
Member

@TobiHartmann TobiHartmann commented Sep 27, 2021

Should we remove the "iso" part from the method/class names?

I'm open to suggestions, but I've not been able to think of anything better. encodeISOOrASCII doesn't seem helpful and since ASCII is a subset of the ISO-8859-1 encoding referred to by the "iso" moniker then the ASCII-only variant is technically encoding chars to valid ISO-8859-1.

Okay, that's fine with me.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 28, 2021

Mailing list message from Claes Redestad on core-libs-dev:

Yes, this was spotted and fixed already. Annoyingly the test I added didn't detect this so GHA was green, but it failed some tier2 tests on aarch64. I added extra safeguards by predicating matching the encode_iso_array instructions on the node being !ascii, which will cause C2 to report an error rather than silently using the iso variant for ascii-only nodes.

H?mta Outlook f?r Android<https://aka.ms/AAb9ysg>
________________________________
From: core-libs-dev <core-libs-dev-retn at openjdk.java.net> on behalf of Volker Simonis <simonis at openjdk.java.net>
Sent: Tuesday, September 28, 2021 1:53:22 PM
To: core-libs-dev at openjdk.java.net <core-libs-dev at openjdk.java.net>; hotspot-compiler-dev at openjdk.java.net <hotspot-compiler-dev at openjdk.java.net>; nio-dev at openjdk.java.net <nio-dev at openjdk.java.net>
Subject: Re: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 [v5]

On Tue, 28 Sep 2021 10:01:43 GMT, Claes Redestad <redestad at openjdk.org> wrote:

Not too much work. I recently introduced platform-specific `matcher_*.hpp` files, so since then adding a boolean constant is easy (no need to muck with the .ad files).

Does the changes in 9800a99 resolve your concerns?

In principle yes, but shouldn't the condition read:

if (!Matcher::match_rule_supported(Op_EncodeISOArray) || !Matcher::supports_encode_ascii_array) return false;

I.e. the intrinisc is supported if both conditions are true and not supported if either one of them is false?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5621

Copy link
Member

@TobiHartmann TobiHartmann left a comment

The latest version looks good to me.

…o where the ISO intrinsic was used in place of the ASCII-only intrinsic
@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 29, 2021

Thanks for reviewing, @TobiHartmann

I also cleaned up the test and made sure it fails when there's a logic bug like the one I introduced in 9800a99 where the ISO array intrinsic would be matched for a case requiring the ASCII-only array intrinsic. The test was (in)conveniently never testing out-of-range characters in the 0x80-0xFF range, which is precisely where the two intrinsics would produce different results. I hope this doesn't require another 24 hour grace period. 😄

While the test uses randomization - implying a theoretical chance you never generate a char in the 0x80-0xFF range that would be wrongly encoded - a typical run of this test now get hundreds of failures when accidentally mismatching the intrinsics.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

The incremental change looks good and trivial to me.

@cl4es
Copy link
Member Author

@cl4es cl4es commented Sep 29, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Sep 29, 2021

Going to push as commit aaa36cc.
Since your change was applied there have been 107 commits pushed to the master branch:

  • c4d1157: 8271855: [TESTBUG] Wrong weakCompareAndSet assumption in UnsafeIntrinsicsTest
  • 756d22c: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes
  • 5b0c9cc: 8274172: Convert JavadocTester to use NIO
  • 2657bcb: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running
  • 53b25bc: 8273459: Update code segment alignment to 64 bytes
  • 1a29b1e: 8274016: Replace 'for' cycles with iterator with enhanced-for in java.desktop
  • d8a278f: 8274396: Suppress more warnings on non-serializable non-transient instance fields in client libs
  • e49e5b5: 8273972: Multi-core choke point in CMM engine (LCMSTransform.doTransform)
  • 2072bc7: 8274391: Suppress more warnings on non-serializable non-transient instance fields in java.util.concurrent
  • 6a477bd: 8274415: Suppress warnings on non-serializable non-transient instance fields in java.xml
  • ... and 97 more: https://git.openjdk.java.net/jdk/compare/161fdb4afbc6e67cc7580fe753112c4d894a9b6b...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 29, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Sep 29, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 29, 2021

@cl4es Pushed as commit aaa36cc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@cl4es cl4es deleted the utf8_ascii_encode branch Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs hotspot-compiler integrated nio
6 participants