
@SirYwell
Member

@SirYwell SirYwell commented Mar 1, 2025

subnode has multiple nodes that are self-inverse but lack the corresponding optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all of those operations.
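The shared rule can be sketched on a toy node model. The names Op, Node and identity below are hypothetical; they are not HotSpot's C++ classes. When a self-inverse operation's input is the same operation, identity returns the inner input, so f(f(x)) becomes x. An idempotent operation such as absolute value is deliberately excluded, because abs(abs(x)) equals abs(x), not x.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the shared Identity rule; hypothetical names, not HotSpot's C++ node classes.
final class SelfInverseIdentity {
    enum Op { PARAM, NEG_F, NEG_D, REVERSE_I, REVERSE_L, REVERSE_BYTES_I, REVERSE_BYTES_S, ABS_I }

    // Operations f with f(f(x)) == x for every input x. ABS_I is idempotent, not self-inverse.
    static final Set<Op> SELF_INVERSE = EnumSet.of(
            Op.NEG_F, Op.NEG_D, Op.REVERSE_I, Op.REVERSE_L, Op.REVERSE_BYTES_I, Op.REVERSE_BYTES_S);

    static final class Node {
        final Op op;
        final Node in; // the single input, or null for a parameter

        Node(Op op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    // One rule for all self-inverse ops: Op(Op(x)) => x; any other node is returned unchanged.
    static Node identity(Node n) {
        if (SELF_INVERSE.contains(n.op) && n.in != null && n.in.op == n.op) {
            return n.in.in;
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node(Op.PARAM, null);
        Node twice = new Node(Op.REVERSE_BYTES_I, new Node(Op.REVERSE_BYTES_I, x));
        Node absAbs = new Node(Op.ABS_I, new Node(Op.ABS_I, x));
        System.out.println(identity(twice) == x);       // true: folded to x
        System.out.println(identity(absAbs) == absAbs); // true: left unchanged
    }
}
```

Keeping the rule in one place means a new self-inverse node only needs to be added to the set, rather than getting its own copy of the check.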

For most nodes, the optimization is obvious. The NegF/NegD nodes, however, are worth looking at in detail, in my opinion:

  • Float.NaN has the same bits set as -Float.NaN. That means, in this specific case, the operation is a no-op anyway.
  • For other values, the most significant (sign) bit is flipped, and flipping it twice restores the original value.
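The second bullet can be checked on raw bit patterns with a small sketch. It models NegF/NegD as XOR-ing the IEEE 754 sign bit of the encoding; that is an assumption about how negation acts on the bits, not a statement about C2's generated code. Flipping twice restores every encoding, NaN payloads included. Float.floatToIntBits maps every NaN to one canonical pattern, so it cannot tell a negated NaN from Float.NaN.

```java
// Sketch: model NegF/NegD as flipping the IEEE 754 sign bit of the raw encoding (an assumption).
final class SignFlip {
    static int negF(int bits) {
        return bits ^ 0x8000_0000;
    }

    static long negD(long bits) {
        return bits ^ 0x8000_0000_0000_0000L;
    }

    // True if negating twice gives back exactly the original float encoding.
    static boolean floatRoundTrips(int bits) {
        return negF(negF(bits)) == bits;
    }

    // True if negating twice gives back exactly the original double encoding.
    static boolean doubleRoundTrips(long bits) {
        return negD(negD(bits)) == bits;
    }

    public static void main(String[] args) {
        int[] floats = {
            Float.floatToRawIntBits(1.5f),
            Float.floatToRawIntBits(-0.0f),
            Float.floatToRawIntBits(Float.POSITIVE_INFINITY),
            0x7fc0_0000, // the encoding of Float.NaN
            0x7fc0_0001, // a NaN with a non-zero payload
        };
        for (int b : floats) {
            System.out.println(floatRoundTrips(b)); // true for every entry
        }
        // All NaNs collapse to the same canonical bits under floatToIntBits.
        float negatedNaN = Float.intBitsToFloat(negF(0x7fc0_0000));
        System.out.println(Float.floatToIntBits(negatedNaN) == Float.floatToIntBits(Float.NaN)); // true
        System.out.println(doubleRoundTrips(0x7ff8_0000_0000_0001L)); // true
    }
}
```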

Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE.

One note: while benchmarking these changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. Because of that bug, code like

        int v = 0;
        for (int datum : data) {
            v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
        }
        return v;

was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.
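To see why the loop above loses its vectorization, compare it with the form it takes once ReverseBytesI(ReverseBytesI(x)) is folded to x. After the fold, only the load and the xor reduction remain. The two methods below compute the same result; their names echo the benchmark methods but are illustrative, not copied from the benchmark code.

```java
import java.util.Random;

// Illustrative methods; the loop body of doubleReverse mirrors the snippet above.
final class ReverseBytesFold {
    // Original form: reverseBytes applied twice per element.
    static int doubleReverse(int[] data) {
        int v = 0;
        for (int datum : data) {
            v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
        }
        return v;
    }

    // Folded form: ReverseBytesI(ReverseBytesI(x)) == x, so only the xor reduction remains.
    static int folded(int[] data) {
        int v = 0;
        for (int datum : data) {
            v ^= datum;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1024).toArray();
        System.out.println(doubleReverse(data) == folded(data)); // true
    }
}
```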


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8350988: Consolidate Identity of self-inverse operations (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23851/head:pull/23851
$ git checkout pull/23851

Update a local copy of the PR:
$ git checkout pull/23851
$ git pull https://git.openjdk.org/jdk.git pull/23851/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23851

View PR using the GUI difftool:
$ git pr show -t 23851

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23851.diff

@bridgekeeper

bridgekeeper bot commented Mar 1, 2025

👋 Welcome back hgreule! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 1, 2025

@SirYwell This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8350988: Consolidate Identity of self-inverse operations

Reviewed-by: epeter, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 198 new commits pushed to the master branch:

  • 1007811: 8352897: RISC-V: Change default value for UseConservativeFence
  • 7853415: 8352218: RISC-V: Zvfh requires RVV
  • 2483340: 8352579: Refactor CDS legacy optimization for lambda proxy classes
  • 1397ee5: 8334322: Misleading values of keys in jpackage resource bundle
  • 441bd12: 8352812: remove useless class and function parameter in SuspendThread impl
  • e83cccf: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal
  • 5672a93: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity #
  • c2a4fed: 8348907: Stress times out when is executed with ZGC
  • 5392674: 8352766: Problemlist hotspot tier1 tests requiring tools that are not included in static JDK
  • 1d205f5: 8352716: (tz) Update Timezone Data to 2025b
  • ... and 188 more: https://git.openjdk.org/jdk/compare/3b189e0e78c867b75e984bfaabc92d12b9ff2b9e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dafedafe, @eme64, @iwanowww) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot changed the title from "8350988" to "8350988: Consolidate Identity of self-inverse operations" Mar 1, 2025
@openjdk

openjdk bot commented Mar 1, 2025

@SirYwell The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 1, 2025
}

@Test
@IR(failOn = {IRNode.REVERSE_BYTES_I})
Member Author

I'm not sure whether this is fine, as the ReverseBytes nodes depend on intrinsics. As I understand it, on platforms without reverseBytes support the methods are treated as normal methods. In that case the test would still pass, but it might be surprising that it passes. Is this fine, or is there a better way?

Member

Interesting question. Expanding on that: could arbitrary methods be marked as self-inverse so that they are represented in the IR?

Member Author

I don't think there are many such methods, and my knowledge of Ideal isn't good enough to judge whether that's possible.

But it might make sense to use the nodes even when the intrinsic isn't available, and to "lower" them to the existing implementation (either a call or inlined code) after the node itself has been optimized.
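The idea can be sketched in plain Java under simplifying assumptions. Every name below is hypothetical, and nothing like this exists in C2 today. The node is first simplified independently of the platform, and only what survives is lowered, either to the intrinsic or to a fallback. The fallback reuses the shift-and-mask formula from the JDK's Integer.reverseBytes.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical "simplify first, lower later" sketch; none of these names exist in HotSpot.
final class LowerAfterIdeal {
    interface Expr {}
    record Param() implements Expr {}
    record ReverseBytes(Expr in) implements Expr {}

    // Phase 1: platform-independent simplification, ReverseBytes(ReverseBytes(x)) => x.
    static Expr ideal(Expr e) {
        if (e instanceof ReverseBytes r) {
            Expr in = ideal(r.in());
            if (in instanceof ReverseBytes inner) {
                return inner.in();
            }
            return new ReverseBytes(in);
        }
        return e;
    }

    // Phase 2: lower the surviving nodes to the intrinsic or to the plain Java fallback.
    static IntUnaryOperator lower(Expr e, boolean hasIntrinsic) {
        if (e instanceof ReverseBytes r) {
            IntUnaryOperator in = lower(r.in(), hasIntrinsic);
            IntUnaryOperator op = hasIntrinsic ? Integer::reverseBytes : LowerAfterIdeal::reverseBytesFallback;
            return x -> op.applyAsInt(in.applyAsInt(x));
        }
        return x -> x; // Param: pass the input through
    }

    // Same shift-and-mask formula as the JDK's Integer.reverseBytes.
    static int reverseBytesFallback(int i) {
        return (i << 24) | ((i & 0xff00) << 8) | ((i >>> 8) & 0xff00) | (i >>> 24);
    }

    public static void main(String[] args) {
        Expr twice = new ReverseBytes(new ReverseBytes(new Param()));
        System.out.println(ideal(twice)); // Param[]
        IntUnaryOperator once = lower(ideal(new ReverseBytes(new Param())), false);
        System.out.println(Integer.toHexString(once.applyAsInt(0x01020304))); // 4030201
    }
}
```

The design choice being illustrated is the ordering. Because folding happens on the abstract node, the fold applies whether or not the platform has the intrinsic.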

@SirYwell SirYwell marked this pull request as ready for review March 1, 2025 13:42
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 1, 2025
@mlbridge

mlbridge bot commented Mar 1, 2025

Webrevs

Contributor

@dafedafe dafedafe left a comment

Nice refinement @SirYwell!
You are mentioning that

During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like

    int v = 0;
    for (int datum : data) {
        v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
    }
    return v;

was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.

I'm not totally sure I fully get what you mean here: does this optimization hinder vectorization in some cases? Does this result in a slowdown? (BTW do you have benchmark results?) Should we possibly try to detect this early and avoid simplifying?

@liach
Member

liach commented Mar 3, 2025

As I understand it, this is an issue with auto-vectorization, and it should be fixed there rather than by blocking valid and sound simplifications.

@SirYwell
Member Author

SirYwell commented Mar 3, 2025

I'm not totally sure I fully get what you mean here: does this optimization hinder vectorization in some cases? Does this result in a slowdown? (BTW do you have benchmark results?) Should we possibly try to detect this early and avoid simplifying?

What happens basically comes down to this check:

if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) {

Without my change, _num_work_vecs is 3 (I assume so; I didn't debug that part), as we have one load and two reverse-bytes operations, and _num_reductions is 1 (the xor). With my change, when we reach this check, _num_work_vecs is 1 (this part I verified with the debugger), as only the load is left. Since 1 == 1, SuperWord does not consider vectorization profitable.
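The counts above can be plugged into a simplified model of the quoted condition. This is a hedged sketch. The method name is hypothetical, the second_pk operand is reduced to a boolean that is assumed to be true in both runs, and the real SuperWord code weighs more than this one check.

```java
// Simplified model of the quoted SuperWord condition; hypothetical names, not HotSpot code.
final class ReductionProfitability {
    // Mirrors: (second_pk == nullptr) || (_num_work_vecs == _num_reductions)
    // When it holds, the reduction loop is treated as not profitable to vectorize.
    static boolean notProfitable(boolean hasSecondPack, int numWorkVecs, int numReductions) {
        return !hasSecondPack || numWorkVecs == numReductions;
    }

    public static void main(String[] args) {
        // Before the change: load + two ReverseBytesI = 3 work vectors, 1 reduction (the xor).
        System.out.println(notProfitable(true, 3, 1)); // false: vectorized
        // After the change: only the load is left, so 1 == 1.
        System.out.println(notProfitable(true, 1, 1)); // true: not vectorized
    }
}
```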

My benchmark code: https://gist.github.com/SirYwell/a76578dc5f3c10cd08b768a3bd39a988
Results on my machine (Ryzen 9 3900X):
mainline

Benchmark                           Mode  Cnt     Score     Error   Units
DoubledReverseBytes.doubleReverse  thrpt    3  3287,042 ± 398,656  ops/ms
DoubledReverseBytes.folded         thrpt    3   418,627 ±  20,797  ops/ms

this pr

Benchmark                           Mode  Cnt    Score    Error   Units
DoubledReverseBytes.doubleReverse  thrpt    3  419,369 ± 24,974  ops/ms
DoubledReverseBytes.folded         thrpt    3  415,469 ± 88,714  ops/ms

On mainline, vectorization makes doubleReverse almost 8x faster (3287 vs. 419 ops/ms). With my change, the loop is no longer vectorized, so it performs like folded.

I don't think this should block this change. Detecting such situations also seems like a rather complicated workaround.

Contributor

@eme64 eme64 left a comment

Nice idea! Thanks for the work :)

assertResultF(inf);
assertResultF(nanf);

double ad = RunInfo.getRandom().nextDouble();
Contributor

This only generates values in the range [0.0, 1.0).

Can you instead use Generators.java? It will make sure to generate "interesting" values, including different encodings of NaN, infinity, etc.
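Random.nextDouble() is specified to return values in [0.0, 1.0). It therefore never produces negative values, -0.0, infinities, subnormals, or any NaN encoding. The sketch below is a hypothetical stand-in written only with java.util.Random; it is not the Generators.java API. It mixes hand-picked special values, arbitrary bit patterns, and ordinary uniform values.

```java
import java.util.Random;

// Hypothetical stand-in for a generator of "interesting" doubles; NOT the Generators.java API.
final class InterestingDoubles {
    private static final double[] SPECIAL = {
        0.0, -0.0, 1.0, -1.0,
        Double.MIN_VALUE, Double.MIN_NORMAL, Double.MAX_VALUE,
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
        Double.NaN,
        Double.longBitsToDouble(0x7ff0_0000_0000_0001L), // signaling-NaN encoding
        Double.longBitsToDouble(0xfff8_0000_0000_0000L), // quiet NaN with the sign bit set
    };

    private final Random random;

    InterestingDoubles(long seed) {
        this.random = new Random(seed);
    }

    double next() {
        int choice = random.nextInt(3);
        if (choice == 0) {
            return SPECIAL[random.nextInt(SPECIAL.length)];
        } else if (choice == 1) {
            // Any bit pattern: both signs, all exponents, subnormals and NaN payloads.
            return Double.longBitsToDouble(random.nextLong());
        } else {
            // Uniform in [0.0, 1.0), the only range nextDouble() alone covers.
            return random.nextDouble();
        }
    }
}
```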

Member Author

Good catch, I'm using Generators now. I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests?

Contributor

You can always restrict ranges:

private static final RestrictableGenerator<Integer> GEN_BYTE = Generators.G.safeRestrict(Generators.G.ints(), Byte.MIN_VALUE, Byte.MAX_VALUE);
private static final RestrictableGenerator<Integer> GEN_CHAR = Generators.G.safeRestrict(Generators.G.ints(), Character.MIN_VALUE, Character.MAX_VALUE);
private static final RestrictableGenerator<Integer> GEN_SHORT = Generators.G.safeRestrict(Generators.G.ints(), Short.MIN_VALUE, Short.MAX_VALUE);
private static final RestrictableGenerator<Integer> GEN_INT = Generators.G.ints();
private static final RestrictableGenerator<Long> GEN_LONG = Generators.G.longs();
private static final Generator<Float> GEN_FLOAT = Generators.G.floats();
private static final Generator<Double> GEN_DOUBLE = Generators.G.doubles();
private static final RestrictableGenerator<Integer> GEN_BOOLEAN = Generators.G.safeRestrict(Generators.G.ints(), 0, 1);

I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests?

And yes, more tests would always be better.

Member Author

Thanks, I added test cases for short and char. Interestingly, there are no reverse (bits) methods for those types.

Please let me know if there's anything more I can do.
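Outside the IR framework, the property that the short and char tests rely on can be checked exhaustively, since each type has only 65,536 values. Short.reverseBytes and Character.reverseBytes are presumably the calls that become ReverseBytesS and ReverseBytesUS when intrinsified. The class and method names below are illustrative.

```java
// Exhaustive check that reverseBytes is self-inverse for short and char (plain Java, no IR framework).
final class ShortCharReverseBytes {
    static boolean allShortsRoundTrip() {
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            short s = (short) i;
            if (Short.reverseBytes(Short.reverseBytes(s)) != s) {
                return false;
            }
        }
        return true;
    }

    static boolean allCharsRoundTrip() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (Character.reverseBytes(Character.reverseBytes(c)) != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allShortsRoundTrip() && allCharsRoundTrip()); // true
    }
}
```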

@SirYwell
Member Author

@eme64 could you take another look? Thanks!

@eme64
Contributor

eme64 commented Mar 24, 2025

@SirYwell The code now looks really good, I launched some tests. Please ping me again in a day for the results!

@SirYwell
Member Author

Thanks @eme64. Are the results in already?

Contributor

@eme64 eme64 left a comment

@SirYwell The tests are passing 🟢

Thank you for all the work, especially for writing all the tests 😊

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 25, 2025
@SirYwell
Member Author

Great! Do I need another review or can we integrate?

@eme64
Contributor

eme64 commented Mar 25, 2025

@SirYwell Thanks for asking. We generally want to have 2 reviews for Compiler changes :)

Contributor

@iwanowww iwanowww left a comment

Looks good.

@SirYwell
Member Author

Thank you both for your reviews.

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 27, 2025
@openjdk

openjdk bot commented Mar 27, 2025

@SirYwell
Your change (at version 0a48b5b) is now ready to be sponsored by a Committer.

@eme64
Contributor

eme64 commented Mar 27, 2025

/sponsor

@openjdk

openjdk bot commented Mar 27, 2025

Going to push as commit 66b5dba.
Since your change was applied, there have been 198 commits pushed to the master branch (the same commits listed in the earlier bot comment).

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 27, 2025
@openjdk openjdk bot closed this Mar 27, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 27, 2025
@openjdk

openjdk bot commented Mar 27, 2025

@eme64 @SirYwell Pushed as commit 66b5dba.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
