8350988: Consolidate Identity of self-inverse operations #23851
👋 Welcome back hgreule! A progress list of the required criteria for merging this PR into
@SirYwell This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 198 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dafedafe, @eme64, @iwanowww) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
}

@Test
@IR(failOn = {IRNode.REVERSE_BYTES_I})
I'm not sure if this is fine, as the ReverseBytes nodes depend on intrinsics. From my understanding, the methods are treated as normal methods on platforms without reverseBytes support. In that case, no ReverseBytes node is created, so the test would still pass, but it might be surprising that it passes. Is this fine, or is there a better way here?
Interesting question - expanding on that, could arbitrary methods be marked as self-inverse to be represented in the IR?
I don't think there are many such methods, and my knowledge of Ideal isn't good enough to judge whether that's possible.
But it might make sense to use the nodes even when the intrinsic isn't available, and "lower" it to the existing implementation (either a call or inlining) after optimizing the node itself.
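Whether or not the intrinsic is used, the Java-level semantics of these methods are fixed, so a plain value check passes on every platform; only the @IR rule depends on ReverseBytes nodes actually being present. A minimal sketch of such a value check (the class and method names are hypothetical and not part of the PR's test):

```java
import java.util.Random;

public class SelfInverseCheck {
    // True if reverseBytes and reverse each undo themselves for this int.
    static boolean intRoundTrips(int x) {
        return Integer.reverseBytes(Integer.reverseBytes(x)) == x
            && Integer.reverse(Integer.reverse(x)) == x;
    }

    // Same check for long.
    static boolean longRoundTrips(long x) {
        return Long.reverseBytes(Long.reverseBytes(x)) == x
            && Long.reverse(Long.reverse(x)) == x;
    }

    public static void main(String[] args) {
        int[] intEdges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 0x12345678};
        long[] longEdges = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE, 0x0123456789ABCDEFL};
        for (int x : intEdges) {
            if (!intRoundTrips(x)) throw new AssertionError("int " + x);
        }
        for (long x : longEdges) {
            if (!longRoundTrips(x)) throw new AssertionError("long " + x);
        }
        Random rnd = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < 100_000; i++) {
            if (!intRoundTrips(rnd.nextInt())) throw new AssertionError("random int");
            if (!longRoundTrips(rnd.nextLong())) throw new AssertionError("random long");
        }
        System.out.println("all round trips ok");
    }
}
```

Such a check only guards correctness; it cannot tell whether the ReverseBytes nodes were present in the IR, which is exactly the concern raised above.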
dafedafe left a comment
Nice refinement @SirYwell!
You are mentioning that
During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like
int v = 0;
for (int datum : data) {
    v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
}
return v;
was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.
I'm not totally sure I fully get what you mean here: does this optimization hinder vectorization in some cases? Does this result in a slowdown? (BTW do you have benchmark results?) Should we possibly try to detect this early and avoid simplifying?
Per my understanding, this is an issue with autovectorization, and it should be fixed in autovectorization rather than by blocking valid and sound simplifications.
What happens basically comes down to this check in jdk/src/hotspot/share/opto/superword.cpp (line 1759 at commit 885338b).
Without my change, _num_work_vecs is 3 (I assume; I didn't debug that part), as we have one load and two reverse-bytes operations, and _num_reductions is 1 (the XOR). With my change, _num_work_vecs is 1 when we reach this check (that part I checked with the debugger), as only the load is left. So SuperWord does not consider vectorization to be profitable.
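The effect can be illustrated with a simplified model of the counting described above. It assumes, without claiming to reproduce superword.cpp, that a reduction is rejected when _num_work_vecs equals _num_reductions, i.e. when there is no vector work besides the reduction itself. The record and method names are hypothetical; the counts are the ones quoted in the comment:

```java
public class ReductionProfitabilityModel {
    // Hypothetical stand-ins for SuperWord's _num_work_vecs and _num_reductions.
    record LoopStats(int numWorkVecs, int numReductions) {}

    // Assumed, simplified rule: reject the reduction when there is no vector
    // work beyond the reductions themselves.
    static boolean reductionProfitable(LoopStats s) {
        return s.numWorkVecs() != s.numReductions();
    }

    public static void main(String[] args) {
        // Mainline: load + two ReverseBytesI work vectors, one XOR reduction.
        LoopStats before = new LoopStats(3, 1);
        // With the PR: the ReverseBytesI pair folds away, only the load remains.
        LoopStats after = new LoopStats(1, 1);
        System.out.println("before: " + reductionProfitable(before)); // before: true
        System.out.println("after:  " + reductionProfitable(after));  // after:  false
    }
}
```

The model only shows why removing work can flip a count-based heuristic; the real decision in SuperWord involves more conditions than this.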
My benchmark code: https://gist.github.com/SirYwell/a76578dc5f3c10cd08b768a3bd39a988 (results compared for mainline and this PR). You can see the almost 8x speedup due to vectorization that happens on mainline but no longer happens with my change. I don't think this should block this change. Detecting such situations also seems like a rather complicated workaround.
eme64 left a comment
Nice idea! Thanks for the work :)
assertResultF(inf);
assertResultF(nanf);

double ad = RunInfo.getRandom().nextDouble();
This actually only generates values in [0.0, 1.0), i.e. from 0.0 (inclusive) to 1.0 (exclusive).
Can you instead use Generators.java? It will make sure to generate "interesting" values, including different encodings of NaN, infinity, etc.
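To illustrate the point: Random.nextDouble() only ever returns finite values in [0.0, 1.0), so negative values, infinities, and NaNs (let alone NaNs with non-canonical bit patterns) are never exercised. A small standalone sketch (names hypothetical; it does not use the test library's Generators):

```java
import java.util.Random;

public class NextDoubleRange {
    // True if every sample from Random.nextDouble() lies in [0.0, 1.0).
    static boolean allInUnitInterval(long seed, int samples) {
        Random rnd = new Random(seed);
        for (int i = 0; i < samples; i++) {
            double d = rnd.nextDouble();
            if (!(d >= 0.0 && d < 1.0)) return false; // also false for NaN
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allInUnitInterval(1234L, 1_000_000)); // true

        // Two quiet-NaN encodings: different bit patterns, both NaN.
        long canonicalBits = 0x7ff8000000000000L;
        long payloadBits   = 0x7ff8000000000001L;
        double a = Double.longBitsToDouble(canonicalBits);
        double b = Double.longBitsToDouble(payloadBits);
        System.out.println(Double.isNaN(a) && Double.isNaN(b)); // true
        // doubleToLongBits collapses every NaN to the canonical encoding.
        System.out.println(Double.doubleToLongBits(b) == canonicalBits); // true
    }
}
```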
Good catch, I'm using Generators now. I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests?
You can always restrict ranges:
private static final RestrictableGenerator<Integer> GEN_BYTE = Generators.G.safeRestrict(Generators.G.ints(), Byte.MIN_VALUE, Byte.MAX_VALUE);
private static final RestrictableGenerator<Integer> GEN_CHAR = Generators.G.safeRestrict(Generators.G.ints(), Character.MIN_VALUE, Character.MAX_VALUE);
private static final RestrictableGenerator<Integer> GEN_SHORT = Generators.G.safeRestrict(Generators.G.ints(), Short.MIN_VALUE, Short.MAX_VALUE);
private static final RestrictableGenerator<Integer> GEN_INT = Generators.G.ints();
private static final RestrictableGenerator<Long> GEN_LONG = Generators.G.longs();
private static final Generator<Float> GEN_FLOAT = Generators.G.floats();
private static final Generator<Double> GEN_DOUBLE = Generators.G.doubles();
private static final RestrictableGenerator<Integer> GEN_BOOLEAN = Generators.G.safeRestrict(Generators.G.ints(), 0, 1);
I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests?
And yes, more tests would always be better.
Thanks, I added test cases for short and char. Interestingly, there are no reverse (bits) methods for those types.
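Because short and char have only 65,536 values each, the self-inverse property of Short.reverseBytes and Character.reverseBytes (presumably the Java counterparts of ReverseBytesS and ReverseBytesUS) can even be checked exhaustively, as a complement to range-restricted generators. A sketch with hypothetical names:

```java
public class ShortCharReverseBytes {
    // Checks every short value: reverseBytes applied twice must give the input back.
    static boolean shortSelfInverse() {
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            short s = (short) i;
            if (Short.reverseBytes(Short.reverseBytes(s)) != s) return false;
        }
        return true;
    }

    // Same exhaustive check for every char value.
    static boolean charSelfInverse() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (Character.reverseBytes(Character.reverseBytes(c)) != c) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shortSelfInverse() && charSelfInverse()); // true
        // Short and Character only offer reverseBytes; there is no reverse (bits) method.
        System.out.println(Integer.toHexString(Short.reverseBytes((short) 0x1234) & 0xFFFF)); // 3412
    }
}
```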
Please let me know if there's anything more I can do.
@eme64 could you take another look? Thanks!
@SirYwell The code now looks really good, I launched some tests. Please ping me again in a day for the results!
Thanks @eme64. Are the results in already?
eme64 left a comment
@SirYwell The tests are passing 🟢
Thank you for all the work, especially for writing all the tests 😊
Great! Do I need another review or can we integrate?
@SirYwell Thanks for asking. We generally want to have 2 reviews for Compiler changes :)
iwanowww left a comment
Looks good.
Thank you both for your reviews.
/integrate
/sponsor
Going to push as commit 66b5dba.
Your commit was automatically rebased without conflicts.
subnode has several self-inverse nodes that lack the corresponding Identity optimization. ReverseINode and ReverseLNode already have this optimization, but we can deduplicate the code for all of these operations.
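The shared rule is simple: if a node's opcode is self-inverse and its input has the same opcode, the node can be replaced by that input's input, i.e. op(op(x)) becomes x. The following is a toy Java model of that idea only; C2's real Identity methods are C++ code in subnode.cpp, and the Node record, opcode strings, and identity method here are hypothetical:

```java
import java.util.Set;

public class SelfInverseIdentity {
    // Toy IR node: an opcode plus at most one input (null for leaves).
    record Node(String op, Node in) {}

    // Opcodes modeled as self-inverse: op(op(x)) == x for every x.
    static final Set<String> SELF_INVERSE = Set.of(
        "ReverseI", "ReverseL",
        "ReverseBytesI", "ReverseBytesL", "ReverseBytesS", "ReverseBytesUS",
        "NegF", "NegD");

    // One shared rule instead of one copy per node type:
    // op(op(x)) -> x, otherwise return the node unchanged.
    static Node identity(Node n) {
        Node in = n.in();
        if (SELF_INVERSE.contains(n.op()) && in != null && in.op().equals(n.op())) {
            return in.in();
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node("Parm", null);
        Node twice = new Node("ReverseBytesI", new Node("ReverseBytesI", x));
        Node mixed = new Node("ReverseBytesI", new Node("ReverseI", x));
        System.out.println(identity(twice) == x);     // true
        System.out.println(identity(mixed) == mixed); // true: different ops do not cancel
    }
}
```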
For most nodes, the optimization is obvious. The NegF/NegD nodes, however, are worth looking at in detail, IMO:
- Float.NaN has the same bits set as -Float.NaN. That means, in this specific case, the operation is a no-op anyway.

Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE.
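That negation is safe to fold can be checked at the Java level: unary minus flips the sign, so applying it twice restores every float or double, including -0.0 and the infinities, and a NaN input yields a NaN again. By contrast, negation written as subtraction from zero would not be self-inverse, because 0.0 - (0.0 - (-0.0)) is +0.0. A sketch with hypothetical names; comparing via floatToIntBits/doubleToLongBits deliberately treats all NaN encodings as equal, so it does not test raw NaN bits:

```java
public class NegNegIdentity {
    // True if -(-x) equals x; floatToIntBits maps every NaN to one canonical value.
    static boolean negNegIsIdentity(float x) {
        return Float.floatToIntBits(-(-x)) == Float.floatToIntBits(x);
    }

    static boolean negNegIsIdentity(double x) {
        return Double.doubleToLongBits(-(-x)) == Double.doubleToLongBits(x);
    }

    // Negation written as subtraction from zero, for comparison.
    static boolean subFromZeroTwiceIsIdentity(float x) {
        return Float.floatToIntBits(0.0f - (0.0f - x)) == Float.floatToIntBits(x);
    }

    public static void main(String[] args) {
        float[] fs = {0.0f, -0.0f, 1.5f, -1.5f, Float.MIN_VALUE, Float.MAX_VALUE,
                      Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN};
        double[] ds = {0.0, -0.0, 1.5, Double.MIN_VALUE, Double.MAX_VALUE,
                       Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN};
        boolean ok = true;
        for (float f : fs) ok &= negNegIsIdentity(f);
        for (double d : ds) ok &= negNegIsIdentity(d);
        System.out.println(ok);                                // true
        System.out.println(subFromZeroTwiceIsIdentity(-0.0f)); // false: yields +0.0f
    }
}
```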
One note: during benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like
int v = 0;
for (int datum : data) {
    v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
}
return v;
was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.
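Semantically, a loop computing v ^= Integer.reverseBytes(Integer.reverseBytes(datum)) is just a plain XOR reduction over the data; the slowdown comes from SuperWord declining to vectorize the smaller loop, not from any change in results. A minimal sketch comparing both forms (method names hypothetical):

```java
import java.util.Random;

public class XorReverseBytesLoop {
    // The loop from the JDK-8307516 note: double reverseBytes, XOR-reduced.
    static int doubleReverseXor(int[] data) {
        int v = 0;
        for (int datum : data) {
            v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
        }
        return v;
    }

    // What the loop becomes once ReverseBytesI(ReverseBytesI(x)) folds to x.
    static int plainXor(int[] data) {
        int v = 0;
        for (int datum : data) {
            v ^= datum;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] data = new Random(7).ints(10_000).toArray();
        System.out.println(doubleReverseXor(data) == plainXor(data)); // true
    }
}
```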
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23851/head:pull/23851
$ git checkout pull/23851
Update a local copy of the PR:
$ git checkout pull/23851
$ git pull https://git.openjdk.org/jdk.git pull/23851/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 23851
View PR using the GUI difftool:
$ git pr show -t 23851
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23851.diff