8051725: Improve expansion of Conv2B nodes in the middle-end#13345
8051725: Improve expansion of Conv2B nodes in the middle-end#13345jaskarth wants to merge 15 commits intoopenjdk:masterfrom
Conversation
…exity - Also remove unneeded Assembler::set_byte_if_not_zero, as it was duplicated with Assembler::setne. The function is only called from 64-bit code, so it is identical in execution with setne.
|
👋 Welcome back jaskarth! A progress list of the required criteria for merging this PR into |
Webrevs
|
|
I think this should be done in the middle-end instead. May I ask what are the advantages of |
|
As far as I know, |
|
@jaskarth Yes I also think that is the case, but without advantages in the back-end maybe it's best us lowering it in macro expansion phase so the compiler can have chances to transform more primitive |
|
Oh, I had forgotten to consider the macro expansion step! That sounds reasonable to me, I'll make this change and see what the performance is like. |
|
I've reworked the transformation to happen in macro expansion, and it seems the performance is actually better now! The comparison is now basically the same as doing it with |
TobiHartmann
left a comment
There was a problem hiding this comment.
I'm a bit confused, why do you need the new match rules if Conv2B nodes are now macro expanded to CMove?
|
Hi, that's my bad- I was working on removing the now-redundant Conv2B rules from the backends but I hadn't had a chance to push that yet, I've done so now. Since this has become a broader cleanup effort I'll also go ahead and rename the JBS issue to reflect the new approach with macro expansion as well. |
|
Generally I think it's good, one small question I want to ask is whether we should do splitting |
|
I think that's a good idea, it would reduce the complexity of the logic in macro expansion and allow the transform to be applied more generally. |
There was a problem hiding this comment.
Hello, I wonder if we could make this transformation of Conv2B conditional? Architectures like RISC-V doesn't have support of conditional moves at the ISA level for now. So we set ConditionalMoveLimit parameter to 0 for this platform and conditionals moves are emulated with normal compare and branch instructions instead [1]. I don't think we would achieve better performance numbers on this platform with this change.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9583
|
Hey @RealFYang, thanks for this info! I wasn't aware that RISC-V didn't have conditional moves, and I agree that it doesn't sound like this transform would be so profitable there. To make the transformation conditional I've moved it to post loop opts IGVN, and only run it if the match rule for @merykitty I've also implemented your change with idealization and would like your thoughts on it, thanks! I'll attach aarch64 perf results soon. |
|
@jaskarth : Thanks for taking care of that. I performed tier1-3 tests on linux-riscv64 platform, result looks good. |
src/hotspot/share/opto/addnode.cpp
Outdated
| } | ||
| } | ||
|
|
||
| // Try to convert (c ? 1 : 0) ^ 1 into !c ? 1 : 0. This pattern can occur after expansion of Conv2B nodes. |
There was a problem hiding this comment.
Be more general? Xor (CMove cond, iftrue, iffalse), op == CMove cond, (Xor iftrue op), (Xor iffalse op). You can be conservative and apply this only if op, iftrue and iffalse are all constant.
There was a problem hiding this comment.
I think that's a good idea, I've made this change. I wonder if other associative operations would also benefit from a similar patch?
|
Apologies for the delayed update, and thanks for the reviews! I have aarch64 performance results, from an M1 mac: It seems like the patch doesn't have much of an impact other than |
merykitty
left a comment
There was a problem hiding this comment.
Otherwise LGTM. Thanks a lot.
|
A tiny point: maybe just remove |
|
Thanks for your review! I've updated the code accordingly. I also noticed that in addition to |
|
I run this through some quick testing and I'm seeing the following crash with |
|
I think the latest change should fix the error! The code now properly checks that both types are ints, and tier1 and |
TobiHartmann
left a comment
There was a problem hiding this comment.
Great work, the latest version looks good to me. I'll run some testing and report back once it finished.
|
@jaskarth This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 213 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TobiHartmann, @RealFYang, @merykitty, @sviswa7) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
All tests passed. |
|
Thanks a lot for testing, and thanks all for reviews and feedback with this change! /integrate |
|
/sponsor |
|
Going to push as commit fb0b1f0.
Your commit was automatically rebased without conflicts. |
|
@TobiHartmann @jaskarth Pushed as commit fb0b1f0. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi, I've created optimizations for the expansion of
Conv2Bnodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both cmov idealization and diamond-phi optimization. This change replacesConv2Bnodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip withxorto be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:Reviews would be greatly appreciated!
Testing: tier1-2 on linux x64, GHA
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13345/head:pull/13345$ git checkout pull/13345Update a local copy of the PR:
$ git checkout pull/13345$ git pull https://git.openjdk.org/jdk.git pull/13345/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13345View PR using the GUI difftool:
$ git pr show -t 13345Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13345.diff
Webrev
Link to Webrev Comment