8051725: Improve expansion of Conv2B nodes in the middle-end #13345

Closed
jaskarth wants to merge 15 commits into openjdk:master from jaskarth:conv2b-x86-lowering

Conversation

@jaskarth
Member

@jaskarth jaskarth commented Apr 5, 2023

Hi, I've created optimizations for the expansion of Conv2B nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both cmov idealization and diamond-phi optimization. This change replaces Conv2B nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with xor to be subsumed by an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:

                                    Baseline                           Patch              Improvement
Benchmark                      Mode  Cnt  Score    Error Units     Score    Error  Units
Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346 ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211 ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280 ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600 ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193 ns/op  /  34.142 ± 0.303  ns/op  + 28.0%

Reviews would be greatly appreciated!

Testing: tier1-2 on linux x64, GHA
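
For readers unfamiliar with the pattern, here is a minimal sketch of Java source shapes that could plausibly reduce to a Conv2B, with or without the trailing xor of 1. The class name and method bodies are assumptions made for illustration: they are modelled on the benchmark names above, not copied from the benchmark, and whether C2 actually forms a Conv2B for a given method depends on the surrounding code.

```java
// Hypothetical illustration only: these bodies are assumed, not taken from Conv2BRules.
final class Conv2BPatterns {
    // (x != 0) as an int: the shape a Conv2B node represents directly.
    static int notEquals0(int x) { return x != 0 ? 1 : 0; }

    // (x == 0) as an int: the same conversion with the low bit flipped (xor 1).
    static int equals0(int x) { return x == 0 ? 1 : 0; }

    // The pointer flavours: (p != null) and (p == null) as ints.
    static int notEqualsNull(Object p) { return p != null ? 1 : 0; }
    static int equalsNull(Object p) { return p == null ? 1 : 0; }

    public static void main(String[] args) {
        // Source-level identity behind the rewrite: flipping the bit of (x != 0)
        // gives the same result as testing the inverted condition (x == 0).
        for (int x : new int[] {0, 1, -1, 42}) {
            if (equals0(x) != (notEquals0(x) ^ 1)) throw new AssertionError("int " + x);
        }
        for (Object p : new Object[] {null, "non-null"}) {
            if (equalsNull(p) != (notEqualsNull(p) ^ 1)) throw new AssertionError("ref " + p);
        }
        System.out.println("ok");
    }
}
```

The checks in `main` only demonstrate the identity that lets the xor be folded into an inverted comparison; they say nothing about the generated code.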


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8051725: Improve expansion of Conv2B nodes in the middle-end

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13345/head:pull/13345
$ git checkout pull/13345

Update a local copy of the PR:
$ git checkout pull/13345
$ git pull https://git.openjdk.org/jdk.git pull/13345/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13345

View PR using the GUI difftool:
$ git pr show -t 13345

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13345.diff


jaskarth added 5 commits April 3, 2023 00:10
…exity

- Also remove unneeded Assembler::set_byte_if_not_zero, as it was duplicated with Assembler::setne. The function is only called from 64-bit code, so it is identical in execution with setne.
@bridgekeeper

bridgekeeper bot commented Apr 5, 2023

👋 Welcome back jaskarth! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 5, 2023

@jaskarth The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 5, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 5, 2023
@merykitty
Member

I think this should be done in the middle-end instead. May I ask what advantages Conv2B has over CMove that require keeping it all the way to matching?

@jaskarth
Member Author

As far as I know, Conv2B is a special-case convert node that computes either c == 0 ? 0 : 1 or p == null ? 0 : 1. I think an advantage of Conv2B over CMove here is that Conv2B has more specialized rules for value() and identity(), so it can prune more types of inputs than an equivalent CMove can. I agree that it's not great having to shuffle the node from the middle-end to the backend, but I think it's still helpful as it can remove some dead bools that CMove wouldn't be able to. Hope this clarifies a bit!

@merykitty
Member

@jaskarth Yes, I also think that is the case, but without advantages in the back-end, maybe it's best to lower it in the macro expansion phase so the compiler has more chances to transform the more primitive CMove. What do you think?

@jaskarth
Member Author

Oh, I had forgotten to consider the macro expansion step! That sounds reasonable to me, I'll make this change and see what the performance is like.

@jaskarth
Member Author

I've reworked the transformation to happen in macro expansion, and it seems the performance is actually better now!

                                    Baseline                           Patch              Improvement
Benchmark                      Mode  Cnt  Score    Error Units     Score    Error  Units
Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346 ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211 ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280 ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600 ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193 ns/op  /  34.142 ± 0.303  ns/op  + 28.0%

The comparison is now basically the same as doing it with cmp, which is nice. It seems this is because the assembly now zeroes the register, tests against zero, and then does the setcc, instead of doing the comparison, the setcc, and then a movzbl. So doing the transform in macro expansion does seem to be the better choice for x86, and it also reduces the overhead in the matcher. However, I'm not sure the benefit will be the same on other platforms, as the different architectures seem to implement Conv2B using different strategies. Do you have any thoughts on this approach, @merykitty?

Member

@TobiHartmann TobiHartmann left a comment

I'm a bit confused, why do you need the new match rules if Conv2B nodes are now macro expanded to CMove?

@jaskarth
Member Author

Hi, that's my bad. I was working on removing the now-redundant Conv2B rules from the backends but hadn't had a chance to push that yet; I've done so now. Since this has become a broader cleanup effort, I'll also rename the JBS issue to reflect the new approach with macro expansion.

@jaskarth jaskarth changed the title from "8051725: Questionable if-conversion involving SETNE" to "8051725: Improve expansion of Conv2B nodes in the middle-end" Apr 19, 2023
@merykitty
Member

Generally I think it's good. One small question: should we split the Xor through the CMove as an idealisation of the Xor?

@jaskarth
Member Author

I think that's a good idea, it would reduce the complexity of the logic in macro expansion and allow the transform to be applied more generally.

Member

@RealFYang RealFYang left a comment

Hello, I wonder if we could make this transformation of Conv2B conditional? Architectures like RISC-V don't support conditional moves at the ISA level for now, so we set the ConditionalMoveLimit parameter to 0 for this platform, and conditional moves are emulated with normal compare and branch instructions instead [1]. I don't think we would achieve better performance numbers on this platform with this change.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9583

@jaskarth
Member Author

Hey @RealFYang, thanks for this info! I wasn't aware that RISC-V didn't have conditional moves, and I agree that it doesn't sound like this transform would be so profitable there. To make the transformation conditional I've moved it to post loop opts IGVN, and only run it if the match rule for Conv2B isn't found. In an effort to not accidentally cause performance regressions, I've limited the transform to x86_64, aarch64, and arm32.

@merykitty I've also implemented your change with idealization and would like your thoughts on it, thanks!

I'll attach aarch64 perf results soon.

@RealFYang
Member

@jaskarth : Thanks for taking care of that. I performed tier1-3 tests on linux-riscv64 platform, result looks good.

}
}

// Try to convert (c ? 1 : 0) ^ 1 into !c ? 1 : 0. This pattern can occur after expansion of Conv2B nodes.
Member

Be more general? Xor (CMove cond, iftrue, iffalse), op == CMove cond, (Xor iftrue op), (Xor iffalse op). You can be conservative and apply this only if op, iftrue and iffalse are all constant.
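
The suggested identity can be checked on plain ints. This is a small sketch rather than HotSpot code: `select` is a hypothetical stand-in for CMove, and `before`/`after` are illustrative names for the two node shapes.

```java
// Hypothetical model of the rewrite on plain ints; none of these names exist in HotSpot.
final class XorThroughSelect {
    // Stand-in for CMove: yields ifTrue when cond holds, otherwise ifFalse.
    static int select(boolean cond, int ifTrue, int ifFalse) {
        return cond ? ifTrue : ifFalse;
    }

    // Original shape: Xor (CMove cond, ifTrue, ifFalse), op
    static int before(boolean cond, int ifTrue, int ifFalse, int op) {
        return select(cond, ifTrue, ifFalse) ^ op;
    }

    // Rewritten shape: CMove cond, (Xor ifTrue op), (Xor ifFalse op)
    static int after(boolean cond, int ifTrue, int ifFalse, int op) {
        return select(cond, ifTrue ^ op, ifFalse ^ op);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (boolean cond : new boolean[] {false, true}) {
            for (int t : samples) {
                for (int f : samples) {
                    for (int op : samples) {
                        if (before(cond, t, f, op) != after(cond, t, f, op)) {
                            throw new AssertionError("shapes disagree");
                        }
                    }
                }
            }
        }
        System.out.println("both shapes agree");
    }
}
```

When `ifTrue`, `ifFalse` and `op` are all constants, both xors in the rewritten shape fold away. With `ifTrue = 1`, `ifFalse = 0` and `op = 1`, the result is `select(cond, 0, 1)`, which is the `!c ? 1 : 0` form from the code comment under review.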

Member Author

I think that's a good idea, I've made this change. I wonder if other associative operations would also benefit from a similar patch?

@jaskarth
Member Author

Apologies for the delayed update, and thanks for the reviews!

I have aarch64 performance results, from an M1 mac:

                                    Baseline                           Patch              Improvement
Benchmark                      Mode  Cnt  Score    Error Units     Score    Error  Units
Conv2BRules.testEquals0        avgt   12  41.697 ± 0.127 ns/op  /  40.724 ± 0.086  ns/op  + 2.4%
Conv2BRules.testNotEquals0     avgt   12  39.522 ± 0.143 ns/op  /  40.608 ± 0.046  ns/op  - 2.7%
Conv2BRules.testEquals1        avgt   12  40.168 ± 0.136 ns/op  /  40.679 ± 0.044  ns/op  (unchanged)
Conv2BRules.testEqualsNull     avgt   12  48.922 ± 0.498 ns/op  /  42.046 ± 0.018  ns/op  + 15.1%
Conv2BRules.testNotEqualsNull  avgt   12  41.725 ± 0.264 ns/op  /  42.063 ± 0.043  ns/op  - 0.8%

It seems like the patch doesn't have much of an impact other than testEqualsNull, which would make sense as the Conv2B rule is using the same cset instruction as the 0 and 1 rule for CMoveI. I was unfortunately not able to test for arm32, but I think it should still be beneficial as the Conv2B rules there used two cmoves and had a fixme, whereas with this patch it would only use one cmove.

Member

@merykitty merykitty left a comment

Otherwise LGTM. Thanks a lot.

@merykitty
Member

A tiny point: maybe just remove setne and use setb(Assembler::notZero, ...); I don't think having a dedicated setne achieves much.

@jaskarth
Member Author

Thanks for your review! I've updated the code accordingly. I also noticed that in addition to setne, there were also setl and sete with just a handful of uses, so I've changed those to use setb with a condition code as well.

@TobiHartmann
Member

I ran this through some quick testing and I'm seeing the following crash with compiler/vectorapi/TestMaskedMacroLogicVector.java and -XX:-TieredCompilation -XX:+AlwaysIncrementalInline:

#  Internal Error (/workspace/open/src/hotspot/share/opto/type.hpp:1967), pid=3563106, tid=3563120
#  assert(_base == Int) failed: Not an Int
#
# JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-05-23-0846313.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2023-05-23-0846313.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x5cca5f]  Type::is_int() const [clone .part.0]+0x2f
#

Current CompileTask:
C2:   9002  777 %  b        compiler.vectorapi.TestMaskedMacroLogicVector::verifyInt2 @ 3 (136 bytes)

Stack: [0x00007f83d89e0000,0x00007f83d8ae1000],  sp=0x00007f83d8adb530,  free space=1005k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x5cca5f]  Type::is_int() const [clone .part.0]+0x2f  (type.hpp:1967)
V  [libjvm.so+0x5d2488]  XorINode::Ideal(PhaseGVN*, bool)+0x468  (node.hpp:394)
V  [libjvm.so+0x151c7fe]  PhaseIterGVN::transform_old(Node*)+0x22e  (phaseX.cpp:833)
V  [libjvm.so+0x1516141]  PhaseIterGVN::optimize()+0x81  (phaseX.cpp:1218)
V  [libjvm.so+0x9f3a12]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x4d2  (loopnode.hpp:1203)
V  [libjvm.so+0x9efdae]  Compile::Optimize()+0x10fe  (compile.cpp:2350)
V  [libjvm.so+0x9f24c5]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1ae5  (compile.cpp:839)
V  [libjvm.so+0x84c414]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x3c4  (c2compiler.cpp:118)
V  [libjvm.so+0x9fe2c0]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa00  (compileBroker.cpp:2265)
V  [libjvm.so+0x9ff148]  CompileBroker::compiler_thread_loop()+0x618  (compileBroker.cpp:1944)
V  [libjvm.so+0xeb96fc]  JavaThread::thread_main_inner()+0xcc  (javaThread.cpp:719)
V  [libjvm.so+0x179b59a]  Thread::call_run()+0xba  (thread.cpp:217)
V  [libjvm.so+0x14980cc]  thread_native_entry(Thread*)+0x11c  (os_linux.cpp:775)

@jaskarth
Member Author

I think the latest change should fix the error! The code now properly checks that both types are ints, and tier1 and hotspot_vector_1 testing passes without errors for me.
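
The fix itself is not shown in this thread. Purely as an illustration of the check-before-use idea, the sketch below models a node's type with a hypothetical `Ty` class and skips the rewrite unless the xor operand and both CMove arms are known int constants. None of these names come from HotSpot.

```java
// Hypothetical sketch of a guarded rewrite; Ty and trySplit are invented for illustration.
final class GuardedXorSplit {
    // Stand-in for a node's type: a known int constant, or anything else
    // (a vector mask, a long, an int whose value is unknown, ...).
    static final class Ty {
        final boolean isIntConst;
        final int value;

        private Ty(boolean isIntConst, int value) {
            this.isIntConst = isIntConst;
            this.value = value;
        }

        static Ty intConst(int v) { return new Ty(true, v); }
        static Ty other() { return new Ty(false, 0); }
    }

    // Returns the folded arms {ifTrue ^ op, ifFalse ^ op}, or null when any input
    // is not an int constant and the rewrite has to be skipped.
    static int[] trySplit(Ty ifTrue, Ty ifFalse, Ty op) {
        if (!ifTrue.isIntConst || !ifFalse.isIntConst || !op.isIntConst) {
            return null; // bail out instead of assuming an int type
        }
        return new int[] { ifTrue.value ^ op.value, ifFalse.value ^ op.value };
    }

    public static void main(String[] args) {
        int[] arms = trySplit(Ty.intConst(1), Ty.intConst(0), Ty.intConst(1));
        System.out.println(arms[0] + " " + arms[1]); // prints "0 1"
        // A non-int arm makes the guard reject the rewrite.
        System.out.println(trySplit(Ty.other(), Ty.intConst(0), Ty.intConst(1)) == null); // prints "true"
    }
}
```

The point of the design is that every input is tested before any int-specific accessor is used, which is the kind of check the assert in the crash log above was enforcing.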

Member

@TobiHartmann TobiHartmann left a comment

Great work, the latest version looks good to me. I'll run some testing and report back once it has finished.

@openjdk

openjdk bot commented May 26, 2023

@jaskarth This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8051725: Improve expansion of Conv2B nodes in the middle-end

Reviewed-by: thartmann, qamai, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 213 new commits pushed to the master branch:

  • 3eced01: 8304074: [JMX] Add an approximation of total bytes allocated on the Java heap by the JVM
  • 15e0285: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java
  • 4526282: 8308977: gtest:codestrings fails on riscv
  • f600d03: 8307795: AArch64: Optimize VectorMask.truecount() on Neon
  • 07f2070: 8309095: Remove UTF-8 character from TaskbarPositionTest.java
  • 2b186e2: 8309042: MemorySegment::reinterpret cleanup action is not called for all overloads
  • 78aac24: 8308881: Strong CLD oop handle roots are demoted to non-roots concurrently
  • 1f1f604: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack
  • d35a550: 8309077: Problemlist compiler/jvmci/TestUncaughtErrorInCompileMethod.java
  • 457e1cb: 8308817: RISC-V: Support VectorTest node for Vector API
  • ... and 203 more: https://git.openjdk.org/jdk/compare/316837226ecceb4daa14e2bc1be8ce120edbfdc9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TobiHartmann, @RealFYang, @merykitty, @sviswa7) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 26, 2023
@TobiHartmann
Member

All tests passed.

@jaskarth
Member Author

Thanks a lot for testing, and thanks all for reviews and feedback with this change!

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 30, 2023
@openjdk

openjdk bot commented May 30, 2023

@jaskarth
Your change (at version 65e841f) is now ready to be sponsored by a Committer.

@TobiHartmann
Member

/sponsor

@openjdk

openjdk bot commented May 30, 2023

Going to push as commit fb0b1f0.
Since your change was applied there have been 213 commits pushed to the master branch:

  • 3eced01: 8304074: [JMX] Add an approximation of total bytes allocated on the Java heap by the JVM
  • 15e0285: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java
  • 4526282: 8308977: gtest:codestrings fails on riscv
  • f600d03: 8307795: AArch64: Optimize VectorMask.truecount() on Neon
  • 07f2070: 8309095: Remove UTF-8 character from TaskbarPositionTest.java
  • 2b186e2: 8309042: MemorySegment::reinterpret cleanup action is not called for all overloads
  • 78aac24: 8308881: Strong CLD oop handle roots are demoted to non-roots concurrently
  • 1f1f604: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack
  • d35a550: 8309077: Problemlist compiler/jvmci/TestUncaughtErrorInCompileMethod.java
  • 457e1cb: 8308817: RISC-V: Support VectorTest node for Vector API
  • ... and 203 more: https://git.openjdk.org/jdk/compare/316837226ecceb4daa14e2bc1be8ce120edbfdc9...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 30, 2023
@openjdk openjdk bot closed this May 30, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review), and sponsor (Pull request is ready to be sponsored) labels May 30, 2023
@openjdk

openjdk bot commented May 30, 2023

@TobiHartmann @jaskarth Pushed as commit fb0b1f0.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
