8264054: Bad XMM performance on java.lang.MathBench.sqrtDouble #3256
For the j.l.Math JMH at https://github.com/openjdk/jmh-jdk-microbenchmarks/blob/master/micros-jdk11/src/main/java/org/openjdk/bench/java/lang/MathBench.java, the performance for sqrt benchmark could be improved. Thanks a lot to Eric Caspole for finding this issue.
Current code generated (linux format) by c2 JIT is:
The vsqrtsd instruction operation is specified as below:
The upper 127:64 bits are set from previous contents of xmm0. As the destination xmm0 register was not initialized prior to use by c2 JIT, this causes stall and lower performance.
By adding xmm0 initialization prior to use, the performance of the above benchmark improves significantly.
Code generated after patch:
Performance before patch:
Performance after patch:
I added these micros the the https://github.com/openjdk/jmh-jdk-microbenchmarks a couple weeks ago, before we knew about this problem, because it is easy to work on JMH with Maven. But it would be best to contribute these micros into the JDK repo at the same time as this patch. I will get it together for the JDK repo and connect with Sandhya.
@vnkozlov Both xor(xmm) + sqrt(xmm, mem) or mov(xmm, mem) + sqrt(xmm,xmm) have same performance. I could look into simplifying the patch by just keeping the register version of these rules.
I looked through x86.ad for other usages of unary instructions but didn't find any other instances. We implement negate and abs using binary instructions.
@sviswa7 This change now passes all automated pre-integration checks.
After integration, the commit message for the final commit will be:
At the time when this comment was updated there had been 130 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
@sviswa7 Since your change was applied there have been 131 commits pushed to the
Your commit was automatically rebased without conflicts.
Pushed as commit 52d8a22.