8259775: [Vector API] Incorrect code-gen for VectorReinterpret operation #122
/issue add JDK-8259775
@DamonFool This issue is referenced in the PR title - it will now be updated.
/test
Hi all, The reason for the wrong execution is that the upper bits of the vector registers fail to be zeroed. The 4-byte vector case is also fixed by using movfltz, since using movss directly is not recommended [1]. Could you please review it? Thanks. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L1048
@DamonFool This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 12 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
Good to know that the code was introduced in JDK 16, so this is not a regression.
Approved.
Don't forget to request approval for JDK 16 fix integration:
http://openjdk.java.net/jeps/3#Fix-Request-Process
@@ -168,6 +168,9 @@ class MacroAssembler: public Assembler {
  void movflt(XMMRegister dst, AddressLiteral src);
  void movflt(Address dst, XMMRegister src) { movss(dst, src); }

  // Move with zero extension
  void movfltz(XMMRegister dst, XMMRegister src) { movss(dst, src); }
It seems movdbl(XMMRegister dst, XMMRegister src) has the same issue.
Good catch.
I will try to make a reproducer and fix it in another PR, since VectorReinterpret doesn't use it.
Thanks.
I will integrate it later, since the jdk16-fix-request will be approved after the PR is finished.
I approved this fix for JDK 16.
I misread your comment and thought you would also fix movdbl() here.
I am fine with fixing it in a separate PR.
/integrate
@DamonFool Since your change was applied there have been 12 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit d90e06a.
Hi all,
The code-gen for VectorReinterpret may be wrong on x86.
Let's look at the opto-assembly for the reproducer in JBS, which is based on @XiaohongGong's example in JDK-8259353; many thanks to her.
Please note that dst and src [1] share the same XMM0 register, and movdqu [2] should be generated for this case.
But when dst == src, movdqu actually generates nothing [3], which leads to an incorrect result.
In this case, movdqu should not be empty, since the upper bits of dst must be zeroed.
A similar error also exists for vmovdqu [4].
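To make the failure mode concrete, here is a toy C++ model. It is not HotSpot code: VecReg, move_elided, move_zeroing and stale_upper_bytes are hypothetical names made up for illustration, and the 64-byte register width is an assumption. It only contrasts a move that is skipped when dst == src with one that always clears the bytes above the vector length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a 64-byte (512-bit) vector register; an assumption, not HotSpot code.
constexpr std::size_t kRegBytes = 64;
using VecReg = std::array<std::uint8_t, kRegBytes>;

// Hypothetical model of a move that is skipped when dst and src are the same
// register: the bytes above vec_bytes keep whatever they held before.
void move_elided(VecReg& dst, const VecReg& src, std::size_t vec_bytes) {
    assert(vec_bytes <= kRegBytes);
    if (&dst == &src) return;  // nothing is emitted, upper bytes stay stale
    for (std::size_t i = 0; i < kRegBytes; ++i)
        dst[i] = (i < vec_bytes) ? src[i] : 0;
}

// Hypothetical model of a move that is always performed: the low vec_bytes
// are copied and every byte above them is zeroed, even when dst == src.
void move_zeroing(VecReg& dst, const VecReg& src, std::size_t vec_bytes) {
    assert(vec_bytes <= kRegBytes);
    for (std::size_t i = 0; i < kRegBytes; ++i)
        dst[i] = (i < vec_bytes) ? src[i] : 0;
}

// Counts the non-zero bytes above the low vec_bytes of a register.
std::size_t stale_upper_bytes(const VecReg& r, std::size_t vec_bytes) {
    std::size_t n = 0;
    for (std::size_t i = vec_bytes; i < kRegBytes; ++i)
        if (r[i] != 0) ++n;
    return n;
}
```

In this model, if a register is filled with 0xAB and reinterpreted in place as a 16-byte vector, move_elided leaves all 48 upper bytes non-zero, while move_zeroing clears them and keeps the low 16 bytes.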
I think we should also change movflt [5] to movss, but I just can't understand why we have 4-byte vectors.
Aren't the shortest vectors 8 bytes on x86?
Thanks.
Best regards,
Jie
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3354
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3364
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2490
[4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2515
[5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3379
Download
$ git fetch https://git.openjdk.java.net/jdk16 pull/122/head:pull/122
$ git checkout pull/122