8259775: [Vector API] Incorrect code-gen for VectorReinterpret operation #122
/issue add JDK-8259775
@DamonFool This issue is referenced in the PR title - it will now be updated.
/test
Hi all, the reason for the wrong execution is that the upper bits of the vector registers fail to be zeroed. The 4-byte vectors are also fixed by using movfltz, since we are advised not to use movss directly [1]. Could you please review it? Thanks.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L1048
@DamonFool This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
Good to know that this code was introduced in JDK 16, so this is not a regression from an earlier release.
Approved.
Don't forget to request approval for JDK 16 fix integration:
http://openjdk.java.net/jeps/3#Fix-Request-Process
@@ -168,6 +168,9 @@ class MacroAssembler: public Assembler {
  void movflt(XMMRegister dst, AddressLiteral src);
  void movflt(Address dst, XMMRegister src) { movss(dst, src); }

  // Move with zero extension
  void movfltz(XMMRegister dst, XMMRegister src) { movss(dst, src); }
It seems movdbl(XMMRegister dst, XMMRegister src) has the same issue.
> Seems movdbl(XMMRegister dst, XMMRegister src) has the same issue.
Good catch.
I will try to make a reproducer and fix it in another PR, since VectorReinterpret doesn't use it.
Thanks.
I will integrate it later, since the jdk16-fix-request can only be approved once the PR is finished.
I approved this fix for JDK 16.
I misread your comment and thought you would also fix movdbl() here.
I am fine with fixing it in separate PR.
/integrate
@DamonFool Since your change was applied there have been 12 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit d90e06a. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Hi all,
The code-gen for VectorReinterpret may be wrong on x86.
Let's look at the opto-assembly for the reproducer in JBS, which is based on @XiaohongGong's example in JDK-8259353 (many thanks to her).
Please note that dst and src [1] share the same register, XMM0, and movdqu [2] should be generated for this case.
But when dst == src, movdqu actually generates nothing [3], which leads to an incorrect result.
In this case movdqu must not be a no-op, since the upper bits of dst should be zeroed.
A similar error also exists for vmovdqu [4].
I think we should also change movflt [5] to movss, but I don't understand why we have 4-byte vectors.
Aren't the shortest vectors 8 bytes on x86?
Thanks.
Best regards,
Jie
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3354
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3364
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2490
[4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2515
[5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3379
Download
$ git fetch https://git.openjdk.java.net/jdk16 pull/122/head:pull/122
$ git checkout pull/122