8338126: C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 #21480
That looks good to me. @eme64 should have a look as well.
I submitted testing and will report back once it passed.
public void test_float_float16_short_vector(short[] sout, float[] finp) {
    for (int i = 0; i < finp.length - 1; i++) {
        sout[i+0] = Float.floatToFloat16(finp[i+0]);
        sout[i+1] = Float.floatToFloat16(finp[i+1]);
    }
}
Your test looks different than the one that I added on JIRA. Can you please add that one as well?
Thanks for pointing that out. I have modified the contents of the loop kernel to match your testcase loop kernel now. I also verified that it fails before the fix and passes after the fix.
Before the fix the test fails:
Test results: failed: 1
And the jtr file shows the following:
Custom Run Test: @Run: kernel_test_float_float16 - @Tests:
{test_float_float16,test_float_float16_strided,test_float_float16_short_vector}:
compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Run method public void
compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16()
at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:162)
at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:87)
at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:119)
at java.base/java.lang.reflect.Method.invoke(Method.java:573)
at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
... 4 more
Caused by: java.lang.RuntimeException: assertEquals expected: 18483 but was: 0
at jdk.test.lib.Asserts.fail(Asserts.java:691)
at jdk.test.lib.Asserts.assertEquals(Asserts.java:204)
at jdk.test.lib.Asserts.assertEquals(Asserts.java:191)
at compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16(TestFloatConversionsVector.java:112)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
... 6 more
After the fix the test passes with no failures:
Test results: passed: 1
Please let me know if this works or you would like to see any other change.
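For readers without a jtreg setup, the kind of check described above can be sketched as a standalone program. This is only an illustration, not the jtreg test: the class name, the input values, and the iteration count are made up here, the kernel is the one quoted earlier in this thread, and `Float.floatToFloat16` requires JDK 20 or later. Whether C2 actually vectorizes the loop depends on the JVM build and the CPU, so a clean run does not prove the absence of the bug.

```java
public final class F2HFCheckSketch {
    // Kernel as quoted in the review thread: two adjacent stores per iteration.
    static void kernel(short[] sout, float[] finp) {
        for (int i = 0; i < finp.length - 1; i++) {
            sout[i + 0] = Float.floatToFloat16(finp[i + 0]);
            sout[i + 1] = Float.floatToFloat16(finp[i + 1]);
        }
    }

    // Runs the kernel `iterations` times, then compares every element with a
    // per-element conversion. Returns the first mismatching index, or -1.
    static int firstMismatch(float[] finp, int iterations) {
        short[] sout = new short[finp.length];
        for (int iter = 0; iter < iterations; iter++) {
            kernel(sout, finp);
        }
        for (int i = 0; i < finp.length; i++) {
            if (sout[i] != Float.floatToFloat16(finp[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] finp = new float[1024];
        for (int i = 0; i < finp.length; i++) {
            finp[i] = i * 0.25f; // arbitrary, exactly representable inputs
        }
        // The repeat count is arbitrary; it only gives the JIT a chance to compile the kernel.
        System.out.println("first mismatch index: " + firstMismatch(finp, 20_000));
    }
}
```

The jtreg test in the PR performs the comparison with `Asserts.assertEquals`, which is what produces the "expected: 18483 but was: 0" message in the log above.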
src/hotspot/cpu/x86/x86.ad
@@ -3676,6 +3676,7 @@ instruct vconvF2HF(vec dst, vec src) %{
%}

instruct vconvF2HF_mem_reg(memory mem, vec src) %{
  predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
OK, so which alternative path is the matcher going to take now if we have vector_length = 8?
@eme64 It will now load from memory into a register, use the register/register version for the conversion, and then store the result to memory. Please see the before and after code snippets below.
Generated code snippet for 2 element float vector to float16 vector conversion
Before:
vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct)
vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect)
After:
vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct)
vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct)
vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 bytes into memory (correct)
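The difference between the two snippets comes down to store width: two float16 results occupy 4 bytes, while the memory-destination form of the 128-bit `vcvtps2ph` writes four 16-bit lanes, i.e. 8 bytes. The following is a minimal model of that effect in plain Java, not HotSpot code; the class, the helper names, and the sentinel value are hypothetical, and the register contents assume that the 8-byte `vmovq` load left the upper two source lanes zero.

```java
import java.util.Arrays;

public final class StoreWidthSketch {
    // Marker value for destination elements that the store must not touch.
    static final short SENTINEL = 0x7777;

    // Models storing the low `lanes` 16-bit lanes of a 4-lane register to dst[offset..].
    static void storeLanes(short[] dst, int offset, short[] reg, int lanes) {
        System.arraycopy(reg, 0, dst, offset, lanes);
    }

    // Returns a 6-element destination after storing `lanes` lanes at index 0.
    static short[] demo(int lanes) {
        // Lanes 0 and 1: float16 bit patterns of 1.0 and 2.0.
        // Lanes 2 and 3: zero, the conversion of the zeroed upper source lanes.
        short[] reg = {0x3C00, 0x4000, 0, 0};
        short[] dst = new short[6];
        Arrays.fill(dst, SENTINEL);
        storeLanes(dst, 0, reg, lanes);
        return dst;
    }

    public static void main(String[] args) {
        System.out.println("4-byte store (2 lanes): " + Arrays.toString(demo(2)));
        System.out.println("8-byte store (4 lanes): " + Arrays.toString(demo(4)));
    }
}
```

With the 2-lane store the neighbouring elements keep the sentinel; with the 4-lane store the two elements after the intended range are overwritten with zeros, which models the extra 4 bytes written by the "Before" sequence.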
Ah, I see. You are using a 4-element register-only vcvtps2ph instruction, but only use the first 2 elements of it. Great :)
Thanks for the updates! It looks good to me now.
I have one more wish: could you please allow the test to run on all platforms?
test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java
Currently, it only runs on selected platforms, see @requires. We should really run this on all platforms and compilers. The IR rules can be limited to specific platforms and features. That ensures that we do not have similar bugs elsewhere, and it increases our test coverage.
src/hotspot/cpu/x86/x86.ad
@@ -3676,6 +3676,7 @@ instruct vconvF2HF(vec dst, vec src) %{
%}

instruct vconvF2HF_mem_reg(memory mem, vec src) %{
  predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
You can add an elegant predicate check like the following instead of accessing bare inputs:
n->as_StoreVector()->memory_size() >= 16
We have used bare inputs in predicates in many places in the ad file.
I think it's OK to use a safe cast if it's available, at least for newly added code.
@jatin-bhateja I have made this change. Please take a look.
    test_float_float16_short_vector(sout, finp);
}

// Verifying the result
The added test follows the verification mechanism used already in the test. I would prefer not to get into refactoring.
Okay, we can do it separately.
@eme64 I have attempted to update the test accordingly. Please take a look.
@TobiHartmann Please do let me know if the testing passed.
Sorry for the delay. I re-submitted testing with the latest version and it all passed.
Thanks a lot @TobiHartmann.
/integrate
Going to push as commit 153ad91.
Your commit was automatically rebased without conflicts.
When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as the destination, storing 8 bytes instead of the desired 4 bytes. This PR fixes the issue by limiting the memory version of the match rule to vectors of 4 elements and above.
A regression test case is added as well.
Best Regards,
Sandhya
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480
$ git checkout pull/21480
Update a local copy of the PR:
$ git checkout pull/21480
$ git pull https://git.openjdk.org/jdk.git pull/21480/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 21480
View PR using the GUI difftool:
$ git pr show -t 21480
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21480.diff