8338126: C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 #21480

Closed
wants to merge 4 commits into from

Conversation

@sviswa7 sviswa7 commented Oct 11, 2024

When Float.floatToFloat16 is vectorized with a 2-element vector width (because of dependencies in the loop), we incorrectly generate the memory-destination form of vcvtps2ph, which converts 4 elements and stores 8 bytes instead of the desired 4 bytes. This PR fixes the issue by limiting the memory version of the match rule to vectors of 4 elements or more.
A regression test case is also added.

Best Regards,
Sandhya


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8338126: C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480
$ git checkout pull/21480

Update a local copy of the PR:
$ git checkout pull/21480
$ git pull https://git.openjdk.org/jdk.git pull/21480/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21480

View PR using the GUI difftool:
$ git pr show -t 21480

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21480.diff

Webrev

Link to Webrev Comment

bridgekeeper bot commented Oct 11, 2024

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Oct 11, 2024

@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8338126: C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2

Reviewed-by: thartmann, jbhateja, epeter

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 133 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk bot commented Oct 11, 2024

@sviswa7 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Oct 11, 2024
@sviswa7 sviswa7 marked this pull request as ready for review October 11, 2024 23:30
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 11, 2024

mlbridge bot commented Oct 11, 2024

Webrevs

Member

@TobiHartmann TobiHartmann left a comment

That looks good to me. @eme64 should have a look as well.

I submitted testing and will report back once it passed.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 14, 2024
Comment on lines 71 to 76
public void test_float_float16_short_vector(short[] sout, float[] finp) {
    for (int i = 0; i < finp.length - 1; i++) {
        sout[i+0] = Float.floatToFloat16(finp[i+0]);
        sout[i+1] = Float.floatToFloat16(finp[i+1]);
    }
}
Contributor

Your test looks different than the one that I added on JIRA. Can you please add that one as well?

Author

Thanks for pointing that out. I have modified the contents of the loop kernel to match your testcase loop kernel now. I also verified that it fails before the fix and passes after the fix.

Before the fix the test fails:
Test results: failed: 1

And the jtr file shows the following:

  Custom Run Test: @Run: kernel_test_float_float16 - @Tests: 
  {test_float_float16,test_float_float16_strided,test_float_float16_short_vector}:
  compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Run method public void 
  compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16()
        at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:162)
        at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:87)
        at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
        at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
        at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:119)
        at java.base/java.lang.reflect.Method.invoke(Method.java:573)
        at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
        ... 4 more
Caused by: java.lang.RuntimeException: assertEquals expected: 18483 but was: 0
        at jdk.test.lib.Asserts.fail(Asserts.java:691)
        at jdk.test.lib.Asserts.assertEquals(Asserts.java:204)
        at jdk.test.lib.Asserts.assertEquals(Asserts.java:191)
        at compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16(TestFloatConversionsVector.java:112)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        ... 6 more

After the fix the test passes with no failures:
Test results: passed: 1

Please let me know if this works or you would like to see any other change.
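To read the assertion values in the trace above: 18483 is 0x4833, which as an IEEE 754 binary16 bit pattern decodes to 8.3984375, while 0 decodes to +0.0. The hand-written decoder below is only an illustrative sketch. The class and method names are made up for this note, it is not the JDK's implementation, and it assumes the test compares raw float16 bit patterns held in short values.

```java
public class Float16Decode {
    // Illustrative decoder for an IEEE 754 binary16 bit pattern
    // (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits).
    static float decode(int bits) {
        int sign = (bits >> 15) & 0x1;
        int exp  = (bits >> 10) & 0x1F;
        int frac = bits & 0x3FF;
        float magnitude;
        if (exp == 0) {
            // Zero or subnormal: frac * 2^-24
            magnitude = frac * (float) Math.pow(2, -24);
        } else if (exp == 0x1F) {
            // Infinity or NaN
            magnitude = (frac == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal number: (1 + frac/1024) * 2^(exp - 15)
            magnitude = (1.0f + frac / 1024.0f) * (float) Math.pow(2, exp - 15);
        }
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(decode(18483)); // expected value from the assertion
        System.out.println(decode(0));     // value actually found in the output array
    }
}
```

So the failing element should have held the half-precision encoding of 8.3984375 but instead held zero.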

@@ -3676,6 +3676,7 @@ instruct vconvF2HF(vec dst, vec src) %{
%}

instruct vconvF2HF_mem_reg(memory mem, vec src) %{
predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
Contributor

Ok, and so what alternative path is the matcher now going to take if we have vector_length = 8?

Author

@eme64 The matcher now loads from memory into a register, uses the register/register version for the conversion, and then stores the result to memory with a separate instruction. Please see the before and after code snippets below.

Generated code snippet for 2-element float vector to float16 vector conversion:
Before:
vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct)
vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect)

After:
vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct)
vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct)
vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 bytes into memory (correct)

Contributor

Ah, I see. You are using a 4-element register-only vcvtps2ph instruction, but only use its first 2 elements. Great :)
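The difference between the two instruction sequences can be modeled in plain Java. This is a hypothetical simulation, not HotSpot code: the class name, the helper `store`, and the marker value 0x1111 are invented for illustration. It assumes that lanes 2 and 3 of the converted register are zero, since vmovq clears the upper half of the 128-bit register before the conversion.

```java
import java.util.Arrays;

public class StoreWidthModel {
    // Hypothetical model of a vector store: copy the first `lanesStored`
    // 16-bit lanes of `result` into a copy of `mem`, starting at `index`.
    static short[] store(short[] mem, int index, short[] result, int lanesStored) {
        short[] out = mem.clone();
        System.arraycopy(result, 0, out, index, lanesStored);
        return out;
    }

    public static void main(String[] args) {
        // Destination memory already holds data in every slot (0x1111 is an arbitrary marker).
        short[] mem = new short[6];
        Arrays.fill(mem, (short) 0x1111);

        // Register contents after converting a 2-element float vector:
        // two real float16 results followed by two zero lanes.
        short[] converted = {(short) 0x4833, (short) 0x4833, 0, 0};

        // Before the fix: memory-destination form writes 4 lanes (8 bytes).
        short[] before = store(mem, 0, converted, 4);
        // After the fix: register form followed by a 4-byte store writes 2 lanes.
        short[] after = store(mem, 0, converted, 2);

        System.out.println(Arrays.toString(before)); // [18483, 18483, 0, 0, 4369, 4369]
        System.out.println(Arrays.toString(after));  // [18483, 18483, 4369, 4369, 4369, 4369]
    }
}
```

In this model the wider store overwrites the two neighboring slots with zeros, while the narrower store leaves them intact.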

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 14, 2024
Contributor

@eme64 eme64 left a comment

Thanks for the updates! It looks good to me now.

I have one more wish:
Could you allow the test to run on all platforms, please?
test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java

Currently, it only runs on selected platforms, see @requires. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 15, 2024
@@ -3676,6 +3676,7 @@ instruct vconvF2HF(vec dst, vec src) %{
%}

instruct vconvF2HF_mem_reg(memory mem, vec src) %{
predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
Member

You can add an elegant predicate check like the following instead of accessing bare inputs:

n->as_StoreVector()->memory_size() >= 16

Author

We already use bare inputs in predicates at many places in the ad file.

Member

@jatin-bhateja jatin-bhateja Oct 16, 2024

I think it's OK to use a safe cast where one is available, at least for newly added code.

Author

@jatin-bhateja I have made this change. Please take a look.
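For readers comparing the two predicate spellings in this thread, the arithmetic can be sketched as below. This is an illustration only: the class and method names are invented, and it assumes that `vector_length_in_bytes(n->in(3)->in(1))` measures the float source vector (4 bytes per element) while `memory_size()` reports the bytes written by the store (2 bytes per float16 element).

```java
public class PredicateThresholds {
    static final int FLOAT_BYTES = 4;      // size of one float source element
    static final int HALF_FLOAT_BYTES = 2; // size of one float16 result element

    // Input-based check: size of the float source vector is at least 16 bytes.
    static boolean sourceBytesAtLeast16(int elements) {
        return elements * FLOAT_BYTES >= 16;
    }

    // Store-based check, assuming memory_size() is the number of bytes stored.
    static boolean storeBytesAtLeast16(int elements) {
        return elements * HALF_FLOAT_BYTES >= 16;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 4, 8, 16}) {
            System.out.println(n + " elements: source-based=" + sourceBytesAtLeast16(n)
                    + ", store-based=" + storeBytesAtLeast16(n));
        }
    }
}
```

Under those assumptions both checks reject the problematic 2-element case, but they are not equivalent: the store-based check also sends the 4-element case through the register path.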

test_float_float16_short_vector(sout, finp);
}

// Verifying the result
Member

@jatin-bhateja jatin-bhateja Oct 15, 2024

Since we are using the IR framework, we can leverage the existing @Check annotation for verification. It works in conjunction with a @Test method and automatically invokes the validation after the test method executes. We may need a little refactoring for this.

Author

The added test follows the verification mechanism used already in the test. I would prefer not to get into refactoring.

Member

@jatin-bhateja jatin-bhateja Oct 16, 2024

Okay, we can do it separately.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 16, 2024
Author

sviswa7 commented Oct 16, 2024

> Thanks for the updates! It looks good to me now.
>
> I have one more wish: Could you allow the test to run on all platforms, please? test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java
>
> Currently, it only runs on selected platforms, see @requires. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased.

@eme64 I have attempted to update the test accordingly. Please take a look.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 16, 2024
Author

sviswa7 commented Oct 17, 2024

> That looks good to me. @eme64 should have a look as well.
>
> I submitted testing and will report back once it passed.

@TobiHartmann Please do let me know if the testing passed.

@TobiHartmann
Member

Sorry for the delay. I re-submitted testing with the latest version and it all passed.

Author

sviswa7 commented Oct 21, 2024

Thanks a lot @TobiHartmann.

Author

sviswa7 commented Oct 21, 2024

/integrate

openjdk bot commented Oct 21, 2024

Going to push as commit 153ad91.
Since your change was applied there have been 156 commits pushed to the master branch:

  • 80ec552: 8328528: C2 should optimize long-typed parallel iv in an int counted loop
  • 330f2b5: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output
  • 1f35748: 8342102: ZGC: Optimize copy constructors in ZPhysicalMemory
  • 66ddaaa: 8340241: RISC-V: Returns mispredicted
  • 07f550b: 8340818: Add a new jtreg test root to test the generated documentation
  • 27ef6c9: 8341470: BigDecimal.stripTrailingZeros() optimization
  • 5d5d88a: 8339570: Add Tidy build support for JDK tests
  • 239d84a: 8342578: GHA: RISC-V: Bootstrap using Debian snapshot is still failing
  • aa060f2: 8342334: CDS: Scratch mirrors should not point to dead klasses
  • 680dc5d: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
  • ... and 146 more: https://git.openjdk.org/jdk/compare/593c27e69703875115e6db5843a3743ba9bd8c18...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 21, 2024
@openjdk openjdk bot closed this Oct 21, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 21, 2024

openjdk bot commented Oct 21, 2024

@sviswa7 Pushed as commit 153ad91.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants