Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE #15010

Closed
wants to merge 2 commits into from

Conversation

pfustc
Copy link
Member

@pfustc pfustc commented Jul 25, 2023

Hotspot jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence but IR tests assume they can.

On AArch64, these IR tests just check the CPU feature of asimd and incorrectly assumes AArch64 vectors are at most 256 bits. But actually, asimd on AArch64 only represents NEON vectors which are at most 128 bits. AArch64 CPUs may have another feature of sve which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits.

As this jtreg is auto-generated by a python script, we have updated the script and re-generated this jtreg. In this new version, we checked the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.

@@ -321,7 +321,8 @@ class Type:
            p.append(Platform("avx512", ["avx512", "true"], 64))
         else:
            assert False, "type not implemented" + self.name
-        p.append(Platform("asimd", ["asimd", "true"], 32))
+        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
+        p.append(Platform("sve", ["sve", "true"], 256))
         return p

 class Test:
@@ -457,7 +458,7 @@ class Generator:
         lines.append(" *   and various MaxVectorSize values, and +- AlignVector.")
         lines.append(" *")
         lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
-        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8308606")
+        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8312570")
         lines.append(" *")
         lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
         lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
@@ -598,7 +599,8 @@ class Generator:
             # IR rules
             for p in test.t.platforms():
                 elements = p.vector_width // test.t.size
-                lines.append(f"    // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
+                max_pre = "max " if p.name == "sve" else ""
+                lines.append(f"    // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
                 ###############  -AlignVector
                 rule = PlatformIRRule(p)
                 rule.add_pre_constraint("AlignVector", IRBool.makeFalse())
@@ -694,8 +696,8 @@ class Generator:
 def main():
     g = Generator()
     g.generate("TestDependencyOffsets",
-               "/home/emanuel/Documents/fork7-jdk/open/test/hotspot/jtreg/compiler/loopopts/superword",
-               "8298935 8308606", # Big ID
+               "test/hotspot/jtreg/compiler/loopopts/superword",
+               "8298935 8308606 8312570", # Bug ID
                "compiler.loopopts.superword", # package
     )

We tested this on various of AArch64 CPUs.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15010/head:pull/15010
$ git checkout pull/15010

Update a local copy of the PR:
$ git checkout pull/15010
$ git pull https://git.openjdk.org/jdk.git pull/15010/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15010

View PR using the GUI difftool:
$ git pr show -t 15010

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15010.diff

Webrev

Link to Webrev Comment

…fsets.java fails on 512-bit SVE

Hotspot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java`
fails on AArch64 CPUs with 512-bit SVE. The reason is that many test
loops in the code cannot be vectorized due to data dependence but IR
tests assume they can.

On AArch64, these IR tests just check the CPU feature of `asimd` and
incorrectly assumes AArch64 vectors are at most 256 bits. But actually,
`asimd` on AArch64 only represents NEON vectors which are at most 128
bits. AArch64 CPUs may have another feature of `sve` which represents
scalable vectors of at most 2048 bits. The vectorization won't succeed
on 512-bit SVE CPUs if the memory offset between some read and write is
less than 512 bits.

As this jtreg is auto-generated by a python script, we have updated the
script and re-generated this jtreg. In this new version, we checked the
auto-vectorization on both NEON-only and NEON+SVE platforms. Below is
the diff of the generator script. We have also attached the new script
to the JBS page.

```
@@ -321,7 +321,8 @@ class Type:
            p.append(Platform("avx512", ["avx512", "true"], 64))
         else:
            assert False, "type not implemented" + self.name
-        p.append(Platform("asimd", ["asimd", "true"], 32))
+        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
+        p.append(Platform("sve", ["sve", "true"], 256))
         return p

 class Test:
@@ -457,7 +458,7 @@ class Generator:
         lines.append(" *   and various MaxVectorSize values, and +- AlignVector.")
         lines.append(" *")
         lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
-        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8308606")
+        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8312570")
         lines.append(" *")
         lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
         lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
@@ -598,7 +599,8 @@ class Generator:
             # IR rules
             for p in test.t.platforms():
                 elements = p.vector_width // test.t.size
-                lines.append(f"    // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
+                max_pre = "max " if p.name == "sve" else ""
+                lines.append(f"    // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
                 ###############  -AlignVector
                 rule = PlatformIRRule(p)
                 rule.add_pre_constraint("AlignVector", IRBool.makeFalse())
@@ -694,8 +696,8 @@ class Generator:
 def main():
     g = Generator()
     g.generate("TestDependencyOffsets",
-               "/home/emanuel/Documents/fork7-jdk/open/test/hotspot/jtreg/compiler/loopopts/superword",
-               "8298935 8308606", # Big ID
+               "test/hotspot/jtreg/compiler/loopopts/superword",
+               "8298935 8308606 8312570", # Bug ID
                "compiler.loopopts.superword", # package
     )
```

We tested this on various of AArch64 CPUs.
@bridgekeeper
Copy link

bridgekeeper bot commented Jul 25, 2023

👋 Welcome back pli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 25, 2023
@openjdk
Copy link

openjdk bot commented Jul 25, 2023

@pfustc The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jul 25, 2023
@mlbridge
Copy link

mlbridge bot commented Jul 25, 2023

Webrevs

@pfustc
Copy link
Member Author

pfustc commented Jul 25, 2023

@eme64 Please help look at this. And, how about adding the test generator script you wrote into the jdk repo?

@TobiHartmann
Copy link
Member

Emanuel is on vacation until Aug 9. Can this wait or should we problem list?

@pfustc
Copy link
Member Author

pfustc commented Jul 31, 2023

Emanuel is on vacation until Aug 9. Can this wait or should we problem list?

Thanks for the info. It's ok for us to wait, since very few people are using 512-bit SVE today.

@eme64
Copy link
Contributor

eme64 commented Aug 3, 2023

@pfustc @TobiHartmann I just saw this on my emails. So I'll give a quick response:

We had this running on Aarch64 machines with asimd but without sve. Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit.

What is the max_pre for? Is it necessary?

Adding the script to the jdk repo could be nice. But I think there may be some issues with adding python files. We may have to rewrite it in java. But I think for now adding and updating it via JBS is ok.

@pfustc
Copy link
Member Author

pfustc commented Aug 4, 2023

We had this running on Aarch64 machines with asimd but without sve. Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit.

Assuming NEON has larger vector size (256 bit, which is wrong) won't result in any failure on NEON-only machines. But it results in running less IR checks on 256-bit SVE. Let's take below IR condition change as an example.

-        applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8", "MaxVectorSize", "<= 16"},
+        applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8"},

Before this patch, the existence of vector IRs won't be checked on 256-bit SVE as we have MaxVectorSize <= 16. After this patch, it will be checked. The main reason of failures on 512-bit SVE is the lack of sve == false check so the IR tests will run on machines with vector length > 256 bits.

What is the max_pre for? Is it necessary?

It just adds a prefix to make the comment more precise, as SVE uses scalable vectors and the vector length ranges from 128 bits to 2048 bits.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfustc Thanks for the changes and explanations, looks good to me! :)

Ah. Just one more idea: Since you now have even longer vector widths with 2048 bits: Should we not add some cases with even larger dependency offsets? We should go further than -196, 196. We could consider adding 255, 256, 511, 512, 1024, 1536 (positive and negative). Of course the question is if that increases the runtime too much, what do you think?

@openjdk
Copy link

openjdk bot commented Aug 10, 2023

@pfustc This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE

Reviewed-by: epeter, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 10, 2023
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

@eme64
Copy link
Contributor

eme64 commented Aug 15, 2023

@pfustc you will have to merge the changes from JDK-8310308.

@pfustc
Copy link
Member Author

pfustc commented Aug 28, 2023

Thanks @eme64 @vnkozlov for review. Patch is currently merged with master. The python generator file is also updated and attached to the JBS page.

Ah. Just one more idea: Since you now have even longer vector widths with 2048 bits: Should we not add some cases with even larger dependency offsets? We should go further than -196, 196. We could consider adding 255, 256, 511, 512, 1024, 1536 (positive and negative). Of course the question is if that increases the runtime too much, what do you think?

This is a good suggestion but I don't think it's necessary now and in the near future. Although Arm architecture document says SVE vector length can be at most 2048 bits, so far, the largest SVE vector length of real AArch64 CPUs in the world is 512 bits. AFAIK, there will not be real CPUs with 1024-bit SVE in the short term. So I prefer keeping current offset values for now.

@eme64
Copy link
Contributor

eme64 commented Aug 28, 2023

@pfustc ok, fair enough. If there is no real hardware, and not coming anytime soon, then we can leave the offsets as is. Thanks again for the fix!

@pfustc
Copy link
Member Author

pfustc commented Aug 29, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Aug 29, 2023

Going to push as commit e5ea9aa.
Since your change was applied there have been 16 commits pushed to the master branch:

  • 1cb2cc6: 8308464: Shared array class should not always be loaded in boot loader
  • 69d1feb: 8315060: Out of tree incremental build fails with ccache
  • 8e2a533: 8315137: Add explicit override RecordComponentElement.asType()
  • b4b2fec: 8311081: KeytoolReaderP12Test.java fail on localized Windows platform
  • 31e2681: 8315071: Modify TrayIconScalingTest.java, PrintLatinCJKTest.java to use new PassFailJFrame's builder pattern usage
  • 21916f3: 8139208: [macosx] Issue with setExtendedState of JFrame
  • 99ea8bf: 8315062: [GHA] get-bootjdk action should return the abolute path
  • acb24bf: 8315116: fix minor issue in copyright header introduced by JDK-8269957 that is breaking the build
  • 11da15d: 8269957: facilitate alternate impls of NameTable and Name
  • 725ec0c: 8315020: The macro definition for LoongArch64 zero build is not accurate.
  • ... and 6 more: https://git.openjdk.org/jdk/compare/12de9b0225363377e9a76729b11698221d4f29f2...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 29, 2023
@openjdk openjdk bot closed this Aug 29, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 29, 2023
@openjdk
Copy link

openjdk bot commented Aug 29, 2023

@pfustc Pushed as commit e5ea9aa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@pfustc pfustc deleted the deptest branch August 29, 2023 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants