Skip to content
This repository has been archived by the owner. It is now read-only.

8278889: AArch64: [vectorapi] VectorMaskLoadStoreTest.testMaskCast() test fail #49

Closed
wants to merge 2 commits into from

Conversation

theRealELiu
Copy link
Contributor

@theRealELiu theRealELiu commented Dec 20, 2021

This bug appears intermittently and it's caused by vmaskAll_immI[1]
when the vector mask size is smaller than max predicate size of running
machine. It generates an all-true predicate without considering those
inactive bits. That may result in the wrong result of VectorMask.toLong.
The problematic code is as below:

        ShortVector.SPECIES_64.MaskAll(true).toLong()

assembly:

        ptrue   p0.h          <= MaskAll(true)
        mov     z16.h, p0/z, #1
        mov     z17.h, #0
        uzp1    z16.b, z16.b, z17.b
        fmov    x10, d16
        orr     x10, x10, x10, lsr #7
        orr     x10, x10, x10, lsr #14
        orr     x10, x10, x10, lsr #28
        and     x10, x10, #0xff

(gdb) p/x $p0 # on an SVE machine with vector length as 64 in bytes
$1 = {0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55}

Expected:
(gdb) p/x $p0
$1 = {0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}

Considering MaskAll is used in VectorMask.fromLong() only for a special
case and relies on the mechanism of inline and intrinsification, even it
could be optimized out, this patch also adds test cases for MaskAll to
reproduce this issue stably.

Also fix a small issue on register utilization for
sve_reduce_[max|min][D|F].

[1] https://github.com/openjdk/jdk18/blob/master/src/hotspot/cpu/aarch64/aarch64_sve.ad#L416

hotspot/compiler/vectorapi, jdk/incubator/vector passed on SVE enabled
system.

Change-Id: I9631f26f9232ffe7a28b74f14062d945c32fa1fb


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8278889: AArch64: [vectorapi] VectorMaskLoadStoreTest.testMaskCast() test fail

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk18 pull/49/head:pull/49
$ git checkout pull/49

Update a local copy of the PR:
$ git checkout pull/49
$ git pull https://git.openjdk.java.net/jdk18 pull/49/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 49

View PR using the GUI difftool:
$ git pr show -t 49

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk18/pull/49.diff

…test fail

This bug appears intermittently and it's caused by vmaskAll_immI[1]
when the vector mask size is smaller than max predicate size of running
machine. It generates an all-true predicate without considering those
inactive bits. That may result in the wrong result of VectorMask.toLong.
The problematic code is as below:

```
        ShortVector.SPECIES_64.MaskAll(true).toLong()

assembly:

        ptrue   p0.h          <= MaskAll(true)
        mov     z16.h, p0/z, openjdk#1
        mov     z17.h, #0
        uzp1    z16.b, z16.b, z17.b
        fmov    x10, d16
        orr     x10, x10, x10, lsr openjdk#7
        orr     x10, x10, x10, lsr openjdk#14
        orr     x10, x10, x10, lsr openjdk#28
        and     x10, x10, #0xff

(gdb) p/x $p0 # on an SVE machine with vector length as 64 in bytes
$1 = {0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55}

Expected:
(gdb) p/x $p0
$1 = {0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}

```

Considering MaskAll is used in VectorMask.fromLong() only for a special
case and relies on the mechanism of inline and intrinsification, even it
could be optimized out, this patch also adds test cases for MaskAll to
reproduce this issue stably.

Also fix a small issue on register utilization for
sve_reduce_[max|min][D|F].

[1] https://github.com/openjdk/jdk18/blob/master/src/hotspot/cpu/aarch64/aarch64_sve.ad#L416

hotspot/compiler/vectorapi, jdk/incubator/vector passed on SVE enabled
system.

Change-Id: I9631f26f9232ffe7a28b74f14062d945c32fa1fb
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 20, 2021

👋 Welcome back eliu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Dec 20, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 20, 2021

@theRealELiu The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler label Dec 20, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 20, 2021

Webrevs

nsjian
nsjian approved these changes Dec 21, 2021
Copy link

@nsjian nsjian left a comment

Thanks for the fix! Overall looks good to me. Just one enhancement suggestion.

src/hotspot/cpu/aarch64/aarch64_sve_ad.m4 Outdated Show resolved Hide resolved
Change-Id: Id71ebe5161fac08a689ee3ec538b485f6c172186
sve_ptrue(dst, size, /* VL128 */ 0b01100);
break;
case 256:
sve_ptrue(dst, size, /* VL256 */ 0b01101);
Copy link
Member

@dean-long dean-long Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use an enum for these magic constants?

Copy link
Contributor Author

@theRealELiu theRealELiu Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these magic numbers are only used by sve_ptrue and probably there should be no one need to write these magic numbers again. If these magic numbers are really common used, I agree with yours that enum is better.

Copy link

@vnkozlov vnkozlov left a comment

I ran tier1 and 3 tests in test/hotspot/jtreg/gc/stress/gcbasher/ failed on macosx-aarch64:

java.lang.IllegalStateException
	at gc.stress.gcbasher.ByteCursor.readUtf8(ByteCursor.java:110)
	at gc.stress.gcbasher.Decompiler.decodeConstantPool(Decompiler.java:310)
	at gc.stress.gcbasher.Decompiler.<init>(Decompiler.java:42)
	at gc.stress.gcbasher.TestGCBasher.parseClassFiles(TestGCBasher.java:46)
	at gc.stress.gcbasher.TestGCBasher.main(TestGCBasher.java:63)
	at gc.stress.gcbasher.TestGCBasherWithG1.main(TestGCBasherWithG1.java:40)

It seems new failure.

@theRealELiu
Copy link
Contributor Author

@theRealELiu theRealELiu commented Dec 22, 2021

I ran tier1 and 3 tests in test/hotspot/jtreg/gc/stress/gcbasher/ failed on macosx-aarch64:

java.lang.IllegalStateException
	at gc.stress.gcbasher.ByteCursor.readUtf8(ByteCursor.java:110)
	at gc.stress.gcbasher.Decompiler.decodeConstantPool(Decompiler.java:310)
	at gc.stress.gcbasher.Decompiler.<init>(Decompiler.java:42)
	at gc.stress.gcbasher.TestGCBasher.parseClassFiles(TestGCBasher.java:46)
	at gc.stress.gcbasher.TestGCBasher.main(TestGCBasher.java:63)
	at gc.stress.gcbasher.TestGCBasherWithG1.main(TestGCBasherWithG1.java:40)

It seems new failure.

Thanks for your testing! Looks like it's the same as https://bugs.openjdk.java.net/browse/JDK-8275263.

@vnkozlov
Copy link

@vnkozlov vnkozlov commented Dec 22, 2021

It is actually 8275316 which was closed as duplicate 8275263 .
But, as I understand, 8275263 was also closed as duplicate of [BACKOUT] 8275262.
So there should not be code in JDK 18 which cause it.

Note, I ran additional testing of the same local repo build but without this 8278889 changes and I don't see gcbasher failures.

@theRealELiu
Copy link
Contributor Author

@theRealELiu theRealELiu commented Dec 22, 2021

It is actually 8275316 which was closed as duplicate 8275263 . But, as I understand, 8275263 was also closed as duplicate of [BACKOUT] 8275262. So there should not be code in JDK 18 which cause it.

Note, I ran additional testing of the same local repo build but without this 8278889 changes and I don't see gcbasher failures.

Thanks for your feedback.

The failure is really bizarre. Is it possible that caused by the specific toolchain ? This patch doesn't change any common code except the test cases, all the changes are SVE related, and I think mac couldn't touch SVE related code at this moment. Moreover, the log is the same as 8275316, which means perhaps some other patches will trigger this failure as well before the root cause has been fixed. In this way, expose this failure and post a new JBS to involve some mac experts maybe not bad. Otherwise, some patches which looks okay but would be blocked unfortunately.

@vnkozlov
Copy link

@vnkozlov vnkozlov commented Dec 22, 2021

This is indeed bizarre. I submitted new build in our infra only for macosx-aarch64-debug and test passed with it (20 times). And it immediately failed with old build.
I checked: OS version is the same and clang (and xcode) exactly the same for both builds.

My only suspicion is we may have some uninitialized value[s] in VM somewhere and depending on JVM's code and data layout we can get these strange failures.

I am repeating whole tier1 build and testing on all platforms.

@vnkozlov
Copy link

@vnkozlov vnkozlov commented Dec 22, 2021

I filed https://bugs.openjdk.java.net/browse/JDK-8279177
tier1 re-testing passed. Running tier2 and tier3 now.

Copy link

@vnkozlov vnkozlov left a comment

Re-testing passed.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 22, 2021

@theRealELiu This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8278889: AArch64: [vectorapi] VectorMaskLoadStoreTest.testMaskCast() test fail

Reviewed-by: njian, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 16 new commits pushed to the master branch:

  • 9d5ae2e: 8279076: C2: Bad AD file when matching SqrtF with UseSSE=0
  • 04ee921: 8278967: rmiregistry fails to start because SecurityManager is disabled
  • 2be3e7e: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d
  • dfb15c3: 8274315: JFR: One closed state per file or stream
  • e49d4a9: 8271447: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters
  • 713fbeb: 8278987: RunThese24H.java failed with EXCEPTION_ACCESS_VIOLATION in write_sample_info
  • 7341439: 8279007: jstatd fails to start because SecurityManager is disabled
  • 97c5cd7: 8278508: Enable X86 maskAll instruction pattern for 32 bit JVM.
  • 9ee3ccf: 8279045: Intrinsics missing vzeroupper instruction
  • 84d3333: 8279081: ProblemList jdk/jfr/event/oldobject/TestLargeRootSet.java on 2 platforms
  • ... and 6 more: https://git.openjdk.java.net/jdk18/compare/ad1282842c5eefdad151afe6f4db97a09d643546...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@nsjian, @vnkozlov) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Dec 22, 2021
@theRealELiu
Copy link
Contributor Author

@theRealELiu theRealELiu commented Dec 23, 2021

/integrate

@openjdk openjdk bot added the sponsor label Dec 23, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 23, 2021

@theRealELiu
Your change (at version 24fb4fc) is now ready to be sponsored by a Committer.

@theRealELiu
Copy link
Contributor Author

@theRealELiu theRealELiu commented Dec 24, 2021

Is there anyone can help to sponsor this if no more reviews?

@vnkozlov
Copy link

@vnkozlov vnkozlov commented Dec 24, 2021

/sponsor

@openjdk
Copy link

@openjdk openjdk bot commented Dec 24, 2021

Going to push as commit 6588bed.
Since your change was applied there have been 18 commits pushed to the master branch:

  • 04ad668: 8279204: [BACKOUT] JDK-8278413: C2 crash when allocating array of size too large
  • 730f670: 8268297: jdk/jfr/api/consumer/streaming/TestLatestEvent.java times out
  • 9d5ae2e: 8279076: C2: Bad AD file when matching SqrtF with UseSSE=0
  • 04ee921: 8278967: rmiregistry fails to start because SecurityManager is disabled
  • 2be3e7e: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d
  • dfb15c3: 8274315: JFR: One closed state per file or stream
  • e49d4a9: 8271447: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters
  • 713fbeb: 8278987: RunThese24H.java failed with EXCEPTION_ACCESS_VIOLATION in write_sample_info
  • 7341439: 8279007: jstatd fails to start because SecurityManager is disabled
  • 97c5cd7: 8278508: Enable X86 maskAll instruction pattern for 32 bit JVM.
  • ... and 8 more: https://git.openjdk.java.net/jdk18/compare/ad1282842c5eefdad151afe6f4db97a09d643546...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label Dec 24, 2021
@openjdk openjdk bot closed this Dec 24, 2021
@openjdk openjdk bot removed ready rfr sponsor labels Dec 24, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 24, 2021

@vnkozlov @theRealELiu Pushed as commit 6588bed.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@theRealELiu
Copy link
Contributor Author

@theRealELiu theRealELiu commented Dec 24, 2021

Thanks for all reviewers:P

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-compiler integrated
4 participants