8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE #12901
The cast operation for VectorMask from a wider type to a narrower type returns an incorrect result from trueCount() on the resultant mask with SVE (on some SVE machines toLong() also returns incorrect values). An example narrow operation that produces incorrect toLong() and trueCount() values is shown below for a 128-bit -> 64-bit conversion. The same applies to other narrow operations where the source mask size in bytes is 4x or 8x that of the result mask -

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.LongVector;
    import jdk.incubator.vector.VectorMask;

    public class TestMaskCast {
        static final boolean[] mask_arr = {true, true, false, true};

        public static long narrow_long() {
            // SPECIES_128 holds two long lanes, so only the first two array elements ("TT") are loaded.
            VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
            return lmask128.cast(IntVector.SPECIES_64).toLong();
        }

        public static void main(String[] args) {
            long r = 0L;
            // Loop enough times for C2 to compile narrow_long().
            for (int ic = 0; ic < 50000; ic++) {
                r = narrow_long();
            }
            System.out.println("toLong() : " + r);
        }
    }

C2 compilation result :

    java --add-modules jdk.incubator.vector TestMaskCast
    toLong() : 15

Interpreter result (for verification) :

    java --add-modules jdk.incubator.vector -Xint TestMaskCast
    toLong() : 3

The incorrect results with toLong() have been observed only on the 128-bit and 256-bit SVE machines and are not reproducible on a 512-bit machine. trueCount(), however, returns incorrect values on all the SVE machines, so it is the more reliable way to expose the problem in the current implementation of the mask cast narrow operation for SVE.

Replacing the call to toLong() by trueCount() in the above example -

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.LongVector;
    import jdk.incubator.vector.VectorMask;

    public class TestMaskCast {
        static final boolean[] mask_arr = {true, true, false, true};

        public static int narrow_long() {
            VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
            return lmask128.cast(IntVector.SPECIES_64).trueCount();
        }

        public static void main(String[] args) {
            int r = 0;
            for (int ic = 0; ic < 50000; ic++) {
                r = narrow_long();
            }
            System.out.println("trueCount() : " + r);
        }
    }

C2 compilation result:

    java --add-modules jdk.incubator.vector TestMaskCast
    trueCount() : 4

Interpreter result:

    java --add-modules jdk.incubator.vector -Xint TestMaskCast
    trueCount() : 2

Since in this example the source mask size in bytes is 2x that of the result mask, trueCount() returns 2x the number of true elements in the source mask. It would return 4x/8x the number of true elements in the source mask if the size of the source mask is 4x/8x that of the result mask.

The returned values are incorrect because the higher order bits of the result are not cleared when the mask is narrowed, and trueCount() and toLong() take the higher order bits in the vector register into account as well. For the 128-bit to 64-bit conversion with a mask "TT" passed, the current implementation of the mask cast narrow operation returns the same mask in the lower and upper half of the 128-bit register, that is "TTTT", which results in a long value of 15 (instead of 3, "FFTT" for the 64-bit Integer mask) and a true element count of 4 (instead of 2).

This patch proposes a fix for this problem. An already existing JTREG IR test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java" has also been modified to call the trueCount() method, since the toString() method alone cannot be used to reproduce the incorrect values in this bug. This test passes successfully on 128-bit, 256-bit and 512-bit SVE machines. Since the IR test has been changed, it has also been tested successfully on other platforms like x86 and aarch64 Neon machines to ensure the changes have not introduced any new errors.
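The arithmetic behind the 15-vs-3 and 4-vs-2 results can be reproduced with a small plain-Java model. This is only an illustrative sketch of the behaviour described above, not the HotSpot/SVE implementation: the class `MaskNarrowModel` and its methods are hypothetical names, and the mask is modelled as bits of a `long` (lane 0 at bit 0) rather than as an SVE register.

```java
public class MaskNarrowModel {
    // Hypothetical helper: pack mask lanes into the low bits of a long, lane 0 at bit 0.
    static long pack(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Models the reported behaviour: the narrowed mask appears `ratio` times
    // (source size / result size), because the higher order bits are not cleared.
    static long narrowWithoutClearing(long bits, int laneCount, int ratio) {
        long out = 0L;
        for (int copy = 0; copy < ratio; copy++) {
            out |= bits << (copy * laneCount);
        }
        return out;
    }

    // Models the expected behaviour: only the low `laneCount` bits survive.
    // Assumes laneCount < 64 so the shift below is well defined.
    static long narrowWithClearing(long bits, int laneCount, int ratio) {
        long laneMask = (1L << laneCount) - 1;
        return narrowWithoutClearing(bits, laneCount, ratio) & laneMask;
    }

    public static void main(String[] args) {
        long src = pack(new boolean[] {true, true}); // the "TT" mask from the example
        long stale = narrowWithoutClearing(src, 2, 2);
        long clean = narrowWithClearing(src, 2, 2);
        System.out.println("uncleared: toLong=" + stale + " trueCount=" + Long.bitCount(stale));
        System.out.println("cleared:   toLong=" + clean + " trueCount=" + Long.bitCount(clean));
    }
}
```

For the 128-bit -> 64-bit case (two lanes, ratio 2) the model gives 15 and 4 without clearing, and 3 and 2 with clearing, matching the C2 and interpreter results above. Passing a ratio of 4 or 8 gives the 4x/8x true counts mentioned in the description.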
👋 Welcome back bkilambi! A progress list of the required criteria for merging this PR into
@Bhavana-Kilambi The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Already reviewed internally.
Looks good to me!
/integrate
@Bhavana-Kilambi This pull request has not yet been marked as ready for integration.
@Bhavana-Kilambi This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 19 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealELiu, @XiaohongGong, @nick-arm) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
/integrate
@Bhavana-Kilambi
/sponsor
Going to push as commit 6727490.
Your commit was automatically rebased without conflicts.
@nick-arm @Bhavana-Kilambi Pushed as commit 6727490. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12901/head:pull/12901
$ git checkout pull/12901
Update a local copy of the PR:
$ git checkout pull/12901
$ git pull https://git.openjdk.org/jdk.git pull/12901/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12901
View PR using the GUI difftool:
$ git pr show -t 12901
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12901.diff