8269568: JVM crashes when running VectorMask query tests #168
Conversation
|
@XiaohongGong The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
How about the running time of these tests after the change? |
The improved tests also find more bugs of vector mask operations on x86. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we wrote them as smoke tests because they were not intrinsic. We need to think more carefully about converting them from smoke tests.
Ideally we should convert them to kernel tests, but that is more work. Instead we can copy the generated pattern and do the following:
- move the assertion outside of the loops (it will generate garbage with string concatenation)
- assert over arrays, thereby also moving the actual scalar computation result outside the loops. The simplest approach is to create an
int[] array
of the same length as the input and write the reduced result at indexi
. Thus it's sparse.
That should result in an inner loop body that is very focused on exercising the intrinsic method. It will also likely reduce the test execution times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
@XiaohongGong This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 48 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@PaulSandoz, @dean-long, @DamonFool, @jatin-bhateja, @sviswa7) but any other Committer may sponsor as well.
|
I tested the running time, and seems it almost have nothing influence to the whole tests running time. Thanks! |
Hi @DamonFool , thanks for looking at this PR! I also met the tests failures on some x86 machine. Seems something is wrong with the backend implementation. However, unfortunately I'm not very familiar with x86 instructions. So maybe @jatin-bhateja could help to take a look please? Thanks so much! |
Thanks for looking at this PR @PaulSandoz ! Yes, I think this need more work and more carefully to move them from the smoke tests. Maybe we can revisit them in future?
Agree, I will move the assertion outside of the loop first. Thanks so much! |
After some investigation, I believe the failure is caused by the incorrect code-gen for VectorStoreMask on x86. Something like this would fix the failures:
But I'm afraid there are more rules which need to be fixed. Please feel free to go ahead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am seeing testing failures.
@XiaohongGong The following should fix this for x86:
Please include this in your PR. |
The fix seems to be tricky. |
@DamonFool This is the cleanest solution I could think of. |
But we actually do not |
@DamonFool This small patch fixes the problem at hand and at the minimum should go in with JDK 17. You have filed the JDK-8269679 for a more robust fix. Also the masking support is being overhauled as part of JDK 18. Your thoughts? |
Actually, instruct like |
Thanks @DamonFool. |
Thanks for the fix @sviswa7 ! I will include it in the latest commit. Thanks! |
/contributor add sandhya.viswanathan@intel.com |
@XiaohongGong Could not parse
|
Hi @sviswa7 @DamonFool @dean-long , the x86 issue is fixed with @sviswa7 's patch. I'v tested the Vector API tests internally and they all pass. Could you please take a look at it again? Thanks so much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in HotSpot LGTM.
Thanks for fixing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice update to the test.
test/jdk/jdk/incubator/vector/templates/Unit-Miscellaneous.template
Outdated
Show resolved
Hide resolved
Hi @sviswa7 @PaulSandoz , all your comments have been addressed! Thanks so much for the review! |
I'm seeing some tests timing out. @PaulSandoz how did your testing go? |
For me there were no timeouts Can you provide more details? |
Vector tests for run as part of hs-tier1, hs-tier2, and hs-tier3 are fine. You might be observing time outs in unrelated tests. |
The timeouts were in 4 different jdk/incubator/vector tests, but all were on the same machine linux-aarch64 machine, and all were fastdebug builds, tier3, with ZGC. |
Do these tests also time out without this patch? Could you please provider more details to the tests (the timing out log or the test itself...). I'm not sure, but I guess the timing out is not related to this patch? |
I ran the Vector API tests with ZGC on all kinds of aarch64 machines that we have, and all the tests pass without the timing out. |
It's probably just a glitch on one of our test machines. I checked the results again and the tests ran fine with ZGC on other linux-aarch64 test machines. |
OK, that's very fine! Thanks so much! |
/integrate |
@XiaohongGong |
/sponsor |
Going to push as commit 2b20778.
Your commit was automatically rebased without conflicts. |
@DamonFool @XiaohongGong Pushed as commit 2b20778. |
This is a follow-up patch for [1]. When we are trying to add the VectorMask query implementation for Arm NEON, we found the jtreg tests for
VectorMask.trueCount/firstTrue/lastTrue
is not effective. The tests failure cannot be reported as expected. The main reason is that the Vector API methods are not hot enough to be compiled by C2 compiler. Wrap the main test codes inside a loop can make the tests effective.With the tests taking effect, we can see a JVM crash due to the following assertion:
The mask input might be other vector nodes like
"LoadVectorNode"
, since there is an optimization for"VectorStoreMask"
:Changing the code to check whether its element basic type is
"T_BOOLEAN"
is more reasonable.[1] https://bugs.openjdk.java.net/browse/JDK-8256973
Progress
Issue
Reviewers
Contributors
<sviswanathan@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/168/head:pull/168
$ git checkout pull/168
Update a local copy of the PR:
$ git checkout pull/168
$ git pull https://git.openjdk.java.net/jdk17 pull/168/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 168
View PR using the GUI difftool:
$ git pr show -t 168
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/168.diff