8273949: Intrinsic creation for VectorMask.toLong operation. #126
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
for (int i = 0; i < bits.length; i++) {
    res = bits[i] ? res | set : res;
    set = set << 1;
if (length() <= Long.SIZE) {
We need to check that i >= 0 && i < length() and, if not, throw IllegalArgumentException.
Currently the behavior is unspecified. Can you please add the following to the documentation of VectorMask.laneIsSet? (I will handle any CSR-related tasks.)
* @throws IllegalArgumentException if the index is out of range
* ({@code < 0 || >= length()})
return (int) VectorSupport.maskReductionCoerced(VECTOR_OP_MASK_TRUECOUNT, Byte128Mask.class, byte.class, VLENGTH, this,
        (m) -> (long)trueCountHelper(((Byte128Mask)m).getBits()));
Suggested change:
- return (int) VectorSupport.maskReductionCoerced(VECTOR_OP_MASK_TRUECOUNT, Byte128Mask.class, byte.class, VLENGTH, this,
-         (m) -> (long)trueCountHelper(((Byte128Mask)m).getBits()));
+ return (int) VectorSupport.maskReductionCoerced(VECTOR_OP_MASK_TRUECOUNT, Byte128Mask.class, byte.class, VLENGTH, this,
+         (m) -> trueCountHelper(m.getBits()));
We don't need the cast to long for the return value of the lambda expression, and I think the cast of the mask is redundant too. The same applies to the other three methods.
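As an illustration of the point above, here is a minimal, self-contained sketch rather than the actual Vector API code. MaskOp, SimpleMask and reduce are hypothetical stand-ins for VectorMaskOp, Byte128Mask and VectorSupport.maskReductionCoerced. It relies on two Java language rules: an int lambda result is widened to long when the functional interface returns long, and the type parameter M is inferred from the mask argument, so the lambda parameter needs no cast.

```java
// Hypothetical stand-ins for the real types; all names here are illustrative only.
final class LambdaWideningSketch {
    /** Functional interface whose single method returns long. */
    interface MaskOp<M> {
        long apply(M mask);
    }

    /** Minimal mask that stores its lanes as booleans. */
    static final class SimpleMask {
        private final boolean[] bits;

        SimpleMask(boolean... bits) {
            this.bits = bits.clone();
        }

        boolean[] getBits() {
            return bits;
        }
    }

    /** Counts the set lanes; returns int, like the helper discussed in the review. */
    static int trueCountHelper(boolean[] bits) {
        int count = 0;
        for (boolean b : bits) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    /** Generic reduction; M is inferred from the mask argument. */
    static <M> long reduce(M mask, MaskOp<M> defaultImpl) {
        return defaultImpl.apply(mask);
    }

    static int trueCount(SimpleMask mask) {
        // No (long) cast on the lambda result and no cast of m are needed:
        // the int result is widened to long, and m is typed as SimpleMask.
        return (int) reduce(mask, m -> trueCountHelper(m.getBits()));
    }

    public static void main(String[] args) {
        System.out.println(trueCount(new SimpleMask(true, false, true)));
    }
}
```

Whether the mask cast can also be dropped in the real code depends on how M is inferred at the actual call site, which this sketch only approximates.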
if (i < 0 || i >= length()) {
    throw new IllegalArgumentException("Index " + i + " must be zero or positive, and less than " + length());
}
Apologies, I should have previously said to use Objects.checkIndex(i, length()). In fact, let's call length() once:
int length = length();
Objects.checkIndex(i, length);
if (length <= Long.SIZE) {
...
Separately, I wonder if the second bounds check should be if i <= Long.SIZE, since the first bounds check should dominate in many cases?
(I know there are other areas where we should use Objects.checkIndex, e.g. *MaxVector.lane/withLane, but we can fix those later.)
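The shape suggested above can be sketched as follows. This is a hedged illustration, not the code from the patch: MaskSketch is a hypothetical class backed by a boolean array, and its toLong uses the scalar loop from the diff excerpt. Note that Objects.checkIndex throws IndexOutOfBoundsException rather than IllegalArgumentException, so the documented exception type would need to match.

```java
import java.util.Objects;

// Hypothetical mask class, for illustration only; not part of the Vector API.
final class MaskSketch {
    private final boolean[] bits;

    MaskSketch(boolean... bits) {
        this.bits = bits.clone();
    }

    int length() {
        return bits.length;
    }

    /** Scalar fallback: lane i maps to bit i of the result. */
    long toLong() {
        if (length() > Long.SIZE) {
            // Assumption of this sketch: masks wider than 64 lanes are rejected.
            throw new UnsupportedOperationException("too many lanes for a long");
        }
        long res = 0;
        long set = 1;
        for (boolean bit : bits) {
            if (bit) {
                res |= set;
            }
            set <<= 1;
        }
        return res;
    }

    boolean laneIsSet(int i) {
        int length = length();          // call length() once
        Objects.checkIndex(i, length);  // throws IndexOutOfBoundsException if out of range
        if (length <= Long.SIZE) {
            // Fast path: test bit i of the long form of the mask.
            return ((toLong() >>> i) & 1L) != 0;
        }
        // Slow path for masks that do not fit in a long.
        return bits[i];
    }
}
```

The condition tests length rather than i, matching the version kept in the patch.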
With length <= Long.SIZE there is a better chance of optimizing out the else part, so that only the fast path gets generated. length is a static final value, so in most cases inlining allows this constant to be propagated.
Java changes are good.
@jatin-bhateja This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
match(Set dst (VectorMaskTrueCount mask));
effect(TEMP_DEF dst, TEMP tmp, TEMP ktmp, TEMP xtmp);
format %{ "vector_truecount_evex $mask \t! vector mask true count" %}
instruct vmask_tolong_evex(rRegL dst, kReg mask) %{
The instructs with rRegL should be in ifdef _LP64.
It's already covered under _LP64.
src/hotspot/cpu/x86/x86.ad
Outdated
match(Set dst (VectorMaskLastTrue mask));
effect(TEMP_DEF dst, TEMP tmp, TEMP ktmp, TEMP xtmp, KILL cr);
format %{ "vector_mask_first_or_last_true_evex $mask \t! vector first/last true location" %}
instruct vmask_truecount_evex(rRegI dst, kReg mask, rRegL tmp) %{
This should include the flags register, as the popcnt instruction affects flags.
src/hotspot/cpu/x86/x86.ad
Outdated
instruct vmask_truecount_avx(rRegI dst, vec mask, rRegL tmp, vec xtmp, vec xtmp1) %{
predicate(!VM_Version::supports_avx512vlbw());
predicate(n->in(1)->bottom_type()->isa_vectmask() == NULL);
This should also include flags register.
class VectorMaskToLongNode : public VectorMaskOpNode {
public:
VectorMaskToLongNode(Node* mask, const Type* ty):
VectorMaskOpNode(mask, ty, Op_VectorMaskLastTrue) {}
Shouldn't this be Op_VectorMaskToLong?
case T_LONG: // fall-through
case T_FLOAT: // fall-through
case T_DOUBLE: return Op_VectorMaskToLong;
default: fatal("MASK_TRUECOUNT: %s", type2name(bt));
MASK_TOLONG here? Also break is missing.
This is consistent with other case statements.
default: fatal("MASK_TRUECOUNT: %s", type2name(bt)); should refer to MASK_TOLONG.
@nsjian, kindly check the IR change. It may need some changes in the AArch64 backend, since VectorStoreMask is no longer inserted when connecting a mask-generating node to a mask operation node. This saves a redundant store mask operation if the target supports predicate registers.
Hi @jatin-bhateja, thanks for the heads-up. The VectorMaskOp connection change looks reasonable. We will update the SVE backend later.
Only one comment, on a cut-and-paste mistake in an error message; please fix that. Other than that, the patch looks good. No need for re-review.
LGTM.
Just the comment is mismatched.
// <E, M>
// long maskReductionCoerced(int oper, Class<? extends M> maskClass, Class<?> elemClass,
// int length, M m, VectorMaskOp<M> defaultImpl)
bool LibraryCallKit::inline_vector_mask_operation() {
/integrate
@jatin-bhateja Pushed as commit 0e7348d. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Build failure on macOS after this PR: #133.
Incorrect computation after this PR.
Summary of changes:
The following performance numbers were generated using the JMH benchmark modification included with the patch.
System: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade Lake Server 28C 2S)
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/126/head:pull/126
$ git checkout pull/126
Update a local copy of the PR:
$ git checkout pull/126
$ git pull https://git.openjdk.java.net/panama-vector pull/126/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 126
View PR using the GUI difftool:
$ git pr show -t 126
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/126.diff