Skip to content

8285281: [x86] Add C2 mid-end and back-end implementation for COMPRESS_BITS and EXPAND_BITS operations #195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Apr 21, 2022

Summary of changes:

  • Patch intrinsifies following newly added Java SE APIs
    1. Integer.compress
    2. Integer.expand
    3. Long.compress
    4. Long.expand
  • Adds C2 IR nodes and corresponding ideal transformations for new operations.
  • Inline expansion of new vector operations COMPRESS_BITS and EXPAND_BITS are performed using their scalar counterparts and lane insertion/extraction operations.
  • Performance of JIT sequence generated using above approach vs directly vectorizing scalar algorithm using existing vector APIs is within in +/-%10 range depending on the width of the operation, since X86 offers direct instructions PEXT/PDEP for parallel bit extraction and deposition operations hence performance of scalar loop is always superior to corresponding vector operations.
  • Adds an IR framework based test to validate newly introduced IR transformations.

Kindly review and share your feedback.

Best Regards,
Jatin


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8285281: [x86] Add C2 mid-end and back-end implementation for COMPRESS_BITS and EXPAND_BITS operations

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/195/head:pull/195
$ git checkout pull/195

Update a local copy of the PR:
$ git checkout pull/195
$ git pull https://git.openjdk.java.net/panama-vector pull/195/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 195

View PR using the GUI difftool:
$ git pr show -t 195

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/195.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 21, 2022

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@jatin-bhateja jatin-bhateja changed the title 8285281: [x86] Add C2 mid-end and back-end implementation for bit COMPRESS_BITS and EXPAND_BITS operations 8285281: [x86] Add C2 mid-end and back-end implementation for COMPRESS_BITS and EXPAND_BITS operations Apr 21, 2022
@openjdk openjdk bot added the rfr label Apr 21, 2022
@mlbridge
Copy link

mlbridge bot commented Apr 21, 2022

Webrevs

@PaulSandoz
Copy link
Member

See https://git.openjdk.java.net/panama-vector/pull/194 which removes the various compress/expand implementations

@PaulSandoz
Copy link
Member

PaulSandoz commented Apr 21, 2022

Is it worth the additional complexity in C2 over just depending on the fallback? Actually I misunderstood, you are making intrinsic the scalar implementations. I think that should be a PR against the jdk repository.

@sviswa7
Copy link
Collaborator

sviswa7 commented Apr 21, 2022

I agree with Paul that the scalar Integer.compress, Integer.expand, Long.compress and Long.expand intrinsication should be a PR against the mainline. Also since we are now using the scalar intrinsics, the vector CompressV and ExpandV node generation can be removed.

Comment on lines 224 to 225
tty->print_cr(" ** Rejected vector op (%s,%s,%d) because architecture does not support variable vector shifts",
NodeClassNames[sopc], type2name(type), num_elem);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should be updated to "bitshuffle" related.

@@ -322,6 +346,24 @@ static bool is_klass_initialized(const TypeInstPtr* vec_klass) {
return klass->is_initialized();
}

Node* LibraryCallKit::gen_bitshuffle_operation(int voper, BasicType elem_bt, int num_elem, Node* opd1, Node* opd2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rename voper to opc ? The name looks like the opcode expected to be a vector type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Apr 22, 2022

Is it worth the additional complexity in C2 over just depending on the fallback? Actually I misunderstood, you are making intrinsic the scalar implementations. I think that should be a PR against the jdk repository.

I initially tried doing this in Java side, by bringing the fall back implementation to forefront and basing that over lane/withLane and scalar compress/expand operations, but C2 based inline expansion generates better code and almost shows 2X the performance gain. I can rebase this patch on JDK mainline as suggested.

@XiaohongGong
Copy link
Collaborator

Also please update the latest copyright to 2022 to all the touched files. Thanks!

virtual int Opcode() const;
virtual Node* Ideal(PhaseGVN* phase, bool can_reshape);
virtual Node* Identity(PhaseGVN* phase);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the blank line in line-285. Thanks!

Comment on lines 263 to 264
};
//----------------------------CompressBits/ExpandBits---------------------------
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: please insert a blank line between line 263-264

Comment on lines 160 to 171
if (n->bottom_type()->isa_int()) {
// compress(x, 0) == 0
if(phase->type(n->in(2))->higher_equal(TypeInt::ZERO)) return n->in(2);
// compress(x, -1) == x
if(phase->type(n->in(2))->higher_equal( TypeInt::MINUS_1)) return n->in(1);
} else {
assert(n->bottom_type()->isa_long(), "");
// compress(x, 0) == 0
if(phase->type(n->in(2))->higher_equal(TypeLong::ZERO)) return n->in(2);
// compress(x, -1) == x
if(phase->type(n->in(2))->higher_equal( TypeLong::MINUS_1)) return n->in(1);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codes are almost the same for int and long type except for the "ZERO and MINUS_1" node. Could you please remove the duplicate codes by just defining different ZERO and MUNUS_1 nodes for int and long?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more style issue: higher_equal( TypeLong::MINUS_1) -> higher_equal(TypeLong::MINUS_1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node* n = NULL;
switch (id) {
case vmIntrinsics::_compress_i: n = new CompressBitsNode(argument(0), argument(1), TypeInt::INT); break;
case vmIntrinsics::_expand_i: n = new ExpandBitsNode(argument(0), argument(1), TypeInt::INT); break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: one more space after new ExpandBitsNode(argument(0),

@@ -67,6 +67,20 @@ static bool is_vector_shuffle(ciKlass* klass) {
return klass->is_subclass_of(ciEnv::current()->vector_VectorShuffle_klass());
}

bool LibraryCallKit::arch_supports_vector_bitshuffle(int opc, int num_elem, BasicType elem_bt) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add an assertion for opc that it should be Op_CompressBitsV or Op_ExpandBitsV ?

@jatin-bhateja
Copy link
Member Author

Review comments resolved, patch will be posted on JDK-mainline for review.

@e1iu
Copy link
Member

e1iu commented Apr 26, 2022

LGTM.

Comment on lines +587 to 589
} else if (VectorNode::is_bitshuffle_opcode(sopc) && !Matcher::match_rule_supported_vector(sopc, num_elem, elem_bt)) {
operation = gen_bitshuffle_operation(opc, elem_bt, num_elem, opd1, opd2);
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that you should add the Op_CompressBitsV and Op_ExpandBitsV node creation in VectorNode::make(). If the backend support these two ops, the jvm crashes due to missing creation of these two nodes. I met the issue when I try to support the backend for ARM SVE:

Internal Error (panama-vector/src/hotspot/share/opto/vectornode.cpp:657), pid=152820, tid=152852
#  fatal error: Missed vector creation for 'CompressBitsV'

@XiaohongGong
Copy link
Collaborator

Please don't forget to add the new added vector ops like CompressBitsV and ExpandBitsV to function https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/formssel.cpp#L4208. Or the vector type info might be missing by matcher as expected. Thanks!

@jatin-bhateja
Copy link
Member Author

Pull request is created against JDK-mainline, closing this PR.

@jatin-bhateja
Copy link
Member Author

Thanks @XiaohongGong , @theRealELiu for reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

5 participants