-
Notifications
You must be signed in to change notification settings - Fork 43
8290204: FP16 initial backend implementation #204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back svkamath! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Smita,
Please find my initial review comments.
Thanks
void Assembler::evcvtph2pd(XMMRegister dst, XMMRegister src, int vector_len) { | ||
assert(VM_Version::supports_avx512_fp16(), ""); | ||
InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy mode */ false, /* no_mask_reg */ true, /*uses_vl */ true); | ||
attributes.set_is_evex_instruction(); | ||
int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_MAP5, &attributes); | ||
emit_int8((unsigned char)0x5A); | ||
emit_int8((unsigned char)(0xC0 | encode)); | ||
} | ||
|
||
void Assembler::evcvtpd2ph(XMMRegister dst, XMMRegister src, int vector_len) { | ||
assert(VM_Version::supports_avx512_fp16(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be beneficial is we pass an extra opmask register to leaf level assembly routines for packed operations and have another wrapper without opmask which passes K0 as the opmask operand. It will help us while supporting predicated operations.
@@ -363,7 +365,8 @@ class VM_Version : public Abstract_VM_Version { | |||
decl(HV, "hv", 46) /* Hypervisor instructions */ \ | |||
decl(SERIALIZE, "serialize", 47) /* CPU SERIALIZE */ \ | |||
decl(GFNI, "gfni", 48) /* Vector GFNI instructions */ \ | |||
decl(AVX512_BITALG, "avx512_bitalg", 49) /* Vector sub-word popcount and bit gather instructions */ | |||
decl(AVX512_BITALG, "avx512_bitalg", 49) /* Vector sub-word popcount and bit gather instructions */\ | |||
decl(AVX512_FP16, "avx512_fp16", 50) /* Vector FP16 instructions*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also handle the newly added feature in
src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.amd64/src/jdk/vm/ci/amd64/AMD64.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/hotspot/cpu/x86/x86.ad
Outdated
case Op_AddVHF: | ||
case Op_SubVHF: | ||
case Op_MulVHF: | ||
case Op_DivVHF: | ||
case Op_AbsVHF: | ||
case Op_NegVHF: | ||
if(bt != T_SHORT && !VM_Version::supports_avx512_fp16()) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be moved Matcher::match_rule_supported_vector, backend does not support predicated half floats instructions currently.
src/hotspot/cpu/x86/x86.ad
Outdated
__ movdl($tmp1$$XMMRegister, $dst$$Register); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $src2$$XMMRegister); | ||
__ pshuflw($tmp$$XMMRegister, $src2$$XMMRegister, 0x01); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister); | ||
__ pshuflw($tmp$$XMMRegister, $src2$$XMMRegister, 0x02); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister); | ||
__ pshuflw($tmp$$XMMRegister, $src2$$XMMRegister, 0x03); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister); | ||
__ pshufd($tmp2$$XMMRegister, $src2$$XMMRegister, 0x0E); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister); | ||
__ pshuflw($tmp$$XMMRegister, $tmp2$$XMMRegister, 0x01); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister); | ||
__ pshuflw($tmp$$XMMRegister, $tmp2$$XMMRegister, 0x02); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister); | ||
__ pshuflw($tmp$$XMMRegister, $tmp2$$XMMRegister, 0x03); | ||
__ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister); | ||
__ movdl($dst$$Register, $tmp1$$XMMRegister); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this into a macro assembly routine.
src/hotspot/cpu/x86/x86.ad
Outdated
@@ -4834,6 +4847,51 @@ instruct reduction16F(regF dst, legVec src, legVec vtmp1, legVec vtmp2) %{ | |||
ins_pipe( pipe_slow ); | |||
%} | |||
|
|||
// =======================Half Float Reduction========================================== | |||
instruct reduction8HF(rRegI dst, vec src2, vec tmp, vec tmp1, vec tmp2) %{ | |||
predicate(UseAVX > 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patterns handles 128 bit vector, in case we defer handling 256/512 bit vectors we should add a check in match_rule_supported_vector.
%} | ||
|
||
instruct vcvtDtoHF_reg(vec dst, vec src) %{ | ||
predicate(UseAVX > 2 && VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
|
||
// Convert from other types to Halffloat | ||
instruct vcvtFtoHF_reg(vec dst, vec src) %{ | ||
predicate(UseAVX > 2 && Matcher::vector_element_basic_type(n) == T_SHORT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
instruct vabsHF_reg(vec dst, vec src) %{ | ||
predicate(UseAVX > 2); | ||
match(Set dst (AbsVHF src)); | ||
format %{ "vandps $dst,$src\t# $dst = |$src| abs packedHF" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect format string.
ReductionNode* ReductionNode::make(int vopc, Node *ctrl, Node* n1, Node* n2) { | ||
switch (vopc) { | ||
case Op_AddReductionVHF: return new AddReductionVHFNode(ctrl, n1, n2); | ||
case Op_MulReductionVF: return new MulReductionVFNode(ctrl, n1, n2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backend handling is missing for MulReductionVF
StubRoutines::x86::_vector_halffloat_sign_mask = generate_vector_fp_mask("vector_halffloat_sign_mask", 0x7FFF7FFF7FFF7FFF); | ||
StubRoutines::x86::_vector_halffloat_sign_flip = generate_vector_fp_mask("vector_halffloat_sign_flip", 0x8000800080008000); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be added to 32 bit stubGenerators also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -290,6 +290,47 @@ int VectorNode::replicate_opcode(BasicType bt) { | |||
} | |||
} | |||
|
|||
// Return the vector operator for the specified scalar operation | |||
// and vector length for half float | |||
int VectorNode::opcode(int sopc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, where is this function being referenced ? Should extra handling be added in this method -
bool LibraryCallKit::inline_vector_nary_operation(int n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bhavana, apologies for the delay in responding to your comment. The code to stitch everything together will be added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Smitha, one more question - Since HalfFloat is not primitive datatype, the check to make sure the elem_type is primitive type might fail in LibraryCallKit::inline_vector_nary_operation(int n). Will you also be handling this? Or will this PR be merged after Valhalla integration when halffloat becomes a primitive class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bhavana, we plan to explore this after the integration of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments added.
} | ||
|
||
void Assembler::evcvtph2ps(XMMRegister dst, KRegister mask, XMMRegister src, int vector_len) { | ||
assert(VM_Version::supports_evex(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts should also check for VL feature if vector_len < Assembler::AVX_512bit.
} | ||
|
||
void Assembler::evcvtps2ph(XMMRegister dst, KRegister mask, XMMRegister src, int imm8, int vector_len) { | ||
assert(VM_Version::supports_evex(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts should also check for VL feature if vector_len < Assembler::AVX_512bit.
@@ -2134,6 +2134,61 @@ void Assembler::evcvtqq2pd(XMMRegister dst, XMMRegister src, int vector_len) { | |||
emit_int16((unsigned char)0xE6, (0xC0 | encode)); | |||
} | |||
|
|||
void Assembler::evcvtph2pd(XMMRegister dst, KRegister mask, XMMRegister src, int vector_len) { | |||
assert(VM_Version::supports_avx512_fp16(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts should also check for VL feature if vector_len < Assembler::AVX_512bit.
} | ||
|
||
void Assembler::evcvtpd2ph(XMMRegister dst, KRegister mask, XMMRegister src, int vector_len) { | ||
assert(VM_Version::supports_avx512_fp16(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts should also check for VL feature if vector_len < Assembler::AVX_512bit.
@@ -6133,6 +6188,15 @@ void Assembler::vaddss(XMMRegister dst, XMMRegister nds, XMMRegister src) { | |||
emit_int16(0x58, (0xC0 | encode)); | |||
} | |||
|
|||
void Assembler::evaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { | |||
assert(VM_Version::supports_avx512_fp16(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
|
||
void Assembler::evfmadd231ph(XMMRegister dst, KRegister mask, XMMRegister src1, Address src2, int vector_len) { | ||
assert(VM_Version::supports_avx512_fp16(), ""); | ||
assert(VM_Version::supports_avx512vl(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@@ -6520,6 +6661,20 @@ void Assembler::vdivps(XMMRegister dst, XMMRegister nds, Address src, int vector | |||
emit_operand(dst, src); | |||
} | |||
|
|||
void Assembler::evdivph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, int vector_len) { | |||
assert(VM_Version::supports_avx512_fp16(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@@ -6373,6 +6451,20 @@ void Assembler::vsubps(XMMRegister dst, XMMRegister nds, Address src, int vector | |||
emit_operand(dst, src); | |||
} | |||
|
|||
void Assembler::evsubph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, int vector_len) { | |||
assert(VM_Version::supports_avx512_fp16(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
case Op_AddVHF: | ||
case Op_SubVHF: | ||
case Op_MulVHF: | ||
case Op_DivVHF: | ||
case Op_AbsVHF: | ||
case Op_NegVHF: | ||
if (bt != T_SHORT && !VM_Version::supports_avx512_fp16()) { | ||
return false; | ||
} | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if target does not support AVX512BW and vector length is 512 bits, vector size compatibility check will fail upfront and we won't reach till this point, currently the BasicType for halffloat is T_SHORT,
https://github.com/openjdk/panama-vector/pull/204/files#diff-d6a3624f0f0af65a98a47378a5c146eed5016ca09b4de1acd0a3acc823242e82L1690
Though, this may not show up in reality since FP16 ISA comes along with next gen Xeons which support both AVX512_FP16 and AVX512BW.
@@ -7313,6 +7419,53 @@ instruct vucast(vec dst, vec src) %{ | |||
ins_pipe( pipe_slow ); | |||
%} | |||
|
|||
// Convert from Halffloat to other types | |||
instruct vcvtHFtoD_reg(vec dst, vec src) %{ | |||
predicate(UseAVX > 2 && VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_DOUBLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being a common IR, we need to add a complimentary check in the short vector casting.
Over all code looks clean, in absence of missing link b/w expanders and backend we cannot test it over SW emulator/HW. |
macro(FmaVD) | ||
macro(FmaVF) | ||
macro(DivVF) | ||
macro(DivVD) | ||
macro(DivVHF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, in a previous commit, the DIV operation wasn't added in the binary operations list for Halffloats here -
panama-vector/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/HalffloatVector.java
Line 789 in 9113b20
private static BinaryOperation<HalffloatVector, VectorMask<Halffloat>> binaryOperations(int opc_) { |
Although this PR is specifically for backend impl but the backend ISA for "div" operation might not be generated due to this operation not being added to the list of binaryoperations for Halffloat. Is it possible to please add it in this PR?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bhavana, there will be a follow up PR that Swati will work on to add div to the list of binary ops for Halffloat.
@smita-kamath This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
/integrate |
@smita-kamath Pushed as commit 7460d93. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Initial backend implementation for enabling FP16.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector pull/204/head:pull/204
$ git checkout pull/204
Update a local copy of the PR:
$ git checkout pull/204
$ git pull https://git.openjdk.org/panama-vector pull/204/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 204
View PR using the GUI difftool:
$ git pr show -t 204
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/204.diff