-
Notifications
You must be signed in to change notification settings - Fork 127
8308363: Initial compiler support for FP16 scalar operations. #848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
|
@jatin-bhateja This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
| // For now, compare to class file version 51 so old verifier doesn't see Q signatures. | ||
| if ( (_major_version < 51 /* CONSTANT_CLASS_DESCRIPTORS */ ) || (!EnablePrimitiveClasses)) { | ||
| if ( (_major_version < 51 /* CONSTANT_CLASS_DESCRIPTORS */ ) || (!EnablePrimitiveClasses && !is_jdk_internal_class_sig(signature))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last condition (!EnablePrimitiveClasses && !is_jdk_internal_class_sig(signature)) seems not right to me. Assume there is a jdk internal primitive class defined , and EnablePrimitiveClasses is disabled, then result of (!EnablePrimitiveClasses && !is_jdk_internal_class_sig(signature)) is false (i.e. true && false), then the followed error is not printed if the jdk version matches >= 51. But it should report the error since EnablePrimitiveClasses is closed. So it should be:
if ( (_major_version < 51 /* CONSTANT_CLASS_DESCRIPTORS */ ) || !EnablePrimitiveClasses || !is_jdk_internal_class_sig(signature))
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea here is to relax the need to use an explicit JVM flag -XX:+EnablePrimtiiveClasses for primitive classes known to VM.
| do_intrinsic(_add_float16, java_lang_Float16, add_name, floa16_float16_signature, F_R) \ | ||
| do_name(add_name, "add") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: please remove extra spaces between add_name and "add".
| }; | ||
|
|
||
| //------------------------------AddHFNode--------------------------------------- | ||
| // Add 2 floats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to: Add 2 half-precision floats ?
| public: | ||
| ReinterpretS2HFNode( Node *in1 ) : Node(0,in1) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changes:
ReinterpretS2HFNode(Node* in1) : Node(0, in1) {}
| virtual int Opcode() const; | ||
| virtual const Type *bottom_type() const { return Type::FLOAT; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to:
virtual const Type* bottom_type() const { return Type::FLOAT; }
| ReinterpretHF2SNode( Node *in1 ) : Node(0,in1) {} | ||
| virtual int Opcode() const; | ||
| virtual const Type* Value(PhaseGVN* phase) const; | ||
| virtual const Type *bottom_type() const { return TypeInt::SHORT; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
| } | ||
|
|
||
| bool LibraryCallKit::inline_fp16_operations(vmIntrinsics::ID id) { | ||
| Node* result = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use nullptr instead of NULL.
|
vecAddFP16.txt |
sviswa7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good work in general. I only have couple of comments in the code, please take a look. Also there might be some further optimization opportunities for the path that comes from Op_ConvF2HF -> ReinterpretS2HF. ConvF2HF is doing the conversion from xmm to xmm register and then moves the xmm to gpr. ReinterpretS2HF then moves from gpr back to xmm. This unnecessary movement from xmm->gpr and from gpr->xmm could be optimized out.
|
|
||
| void Assembler::evaddph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { | ||
| assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); | ||
| InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uses_vl should be true here. Also wondering if we could create a generic method like fp16varithop() with most of the boiler plate common code and call it from individual instructions like add, sub etc. This generic method we could either create now or when we include additional fp16 instructions.
| do_intrinsic(_sum_float16, java_lang_Float16, sum_name, floa16_float16_signature, F_S) \ | ||
| do_name(sum_name, "sum") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be called add instead of sum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be called add instead of sum?
I made this change after Paul offline comments, to align it with Float.sum API.
|
|
||
| $1_FLAGS += -g -Xlint:all $$($1_TARGET_RELEASE) $$(PARANOIA_FLAGS) $$(JAVA_WARNINGS_ARE_ERRORS) | ||
| $1_FLAGS += $$($1_JAVAC_FLAGS) | ||
| $1_FLAGS += $$($1_JAVAC_FLAGS) -XDenablePrimitiveClasses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this change now that we have special handling in the VM for Float16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this change now that we have special handling in the VM for Float16?
Yes, Float16 being a primitive class needs this flag during build time compilation.
src/hotspot/cpu/x86/x86.ad
Outdated
| format %{ "vmovw $dst, $src" %} | ||
| ins_encode %{ | ||
| __ vmovw($dst$$Register, $src$$XMMRegister); | ||
| __ movswl($dst$$Register, $dst$$Register); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do without movswl here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do without movswl here?
This was done keeping in mind JVM semantics where byte/short are internally promoted to int type since JVM operands and local variables are 32 bit values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is actually a float16 here (short is only indicating the 16bit storage) and it is not expected to have any integral operation directly on this. The movswl would unnecessarily add to the pathlength.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is actually a float16 here (short is only indicating the 16bit storage) and it is not expected to have any integral operation directly on this. The movswl would unnecessarily add to the pathlength.
We do have an API to return a raw value and a direct comparison b/w raw value is done at JVM word (32 bit) granularity, I was planning to remove additional sign extending move when we add Float16.compare(short, short), but I think its better to fix tests for now and remove this extra instruction.
Yes, but this can only be done if we factor out xmm -> gpr movement out of ConvF2HF, its inputs is a IEEE 754 binary32 bit floating point value and output is a binary16 bit value help in a GPR. It will not be proper to remove ConvF2HF + ReinterpretS2HF by a direct ideal transformation currently. |
One path is having an instruct with match rule for this, something like below I think: |
Agree, patten match will break if ConF2HF is shared across multiple nodes, but its still better to have than no optimization. |
|
Hi @sviswa7 , @XiaohongGong , All your comments have been addressed. I am committing this patch Best Regards |
|
/integrate |
|
Going to push as commit f03fb4e. |
|
@jatin-bhateja Pushed as commit f03fb4e. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |

Starting with 4th Generation Xeon, Intel has made extensive extensions to existing ISA to support 16 bit scalar and vector floating point operations based on IEEE 754 binary16 format.
We plan to support this in multiple stages spanning across Java side definition of Float16 type, scalar operation and finally SLP vectorization support.
This patch adds minimal Java and Compiler side support for one API Float16.add.
Summary of changes :-
Implementation details:-
1/ Newly defined Float16 class encapsulate a short value holding IEEE 754 binary16 encoded value.
2/ Float16 is a primitive class which in future will be aligned with other enhanced primitive wrapper classes proposed by JEP-402.
3/ Float16 to support all the operations supported by corresponding Float class.
4/ Java implementation of each API will internally perform floating point operation at FP32 granularity.
5/ API which can be directly mapped to an Intel AVX512FP16 instruction will be a candidate for intensification by C2 compiler.
6/ With Valhalla, C2 compiler always creates an InlineType IR node for a value class instance.
Total number of inputs of an InlineType node match the number of non-static fields. In this case node will have one input of short type TypeInt::SHORT.
7/ Since all the scalar AVX512FP16 instructions operate on floating point registers and Float16 backing storage is held in a general-purpose register hence we need to introduce appropriate conversion IR which moves a 16-bit value from GPR to a XMM register and vice versa.

8/ Current plan is to introduce a new IR node for each operation which is a subclass of its corresponding single precision IR node. This will allow leveraging idealization routines (Ideal/Identity/Value) of its parent operation.
9/ All the single/double precision IR nodes carry a Type::FLOAT/DOUBLE ideal type. This represents entire FP32/64 value range and is different from integral types which explicitly record lower and upper bounds of value ranges. Value resolution routines operating over integral types calibrates the lattice associated with IR node based on flow function depicting the semantics of the operation.
10/ There are two separate ideal type for floating point constants TypeF and TypeD, these are singleton types and represents a unique single/double precision floating point value
11/ Ideal types are linked to specific register classes and determine the allocation set for register allocator.
12/ Since FP16 value range is a proper subset of FP32 value range hence all the newly created IR nodes will still carry Type::FLOAT type.
13/ To handle constant folding scenarios value routines associated with newly introduced conversion IR nodes convert an FP16 constant to a FP32 constant (TypeF) value using existing runtime helpers. Constant folding is performed by existing idealization routines of parent IR nodes.
14/ Auto-vectorizer scans through packs of scalar IR to create a vector IR.
15/ Functional and performance test for creation for each newly supported operation.
Above design is generic and should work for any reduced precision types like FP8 values.
Please let me know your views on this.
Best Regards,
Jatin
PS: Test instantiating Float16 class must be compiled with -XDenablePrimitiveClasses currently, we need to relax this constraint for known primitive classes in language compiler, JEP-402 talks about converting existing primitive box classes to value classes should address this limitation.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/848/head:pull/848$ git checkout pull/848Update a local copy of the PR:
$ git checkout pull/848$ git pull https://git.openjdk.org/valhalla.git pull/848/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 848View PR using the GUI difftool:
$ git pr show -t 848Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/848.diff
Webrev
Link to Webrev Comment