8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter #12869
👋 Welcome back kvn! A progress list of the required criteria for merging this PR into |
/label remove hotspot
/label add hotspot-compiler
/label add hotspot-runtime
GHA failure on linux-x86 in test compiler/vectorization/runner/LoopRangeStrideTest.java is due to JDK-8303105 |
@fyang, please help to verify that new tests passed on RISC-V with these changes and review these changes. Thanks! I tested x86 (64- and 32-bit) and AArch64. |
@vnkozlov Thanks a lot for taking this up. Is the following in the PR description still true: |
Correct, it is consistent. Only optimization to calculate constant value during compile time is skipped. C2 will generate HW instruction for It is possible to add similar Stub routines for AArch64 and RISC-V to be called from C2 but I am not expert in those platforms so I skipped them. It may be not clear but |
Note, I removed |
// Instruction requires different XMM registers
vcvtps2ph(tmp, src, 0x04, Assembler::AVX_128bit);
vcvtps2ph can have source and destination as same. Did you mean to say here in the comment that "Instruction requires XMM register as destination"?
flt_to_flt16 is used in an x86.ad instruction which requires preserving the src register.
I did not want to add another macroassembler instruction for the src->src case.
I will add this to the comment.
if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
// For results consistency both intrinsics should be enabled.
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
vmIntrinsics::is_intrinsic_available(vmIntrinsics::_floatToFloat16)) {
Should this also check for InlineIntrinsics?
vmIntrinsics::disabled_by_jvm_flags() checks InlineIntrinsics. See vmIntrinsics.cpp changes.
Yes you are right.
return nullptr; // Generate a vanilla entry
}
// For AVX CPUs only. f16c support is disabled if UseAVX == 0.
if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
We could check for VM_Version::supports_float16() here instead.
Yes. And I need to remove the !InlineIntrinsics check at line 340.
@@ -3874,6 +3925,15 @@ void StubGenerator::generate_initial() {
StubRoutines::_updateBytesAdler32 = generate_updateBytesAdler32();
}

if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
We could check for VM_Version::supports_float16() here instead.
Yes.
Yes, removing the Identity optimization is correct. It doesn't hold for NaN inputs. |
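One way to see why a round-trip fold cannot be an identity for NaN inputs is a toy model of the two conversions. This is a minimal sketch, assuming the widening conversion quiets signaling NaNs (as IEEE 754 format conversions do). The helpers widenNaN and narrowNaN are hypothetical and handle NaN bit patterns only; they are not the HotSpot or JDK code.

```java
// Toy model: binary16 <-> binary32 conversion of NaN bit patterns only.
// Hypothetical helpers for illustration, not the JDK or HotSpot implementation.
class NanRoundTrip {
    // Widen a binary16 NaN to binary32 bits, setting the quiet bit
    // (assumption: the conversion quiets signaling NaNs).
    static int widenNaN(short h) {
        int sign = (h & 0x8000) << 16;
        int payload = (h & 0x03ff) << 13;   // 10 payload bits become the top of 23
        return sign | 0x7f800000 | 0x00400000 | payload;
    }

    // Narrow a binary32 NaN to binary16 bits by keeping the top 10 payload bits.
    // Assumes the quiet bit is already set, as widenNaN guarantees.
    static short narrowNaN(int f) {
        int sign = (f >>> 16) & 0x8000;
        int payload = (f >>> 13) & 0x03ff;
        return (short) (sign | 0x7c00 | payload);
    }

    public static void main(String[] args) {
        short signaling = (short) 0x7c01;   // binary16 NaN with the quiet bit (0x0200) clear
        short back = narrowNaN(widenNaN(signaling));
        System.out.printf("in=0x%04x out=0x%04x%n", signaling & 0xffff, back & 0xffff);
    }
}
```

In this model the input 0x7c01 comes back as 0x7e01, so replacing the pair of conversions with the original value would change the observable bits.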
Hi @vnkozlov, there is some discrepancy in results between the interpreter, C1 and C2 for the following case.
And that is fine. Consistency has to be preserved only within a single run. Different runs with different flags (with disabled intrinsics, for example) may produce different results. EDIT: I should have paid more attention to the example outputs. The third (C2) run produces an inconsistent result!
On other hand |
It looks like C1 compilation does not invoke intrinsics. Investigating. |
We should not allow JIT compilation of What happens with @jatin-bhateja test is The fix is simple:
test now produces the same result:
|
C2 also compiled |
@jatin-bhateja I applied the fix. Please, verify. |
Hi, Thanks for handling linux-riscv64 at the same time.
It looks like there is a problem when handling NaNs with fcvt.h.s/fmv.x.h and fmv.h.x/fcvt.s.h instructions at the bottom. |
Thank you very much @RealFYang for testing changes and preparing patch. I applied your patch. |
Hi @vnkozlov , Thanks for explanations, looks good to me now.
Overall, looks good. Minor comments/suggestions follow.
return (jshort)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
assert(StubRoutines::f2hf() != nullptr, "floatToFloat16 intrinsic is not supported on this platform");
typedef jshort (*f2hf_stub_t)(jfloat x);
return ((f2hf_stub_t)StubRoutines::f2hf())(x);
What's the point of keeping the wrappers around? The stubs can be called directly, can't they?
I wanted to isolate the function type cast and the assert in one place.
BTW, the comment in the assert should be "the stub is not implemented on this platform".
if( t == Type::FLOAT ) return TypeInt::SHORT;
if (t == Type::TOP) return Type::TOP;
if (t == Type::FLOAT) return TypeInt::SHORT;
if (StubRoutines::f2hf() == nullptr) return bottom_type();
What's the purpose of this check? My understanding is ConvF2HF/ConvHF2F require intrinsification and on platforms where stubs are absent, intrinsification is disabled.
This code is an optimization: it uses the stub to calculate a constant value during compilation instead of generating a HW instruction in compiled code. The stub is not required for intrinsification to work: ConvF2HFNode will be processed normally and will use the intrinsic code (HW instruction) defined in the .ad file.
These stubs are used only here, not in C1 and not in the Interpreter. As a consequence, implementing these stubs is optional, and I implemented them only on x64. That is why I have this check.
I debated not having them at all to avoid confusing people, but they did improve performance a little.
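The guard described above can be pictured with a small model. The sketch below is a hypothetical Java analogue, not C2 code: FoldModel, F2hfStub and tryFold are invented names, and the fake stub knows a single value only. When a stub is present, a constant input folds to a constant at compile time. When it is absent, nothing is folded and the conversion is left to the instruction emitted in compiled code.

```java
import java.util.Optional;

// Hypothetical model of "fold a constant only when a platform stub exists".
class FoldModel {
    // Stand-in for the platform-specific stub; its absence is modeled with Optional.
    interface F2hfStub { short apply(float x); }

    // Returns the folded constant if the input is a constant and a stub is
    // available; otherwise empty, meaning "keep the node, emit the instruction".
    static Optional<Short> tryFold(Optional<Float> constantInput, Optional<F2hfStub> stub) {
        if (!constantInput.isPresent() || !stub.isPresent()) {
            return Optional.empty();
        }
        return Optional.of(stub.get().apply(constantInput.get()));
    }

    public static void main(String[] args) {
        // Fake stub for the demo: only knows 1.0f -> 0x3c00 (binary16 encoding of 1.0).
        F2hfStub fake = x -> (short) (x == 1.0f ? 0x3c00 : 0);
        System.out.println(tryFold(Optional.of(1.0f), Optional.of(fake)));                // folded
        System.out.println(tryFold(Optional.of(1.0f), Optional.<F2hfStub>empty()));       // not folded
    }
}
```

Either path yields a correct program in this model; the stub only decides whether the work happens at compile time or at run time.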
Thanks for the clarifications. Now it makes much more sense.
Still, the mix of StubRoutines::f2hf() and SharedRuntime::f2hf() looks a bit confusing.
What if you move the wrapper to the StubRoutines class instead? (JRT_LEAF et al stuff looks redundant here. Also, even though there are other arithmetic operations declared on StubRoutines, they provide default implementations universally available across all platforms. The f2hf case is different since it exposes a platform-specific stub and its availability is limited.)
Or encapsulate the constant folding logic (along with the guard) into SharedRuntime and return Type* (instead of int/float scalar).
Or encapsulate the constant folding logic (along with the guard) into SharedRuntime and return Type* (instead of int/float scalar).
I take this particular suggestion back. SharedRuntime is compiler-agnostic while Type is C2-specific.
const TypeInt *ti = t->is_int();
if (ti->is_con()) {
I find it confusing that ConvHF2FNode::Value() has an is_con() check, but ConvF2HFNode::Value() doesn't. I'd prefer to see both implementations unified.
It follows the same pattern as other nodes here: ConvF2INode::Value() vs ConvI2FNode::Value().
If you want to change it, we need to do that in a separate RFE for all methods here.
But I don't think we need to, because Float/Double types do not have value ranges as Integer types do.
Float has only 3 kinds of value: FloatTop, FloatBot, FloatCon. So we don't need to check for a constant once we have checked for TOP and BOT. For Integer we need to check bool is_con() const { return _lo==_hi; }.
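The difference between the two kinds of type can be illustrated with a toy model. The Java classes below are hypothetical stand-ins, not C2's Type hierarchy: an integer type is a range [lo, hi] and is a constant only when the bounds meet, whereas a float type that is neither TOP nor BOTTOM can only be a constant.

```java
// Hypothetical stand-ins for C2's integer and float types.
class LatticeModel {
    // Integer types carry a range, so "is it a constant?" needs an explicit test.
    static final class IntRange {
        final int lo, hi;
        IntRange(int lo, int hi) { this.lo = lo; this.hi = hi; }
        boolean isCon() { return lo == hi; }
    }

    // Float types have exactly three shapes: TOP, BOTTOM, or a single constant.
    enum FloatKind { TOP, BOTTOM, CON }

    // Once TOP and BOTTOM are ruled out, CON is the only case left.
    static boolean isConAfterChecks(FloatKind k) {
        if (k == FloatKind.TOP) return false;
        if (k == FloatKind.BOTTOM) return false;
        return true; // must be CON, no separate constant test needed
    }

    public static void main(String[] args) {
        System.out.println(new IntRange(0, 65535).isCon());   // a range, not a constant
        System.out.println(new IntRange(42, 42).isCon());     // bounds meet
        System.out.println(isConAfterChecks(FloatKind.CON));
    }
}
```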
Thank you for review @iwanowww
Looks good.
Thank you, Sandhya, Jatin, Vladimir and Dean for review. |
/integrate |
Going to push as commit 8cfd74f.
Your commit was automatically rebased without conflicts. |
Implemented Float.floatToFloat16 and Float.float16ToFloat intrinsics in Interpreter and C1 compiler to produce the same results as C2 intrinsics on x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java methods were implemented originally.
Replaced SharedRuntime::f2hf() and hf2f() C runtime functions with calls to runtime stubs which use the same HW instructions as C2 intrinsics. Only for 64-bit x64 because 32-bit x86 stub does not work: result is passed through FPU register and NaN values become different from C2 intrinsic. This runtime stub is only used to calculate constant values during C2 compilation and can be skipped.
I added new tests based on Tobias's TestAll.java, and copied jdk/lang/Float/Binary16Conversion*.java tests to run them with -Xcomp to make sure code is compiled by C1 or C2. I modified Binary16ConversionNaN.java to compare results from Interpreter, C1 and C2.
Tested tier1-5, Xcomp, stress
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12869/head:pull/12869
$ git checkout pull/12869
Update a local copy of the PR:
$ git checkout pull/12869
$ git pull https://git.openjdk.org/jdk pull/12869/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12869
View PR using the GUI difftool:
$ git pr show -t 12869
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12869.diff