8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter #12704
Conversation
Please add a regression test.
Also consider creating copies of the jdk/java/lang/Float/Binary16Conversion*.java tests in compiler/intrinsics/ and modifying them to compare results from the interpreter, runtime, and JITed code.
We run the compiler/intrinsics/ tests with different SSE and AVX settings to make sure they work in all cases.
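One possible shape for such a consistency check (a hypothetical sketch, not the test from this PR; the class and method names are illustrative, and Float.floatToFloat16 requires JDK 20+): record the result before warmup, then again after enough iterations to likely trigger JIT compilation, and compare.

```java
// Hypothetical sketch: check that Float.floatToFloat16 gives the same result
// before and after JIT compilation within a single JVM invocation (JDK 20+).
public class ConversionConsistency {
    static short convert(float f) {
        return Float.floatToFloat16(f);
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7f800001); // a NaN input
        short first = convert(nan);                   // likely interpreted
        for (int i = 0; i < 100_000; i++) {           // warm up to trigger JIT
            convert(nan);
        }
        short later = convert(nan);                   // likely compiled
        System.out.println(first == later);           // consistent result expected
    }
}
```

A jtreg version would additionally run with different SSE/AVX flag combinations, as noted above.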
@@ -1097,6 +1098,7 @@ public static short floatToFloat16(float f) {
        // Preserve sign and attempt to preserve significand bits
        return (short)(sign_bit
                       | 0x7c00 // max exponent + 1
                       | 0x0200 // QNaN
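Read as plain Java, the quieting step in the hunk above amounts to something like the following (a minimal sketch; the helper name and payload extraction are illustrative, not the actual Float.java code):

```java
// Sketch of how a float NaN can be narrowed to a binary16 NaN with the
// quiet bit (0x0200) forced on, as in the patch above.
public class Float16NaN {
    static short toFloat16NaN(int floatBits) {
        int signBit = (floatBits >>> 16) & 0x8000;  // preserve the sign
        // take the top 10 significand bits of the float NaN as the payload
        int payload = (floatBits >> 13) & 0x03ff;
        return (short) (signBit
                        | 0x7c00   // binary16 max exponent + 1 => NaN/Inf range
                        | 0x0200   // force the quiet (QNaN) bit
                        | payload);
    }

    public static void main(String[] args) {
        int snanBits = 0x7f800001;                 // a signaling float NaN
        short h = toFloat16NaN(snanBits);
        System.out.printf("0x%04x%n", h & 0xffff); // quiet bit 0x0200 is set
    }
}
```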
I don't understand what is being done here. From IEEE 754-2019:
"Besides issues such as byte order which affect all data
interchange, certain implementation options allowed by this standard must also be considered:
― for binary formats, how signaling NaNs are distinguished from quiet NaNs
― for decimal formats, whether binary or decimal encoding is used.
This standard does not define how these parameters are to be communicated."
The code in java.lang.Float in particular is meant to be usable on all host CPUs so architecture-specific assumptions about QNAN vs SNAN should be avoided.
I'd like to see a more informative description of the problem: "float16 NaN values handled differently with and without intrinsification". If that is the issue being reported, it may not be a problem, as opposed to, say, "incorrect value returned under Float.float16ToFloat intrinsification", etc.
PS The detailed NaN handling is specifically done in a separate test file since the invariants that are true for the software implementation need not be true for an intrinsified hardware-based one. However, as was done for the intrinsics of the transcendental methods (sin, cos, tan), if the float16 conversion intrinsics are used, they should be used consistently (or not at all) regardless of the compilation approach (interpreter, C1, C2, etc.). HTH
I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have three mechanisms for implementing this functionality:
Unless the hardware instructions for all relevant CPUs behave exactly the same, I don't see how we can have parity of behaviour across these three mechanisms. The observed behaviour may be surprising, but it seems not to be a bug. And is this even a real concern: would real programs actually need to peek at the raw bits and so see the difference, or does it suffice to handle NaNs opaquely?
From the spec (https://download.java.net/java/early_access/jdk20/docs/api/java.base/java/lang/Float.html#float16ToFloat(short)) "Returns the float value closest to the numerical value of the argument, a floating-point binary16 value encoded in a short. The conversion is exact; all binary16 values can be exactly represented in float. Special cases:
If the float argument is a NaN, you are supposed to get a float16 NaN as a result; that is all the specification requires. However, the implementation makes stronger guarantees: it tries to preserve some non-zero NaN significand bits if they are set. "NaN boxing" is a technique used to put extra information into the significand bits of a NaN and pass them around. It is consistent with the intended use of the feature by IEEE 754 and is used in various language runtimes, e.g. https://piotrduperas.com/posts/nan-boxing. The Java specs are careful to avoid mentioning quiet vs signaling NaNs in general discussion. That said, I think it is reasonable that on a given JVM invocation, Float.floatToFloat16(f) gives the same result for input f regardless of the context in which it is called.
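A minimal sketch of the NaN-boxing idea in Java (the helper names are hypothetical; it uses the raw-bits methods so the payload is not canonicalized away):

```java
// Sketch of "NaN boxing": store a small tag in the significand bits of a
// quiet float NaN and read it back with the raw-bits methods.
public class NaNBox {
    static float box(int payload) {
        // exponent all ones + quiet bit + payload => a quiet NaN carrying data
        return Float.intBitsToFloat(0x7fc00000 | (payload & 0x003fffff));
    }

    static int unbox(float nan) {
        // floatToRawIntBits preserves the payload; floatToIntBits would not
        return Float.floatToRawIntBits(nan) & 0x003fffff;
    }

    public static void main(String[] args) {
        float f = box(0x1234);
        System.out.println(Float.isNaN(f));                 // true
        System.out.println(Integer.toHexString(unbox(f)));  // 1234
    }
}
```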
Yes, I'm under the impression that for math API methods like this, the stability of the input-to-output mapping must be preserved for a single JVM invocation. Or are there existing methods for which interpreted and compiled execution are allowed to differ?
@dholmes-ora @jddarcy @TobiHartmann @vnkozlov From @dean-long's comment in the JBS entry, he sees the same result on AARCH64 and Intel, i.e. the output has the QNaN bit set.
The proposed fix does exactly what everyone asked: the same result from Java code (interpreter), runtime (C++ code), and the intrinsic (HW instruction). Since the HW instruction already produces QNaNs, the PR fixes only the Java code (interpreter) and runtime (C++) code to produce QNaNs. @TobiHartmann created a test which covers all cases and should be added to this PR.
We don't know that all HW will produce the same NaN "payload", right? Instead, we might need interpreter intrinsics. I assume that is how the trig functions are handled that @jddarcy mentioned. |
Good point. We can't guarantee that the HW on all OpenJDK ports does the same. If the CPU has corresponding instructions, we need to generate a stub during VM startup with the HW instructions and use it in all cases (or use the same instruction directly in JIT-compiled code).
Thanks @vnkozlov @dean-long. One last question before I withdraw the PR: as the QNaN bit is supported for conversion across current architectures like x86 and ARM (and maybe others as well), couldn't we go ahead with this PR? The architectures that behave differently could then follow the technique suggested by Vladimir Kozlov as and when they implement the intrinsic.
No, because it's not just SNaN vs QNaN that differs, but also the NaN "payload" or "boxing". For example, the intrinsic gives me different results on aarch64 vs Intel with this test:
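The quoted test itself is elided above; a hypothetical sketch of that kind of probe (requires JDK 20+ for Float.floatToFloat16) converts a NaN carrying distinctive payload bits and prints the resulting binary16 bit pattern, which is where the per-architecture difference would show up:

```java
// Hypothetical probe: feed a NaN with a distinctive payload through the
// conversion and inspect the binary16 result; payload handling is the part
// that can differ between the intrinsic and the Java implementation.
public class PayloadCheck {
    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7f812345); // NaN, nonzero payload
        short h = Float.floatToFloat16(nan);          // JDK 20+ API
        System.out.printf("0x%04x%n", h & 0xffff);    // raw bit pattern
        // Any conforming result is a binary16 NaN: exponent all ones,
        // significand nonzero.
        System.out.println((h & 0x7c00) == 0x7c00 && (h & 0x03ff) != 0);
    }
}
```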
A similar, but not exactly analogous, situation occurs for the intrinsics of various java.lang.Math methods. Many of the methods of interest allow implementation flexibility, subject to various quality-of-implementation criteria. Those criteria bound the maximum error at a single point, phrased in terms of ulps (units in the last place), and require semi-monotonicity, which constrains the relation between outputs for adjacent floating-point inputs. Taking two compliant implementations of Math.foo(), it is not necessarily valid to switch between them, because the semi-monotonicity constraint can be violated. The solution is to always use, or always not use, an intrinsic for Math.foo on a given JVM invocation. HTH
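The semi-monotonicity property can be illustrated at a single point with a small sketch (illustrative only; java.lang.Math specifies that Math.sin is semi-monotonic, and sin is increasing near 0.5):

```java
// Sketch of the semi-monotonicity property: where the exact function is
// non-decreasing, the computed value at the adjacent float should not
// decrease either.
public class MonotoneCheck {
    public static void main(String[] args) {
        double x = 0.5;                // sin is increasing around 0.5
        double next = Math.nextUp(x);  // the adjacent floating-point value
        System.out.println(Math.sin(next) >= Math.sin(x)); // expect true
    }
}
```

Mixing two implementations that each satisfy this property can still violate it at the switch-over point, which is the argument for picking one implementation per JVM invocation.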
Changing the Java code to match the semantics of a given architecture's HW instruction risks requiring other architectures to implement additional code to match those semantics, potentially negating the benefit of using an intrinsic in the first place. Consistent output across all execution contexts is certainly a desirable quality, but at what cost?
Thank you very much, all, for your valuable input. I think it is best to withdraw this PR.
Change java/lang/Float.java and the corresponding shared runtime constant expression evaluation to generate QNaN.
The HW instructions generate QNaNs, not SNaNs, for floating point instructions. This happens across the double, float, and float16 data types. The most significant bit of the significand is set to 1 for QNaNs.
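For reference, the canonical Java float NaN already has that quiet bit set, which can be checked with the standard Float bit-pattern APIs (a small sketch):

```java
// The canonical Java NaN (0x7fc00000) has the most significant significand
// bit (0x00400000) set, i.e. it is a QNaN.
public class QuietBit {
    public static void main(String[] args) {
        int bits = Float.floatToRawIntBits(Float.NaN);
        System.out.printf("0x%08x%n", bits);          // prints 0x7fc00000
        System.out.println((bits & 0x00400000) != 0); // quiet bit set: true
    }
}
```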
Progress
Issue
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12704/head:pull/12704
$ git checkout pull/12704
Update a local copy of the PR:
$ git checkout pull/12704
$ git pull https://git.openjdk.org/jdk pull/12704/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12704
View PR using the GUI difftool:
$ git pr show -t 12704
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12704.diff