
8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter #12704

Closed (wants to merge 1 commit)

Conversation

@sviswa7 commented Feb 22, 2023

Change java/lang/Float.java and the corresponding shared runtime constant-expression evaluation to generate QNaNs.
Hardware instructions generate QNaNs, not SNaNs, for floating-point operations. This holds across the double, float, and float16 data types. For QNaNs, the most significant bit of the significand is set to 1.
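For reference, the quiet-bit position mentioned above can be checked with plain bit operations. This is an illustrative sketch, not part of the change; the masks are the standard binary32 and binary16 quiet-bit positions:

```java
public class QuietBitDemo {
    // In IEEE 754 binary formats a NaN is "quiet" when the most significant
    // significand bit is set: bit 22 in binary32, bit 9 in binary16.
    static final int FLOAT_QUIET_BIT = 0x0040_0000; // binary32
    static final int FLOAT16_QUIET_BIT = 0x0200;    // binary16

    public static void main(String[] args) {
        // The canonical float QNaN (Float.NaN) has the quiet bit set:
        int bits = Float.floatToRawIntBits(Float.NaN); // 0x7fc00000
        System.out.println((bits & FLOAT_QUIET_BIT) != 0); // prints "true"
    }
}
```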


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12704/head:pull/12704
$ git checkout pull/12704

Update a local copy of the PR:
$ git checkout pull/12704
$ git pull https://git.openjdk.org/jdk pull/12704/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12704

View PR using the GUI difftool:
$ git pr show -t 12704

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12704.diff

bridgekeeper bot commented Feb 22, 2023

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Feb 22, 2023

@sviswa7 The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Feb 22, 2023
@sviswa7 sviswa7 marked this pull request as ready for review February 22, 2023 02:14
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 22, 2023
mlbridge bot commented Feb 22, 2023

Webrevs

@vnkozlov (Contributor) left a comment:

Please add a regression test.
Also consider creating copies of the jdk/java/lang/Float/Binary16Conversion*.java tests in compiler/intrinsics/ and modifying them to compare results from the interpreter, runtime, and JITed code.

vnkozlov (Contributor) commented Feb 22, 2023

We run compiler/intrinsics/ tests with different SSE and AVX settings to make sure they work in all cases.

@@ -1097,6 +1098,7 @@ public static short floatToFloat16(float f) {
// Preserve sign and attempt to preserve significand bits
return (short)(sign_bit
| 0x7c00 // max exponent + 1
| 0x0200 // QNaN
Inline review comment (Member):

I don't understand what is being done here. From IEEE 754-2019:

"Besides issues such as byte order which affect all data
interchange, certain implementation options allowed by this standard must also be considered:
― for binary formats, how signaling NaNs are distinguished from quiet NaNs
― for decimal formats, whether binary or decimal encoding is used.
This standard does not define how these parameters are to be communicated."

The code in java.lang.Float in particular is meant to be usable on all host CPUs so architecture-specific assumptions about QNAN vs SNAN should be avoided.

jddarcy (Member) commented Feb 22, 2023

I'd like to see a more informative description of the problem:

"float16 NaN values handled differently with and without intrinsification"

If that is the issue reported, it may not be a problem, as opposed to

"incorrect value returned under Float.float16ToFloat intrinsification", etc.


PS The detailed NaN handling is specifically done in a separate test file since the invariants that are true for the software implementation need not be true for an intrinsified, hardware-based one.

However, as done for the intrinsics of the transcendental methods (sin, cos, tan), if the float16 conversion intrinsics are used, they should be consistently used or not used regardless of compilation approach (interpreter, C1, C2, etc.).

HTH

@dholmes-ora (Member) commented:
I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have three mechanisms for implementing this functionality:

  1. The interpreted Java code
  2. The compiled non-intrinsic sharedRuntime code
  3. The compiler intrinsic that uses a hardware instruction.

Unless the hardware instructions for all relevant CPUs behave exactly the same, then I don't see how we can have parity of behaviour across these three mechanisms.

The observed behaviour may be surprising but it seems not to be a bug. And is this even a real concern - would real programs actually need to peek at the raw bits and so see the difference, or does it suffice to handle NaNs opaquely?

jddarcy (Member) commented Feb 22, 2023

I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have three mechanisms for implementing this functionality:

1. The interpreted Java code

2. The compiled non-intrinsic sharedRuntime code

3. The compiler intrinsic that uses a hardware instruction.

Unless the hardware instructions for all relevant CPUs behave exactly the same, then I don't see how we can have parity of behaviour across these three mechanisms.

The observed behaviour may be surprising but it seems not to be a bug. And is this even a real concern - would real programs actually need to peek at the raw bits and so see the difference, or does it suffice to handle NaNs opaquely?

From the spec (https://download.java.net/java/early_access/jdk20/docs/api/java.base/java/lang/Float.html#float16ToFloat(short))

"Returns the float value closest to the numerical value of the argument, a floating-point binary16 value encoded in a short. The conversion is exact; all binary16 values can be exactly represented in float. Special cases:

If the argument is zero, the result is a zero with the same sign as the argument.
If the argument is infinite, the result is an infinity with the same sign as the argument.
If the argument is a NaN, the result is a NaN. "

If the float argument is a NaN, you are supposed to get a float16 NaN as a result -- that is all the specification requires. However, the implementation makes stronger guarantees to try to preserve some non-zero NaN significand bits if they are set.
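A minimal sketch of why only some payload bits can survive the float-to-float16 narrowing: binary32 has 23 significand bits and binary16 has 10, so at most the top 10 can be carried over (shift by 23 - 10 = 13). The constants here are chosen for illustration:

```java
public class PayloadNarrowing {
    public static void main(String[] args) {
        // Quiet binary32 NaN (0x7fc00000) with one extra payload bit at 0x2000.
        int f32 = 0x7fc0_2000;
        // Only significand bits 22..13 fit into the 10-bit binary16 field.
        int kept = (f32 >> 13) & 0x3ff;
        // The quiet bit (binary32 bit 22) lands on binary16 bit 9 (0x200),
        // payload bit 0x2000 maps to 0x1, and anything below bit 13
        // (e.g. 0x20 or 0x4) is lost.
        System.out.println(Integer.toHexString(kept)); // prints "201"
    }
}
```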

"NaN boxing" is a technique used to put extra information into the significand bits of a NaN and pass them around. It is consistent with the intended use of the feature by IEEE 754 and is used in various language runtimes: e.g.,

https://piotrduperas.com/posts/nan-boxing
https://leonardschuetz.ch/blog/nan-boxing/
https://anniecherkaev.com/the-secret-life-of-nan
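The idea behind the linked posts can be sketched as follows (binary64; the class name and 32-bit payload width are chosen for illustration, and note that passing a boxed value through floating-point arithmetic may not preserve the payload):

```java
public class NanBoxing {
    static final long QNAN_BITS = 0x7ff8_0000_0000_0000L; // quiet NaN pattern
    static final long PAYLOAD_MASK = 0xffff_ffffL;        // low 32 bits

    // Hide a 32-bit payload in the low significand bits of a quiet NaN.
    static double box(int payload) {
        return Double.longBitsToDouble(QNAN_BITS | (payload & PAYLOAD_MASK));
    }

    // Recover the payload. Raw bits are required: doubleToLongBits would
    // collapse every NaN to the canonical pattern and drop the payload.
    static int unbox(double d) {
        return (int) (Double.doubleToRawLongBits(d) & PAYLOAD_MASK);
    }

    public static void main(String[] args) {
        double boxed = box(42);
        System.out.println(Double.isNaN(boxed)); // prints "true"
        System.out.println(unbox(boxed));        // prints "42"
    }
}
```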

The Java specs are careful to avoid mentioning quiet vs signaling NaNs in general discussion.

That said, I think it is reasonable on a given JVM invocation if Float.floatToFloat16(f) gave the same result for input f regardless of in what context it was called.

dougxc (Member) commented Feb 22, 2023

That said, I think it is reasonable on a given JVM invocation if Float.floatToFloat16(f) gave the same result for input f regardless of in what context it was called.

Yes, I'm under the impression that for math API methods like this, the stability of input to output must be preserved for a single JVM invocation. Or are there existing methods for which the interpreter and compiled code execution is allowed to differ?

sviswa7 (Author) commented Feb 22, 2023

@dholmes-ora @jddarcy @TobiHartmann @vnkozlov From @dean-long's comment in the JBS entry, he sees the same result on AArch64 and Intel, i.e. the output has the QNaN bit set.
Please let me know if we want to proceed with this PR or if it would be good to withdraw it. I am open to either suggestion. Please advise.

@vnkozlov (Contributor) commented:
The proposed fix does exactly what everyone asked: the same result from Java code (interpreter), runtime (C++ code), and intrinsic (HW instruction). Since the HW instruction already produces QNaNs, the PR fixes only the Java code (interpreter) and the runtime (C++) code to produce QNaNs.

@TobiHartmann created a test which covers all cases and should be added to this PR.

@dean-long (Member) commented:
We don't know that all HW will produce the same NaN "payload", right? Instead, we might need interpreter intrinsics. I assume that is how the trig functions are handled that @jddarcy mentioned.

@vnkozlov (Contributor) commented:
We don't know that all HW will produce the same NaN "payload", right? Instead, we might need interpreter intrinsics. I assume that is how the trig functions are handled that @jddarcy mentioned.

Good point. We can't guarantee that all OpenJDK ports HW do the same.

If the CPU has corresponding instructions, we need to generate a stub with HW instructions during VM startup and use it in all cases (or use the same instruction directly in JIT-compiled code).
If the CPU does not have the instruction, we should use the runtime C++ function in all cases to be consistent.

sviswa7 (Author) commented Feb 22, 2023

Thanks @vnkozlov @dean-long. One last question before I withdraw the PR: as the QNaN bit is supported for conversion across current architectures like x86 and ARM, and maybe others as well, couldn't we go ahead with this PR? The architectures that behave differently could then follow the technique suggested by Vladimir Kozlov as and when they implement the intrinsic?

@dean-long (Member) commented:

Thanks @vnkozlov @dean-long. One last question before I withdraw the PR: As QNaN bit is supported across current architectures like x86, ARM and may be others as well for conversion, couldn't we go ahead with this PR? The architectures that behave differently could then follow the technique suggested by Vladimir Kozlov as and when they implement the intrinsic?

No, because it's not just the SNaN vs QNaN that is different, but also the NaN "payload" or "boxing" that is different. For example, the intrinsic gives me different results on aarch64 vs Intel with this test:

public class Foo {
  public static float hf2f(short s) {
    return Float.float16ToFloat(s); // half-float to float conversion
  }
  public static short f2hf(float f) {
    return Float.floatToFloat16(f);
  }
  public static void main(String[] args) {
    float f = Float.intBitsToFloat(0x7fc00000 | 0x2000 );
    System.out.println(Integer.toHexString(f2hf(f)));
    f = Float.intBitsToFloat(0x7fc00000 | 0x20 );
    System.out.println(Integer.toHexString(f2hf(f)));
    f = Float.intBitsToFloat(0x7fc00000 | 0x4);
    System.out.println(Integer.toHexString(f2hf(f)));
    f = Float.intBitsToFloat(0x7fc00000 | 0x2000 | 0x20 | 0x4);
    System.out.println(Integer.toHexString(f2hf(f)));
  }
}

jddarcy (Member) commented Feb 23, 2023

That said, I think it is reasonable on a given JVM invocation if Float.floatToFloat16(f) gave the same result for input f regardless of in what context it was called.

Yes, I'm under the impression that for math API methods like this, the stability of input to output must be preserved for a single JVM invocation. Or are there existing methods for which the interpreter and compiled code execution is allowed to differ?

A similar but not exactly analogous situation occurs for the intrinsics of various java.lang.Math methods. Many of the methods of interest allow implementation flexibility, subject to various quality-of-implementation criteria. Those criteria bound the maximum error at a single point, phrased in terms of ulps (units in the last place), and require semi-monotonicity, which describes the relation of outputs at adjacent floating-point values.

Taking two compliant implementations of Math.foo(), it is not necessarily a valid implementation to switch between them because the semi-monotonicity constraint can be violated. The solution is to always use or always not use an intrinsic for Math.foo on a given JVM invocation. HTH
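A contrived illustration of the point above: take the exact function to be the identity, and two hypothetical implementations that each stay within 1 ulp of it, one rounding up and one rounding down. Mixing them across adjacent inputs can make the output decrease even though the exact function increases (names and values here are invented for the sketch, not from any real Math method):

```java
public class SemiMonotonicity {
    // Exact function: identity (increasing). Each impl is within 1 ulp of it.
    static float implA(float x) { return Math.nextUp(x); }   // rounds up
    static float implB(float x) { return Math.nextDown(x); } // rounds down

    public static void main(String[] args) {
        float x = 1.0f;
        float y = Math.nextUp(x); // adjacent input; exact value is larger
        // Mixing implementations: A at x, B at the larger input y.
        float fx = implA(x);
        float fy = implB(y);
        // The mixed outputs decrease across increasing adjacent inputs:
        System.out.println(fx > fy); // prints "true"
    }
}
```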

@dholmes-ora (Member) commented:
Changing the Java code to match the semantics of a given architecture's HW instruction risks requiring that other architectures have to implement additional code to match those semantics, potentially impacting the benefit of using an intrinsic in the first place.

Consistent output across all execution contexts is certainly a desirable quality, but at what cost?

sviswa7 (Author) commented Feb 23, 2023

Thank you very much, all, for your valuable input. I think it is best to withdraw this PR.

@sviswa7 sviswa7 closed this Feb 23, 2023
@sviswa7 sviswa7 deleted the fp16fix branch June 3, 2024 21:36
Labels
core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review
6 participants