Clarify rounding #2140
@@ -6292,7 +6292,7 @@ the following exceptions:
of NaNs and infinities.
* Implementations may ignore the sign of a zero.
That is, a zero with a positive sign may behave like a zero with a negative sign, and vice versa.
-* No rounding mode is specified.
+* The rounding mode is round-to-nearest-even, but an implementation may truncate results toward zero.
* Implementations may flush denormalized values on the input and/or output of
any operation listed in [[#floating-point-accuracy]].
* Other operations are required to preserve denormalized numbers.
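To make the two behaviors named in the new bullet concrete, here is a minimal Python sketch that narrows a binary64 value to binary32, first with round-to-nearest-even and then with truncation toward zero. It assumes Python floats are IEEE 754 binary64 and uses `struct` packing as a stand-in for an f32 conversion; the helper names are illustrative and not part of any WGSL API.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x as binary32 (x must already be exactly representable)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as binary32 and widen it to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_nearest_even(x: float) -> float:
    """Narrow binary64 to binary32 with round-to-nearest, ties-to-even.

    struct's "f" format narrows with the default IEEE 754 rounding mode
    (round-to-nearest-even). It raises OverflowError if the result would
    overflow binary32, so this sketch only handles in-range values.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_toward_zero(x: float) -> float:
    """Narrow binary64 to binary32 by truncating toward zero."""
    y = round_nearest_even(x)
    if abs(y) > abs(x):
        # Nearest-even rounded away from zero, so the truncated result is the
        # adjacent binary32 value closer to zero. binary32 is sign-magnitude,
        # so decrementing the bit pattern shrinks the magnitude by one ulp.
        y = f32_from_bits(f32_bits(y) - 1)
    return y


x = 1.0 + 3 * 2.0**-25  # 0.75 binary32 ulp above 1.0
print(round_nearest_even(x), round_toward_zero(x))
```

The two results are always either identical or adjacent binary32 values, which is the gap the review thread argues about.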
@@ -6310,9 +6310,6 @@ The <dfn>correctly rounded</dfn> result of the operation for floating point type

</div>

-That is, the result may be rounded up or down:
Review comment: We have to keep this.

-[SHORTNAME] does not specify a rounding mode.

Note: Floating point types include positive and negative infinity, so
the correctly rounded result may be finite or infinite.
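The sentence "the result may be rounded up or down" can be checked mechanically: a correctly rounded result is the exact value if it is representable, and otherwise one of its two representable neighbors. Below is a minimal Python sketch under two assumptions. It works with binary64 (Python's `float`) rather than a WGSL type, and it uses `fractions.Fraction` to hold exact values. `correctly_rounded_candidates` is a hypothetical helper.

```python
import math
from fractions import Fraction


def correctly_rounded_candidates(exact: Fraction) -> set[float]:
    """Return the binary64 values that count as a correctly rounded result.

    The exact value itself if representable, otherwise the nearest
    representable value below it and the nearest one above it.
    Assumes `exact` lies within the finite binary64 range; infinite
    results are not modeled (float() raises OverflowError instead).
    """
    nearest = float(exact)  # CPython rounds Fraction -> float to nearest
    if Fraction(nearest) == exact:
        return {nearest}
    if Fraction(nearest) < exact:
        return {nearest, math.nextafter(nearest, math.inf)}
    return {math.nextafter(nearest, -math.inf), nearest}


# 0.1 + 0.2 is not representable, so two results are acceptable, and
# Python's round-to-nearest-even sum is one of them.
exact_sum = Fraction(0.1) + Fraction(0.2)
allowed = correctly_rounded_candidates(exact_sum)
print(sorted(allowed), (0.1 + 0.2) in allowed)

# 2 + 2 is exact, so 4.0 is the only acceptable result.
print(correctly_rounded_candidates(Fraction(2) + Fraction(2)))
```

Under this definition both round-to-nearest-even and truncation toward zero produce acceptable results, while an answer such as 666 for `2+2` (the counterexample raised in the review thread) does not.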
@@ -6419,8 +6416,9 @@ However, the result may not be the same when computed in floating point.
The reassociated result may be inaccurate due to approximation, or may trigger
an overflow or NaN when computing intermediate results.

-An implementation may reassociate and/or fuse operations if the optimization is
-at least as accurate as the original formulation.
+<dfn noexport>Fusing</dfn> occurs when two floating-point operations are performed in one step.
Review comment: The example of multiply-then-add is less general than the original statement.

+* `a * b + c` is calculated with either one correctly rounded operation or a sequence of two correctly rounded operations.

### Floating point conversion ### {#floating-point-conversion}
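The `a * b + c` bullet allows either one rounding (fused) or two (round the product, then round the sum). The difference can be shown with a minimal Python sketch. It uses binary64 arithmetic instead of WGSL's f32, and it emulates the single rounding with exact `Fraction` arithmetic. The function names are illustrative.

```python
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    """a * b + c as two correctly rounded binary64 operations."""
    product = a * b  # first rounding
    return product + c  # second rounding


def fused_mul_add(a: float, b: float, c: float) -> float:
    """a * b + c with a single rounding, emulated with exact rationals."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
print(unfused_mul_add(a, b, c), fused_mul_add(a, b, c))
```

The exact product is 1 - 2**-60, which rounds to 1.0 in binary64, so the unfused form returns 0.0. The fused form keeps the exact answer, `-2.0**-60`. Both are permitted by the bullet, and here the fused form is the more accurate one.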
@@ -6463,8 +6461,6 @@ Then, for example, integers 2<sup>28</sup> and 1+2<sup>28</sup> both map to the
least significant 1 bit is not representable by the floating point format.
This kind of collision occurs for pairs of adjacent integers with a magnitude of at least 2<sup>25</sup>.

-Issue: (dneto) Default rounding mode is an implementation choice. Is that what we want?
munrocket marked this conversation as resolved.

Issue: Check behaviour of the f32 to f16 conversion for numbers just beyond the max normal f16 values.
I've written what an NVIDIA GPU does. See https://github.com/google/amber/pull/918 for an executable test case.
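The 2<sup>28</sup> collision can be reproduced with a minimal Python sketch. It assumes binary32 (f32) as the target type and round-to-nearest-even, done through `struct`. `int_to_f32` is a hypothetical helper. Near 2<sup>28</sup>, representable binary32 values are 2<sup>28-23</sup> = 32 apart, so 1+2<sup>28</sup> has no exact binary32 representation.

```python
import struct


def int_to_f32(n: int) -> float:
    """Convert an integer to binary32 with round-to-nearest-even.

    Assumes |n| <= 2**53 so that float(n) is exact and only the
    binary64 -> binary32 step (struct's "f" format) rounds.
    """
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


low, high = 2**28, 2**28 + 1
print(int_to_f32(low) == int_to_f32(high))    # same binary32 value
print(int_to_f32(2**28 + 32) == 2**28 + 32)   # next representable integer
```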
This sounds strange to me. So is this line normatively saying what the rounding is, or is it basically still "implementation defined"?

Maybe change `implementation` -> `hardware`?

No, I mean: this says there are two possible behaviors, rounding to even and truncation toward zero. Aren't those conflicting?

I suppose we can't guarantee round-to-nearest-even in all cases, because sometimes it works like round-toward-zero (in other words, truncation). DX11 uses the term `hardware`, so it probably fits better here too. From DX11: "hardware is allowed to truncate results to 32-bit rather than perform round-to-nearest-even"

OK, I was thinking about the `round()` builtin, not general rounding. FYI, we can't talk about "hardware" directly in the spec.

Yeah, so this is some kind of trade-off.

The reason no rounding mode was specified is that underlying implementations are unconstrained in that way.
Are you trying to constrain it to exactly one of two possibilities: round-to-even and round-to-zero?

This is just an editorial PR, but I wanted to constrain it as a superset of SPIR-V, MSL and DX, or, even better, to specify that it is loosely round-to-nearest.
Firefox and Chrome are doing well, and I hope Safari will do it the same way. But according to the spec, `2+2=666` is legitimate rounding, and I think it is a good counterexample to fix this.