
Allow floating-point operations to provide extra precision than specified, as an optimization #2686

Open · wants to merge 11 commits into base: master

Conversation

@joshtriplett (Member) commented Apr 17, 2019

Rendered

This enables optimizations such as fused multiply-add operations by default, while providing robust mechanisms to disable extra precision for applications that wish to do so.

EDIT: Please note that this RFC has been substantially overhauled to better accommodate applications that wish to disable extra precision. In particular, there's a top-level codegen option (-C extra-fp-precision=off) to disable this program-wide.
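For readers skimming the thread, here is a minimal sketch of the two spellings at issue (the -C extra-fp-precision=off flag is the one proposed by this RFC, not an existing rustc option):

fn main() {
    let (a, b, c) = (0.1_f64, 0.2, 0.3);

    // As written today: two operations, each rounded to f64.
    let separate = a * b + c;

    // Explicit fused multiply-add: one operation, a single final rounding.
    let fused = a.mul_add(b, c);

    // Under this RFC the compiler could emit the fused form for `a * b + c`
    // unless extra precision is disabled, e.g. with the proposed flag:
    //     rustc -C extra-fp-precision=off ...
    println!("{:.20}", separate);
    println!("{:.20}", fused);
}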

Allow floating-point operations to provide extra precision than specified, as an optimization

This enables optimizations such as fused multiply-add operations by
default.

@joshtriplett added the T-lang label Apr 17, 2019

@rkruppe (Member) left a comment

I really want Rust to have a good story for licensing floating point optimizations, including but not limited to contraction. However, simply turning on contraction by default is not a good step in that direction. Contrary to what the RFC claims, contraction is not "safe", in the sense that it can break otherwise-working programs (obviously there's no memory safety at stake), and we have not previously reserved the right to do this or given any other indication to users that it might happen.

Let's design a way to opt into and out of this behavior at the crate/module/function level first, and once that's done we can look at how to make more code use it automatically. A fine-grained opt-in and opt-out is very useful even if we end up changing the default, e.g. to ensure code that breaks under contraction can be compiled as part of a crate graph that generally has contraction enabled. There's plenty of design work to keep us busy even without touching defaults:

  • compiler options or attributes or ...?
  • how does it propagate from callers into callees, if at all? (generally hard problem, but IMO a good story for this is just as valuable as providing the basic feature in the first place)
  • what transformations are licensed exactly? (e.g., do we want roughly what the C standard allows, or do we want more like GCC does?)
back to a lower-precision format.

In general, providing more precision than required should not cause a
mathematical algorithm to fail or to lose numeric accuracy.

@rkruppe (Member) commented Apr 17, 2019

This is incorrect. One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to x.mul_add(x, - y * y) then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into sqrt() to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.

@joshtriplett (Member, Author) commented Apr 17, 2019

Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.

That said, I'll adjust the language.

@gnzlbg (Contributor) commented Apr 18, 2019

Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.

Some C, C++, and Fortran compilers do this (gcc, msvc), some don't (clang). If this were a universally good idea, all of them would do this, but this is not the case. That is, those languages are prior art, but I'm really missing from the prior art section why this would actually be a good idea - are programmers using those languages happy with that "feature"?

A sign change trickling down your application depending on the optimization level (or even debug-information level) can be extremely hard to debug in practice. So IMO this issue raised by @rkruppe deserves more analysis than a language adjustment.

@joshtriplett (Member, Author) commented Apr 18, 2019

why this would actually be a good idea
are programmers using those languages happy with that "feature"

The beginning of the RFC already makes the rationale quite clear: this allows for optimizations on the scale of 2x performance improvements, while never reducing the accuracy of a calculation compared to the mathematically accurate result.

@joshtriplett (Member, Author) commented Apr 18, 2019

@rkruppe Looking again at your example, I think there's something missing from it? You said:

One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats

Counter-example to that: x = 2.0, y = 4.0. Both x and y square to finite floats, and x*x - y*y should absolutely be negative. I don't think those properties alone are enough to reasonably expect that you can call sqrt on that and get a non-imaginary result.

@rkruppe (Member) commented Apr 20, 2019

Ugh, sorry, you're right. That's what I get for repeating the argument from memory and filling in the gaps without thinking too long. In general, of course, x² may be smaller than y². The problematic case is only when x = y (plus the aforementioned side conditions); in that case (x * x) - (y * y) is zero, but with FMA it can be negative.

Another example, I am told, is complex multiplication when multiplying a number by its conjugate. I will not elaborate because apparently I cannot be trusted this late in the evening to work out the details correctly.
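To make the x = y case easy to poke at, a small self-contained sketch (whether the two results actually differ depends on the value chosen and on whether mul_add maps to a hardware FMA on the target):

fn main() {
    // Pick an x whose square is not exactly representable as an f64.
    let x = 1.000_000_1_f64;
    let y = x;

    // Two roundings: round(x*x) - round(y*y) is exactly zero when x == y.
    let separate = x * x - y * y;

    // Contracted form: the product x*x is kept exact inside the FMA, so the
    // result is (exact x*x) - round(y*y), which is negative whenever
    // round(y*y) happened to round up.
    let contracted = x.mul_add(x, -(y * y));

    println!("separate:   {}", separate);
    println!("contracted: {}", contracted);
    println!("sqrt:       {}", contracted.sqrt()); // NaN if contracted < 0
}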

@fenrus75 commented Apr 21, 2019

This is incorrect. One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to x.mul_add(x, - y * y) then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into sqrt() to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.

I suspect this is not a valid statement.

The original, in pseudocode, is

round64( round64(x * x) - round64(y * y) )

and the contraction you give is

round64( x * x - round64(y * y) )

For the result to go negative only in the contracted case, round64(x * x) would have to round up to >= round64(y * y) while x * x itself is < round64(y * y); by the "nearest" part of rounding that means round64(x * x) == round64(y * y), since it can't cross round64(y * y).

Since we're rounding to nearest, this means x * x is less than half a unit of precision away from round64(y * y). That in turn means that x * x - round64(y * y), while negative in this case, is less than half a unit of precision away from 0, so the outer round64() will round it up to 0.

@mtijink commented Apr 24, 2019

This is incorrect. One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to x.mul_add(x, - y * y) then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into sqrt() to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.

I suspect this is not a valid statement.

The original, in pseudocode, is

round64( round64(x * x) - round64(y * y) )

and the contraction you give is

round64( x * x - round64(y * y) )

If you use y=x, then if round64(x*x) rounds up, it's easy to see that round64(x*x - round64(x*x)) is negative. This does not round to zero, because units of precision are not absolute, but relative (think significant figures in scientific notation).

For reference (and more interesting floating point information!) see the "fmadd" section on https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/

across platforms, this change could potentially allow floating-point
computations to differ by platform (though never below the standards-required
accuracy). However, standards-compliant implementations of math functions on
floating-point values may already vary slightly by platform, sufficiently so to

@rkruppe (Member) commented Apr 17, 2019

I'm the last person to argue we have any sort of bit-for-bit reproducibility of floating point calculations across platforms or even optimization levels (I know in regrettable detail many of the reasons why not), but it seems like a notable further step to make even the basic arithmetic operations dependent on the optimization level, even for normal inputs, even on the (numerous) targets where they are currently not.

- [C11](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) allows
this with the `STDC FP_CONTRACT` pragma enabled, and the default state
of that pragma is implementation-defined. GCC enables this pragma by
default, [as does the Microsoft C

@rkruppe (Member) commented Apr 17, 2019

Note that GCC defaults to -ffp-contract=fast, which goes beyond what's described in the C standard, and according to documentation the only other option it implements is off.

@joshtriplett (Member, Author) commented Apr 17, 2019

Based on some careful research, as far as I can tell GCC's -ffp-contract=fast just changes the default value of STDC FP_CONTRACT, nothing else. It does not enable any of the potentially accuracy-reducing "fast-math" optimizations.

(-ffp-contract=off means "ignore the pragma", and -ffp-contract=on means "don't ignore the pragma" but doesn't change the default.)

@rkruppe (Member) commented Apr 20, 2019

My understanding is: the C standard only allows FMA synthesis within a source-level expression. This is extremely inconvenient to respect at the IR level (you'd have to track which source level expression each operation comes from), so -ffp-contract=fast simply disregards source level information and just contracts IR operations if they're of the suitable form.

Clang implements this option too, but it defaults to standard compliance by performing contraction in the frontend where source level boundaries are still available.

expression", where "Two arithmetic expressions are mathematically
equivalent if, for all possible values of their primaries, their
mathematical values are equal. However, mathematically equivalent
arithmetic expressions may produce different computational results."

@rkruppe (Member) commented Apr 17, 2019

I'm not familiar with Fortran (or at least this aspect of it), but this quote seems to license far more than contraction, e.g. all sorts of -ffast-math style transformations that ignore the existence of NaNs. Is that right?

@joshtriplett (Member, Author) commented Apr 17, 2019

@rkruppe That's correct, Fortran also allows things like reassociation and commutation, as long as you never ignore parentheses.

@joshtriplett (Member, Author) commented Apr 17, 2019

@rkruppe wrote:

I really want Rust to have a good story for licensing floating point optimizations, including but not limited to contraction. However, simply turning on contraction by default is not a good step in that direction.

It'd be a step towards parity with other languages, rather than intentionally being slower. I think we need to seriously evaluate whether we're buying anything by intentionally being slower. (And by "slower" here, I don't mean a few percent, I mean 2x slower.)

Contrary to what the RFC claims, contraction is not "safe" (meaning that it breaks otherwise-working programs; obviously there's no memory safety at stake),

Any such programs would be broken in C, C++, Fortran, and likely other languages by default; they'd have to explicitly disable the default behavior. Such programs are also going directly against best practices in numerical methods; if anything, we should ideally be linting against code like (x*x - y*y).sqrt().
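For illustration, such a lint could point toward a rearrangement along these lines (one possible rewrite, not something this RFC prescribes):

/// sqrt(x*x - y*y), written to tolerate contraction and tiny negative
/// intermediates (illustrative only).
fn sqrt_diff_of_squares(x: f64, y: f64) -> f64 {
    // (x - y) * (x + y) is a single multiplication, so there is no
    // multiply-then-add for contraction to fuse, and the clamp guards the
    // sqrt against a tiny negative value when x is approximately y.
    ((x - y) * (x + y)).max(0.0).sqrt()
}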

and we have not previously reserved the right to do this or given any other indication to users that it might happen.

I've also found no explicit indications that we can't do this. And I've seen no indications that people expect Rust's default behavior to be different than the default behavior of other languages in this regard. What concrete problem are we trying to solve that outweighs a 2x performance win?

A fine-grained opt-in and -out is very useful even if we end up changing the default

Agreed. The RFC already proposes an attribute; I could expand that to provide an attribute with two possible values.

There's plenty of design work to keep us busy even without touching defaults:

If we have any hope of changing the defaults, the time to do that would be before those defaults are relied on.

compiler options or attributes or ...?

I think it makes sense to have a global compiler codegen option, and I also think it makes sense to have an attribute (with a yes/no) that can be applied to any amount of code.

how does it propagate from callers into callees, if at all? (generally hard problem, but IMO a good story for this is just as valuable as providing the basic feature in the first place)

The attribute shouldn't. It should only affect code generation under the scope of the attribute.

what transformations are licensed exactly? (e.g., do we want roughly what the C standard allows, or do we want more like GCC does?)

My ideal goal would be "anything that strictly increases accuracy, making the result closer to the mathematically accurate answer". That would also include, for instance, doing f32 math in f64 registers and not forcing the result to f32 after each operation, if that'd be faster.
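To make the "f32 math in f64 registers" idea concrete, here is the transformation written out by hand (an illustration of what the optimization would be allowed to do, not compiler output):

// Each operation rounded to f32, which is how the expression is written.
fn rounded_each_step(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

// The "extra precision" variant: intermediates kept in f64, with a single
// rounding back to f32 at the end.
fn extra_precision(a: f32, b: f32, c: f32) -> f32 {
    (a as f64 * b as f64 + c as f64) as f32
}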

@ExpHP commented Apr 18, 2019

Such programs are also going directly against best practices in numerical methods; if anything, we should ideally be linting against code like (x*x - y*y).sqrt().

In favor of what?



Currently, Rust's [specification for floating-point
types](https://doc.rust-lang.org/reference/types/numeric.html#floating-point-types)
states that:
> The IEEE 754-2008 "binary32" and "binary64" floating-point types are f32 and f64, respectively.

@gnzlbg (Contributor) commented Apr 18, 2019

Shall this be understood as "the layout of f{32, 64} is that of binary{32, 64}" or as "the layout and arithmetic of f{32, 64} is that of binary{32, 64}" ?

The IEEE-754:2008 standard is very clear that optimizations like replacing a * b + c with fusedMultiplyAdd(a, b, c) should be opt-in, and not opt-out (e.g. see section 10.4), so depending on how one interprets the above, the proposed change could be a backwards incompatible change.

computations to differ by platform (though never below the standards-required
accuracy). However, standards-compliant implementations of math functions on
floating-point values may already vary slightly by platform, sufficiently so to
produce different binary results. This proposal can never make results *less*

@gnzlbg (Contributor) commented Apr 18, 2019

If the intention of the user was for their Rust program to actually have the semantics of the code they wrote, e.g. first do a * b, and then add the result to c, performing intermediate rounding according to the precision of the type, then this proposal does not only make the result less accurate, it makes it impossible to even express that operation in the Rust language.

If the user wants higher precision they can write fma(a, b, c) today, and if the user does not care, they can write fmul_add(a, b, c). This proposal, as presented, does not provide a first_mul_a_b_then_add_c(a, b, c) intrinsic that replaces the current semantics, so the current semantics become impossible to write.

@joshtriplett (Member, Author) commented Apr 18, 2019

performing intermediate rounding according to the precision of the type

What we're discussing in this RFC is, precisely, 1) whether that's actually the definition of the Rust language, and 2) whether it should be. Meanwhile, I'm not seeing any indication that that's actually the behavior Rust developers expect to get, or that they expect to pay 2x performance by default to get it.

but it makes it impossible to actually even express that operation in the Rust language

I'm already editing the RFC to require (rather than suggest) an attribute for this.


We could provide a separate set of types and allow extra accuracy in their
operations; however, this would create ABI differences between floating-point
functions, and the longer, less-well-known types seem unlikely to see

@gnzlbg (Contributor) commented Apr 18, 2019

Not necessarily, these wrappers could be repr(transparent).

@joshtriplett (Member, Author) commented Apr 18, 2019

I mean this in the sense that changing from one to the other would be an incompatible API change in a crate. I'll clarify that.

@gnzlbg (Contributor) commented Apr 18, 2019

If the algorithm does not care about contraction, it might also not care about NaNs, or associativity, or denormals, or ... so if it wants to accept a NonNaN<Associative<NoDenormals<fXY>>> type as well as the primitive f{32, 64} types, then it has to be generic, and if it's generic, it would also accept a type wrapper lifting the assumption that contraction is not ok without breaking the API.

In other words, once one starts walking down the road of lifting assumptions about floating-point arithmetic, contraction is just one of the many many different assumptions that one might want to lift. Making it special does not solve the issue of these APIs having to be generic about these.
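As a rough sketch of the wrapper-type direction being debated here (all names hypothetical; the contraction permission itself would still need compiler support, which is exactly the part a library cannot provide on its own):

use std::ops::{Add, Mul};

/// Hypothetical marker wrapper: arithmetic on this type may be contracted.
/// repr(transparent) keeps the layout identical to the wrapped float.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Contractable<T>(pub T);

impl Mul for Contractable<f32> {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        // A real implementation would lower to an operation the backend is
        // allowed to contract; plain multiplication stands in for that here.
        Contractable(self.0 * rhs.0)
    }
}

impl Add for Contractable<f32> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        Contractable(self.0 + rhs.0)
    }
}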

@rkruppe (Member) commented Apr 20, 2019

I do not think we have anywhere near a smooth enough UX for working with wrappers around primitive arithmetic types for me to seriously consider them as a solution for licensing fast-math transformations. There are serious papercuts even when trying to be generic over the existing primitive types (e.g., you can't use literals without wrapping them in ugly T::from calls), and we have even less machinery to address the mixing of different types that such wrappers would entail.

I also think it's quite questionable whether these should be properties of the type. It kind of fits "no infinities/nans/etc." but other things are fundamentally about particular operations and therefore may be OK in one code region but not in another code region even if the same data is being operated on.
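For reference, the literal papercut being described, using the num-traits crate's Float trait as it exists today:

use num_traits::Float; // external crate: num-traits

// Even a trivial generic function can't use a bare literal: `2.0` has no
// single concrete type here, so it goes through a fallible conversion.
fn double_plus_one<T: Float>(x: T) -> T {
    x * T::from(2.0).unwrap() + T::one()
}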

We could provide a separate set of types and allow extra accuracy in their
operations; however, this would create ABI differences between floating-point
functions, and the longer, less-well-known types seem unlikely to see
widespread use.

@gnzlbg (Contributor) commented Apr 18, 2019

Prior art shows that people who need/want this are going to use them; e.g., "less-well-known" flags like -ffast-math are in widespread use, even though they are not enabled by default. So it is unclear to me how much weight this argument should actually have.

@rkruppe (Member) commented Apr 20, 2019

Separate types are harder to drop into a code base than a compiler flag or attribute, though, because using the type in one place generally leads to type errors (and need for conversions to solve them) at the interface with other code.

We could do nothing, and require code to use `a.mul_add(b, c)` for
optimization; however, this would not allow for similar future optimizations,
and would not allow code to easily enable this optimization without substantial
code changes.

@gnzlbg (Contributor) commented Apr 18, 2019

We could provide a clippy lint that recognizes a * b + c (and many others), and tell people that if they don't care about precision, they can write a.mul_add(b, c) instead. We could have a group of clippy lints about these kind of things that people can enable in bulk.

@Lokathor commented Apr 20, 2019

On this particular point a clippy lint is helpful but not necessarily enough. Once the optimizer chews through layers of code it can end up at an a * b + c expression without it being anything that is obvious to clippy.

@gnzlbg (Contributor) commented Apr 18, 2019

Let's design a way to opt into and out of this behavior at crate/module/function first, and once that's done we can look at how to make more code use it automatically.

@rkruppe I would prefer even finer grained control than that, e.g., individual type wrappers that add a single assumption about floating-point math that the compiler is allowed to make and that can be combined, e.g.,

  • Trapless<T>: whether floating-point arithmetic can be assumed not to trap (e.g. on signaling NaNs)
  • Round{Nearest,0,+∞,-∞}<T> : whether the rounding mode can be assumed
  • Associative<T>: whether floating-point arithmetic can be assumed to be associative
  • Finite<T>: whether floating-point arithmetic can be assumed to produce numbers
  • Normal<T>: whether floating-point arithmetic can be assumed to produce normal numbers (as opposed to denormals/subnormals)
  • Contractable<T>: whether intermediate operations can be contracted using higher precision
  • ...

That way I can write a:

pub type Real = Trapless<Finite<Normal<Associative<Contractable<...<f32>...>>>>>>;

and use it throughout the parts of my code where it's appropriate. When I need to interface with other crates (or they with me), I can still use f32/f64:

pub fn my_algo(x: f32) -> f32 {
    let r: Real = x.into();
    // ... do stuff with r ...
    r.into()
}

Sure, some people might go overboard with these, and create complicated trait hierarchies, make all their code generic, etc. but one doesn't really need to do that (if somebody wants to provide a good library to abstract over all of this, similar to how num::Float works today, well they are free to do that, and those who find it useful will use it).

Global flags for turning these on/off require you to inspect the module/function/crate/ .cargo/config / ... to know what the rules for floating-point arithmetic are, and then use that knowledge to reason about your program, and the chances that some code that wasn't intended to play by those rules get those flags applied (e.g. because it was inlined, monomorphized, etc. on a module with those flags enabled), don't seem worth the risk (reading Fortran here gives me fond memories of writing implicit none at the top of every file).

The main argument of this RFC is that if we do something like this, then some code that spends 99% of its execution time doing a * b + c would be 2x slower. If that's the case, submitting a PR to change that code to a.mul_add(b, c) is a no-brainer (been there, done that: https://github.com/rust-lang-nursery/packed_simd/search?q=fma&type=Commits) - changing the behavior of all Rust code to fix such programs feels like overkill. If the issue is that code that could benefit from such a change is hard to find, that's what clippy is for.


@eaglgenes101 commented Apr 18, 2019

In C, even if you make sure your compiler outputs code that uses IEEE 754 floats on all platforms, trying to get the same floating-point results across different platforms, build configurations, and times is an exercise in plugging up a bazillion abstraction leaks. That's par for the course for C. Not for Rust.

I am well aware that floating point is a mere approximation of the real numbers, and that you're suggesting transformations that would increase this accuracy. That said, I still disapprove of the proposed new defaults. I'd much rather not have the compiler try by default to second-guess me on what really should be a perfectly well-defined and predictable operation. I'd much rather the compiler, by default, choose some specific observable output behaviour, and stick to it, just like it normally does. I'll flick the floating point flags myself if I want to sacrifice determinism for a better approximation of what I've given up on since I was a clueless novice looking around for the reason why 0.1 + 0.2 === 0.3 evaluated to false. And I'm pretty sure I'd much rather performance-optimize another clueless programmer's slow floating point code than debug another clueless programmer's heisenbug-laden floating point code.

NaNs may also have unspecified bit patterns. However, IEEE 754 mandates behaviour for NaNs that makes them opaque unless you specifically crack them open, and NaNs propagate through most floating-point operations, so if their payload can be disregarded, they are essentially fixed points of floating-point operations. Small floating-point evaluation differences tend to be magnified by systems with chaotic behaviour, which includes most nontrivial physical systems, and treating finite floats as opaque would completely defeat the purpose of doing the floating-point computations in the first place.

@joshtriplett (Member, Author) commented Apr 18, 2019

By way of providing concrete examples that Rust already provides extra accuracy today on some platforms:

$ cat test.rs ; echo === ; rustc +nightly --target=i586-unknown-linux-gnu test.rs -o test32 && rustc +nightly test.rs -o test64 && ./test32 && ./test64
fn foo(num: f32) -> f32 {
    ((num + 0.1) / 1.5e38) * 1.5e38
}

fn main() {
    println!("error: {:.50}", foo(1.23456789) - 1.23456789 - 0.1);
}
===
error: 0.00000002235174179077148437500000000000000000000000
error: 0.00000014156103134155273437500000000000000000000000

i586-unknown-linux-gnu has more accuracy than x86_64-unknown-linux-gnu, because it does intermediate calculations with more precision. And changing that would substantially reduce performance.

@joshtriplett (Member, Author) commented Apr 18, 2019

@gnzlbg What code do you expect the compiler to generate when you use those generics? Because ultimately, if you want that, you're asking for pure software floating-point on many platforms.

@gnzlbg (Contributor) commented Apr 18, 2019

@joshtriplett

What code do you expect the compiler to generate when you use arbitrary combinations of those types?

If you check the LangRef for the LLVM IR of the floating-point operations, e.g., fmul for a * b, you see that since recently it looks like:

<result> = fmul [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result

where flags like nnan, ninf, contract, etc. can be inserted in [fast-math flags].

So when one uses such a type, I expect that rustc will insert the fast-math flags for each operation as appropriate. That's finer-grained than just inserting them as function attributes for all functions in an LLVM module.

@joshtriplett (Member, Author) commented Apr 18, 2019

@gnzlbg What machine code do you expect to generate when every operation can potentially have different flags? How much performance do you consider reasonable to sacrifice to get the behavior you're proposing? What specific code do you want to write that depends on having such fine-grained type-level control of this behavior?

Not all abstract machines and specifications translate to reasonable machine code on concrete machines. If you want bit-for-bit identical results for floating point on different platforms, target feature flags, and optimization levels, you're going to end up doing software floating point for many operations on many platforms, and I don't think that's going to meet people's expectations at all. If you can live with the current state that we've had for years, then this RFC is already consistent with that behavior.

I would like to request that discussion of adding much more fine-grained control of specific floating-point flags that weren't already raised in the RFC be part of some other RFC, rather than this one. I already have a mention of the idea of adding specific types, which covers the idea of (for instance) Contractable<T>. I don't think the full spectrum of flag-by-flag types mentioned in this comment is in scope for this RFC.

@joshtriplett (Member, Author) commented Apr 18, 2019

Expanding on my earlier comment, Rust also already allows floating-point accuracy to depend on optimization level, in addition to targets:

$ cat test.rs ; echo === ; rustc +nightly --target=i586-unknown-linux-gnu test.rs -o test32 && rustc +nightly --target=i586-unknown-linux-gnu -O test.rs -o test32-opt && rustc +nightly test.rs -o test64 && ./test32 && ./test32-opt && ./test64
fn foo(num: f32) -> f32 {
    ((num + 0.1) / 1.5e38) * 1.5e38
}

fn main() {
    let prog = std::env::args().next().unwrap();
    println!("{:12} error: {:.50}", prog, foo(1.23456789) - 1.23456789 - 0.1);
}
===
./test32     error: 0.00000002235174179077148437500000000000000000000000
./test32-opt error: 0.00000014156103134155273437500000000000000000000000
./test64     error: 0.00000014156103134155273437500000000000000000000000

So, in practice, Rust already has this behavior, and this RFC does not represent a breaking change.

(Worth noting that it's easy enough to reproduce this with f64 as well, just by changing the types and constants.)

@scottmcm (Member) commented Apr 18, 2019

you're going to end up doing software floating point for many operations

For things like cos, yes, but not for ordinary addition.

From http://www.box2d.org/forum/viewtopic.php?f=3&t=1800#p16480:

I work at Gas Powered Games and i can tell you first hand that floating point math is deterministic. You just need the same instruction set and compiler and of course the user's processor adhears to the IEEE754 standard, which includes all of our PC and 360 customers. The engine that runs DemiGod, Supreme Commander 1 and 2 rely upon the IEEE754 standard. Not to mention probably all other RTS peer to peer games in the market. As soon as you have a peer to peer network game where each client broadcasts what command they are doing on what 'tick' number and rely on the client computer to figure out the simulation/physical details your going to rely on the determinism of the floating point processor.

So it's not trivial, but apparently it works across processor vendors and such.

joshtriplett added some commits Apr 18, 2019

Add clarification that this is *not* the equivalent of "fast math" from other languages

"fast math" is widely perceived as an unsafe option to go faster by
sacrificing accuracy.
@fenrus75 commented Apr 20, 2019

What do you think the less expert user will do when their code under --release produces different results than in debug mode ?

it ALREADY does that. See Josh's example earlier... ON THE SAME SYSTEM (64 bit x86 so not obscure), you can already get different floating point answers depending on optimization level.

@gnzlbg (Contributor) commented Apr 20, 2019

Which example?

@fenrus75 commented Apr 20, 2019

Which example?

#2686 (comment)

@gnzlbg (Contributor) commented Apr 20, 2019

Is the behavior in those examples by design? (do we define that as the expected behavior in the reference, RFCs, etc.) or is it by accident (are they bugs)? @rkruppe do you know?

@eaglgenes101 commented Apr 20, 2019

Which example?

#2686 (comment)

Rust platform targeting considers i{3,5}86 a different target processor architecture than i686, which in turn is considered different from x86_64, so being on the same system is an incidental detail (x86 processors are notoriously backwards-compatible, down to replicating what were clearly hardware bugs in initial implementations); in each case rustc is told different things about the platform it's producing binary executables for. The optimization in question is constant evaluation, which runs using the host processor's floating-point semantics, which means f64 all the way through on x86_64 and i686, but in my opinion it should really be evaluated using the target platform's floating-point semantics.


@joshtriplett (Member, Author) commented Apr 22, 2019

The simplest example is some code that computes an array index from floats, and then uses ‘get_unchecked’ in unsafe code under the assumption that under IEEE-754 floats the result is always in bounds. Is such code safe? IMO the answer to this question should not be “depends on the optimization level / target / libm/ ...”. It should be yes or no.

Now that's something I entirely agree with, and I think the answer should be "no, that code isn't safe, don't do that". I'd love to see more ways for us to detect and lint against unsafe behavior like that. However, that's part of why such operations are unsafe: they may actually be safe, but only based on assumptions that only the programmer knows, such as range restrictions on input values. So I don't think we can lint against such operations in the fully general case, but to the extent we can reasonably do so, I think we should.

We already have lints against, for instance, doing exact equality comparisons on floating-point values.

In much the same spirit as the RFCs about "portability lints", I'd like to have more ways to catch such issues early, and regardless of target platform.
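For concreteness, a sketch of the pattern being called out and the boring alternative (hypothetical code, not taken from any real crate):

fn bucket_unsound(table: &[u32; 256], x: f64) -> u32 {
    // Index derived from floating-point math. If intermediate rounding
    // changes (different target, optimization level, or extra precision),
    // the author's "this is always < 256" reasoning can silently stop
    // holding, and the unchecked access becomes undefined behaviour.
    let idx = (x * 256.0) as usize;
    unsafe { *table.get_unchecked(idx) }
}

fn bucket_sound(table: &[u32; 256], x: f64) -> u32 {
    // The safe alternative: clamp (or bounds-check) instead of relying on a
    // property of the floating-point evaluation.
    let idx = ((x * 256.0) as usize).min(table.len() - 1);
    table[idx]
}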

Further clarification on separate types
Credit to Robin Kruppe for the explanation and phrasing.
@gnzlbg (Contributor) commented Apr 23, 2019

Now that's something I entirely agree with, and I think the answer should "no, that code isn't safe, don't do that". I'd love to see more ways for us to detect and lint against unsafe behavior like that. However, that's part of why such operations are unsafe: they may actually be safe but only based on assumptions that only the programmer knows, such as range restrictions on input values. So I don't think we can lint against such operations in the fully general case, but to the extent we can reasonably do so I think we should.

I think that before resolving this we have to clarify exactly what we guarantee (if anything) about f32 and f64 arithmetic. I've opened a UCG issue for that (rust-lang/unsafe-code-guidelines#123).


@sgrif (Contributor) commented Apr 23, 2019

I'm not well versed enough in FP specifics to comment on the concerns about whether contraction by default is safe or not, but I really like this RFC overall. These sorts of details should definitely leave room for the compiler to make further optimizations in the future, so a step in that direction makes a ton of sense. Overall I think this RFC is well motivated, and opens a door that I'd like to see opened.

@gThorondorsen commented Apr 24, 2019

First and foremost, I think that having a way to turn these optimisations off is required. I find the reference-level explanation not strong enough on this point. In particular, this function must (continue to) be implementable somehow in pure Rust. (It is a building block for accurate summations.)

/// Returns the sum of two floating-point numbers together with the rounding error.
fn add_with_error(in0: f32, in1: f32) -> (f32, f32) {
    let sum = in0 + in1;
    let diff = (sum - in0, sum - in1);
    let err = (diff.0 - in1, diff.1 - in0);
    (sum, err.0 + err.1)
}

If local variables are allowed to have more precision, e.g. f64 instead of f32, then it can erroneously report that an addition is exact (i.e. the second component of the result is ±0.0).
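Another widely used building block in the same family, for reference (Kahan compensated summation); like add_with_error, its error analysis assumes each f32 operation is rounded exactly once:

fn kahan_sum(values: &[f32]) -> f32 {
    let mut sum = 0.0_f32;
    let mut compensation = 0.0_f32; // running estimate of lost low-order bits
    for &v in values {
        let y = v - compensation;
        let t = sum + y;
        // Algebraically zero; in strict f32 arithmetic it recovers the
        // rounding error of `sum + y`.
        compensation = (t - sum) - y;
        sum = t;
    }
    sum
}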

Since there currently exists no way (that I know of) to declare reliance on strict floating-point arithmetic, I would argue that enabling the proposed optimisations by default is a breaking change. And although I can't find any reference to it right now, it seems to me that strict IEEE 754 floating-point arithmetic by default used to be a selling point for Rust.

Moreover, floating-point is hard enough as it is when staying within the bounds defined by IEEE 754, which at least is deterministic for the basic + - * / operations. Then one may have to worry about the hardware-induced deviations. I find it is very much not necessary to add the implementation details of an optimising compiler into the mix.

Think about colleagues trying to debug an ill-conditioned or chaotic problem and getting widely different results because exactly one of them has an old computer whose CPU has no FMA instruction (and the other has a forgotten but applicable -C target-cpu=native somewhere). Or just a beginner getting different results between debug and release mode, maybe as far apart as true and false (e.g. for add_with_error(0.1f32, 0.2f32).1 != 0).

Therefore, I don't think it is worth the trouble, in terms of learnability or silent breaking changes, to enable non-strict floating-point optimisations by default.


That being said, I am sympathetic to the concerns about the loss of performance potential. In line with what I wrote above, I propose the following alternative, much slower plan.

  1. Implement intrinsics of the following form for every operation, matching the corresponding LLVM IR instructions.

    /// Applies `op` to its `fp` arguments and returns the result.
    ///
    /// The `optim_flags` must be a compile-time constant. It encodes which
    /// optimisations this operation is allowed to be involved in. The `fp`
    /// type parameter must be a floating-point type, either `f32` or `f64`.
    fn optim_fp_op<fp>(optim_flags: u32, x: fp, y: fp) -> fp;

    There is probably enough room in the flags argument to fit in the rounding mode too, in which case we need to find a different name.

  2. Give access to these intrinsics as functions in the f32 and f64 modules of the standard library (or as methods on the types). I am thinking of an API like the byteorder crate (or const generics if that lands soon enough).

  3. Now third-party crates can implement their own Wrapping-like newtype wrappers. Given enough time, experience and experimentation some might get promoted to the standard library.

  4. Only when people relying on the strict floating-point arithmetic have converted their code to the new functions with guaranteed semantics should it become possible to enable more optimisations on the operators and the current, unqualified methods of the primitive floating-point types. It can be a codegen flag or an attribute (or both) but should be opt-in at first. The default might be adjusted later (though I still recommend against it).

In conclusion, I find that this RFC just rushes to (the controversial part of) step 4 above without considering the IMO necessary previous ones. The right incrementality needed here is to build an API surface to be able to choose and enforce the right contract, not adding optimisations one at a time.
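To make step 3 of that plan concrete, a minimal sketch (everything here is hypothetical; optim_fp_op is the proposed intrinsic from step 1, not an existing rustc item):

/// Hypothetical Wrapping-like newtype from step 3.
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct FastF32(pub f32);

impl std::ops::Mul for FastF32 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        // Would forward to the step-1 intrinsic, roughly:
        //     FastF32(optim_fp_op::<f32>(ALLOW_CONTRACTION, self.0, rhs.0))
        // Plain multiplication stands in for it here, since no such
        // intrinsic exists today.
        FastF32(self.0 * rhs.0)
    }
}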

@genneth commented Apr 24, 2019

I work on a large piece of software, which, using modern parlance, would be described as some sort of machine-learning-based software. It mixes the need for fast arithmetic (we've just bought something like $25m worth of GPUs --- just to speed up a small part of the overall calculation) and predictable engineering qualities like binary-exact regression tests (it's been in continuous production for over 10 years --- nobody knows where all the skeletons are). From this admittedly somewhat niche corner, where we are about to deploy our first production Rust code (written in Rust for speed): please, please do not do this.

And yes --- we care enough about binary reproducibility to control carefully what CPUs are in our enormous calcfarm (much to the disgust of my infra engineers). And yes, we've carefully made sure that all the binaries use SSE instructions instead of x87. We've even eliminated Windows' own math functions that are part of the CRT, because they observably did not give the same output on different hardware. We understand the tradeoffs very well, and have made our choice given that.

If this feature is put in such that we cannot turn it off, I will simply veto any use of Rust for floating point calculations in my company --- thus removing it from consideration for the performance sensitive parts of the software. If this feature is put in, in such a way that I cannot disable it for all crates that we might pull in, it might cost us enough time to maintain our own fork of everything that we will avoid Rust for much of the stack.

@scottmcm (Member) commented Apr 24, 2019

The optimization in question is constant evaluation, which runs using the host's processor's floating point semantics, which is using f64 all the way through on x86_64 and i686, but in my opinion should really be evaluated using the target platform's floating point semantics.

If const eval is calculated using the host in a way that's different from the target, that's just a straight-up bug.

This is why LLVM has things like APFloat, so that it gets it right.


@eaglgenes101 commented Apr 24, 2019

Quoting @joshtriplett's reply:

> On Wed, Apr 24, 2019 at 12:44:38PM -0700, Gen Zhang wrote: If this feature is put in such that we cannot turn it off, [...] If this feature is put in, in such a way that I cannot disable it for all crates that we might pull in,
>
> I have no intention of having this feature become part of Rust without a robust mechanism to disable it for use cases that need that. I'll update the summary and the first paragraph of the RFC to make that more clear.

Yes please. I'm pretty sure in discussion I said that this transition is fine only with opt-outs of multiple granularities, and a concrete timetable for transitioning this behavior into releases. Unless those are put in, I don't think I can support such a proposal.

@genneth commented Apr 24, 2019

I have no intention of having this feature become part of Rust without a robust mechanism to disable it for use cases that need that. I'll update the summary and the first paragraph of the RFC to make that more clear.

Thank you. Apologies if I came across a little strong there: but it's been quite a long battle to get our software stack to the point where basic regression test suites are actually a reliable part of the process.

Too often people seem to conflate floating point with non-determinism, and seem to forget that robust regression tests are an extremely useful part of development. It's not even about potential unsafety, as has been pointed out above: in machine-learning-like software, even though the outcome is not important to the last ULP, it is still extremely helpful to know if behaviour has been changed accidentally or deliberately. Usually, different people will be responsible for different kinds of changes, and most engineers on a large team will not be wanting to even accidentally change the numerical outcome. Certainly, in our experience, it has been consistently the right choice to have deterministic outcome, over any potential performance; but I accept that we fall into the "expert" part of the spectrum, and will aggressively profile and fine-tune when we find that performance is necessary, and thus will always be slightly at odds with any system that tries to guess a compromise suitable for a larger group of users.

Substantial overhaul to better accommodate applications wanting to disable this

- Add some explicit timeline guidance on stabilization, to ensure that
  this behavior does not make its way into stable too quickly.
- Mention at the top of the document that this RFC specifies robust
  mechanisms to disable this behavior.
- Add guide-level explanation of existing behavior, explicit discussion
  of applications still taking special care to achieve identical results
  across *some* platforms, and reference to the disabling mechanisms to
  help achieve that.
- Add much more specific descriptions of the codegen option and
  attributes.
- Remove the opt-in attribute, to ensure that the codegen option acts as
  a top-level opt-out, and avoid any need for a "force-off" mechanism.
- Expand the alternatives section.
- Discuss more C compilers; ICC and some other C compilers do this as
  well, and Clang does not by default.
- Explicitly acknowledge, as a drawback, that though Rust has variations
  like this already, this does introduce variations in target platforms
  that didn't previously have them. This tones down the existing
  ameliorating language in the drawbacks section.
@joshtriplett (Member, Author) commented Apr 24, 2019

@eaglgenes101

I'm pretty sure in discussion I said that this transition is fine only with opt-outs of multiple granularities

I'm currently overhauling the RFC to make that much more clear, and in particular to more explicitly and strongly prioritize outer opt-outs over inner opt-ins to ensure that people can disable this in one place and feel confident that it is completely and totally disabled.

and a concrete timetable for transitioning this behavior into releases

Adding that as well.

@joshtriplett (Member, Author) commented Apr 24, 2019

Substantial overhaul

I've now substantially overhauled this RFC, to better accommodate applications wanting to disable this.

  • Add some explicit timeline guidance on stabilization, to ensure that
    this behavior does not make its way into stable too quickly.
  • Mention at the top of the document that this RFC specifies robust
    mechanisms to disable this behavior.
  • Add guide-level explanation of existing behavior, explicit discussion
    of applications still taking special care to achieve identical results
    across some platforms, and reference to the disabling mechanisms to
    help achieve that.
  • Add much more specific descriptions of the codegen option and
    attributes.
  • Remove the opt-in attribute, to ensure that the codegen option acts as
    a top-level opt-out, and avoid any need for a "force-off" mechanism.
  • Expand the alternatives section.
  • Discuss more C compilers; ICC and some other C compilers do this as
    well, and Clang does not by default.
  • Explicitly acknowledge, as a drawback, that though Rust has variations
    like this already, this does introduce variations in target platforms
    that didn't previously have them. This tones down the existing
    ameliorating language in the drawbacks section.

Thank you to everyone providing feedback, concerns, and use cases.

@joshtriplett (Member, Author) commented Apr 24, 2019

@genneth

Thank you. Apologies if I came across a little strong there: but it's been quite a long battle to get our software stack to the point where basic regression test suites are actually a reliable part of the process.

Not at all! You were entirely reasonable, and I really appreciated your comment in particular. Your perspective made two things clear that I had not realized from previous feedback:

  • It wasn't sufficiently clear, in several places in the RFC (including the opening motivation) that this would provide robust mechanisms to disable it. That was critically important, and I hope it's sufficiently clear now.
  • Given your use case specifically, I adapted the description and behavior of the codegen option and attribute, dropping the ability to opt into this by attribute, precisely so that you can be certain that a single top-level -C extra-fp-precision=off will disable this everywhere. I didn't want you to have to deal with a "force-off" or similar.

I hope that helps! I'd appreciate your feedback on the new version.

@kornelski (Contributor) commented Apr 25, 2019

I don't mind that behavior. In fact, I'd like even more reckless-approx-math options. Is there a path to opting in to more fast fp math?

Maybe #[extra_fp_precision(on)] could be #[fp(extra_precision)] and eventually become #[fp(extra_precision, associative, reciprocal_approx, no_signed_zero, no_traps)], etc.

@Lokathor commented Apr 25, 2019

Well, perhaps we can just be real about what we're telling the compiler to allow?

#![fp(allow_fma)]

Or are there things besides just allowing for FMA usage that we're talking about here? (EDIT: in this first wave of optimizations at least)

@fenrus75 commented Apr 25, 2019

There most certainly are other things; an example would be a system where converting from f64 to f32 is expensive (rounding, range checks, etc.): if a calculation is a mix of f32 and f64, this would allow the whole calculation to be done in f64, with the rounding down happening only at the final store to memory.

(f32->f64 tends to be cheap since it's mostly just padding 0 bits)

@programmerjake commented Apr 25, 2019

From what I understand, LLVM (and probably Rust) by default assumes that traps don't occur and that the rounding mode is set to round-to-nearest.


@gThorondorsen commented Apr 26, 2019

Quoting @joshtriplett's reply to Kornel:

> On Wed, Apr 24, 2019 at 05:31:34PM -0700, Kornel wrote: I don't mind that behavior. In fact, I'd like even more reckless-approx-math options. Is there a path to opting in to more fast fp math? Maybe #[extra_fp_precision(on)] could be #[fp(extra_precision)] and eventually become #[fp(extra_precision, associative, reciprocal_approx, no_signed_zero, no_traps)], etc.
>
> I don't want to add those other flags in this RFC (I really want to avoid the implication of association with -ffast-math), but I have no objection to changing this to fp(extra_precision(off)) (or perhaps fp(no_extra_precision)), to allow for future fp(...) flags. That seems entirely reasonable.

In my opinion, this attribute should really have an option to disable all the optimisations that may change the exact results of the computations, present and future. So that people who care can write e.g. #[fp(strict)] and know their library will not break when new optimisations are introduced. And also allow fp(strict, contraction) or fp(strict, extra_precision) to enable optimisations selectively.

Also, I find fp to be too short and generic of a name. I would expect this attribute to also be able to control the rounding mode and trapping behaviour. It may or may not be a good idea to group all these features into a single attribute. I propose to use fp_optimize instead, and leave the other functionality to other names (probably fp-prefixed as well).

should explicitly discuss the issue of extra floating-point precision and how
to disable it. Furthermore, this change should not become part of a stable Rust
release until at least eight stable releases *after* it first becomes
implemented in the nightly compiler.

@gThorondorsen commented Apr 26, 2019

I'm not sure I understand the point of this last sentence. And particularly, why is the reference point the first availability in nightly? I think it would be more useful to guarantee that the optimisations will not be enabled by default on stable until the opt-out has been available as a no-op for a few stable releases.

loss.) However, with some additional care, applications desiring cross-platform
identical results can potentially achieve that on multiple target platforms. In
particular, applications prioritizing identical, portable results across two or
more target platforms can disable extra floating-point precision entirely.

@gThorondorsen commented Apr 26, 2019

As I mentioned in a previous comment, mere reproducibility is not always the reason to disable this behaviour. Some algorithms can actually take advantage of the weird special properties of floating-point arithmetic. Such algorithms should remain implementable as Rust libraries, and those should not break just because someone decided they wanted their unrelated floating-point code to be as fast as possible.
