Imprecise floating point operations (fast-math) #21690
Inline IR was discussed on #15180. Another option is …
Yeah, adding it as a function in … There are a few different fast-math flags, but the …
This forum thread has examples of loops that LLVM can vectorize well for integers, but doesn't for floats (a dot product).
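For reference, the kind of loop being discussed looks like the sketch below (illustrative, not taken from the forum thread). The single accumulator forms a sequential dependency chain, and without fast-math flags LLVM is not allowed to reassociate the floating-point sum, so the loop stays scalar:

```rust
// A plain dot product. For integers LLVM can vectorize this freely,
// because integer addition is associative; for floats, reassociating
// the accumulation changes the rounding, so it is only legal under
// fast-math semantics.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [4.0f32, 5.0, 6.0];
    assert_eq!(dot(&a, &b), 32.0);
}
```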
I've prototyped it using a newtype: https://gitlab.com/kornelski/ffast-math (https://play.rust-lang.org/?gist=d516771d1d002f740cc9bf6eb5cacdf0&version=nightly&backtrace=0). It works in simple cases, but the newtype solution is insufficient: …
So I'm very keen on seeing it supported natively in Rust.
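The shape of that newtype approach can be sketched as follows. `FastF32` is a hypothetical name, not the linked crate's actual code: on nightly, the trait bodies would call `core::intrinsics::fadd_fast`/`fmul_fast` to emit fast-math-flagged LLVM operations; here they use ordinary arithmetic so the sketch compiles on stable:

```rust
use std::ops::{Add, Mul};

// Hypothetical wrapper type that opts its operations into fast math.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FastF32(f32);

impl Add for FastF32 {
    type Output = FastF32;
    fn add(self, rhs: FastF32) -> FastF32 {
        // On nightly this would be:
        // unsafe { FastF32(core::intrinsics::fadd_fast(self.0, rhs.0)) }
        FastF32(self.0 + rhs.0)
    }
}

impl Mul for FastF32 {
    type Output = FastF32;
    fn mul(self, rhs: FastF32) -> FastF32 {
        // On nightly: core::intrinsics::fmul_fast
        FastF32(self.0 * rhs.0)
    }
}

fn main() {
    let y = FastF32(2.0) * FastF32(3.0) + FastF32(1.0);
    assert_eq!(y, FastF32(7.0));
}
```

One practical limitation of any such wrapper: every literal and every third-party API expecting plain `f32` forces wrapping and unwrapping at the boundary, which is part of why a newtype alone falls short.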
I've tried -ffast-math in my C vs Rust benchmark of some graphics code: https://github.com/pedrocr/rustc-math-bench. In the C code it's a ~20% improvement with clang but no benefit with GCC. In both cases it returns a wrong result, and the math is extremely simple (multiplying a vector by a matrix). According to this: …
@pedrocr Your benchmark has a loss of precision in … With … you get a significantly different sum with … All values from the matrix multiplication are the same to at least 6 digits (I've diffed …).
@pornel Thanks, fixed here: pedrocr/rustc-math-bench@8169fa3. The benchmark results are fine though, the sum is only used as a checksum. Here are the averages of three runs in ms/megapixel: …
So as I mentioned before, clang/LLVM gets a good benefit from ffast-math but GCC does not. I'd say making sure things like …
I've suggested it would make sense to expose …: https://internals.rust-lang.org/t/pre-rfc-stabilization-of-target-feature/5176/23
Rust has fast-math intrinsics, so the fast-math behavior could be limited to a specific type or to selected functions, without forcing the whole program into it.
A usable solution for my use cases would probably be to have the vector types in the simd crate be the types that allow the opt-in to ffast-math. That way there's only one type I need to consciously convert the code to for speedups. But as a general solution, having to swap types throughout normal code seems cumbersome. But maybe just doing …
Created a pre-RFC thread on internals to try to get a discussion going on the best way to do this: https://internals.rust-lang.org/t/pre-rfc-whats-the-best-way-to-implement-ffast-math/5740
Is there a currently recommended approach to using fast-math optimizations in Rust nightly?
If it helps, a good benchmark comparison article between C++ and Rust floating point optimizations (link) inside loops was written recently (Oct 19), with a good Hacker News discussion exploring this concept. Personally, I think the key point is that without specifying any floating-point-specific flags (and after using iterators), by default clang and gcc do more optimizations on float math than Rust currently does. (EDIT: It seems that …)

An important point to keep in mind for any discussion of optimized float math: vectorization isn't always less precise. A commenter pointed out that a vectorized floating point sum is actually more accurate than the un-vectorized version. Also see Stack Overflow: https://stackoverflow.com/a/7455442

I'm curious what criteria clang (or gcc) uses when deciding whether a floating point optimization is worthwhile. I'm not enough of an expert in these areas to know the specifics, though. I'm also not sure what precision guarantees Rust makes for floating point math.
That's not the case in the article. The clang compilation was using …
Relevant part on LLVM. I read that as function calls (that return float types?) on floating-points or floating-point vectors being amenable to fast-math optimizations.
How would the attribute work if …
@LiamTheProgrammer The set of all operations is vast because it includes intrinsics. Anything here with …

Anecdotally, in the C++ code bases I've worked on that used fast math, they've generally done the inverse: use fast math everywhere, then hand-annotate the specific functions where it would be problematic with pragmas that disable the optimization for those functions. This works in practice because …

If you care strongly about floating point performance, you're probably also using vector operations, and vector operations tend to mitigate precision problems (e.g. a vectorized dot product effectively has 8 separate accumulators instead of the single one in a scalar implementation).

It's almost guaranteed that in a system where you have to bless particular operations, somebody will ship a crate with a newtype wrapper for floats that just lists every single operation, and that will become the way most people end up using the operations.
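The multiple-accumulator mitigation mentioned above can be written by hand on stable Rust; splitting the sum into independent partial sums is exactly the reassociation that fast-math would license automatically (illustrative sketch, with 4 accumulators instead of 8 for brevity):

```rust
// Dot product with 4 independent accumulators. The shorter dependency
// chains let the CPU (or the vectorizer) work in parallel, and each
// partial sum stays smaller, which tends to reduce rounding error
// compared to one long sequential sum.
// For brevity, assumes a.len() == b.len() and a length divisible by 4.
fn dot4(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    for (ca, cb) in a.chunks_exact(4).zip(b.chunks_exact(4)) {
        for i in 0..4 {
            acc[i] += ca[i] * cb[i];
        }
    }
    (acc[0] + acc[1]) + (acc[2] + acc[3])
}

fn main() {
    let a: Vec<f32> = (1..=8).map(|i| i as f32).collect();
    let b = vec![1.0f32; 8];
    assert_eq!(dot4(&a, &b), 36.0); // 1 + 2 + ... + 8
}
```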
A wishlist item for when this eventually happens: Rust should get a way to temporarily prevent reordering in a certain part of an expression. Fortran compilers usually reorder as they please, but avoid breaking up parenthesized expressions. (I mean, I would love to have this in C too…)
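The hazard that motivates respecting parentheses is easy to demonstrate on stable Rust: reassociating a floating-point sum can change its value entirely (illustrative only):

```rust
fn main() {
    let (a, b, c) = (1e100f64, -1e100f64, 1.0f64);
    // Grouped as written: a and b cancel exactly, then c survives.
    assert_eq!((a + b) + c, 1.0);
    // Reassociated: c vanishes in the rounding of b + c at 1e100 scale.
    assert_eq!(a + (b + c), 0.0);
}
```

A fast-math compiler is free to rewrite the first form into the second, which is exactly the transformation the Fortran parenthesization rule forbids.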
However these optimizations are approached, the consequences can be quite dire: shared objects emitted by some compilers with this style of optimization can, upon being linked to, change the floating point CSR for the entire program and leave it that way, altering results in unrelated code. Rust is more commonly a standalone binary, and thus largely in control of its execution and not interfering with others, rather than a …
Saw that thread too! Still not sure whether it affects Rust at all — crtfastmath.o is a linkage thing and the decision seems to be due to the compiler driver, not the backend codegen. |
Everywhere I've worked using Rust, people have been linking against Rust dylibs, so it's already fairly common in my experience, just not when the program itself is written in Rust. That said, I don't think this needs to be worried about much, so long as we don't do the wrong thing when calling the linker.

I think this is a very tricky problem, and the right solution might be to have differently scoped fast math options. For a while I had an RFC I was working on that was going to propose enabling them in a manner similar to target_feature, which I think would be a good model for it. It explicitly allowed you to say "this function requires strict floating point semantics, even if relaxed semantics are allowed in the caller", but by default you'd inherit them from the caller...

There are a lot of cases where this can benefit stuff in portable_simd, and without some way of doing this, portable handling of edge cases like NaNs could easily make things... substantially slower (for example, fixing rust-lang/stdarch#1155 sped up my code in a tight loop by 50%, and the code had... more to do than just min).

That said, I'm very sympathetic to the point made here: https://twitter.com/johnregehr/status/1440090854383706116, that defining "fast-math" in terms of the optimizations performed rather than the semantics is just going to be a mess.
Drawing some inspiration from the CakeML paper, perhaps we could have an annotation to mark possible values (ranges and inf/NaN), and an annotation to allow any value in the range spanned by every combination of real-number and floating-point evaluation (this should allow widening and fma, I think? It could require some tolerance for rounding in the wrong direction; perhaps returning an adjacent floating point value, 1 ulp away, should be allowed), as well as some way to specify looser error bounds (e.g. within some factor of either end of the range described).
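The fma allowance mentioned here is observable today on stable Rust: `f64::mul_add` rounds once, so it can recover the rounding error that the separately rounded `x * x` discards. This is a standard error-free-transformation trick, not something specific to CakeML:

```rust
fn main() {
    let x = 1.0f64 + f64::EPSILON; // 1 + 2^-52
    // Exactly, x*x = 1 + 2^-51 + 2^-104. The 2^-104 term is lost when
    // x * x rounds to a double, but mul_add computes x*x - fl(x*x)
    // with a single rounding, so the lost term reappears.
    let err = x.mul_add(x, -(x * x));
    assert_eq!(err, f64::EPSILON * f64::EPSILON); // 2^-104, not 0.0
}
```

An annotation permitting fma contraction would allow `a * b + c` to produce either the twice-rounded or the once-rounded result, which is precisely a "range spanned by the evaluations" in the sense described above.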
Directly? No. I am more noting it as a potential consequence of missteps: we shouldn't allow Rust code to be a "bad citizen" that changes the results of unrelated code, so we should be careful about e.g. allowing changes to the FPCSR as an optimization. I agree with a scope-oriented model that would allow defining functions that have stricter or looser FP semantics, with a broadly similar model to …
https://simonbyrne.github.io/notes/fastmath/ makes some good points about the perils of fast-math. |
There are quite a lot of comments about rounding modes. Is that about GCC and other backends? Pretty sure LLVM's fast-math flags don't even touch rounding, so there shouldn't be any problem of fast-math-enabled Rust libraries messing up other code that links to them. Besides, couldn't we already do the really dangerous floating-point environment stuff, like:

```rust
#![feature(link_llvm_intrinsics)]

extern "C" {
    #[link_name = "llvm.set.rounding"]
    fn set_rounding(mode: i32);
}
```

and also via the CSR intrinsics in …
Just a note from a user here, attempting to make progress on Rust CV by optimizing the Akaze visual feature detector. The lack of even opt-in, location-specific fast math (such as …) … I hope that such a user's perspective can be considered when discussing the dangers of fast math. Because for language adoption in several modern fields, there is also something to lose by not having something like …
Are you able to probe (perhaps most easily in the C code, since clang exposes a flag for each) which of the flags are necessary? In particular, some like …
OK, so using the C code in the comparison, enabling … The comparison code is likely an extreme case, as it looks like the compiler could really inline a lot.
@stephanemagnenat I assume you're using x86? What about with …
Does anyone know if the developers even remember that we need -ffast-math? |
@workingjubilee Yes, I'm using x86-64. Using that bench, I got worse results using the … flag: … vs …, on an AMD Ryzen 9 7950X CPU. This is somewhat surprising. The …
That's... Very Weird, given that usually it significantly improves it. |
I fully agree. I don't think I made a mistake, but that's always a possibility. It would be interesting for others to try to replicate this little experiment; it is very easy to do: just clone the repo and run the benchmark, with and without the … flag.
@stephanemagnenat I observe the same slowdown with …

... which in hindsight is obvious, as it enables the feature 🤦
While I do think that adding an option to enable fast-math in Rust is definitely desirable, I don't like the idea of making a new type for it. I would rather make it an optional compiler flag that is not set by default in …
Putting it in the profile, allowing each crate to set it, and allowing the binary crate to override it per-crate seems to make sense. You could then enable it for your library crate if you know it is safe, and a binary crate can disable it if it turns out not to work, or enable it if they know what they are doing.
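For concreteness, the per-crate profile idea might look something like the sketch below. This is purely hypothetical syntax: Cargo has no fast-math profile key today, and `my-simd-kernels` is an invented crate name:

```toml
# Hypothetical Cargo.toml sketch only; no such keys exist in Cargo today.
[profile.release]
fast-math = false                 # default: strict IEEE semantics everywhere

[profile.release.package.my-simd-kernels]
fast-math = true                  # binary author opts a trusted crate in
```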
Making it a compile flag that applies to other crates sounds like a terrible idea. When you download a crate from the internet, you can't know whether it was written in a way that is compatible with fast-math semantics. It is very important not to apply fast-math semantics to code that assumes IEEE semantics. We could have a crate-level attribute meaning "all floating-point ops in this crate are fast-math", but under no circumstances should you be able to force fast-math on other people's code. That would ultimately even undermine Rust's safety promise.
That sounds like a good path to me: being able to set an attribute in the Cargo.toml which basically means "this crate is fast-math-safe". Compiling your code with fast-math on would then check every dependency for whether it is fast-math-safe and compile it accordingly.
I use …
There should be a way to use imprecise floating point operations like GCC's and Clang's `-ffast-math`. The simplest way to do this would be to do like GCC and Clang and implement a command line flag, but I think a better way would be to create `f32fast` and `f64fast` types that would then call the fast LLVM math functions. This way you can easily mix fast and "slow" floating point operations. I think this could be implemented as a library if LLVM assembly could be used in the `asm` macro.