floating point to integer casts can cause undefined behaviour #10184
Comments
|
Nominating |
thestinger
referenced this issue
Nov 3, 2013
Open
Audit for binary IEEE 754-2008 compliance on relevant platforms #10087
|
accepted for P-high, same reasoning as #10183 |
pnkfelix
referenced this issue
Nov 7, 2013
Closed
integer to floating point casts can cause undefined behaviour #10185
|
I don't think this is backwards incompatible at a language level. It will not cause code that was working OK to stop working. Nominating. |
|
changing to P-high, same reasoning as #10183 |
huonw
referenced this issue
Jul 8, 2014
Open
floating point to floating point casts have undefined behaviour #15536
|
How do we propose to solve this and #10185? Since whether behaviour is defined or not depends on the dynamic value of the number being cast, it seems the only solution is to insert dynamic checks. We seem to agree we do not want to do that for arithmetic overflow, are we happy to do it for cast overflow? |
|
We could add an intrinsic to LLVM that performs a "safe conversion". @zwarich may have other ideas. |
zwarich
commented
Sep 12, 2014
|
AFAIK the only solution at the moment is to use the target-specific intrinsics. That's what JavaScriptCore does, at least according to someone I asked. |
|
Oh, that's easy enough then. |
|
ping @pnkfelix is this covered by the new overflow checking stuff? |
|
These casts are not checked by rustc with debug assertions. |
This was referenced Jun 29, 2015
|
I'm happy to handle this, but I need a concrete solution. I personally think that it should be checked along with overflowing integer arithmetic, as it's a very similar issue. I don't really mind what we do though. Note that this issue is currently causing an ICE when used in certain constant expressions. |
|
This allows violating memory safety in safe rust, example from this forum post:
#[inline(never)]
pub fn f(ary: &[u8; 5]) -> &[u8] {
let idx = 1e100f64 as usize;
&ary[idx..]
}
fn main() {
println!("{}", f(&[1; 5])[0xdeadbeef]);
}
|
steveklabnik
added
the
I-unsound
label
Oct 8, 2015
|
Marking with I-unsound given the violation of memory safety in safe rust. |
|
@bluss, this does not segfault for me, it just gives an assertion error. Untagging since I was the one who added it |
steveklabnik
added
I-unsound
and removed
I-unsound
labels
Oct 29, 2015
|
Sigh, I forgot the -O, re-tagging. |
This was referenced Jan 6, 2016
|
re-nominating for P-high. Apparently this was at some point P-high but got lower over time. This seems pretty important for correctness. EDIT: didn’t react to triage comment, adding label manually. |
nagisa
added
the
I-nominated
label
Feb 21, 2016
nikomatsakis
added
T-compiler
T-lang
labels
Feb 25, 2016
|
It seems like the precedent from the overflow stuff (e.g. for shifting) is to just settle on some behavior. Java seems to produce the result modulo the range, which seems not unreasonable; I'm not sure just what kind of LLVM code we'd need to handle that. |
|
According to https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.3, Java also guarantees that NaN converts to 0 and that out-of-range values saturate to the smallest or largest representable value of the destination type. As an alternative, I would suggest that float->int casts are guaranteed to only be valid if the truncation of the original value can be represented as a value of the destination type (or maybe as …). The main advantage of the Java approach is that the conversion function is total, but this also means that unexpected behaviour might creep in: it would prevent undefined behaviour, but it would be easy to be tricked into not checking whether the cast actually made any sense (this is unfortunately true also for the other casts …). The other approach matches the one currently used for arithmetic operations: a simple & efficient implementation in release, panics triggered by range checking in debug. Unfortunately, unlike other … |
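For illustration, the Java-style semantics (NaN maps to 0, out-of-range values saturate to the extremes of the destination type) can be written out in Rust roughly like this. This is only a sketch of the semantics, not a proposal for how rustc should implement it, and the function name is made up:

```rust
// Sketch of Java-style (JLS 5.1.3) f64 -> i32 conversion:
// NaN becomes 0, out-of-range values saturate to i32::MIN / i32::MAX.
fn java_style_f64_to_i32(f: f64) -> i32 {
    if f.is_nan() {
        0
    } else if f >= i32::MAX as f64 {
        i32::MAX
    } else if f <= i32::MIN as f64 {
        i32::MIN
    } else {
        f as i32 // in range, so truncation is well-defined
    }
}

fn main() {
    assert_eq!(java_style_f64_to_i32(f64::NAN), 0);
    assert_eq!(java_style_f64_to_i32(1e100), i32::MAX);
    assert_eq!(java_style_f64_to_i32(-1e100), i32::MIN);
    assert_eq!(java_style_f64_to_i32(-1.9), -1);
}
```

Note that the in-range branch relies on the comparisons having already excluded every value whose truncation would overflow, which is what makes the final `as` safe.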
|
The problem isn't "let's return any integer, you obviously don't care which", it is that it causes an undef which isn't a random value but rather a nasal demon value and LLVM is allowed to assume the undef never occurs enabling optimizations that do horrible incorrect things. If it was a random value, but crucially not undef, then that would be enough to fix the soundness issues. We don't need to define how unrepresentable values are represented, we just need to prevent undef. |
|
Discussed in @rust-lang/compiler meeting. The most consistent course of action remains:
The main problem is that we need a concrete suggestion for option 2. |
|
triage: P-medium |
rust-highfive
added
P-medium
and removed
P-medium
I-nominated
labels
Mar 3, 2016
|
@nikomatsakis Does |
Concrete suggestion: extract digits and exponent as

fn f64_as_u64(f: f64) -> u64 {
    let (mantissa, exponent, _sign) = f.integer_decode();
    mantissa >> ((-exponent) & 63)
}

Yes, it's not zero cost, but it's somewhat optimizable (would be better if we marked integer_decode … |
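Since `integer_decode` is an unstable API, the same idea can be reproduced on stable Rust via `to_bits`. This is a sketch: the decomposition below is the standard one (value = sign × mantissa × 2^exponent), and as noted in the thread, inputs outside the directly representable range produce deterministic but bogus results:

```rust
// Stable reimplementation of the unstable integer_decode():
// returns (mantissa, exponent, sign) with value = sign * mantissa * 2^exponent.
fn integer_decode(f: f64) -> (u64, i16, i8) {
    let bits = f.to_bits();
    let sign: i8 = if bits >> 63 == 0 { 1 } else { -1 };
    let mut exponent = ((bits >> 52) & 0x7ff) as i16;
    let mantissa = if exponent == 0 {
        (bits & 0xf_ffff_ffff_ffff) << 1 // subnormal: no implicit bit
    } else {
        (bits & 0xf_ffff_ffff_ffff) | 0x10_0000_0000_0000 // add implicit bit
    };
    exponent -= 1023 + 52; // unbias, then rescale for the integer mantissa
    (mantissa, exponent, sign)
}

// The suggestion from the comment above: deterministic for all inputs,
// correct when the shift actually undoes the (non-positive) exponent.
fn f64_as_u64(f: f64) -> u64 {
    let (mantissa, exponent, _sign) = integer_decode(f);
    mantissa >> ((-exponent) & 63)
}

fn main() {
    assert_eq!(f64_as_u64(0.0), 0);
    assert_eq!(f64_as_u64(1.0), 1);
    assert_eq!(f64_as_u64(2.5), 2);
}
```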
|
Does LLVM not have platform intrinsics for the conversion functions? EDIT: @zwarich said (a long time ago):
Why even bother panicking? AFAIK, @glaebhoerl is correct, |
|
On Sat, Mar 05, 2016 at 03:47:55AM -0800, Gábor Lehel wrote:
True. I find that persuasive. |
|
On Wed, Mar 09, 2016 at 02:31:05AM -0800, Eduard-Mihai Burtescu wrote:
Yes, I think I was mistaken before. |
|
@nikomatsakis: it seems the behavior hasn't been defined yet? Can you give an update about the planning regarding that? |
gmorenz
commented
Aug 21, 2016
|
Just ran into this with much smaller numbers:

let x: f64 = -1.0;
x as u8

Results in 0, 16, etc. depending on optimizations. I was hoping it would be defined as 255 so I don't have to write … |
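For the wrapping behaviour wanted here, one workaround is to truncate through a wider signed type and let the integer narrowing cast wrap. This sketch is safe only because the input is known to fit in i16 (the [-255, 255] range mentioned below); an out-of-range float-to-int truncation would hit the very problem this issue is about:

```rust
// Truncate to i16 first (well-defined for the known [-255, 255] input
// range), then let the i16 -> u8 cast wrap modulo 256, so -1.0 -> 255.
fn wrap_to_u8(x: f64) -> u8 {
    debug_assert!(x > -32769.0 && x < 32768.0, "input must fit in i16");
    x as i16 as u8
}

fn main() {
    assert_eq!(wrap_to_u8(-1.0), 255);
    assert_eq!(wrap_to_u8(255.0), 255);
    assert_eq!(wrap_to_u8(-255.0), 1);
}
```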
|
@gmorenz Did you try |
gmorenz
commented
Aug 24, 2016
|
In context that wouldn't make sense, I was getting the f64 from a transformation on data sent over the network, with a range of [-255, 255]. I was hoping it would wrap nicely (in the exact way that |
|
Here's a recent LLVM proposal to "kill undef" http://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html , though I'm hardly knowledgeable enough to know whether or not this would automagically resolve this issue. |
|
They're replacing undef with poison, the semantics being slightly different. It's not going to make int -> float casts defined behavior. |
|
We probably should provide some explicit way to do a saturating cast? I’ve wanted that exact behaviour just now. |
|
Seems like this should be marked I-crash, given #10184 (comment) . |
japaric
referenced this issue
in rust-lang-nursery/compiler-builtins
Feb 5, 2017
Closed
WIP: Implement signed/unsigned Integer convertion to single/double precision float #121
|
We had a question about this in |
petrochenkov
referenced this issue
Mar 14, 2017
Merged
Add functions to safely transmute float to int #39271
|
The book I'm writing with @jimblandy, Programming Rust, mentions this bug.
Our deadline for this chapter is May 19. I'd love to delete that last paragraph, but I feel like we should at least have some kind of plan here first. Apparently current JavaScriptCore uses an interesting hack on x86. They use the CVTTSD2SI instruction, then fall back on some hairy C++ if the value is out of range. Since out-of-range values currently explode, using that instruction (with no fallback!) would be an improvement on what we have now, albeit only for one architecture. |
|
Honestly I think we should deprecate numeric casts with |
|
Maybe so, but that seems orthogonal to me. |
|
OK, I've just re-read this whole conversation. I think there is agreement that this operation should not panic (for general consistency with
It's not clear to me if there is a clear precedent for what the result ought to be in the first case? |
|
After having written that out, my preference would be to maintain a deterministic result. I feel like every place that we can hold the line on determinism is a win. I am not really sure what the result ought to be though. I like saturation because I can understand it and it seems useful, but it seems somehow incongruent with the way that |
|
My code gives the correct value for things in the range 0..2^64 and deterministic but bogus values for everything else. Floats are represented as mantissa * 2^exponent, e.g. … |
nikomatsakis
added
E-needs-mentor
WG-compiler-middle
E-mentor
and removed
E-needs-mentor
E-mentor
labels
Sep 22, 2017
|
I marked this as |
|
IIRC LLVM are planning to eventually implement |
s3bk
commented
Sep 24, 2017
•
|
My results so far: https://gist.github.com/s3bk/4bdfbe2acca30fcf587006ebb4811744 The _array variants run a loop of 1024 values.
|
sp-1234
commented
Sep 25, 2017
|
Perhaps you shouldn't round the results to whole nanoseconds for individual operations; surely there must be some difference hiding behind these 2 ns/iter. Or is it really exactly 2 ns for all 4 variants? |
|
@sp-1234 I wonder if it's partially optimized out. |
s3bk
commented
Sep 25, 2017
•
|
@sp-1234 It is too fast to measure. The non-array benchmarks are basically useless. |
|
@arielb1 However this means that we can now read uninitialized memory from safe code. This could lead to secret data being leaked, somewhat like Heartbleed. It's debatable whether this is truly considered UB from the point of view of Rust, but it clearly seems undesirable. |
|
I ran @s3bk's benchmark locally. I can confirm the scalar versions are optimized out completely, and the asm for the array variants also looks suspiciously well-optimized: for example, the loops are vectorized, which is nice but makes it hard to extrapolate performance to scalar code. Unfortunately, spamming
The array benchmarks vary too much for me to trust them. Well, truth be told, I'm skeptical of the … Two points worth noting:
For the record, I've measured with rustc 1.21.0-nightly (d692a91 2017-08-04), … In conclusion, I don't have reliable data so far, and getting more reliable data seems hard. Furthermore, I strongly doubt that any real application spends even 1% of its wall clock time on this operation. Therefore, I would suggest moving forward by implementing saturating … Edit: I would also recommend running such benchmarks on a variety of architectures (e.g., including ARM) and microarchitectures, if at all possible. |
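One way to keep such microbenchmarks from being optimized away entirely is an optimization barrier around both the input and the result. This sketch uses `std::hint::black_box` (the stable, modern equivalent of the unstable `test::black_box` that existed at the time of this thread); the specific workload and expected sum are made up for illustration:

```rust
use std::hint::black_box;

// Sums f32 -> u32 casts over a buffer. black_box hides the values from
// the optimizer, so the casts cannot be constant-folded away.
fn cast_sum(inputs: &[f32]) -> u32 {
    let mut sum = 0u32;
    for &x in inputs {
        sum = sum.wrapping_add(black_box(x) as u32);
    }
    black_box(sum)
}

fn main() {
    let inputs: Vec<f32> = (0..1024).map(|i| i as f32 * 0.5).collect();
    println!("{}", cast_sum(&inputs));
}
```

In a real benchmark the loop body would be timed per iteration; the point here is only that the barrier forces every cast to actually execute.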
simonbyrne
commented
Sep 27, 2017
|
I admit I'm not that familiar with rust, but I think this line is subtly incorrect: |
|
Yeah, we had exactly that problem in Servo before. The final solution was to cast to f64 and then clamp. There are other solutions but they're pretty tricky and rust doesn't expose nice APIs for dealing with this well. |
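The "cast to f64 and then clamp" approach mentioned above might look roughly like this (a sketch, not Servo's actual code). Widening to f64 sidesteps the f32 rounding pitfall because every i32 bound is exactly representable in f64; the NaN-to-0 choice here is arbitrary:

```rust
// Clamp in f64, where both i32::MIN and i32::MAX are exactly
// representable, then truncate. NaN is mapped to 0 here.
fn f32_to_i32_clamped(x: f32) -> i32 {
    let x = x as f64; // widening f32 -> f64 is always exact
    if x.is_nan() {
        0
    } else {
        x.max(i32::MIN as f64).min(i32::MAX as f64) as i32
    }
}

fn main() {
    assert_eq!(f32_to_i32_clamped(f32::NAN), 0);
    assert_eq!(f32_to_i32_clamped(3e9), i32::MAX);
    assert_eq!(f32_to_i32_clamped(-3e9), i32::MIN);
}
```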
s3bk
commented
Sep 27, 2017
•
|
Using 0x7FFF_FF80i32 as the upper limit and -0x8000_0000i32 as the lower limit should solve this without casting to f64. |
simonbyrne
commented
Sep 27, 2017
|
I think you mean |
s3bk
commented
Sep 27, 2017
|
as in |
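The subtlety behind this exchange can be checked directly: i32::MAX is not representable in f32 and rounds up to 2^31, while 0x7FFF_FF80 (2^31 - 2^7) is the largest i32 that f32's 24-bit significand can represent exactly. A small demonstration, relying only on round-trips of in-range values being well-defined:

```rust
fn main() {
    // i32::MAX = 2^31 - 1 rounds *up* to 2^31 in f32, so clamping
    // against (i32::MAX as f32) still lets an out-of-range value
    // through to the truncating conversion.
    assert_eq!(i32::MAX as f32, 2147483648.0);
    // 0x7FFF_FF80 = 2^31 - 2^7 round-trips exactly through f32:
    // it is (2^24 - 1) * 2^7, so it fits in the 24-bit significand.
    assert_eq!(0x7FFF_FF80i32 as f32 as i32, 0x7FFF_FF80);
}
```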
ActuallyaDeviloper
commented
Sep 28, 2017
•
|
I think that of all the suggested deterministic options, clamping is generally the most useful one, because it's what people do manually anyway. If the conversion were actually documented to be saturating, manual clamping would become unnecessary. I am only a little worried about the suggested implementation, because it doesn't translate cleanly to machine instructions and it relies heavily on branching. Branching makes the performance dependent on specific data patterns. In the test cases given above everything looks (comparatively) fast because the same branch is always taken and the processor has good branch prediction data from many previous loop iterations. The real world will probably not look like that. Additionally, branching hurts the compiler's ability to vectorize the code. I disagree with @rkruppe's opinion that the operation needn't also be tested in combination with vectorization. Vectorization is important in high performance code, and being able to vectorize simple casts on common architectures should be a crucial requirement. For the reasons given above, I played around with an alternative branchless and data-flow-oriented version of @alexcrichton's cast with saturation semantics and @simonbyrne's fix. I implemented it for u16, i16 and i32, since they all have to cover slightly different cases which result in varying performance. The results:
The test was run on an Intel Haswell i5-4570 CPU and Rust 1.22.0-nightly. For the assembly comparison on x86: https://godbolt.org/g/AhdF71 Unfortunately I wasn't able to make godbolt generate ARM assembly from Rust, but here is an ARM comparison of the methods with Clang: https://godbolt.org/g/s7ronw |
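A branchless formulation in the spirit of what's described above can lean on float min/max, which on x86 compile to minss/maxss rather than branches. This is a sketch, not @ActuallyaDeviloper's actual code; it relies on Rust's `f32::max` returning the non-NaN operand when one input is NaN:

```rust
// Branchless-style saturating f32 -> u16. For unsigned targets the
// lower bound is 0.0, and x.max(0.0) maps both NaN and negative
// inputs to that lower bound in a single operation.
fn f32_to_u16_sat(x: f32) -> u16 {
    let clamped = x.max(0.0).min(65535.0);
    clamped as u16 // value is now in range, so truncation is well-defined
}

fn main() {
    assert_eq!(f32_to_u16_sat(f32::NAN), 0);
    assert_eq!(f32_to_u16_sat(-123.0), 0);
    assert_eq!(f32_to_u16_sat(1e9), 65535);
}
```

Whether LLVM actually emits minss/maxss here depends on the target and how the NaN semantics of `max`/`min` map onto the instruction set, so the asm is worth inspecting rather than assuming.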
|
@ActuallyaDeviloper The asm and the benchmark results look very good! Furthermore, branchless code like yours is probably easier to generate in I have a question about PS: To be clear, I was not trying to imply that it's unimportant whether the cast can be vectorized. It clearly is important if the surrounding code is otherwise vectorizable. But scalar performance is also important, as vectorization is often not applicable, and the benchmarks I was commenting on were not making any statement about scalar performance. Out of interest, have you checked the asm of the |
ActuallyaDeviloper
commented
Sep 28, 2017
|
@rkruppe You are right, I accidentally swapped the sides of the … I updated my previous post. The x86 code did not change significantly at all, but unfortunately the change stops LLVM from generating … For why it works, notice that the lower boundary value is actually 0 for unsigned values, so NaN and the lower bound can be caught at the same time. The array versions are vectorized. |
|
Re: the ARM asm, I believe the reason
Ohhh, right. Nice. |
I have implemented this and will file a PR once I also fixed #41799 and have a lot more tests. |
bstrie
added
the
A-LLVM
label
Oct 9, 2017
This was referenced Oct 9, 2017
|
#45134 has pointed out a code path that I missed (generation of LLVM constant expressions – this is separate from rustc's own constant evaluation). I'll roll a fix for that into the same PR, but it will take a little while longer. |
rkruppe
referenced this issue
Oct 11, 2017
Merged
Saturating casts between integers and floats #45205
|
Pull request is up: #45205 |
added a commit
that referenced
this issue
Oct 18, 2017
added a commit
that referenced
this issue
Nov 7, 2017
added a commit
that referenced
this issue
Nov 8, 2017
|
#45205 has been merged, so anyone can now (well, starting with the next nightly) measure the performance impact of saturation by passing [1] Strictly speaking, this won't affect the non-generic, non- |
|
@rkruppe I suggest starting an internals/users page to collect data, in the same vein as https://internals.rust-lang.org/t/help-us-benchmark-incremental-compilation/6153/ (we can then also link people to that, rather than some random comments in our issue tracker) |
|
@rkruppe you should create a tracking issue. This discussion is split up into two issues already. That's not good! |
|
@Gankro Yeah I agree, but it may be a few days before I find the time to write that post properly, so I figured I'd solicit feedback from the people subscribed to this issue in the meantime. @est31 Hmm. Although the -Z flag covers both cast directions (which may have been a mistake, in retrospect), it seems unlikely that we'll flip the switch on both at the same time, and there's little overlap between the two in terms of what must be discussed (e.g., this issue hinges on the performance of saturation, while in #41799 it's agreed upon what the right solution is). I've considered a tracking issue for the task of removing the flag once it has outlived its usefulness, but I don't see the need to merge the discussions occurring here and in #41799. |
|
I have drafted up an internals post: https://gist.github.com/Gankro/feab9fb0c42881984caf93c7ad494ebd Feel free to copy that, or just give me notes so I can post it. (note I'm a bit confused about the |
|
One additional tidbit is that the cost of float->int conversions is specific to the current implementation, rather than being fundamental. On x86, |
|
Awesome, thanks a lot! I left a few notes on the gist. After reading this, I'd like to take a stab at separating u128->f32 casts from the -Z flag. Just for the sake of getting rid of the distracting caveat about the flag covering two orthogonal features. |
|
(I've filed #45900 to refocus the -Z flag so that it only covers the float->int issue) |
|
It would be nice if we could get platform-specific implementations a la @sunfishcode (at least for x86) before asking for mass benchmarking. It shouldn't be very difficult. |
|
The problem is that LLVM doesn't currently provide a way to do this, as far as I know, except maybe with inline asm which I wouldn't necessarily recommend for a release. |
|
I have updated the draft to reflect discussion (basically ripping out any inline mention of u128 -> f32 to an extra section at the end). |
|
@sunfishcode Are you sure? Isn't the … Here is a playground link that uses it: https://play.rust-lang.org/?gist=33cf9e0871df2eb2475b845af4f1b574&version=nightly In release mode, it even seems to do constant folding correctly. For example,

float_to_int_with_intrinsic(42.0)

becomes

movl $42, %eax

But an out-of-range value,

float_to_int_with_intrinsic(42.0e33)

does not get folded:

cvttss2si .LCPI2_0(%rip), %eax

(Ideally it would fold to constant 0x80000000, but that's no big deal. The important thing is that it doesn't produce undef.) |
|
Oh, cool. It looks like that would work! |
|
It's cool to know that we do, after all, have a way to build on Most people will benchmark on x86, so if we special case x86, we'll get far less data on the general implementation, which will still be used on most other targets. Admittedly, it's already difficult to infer anything about other architectures, but a wholly different implementation makes it outright impossible. Second, if we collect benchmarks now, with the "simple" solution, and find that there are no performance regressions in real code (and tbh that's what I expect), then we don't even need to go through the trouble of trying to optimize this code path further. Finally, I'm not even sure building on
|
|
Is it safe to assume that we're going to need a slew of |
ssokolow
commented
Nov 11, 2017
|
@bstrie I think it'd make more sense, in a case like that, to do something like extending the syntax to … As I see it, a forest of … |
|
@ssokolow Adding syntax should always be a last resort, especially if all of this can be taken care of with just ten rote functions. Even having a generic |
ssokolow
commented
Nov 11, 2017
|
Point. The turbofish slipped my mind when considering options and, in hindsight, I'm not exactly firing on all cylinders this evening either, so I should have been more cautious about commenting on design decisions. That said, it feels wrong to bake the destination type into the function name... inelegant and a potential burden on future evolution of the language. The turbofish feels like a better option. |
|
A generic method could be supported by a new set of |
|
@bstrie One alternative solution for people whose code got slower could be to use an intrinsic (e.g., via stdsimd) to access the underlying hardware instruction. I argued earlier that this has downsides for the optimizer – auto-vectorization likely suffers, and LLVM can't exploit it returning |
|
Some notes on conversions in the x86 instruction set: SSE2 is actually relatively limited in which conversion operations it gives you. You have:
Each of those has variants for … But there is nothing for unsigned integers, nothing for sizes smaller than 32, and if you're on 32-bit x86, nothing for 64-bit. Later instruction set extensions add more functionality, but it seems barely anybody compiles for those. As a result, the existing ('unsafe') behavior:
fn f32_to_u64(f: f32) -> u64 {
const CUTOFF: f32 = 0x8000000000000000 as f32; // 2^63 exactly
if !(f >= CUTOFF) { // less, or NaN
// just use the signed conversion
f as i64 as u64
} else {
0x8000000000000000u64 + ((f - CUTOFF) as i64 as u64)
}
}

Unrelated fun fact: "convert-then-truncate" code generation is what causes the "parallel universes" glitch in Super Mario 64. The collision detection code first uses a MIPS instruction to convert f32 coordinates to i32, then truncates to i16; thus, coordinates that fit in i32 but not i16 'wrap', e.g. going to coordinate 65536.0 gets you collision detection for 0.0. Anyway, conclusions:
is only three instructions (which have decent code size and throughput/latency) and no branches. However:
I suggest a compromise between the two approaches:
let f = if f > 32767.0 { 32767.0 } else { f };
let f = if f < -32768.0 { -32768.0 } else { f };
cvttss2si(f) as i16

(For u/i16 and u/i8, the original conversion can be to i32; for f64 to u/i32, it needs to be to i64.)
let r = cvttss2si64(f) as u32;
if f >= 4294967296.0 { 4294967295 } else { r }

is only a few instructions and no branches:
let r = cvttss2si64(f);
if f >= 9223372036854775808. {
9223372036854775807
} else if f != f {
0
} else {
r
}

This produces a longer (still branchless) sequence:
…but at least we save one comparison compared to the naive approach, as if
|
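The Super Mario 64 anecdote above can be reproduced in miniature: truncating a float to a wide integer and then narrowing wraps modulo 2^16, so 65536.0 "collides" with 0. A small demonstration (using Rust's current, well-defined cast semantics for the in-range f32 -> i32 step):

```rust
fn main() {
    // Convert-then-truncate: the f32 -> i32 conversion is exact here,
    // but the narrowing i32 -> i16 cast wraps modulo 2^16.
    let coord = 65536.0f32;
    assert_eq!(coord as i32, 65536);
    assert_eq!(coord as i32 as i16, 0); // a "parallel universe" at 0.0
}
```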
|
Call for benchmarks is up: https://internals.rust-lang.org/t/help-us-benchmark-saturating-float-casts/6231 |
added a commit
to fst3a/gltf
that referenced
this issue
Nov 14, 2017
matklad
referenced this issue
in neon-bindings/rfcs
Nov 17, 2017
Open
WIP: RFC: ArrayBuffer Views #5
|
In https://internals.rust-lang.org/t/help-us-benchmark-saturating-float-casts/6231/14 someone reported a measurable and significant slowdown on JPEG encoding with the image crate. I've minimized the program so that it's self-contained and mostly focused on the parts that are related to the slowdown: https://gist.github.com/rkruppe/4e7972a209f74654ebd872eb4bc57722 (this program shows ~15% slowdown for me with saturating casts). Note that the casts are f32->u8 (… I've also poked at the LLVM IR some -- it appears that literally the only differences are the comparisons and selects from the saturating casts. A quick look indicates the asm has corresponding instructions and of course a bunch more live values (which lead to more spills). @comex Do you think f32->u8 and f32->i32 casts can be made measurably faster with CVTTSS2SI? |
thestinger commented Oct 31, 2013 • Edited by nikomatsakis, May 17, 2017
UPDATE (by @nikomatsakis): After much discussion, we've got the rudiments of a plan for how to address this problem. But we need some help with actually investigating the performance impact and working out the final details!
ORIGINAL ISSUE FOLLOWS: