ICE on too long float literals #31109

panicbit · 2016-01-22T13:52:41Z

Minimal example:

fn main() {
    let n = 0.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333;
}

error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
thread 'rustc' panicked at 'index out of bounds: the len is 40 but the index is 40', ../src/libcore/num/bignum.rs:312
stack backtrace:
   1:     0x7ff30d98bbf0 - sys::backtrace::tracing::imp::write::haa19c02b4de52f3bG0t
   2:     0x7ff30d9927d5 - panicking::log_panic::_<closure>::closure.41218
   3:     0x7ff30d99226f - panicking::log_panic::h527fe484e9de8fe1W7x
   4:     0x7ff30d95b8c3 - sys_common::unwind::begin_unwind_inner::h51f64b1a34c60827fTs
   5:     0x7ff30d95c228 - sys_common::unwind::begin_unwind_fmt::h0845853a1913f45blSs
   6:     0x7ff30d989a41 - rust_begin_unwind
   7:     0x7ff30d9da4ff - panicking::panic_fmt::h3967afc085fe8067LFK
   8:     0x7ff30d9da6e2 - panicking::panic_bounds_check::h3ef0001c29fc27faREK
   9:     0x7ff30d9e3b98 - num::bignum::_<impl>::mul_pow2::hf44130abcdb21a20Rcg
  10:     0x7ff30b2ffa36 - middle::const_eval::lit_to_const::hbf4c6ead21bd7490vdn
  11:     0x7ff30b2b1dde - middle::const_eval::eval_const_expr_partial::h88510700fb26a214pcm
  12:     0x7ff30b2f2bb1 - middle::const_eval::eval_const_expr::h20663b302f1cb1c1WPk
  13:     0x7ff30c5c1ece - hair::cx::expr::_<impl>::make_mirror::h291808654ea13ab1aFd
  14:     0x7ff30c59488b - build::into::_<impl>::eval_into::h6cb1b5bbafd636bfTob
  15:     0x7ff30c5a7a11 - build::stmt::_<impl>::stmts::_<closure>::closure.18489
  16:     0x7ff30c580531 - build::block::_<impl>::ast_block::h1282e09e7e411f8fdla
  17:     0x7ff30c57d586 - build::construct::h8a4ec3c0bcbd6db2cca
  18:     0x7ff30c5aa9de - mir_map::_<impl>::visit_fn::h01af1300e9119117Cid
  19:     0x7ff30c5a8a71 - mir_map::_<impl>::visit_item::hcc14ab58582fe303Nfd
  20:     0x7ff30c5a80a3 - mir_map::build_mir_for_crate::h293121b3ad2600dfWdd
  21:     0x7ff30de93a3f - driver::phase_3_run_analysis_passes::_<closure>::closure.25599
  22:     0x7ff30de75813 - middle::ty::context::_<impl>::create_and_enter::create_and_enter::h13716335538836217011
  23:     0x7ff30de71251 - driver::phase_3_run_analysis_passes::h5249241190295309167
  24:     0x7ff30de46559 - driver::compile_input::h58ee58653fb11a3ehca
  25:     0x7ff30de3877b - run_compiler::hd4933b7af1175e4aOwc
  26:     0x7ff30de35436 - sys_common::unwind::try::try_fn::try_fn::h5795434642313485966
  27:     0x7ff30d989888 - __rust_try
  28:     0x7ff30d98118b - sys_common::unwind::try::inner_try::h771fec597ec8f35cNPs
  29:     0x7ff30de35790 - boxed::_<impl>::call_box::call_box::h18189923456789545281
  30:     0x7ff30d990cd3 - sys::thread::_<impl>::new::thread_start::haf749b76474da6b089w
  31:     0x7ff3076474a3 - start_thread
  32:     0x7ff30d62013c - clone
  33:                0x0 - <unknown>

The text was updated successfully, but these errors were encountered:

hanna-kruppe · 2016-01-22T16:32:44Z

This is a problem with dec2flt, also happens if you try to parse this number normally. I'll investigate.

hanna-kruppe · 2016-01-22T16:42:30Z

Interestingly, one digit less or a couple digits more don't crash. I assume one of the many checks rejecting inputs that can't be handled is slightly off.

However even if this panic is fixed dec2flt currently can't handle this float literal. It's a long standing issue (not filed AFAIK) to implement a different algorithm that can handle overly long inputs correctly, but in the meantime the const evaluator (in middle:const_eval::lit_to_const) may want to avoid unwraping the parse::<f64> result and instead not const eval the literal if parse returns an error. Glancing at the function signature (I know nothing about const eval) it seems that it's assumed all literals can be const eval'd. Someone who knows that code should chime in and say if this is necessary or can be changed. Without such a change, the program in the issue text still won't compile even when the immediate issue is fixed.

pnkfelix · 2016-01-22T16:47:26Z

cc me

pnkfelix · 2016-01-22T16:54:05Z

(slightly smaller way to reproduce the problem follows...)

fn main() {
    let mut s = String::new();
    s.push_str("0.");
    for _ in 0..370 {
        s.push_str("3");
    }
    s.parse::<f64>();
}

hanna-kruppe · 2016-01-22T17:57:38Z

It only panics for between 370 and 375 '3's. This suggests changing the if max_digits < 375 in libcore/num/dec2flt/mod.rs to 369 or such will fix
the issue. But I'd like to unterstand the cause first.

hanna-kruppe · 2016-01-26T23:32:52Z

A shorter program that doesn't panic in float parsing but still causes an ICE for the reason described above:

fn main() {
    let _: f64 = 1234567890123456789012345678901234567890e-340;
}

hanna-kruppe · 2016-01-27T00:01:51Z

BTW I have found the cause of the dec2flt panic and fixed it. I'll make a PR once I tested it some more and did something about the const eval unwrap().

Speaking of which: I've been told there is no way around const eval, so just skipping the offending literals is apparently not possible. But I peeked at (non-MIR) trans and it doesn't seem to const eval literals, instead passing the source string to LLVM directly (in rustc_trans::consts::const_lit).

So now I'm trying to modify const eval to give up when str::parse returns Err, which I think would stop all the programs posted in this issue from ICE-ing (and would give an ordinary const eval error if such a literal is used in a context where a constant expression is required). Someone step in and tell me if this is stupid.

cc @oli-obk

hanna-kruppe · 2016-01-27T00:30:30Z

Aw, turns out failing to const eval a literal will simply lead to a compiler error. So, since it's going to be an error, which is better?

Make it a normal error that doesn't crash the compiler, or
leave it as an ICE (with span_bug to have the semblance of an error message)

It's not exactly unexpected or a bug in the compiler proper, but it's not really something that should generate a compiler error either.

oli-obk · 2016-01-27T08:12:32Z

We can easily make https://github.com/rust-lang/rust/blob/master/src/librustc/middle/const_eval.rs#L1322-L1346 return a Result. But wouldn't it make more sense to do the change during lowering from ast to hir and add F32 and F64 variants to the literal?

hanna-kruppe · 2016-01-27T09:14:49Z

I did change lit_to_const to return Result, but that lead to a compiler error. I assume some part of the compiler tries to force const eval of literals (i.e., calls eval_const_expr_partial and errors out if that fails). Perhaps that too can be changed but at this point I'm pretty far out of my depth.

But wouldn't it make more sense to do the change during lowering from ast to hir and add F32 and F64 variants to the literal?

I'm not sure what this has to do with this issue, can you elaborate? This ICE happens because str::parse can't handle some valid literals that are accepted by libsyntax and the rest of the compiler.

oli-obk · 2016-01-27T09:33:09Z

I'm not sure what this has to do with this issue, can you elaborate? This ICE happens because str::parse can't handle some valid literals that are accepted by libsyntax and the rest of the compiler.

I mean that we can attempt to parse and it fails, emit an error and yield a NaN instead. At some point the compiler will stop due to the emitted error. Before that, whatever tries to use the float will get a NaN. It will save all stages using the HIR from having to work with actual textual float literals.

hanna-kruppe · 2016-01-27T09:47:24Z

That sounds like a sensible change in general, but I still don't see the relation to this issue

It seems to me the same effect (in terms of error messages) could be achieved by just returning NaN from lit_to_const without any changes to HIR?
Carrying on with a plainly wrong value could have all sorts of undesirable side effects (e.g. I'd have to check if it confuses the overflowing_literals lint) and I'm not sure if it has any upsides vs. just emitting a fatal error. Recall that this is a very rare edge case.

oli-obk · 2016-01-27T09:53:13Z

good point. So simply emitting a fatal error in lit_to_const should fix the ICE.

hanna-kruppe · 2016-01-27T10:14:32Z

I still wonder whether a fatal error (span_fatal) or an ICE with a message (span_bug) is more appropriate. It is a bug after all, but we don't really need more reports of it and there is also precedent for having an error for a temporary limitation (cf. #31232). Leaning towards fatal error now.

…t run by default)

The code there still triggers an ICE, but for different reasons (const eval unwraps the parse result).

Issue #31109 uncovered two semi-related problems: * A panic in `str::parse::<f64>` * A panic in `rustc::middle::const_eval::lit_to_const` where the result of float parsing was unwrapped. This series of commits fixes both issues and also drive-by-fixes some things I noticed while tracking down the parsing panic.

Implementation is based off fast-float-rust, with a few notable changes. - Some unsafe methods have been removed. - Safe methods with inherently unsafe functionality have been removed. - All unsafe functionality is documented and provably safe. - Extensive documentation has been added for simpler maintenance. - Inline annotations on internal routines has been removed. - Fixed Python errors in src/etc/test-float-parse/runtests.py. - Updated test-float-parse to be a library, to avoid missing rand dependency. - Added regression tests for rust-lang#31109 and rust-lang#31407 in core tests. - Added regression tests for rust-lang#31109 and rust-lang#31407 in ui tests. - Use the existing slice primitive to simplify shared dec2flt methods - Remove Miri ignores from dec2flt, due to faster parsing times. - resolves rust-lang#85198 - resolves rust-lang#85214 - resolves rust-lang#85234 - fixes rust-lang#31407 - fixes rust-lang#31109 - fixes rust-lang#53015 - resolves rust-lang#68396 - closes aldanor/fast-float-rust#15

Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm. # Summary Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust). Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](https://github.com/lemire/simple_fastfloat_benchmark/blob/0e2b5d163d4074cc0bde2acdaae78546d6e5c5f1/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s. In addition, [edge-cases](rust-lang#85234) in Rust's [dec2flt](https://github.com/rust-lang/rust/tree/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively. Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count. ```rust // Issue in Rust's dec2fly. "2.47032822920623272088284396434110686182e-324".parse::<f64>(); // Err(ParseFloatError { kind: Invalid }) ``` # Solution This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core. **Documentation** Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows: ```rust // Round-to-even only happens for negative values of q // when q ≥ −4 in the 64-bit case and when q ≥ −17 in // the 32-bitcase. // // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10. // // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64 // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case) // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64 // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11 // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase). // // Thus we have that we only need to round ties to even when // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10] // (in the 32-bit case). In both cases,the power of five(5^|q|) // fits in a 64-bit word. const MIN_EXPONENT_ROUND_TO_EVEN: i32; const MAX_EXPONENT_ROUND_TO_EVEN: i32; ``` This ensures maintainability of the code base. **Improvements for Disguised Fast-Path Cases** The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation. However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in rust-lang#85198. **Digit Parsing Improvements** Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems. **Unsafe Changes** Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality. ```rust impl AsciiStr { #[inline] pub fn step_by(&mut self, n: usize) -> &mut Self { unsafe { self.ptr = self.ptr.add(n) }; self } } ... #[inline] fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 { // the first character is 'e'/'E' and scientific mode is enabled let start = *s; s.step(); ... } ``` The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees. ```rust impl AsciiStr { /// Advance the view by n, advancing it in-place to (n..). pub unsafe fn step_by(&mut self, n: usize) -> &mut Self { // SAFETY: same as step_by, safe as long n is less than the buffer length self.ptr = unsafe { self.ptr.add(n) }; self } } ... /// Parse the scientific notation component of a float. fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 { let start = *s; // SAFETY: the first character is 'e'/'E' and scientific mode is enabled unsafe { s.step(); } ... } ``` This allows us to trivially demonstrate the new implementation of dec2flt is safe. **Inline Annotations Have Been Removed** In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](aldanor/fast-float-rust#15 (comment)). **Fixed Correctness Tests** Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched. **Undefined Behavior** An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows: ```rust #[inline] pub fn check_len(&self, n: usize) -> bool { unsafe { self.ptr.add(n) <= self.end } } ``` And the new implementation is as follows: ```rust /// Check if the slice at least `n` length. fn check_len(&self, n: usize) -> bool { n <= self.as_ref().len() } ``` Note that this has since been fixed in [fast-float-rust](aldanor/fast-float-rust#29). **Inferring Binary Exponents** Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](https://github.com/rust-lang/rust/blob/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt/table.rs#L8). # Code Size The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu. **new** Using rustc version 1.55.0-dev. opt-level|size|size(stripped) |:-:|:-:|:-:| 0|400k|300K 1|396k|292K 2|392k|292K 3|392k|296K s|396k|292K z|396k|292K **old** Using rustc version 1.53.0-nightly. opt-level|size|size(stripped) |:-:|:-:|:-:| 0|3.2M|304K 1|3.2M|292K 2|3.1M|284K 3|3.1M|284K s|3.1M|284K z|3.1M|284K # Correctness The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats. # Issues Addressed This will fix and close the following issues: - resolves rust-lang#85198 - resolves rust-lang#85214 - resolves rust-lang#85234 - fixes rust-lang#31407 - fixes rust-lang#31109 - fixes rust-lang#53015 - resolves rust-lang#68396 - closes aldanor/fast-float-rust#15

Remove obsolete test case This test case was supposed to cover issue rust-lang#31109 at some point. It never did anything, as the issue was still open at the time of its creation. When the issue was resolved, the `issue31109` test case was created, making the existence of this test pointless.

nagisa added the I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ label Jan 22, 2016

hanna-kruppe mentioned this issue Feb 4, 2016

Float parsing can fail on valid float literals #31407

Closed

hanna-kruppe pushed a commit to hanna-kruppe/rust that referenced this issue Feb 4, 2016

Add the kind of input from rust-lang#31109 to the expensive tests (no…

a8dc1f9

…t run by default)

hanna-kruppe pushed a commit to hanna-kruppe/rust that referenced this issue Feb 4, 2016

Prevent the immediate panic uncovered by rust-lang#31109 and add a test.

af5d574

The code there still triggers an ICE, but for different reasons (const eval unwraps the parse result).

hanna-kruppe mentioned this issue Feb 4, 2016

Fix issue 31109 #31410

Merged

bors closed this as completed in a76cb45 Feb 6, 2016

jyn514 mentioned this issue Jul 2, 2021

Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm. #86761

Merged

TDecking mentioned this issue Apr 10, 2023

Remove obsolete test case #110158

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ICE on too long float literals #31109

ICE on too long float literals #31109

panicbit commented Jan 22, 2016

hanna-kruppe commented Jan 22, 2016

hanna-kruppe commented Jan 22, 2016

pnkfelix commented Jan 22, 2016

pnkfelix commented Jan 22, 2016

hanna-kruppe commented Jan 22, 2016

hanna-kruppe commented Jan 26, 2016

hanna-kruppe commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

oli-obk commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

oli-obk commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

oli-obk commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

ICE on too long float literals #31109

ICE on too long float literals #31109

Comments

panicbit commented Jan 22, 2016

hanna-kruppe commented Jan 22, 2016

hanna-kruppe commented Jan 22, 2016

pnkfelix commented Jan 22, 2016

pnkfelix commented Jan 22, 2016

hanna-kruppe commented Jan 22, 2016

hanna-kruppe commented Jan 26, 2016

hanna-kruppe commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

oli-obk commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

oli-obk commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016

oli-obk commented Jan 27, 2016

hanna-kruppe commented Jan 27, 2016