Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICE on too long float literals #31109

Closed
panicbit opened this issue Jan 22, 2016 · 14 comments · Fixed by #86761
Closed

ICE on too long float literals #31109

panicbit opened this issue Jan 22, 2016 · 14 comments · Fixed by #86761
Labels
I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️

Comments

@panicbit
Copy link
Contributor

Minimal example:

fn main() {
    let n = 0.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333;
}
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
thread 'rustc' panicked at 'index out of bounds: the len is 40 but the index is 40', ../src/libcore/num/bignum.rs:312
stack backtrace:
   1:     0x7ff30d98bbf0 - sys::backtrace::tracing::imp::write::haa19c02b4de52f3bG0t
   2:     0x7ff30d9927d5 - panicking::log_panic::_<closure>::closure.41218
   3:     0x7ff30d99226f - panicking::log_panic::h527fe484e9de8fe1W7x
   4:     0x7ff30d95b8c3 - sys_common::unwind::begin_unwind_inner::h51f64b1a34c60827fTs
   5:     0x7ff30d95c228 - sys_common::unwind::begin_unwind_fmt::h0845853a1913f45blSs
   6:     0x7ff30d989a41 - rust_begin_unwind
   7:     0x7ff30d9da4ff - panicking::panic_fmt::h3967afc085fe8067LFK
   8:     0x7ff30d9da6e2 - panicking::panic_bounds_check::h3ef0001c29fc27faREK
   9:     0x7ff30d9e3b98 - num::bignum::_<impl>::mul_pow2::hf44130abcdb21a20Rcg
  10:     0x7ff30b2ffa36 - middle::const_eval::lit_to_const::hbf4c6ead21bd7490vdn
  11:     0x7ff30b2b1dde - middle::const_eval::eval_const_expr_partial::h88510700fb26a214pcm
  12:     0x7ff30b2f2bb1 - middle::const_eval::eval_const_expr::h20663b302f1cb1c1WPk
  13:     0x7ff30c5c1ece - hair::cx::expr::_<impl>::make_mirror::h291808654ea13ab1aFd
  14:     0x7ff30c59488b - build::into::_<impl>::eval_into::h6cb1b5bbafd636bfTob
  15:     0x7ff30c5a7a11 - build::stmt::_<impl>::stmts::_<closure>::closure.18489
  16:     0x7ff30c580531 - build::block::_<impl>::ast_block::h1282e09e7e411f8fdla
  17:     0x7ff30c57d586 - build::construct::h8a4ec3c0bcbd6db2cca
  18:     0x7ff30c5aa9de - mir_map::_<impl>::visit_fn::h01af1300e9119117Cid
  19:     0x7ff30c5a8a71 - mir_map::_<impl>::visit_item::hcc14ab58582fe303Nfd
  20:     0x7ff30c5a80a3 - mir_map::build_mir_for_crate::h293121b3ad2600dfWdd
  21:     0x7ff30de93a3f - driver::phase_3_run_analysis_passes::_<closure>::closure.25599
  22:     0x7ff30de75813 - middle::ty::context::_<impl>::create_and_enter::create_and_enter::h13716335538836217011
  23:     0x7ff30de71251 - driver::phase_3_run_analysis_passes::h5249241190295309167
  24:     0x7ff30de46559 - driver::compile_input::h58ee58653fb11a3ehca
  25:     0x7ff30de3877b - run_compiler::hd4933b7af1175e4aOwc
  26:     0x7ff30de35436 - sys_common::unwind::try::try_fn::try_fn::h5795434642313485966
  27:     0x7ff30d989888 - __rust_try
  28:     0x7ff30d98118b - sys_common::unwind::try::inner_try::h771fec597ec8f35cNPs
  29:     0x7ff30de35790 - boxed::_<impl>::call_box::call_box::h18189923456789545281
  30:     0x7ff30d990cd3 - sys::thread::_<impl>::new::thread_start::haf749b76474da6b089w
  31:     0x7ff3076474a3 - start_thread
  32:     0x7ff30d62013c - clone
  33:                0x0 - <unknown>
@nagisa nagisa added the I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ label Jan 22, 2016
@hanna-kruppe
Copy link
Contributor

This is a problem with dec2flt, also happens if you try to parse this number normally. I'll investigate.

@hanna-kruppe
Copy link
Contributor

Interestingly, one digit less or a couple digits more don't crash. I assume one of the many checks rejecting inputs that can't be handled is slightly off.

However even if this panic is fixed dec2flt currently can't handle this float literal. It's a long standing issue (not filed AFAIK) to implement a different algorithm that can handle overly long inputs correctly, but in the meantime the const evaluator (in middle:const_eval::lit_to_const) may want to avoid unwraping the parse::<f64> result and instead not const eval the literal if parse returns an error. Glancing at the function signature (I know nothing about const eval) it seems that it's assumed all literals can be const eval'd. Someone who knows that code should chime in and say if this is necessary or can be changed. Without such a change, the program in the issue text still won't compile even when the immediate issue is fixed.

@pnkfelix
Copy link
Member

cc me

@pnkfelix
Copy link
Member

(slightly smaller way to reproduce the problem follows...)

fn main() {
    let mut s = String::new();
    s.push_str("0.");
    for _ in 0..370 {
        s.push_str("3");
    }
    s.parse::<f64>();
}

@hanna-kruppe
Copy link
Contributor

It only panics for between 370 and 375 '3's. This suggests changing the if max_digits < 375 in libcore/num/dec2flt/mod.rs to 369 or such will fix
the issue. But I'd like to unterstand the cause first.

@hanna-kruppe
Copy link
Contributor

A shorter program that doesn't panic in float parsing but still causes an ICE for the reason described above:

fn main() {
    let _: f64 = 1234567890123456789012345678901234567890e-340;
}

@hanna-kruppe
Copy link
Contributor

BTW I have found the cause of the dec2flt panic and fixed it. I'll make a PR once I tested it some more and did something about the const eval unwrap().

Speaking of which: I've been told there is no way around const eval, so just skipping the offending literals is apparently not possible. But I peeked at (non-MIR) trans and it doesn't seem to const eval literals, instead passing the source string to LLVM directly (in rustc_trans::consts::const_lit).

So now I'm trying to modify const eval to give up when str::parse returns Err, which I think would stop all the programs posted in this issue from ICE-ing (and would give an ordinary const eval error if such a literal is used in a context where a constant expression is required). Someone step in and tell me if this is stupid.

cc @oli-obk

@hanna-kruppe
Copy link
Contributor

Aw, turns out failing to const eval a literal will simply lead to a compiler error. So, since it's going to be an error, which is better?

  • Make it a normal error that doesn't crash the compiler, or
  • leave it as an ICE (with span_bug to have the semblance of an error message)

It's not exactly unexpected or a bug in the compiler proper, but it's not really something that should generate a compiler error either.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 27, 2016

We can easily make https://github.com/rust-lang/rust/blob/master/src/librustc/middle/const_eval.rs#L1322-L1346 return a Result. But wouldn't it make more sense to do the change during lowering from ast to hir and add F32 and F64 variants to the literal?

@hanna-kruppe
Copy link
Contributor

I did change lit_to_const to return Result, but that lead to a compiler error. I assume some part of the compiler tries to force const eval of literals (i.e., calls eval_const_expr_partial and errors out if that fails). Perhaps that too can be changed but at this point I'm pretty far out of my depth.

But wouldn't it make more sense to do the change during lowering from ast to hir and add F32 and F64 variants to the literal?

I'm not sure what this has to do with this issue, can you elaborate? This ICE happens because str::parse can't handle some valid literals that are accepted by libsyntax and the rest of the compiler.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 27, 2016

I'm not sure what this has to do with this issue, can you elaborate? This ICE happens because str::parse can't handle some valid literals that are accepted by libsyntax and the rest of the compiler.

I mean that we can attempt to parse and it fails, emit an error and yield a NaN instead. At some point the compiler will stop due to the emitted error. Before that, whatever tries to use the float will get a NaN. It will save all stages using the HIR from having to work with actual textual float literals.

@hanna-kruppe
Copy link
Contributor

That sounds like a sensible change in general, but I still don't see the relation to this issue

  1. It seems to me the same effect (in terms of error messages) could be achieved by just returning NaN from lit_to_const without any changes to HIR?
  2. Carrying on with a plainly wrong value could have all sorts of undesirable side effects (e.g. I'd have to check if it confuses the overflowing_literals lint) and I'm not sure if it has any upsides vs. just emitting a fatal error. Recall that this is a very rare edge case.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 27, 2016

good point. So simply emitting a fatal error in lit_to_const should fix the ICE.

@hanna-kruppe
Copy link
Contributor

I still wonder whether a fatal error (span_fatal) or an ICE with a message (span_bug) is more appropriate. It is a bug after all, but we don't really need more reports of it and there is also precedent for having an error for a temporary limitation (cf. #31232). Leaning towards fatal error now.

hanna-kruppe pushed a commit to hanna-kruppe/rust that referenced this issue Feb 4, 2016
hanna-kruppe pushed a commit to hanna-kruppe/rust that referenced this issue Feb 4, 2016
The code there still triggers an ICE, but for different reasons (const eval unwraps the parse result).
bors added a commit that referenced this issue Feb 6, 2016
Issue #31109 uncovered two semi-related problems:

* A panic in `str::parse::<f64>`
* A panic in `rustc::middle::const_eval::lit_to_const` where the result of float parsing was unwrapped.

This series of commits fixes both issues and also drive-by-fixes some things I noticed while tracking down the parsing panic.
@bors bors closed this as completed in a76cb45 Feb 6, 2016
Alexhuszagh added a commit to Alexhuszagh/rust that referenced this issue Jul 17, 2021
Implementation is based off fast-float-rust, with a few notable changes.

- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines has been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid missing rand dependency.
- Added regression tests for rust-lang#31109 and rust-lang#31407 in core tests.
- Added regression tests for rust-lang#31109 and rust-lang#31407 in ui tests.
- Use the existing slice primitive to simplify shared dec2flt methods
- Remove Miri ignores from dec2flt, due to faster parsing times.

- resolves rust-lang#85198
- resolves rust-lang#85214
- resolves rust-lang#85234
- fixes rust-lang#31407
- fixes rust-lang#31109
- fixes rust-lang#53015
- resolves rust-lang#68396
- closes aldanor/fast-float-rust#15
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 17, 2021
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.

# Summary

Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).

Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](https://github.com/lemire/simple_fastfloat_benchmark/blob/0e2b5d163d4074cc0bde2acdaae78546d6e5c5f1/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.

In addition, [edge-cases](rust-lang#85234) in Rust's [dec2flt](https://github.com/rust-lang/rust/tree/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

```rust
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();   // Err(ParseFloatError { kind: Invalid })
```

# Solution

This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.

**Documentation**

Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:

```rust
    // Round-to-even only happens for negative values of q
    // when q ≥ −4 in the 64-bit case and when q ≥ −17 in
    // the 32-bitcase.
    //
    // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we
    // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have
    // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
    //
    // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
    // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
    // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64
    // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
    // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase).
    //
    // Thus we have that we only need to round ties to even when
    // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10]
    // (in the 32-bit case). In both cases,the power of five(5^|q|)
    // fits in a 64-bit word.
    const MIN_EXPONENT_ROUND_TO_EVEN: i32;
    const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```

This ensures maintainability of the code base.

**Improvements for Disguised Fast-Path Cases**

The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.

However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in rust-lang#85198.

**Digit Parsing Improvements**

Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems.

**Unsafe Changes**

Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality.

```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```

The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.

```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```

This allows us to trivially demonstrate the new implementation of dec2flt is safe.

**Inline Annotations Have Been Removed**

In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](aldanor/fast-float-rust#15 (comment)).

**Fixed Correctness Tests**

Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.

**Undefined Behavior**

An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:

```rust
    #[inline]
    pub fn check_len(&self, n: usize) -> bool {
        unsafe { self.ptr.add(n) <= self.end }
    }
```

And the new implementation is as follows:

```rust
    /// Check if the slice at least `n` length.
    fn check_len(&self, n: usize) -> bool {
        n <= self.as_ref().len()
    }
```

Note that this has since been fixed in [fast-float-rust](aldanor/fast-float-rust#29).

**Inferring Binary Exponents**

Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](https://github.com/rust-lang/rust/blob/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt/table.rs#L8).

# Code Size

The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.

**new**

Using rustc version 1.55.0-dev.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|400k|300K
1|396k|292K
2|392k|292K
3|392k|296K
s|396k|292K
z|396k|292K

**old**

Using rustc version 1.53.0-nightly.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|3.2M|304K
1|3.2M|292K
2|3.1M|284K
3|3.1M|284K
s|3.1M|284K
z|3.1M|284K

# Correctness

The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham  [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.

# Issues Addressed

This will fix and close the following issues:

- resolves rust-lang#85198
- resolves rust-lang#85214
- resolves rust-lang#85234
- fixes rust-lang#31407
- fixes rust-lang#31109
- fixes rust-lang#53015
- resolves rust-lang#68396
- closes aldanor/fast-float-rust#15
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Apr 14, 2023
Remove obsolete test case

This test case was supposed to cover issue rust-lang#31109 at some point.
It never did anything, as the issue was still open at the time of its creation.
When the issue was resolved, the `issue31109` test case was created,
making the existence of this test pointless.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 14, 2023
Remove obsolete test case

This test case was supposed to cover issue rust-lang#31109 at some point.
It never did anything, as the issue was still open at the time of its creation.
When the issue was resolved, the `issue31109` test case was created,
making the existence of this test pointless.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 14, 2023
Remove obsolete test case

This test case was supposed to cover issue rust-lang#31109 at some point.
It never did anything, as the issue was still open at the time of its creation.
When the issue was resolved, the `issue31109` test case was created,
making the existence of this test pointless.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants