Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float parsing can fail on valid float literals #31407

Open
rkruppe opened this issue Feb 4, 2016 · 18 comments

Comments

Projects
None yet
@rkruppe
Copy link
Member

commented Feb 4, 2016

That is, it returns Err(..) on inputs with lots of digits and extreme exponents — it doesn't (shouldn't) panic.

This is a limitation of the current code that has been known since that code was originally written, but is quite nontrivial to fix, so I didn't get around to it yet. Basically one would have to implement an algorithm similar to glib's strtod and replace the current slow paths (Algorithm M and possibly Algorithm R) with it. If someone wants to have a crack at it, go ahead, just give me a heads up so we don't duplicate work. See also #27307 for some discussion.

This bug doesn't affect reasonably short (17-ish decimal digits) representations of finite (even subnormal) floats, and neither does it affect most inputs that are rounded down to zero or rounded up to infinity. The only problem are very small exponents with so many integer digits that the number can't just be rounded down to zero and can't be represented with our custom 1280-bit bignums. One example, for f64, is 1234567890123456789012345678901234567890e-340 (which is a finite and normal float, about 1.23456789e-301). This would be annoying on its own, but it also trips up constant evaluation in the compiler (see #31109).

@pnkfelix

This comment has been minimized.

Copy link
Member

commented Feb 4, 2016

cc me

@arielb1

This comment has been minimized.

Copy link
Contributor

commented Feb 4, 2016

can't we just have a slowpath that uses 3072-bit bignums or something?

I mean, for numbers with a very long number of digits, we could:

  • find one of the (2, or 1 if exactly equal) halfway-floating-point-numbers closest to our string. This can be done by truncating the number to 20 or so digits, converting that to a float, and taking the halfway-point above that.
  • compare our string with that halfway-float and use that to pick which float to round it to.

Certainly, at the worst-case, we do need to compare our string with an halfway-float, After all, 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000247032822920623272088284396434110686182529901307162382212792841250337753635104375932649918180817996189898282347722858865463328355177969898199387398005390939063150356595155702263922908583924491051844359318028499365361525003193704576782492193656236698636584807570015857692699037063119282795585513329278343384093519780155312465972635795746227664652728272200563740064854999770965994704540208281662262378573934507363390079677619305775067401763246736009689513405355374585166611342237666786041621596804619144672918403005300575308490487653917113865916462395249126236538818796362393732804238910186723484976682350898633885879256283027559956575244555072551893136908362547791869486679949683240497058210285131854513962138377228261454376934125320985913276672363281249 is zero and 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000247032822920623272088284396434110686182529901307162382212792841250337753635104375932649918180817996189898282347722858865463328355177969898199387398005390939063150356595155702263922908583924491051844359318028499365361525003193704576782492193656236698636584807570015857692699037063119282795585513329278343384093519780155312465972635795746227664652728272200563740064854999770965994704540208281662262378573934507363390079677619305775067401763246736009689513405355374585166611342237666786041621596804619144672918403005300575308490487653917113865916462395249126236538818796362393732804238910186723484976682350898633885879256283027559956575244555072551893136908362547791869486679949683240497058210285131854513962138377228261454376934125320985913276672363281251 is the smallest denormal.

@rkruppe

This comment has been minimized.

Copy link
Member Author

commented Feb 4, 2016

@arielb1 How would you actually implement the second step? Keep in mind that this is in libcore, so we can't use float formatting because that would allocate. (Maybe you could use the underlying functions that are also in libcore, but this probably complicates everything significantly. However I have to admit that I don't even know what the interface of those functions looks like, so maybe it's not that bad.)

@arielb1

This comment has been minimized.

Copy link
Contributor

commented Feb 4, 2016

I actually read the paper, and its Algorithm R is a variant of what I've came up with. I guess we just need to enlarge our bignums to sufficiently-many bits and use sticky bits+rounding (we don't really need 1075 digits = 3571 bits because we can skip the powers-of-2).

@ahrvoje

This comment has been minimized.

Copy link

commented Jan 22, 2017

I've gathered a collection of String to Double cases that proved to be problematic in various projects during the past few decades:
https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml

Rust 1.14.0 (e8a0123 2016-12-16) for Windows fails on 17 of 81 conversion tests (C21, C22, C23, C25, C29, C37, C38, C39, C40, C64, C65, C66, C67, C68, C69, C76, C79), with the following being the shortest one (C29):

fn main() {
    println!("{}", 2.47032822920623272e-324);
}

error[E0080]: constant evaluation error
--> src\main.rs:2:20
|
2 | println!("{}", 2.47032822920623272e-324);
| ^^^^^^^^^^^^^^^^^^^^^^^^ unimplemented constant expression: could not evaluate float literal (see issue #31407)

@rkruppe

This comment has been minimized.

Copy link
Member Author

commented Jan 22, 2017

@ahrvoje That's a very nice collection of test cases, thanks for sharing!

@nagisa

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2017

This is a regression from 1.7.0 to 1.8.0. It broke my crate (lol).

@anatol

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2017

@nagisa's crate is math.rs.

@nagisa

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2017

I've nominated this for discussion in the T-libs. This is T-libs area, because error comes from the floating number parsing routines in libstd/core.

This is an annoying regression to me and not a nice regression to have in the compiler overall, as one cannot feed any sort of non-normal floating point numbers to the compiler (compile-time) or the parser (run-time). You can’t do something like *(&0xBITSBITSBITSu32 as *const _ as *const f32) at compile time either, which makes it impossible to construct esoteric floating point numbers at compile time.

@rkruppe

This comment has been minimized.

Copy link
Member Author

commented Feb 26, 2017

@nagisa AFAIK for every floating point value [*] there's a canonical literal that is parsed correctly, it's just that some literals that would also result in the value don't work. Do you have a counter example?

[*] Other than the bazillion NaNs with different payloads, of course, but we don't have literals for them anyway.

@nagisa

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2017

@rkruppe my counter example is that I have no idea what the canonical literals for my denormals are and am in general paranoid enough about floats to always write them down with some extra precision.

That being said, I’m gonna believe you and retract my statement that there’s no way at all to write the denormals down.

@rkruppe

This comment has been minimized.

Copy link
Member Author

commented Feb 26, 2017

If you know the bit pattern you want, you can construct the right float value via transmute and format it (either {} or {:e} should work). The resulting decimal will be as short as possible in the respective format and so the current code should manage to parse it. (This is tested for a couple subnormals in the test suite.)

To be clear, I don't want to deflect attention away from this issue — far from it, I'd really love to see it addressed somehow, and would do so right now I had the time and energy. I'm just trying to understand how it's causing you trouble and offer workarounds in the mean time.

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Feb 28, 2017

discussed during libs triage today conclusion was we'd definitely like to fix but otherwise doesn't seem P-high, so P-medium

@evanw

This comment has been minimized.

Copy link

commented Apr 4, 2018

What about falling back to a C library in the meantime? For example, using https://crates.io/crates/strtod instead of Rust's built-in parser seems to fix this bug for me.

@rkruppe

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2018

The problem is that float parsing is in core, not just in std, so it can't depend on memory allocation or C libraries.

@oli-obk

This comment has been minimized.

Copy link
Contributor

commented Apr 26, 2018

Note that you can now (on nightly and beta) cast bit representations to floats inside constants:

pub static F: f32 = unsafe {
    union Helper {
        bits: u32,
        f: f32,
    }
    Helper { bits: 0x0000_0000 }.f
};
@top1st

This comment has been minimized.

Copy link

commented Nov 20, 2018

image

frewsxcv added a commit to frewsxcv/rust that referenced this issue Nov 25, 2018

Add grammar for {f32,f64}::from_str, mention known bug.
- Original bug about documenting grammar
  - rust-lang#32243
- Known bug with parsing
  - rust-lang#31407

frewsxcv added a commit to frewsxcv/rust that referenced this issue Nov 25, 2018

Add grammar for {f32,f64}::from_str, mention known bug.
- Original bug about documenting grammar
  - rust-lang#32243
- Known bug with parsing
  - rust-lang#31407
@donbright

This comment has been minimized.

Copy link

commented Dec 1, 2018

It seems inconsistent to me that rust discourages thoughtless conversion from 64 bit to 32 bit integers, or thoughtless conversion between signed and unsigned integers, but it allows thoughtless conversion from decimal to binary. Decimal and Binary are not the same set of numbers, just like i32 and u64 are not the same set, just like binary floats and binary integers are not the same set. To be precise, (or, safe, even), maybe there could be a basic decimal number type in the language and decisions could be made about its conversion to other types.

Centril added a commit to Centril/rust that referenced this issue Jan 25, 2019

Rollup merge of rust-lang#56217 - frewsxcv:frewsxcv-float-parse, r=Qu…
…ietMisdreavus

Add grammar in docs for {f32,f64}::from_str, mention known bug.

- Original bug about documenting grammar
  - rust-lang#32243
- Known bug with parsing
  - rust-lang#31407

VardhanThigle pushed a commit to jethrogb/rust that referenced this issue Jan 31, 2019

Add grammar for {f32,f64}::from_str, mention known bug.
- Original bug about documenting grammar
  - rust-lang#32243
- Known bug with parsing
  - rust-lang#31407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.