Float printing and/or parsing is inaccurate #24557
Comments
rkruppe referenced this issue Apr 18, 2015 (Closed): Default float formatting is too narrow-minded #24556
Potential dupe of #7030 and/or #7648. I've been meaning to try to look at this for a long, long time; for the most part we've always just triaged these issues as "just a bug", but that's not a reason not to try to fix them, if someone has time. I am very familiar with Will Clinger's paper, as he was my Ph.D. advisor (though floating-point parsing was not my topic area), and I have seen him present the material more than once. Also relevant are the other papers I listed in my comment here. Possibly also relevant is Steele and White, also from PLDI 1990, "How to Print Floating Point Numbers Accurately", and possibly also this recent paper.
Thanks for the dupes; I'm useless with GitHub's search. If you check out the companion #24556 you'll see that @lifthrasiir has an implementation (combining Loitsch's algorithm and the algorithm of Steele and White) and, IIUC, is intending to contribute it. I'm very interested in seeing this fixed, so I'll try implementing Clinger's algorithm over the next few weeks.
I only just realized that there are actually functions in the standard libraries that display in bases other than ten. They all seem to be deprecated, though. Is this something that should be supported indefinitely, or is it slated for removal in the near future? Indefinite support would be a bummer, since (as discussed in one of the other issues) a lot of the research focuses on base 10.
steveklabnik added the A-libs label Apr 18, 2015
cc @lifthrasiir, because you were working on making conversions correct AFAIR.
lifthrasiir referenced this issue Apr 19, 2015 (Merged): New floating-to-decimal formatting routine #24612
I originally planned to work on …

@rkruppe Base 10 is not that special, but many practical algorithms rely on a precalculated table which is base-specific. It is not reasonable to ship tables for all possible bases.
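To illustrate why such a base-specific table matters (this is the classic Clinger-style fast path, sketched by me as an illustration, not the standard library's actual code): powers of ten up to 10^22 are exactly representable in an `f64`, so a small precomputed table turns many common conversions into a single correctly rounded operation.

```rust
// Hypothetical fast path for decimal-to-float conversion. If both the decimal
// mantissa and the power of ten are exactly representable in f64, one IEEE
// multiplication or division yields the correctly rounded result.
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

fn fast_path(mantissa: u64, exp10: i32) -> Option<f64> {
    // The mantissa must fit in 53 bits and the power of ten must be in the
    // exactly-representable range; otherwise fall back to a slow path.
    if mantissa >= (1 << 53) || exp10 < -22 || exp10 > 22 {
        return None;
    }
    let m = mantissa as f64; // exact: fits in the 53-bit significand
    Some(if exp10 >= 0 {
        m * POW10[exp10 as usize] // one correctly rounded operation
    } else {
        m / POW10[(-exp10) as usize] // division is also correctly rounded
    })
}
```

For example, `fast_path(6, -1)` computes `6.0 / 10.0`, which is exactly the correctly rounded double for the literal `0.6`. A table like this is inherently per-base, which is the point being made above.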
I just ran into this issue. Printing a …
@kballard That should be fixed by @lifthrasiir's PR #24612, at least according to my understanding of the algorithms and the comment here.
@pnkfelix It certainly sounds like it should.
bors added two commits that referenced this issue May 9, 2015
lifthrasiir referenced this issue May 31, 2015 (Closed): from_str_radix_float gives incorrect (/imprecise) results on beta and nightly #25895
triage: P-medium. Adopting the priority from #7648.
rust-highfive added the P-medium label May 31, 2015
This remains an issue. My current theory is that while we greatly improved our float2decimal conversion, we have not yet done anything to address our float-parsing routines (i.e. decimal2float conversion). Here is some new demo code that I think makes this pretty clear. (I got rid of the random number generation because frankly I don't know how to do that on nightly anymore without using cargo; in any case, these deterministic tests should be easy to understand.)

https://play.rust-lang.org/?gist=7858323861f542d64e34

Update: I also need to review @kballard's argument about the printing of …
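The linked gist isn't reproduced above; a deterministic parsing check in the same spirit (my sketch, not the original demo code) might look like this. Each string's correctly rounded `f64` value must match the corresponding compiled literal bit for bit:

```rust
// Deterministic decimal2float checks: the compiler produces the correctly
// rounded f64 for each literal, so a correct parser must return the exact
// same bit pattern for the matching string.
fn main() {
    let cases: &[(&str, f64)] = &[
        ("0.1", 0.1),
        ("0.6", 0.6),
        ("2.2250738585072011e-308", 2.2250738585072011e-308), // near the subnormal boundary
        ("1.7976931348623157e308", f64::MAX),
    ];
    for &(s, expected) in cases {
        let parsed: f64 = s.parse().unwrap();
        assert_eq!(
            parsed.to_bits(),
            expected.to_bits(),
            "parse mismatch for {}",
            s
        );
    }
    println!("all parse checks passed");
}
```

On a parser with rounding error, cases like these fail by one or more units in the last place even though the strings are short and unremarkable.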
I am working on dec2flt. I have transcribed the algorithms (R and Bellerophon) from Clinger's paper and implemented most helper routines that were missing. It compiles, and said helper routines pass some tests, but the whole thing is still missing some pieces before I can start running it against well-behaved inputs. I hope to finish this part next weekend. Before I file a PR I also want to implement correct handling of overflowing, almost-overflowing, underflowing, subnormal and almost-subnormal numbers. This may turn out to be more tricky (the paper mentions this more as an afterthought), but I have a pretty clear plan of attack.

Regarding 0.6: No, 0.60...0 is correct. It's not what Python outputs for …
I suspect that we don't handle any of these properly now, so I would be very happy with a PR where these weren't horribly wrong (even if not fully correct). E.g. subnormals:

```rust
fn main() {
    let x = 1e-308 / 1e15;
    println!("{}", x == 0.0);
    println!("{:e}", x);
}
```
To be clear, the current state of my code is that things go horribly wrong when fed such numbers. There may be a relatively easy fix that's somewhat correct, but I'm not sure, and (now that I've spent quite a bit of time thinking about proper support) I don't think the band-aid will be significantly faster to implement than a proper solution. I'll just have to implement bignum long division and Algorithm M from the paper (of which I have already written a reference implementation in Python) with checks for the minimum and maximum exponent. Well, and set the right cutoff for when to use Algorithm M.

Aside: I'm sure you're aware, but @lifthrasiir's code correctly handles subnormals, at least on this example. Only the old code has this problem, and 1.0 stable (and thus play.rust-lang.org with default settings) still uses the old code.
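For readers following along, the overall shape of Algorithm M is easy to sketch. This toy version is my illustration, not the code under discussion: it substitutes `u128` for a real bignum, so it only handles inputs whose scaled numerator fits in 128 bits and whose result is a normal number; subnormals and overflow (exactly the hard part discussed above) are ignored.

```rust
// Toy Algorithm M: find the f64 nearest to f * 10^e by maintaining the exact
// fraction u / v and scaling by powers of two until the quotient lands in
// [2^52, 2^53), the significand range of a normal f64.
fn algorithm_m(f: u64, e: i32) -> f64 {
    if f == 0 {
        return 0.0;
    }
    let (mut u, mut v): (u128, u128) = if e >= 0 {
        (f as u128 * 10u128.pow(e as u32), 1)
    } else {
        (f as u128, 10u128.pow((-e) as u32))
    };
    let mut k: i32 = 0; // binary exponent of the result
    loop {
        let q = u / v;
        if q < (1 << 52) {
            u *= 2; // quotient too small: shift significand up, exponent down
            k -= 1;
        } else if q >= (1 << 53) {
            v *= 2; // quotient too large: exponent up
            k += 1;
        } else {
            break;
        }
    }
    let q = u / v;
    let r = u - q * v;
    // Round to nearest, ties to even.
    let m = if 2 * r > v || (2 * r == v && q % 2 == 1) { q + 1 } else { q };
    // m fits in 54 bits, so the conversion and the scaling are both exact.
    (m as f64) * 2f64.powi(k)
}
```

For example, `algorithm_m(6, -1)` reproduces the bit pattern of the literal `0.6`. The real implementation needs a heap-free bignum for `u` and `v` plus the exponent-range checks mentioned above.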
Executive summary: I mostly agree with @rkruppe about …

I wrote: …

@rkruppe responded: …

I agree that printing … First, I wrote this Rust code to print out some ranges of decimal fractions corresponding to low and high points for the floating-point values around the area of 0.6: https://play.rust-lang.org/?gist=a50a1eea69cc97be1d45

[deleted paragraph with reported conclusions that were based on code with unquantified floating point error]

UPDATE: Argh, the technique in the Rust code above is flawed; casting the numerators and denominators to floats and then dividing is simply a bad way to try to get my hands on the exact ranges that are involved here. I will go back later and do it again with proper bignum-based fractions.

Now, as for the question of what should be printed (since we seem to have a number of options available to us): obviously when you use …

Based on some interactions with a Scheme interpreter (my go-to choice for bignum stuff), the 64-bit float for 0.6 is exactly:

0.59999999999999997779553950749686919152736663818359375

From this we can see why @kballard might expect to see …

So: it's possible that some people may be annoyed by our current flt2dec handling of specifications like …
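That exact decimal expansion can also be observed directly from Rust's own formatting (on a toolchain with the corrected flt2dec code, i.e. modern Rust, not the 1.0-era formatter): a low requested precision rounds back up to 0.6, while a high requested precision reveals the true value of the nearest double.

```rust
fn main() {
    // With correct float formatting, the requested precision is applied to
    // the exact decimal expansion of the stored double.
    println!("{:.1}", 0.6_f64);  // prints 0.6
    println!("{:.20}", 0.6_f64); // prints 0.59999999999999997780
}
```

The 20-digit output is the exact expansion 0.59999999999999997779553… correctly rounded at the 20th place, which is precisely the behavior being debated here.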
P.S. @rkruppe thanks for looking into this; I was about to roll up my sleeves to work on it yesterday, but I am happy to see you taking care of it. Let me know if you want a second set of eyes during debugging.
Any reason we cannot just cut-and-paste the old (slow-ish) bignum code that we used to have in the Rust repo into a private mod for these bignum implementation issues? The path that utilizes bignums is only exercised in exceptional cases, right?

I'm referring to the …

Update: It's possible I'm missing the point and that the abstractions provided by …
Decimal parsing and float formatting live in libcore, i.e., they can't use dynamic memory allocation. That alone already rules out using libnum verbatim. If I dug deeper, I'd probably find several other reasons, but it's a moot point since @lifthrasiir has already implemented almost all the functionality I need (and all that flt2dec needs). FYI, while division is only needed in exceptional cases, most code paths do require a bignum to some degree (subtraction, comparison, multiplication by powers of two and ten). I realized just now how it can be avoided in some cases (fewer than 15 decimal digits and an absolute decimal exponent less than 23), but it's still required even for some relatively common cases. Maybe libnum's code for division is a good starting point algorithmically, and I'll make sure to look at it in more depth before I dive into implementing division, but from a cursory glance it seems like importing the code and whipping it into something working may be more trouble than writing division from scratch.
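The allocation-free constraint is typically met with a fixed-size, stack-allocated bignum. A minimal sketch of the idea (the type name, digit count, and method set here are mine for illustration, not the actual libcore type):

```rust
/// Illustrative heap-free big integer: little-endian base-2^32 digits stored
/// inline, so it works without an allocator. Sizes and names are hypothetical.
#[derive(Clone, Copy)]
struct Big {
    len: usize,        // number of digits currently in use
    digits: [u32; 40], // fixed capacity chosen for the conversion use case
}

impl Big {
    fn from_u64(v: u64) -> Big {
        let mut b = Big { len: 2, digits: [0; 40] };
        b.digits[0] = v as u32;
        b.digits[1] = (v >> 32) as u32;
        b
    }

    /// Multiply in place by a small factor, propagating the carry manually.
    fn mul_small(&mut self, other: u32) {
        let mut carry: u64 = 0;
        for i in 0..self.len {
            let d = self.digits[i] as u64 * other as u64 + carry;
            self.digits[i] = d as u32;
            carry = d >> 32;
        }
        if carry > 0 {
            self.digits[self.len] = carry as u32;
            self.len += 1;
        }
    }

    /// Read back the low 64 bits (enough for this demo).
    fn to_u64(&self) -> u64 {
        (self.digits[1] as u64) << 32 | self.digits[0] as u64
    }
}

fn main() {
    // e.g. build 10^9 by repeated multiplication, entirely on the stack
    let mut b = Big::from_u64(1);
    for _ in 0..9 {
        b.mul_small(10);
    }
    println!("{}", b.to_u64());
}
```

Subtraction, comparison, and shifts follow the same digit-by-digit pattern; long division is the genuinely fiddly operation, which is why it is called out separately above.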
Just to note, I think that …

Which is to say, the behavior of printing some value that's not strictly accurate but evaluates to the same value when parsed as an IEEE floating-point value is only appropriate when precision has not been specified. When output precision has been specified, it should always print based on the precise floating-point representation.
pvginkel referenced this issue Jun 21, 2015 (Closed): Implement custom floating point parsing and printing functions #42
Famous last words, heh. That said, I also did a lot of other cleanup, refactoring and testing, confirmed a couple of bugs I'd long suspected, and addressed the underlying issues satisfactorily. For example, up-front input validation is now pretty comprehensive and prevents any way I can think of to cause integer (including bignum) overflow during the conversion. As for actual functionality: the code paths for nice inputs work now and passed a couple of simple tests. I'm not willing to declare it correct yet, but I'm hopeful. This is only for f64 as of now (of course I can parse into f64 and cast that to f32, and indeed I do that, and have plenty of test cases where it fails). Still missing, in order: …

I won't make any predictions, but I feel I've solved all the hard problems and the rest is polishing and making sure I haven't overlooked any hard problems.
@rkruppe can you share a pointer to your code? Maybe I can assist with some of the steps, like integration into libcore, or a libcore-based implementation of AlgorithmM/…
My code is now at https://github.com/rkruppe/rust-dec2flt (virtually no git history, though). The modifications to @lifthrasiir's bignum code are in my rust fork, in the …

@pnkfelix Thanks for the offer, but splitting the work between two people seems tough. None of the steps are particularly hard in isolation, and most depend on earlier steps. For example, half of the testing can't be done before f32 is properly supported, and integrating into libcore before the code is 99% done would just slow down the edit-compile-test cycle significantly. That said, additional test cases are always good, and the optimizations should be independent from everything else. If either interests you, take a look at what's there and submit a PR. Since you'll be the ideal reviewer for the PR, even reading through the code will be valuable. And if you notice a bug, a possible optimization, a way to simplify the code, etc.: great!
Quick FYI, AlgorithmM is implemented and everything works with …
rkruppe commented Apr 18, 2015
This example program produces a frighteningly large number of values that, without even leaving the safe haven of [0.0; 1.0), change when printed out and read back in.

Not all numbers fail to round trip, but in all my trials a majority did. The error is on the order of 1e-16 (much more, around 1e-6, for the default `{}` formatting) for all outputs I looked at, but that is more than IEEE 754 permits (it requires correct round trips when 17 decimal digits are printed) and certainly more than users should be forced to endure. Perfect round tripping is possible, useful, and important --- it's just a pain to implement.

I have recently worked my way through the float formatting code, and at the very least it looks pretty naive (as in, code and comments read as if the authors didn't care at all about rounding error, or were entirely unaware that it's a thing). I also just skimmed over the `FromStr` implementation; it looks a bit smarter, but since there is no reference to any paper and it doesn't use bignums, I have doubts about its accuracy as well.

Many numbers reach a fixed point after one round trip, but there are also numbers which take hundreds, thousands, or even more round trips before they settle down --- if they settle down at all (some cycle). The largest I found was almost nine million round trips, which is almost absurd. For obvious reasons, the error changing (presumably, growing) over time is even worse than a one-time inaccurate rounding. Here's the program I used to find those.
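The linked programs aren't reproduced above; a minimal reconstruction of the round-trip experiment (my sketch, not rkruppe's original code) looks like this:

```rust
// Round-trip experiment: print each value with the default `{}` formatting,
// parse it back, and count values whose bit pattern changed.
fn count_failures() -> u32 {
    let mut failures = 0;
    for i in 0..9973u32 {
        let x = i as f64 / 9973.0; // deterministic values in [0.0, 1.0)
        let s = format!("{}", x); // default formatting
        let y: f64 = s.parse().unwrap(); // read it back in
        if x.to_bits() != y.to_bits() {
            failures += 1;
        }
    }
    failures
}

fn main() {
    // On current Rust, `{}` prints the shortest string that parses back to
    // the same value, so this reports 0 failures; on the 2015 code it did not.
    println!("{} values failed to round trip", count_failures());
}
```

The same harness can be iterated (feed the parsed value back into formatting) to reproduce the fixed-point and cycling behavior described above.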
For the formatting side of things, I also filed #24556
I know less about the parsing side of things. The topic seems to get less attention in general, but I found a paper by William Clinger which seems promising.
cc @rprichard