New floating-to-decimal formatting routine #24612

Merged
merged 9 commits into rust-lang:master from lifthrasiir:flt2dec May 9, 2015

Conversation

@lifthrasiir
Contributor

lifthrasiir commented Apr 19, 2015

This is a direct port of my prior work on the float formatting. The detailed description is available here. In brief,

  • This adds a new hidden module core::num::flt2dec for testing from libcoretest. Why is it in core::num instead of core::fmt? Because I envision that the table used by flt2dec is directly applicable to dec2flt (cf. #24557) as well, which exceeds the realm of "formatting".
  • This contains both the Dragon4 algorithm (exact and complete but slow) and the Grisu3 algorithm (exact and fast but incomplete).
  • The code is accompanied by a large number of self-tests and some exhaustive tests. In particular, libcoretest gets a new dependency on librand. For the external interface it relies on the existing test suite.
  • It is known that, in the best case, the entire formatting code has about 30 KB of binary overhead (judged from strconv experiments). Not too bad, but there might be room for improvement.

This is a rather large amount of code. I did my best to comment and annotate it, but you have been warned.

For maximal availability the original code was licensed under CC0, but I've also dual-licensed it under MIT/Apache, so there should be no licensing concerns.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of inf and nan). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

Known Issues

  • I've yet to finish make check-stage1. It does pass main test suites including run-pass but there might be some unknown edges on the doctests.
  • Figure out how this PR affects rustc.
  • Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently to_shortest_str is unused).
@rust-highfive

Collaborator

rust-highfive commented Apr 19, 2015

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@pnkfelix

Member

pnkfelix commented Apr 19, 2015

@rust-highfive rust-highfive assigned pnkfelix and unassigned aturon Apr 19, 2015

@lifthrasiir

Contributor

lifthrasiir commented Apr 20, 2015

I've successfully finished make check-stage1. Also, here is a basic size benchmark done with fn main() { println!("hello, pi is {}", std::f64::consts::PI) }:

| Options | 2015-04-19 Nightly | This PR | Delta |
|---|---|---|---|
| `-C link-args=-s` | 318,704 | 339,160 | +20,456 |
| `-C link-args=-s -O` | 318,704 | 339,160 | +20,456 |
| `-C link-args=-s -O -C lto` | 294,128 | 314,584 | +20,456 |

(-C link-args=-s added because the binary from the current stage1 rustc seems to lack some debug sections.)

It seems that the original float formatting code had 10KB of overhead, so the best case overhead is somewhat amortized.

@pnkfelix

Member

pnkfelix commented Apr 20, 2015

Can you summarize here the effect of "it changes the float output slightly" (beyond "affects the casing of inf and nan")?

In particular: There is some discussion in other tickets about things like:

  • whether or not to include the minus sign on -0.0,
  • or likewise, whether to include the .0 in +/-0.0.

I suspect that those sorts of changes are in some ways more breaking (or at least, more likely to catch developers unawares) than merely fixing our bugs with round-off error on the actual non-zero digits that we emit. And so it would be useful to know up front here whether this PR changes any of those behaviors, or if the changes to the float output are restricted to which (and how many) digits it chooses to emit for non-zero values.

@lifthrasiir

Contributor

lifthrasiir commented Apr 20, 2015

@pnkfelix It is a bit hard to say, precisely because the old code was not well-behaved. I can however surely say that this does not change the intention (say, it doesn't make {:e} print decimals only). More precisely, I think the exact changes are as follows:

  • When the precision is not given, all specifiers default to the shortest representation.
  • When the precision is given, all specifiers round precisely based on the exact (not shortest) decimal expansion. Unit tests have some examples; notably, the exact value of 0.95f64 is slightly smaller than 0.95, so format!("{:.1}", 0.95f64) will give "0.9" instead of "1.0".
  • Given enough precision (more than 800 fractional digits), it will (correctly) print trailing zeroes.
  • NaN is unsigned, i.e. format!("{:+?}", 0.0/0.0) is simply "NaN". (Previously it gave "+NaN".)

Edit 2015-04-22: I've changed the implementation so that it preserves the original behavior as much as possible. The changes specific to the original code (but subsequently reverted in this PR) are as follows:

  • For {} and {:?}, large (>=10^16) or small (<10^-4) values will be rendered in exponential notation.
  • Every letter inside the formatted number (e, inf, nan) is lowercased, except for {:E} where it is uppercased instead. (Previously inf was always lowercased and NaN was always mixed-case.)

Regarding the existing discussions, with the exception of the sign of nan, it behaves identically to before. It is easy to switch the behavior (e.g. to get a trailing .0, use frac_digits of 1), so I'm not too worried.
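The rounding changes described above can be demonstrated with a short program (the 0.95 case is taken from the list above; the 0.1 line shows the shortest-representation default):

```rust
// The stored value of 0.95f64 is slightly below 0.95
// (0.9499999999999999555...), so exact rounding to one fractional digit
// goes down, not up.
fn main() {
    assert_eq!(format!("{}", 0.1f64), "0.1");      // shortest representation
    assert_eq!(format!("{:.1}", 0.95f64), "0.9");  // exact, not shortest, rounding
    println!("ok");
}
```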

limitation `limit` (which determines the actual `n`), and we would like to get
the representation `V = 0.d[0..n-1] * 10^k` such that:
- `d[0]` is non-zero, unless `n` was zero in which case only `k` is returned.

@pnkfelix

pnkfelix Apr 20, 2015

Member

I don't understand how the returned value for k is meaningful in the case where n is 0 ... is this comment implying that in such a scenario, the returned d[0] would have been non-zero if n had been positive?

@lifthrasiir

lifthrasiir Apr 20, 2015

Contributor

No. For example, v = 3.14 and limit = 3 gives n = 0 and k = 3. The extrapolated digits would start with 00314....

@lifthrasiir

lifthrasiir Apr 20, 2015

Contributor

Ah, re-reading your comment I understand what you really meant; yes, k <= limit holds if and only if n = 0. The exact value of k is thus unused in that case, but the code has a debugging assertion to check this identity.

There actually was one wrong comment (derived from the older faulty implementation) on to_exact_fixed_str implying that n cannot be zero (oops). I'll fix that soon.

@pnkfelix

pnkfelix Apr 21, 2015

Member

No, I was and still am confused. In the comment above, n is the number of digits that we need to extract from the array d in the representation V = 0.d[0..n-1] * 10^k. My comments were based on that equation; it led me to ask the question: when can n be zero?

At the time when I wrote that question, I had not yet read the paragraph that starts with

When limit is given but not n, ...

which seems to describe scenarios that yield non-positive values for n. But this mostly just confused me further, probably because I do not yet actually understand the conversion that you have named "fixed mode conversion".

The text in that paragraph makes it sound like the equation V = 0.d[0..n-1] * 10^k simply cannot be well-defined ... maybe that is the source of my confusion.

I will try reading the whole text here and see if I can suggest a way to rephrase it to be clearer.

@pnkfelix

pnkfelix Apr 21, 2015

Member

(BTW I do recognize that the equation for V makes sense for n = 0 when V = 0 itself. But you yourself gave the example of v = 3.14; so clearly I am still not understanding something here.)

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Probably I've introduced the concepts backwards.

  • The "exact mode conversion" is the mode that the caller demands the specified number of significant digits.
  • The "fixed mode conversion" is the mode that the caller demands the specified number of fractional digits.

The fixed mode can be implemented via the exact mode and the estimator, which calculates the required number of significant digits from the number of fractional digits and the original value. It turns out that the internal digit generation code already has to estimate the exponent (which is almost the same as the difference between them), so we can reuse that logic.

In order to achieve this merger, the combined exact-fixed mode is implemented so that both the number of significant digits len and the number of fractional digits frac_digits are given. (The actual interface differs from this to be more flexible: len == buf.len() and frac_digits == -limit. This doesn't change the discussion, however.) The digit generation code finishes with whichever comes first, so it is possible that no digits are generated at all. The n as I originally described it is really determined from either len or frac_digits, which might have been confusing.

By the way, I realized that my original example doesn't arise with the current external interface. So I'll give a better example: v = 0.000045, len = 3, frac_digits = 3 (so that limit = -3). Since v = 0.45 * 10^-4, k = -4, but this clearly triggers the condition k <= limit. This means that the resulting representation is all zeroes, i.e. V = 0.000 (note that k is unused here).

(A better description for this is always welcome. 😭)
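The arithmetic in this comment can be sketched as a small helper (a hypothetical function, not the actual flt2dec interface): generation stops at whichever of `len` significant digits or the position `limit` comes first.

```rust
// Hypothetical sketch of how the combined exact-fixed mode determines the
// number of generated digits n, given the estimated exponent k, the buffer
// length `len`, and the position limit `limit`.
fn digits_generated(k: i32, len: usize, limit: i32) -> usize {
    if k <= limit {
        0 // everything rounds to zeroes at this limit; k is left unused
    } else {
        ((k - limit) as usize).min(len)
    }
}

fn main() {
    // v = 0.000045 = 0.45 * 10^-4: k = -4, len = 3, limit = -3 gives n = 0.
    assert_eq!(digits_generated(-4, 3, -3), 0);
    // v = 0.123 * 10^-8: k = -8 with len = 3 (or limit = -11) gives n = 3.
    assert_eq!(digits_generated(-8, 3, -11), 3);
}
```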

@pnkfelix

pnkfelix Apr 21, 2015

Member

Oh wait! Are you actually saying:

we would like to get the representation V = 0.d[0..n-1] * 10^k such that d[0] is non-zero, unless n was zero in which case k digits are returned.

That I could understand, I think!

I will try re-reading this text, but substituting e.g. "we will only get k digits" in the spots where you wrote "we will only get k", and see if it all becomes clear to me then.

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Uh, no. k is just the exponent, which does not depend on the determined n (the number of digits actually returned). Say that, for a certain v, the algorithm returned 0.123 * 10^-8 (which implies n = 3, and also either len = 3 or limit = -11). Then for some other values of limit, it may return 0. * 10^-8 (with n = 0 and k = -8). The 10^-8 part still remains here, therefore "we will only get k".

@pnkfelix

pnkfelix Apr 21, 2015

Member

Oh sorry, somehow I didn't hit "reload" before I posted my last comment, so it may seem like I completely ignored you in my last comment.

I am re-reading the Steele+White Dragon paper now; I think your terminology matches theirs, so I just need to digest section 8 of their paper and figure out the right way to succinctly describe it.

Though to be honest the comment you wrote that points out the distinction between requesting a number of significant digits versus a number of fractional digits has helped me a lot.

Okay, back to reading, thanks.

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

You are welcome. Actually, I guess my comment above can directly go to the module documentation...

They try to fill the `u8` buffer with digits and returns the number of digits
written and the exponent `k`. They are total for all finite `f32` and `f64`
inputs (Grisu internally falls back to Dragon if possible).

@pnkfelix

pnkfelix Apr 20, 2015

Member

I assume you meant to write "falls back to Dragon if necessary", yes?

@lifthrasiir

lifthrasiir Apr 20, 2015

Contributor

Ah, yes.

They all return a slice of preallocated `Part` array, which corresponds to
the individual part of strings: a fixed string, a part of rendered digits,
a number of zeroes or a small (`u16`) number. The caller is expected to
provide an enough buffer and `Part` array, and to assemble the final

@pnkfelix

pnkfelix Apr 20, 2015

Member

I think you left out a word here; perhaps you intended "The caller is expected to provide a large enough buffer and Part array, parts, and to assemble the final string from parts itself."

@lifthrasiir

lifthrasiir Apr 20, 2015

Contributor

You are right.

@rprichard

Contributor

rprichard commented Apr 21, 2015

The Display behavior prior to this change always used a decimal format (like %f). With this PR, it uses either exponential or decimal notation depending on magnitude (like %g). I think that's the right default, which #24556 argues for, but it's also a rather large breaking change that I didn't see mentioned above. Display can now format a value as 1e300, which breaks the JSON serializer:

JSON:

            let s = v.to_string();
            if s.contains(".") {s} else {s + ".0"}

1e300.0 is not valid JSON.

The CSV serializer works because Display in this PR reverts to decimal when a precision is specified:

CSV:

    let s: String = format!("{:.10}", v).trim_right_matches('0').into();
    if s.ends_with('.') { s + "0" } else { s }
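The quoted JSON pattern can be written as a self-contained sketch (the helper name here is hypothetical, not the actual serializer code) to show why it is fragile: it assumes Display output always contains a '.' when a fractional marker is needed, so an exponent form like "1e300" would slip through the contains('.') check and come out as the invalid "1e300.0".

```rust
// Sketch of the JSON number-serialization pattern quoted above.
fn json_number(v: f64) -> String {
    let s = v.to_string();
    if s.contains('.') { s } else { s + ".0" }
}

fn main() {
    assert_eq!(json_number(1.5), "1.5");
    assert_eq!(json_number(2.0), "2.0"); // "2" has no '.', so ".0" is appended
}
```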
@lifthrasiir

Contributor

lifthrasiir commented Apr 21, 2015

@rprichard Aargh, I forgot to mention that! I have updated the earlier comment to include that breaking change. Thank you so much.

(As mentioned above, though, this too is very easy to change. Once a decision is reached I'll rebase this commit. I explicitly mentioned this kind of decision in the "known issues" to gather input on the future direction, as there seem to be ongoing discussions about changing the defaults.)

@pnkfelix

Member

pnkfelix commented Apr 21, 2015

@lifthrasiir hmm, it will probably be easier to land the most important parts of this PR (namely, the bug fixes) if you remove the big breaking change(s) like the one that @rprichard mentioned.

(Hopefully we could figure out a way for a user to opt-in to getting the %g-like behavior, either by adding a new format trait, or by changing the return type of the precision method and adding a new variant to its rule for precision in the grammar.)

But, I'll make sure we talk about this at the team meeting tonight. Maybe we can go ahead and change this without going through an RFC.

@lifthrasiir

Contributor

lifthrasiir commented Apr 21, 2015

@pnkfelix Yeah, that was my oversight, I forgot to replicate the old behavior in some cases. I'll rebase this tonight (i.e. within 6 hours) to avoid any breaking changes except for fixing the rounding and inaccurate result.

@rprichard

Contributor

rprichard commented Apr 21, 2015

FWIW, the %g-like format is the default in many languages -- probably because it's non-lossy, concise, and convenient for values near zero in magnitude. It is awkward that it can take either of two different forms, though, and perhaps it's too late. I think the precision field for %g typically controls the total number of digits (before and after the decimal point).

self.base[i] = 0;
}
// shift by `nbits` bits

@pnkfelix

pnkfelix Apr 21, 2015

Member

nit: nbits must be referring to bits here, right?

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Yes.

retsz
}
let mut ret = [0; $n];

@pnkfelix

pnkfelix Apr 21, 2015

Member

out of curiosity, would it be potentially faster here to instead copy self.base into the stack-allocated temp, and write the output of mul_inner into self.base directly?

(it seems like either way one has a copy of an array of length $n, so maybe it makes no difference... just curious)

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Good observation. I will check if this would make a difference.

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

There was no significant difference (actually, that was very slightly slower).

if mant == minnorm.0 && exp == minnorm.1 {
// (maxmant, exp - 1) -- (minnormmant, exp) -- (minnormmant + 1, exp)
// where maxmant = minnormmant * 2 - 1
FullDecoded::Finite(Decoded { mant: mant << 1, minus: 1, plus: 2,

@pnkfelix

pnkfelix Apr 21, 2015

Member

I'm having trouble connecting up the above comment with this code here.

IIUC, your notation is lower_bound_for_rounding -- value -- upper_bound_for_rounding

in the above, we are in the branch where mant == minnorm, so I am thinking that the uses of mant in the code match up with the uses of minnormmant in the comment.

And then you define maxmant to a value equivalent to (mant << 1) - 1.

But then you use maxmant in only the lower_bound_for_rounding in the comment, while from the code, it seems like the mant << 1 value is relevant to the other two parts too: value and upper_bound_for_rounding... I guess that's where I'm confused; why aren't value and upper_bound_for_rounding defined to be minnormmant * 2 and minnormmant * 2 + 1?

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Uh, this actually is a bug: the range was twice as large. This should read mant: mant << 2, minus: 1, plus: 2, exp: exp - 2 (so that the bounds are (mant - 1/4) * 2^exp through (mant + 1/2) * 2^exp). I don't know how this escaped my attention, but I'll add a test case for this.

The comment is correct by the way; the notation reads like (value - 1 ulp) -- value -- (value + 1 ulp).

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

N.B. The reason this escaped my attention is that the exhaustive check assumes the decoder is correct, and there is no separate decoder test. Fortunately for us, there are only two values which can trigger this situation---std::{f32, f64}::MIN_POSITIVE---and they are explicitly tested. Actually, I was a bit confused here too: the problem may occur for every 2^n value. I'm reevaluating the fix now.

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Okay, I forgot to see that this affects many 2^n values. I've added a new commit which has both the fix and regression tests.

pub fn estimate_scaling_factor(mant: u64, exp: i16) -> i16 {
// 2^(nbits-1) < mant <= 2^nbits if mant > 0
let nbits = 64 - (mant - 1).leading_zeros() as i64;
(((nbits + exp as i64) * 1292913986) >> 32) as i16

@pnkfelix

pnkfelix Apr 21, 2015

Member

Where does this formula come from? Especially the magic constant 1292913986? I know there's lot of doc in other modules, but it probably would be good to put a reference here.

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

That is floor(2^32 * log_10 2). Thus it may underestimate the true k, but the approximated k_0 cannot exceed it. (I agree that this needs a comment.)
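For reference, here is the function from the diff with the constant's origin spelled out, plus a check that the constant really is floor(2^32 * log_10 2):

```rust
// 1292913986 = floor(2^32 * log10(2)), so (x * 1292913986) >> 32 computes
// approximately floor(x * log10(2)), possibly underestimating by 1. With
// 2^(nbits-1) < mant <= 2^nbits, nbits + exp approximates log2(v), so the
// result approximates k ~ ceil(log10(v)) from below.
fn estimate_scaling_factor(mant: u64, exp: i16) -> i16 {
    let nbits = 64 - (mant - 1).leading_zeros() as i64;
    (((nbits + exp as i64) * 1292913986) >> 32) as i16
}

fn main() {
    // The magic constant itself (2^32 is a power of two, so this product
    // is computed exactly).
    assert_eq!((4294967296f64 * 2f64.log10()).floor() as i64, 1292913986);
    // v = 2^60 ~ 1.15e18: the true k is 19, the estimate is 18, i.e. off by
    // one on the safe, underestimating side.
    assert_eq!(estimate_scaling_factor(1u64 << 60, 0), 18);
}
```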

@lifthrasiir lifthrasiir force-pushed the lifthrasiir:flt2dec branch 3 times, most recently from 97c76cb to dccacbc Apr 21, 2015

/// The integer mantissa.
pub f: u64,
/// The exponent in base 2.
pub e: i16,

@pnkfelix

pnkfelix Apr 21, 2015

Member

I haven't read the Grisu paper carefully, but in its discussion of the diy_fp struct, it points out

  • (1.) the paper's definition uses an unlimited range value for e_x to simplify the proofs,
  • (2.) in practice the exponent type must only have "slightly greater range than the input exponent" and
  • (3.) for IEEE doubles which have 11-bit exponents, a 32-bit signed integer is by far big enough.

The combination of 2. and 3. leads me to wonder whether we could get into trouble using only i16 here, which only has five extra bits. I know you exhaustively tested your algorithm for f32, but how confident are you that i16 suffices for f64 ?

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

That is more than enough. Grisu does have a stricter limitation on the mantissa size (Grisu1 needs 2 extra bits; Grisu2 and Grisu3 need 3), but they are very generous about the exponent range.

diy_fp is converted from the decoded float, then normalized, then multiplied by the cached power of 10 to bring it within a certain interval (to be exact, [4, 2^32)). Since the normalization step can decrease the exponent by at most 63, and the original float has a minimal exponent of -1075, the exponent is no less than -1138. By the same argument, the exponent is no more than 971.

(In fact, this exponent type would be enough for f80 if Rust had support for it. The problem with f80, however, is that its mantissa is too large and u64 would not be sufficient.)
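The bound arithmetic in this comment can be restated as a quick check (using the numbers given above):

```rust
// Sanity check of the exponent bounds argued above: the decoded f64 exponent
// lies in [-1075, 971], and normalization can subtract at most 63 more, so
// every intermediate exponent fits comfortably in an i16.
fn main() {
    let min_exp: i64 = -1075 - 63; // = -1138, after worst-case normalization
    let max_exp: i64 = 971;
    assert!(min_exp >= i16::MIN as i64 && max_exp <= i16::MAX as i64);
    println!("exponent range [{}, {}] fits in i16", min_exp, max_exp);
}
```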

#[test]
#[should_panic]
fn test_add_overflow_1() {
Big::from_small(1).add(&Big::from_u64(0xffffff));

@pnkfelix

pnkfelix Apr 21, 2015

Member

these tests are marked as #[should_panic], but I cannot tell from reviewing the code where overflow would unconditionally originate.

Does the panic come from an index-out-of-bounds access, e.g. here? Or does it come from an arithmetic-overflow, which means that one cannot rely on the #[should_panic] here actually occurring in general, not without -C debug-assertions turned on?

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Your guess is correct: for add and mul_* the panic should come from the out-of-bounds access. For sub it should come from assert!(noborrow).

# Implementation overview
It is easy to get the floating point printing correct but slow (Russ Cox has
[demonstrated](http://research.swtch.com/ftoa) how it's easy), or incorrect but

@pnkfelix

pnkfelix Apr 21, 2015

Member

Should we actually add an implementation of Russ Cox's code for purposes of comparative validation of the Dragon implementation, in the same way that you have validated your Grisu3 implementation against your Dragon4 implementation?

@lifthrasiir

lifthrasiir Apr 21, 2015

Contributor

Russ Cox's code only deals with the exact mode (quite reasonably), so it is not directly comparable.

Edit: I also believe that the current dragon::format_exact function is, with the exception of the last-digit limitation and the use of bignum, simple enough to understand that we don't need Russ Cox's code.

@pnkfelix

Member

pnkfelix commented Apr 21, 2015

(By the way this is all looking pretty fantastic so far. I just want to really make sure I or someone else besides you has a decent understanding of all of it, so please do bear with me while I continue to work my way through it.)

@lifthrasiir lifthrasiir force-pushed the lifthrasiir:flt2dec branch from 61780c7 to 3d34e17 May 6, 2015

@lifthrasiir

Contributor

lifthrasiir commented May 6, 2015

I've rebased the commit to account for another typo and banker's rounding in the exact mode. r? @pnkfelix

@pnkfelix

pnkfelix commented on 3d34e17 May 6, 2015

I'm a little surprised that there seem to be no tests exercising this change ... e.g. I would have expected to see some of this diff showing cases where the earlier version would have rounded up to an odd value and now we round down to an even value.

pnkfelix replied May 6, 2015

Or wait, was this: assert_eq!(to_string(f, 0.5, Minus, 0, false), "1"); such a case?

((... deleted confused parentheticals ...))

Update: In any case, since rounding-down a negative number looks a lot like round-away-from-zero for a negative number, I would suggest having at least one test that shows round-to-even on a non-negative value.

(I really could have sworn this whole issue came up because a different example elsewhere was illustrating the old implementation resolving ties by rounding up; what happened to that test? A shame that the old comments are gone.)

Owner

lifthrasiir replied May 6, 2015

Hmm, I thought this testing function does exactly that (it essentially acts as a test generator). There are some existing test cases triggering this code: for example, the minf64 test in f64_exact_sanity_test will trigger this case when rounded at the second-to-last digit.

For your information, the tests for 0.5 were added for another purpose, i.e. testing the new rounding behavior at the edge case (k = limit).

@pnkfelix

Member

pnkfelix commented May 9, 2015

@lifthrasiir okay, thanks for your patience with this.

I'm going to r+ this. I cannot claim that I understand the algorithms 100%, but I have reviewed all the papers and satisfied myself that this implementation is a good match for them, and that the goals of the papers are a match for what we want in Rust.

@pnkfelix

Member

pnkfelix commented May 9, 2015

@bors r+ 3d34e17 p=1

@tshepang

Contributor

tshepang commented May 9, 2015

epic!

bors added a commit that referenced this pull request May 9, 2015

Auto merge of #24612 - lifthrasiir:flt2dec, r=pnkfelix
@bors

Contributor

bors commented May 9, 2015

⌛️ Testing commit 3d34e17 with merge d8b3a6a...

@bors

Contributor

bors commented May 9, 2015

💔 Test failed - auto-mac-64-opt

@lifthrasiir

Contributor

lifthrasiir commented May 9, 2015

Whoops, I forgot to run make check-stage1 (not just make check-stage1-coretest). This failure has nothing to do with the implementation, only the tests; I'll fix it shortly.

@lifthrasiir

Contributor

lifthrasiir commented May 9, 2015

I've confirmed this new commit passes make check-stage1. r? @pnkfelix

@pnkfelix

Member

pnkfelix commented May 9, 2015

@bors r+ 1aecd17 p=1

@bors

Contributor

bors commented May 9, 2015

⌛️ Testing commit 1aecd17 with merge 67ba6dc...

bors added a commit that referenced this pull request May 9, 2015

Auto merge of #24612 - lifthrasiir:flt2dec, r=pnkfelix

@bors bors merged commit 1aecd17 into rust-lang:master May 9, 2015

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful
@cmr

Member

cmr commented May 10, 2015

Awesome! Epic review @pnkfelix, and thanks to @lifthrasiir for implementing these tricky algorithms.

@Kimundi

Member

Kimundi commented May 20, 2015

Cool! Finally a proper implementation for this stuff :)

@huonw huonw referenced this pull request Jan 5, 2016

Closed

Number to/from string API #6220

zoffixznet added a commit to MoarVM/MoarVM that referenced this pull request Mar 24, 2018

Stringify Num using Grisu3 algo
- Makes Num stringification 2x faster (tested with rand.Str)
- Fixes RT#127184 https://rt.perl.org/Ticket/Display.html?id=127184
- Fixes RT#124796 https://rt.perl.org/Ticket/Display.html?id=124796
- Fixes RT#132330 https://rt.perl.org/Ticket/Display.html?id=132330
  (fixes Num.WHICH and problems with set()s mentioned in that ticket)

Grisu3[^1] is a 2010 algorithm that stringifies doubles to the shortest
possible representation that still results in the same value when
it's parsed and stringified again. This sort of stringification is
what fixes the reported bugs. As a bonus, the algo is faster than the
snprintf('%.15g') we used to use for stringification.

Grisu3 handles ~99.5% of cases. In the remaining 0.5%, it knows it won't
produce the shortest possible result, so it falls back to a slower
algorithm, which in our current case is snprintf('%.17g'). That is a
suboptimal fallback; in the future the Dragon4 algo[^2] could be used
instead.
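The try-fast-then-fall-back structure described above can be sketched as follows. All names here (`grisu3`, `dragon4`, `format_shortest`) are hypothetical stand-ins for this sketch, not the actual MoarVM or Rust functions, and the slow path is approximated with a fixed 17-significant-digit format:

```rust
// Sketch of the Grisu3-with-fallback pattern (hypothetical names).
// The fast path reports failure rather than emit a possibly
// non-shortest result; the caller then uses a slower but always
// round-trip-safe formatter (Dragon4 in Rust, %.17g in MoarVM).
fn grisu3(x: f64) -> Option<String> {
    // Placeholder: pretend the ~0.5% failure case was hit.
    let _ = x;
    None
}

fn dragon4(x: f64) -> String {
    // Stand-in for the exact slow path: 17 significant digits
    // always round-trip an IEEE double.
    format!("{:.16e}", x)
}

fn format_shortest(x: f64) -> String {
    grisu3(x).unwrap_or_else(|| dragon4(x))
}

fn main() {
    let x = 0.1_f64 + 0.2_f64;
    let s = format_shortest(x);
    // Whichever path produced the string, it must parse back exactly.
    assert_eq!(s.parse::<f64>().unwrap(), x);
    println!("{}", s);
}
```

The key design point is that the fast path is allowed to fail: correctness is guaranteed by the fallback, and Grisu3 only needs to be right when it claims success.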

The change from .15g to .17g in the fallback is intentional: 15 is the
number of guaranteed significant digits, but 17 significant digits are
needed to uniquely distinguish IEEE doubles that differ by only one
unit in the last place[^7].
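The guaranteed-vs-maximum digit counts can be checked directly. This small sketch (in Rust, since that's where this PR lives) formats a double with 15 and 17 significant digits and parses each string back:

```rust
fn main() {
    // 0.1 + 0.2 is the classic case: the sum is not exactly 0.3.
    let x: f64 = 0.1 + 0.2;

    // 15 significant digits (the old %.15g): one digit before the
    // decimal point in e-notation plus 14 after it.
    let s15 = format!("{:.14e}", x);
    let y15: f64 = s15.parse().unwrap();

    // 17 significant digits (the %.17g fallback).
    let s17 = format!("{:.16e}", x);
    let y17: f64 = s17.parse().unwrap();

    assert_ne!(y15, x); // 15 digits collapse this value to 0.3
    assert_eq!(y17, x); // 17 digits always round-trip an IEEE double
    println!("15 digits: {}  17 digits: {}", s15, s17);
}
```

With 15 digits the string reads back as plain 0.3 (a different double), while the 17-digit string recovers the original bit pattern exactly.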

Based on my research, an improved fallback would be the Dragon4
algo[^2], and Grisu3+Dragon4 is the current state of the art. (Some
references state that the Errol[^8] algo superseded the Grisu3+Dragon4
combo, but per the authors' correction[^9], there was a benchmarking
error and Errol is not in fact faster.)

Grisu3+Dragon4 is used[^5] by Rust (see also some trial impls
in [^6]), and the Grisu3 author's C++ version of the code[^3] indicates
it's used by the V8 JavaScript engine as well. A C# impl[^4] of the
algo also exists, used in an efficient JSON encoder.

[1] "Printing Floating-Point Numbers Quickly and
    Accurately with Integers" by Florian Loitsch https://goo.gl/cbvogg
[2] "How to Print Floating-Point Numbers Accurately" by Steele & White:
    https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf
[3] https://github.com/google/double-conversion
[4] https://github.com/kring/grisu.net
[5] rust-lang/rust#24612
[6] https://github.com/lifthrasiir/rust-strconv
[7] http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1822.pdf
[8] https://github.com/marcandrysco/Errol
[9] https://github.com/marcandrysco/Errol#performance-evaluation-correction

zoffixznet added a commit to MoarVM/MoarVM that referenced this pull request Mar 24, 2018

Stringify Num using Grisu3 algo
(commit message duplicates the one above)