Decimal to floating point conversion #27307
Conversation
rust-highfive assigned pnkfelix on Jul 26, 2015
CC @lifthrasiir
This seems like something which could be handled trivially: if the exponent overflows, either the correct result is inf or 0, or the string has a stretch of zeros way too long to fit into memory.
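The shortcut described here can be sketched as follows. This is a minimal illustration, not the PR's code; the function name and the ±400 cutoff are assumptions, chosen with generous margin around f64's roughly 1e-324 to 1.8e308 range, and `exp` is assumed to be the effective exponent of the first significant digit:

```rust
/// Hypothetical pre-check: when the effective decimal exponent is far
/// outside f64's range, the digits cannot change the outcome, so the
/// result is already decided as infinity or (signed) zero.
fn clamp_extreme_exponent(exp: i64, negative: bool) -> Option<f64> {
    if exp > 400 {
        Some(if negative { f64::NEG_INFINITY } else { f64::INFINITY })
    } else if exp < -400 {
        Some(if negative { -0.0 } else { 0.0 })
    } else {
        None // in range: run the full conversion algorithm
    }
}
```

With such a check in front, the bignum machinery never sees an exponent it cannot represent.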
That's a good point. Exponent and number of decimal digits used to be a smaller integer type and I didn't stop to reconsider when I switched to 64 bit integers. However, exploiting this insight requires restructuring some code: the exponent is parsed in the parsing module, which isn't expected to produce any floats. I'll sleep on the least hacky way to integrate this. A related aspect is that the integral and fractional parts really can't be larger than 1.8 exabytes, which is the limit now. I'll remove those guards when I do the other change.
Umm, 1.7976931348623157e-324? Are you sure you didn't mean some other number?
It would probably be a good idea to deprecate f32/f64::from_str_radix. (We don't want to keep them around given that they don't actually work correctly.)
eefriedman reviewed on Jul 27, 2015
// Find the smallest floating point number strictly larger than the argument.
// This operation is saturating, i.e. next_float(inf) == inf.
// Unlike most code in this module, this function does handle zero, subnormals, and infinities.
eefriedman (Contributor) commented on Jul 27, 2015:
This should explicitly document that it only handles floats with a positive sign bit.
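A minimal sketch of such a next_float, assuming IEEE 754 `f64` (this illustrates the bit-ordering trick, not the PR's implementation): for floats with a positive sign bit, the integer order of the bit patterns matches the numeric order, so incrementing the bits steps to the next float, walking from zero through the subnormals and saturating at infinity.

```rust
/// Next float above `x`, for floats with a positive sign bit only,
/// saturating at infinity (illustrative sketch, not the PR's code).
fn next_float(x: f64) -> f64 {
    assert!(x.is_sign_positive() && !x.is_nan());
    if x == f64::INFINITY {
        return x; // saturating: next_float(inf) == inf
    }
    // For non-negative IEEE 754 values, adding 1 to the raw bit pattern
    // yields the next representable value.
    f64::from_bits(x.to_bits() + 1)
}
```

Note how next_float(0.0) yields the smallest subnormal, and incrementing the bits of the largest finite float lands exactly on the infinity bit pattern.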
This might be a silly question... but does fast_path() work correctly on Linux x86-32 with SSE disabled? (The Clinger paper calls out a similar scenario in section 9.)
rkruppe force-pushed the rkruppe:dec2flt branch from 8270600 to 216823c on Jul 27, 2015
Gah! The decimal part belongs to another boundary case. The correct number is
Good to hear. I'd like to collect some more support before I go ahead and do it, and I'm not sure if it needs to go into this PR.
Not silly at all! Unfortunately the VM on which I'd normally test this currently has... issues. If anyone could compile a short program doing a float multiplication with a 32 bit rust and
rkruppe force-pushed the rkruppe:dec2flt branch from 216823c to 9de3766 on Jul 27, 2015
Lifted the unnecessary restrictions @eefriedman pointed out. I immediately squashed that into the last commit, I hope that's okay.
eefriedman reviewed on Jul 27, 2015
// in the decimal digits only adjusts the exponent by +/- 1, at exp = 10^18 the input would
// have to be 17 exabyte (!) of zeros to get even remotely close to being finite.
// This is not exactly a use case we need to cater to.
if number.len() >= 18 {
rkruppe (Contributor) commented on Jul 27, 2015:
Indeed, thanks! That would have been sensible even when rejecting large exponents.
rkruppe force-pushed the rkruppe:dec2flt branch from 9de3766 to 5e5b93d on Jul 27, 2015
eefriedman reviewed on Jul 27, 2015
    return Some(T::zero());
}
// This is a crude approximation of ceil(log10(the real value)).
let max_place = e + decimal.integral.len() as i64;
eefriedman (Contributor) commented on Jul 27, 2015:
Might want to explicitly note why the math involving e in this file can't overflow; it's a bit subtle.
rkruppe (Contributor) replied on Jul 27, 2015:
Done (in the long comment at the start of the file, and also in parse.rs).
eefriedman reviewed on Jul 27, 2015
// If we exceed this, perhaps while calculating `f * 10^e` in Algorithm R or Algorithm M,
// we'll crash. So we error out before getting too close, with a generous safety margin.
if max_digits > 375 {
    return Err(PFE { __kind: FloatErrorKind::Invalid });
eefriedman (Contributor) commented on Jul 27, 2015:
You can "round" numbers with too many digits: once you have too many digits, the actual values of the digits cease to matter. Your bigints need to be a bit larger to make that work, but it's the same order of magnitude; glibc uses the formula "1 + ((DBL_MANT_DIG - DBL_MIN_EXP + 2) * 10) / 3" (==3587) for the number of necessary bits.
rkruppe (Contributor) replied on Jul 27, 2015:
Interesting. Do you have a paper or other write-up explaining this in detail? It is not at all obvious to me how this works, or what the corresponding algorithm looks like.
rkruppe (Contributor) replied on Jul 27, 2015:
Thinking about it some more, I find it hard to believe that this is always strictly correct. Consider the exact decimal representation (this always exists and is terminating) of some float with an even mantissa. Add half an ULP. This still rounds to the same float as we started with because the mantissa was even. Now add 2^-10000 or some other sufficiently small power of two: Suddenly we need to round up, and it's not visible without inspecting thousands of additional bits! The tiny power of two does not affect the first couple thousand decimal digits either. Is there some magic trick to correctly evaluate the decimal digits you're not putting into the bigint?
eefriedman (Contributor) replied on Jul 27, 2015:
There's an overview of glibc's algorithm at http://www.exploringbinary.com/how-glibc-strtod-works/ .
In terms of correctness, what I meant by "round" is to treat the existence extra digits like a sticky bit. For example, take the exact decimal representation of half of the smallest denormal double. If you convert that exact number, it gets rounded down to zero. If you append any combination of digits (assuming they aren't all zero), it gets rounded up. The values of the digits don't actually matter.
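The sticky treatment of discarded digits can be sketched like this (an illustration of the idea only, not glibc's or the PR's code; the function name and the `keep` parameter are made up for this example):

```rust
/// Truncate a digit string to `keep` digits, appending one extra digit that
/// records only whether any discarded digit was nonzero. As long as every
/// halfway point fits in `keep` digits, this preserves the ordering of the
/// input relative to those halfway points, so rounding comes out the same.
fn truncate_with_sticky(digits: &str, keep: usize) -> String {
    if digits.len() <= keep {
        return digits.to_string();
    }
    let (head, tail) = digits.split_at(keep);
    let sticky = tail.bytes().any(|b| b != b'0');
    let mut out = head.to_string();
    out.push(if sticky { '1' } else { '0' });
    out
}
```

For example, "12345007" truncated to 5 digits becomes "123451": the discarded "007" collapses to a single nonzero marker digit, exactly like a sticky bit in hardware rounding.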
rkruppe (Contributor) replied on Jul 27, 2015:
That sounds sensible. However, from my experience over the last few weeks, I predict that actually implementing this algorithm (like all the algorithms that I did implement) will be fraught with many small yet tricky issues, making it quite time-consuming to polish and bring up to production quality. Therefore I suggest that this too is filed under future work. The remaining restrictions of the code in this PR should not be a problem for any practical inputs.
arielb1 (Contributor) commented on Jul 27, 2015:
The result of dec2flt depends on the order of the input with respect to the numbers half-between floating points. These are numbers of the form (2^53 + 2u + 1) * 2^-s for some 0 <= u < 2^52 for normals, and (2u+1)*(2^-1075) for denormals. These numbers all become integers when multiplied by 10^1075, and therefore have less than 1075 nonzero digits (to be more precise, after multiplying by 10^s, you are left with numbers smaller than 2^54*5^s, so smaller than 10^[log(2,10)*54 + log(5,10)*1075] < 10^768, so with no more than 768 nonzero digits) - beyond that last digit the only thing that matters is the tie-breaker sticky bit.
For the record, one such number of the maximum possible length is halfway between the largest subnormal and the smallest normal,
2.22507385850720113605740979670913197593481954635164564802342610972482222202107694551652952390813508791414915891303962110687008643869459464552765720740782062174337998814106326732925355228688137214901298112245145188984905722230728525513315575501591439747639798341180199932396254828901710708185069063066665599493827577257201576306269066333264756530000924588831643303777979186961204949739037782970490505108060994073026293712895895000358379996720725430436028407889577179615094551674824347103070260914462157228988025818254518032570701886087211312807951223342628836862232150377566662250398253433597456888442390026549819838548794829220689472168983109969836584681402285424333066033985088644580400103493397042756718644338377048603786162277173854562306587467901408672332763671875E-308
That number is (2^53-1)/2^1075, and has 307 zeros and 768 significant digits.
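arielb1's digit-count bound can be double-checked numerically (a quick sketch; the function name is invented for this example): the halfway numbers, scaled to integers, are below 2^54 * 5^1075, which has ceil(54*log10(2) + 1075*log10(5)) digits.

```rust
/// Upper bound on the number of significant decimal digits of any f64
/// halfway point: the digit count of 2^54 * 5^1075, i.e.
/// ceil(54*log10(2) + 1075*log10(5)).
fn max_halfway_digits() -> u32 {
    (54.0 * 2f64.log10() + 1075.0 * 5f64.log10()).ceil() as u32
}
```

The sum is about 767.65, so the bound comes out to 768, matching the length of the example above.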
pnkfelix (Member) commented on Aug 6, 2015:
I'm fine with leaving this extension for future work (i don't want this PR to be delayed further than it already has been, due to my tardiness in reviewing it).
re deprecation: kill it with fire
So I finally got around to testing whether the fast path is correct on x86 without SSE (thanks, @eefriedman!). Turns out the exact same problem described in the paper (in the context of a Motorola 68881/68882 floating point coprocessor) also applies to more common hardware.
rkruppe force-pushed the rkruppe:dec2flt branch 2 times, most recently from 3078a03 to 2df7da8 on Aug 8, 2015
rkruppe added some commits on Jun 27, 2015
rkruppe force-pushed the rkruppe:dec2flt branch from 2df7da8 to a783ddc on Aug 8, 2015
rkruppe force-pushed the rkruppe:dec2flt branch 2 times, most recently from 6d622f5 to e0515f6 on Aug 9, 2015
rkruppe force-pushed the rkruppe:dec2flt branch 3 times, most recently from c3930d4 to 3a60b04 on Aug 10, 2015
Now that PR #27647 has been merged and the fast path brokenness is at least somewhat addressed (see last commit), I declare this PR officially ready. The resolution is somewhat unsatisfying and maybe I'll find a better way in the future, but that shouldn't stop the very real improvements (even on platforms where the fast path is technically "broken") from landing.
Isn't
eefriedman referenced this pull request on Aug 11, 2015: Use correct target CPU for iOS simulator. #27672 (merged)
added a commit to alexcrichton/rust that referenced this pull request on Aug 12, 2015
added a commit to alexcrichton/rust that referenced this pull request on Aug 12, 2015
rkruppe force-pushed the rkruppe:dec2flt branch from 3a60b04 to 6e1361a on Aug 12, 2015
rkruppe force-pushed the rkruppe:dec2flt branch from 6e1361a to 15518a9 on Aug 12, 2015
Since there are no more in-tree targets that disable SSE, "deal with it" now means "explain that it's theoretically broken and add a test that will expose said brokenness if some unfortunate soul is forced to compile without SSE".
@bors r=pnkfelix
added a commit that referenced this pull request on Aug 13, 2015
bors merged commit 15518a9 into rust-lang:master on Aug 13, 2015
@rkruppe, thank you for this work!
\o/ |
rkruppe referenced this pull request on Oct 5, 2015: Integer parsing should accept leading plus #28826 (merged)
rkruppe referenced this pull request on Feb 4, 2016: Float parsing can fail on valid float literals #31407 (open)
rkruppe referenced this pull request on Feb 3, 2017: Escaping `char` in libcore adds 2k of static data for no_std cases #39492 (open)
lifthrasiir referenced this pull request on Mar 21, 2017: Add support for hexadecimal float literals #1433 (closed)
dtolnay referenced this pull request on Aug 3, 2018: Performance cliff when parsing f64 above 1e305 #53015 (open)
donbright commented on Dec 17, 2018:

Re "absolutely positively completely 100% accurate": I question the basic assumption that base-10 numbers can be converted into base-2 with 100% accuracy. There will be information lost in the transfer; to me it seems that would result in accuracy of some number less than 100%, and that number should be measurable.
rkruppe commented on Jul 26, 2015

Completely rewrite the conversion of decimal strings to `f64` and `f32`. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding.

This code implements the algorithms from the paper How to Read Floating Point Numbers Accurately by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations.
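Half-to-even rounding can be observed directly through `.parse()`. The long literal below is the exact decimal expansion of 1 + 2^-53 (fifteen zeros followed by the 38 digits of 5^53), which lies exactly halfway between 1.0 and the next f64:

```rust
// Exact decimal expansion of 1 + 2^-53, the midpoint of 1.0 and 1.0 + 2^-52.
const HALFWAY: &str = "1.00000000000000011102230246251565404236316680908203125";

/// The tie is broken toward the even significand, i.e. down to 1.0.
fn parse_tie() -> f64 {
    HALFWAY.parse::<f64>().unwrap()
}

/// Appending any nonzero digit breaks the tie upward to the next float.
fn parse_just_above() -> f64 {
    format!("{}1", HALFWAY).parse::<f64>().unwrap()
}
```

The first call returns exactly 1.0 because 1.0's significand ends in a 0 bit, while the second returns 1.0 + 2^-52.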
Correctness

With such a large amount of tricky code, many bugs are to be expected. Indeed, tracking down the obscure causes of various rounding errors accounts for the bulk of the development time. Extensive tests (taking on the order of hours to run through to completion) are included in `src/etc/test-float-parse`: though exhaustively testing all possible inputs is impossible, I've had good success with generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors, so contributors who touch this code need the discipline to run them. There are `#[test]`s, but they don't even cover every stupid mistake I made in the course of writing this.

Another aspect is integer overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) as well as in the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will accidentally cause this code to go out of range. Still, no guarantees.
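One cheap class of self-checking test cases, sketched here as an idea rather than the actual harness in src/etc/test-float-parse: Rust's `{:e}` float formatting produces output that is guaranteed (in today's Rust) to parse back to the identical value, so format-then-parse of any float must round-trip bit-for-bit:

```rust
/// Format-then-parse must reproduce the value exactly; any mismatch would
/// indicate a parsing (or formatting) bug.
fn roundtrips(x: f64) -> bool {
    format!("{:e}", x).parse::<f64>() == Ok(x)
}
```

Feeding this millions of floats drawn from awkward regions (subnormals, powers of two, values near 1e±305) is one way to generate the "classes" of inputs mentioned above.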
Limitations

Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., `digits * 10^abs(exp)` can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is `number_of_digits + abs(exp) > 375`, but it could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely.

This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions that the old code (mostly) handled, such as `0.333...333` with 400+ `3`s. Thus this might qualify as [breaking-change].

Performance
Benchmark results are... tolerable. Short numbers that hit the fast paths (`f64` multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code (tens of nanoseconds). Numbers that are delegated to Algorithm Bellerophon (using floats with 64 bit significand, implemented in software) are slower, but not drastically so (a couple hundred nanoseconds).

Numbers that need the Algorithm M fallback (for `f64`, roughly everything below 1e-305 and above 1e305) take far, far longer: hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four divisions instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow.

All benchmarks were run on a mediocre laptop with an i5-4200U CPU under light load.
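The fast path works because every quantity involved is exactly representable, so a single correctly rounded operation gives the correctly rounded result. A minimal sketch in the spirit of Clinger's fast path (not the PR's exact code; `sig` is assumed to be the decimal significand as an integer and `exp` the decimal exponent):

```rust
/// Clinger-style fast path (illustrative): if the significand fits in 53
/// bits and |exp| <= 22, both operands are exact f64 values, so one
/// correctly rounded multiply or divide yields the nearest float.
fn fast_path(sig: u64, exp: i32) -> Option<f64> {
    if sig >= (1 << 53) || exp < -22 || exp > 22 {
        return None; // fall back to the slow algorithms
    }
    // 10^k is exactly representable in f64 for 0 <= k <= 22
    // (5^22 < 2^53), so every step of this loop is exact.
    let mut pow10 = 1.0f64;
    for _ in 0..exp.unsigned_abs() {
        pow10 *= 10.0;
    }
    let f = sig as f64; // exact: sig < 2^53
    Some(if exp >= 0 { f * pow10 } else { f / pow10 })
}
```

This is also where the no-SSE caveat bites: on the x87 FPU the multiply may be rounded twice (once to 80 bit extended precision, once to f64), which can break the correctly-rounded guarantee the fast path depends on.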
Binary size

Unfortunately the implementation needs to duplicate almost all code: once for `f32` and once for `f64`. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes.

Running a stage1 `rustc` over a stand-alone program that simply parses pi to `f32` and `f64` and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised.

Future Work
Directory layout

The `dec2flt` code uses some types embedded deeply in the `flt2dec` module hierarchy, even though nothing about them is formatting-specific. They should be moved to a more conversion-direction-agnostic location at some point.

Performance
It could be much better, especially for large inputs. Some low-hanging fruit has been picked but much more work could be done. Some specific ideas are jotted down in `FIXME`s all over the code.

Binary size
One could try to compress the table further, though I am skeptical. Another avenue would be reducing the code duplication from basically everything being generic over `T: RawFloat`. Perhaps one can reduce the magnitude of the duplication by pushing the parts that don't need to know the target type into separate functions, but this is finicky and probably makes some code read less naturally.

Other bases
This PR leaves `f{32,64}::from_str_radix` alone. It only replaces `FromStr` (and thus `.parse()`). I am convinced that `from_str_radix` should not exist, and have proposed its deprecation and speedy removal. Whatever the outcome of that discussion, it is independent from, and out of scope for, this PR.

Fixes #24557
Fixes #14353
r? @pnkfelix
cc @lifthrasiir @huonw