strconv: implement the Ryu algorithm to speed up float64->decimal conversion #15672

Open
dvyukov opened this issue May 13, 2016 · 30 comments

Member

@dvyukov dvyukov commented May 13, 2016

go version devel +15f2d0e Fri May 13 01:19:05 2016 +0000 linux/amd64

func BenchmarkAppendFloatLarge1(b *testing.B) { benchmarkAppendFloat(b, 622666234635.321e-320, 'e', -1, 64) }
func BenchmarkAppendFloatLarge2(b *testing.B) { benchmarkAppendFloat(b, 622666234635.3213e-320, 'e', -1, 64) }
func BenchmarkAppendFloatLarge3(b *testing.B) { benchmarkAppendFloat(b, 622666234635.322e-320, 'e', -1, 64) }
BenchmarkAppendFloatLarge1-48        5000000           276 ns/op
BenchmarkAppendFloatLarge2-48           2000         89589 ns/op
BenchmarkAppendFloatLarge3-48         500000           278 ns/op
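The three inputs can also be exercised outside the benchmark harness. The following is a minimal sketch that assumes only the standard `strconv` API; the helper name `shortestRoundTrips` is made up for illustration. It formats each value in shortest mode, checks that parsing the text gives the same float64 back, and prints the elapsed time so the outlier is visible on an affected Go version:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// shortestRoundTrips formats f in shortest 'e' form (precision -1) and
// reports whether parsing that text yields exactly f again.
func shortestRoundTrips(f float64) (string, bool) {
	s := string(strconv.AppendFloat(nil, f, 'e', -1, 64))
	g, err := strconv.ParseFloat(s, 64)
	return s, err == nil && g == f
}

func main() {
	inputs := []float64{
		622666234635.321e-320,
		622666234635.3213e-320, // the slow case reported above
		622666234635.322e-320,
	}
	for _, f := range inputs {
		start := time.Now()
		s, ok := shortestRoundTrips(f)
		fmt.Println(s, ok, time.Since(start))
	}
}
```

A single timed call is noisy; the `testing.B` benchmarks above remain the reliable measurement.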

Member Author

@dvyukov dvyukov commented May 13, 2016

320 times slower WAT?1!1
100 microseconds. Does it contact FFS (Float-Formatting Service)?

Contributor

@griesemer griesemer commented May 13, 2016

This is a corner case of a corner case, extremely unlikely to matter in real-world apps. We have large-exponent values (unlikely), and BenchmarkAppendFloatLarge2-48 hits a slow path:

In genericFtoa, we end up calling bigFtoa which uses slow bignum arithmetic. In bigFtoa, the call to roundShortest fails to return quickly, and it proceeds doing the slow (but precise) binary-to-decimal mantissa conversion (ftoa.go:257ff).

Probably not worth spending much time on fixing this.

Member Author

@dvyukov dvyukov commented May 13, 2016

Looks like a nice DoS attack vector on any Go software that ever formats floats.

Also, strconv.AppendFloat(dst[:0], 622666234635.3213e-320, 'f', -1, 64) outputs just zeros, but still takes 100+ microseconds.
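To see what the `'f'` output for this value looks like, here is a small sketch using only `strconv` and `strings`; the helper `zerosAfterPoint` is a hypothetical name, not part of any library. It counts the zeros between the decimal point and the first significant digit:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// zerosAfterPoint formats f with the 'f' verb in shortest mode and counts
// the zeros between the decimal point and the first nonzero digit.
func zerosAfterPoint(f float64) (text string, zeros int) {
	text = string(strconv.AppendFloat(nil, f, 'f', -1, 64))
	i := strings.IndexByte(text, '.')
	if i < 0 {
		return text, 0 // no fractional part was printed
	}
	frac := text[i+1:]
	return text, len(frac) - len(strings.TrimLeft(frac, "0"))
}

func main() {
	text, zeros := zerosAfterPoint(622666234635.3213e-320)
	fmt.Println(len(text), zeros)
}
```

The output is almost entirely zeros, but the significant digits at the end still have to be computed, which is why the `'f'` form is no cheaper than the `'e'` form.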

Contributor

@griesemer griesemer commented May 13, 2016

cc: @agl for input re: uses of float formatting for DoS attacks. I suspect this might not be unique to Go, and if so, what do other systems do?

Contributor

@agl agl commented May 13, 2016

It's an interesting algorithmic complexity attack. (There have been similar issues in the past.)

It's a bit of a sharp edge, but some values will probably always take a little longer to format. We could put effort into optimising this, but it probably makes the code complex in an already complex area. Perhaps it's worth a comment, but I'm not sure about more than that.

Contributor

@nightlyone nightlyone commented May 17, 2016

How many of these corner case numbers exist? If their number is small, maybe a static lookup table could work....

Contributor

@griesemer griesemer commented May 18, 2016

@nightlyone Hard to say without very careful (and difficult!) mathematical analysis.

Member

@ALTree ALTree commented May 18, 2016

Well, the fact that Dmitry found one (in a space of 2**64 elements) by fuzzing (I believe?) suggests that the number of slow values is probably not that small.

Contributor

@rsc rsc commented Sep 30, 2016

Basically all floating-point printing algorithms use a fast path for most numbers and fall back to a slow path. Go's fast path is the Grisu3 algorithm, which fails for about 0.5% of inputs. The fallback is some slow decimal code that I wrote a long time ago. In practice this is fine. Grisu3 was basically state of the art when it was added here, I believe by @remyoudompheng.

Very recently a new paper came out that gets the fast path down to "all but 45 float64 values", which you can then handle with a table. See Andrysco, Jhala, and Lerner,
"Printing Floating-Point Numbers: A Faster, Always Correct Method", POPL 2016.

If someone wants to implement that, great, but I don't think it's a particularly high priority.

@rsc rsc changed the title from "strconv: AppendFloat is super slow for some numbers" to "strconv: implement Errol algorithm to speed float64->decimal conversion" Sep 30, 2016

Contributor

@cespare cespare commented Oct 1, 2016

@rsc it seems that by the authors' own benchmarks, Errol is 2.4x slower than Grisu3. Does that seem like a reasonable tradeoff for strconv? (I ask because I'm thinking of working on this but I anticipate pushback if the change makes the 99.5% case more than twice as slow. Another alternative would be to use Errol as the Grisu3 fallback, but that seems like a lot of code for printing floats.)

Member

@ALTree ALTree commented Oct 1, 2016

FWIW I have a one-line change that makes the slow path about 35% faster on Dmitry's number:

name                      old time/op  new time/op  delta
FormatFloat/Slowpath64-4  68.6µs ± 0%  44.1µs ± 2%  -35.71%  (p=0.000 n=13+15)

It doesn't fix the problem but it's better than nothing I guess:

https://go-review.googlesource.com/#/c/30099

Contributor

@remyoudompheng remyoudompheng commented Oct 2, 2016

@cespare Having good performance for the majority of cases can be critical for some users. In my applications I could not tolerate a 2.4x slowdown (although with Go's code generation the ratio could turn out to be different). If I understand correctly, Errol can remove the need for big-number arithmetic entirely, so it could serve as a fallback for Grisu3 without much code size inflation. That would be quite useful: if 1 number out of 200 is 300 times slower to process, the impact is far from negligible.

Contributor

@bmkessler bmkessler commented Sep 20, 2017

@remyoudompheng Errol does not eliminate the need for big number arithmetic entirely. For floating point numbers in the range [2^54, 2^131) it falls back to exact integer arithmetic. The algorithm is actually basically the same as previous conversion algorithms except outside that range it uses double-double (2x float64) to calculate. The double-double has enough precision except for a few enumerated cases that are looked up in a table. The compute_digits algorithm in the paper for the shortest output produces the largest correctly rounded decimal in the rounding range, so that would need to be modified to produce the closest correctly rounded decimal. Also, note that the paper does not contain any error analysis for float32. I think these issues might indicate that Errol is not ready for inclusion in the standard library.
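For readers unfamiliar with double-double arithmetic, the sketch below shows the basic building block: Knuth's error-free addition, which represents a sum as a pair of float64 values. This is a generic textbook illustration, not Errol's actual code, and the names `twoSum` and `ddAddFloat` are chosen here for illustration only:

```go
package main

import "fmt"

// twoSum returns s = fl(a+b) together with the rounding error e, so that
// a+b == s+e holds exactly (Knuth's error-free transformation).
func twoSum(a, b float64) (s, e float64) {
	s = a + b
	bb := s - a
	e = (a - (s - bb)) + (b - bb)
	return s, e
}

// ddAddFloat adds the float64 x to the double-double value hi+lo and
// returns the renormalized double-double result.
func ddAddFloat(hi, lo, x float64) (float64, float64) {
	s, e := twoSum(hi, x)
	e += lo
	return twoSum(s, e)
}

func main() {
	// 1e-20 is far below the precision of a float64 near 1, but the
	// second component of the pair keeps it.
	hi, lo := ddAddFloat(1, 0, 1e-20)
	fmt.Println(hi, lo)
}
```

A pair like this carries roughly twice the precision of one float64, which is the extra headroom the comment above refers to.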

Figure 13(c) in the paper shows the fallback performance they are benchmarking Errol against, and it shows roughly linear dependence on the floating-point exponent. Benchmarking the Go fallback, which I think uses the same Dragon4 algorithm, shows roughly quadratic dependence on the exponent. So it should be possible to speed up the current slow path quite a bit.

Contributor

@bmkessler bmkessler commented Jul 30, 2018

Ryū: fast float-to-string conversion, presentation, github

Just came out and is simpler and faster than grisu3 (~3x in the paper's benchmarks). It requires only integer operations, although for float64 it uses 128-bit integers that should be made more efficient by #24813. It also requires a few small lookup tables:

| Type    |  B0 |  B1 | #Entries | Total Memory |
|---------|-----|-----|----------|--------------|
| Float32 |  60 |  63 |       78 |    624 bytes |
| Float64 | 124 | 124 |      617 |  9,872 bytes |

Since it is both simpler and faster, I would recommend investigating the ryu algorithm as the replacement for grisu3 instead of errol.
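As a rough illustration of the 128-bit integer work involved, the sketch below multiplies a 64-bit mantissa by a 128-bit value and keeps the upper 128 bits of the 192-bit product. It assumes the extended-precision helpers `bits.Mul64` and `bits.Add64` from `math/bits` (available since Go 1.12); the function name `mulHigh128` is hypothetical and this is not the reference Ryū code:

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulHigh128 multiplies m by the 128-bit value hi*2^64 + lo and returns the
// upper 128 bits (top, mid) of the 192-bit product.
func mulHigh128(m, hi, lo uint64) (top, mid uint64) {
	h1, _ := bits.Mul64(m, lo)  // m*lo; only its high word reaches the result
	h2, l2 := bits.Mul64(m, hi) // m*hi, weighted by 2^64
	mid, carry := bits.Add64(l2, h1, 0)
	top = h2 + carry // cannot overflow: the full product fits in 192 bits
	return top, mid
}

func main() {
	// 2 * 2^127 = 2^128, so the upper 128 bits of the product are exactly 1.
	top, mid := mulHigh128(2, 1<<63, 0)
	fmt.Println(top, mid)
}
```

On amd64 each `bits.Mul64` call compiles to a single multiply instruction, which is why hardware-backed 128-bit products matter for an algorithm of this shape.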

@cespare cespare changed the title from "strconv: implement Errol algorithm to speed float64->decimal conversion" to "strconv: implement the Ryu algorithm to speed up float64->decimal conversion" Jan 13, 2019

Contributor

@cespare cespare commented Jan 13, 2019

I have implemented Ryu in Go at https://github.com/cespare/ryu.

On my machine it is significantly faster than strconv for all the inputs I've tried.

name                                     old time/op    new time/op    delta
FormatFloat32-12                            128ns ± 1%      50ns ± 2%  -60.82%  (p=0.000 n=7+8)
FormatFloat64-12                            129ns ± 4%      65ns ± 5%  -49.54%  (p=0.000 n=7+8)
AppendFloat32/0e+00-12                     24.4ns ± 1%     3.0ns ± 1%  -87.88%  (p=0.000 n=8+8)
AppendFloat32/1e+00-12                     26.5ns ± 1%    13.2ns ± 3%  -49.98%  (p=0.000 n=8+8)
AppendFloat32/3e-01-12                     52.2ns ± 1%    32.5ns ± 2%  -37.73%  (p=0.000 n=8+8)
AppendFloat32/1e+06-12                     41.2ns ± 1%    17.9ns ± 1%  -56.45%  (p=0.000 n=8+7)
AppendFloat32/-1.2345e+02-12               83.3ns ± 2%    34.2ns ± 1%  -58.90%  (p=0.000 n=8+8)
AppendFloat64/0e+00-12                     24.5ns ± 2%     3.3ns ± 2%  -86.50%  (p=0.000 n=8+8)
AppendFloat64/1e+00-12                     26.9ns ± 1%    14.5ns ± 1%  -46.06%  (p=0.001 n=8+6)
AppendFloat64/3e-01-12                     53.0ns ± 1%    42.5ns ± 0%  -19.75%  (p=0.001 n=8+6)
AppendFloat64/1e+06-12                     41.4ns ± 1%    21.1ns ± 1%  -49.05%  (p=0.000 n=8+8)
AppendFloat64/-1.2345e+02-12               83.8ns ± 1%    43.3ns ± 1%  -48.32%  (p=0.000 n=8+8)
AppendFloat64/6.226662346353213e-309-12    25.5µs ± 1%     0.0µs ± 1%  -99.84%  (p=0.000 n=8+8)

(That last one is @dvyukov's pathological case.)

While some inputs are faster than others, there aren't any drastically slower fallback paths.

    ryu_test.go:279: after sampling 50000 float64s:
        ryu:               min = 2ns  max = 90ns     median = 41ns   mean = 41ns
        strconv (stdlib):  min = 8ns  max = 25845ns  median = 106ns  mean = 154ns
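A sampling check of this kind can be sketched as follows. This is not the code from `ryu_test.go`; it is an assumed, simplified version that draws random float64 bit patterns and verifies that shortest formatting followed by parsing returns the original value. The function name and the seed are arbitrary:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"strconv"
)

// countRoundTripFailures draws n random float64 bit patterns, skips NaN and
// infinities, and counts values that do not survive shortest formatting
// followed by parsing.
func countRoundTripFailures(n int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	failures := 0
	for i := 0; i < n; i++ {
		f := math.Float64frombits(rng.Uint64())
		if math.IsNaN(f) || math.IsInf(f, 0) {
			continue
		}
		s := strconv.FormatFloat(f, 'e', -1, 64)
		g, err := strconv.ParseFloat(s, 64)
		if err != nil || g != f {
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", countRoundTripFailures(50000, 1))
}
```

Sampling uniformly over bit patterns covers the whole exponent range, including denormals, which is where the slow fallback cases in this thread were found.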

It's true that Ryu requires some lookup tables. However, in his C implementation @ulfjack (the Ryu author) has a size-optimized version that uses much smaller lookup tables in exchange for a little more CPU cost. I've implemented that as well and it's still faster than the current strconv implementation in all the cases I've looked at:

name                                     old time/op    new time/op    delta
FormatFloat32-12                            129ns ± 2%      49ns ± 1%  -61.72%  (p=0.000 n=8+8)
FormatFloat64-12                            130ns ± 3%      72ns ± 5%  -44.32%  (p=0.000 n=7+8)
AppendFloat32/0e+00-12                     24.5ns ± 2%     3.0ns ± 1%  -87.83%  (p=0.000 n=8+8)
AppendFloat32/1e+00-12                     26.4ns ± 1%    13.1ns ± 1%  -50.26%  (p=0.000 n=7+8)
AppendFloat32/3e-01-12                     52.6ns ± 2%    32.4ns ± 1%  -38.43%  (p=0.000 n=8+8)
AppendFloat32/1e+06-12                     41.3ns ± 2%    17.6ns ± 1%  -57.51%  (p=0.000 n=8+8)
AppendFloat32/-1.2345e+02-12               83.5ns ± 1%    34.4ns ± 1%  -58.82%  (p=0.000 n=8+8)
AppendFloat64/0e+00-12                     24.6ns ± 2%     3.3ns ± 1%  -86.63%  (p=0.000 n=8+8)
AppendFloat64/1e+00-12                     26.7ns ± 1%    14.6ns ± 4%  -45.51%  (p=0.000 n=8+8)
AppendFloat64/3e-01-12                     52.7ns ± 1%    50.0ns ± 1%   -5.17%  (p=0.000 n=8+8)
AppendFloat64/1e+06-12                     41.2ns ± 1%    21.1ns ± 2%  -48.61%  (p=0.000 n=7+8)
AppendFloat64/-1.2345e+02-12               83.7ns ± 1%    50.9ns ± 1%  -39.17%  (p=0.000 n=8+8)
AppendFloat64/6.226662346353213e-309-12    25.8µs ± 2%     0.0µs ± 1%  -99.81%  (p=0.000 n=8+8)

Based on these promising results and the comments by @bmkessler above, it seems like switching strconv to use Ryu is the best option available to us. I've taken the liberty of retitling the issue accordingly. I intend to work on this during the Go 1.13 cycle.

Contributor

@remyoudompheng remyoudompheng commented Jan 13, 2019

I'm also in favour of switching. I also have some version on my side in the works, but I'm taking a different path by making it look like the old Grisu code. I'll try to show it somewhere.

@cyberphone cyberphone commented Jan 13, 2019

remyoudompheng pushed a commit to remyoudompheng/go that referenced this issue Jan 16, 2019
A lot of internal functions are exposed in export_test.go
in order to test and bench various levels of function calls.

Random tests have been run on a few billion values.

Implements golang#15672.

Change-Id: I028faa4f97c38f51709469f7314bfd7ec12f06dd

Contributor

@remyoudompheng remyoudompheng commented Jan 16, 2019

I have pushed my own things in the following branch https://github.com/remyoudompheng/go/tree/ryu

A few notes:

  • It is implemented in strconv
  • It is implemented from scratch, but of course shows great similarity with the reference implementation and @cespare's version
  • It does not support float32 in this version
  • I have implemented an addition to it: the fixed precision formatting, which is faster in corner cases (notably 16 digits), and can handle float32
  • The code is not optimized with particular tricks for production of individual digits
  • I would like to add a function for "atof" using this framework, as I did with Grisu3 in the past; I don't know what kind of speedups we can expect there
  • I would like to write unit tests which prove the fundamental mathematical statements of @ulfjack's paper
    They might be needed to validate the extension of the algorithm to fixed precision or the validity domain of 128-bit arithmetic when converting the other way (atof).

Contributor

@remyoudompheng remyoudompheng commented Jan 17, 2019

I have also prepared a bunch of denormals that are very hard to parse when printed in their shortest form (they are also hard for AppendFloat). The last one is "kind of" pathological, but since it has few digits, the existing code handles it fine.

BenchmarkAtof/1.68514038588815e-309-4               	   50000	     18384 ns/op
BenchmarkAtof/9.11691642378e-312-4                  	   50000	     19854 ns/op
BenchmarkAtof/1.62420278e-315-4                     	10000000	        71.4 ns/op

BenchmarkAppendFloatHard/341076211242912p-1074-4   	   50000	     32519 ns/op
BenchmarkAppendFloatHard/1845284427387p-1074-4     	   50000	     32516 ns/op
BenchmarkAppendFloatHard/328742302p-1074-4         	20000000	       103 ns/op

The corresponding numbers are:

9.11691642378e-312 = 0x1ada385d67b.7fffffff5d9...p-1074
1.62420278e-315 = 0x1398359e.7fffe022p-1074
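The `p-1074` notation reads as "integer mantissa times 2^-1074", which for mantissas below 2^52 is exactly a denormal float64 whose bit pattern is the mantissa itself. The sketch below illustrates that correspondence; the helper names are made up for this example:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// denormal returns m * 2^-1074 for m < 2^52, that is, the subnormal float64
// whose raw bit pattern equals m.
func denormal(m uint64) float64 {
	return math.Float64frombits(m)
}

// sameAsLdexp checks the bit-pattern view against explicit scaling of the
// integer mantissa by 2^-1074.
func sameAsLdexp(m uint64) bool {
	return denormal(m) == math.Ldexp(float64(m), -1074)
}

func main() {
	for _, m := range []uint64{341076211242912, 1845284427387, 328742302} {
		f := denormal(m)
		fmt.Println(m, strconv.FormatFloat(f, 'e', -1, 64), sameAsLdexp(m))
	}
}
```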

Contributor

@remyoudompheng remyoudompheng commented Jan 17, 2019

Hello,
I have pushed to my branch preliminary support for ParseFloat (commit 0350964).
When feeding in the shortest representations of float64s, only 93% of them can be processed because I have to add support (and prove correctness) for 17-digit strings.

Benchmarks of the computational part (converting a uint64 mantissa and a power-of-10 exponent to float64).
"Old" is the old code, which tries a simple float64 multiply, then Grisu, then big integers.
"New" tries only Ryu; note the nearly constant running time.

benchmark                                  old ns/op     new ns/op     delta
BenchmarkAtof/33909e0-4                    5.64          14.9          +164.18%
BenchmarkAtof/3397784e-4-4                 6.58          20.3          +208.51%
BenchmarkAtof/509e73-4                     36.9          18.0          -51.22%
BenchmarkAtof/6226662346353213e-324-4      39.8          18.5          -53.52%
BenchmarkAtof/6808957268280643e116-4       5684          18.0          -99.68%
BenchmarkAtof/4334126125515466e-225-4      9829          18.0          -99.82%
BenchmarkAtof/168514038588815e-323-4       18157         18.6          -99.90%
BenchmarkAtof/911691642378e-323-4          19622         18.6          -99.91%
BenchmarkAtof/162420278e-323-4             40.0          18.5          -53.75%
BenchmarkAtof/22250738585072011e-324-4     40.0          18.6          -53.50%
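The "simple float64 multiply" fast path explains why the first two rows are faster with the old code. A minimal sketch of that well-known technique follows; it is an assumed reconstruction for illustration (the name `fastAtof` is hypothetical), not the code in strconv. When the mantissa fits in 53 bits and the power of ten is exactly representable, one correctly rounded multiplication or division gives the correctly rounded result:

```go
package main

import "fmt"

// exactPow10 holds the powers of ten that a float64 represents exactly.
var exactPow10 = [...]float64{
	1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
	1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
}

// fastAtof converts mant * 10^exp10 to float64 when a single correctly
// rounded float64 operation suffices. It reports false when the caller
// must use a slower, more precise algorithm.
func fastAtof(mant uint64, exp10 int) (float64, bool) {
	if mant > 1<<53 {
		return 0, false // mantissa is not exactly representable
	}
	f := float64(mant)
	switch {
	case exp10 == 0:
		return f, true
	case exp10 > 0 && exp10 <= 22:
		return f * exactPow10[exp10], true
	case exp10 < 0 && exp10 >= -22:
		return f / exactPow10[-exp10], true
	}
	return 0, false
}

func main() {
	fmt.Println(fastAtof(33909, 0))
	fmt.Println(fastAtof(3397784, -4))
	fmt.Println(fastAtof(509, 73)) // outside the fast path
}
```

Inputs such as `509e73` fall outside this path, which is where the Grisu or Ryu step takes over in the benchmarks above.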

Benchmark of ParseFloat: in this benchmark the Grisu parsing is replaced by the Ryu routine (keeping the "float64 multiply" fast path).
This implementation does not try long mantissas (so it falls back to multi-precision arithmetic). You can notice the high cost of boilerplate compared to the pure computation.

benchmark                                       old ns/op     new ns/op     delta
BenchmarkAtof/33909-4                           24.1          24.0          -0.41%
BenchmarkAtof/339.7784-4                        29.1          29.1          +0.00%
BenchmarkAtof/-5.09e75-4                        63.4          46.2          -27.13%
BenchmarkAtof/123456789123456789123456789-4     98.8          999           +911.13%
BenchmarkAtof/622666234635.3213e-320-4          82.8          64.1          -22.58%
BenchmarkAtof/33909#01-4                        23.5          23.6          +0.43%
BenchmarkAtof/339.778-4                         26.5          26.4          -0.38%
BenchmarkAtof/12.3456e32-4                      66.7          66.2          -0.75%
BenchmarkAtof/100000000000000016777215-4        844           786           -6.87%
BenchmarkAtof/100000000000000016777216-4        693           643           -7.22%
BenchmarkAtof/6808957268280643e116-4            5836          65.5          -98.88%
BenchmarkAtof/4.334126125515466e-210-4          10045         72.3          -99.28%
BenchmarkAtof/1.68514038588815e-309-4           18462         64.8          -99.65%
BenchmarkAtof/9.11691642378e-312-4              19818         58.8          -99.70%
BenchmarkAtof/1.62420278e-315-4                 71.1          54.4          -23.49%
BenchmarkAtof/2.2250738585072011e-308-4         84.2          67.1          -20.31%

@ulfjack ulfjack commented Jan 18, 2019

Nice. Great observation, @remyoudompheng. I haven't published my parsing code yet; really need to finish that. I believe it can be made fast regardless of input string length, i.e., scale linearly.

remyoudompheng pushed a commit to remyoudompheng/go that referenced this issue Jan 19, 2019
A lot of internal functions are exposed in export_test.go
in order to test and bench various levels of function calls.

Random tests have been run on a few billion values.

Implements golang#15672.

Change-Id: I028faa4f97c38f51709469f7314bfd7ec12f06dd

Contributor

@remyoudompheng remyoudompheng commented Jan 20, 2019

As of commit remyoudompheng@ed35176 the Atof implementation is extended to work whenever the decimal digits form a number that fits in a uint64. It can also handle unambiguous very long inputs by simply converting the rounded-up mantissa as well and checking that it yields the same result.
This makes it work with more inputs, including a few exceptional corner cases that I have added to the tests (all rejected by the "reverse Grisu3"):

      // Halfway is 500016682268521616.00000000000001e229
      {"500016682268521616e229", "5.000166822685216e+246", nil}, // 18 digits necessary
      // Halfway is 1873795671212201760.9999999999999998e108
      {"1873795671212201761e108", "1.873795671212202e+126", nil}, // 19 digits (61 bits) necessary
      // Halfway is 10027399025072458413.99999999999998e140
      {"10027399025072458414e140", "1.002739902507246e+159", nil}, // 20 digits (64 bits) necessary
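Near-halfway inputs like these can be cross-checked against exact arithmetic. The sketch below is an independent check, not part of the branch: it parses the same string with `strconv.ParseFloat` and with `math/big` rationals, then compares the results. The helper name `agreesWithExact` is made up for this example:

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// agreesWithExact parses s with strconv.ParseFloat and with exact rational
// arithmetic, and reports whether both give the same float64.
func agreesWithExact(s string) (bool, error) {
	got, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return false, err
	}
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return false, fmt.Errorf("big.Rat cannot parse %q", s)
	}
	want, _ := r.Float64() // nearest float64 to the exact rational value
	return got == want, nil
}

func main() {
	for _, s := range []string{
		"500016682268521616e229",
		"1873795671212201761e108",
		"10027399025072458414e140",
	} {
		ok, err := agreesWithExact(s)
		fmt.Println(s, ok, err)
	}
}
```

The rational path is far too slow for production parsing, but it makes a convenient oracle for test tables of hard cases.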

@ulfjack I am interested in knowing whether the "TestRyuNoCarry" is exactly the same proof as in your paper or not.
The implementation is now feature complete compared to what Grisu3 was used for (shortest formatting, fixed formatting, parsing).

Final benchmarks:

BenchmarkAtof/33909-4                           23.2          23.0          -0.86%
BenchmarkAtof/339.7784-4                        29.0          28.7          -1.03%
BenchmarkAtof/-5.09e75-4                        63.8          45.2          -29.15%
BenchmarkAtof/18446744073709551608-4            98.0          68.8          -29.80%
BenchmarkAtof/123456789123456789123456789-4     90.7          82.1          -9.48%
BenchmarkAtof/622666234635.3213e-320-4          80.0          59.1          -26.12%
BenchmarkAtof/33909#01-4                        22.9          22.2          -3.06%
BenchmarkAtof/339.778-4                         26.7          26.8          +0.37%
BenchmarkAtof/12.3456e32-4                      66.2          61.4          -7.25%
BenchmarkAtof/2.3399415873862403e69-4           2836          59.0          -97.92%
BenchmarkAtof/500016682268521616e229-4          19610         67.7          -99.65%
BenchmarkAtof/1873795671212201761e108-4         6383          76.2          -98.81%
BenchmarkAtof/10027399025072458414e140-4        7728          79.2          -98.98%
BenchmarkAtof/100000000000000016777215-4        889           833           -6.30%
BenchmarkAtof/100000000000000016777216-4        768           721           -6.12%
BenchmarkAtof/6808957268280643e116-4            5828          57.7          -99.01%
BenchmarkAtof/4.334126125515466e-210-4          9924          59.6          -99.40%
BenchmarkAtof/1.68514038588815e-309-4           18428         58.8          -99.68%
BenchmarkAtof/9.11691642378e-312-4              19888         55.0          -99.72%
BenchmarkAtof/1.62420278e-315-4                 73.7          54.3          -26.32%
BenchmarkAtof/2.2250738585072011e-308-4         82.7          61.1          -26.12%

@ulfjack ulfjack commented Jan 20, 2019

@remyoudompheng I'm afraid I don't understand your question. The proof in the paper only covers binary to decimal conversion, not the other way round. I have since written down an extended proof that shows that the same concepts apply to all source and target bases with minor changes for certain base pairs. I believe that my implementation closely follows the proof.

Contributor

@remyoudompheng remyoudompheng commented Jan 20, 2019

@ulfjack I meant that this unit test (https://github.com/remyoudompheng/go/blob/ed351765ae8a8307e4df08a24511ec058b7f7ccc/src/strconv/extfloat2_test.go#L478) aims at embedding a proof of what is paragraph 3.2.3 of your paper, to make the code self-contained. I believe it is essentially equivalent.

Contributor

@cespare cespare commented Mar 4, 2019

@remyoudompheng and I discussed this and he's going to work on creating the CLs for his implementation in strconv. I may help review the changes. I'm changing the assignment accordingly.

@cespare cespare assigned remyoudompheng and unassigned cespare Mar 4, 2019

Contributor

@remyoudompheng remyoudompheng commented Mar 24, 2019

Status update:

  • I spent some time creating a library of iterators over numbers that are "very hard" to round (https://github.com/remyoudompheng/fptest), that is, numbers that you cannot format using simple fixed-precision arithmetic (Grisu3 is 64-bit, Errol is about 104-bit, Ryū is 128-bit). It is an implementation of Farey sequences in Go. It can detect nearly any rounding mistake or a simple corruption in the powers-of-10 table (typically not detected by the strconv test suite).
  • I prepared commits that are nearly ready to submit as CLs. I will submit them soon.
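For context on the Farey sequences mentioned in the first item: the Farey sequence of order n lists all reduced fractions in [0, 1] with denominator at most n, in increasing order. The sketch below uses the textbook next-term recurrence and is unrelated to the actual fptest code; the type and function names are chosen here for illustration:

```go
package main

import "fmt"

// frac is a reduced fraction num/den.
type frac struct{ num, den int }

// farey returns the Farey sequence of order n (n >= 1): every reduced
// fraction in [0, 1] with denominator <= n, in increasing order.
func farey(n int) []frac {
	seq := []frac{{0, 1}}
	a, b, c, d := 0, 1, 1, n // a/b and c/d are consecutive terms
	for c <= n {
		seq = append(seq, frac{c, d})
		k := (n + b) / d
		a, b, c, d = c, d, k*c-a, k*d-b
	}
	return seq
}

func main() {
	for _, f := range farey(5) {
		fmt.Printf("%d/%d ", f.num, f.den)
	}
	fmt.Println()
}
```

Neighbouring Farey fractions are the best rational approximations with bounded denominators, which is what makes them useful for locating inputs that sit extremely close to a rounding boundary.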

Shortest formatting is not yet cleaned up.

Contributor

@remyoudompheng remyoudompheng commented Mar 28, 2019

Shortest formatting is now included and I added a fix to avoid all long divisions on 32-bit platforms (especially arm where performance was awful). ARM now gets performance gains as well.

remyoudompheng pushed a commit to remyoudompheng/go that referenced this issue Mar 29, 2019
The Ryū algorithm as described in a paper by Ulf Adams,
"Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369)
is better than Grisu3 because it handles all edge cases properly.
In Grisu3, about 0.5% of float64 numbers fall back to the slow
algorithm, which can be 10-200 times slower.

The core property used by the Ryū algorithm is that using
sufficiently large precision for powers of 10 can eliminate
all rounding edge cases. Such edge cases can be characterized
by an equation of shape:
    m * P <= n * 2^k <= m * (P+1)
where P is the fixed precision truncated version of the power of 10.
Solving this equation can be done using Farey sequences to enumerate
rationals n/m in the interval [P/2^k, (P+1)/2^k].

The original algorithm describes formatting to shortest decimal
representation. This patch implements a variant of this algorithm
for atof functions, using the properties:
- 64-bit powers of 10 are enough to handle 31-bit decimal mantissas
  to parse float32 values
- 128-bit powers of 10 are enough to handle 64-bit decimal mantissas
  to parse float64 values

Since Grisu3 already uses 64-bit powers of ten, the speed difference
in atof32 is hard to notice; the benefit there is mostly much clearer
logic.

Powers of 10 are tabulated and will be reused for the ftoa
implementation.
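To make "128-bit powers of 10" concrete, here is a hedged sketch of how one table entry could be derived with `math/big`. It only handles non-negative exponents, and both the function name and the normalization convention (top bit set, value truncated) are assumptions for illustration, not the layout used by this patch:

```go
package main

import (
	"fmt"
	"math/big"
)

// pow10Trunc128 returns the top 128 bits (hi, lo) of 10^q for q >= 0,
// normalized so that the top bit of hi is set, and exp2 such that
// 10^q is approximately (hi*2^64 + lo) * 2^exp2. Low bits are truncated.
func pow10Trunc128(q int64) (hi, lo uint64, exp2 int) {
	p := new(big.Int).Exp(big.NewInt(10), big.NewInt(q), nil)
	shift := p.BitLen() - 128
	if shift > 0 {
		p.Rsh(p, uint(shift)) // drop low bits: truncation, not rounding
	} else {
		p.Lsh(p, uint(-shift)) // exact: pad with zero bits
	}
	mask := new(big.Int).SetUint64(^uint64(0))
	lo = new(big.Int).And(p, mask).Uint64()
	hi = new(big.Int).Rsh(p, 64).Uint64()
	return hi, lo, shift
}

func main() {
	for _, q := range []int64{0, 1, 27, 55} {
		hi, lo, e := pow10Trunc128(q)
		fmt.Printf("10^%d ~ 0x%016x%016x * 2^%d\n", q, hi, lo, e)
	}
}
```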

AMD64 benchmarks:

benchmark                     old ns/op     new ns/op     delta
BenchmarkAtof64Decimal        38.6          38.5          -0.26%
BenchmarkAtof64Float          49.9          49.6          -0.60%
BenchmarkAtof64FloatExp       78.5          69.9          -10.96%
BenchmarkAtof64FloatExact     125           141           +12.80%
BenchmarkAtof64Big            148           161           +8.78%
BenchmarkAtof64Hard           9946          120           -98.79%
BenchmarkAtof64RandomBits     70.7          69.1          -2.26%
BenchmarkAtof64RandomFloats   70.4          70.5          +0.14%
BenchmarkAtof32Decimal        40.1          37.5          -6.48%
BenchmarkAtof32Float          48.4          45.3          -6.40%
BenchmarkAtof32FloatExp       87.1          74.1          -14.93%
BenchmarkAtof32FloatHard      951           104           -89.06%
BenchmarkAtof32Random         113           97.1          -14.07%

ARM benchmarks:

benchmark                     old ns/op     new ns/op     delta
BenchmarkAtof64Decimal        670           659           -1.64%
BenchmarkAtof64Float          2082          2050          -1.54%
BenchmarkAtof64FloatExp       1137          1044          -8.18%
BenchmarkAtof64FloatExact     1007          1623          +61.17%
BenchmarkAtof64Big            1179          1361          +15.44%
BenchmarkAtof64Hard           61099         1097          -98.20%
BenchmarkAtof64RandomBits     646           634           -1.86%
BenchmarkAtof64RandomFloats   639           627           -1.88%
BenchmarkAtof32Decimal        823           824           +0.12%
BenchmarkAtof32Float          2398          2364          -1.42%
BenchmarkAtof32FloatExp       1294          1195          -7.65%
BenchmarkAtof32FloatHard      6168          965           -84.35%
BenchmarkAtof32Random         1175          1100          -6.38%

Updates golang#15672

Change-Id: I297f2ffb038d7c4598e1365b61c13b30e9bdd7fc
remyoudompheng pushed a commit to remyoudompheng/go that referenced this issue Mar 29, 2019
This patch implements a simplified version of Ulf Adams,
"Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369)
for formatting floating-point numbers with a fixed number of decimal
digits.

It uses the same principles but does not need to handle
the complex task of finding a shortest representation.
This makes it possible to handle a few more cases than Grisu3, notably
formatting with up to 18 significant digits.
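From the caller's side, fixed-precision formatting is what `strconv.FormatFloat` does when given a non-negative precision. A small sketch using only the public API; the wrapper name `fixedDigits` is made up for this example:

```go
package main

import (
	"fmt"
	"strconv"
)

// fixedDigits formats f in scientific notation with exactly `digits`
// significant decimal digits (digits >= 1).
func fixedDigits(f float64, digits int) string {
	// With the 'e' verb, the precision counts digits after the decimal point.
	return strconv.FormatFloat(f, 'e', digits-1, 64)
}

func main() {
	fmt.Println(fixedDigits(1.5, 3))
	fmt.Println(fixedDigits(0.1, 17))
}
```

Seventeen significant digits are enough to identify any float64 uniquely, which is why the 16 to 18 digit cases in the benchmarks are the ones that stress the algorithm.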

AMD64 benchmarks

benchmark                                old ns/op     new ns/op     delta
BenchmarkAppendFloat/32Fixed8Hard        74.2          47.8          -35.58%
BenchmarkAppendFloat/32Fixed9Hard        77.1          57.6          -25.29%
BenchmarkAppendFloat/64Fixed1            62.1          48.9          -21.26%
BenchmarkAppendFloat/64Fixed2            69.6          49.3          -29.17%
BenchmarkAppendFloat/64Fixed3            63.4          50.6          -20.19%
BenchmarkAppendFloat/64Fixed4            71.5          49.1          -31.33%
BenchmarkAppendFloat/64Fixed12           95.7          71.5          -25.29%
BenchmarkAppendFloat/64Fixed16           1608          63.1          -96.08%
BenchmarkAppendFloat/64Fixed12Hard       1276          60.3          -95.27%
BenchmarkAppendFloat/64Fixed17Hard       4128          68.6          -98.34%
BenchmarkAppendFloat/64Fixed18Hard       4155          4146          -0.22%

ARM benchmarks

benchmark                                old ns/op     new ns/op     delta
BenchmarkAppendFloat/32Fixed8Hard        1045          575           -44.98%
BenchmarkAppendFloat/32Fixed9Hard        1178          996           -15.45%
BenchmarkAppendFloat/64Fixed1            781           786           +0.64%
BenchmarkAppendFloat/64Fixed2            806           694           -13.90%
BenchmarkAppendFloat/64Fixed3            765           723           -5.49%
BenchmarkAppendFloat/64Fixed4            815           648           -20.49%
BenchmarkAppendFloat/64Fixed12           1292          1039          -19.58%
BenchmarkAppendFloat/64Fixed16           20045         1103          -94.50%
BenchmarkAppendFloat/64Fixed12Hard       16041         979           -93.90%
BenchmarkAppendFloat/64Fixed17Hard       50489         1200          -97.62%
BenchmarkAppendFloat/64Fixed18Hard       53500         53630         +0.24%

Updates golang#15672

Change-Id: I160963e141dd48287ad8cf57bcc3c686277788e8
remyoudompheng pushed a commit to remyoudompheng/go that referenced this issue Mar 29, 2019
This patch implements the algorithm from Ulf Adams,
"Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369)
for formatting floating-point numbers to their shortest decimal
representation.

It is not a direct translation of the reference C implementation
but still follows the original paper. In particular, it uses full
128-bit powers of 10, which allows for more precision in the other
modes (fixed ftoa, atof).
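As a slow reference for what "shortest" means, the sketch below tries increasing precisions until the output parses back to the same float64. It is only an illustrative oracle with a made-up name: it is far slower than Ryū, and in rare cases near power-of-two boundaries it may need one more digit than a true shortest-digits algorithm:

```go
package main

import (
	"fmt"
	"strconv"
)

// shortestByTrial returns the lowest-precision 'e' formatting of f that
// still parses back to exactly f.
func shortestByTrial(f float64) string {
	for prec := 0; prec < 16; prec++ {
		s := strconv.FormatFloat(f, 'e', prec, 64)
		if g, err := strconv.ParseFloat(s, 64); err == nil && g == f {
			return s
		}
	}
	// 17 significant digits always identify a float64 uniquely.
	return strconv.FormatFloat(f, 'e', 16, 64)
}

func main() {
	for _, f := range []float64{0.3, 1e6, -123.45} {
		fmt.Println(shortestByTrial(f), strconv.FormatFloat(f, 'e', -1, 64))
	}
}
```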

AMD64 benchmarks

benchmark                                     old ns/op     new ns/op     delta
BenchmarkAppendFloat/Decimal-4                49.9          57.0          +14.23%
BenchmarkAppendFloat/Float-4                  121           89.0          -26.45%
BenchmarkAppendFloat/Exp-4                    89.4          96.4          +7.83%
BenchmarkAppendFloat/NegExp-4                 88.7          93.0          +4.85%
BenchmarkAppendFloat/LongExp-4                142           108           -23.94%
BenchmarkAppendFloat/Big-4                    144           112           -22.22%
BenchmarkAppendFloat/BinaryExp-4              43.0          43.1          +0.23%
BenchmarkAppendFloat/32Integer-4              51.4          56.4          +9.73%
BenchmarkAppendFloat/32ExactFraction-4        95.3          79.4          -16.68%
BenchmarkAppendFloat/32Point-4                121           77.2          -36.20%
BenchmarkAppendFloat/32Exp-4                  87.3          103           +17.98%
BenchmarkAppendFloat/32NegExp-4               87.1          85.2          -2.18%
BenchmarkAppendFloat/32Shortest-4             106           76.2          -28.11%
BenchmarkAppendFloat/Slowpath64-4             1016          95.3          -90.62%
BenchmarkAppendFloat/SlowpathDenormal64-4     32013         86.2          -99.73%

ARM benchmarks

benchmark                                     old ns/op     new ns/op     delta
BenchmarkAppendFloat/Decimal-4                829           678           -18.21%
BenchmarkAppendFloat/Float-4                  1367          1259          -7.90%
BenchmarkAppendFloat/Exp-4                    1100          1338          +21.64%
BenchmarkAppendFloat/NegExp-4                 1097          1336          +21.79%
BenchmarkAppendFloat/LongExp-4                1852          1367          -26.19%
BenchmarkAppendFloat/Big-4                    1885          1621          -14.01%
BenchmarkAppendFloat/BinaryExp-4              1000          966           -3.40%
BenchmarkAppendFloat/32Integer-4              892           737           -17.38%
BenchmarkAppendFloat/32ExactFraction-4        1201          1134          -5.58%
BenchmarkAppendFloat/32Point-4                1439          1085          -24.60%
BenchmarkAppendFloat/32Exp-4                  1130          1372          +21.42%
BenchmarkAppendFloat/32NegExp-4               1128          1126          -0.18%
BenchmarkAppendFloat/32Shortest-4             1368          1069          -21.86%
BenchmarkAppendFloat/Slowpath64-4             28468         1333          -95.32%
BenchmarkAppendFloat/SlowpathDenormal64-4     378975        1291          -99.66%

Fixes golang#15672

Change-Id: Ib90dfa245f62490a6666671896013cf3f9a1fb22

@gopherbot gopherbot commented Mar 29, 2019

Change https://golang.org/cl/170078 mentions this issue: strconv: implement Ryū-like algorithm for atof

@gopherbot gopherbot commented Mar 29, 2019

Change https://golang.org/cl/170079 mentions this issue: strconv: implement Ryū-like algorithm for fixed precision ftoa

@gopherbot gopherbot commented Mar 29, 2019

Change https://golang.org/cl/170080 mentions this issue: strconv: Implement Ryū algorithm for ftoa shortest mode
