strconv: implement the Ryu algorithm to speed up float64->decimal conversion #15672

dvyukov · 2016-05-13T08:54:40Z

go version devel +15f2d0e Fri May 13 01:19:05 2016 +0000 linux/amd64

func BenchmarkAppendFloatLarge1(b *testing.B) { benchmarkAppendFloat(b, 622666234635.321e-320, 'e', -1, 64) }
func BenchmarkAppendFloatLarge2(b *testing.B) { benchmarkAppendFloat(b, 622666234635.3213e-320, 'e', -1, 64) }
func BenchmarkAppendFloatLarge3(b *testing.B) { benchmarkAppendFloat(b, 622666234635.322e-320, 'e', -1, 64) }

BenchmarkAppendFloatLarge1-48        5000000           276 ns/op
BenchmarkAppendFloatLarge2-48           2000         89589 ns/op
BenchmarkAppendFloatLarge3-48         500000           278 ns/op

dvyukov · 2016-05-13T08:55:27Z

320 times slower WAT?1!1
100 microseconds. Does it contact FFS (Float-Formatting Service)?

griesemer · 2016-05-13T09:34:07Z

This is a corner-case of a corner-case - extremely unlikely to matter in real-world apps. We have large-exponent values (unlikely), and for the BenchmarkAppendFloatLarge2-48 we have a slow-path:

In genericFtoa, we end up calling bigFtoa which uses slow bignum arithmetic. In bigFtoa, the call to roundShortest fails to return quickly, and it proceeds doing the slow (but precise) binary-to-decimal mantissa conversion (ftoa.go:257ff).

Probably not worth spending much time on fixing this.

dvyukov · 2016-05-13T10:04:16Z

Looks like a nice DoS attack vector on any Go software that ever formats floats.

Also, strconv.AppendFloat(dst[:0], 622666234635.3213e-320, 'f', -1, 64) outputs just zeros, but still takes 100+ microseconds.

griesemer · 2016-05-13T11:07:00Z

cc: @agl for input re: uses of float formatting for DoS attacks. I suspect this might not be unique to Go, and if so, what do other systems do?

agl · 2016-05-13T21:24:44Z

It's an interesting algorithmic complexity attack. (There have been similar issues in the past.)

It's a bit of a sharp edge, but some values will probably always take a little longer to format. We could put effort into optimising this, but it probably makes the code complex in an already complex area. Perhaps it's worth a comment, but I'm not sure about more than that.

nightlyone · 2016-05-17T18:56:51Z

How many of these corner case numbers exist? If their number is small, maybe a static lookup table could work....

griesemer · 2016-05-18T17:53:16Z

@nightlyone Hard to say without very careful (and difficult!) mathematical analysis.

ALTree · 2016-05-18T19:24:55Z

Well, the fact that Dmitry found one (in a space of 2**64 elements) by fuzzing (I believe?) suggests that the number of slow values is probably not that small.

rsc · 2016-09-30T20:57:43Z

Basically all floating point printing algorithms use a fast path for most numbers and fall back to a slow path. Go's fast path is the Grisu3 algorithm, which fails for 0.5% of inputs. The fallback is to some slow decimal code that I wrote a long time ago. In practice this is fine. Grisu3 was basically state of the art when it was added here, I believe by @remyoudompheng .

Very recently a new paper came out that gets the fast path down to "all but 45 float64 values", which you can then handle with a table. See Andrysco, Jhala, and Lerner,
"Printing Floating-Point Numbers: A Faster, Always Correct Method", POPL 2016.

If someone wants to implement that, great, but I don't think it's a particularly high priority.

cespare · 2016-10-01T02:22:40Z

@rsc it seems that by the authors' own benchmarks, Errol is 2.4x slower than Grisu3. Does that seem like a reasonable tradeoff for strconv? (I ask because I'm thinking of working on this but I anticipate pushback if the change makes the 99.5% case more than twice as slow. Another alternative would be to use Errol as the Grisu3 fallback but that seems like a lot of code for printing floats.)

ALTree · 2016-10-01T12:38:12Z

FWIW I have a one-line change that makes the slow path about 35% faster on Dmitry's number:

name                      old time/op  new time/op  delta
FormatFloat/Slowpath64-4  68.6µs ± 0%  44.1µs ± 2%  -35.71%  (p=0.000 n=13+15)

It doesn't fix the problem but it's better than nothing I guess:

https://go-review.googlesource.com/#/c/30099

remyoudompheng · 2016-10-02T16:59:14Z

@cespare Having good performance for the majority of cases can be critical for some users. In my applications I could not tolerate a 2.4x slowdown (however, with Go's code generation the ratio could turn out to be different). If I understand correctly, Errol can remove the need for big number arithmetic entirely, so it could be a fallback to Grisu3 without so much code size inflation, and it would be quite useful because if 1 number out of 200 is 300 times slower to process, the impact is far from negligible.

bmkessler · 2017-09-20T05:46:03Z

@remyoudompheng Errol does not eliminate the need for big number arithmetic entirely. For floating point numbers in the range [2^54, 2^131) it falls back to exact integer arithmetic. The algorithm is actually basically the same as previous conversion algorithms except outside that range it uses double-double (2x float64) to calculate. The double-double has enough precision except for a few enumerated cases that are looked up in a table. The compute_digits algorithm in the paper for the shortest output produces the largest correctly rounded decimal in the rounding range, so that would need to be modified to produce the closest correctly rounded decimal. Also, note that the paper does not contain any error analysis for float32. I think these issues might indicate that Errol is not ready for inclusion in the standard library.

Figure 13(c) in the paper shows the fall back performance they are benchmarking Errol against and it shows ~ linear dependence on the floating point exponent. Benchmarking the Go fall back, which I think is using the same Dragon4 algorithm, shows ~ quadratic dependence on the exponent. So the current slow path should be able to speed up quite a bit.

bmkessler · 2018-07-30T16:31:28Z

Ryū: fast float-to-string conversion, presentation, github

Just came out and is simpler and faster than grisu3 (~3x in the paper's benchmarks). It requires only integer operations, although for float64 it uses 128-bit integers that should be made more efficient by #24813. It also requires a few small lookup tables:

|Type   |  B0|  B1| #Entries| Total Memory|
-------------------------------------------
|Float32|  60|  63|       78|     624 Byte|
|Float64| 124| 124|      617|   9,872 Byte|

Since it is both simpler and faster, I would recommend investigating the ryu algorithm as the replacement for grisu3 instead of errol.

cespare · 2019-01-13T06:53:02Z

I have implemented Ryu in Go at https://github.com/cespare/ryu.

On my machine it is significantly faster than strconv for all the inputs I've tried.

name                                     old time/op    new time/op    delta
FormatFloat32-12                            128ns ± 1%      50ns ± 2%  -60.82%  (p=0.000 n=7+8)
FormatFloat64-12                            129ns ± 4%      65ns ± 5%  -49.54%  (p=0.000 n=7+8)
AppendFloat32/0e+00-12                     24.4ns ± 1%     3.0ns ± 1%  -87.88%  (p=0.000 n=8+8)
AppendFloat32/1e+00-12                     26.5ns ± 1%    13.2ns ± 3%  -49.98%  (p=0.000 n=8+8)
AppendFloat32/3e-01-12                     52.2ns ± 1%    32.5ns ± 2%  -37.73%  (p=0.000 n=8+8)
AppendFloat32/1e+06-12                     41.2ns ± 1%    17.9ns ± 1%  -56.45%  (p=0.000 n=8+7)
AppendFloat32/-1.2345e+02-12               83.3ns ± 2%    34.2ns ± 1%  -58.90%  (p=0.000 n=8+8)
AppendFloat64/0e+00-12                     24.5ns ± 2%     3.3ns ± 2%  -86.50%  (p=0.000 n=8+8)
AppendFloat64/1e+00-12                     26.9ns ± 1%    14.5ns ± 1%  -46.06%  (p=0.001 n=8+6)
AppendFloat64/3e-01-12                     53.0ns ± 1%    42.5ns ± 0%  -19.75%  (p=0.001 n=8+6)
AppendFloat64/1e+06-12                     41.4ns ± 1%    21.1ns ± 1%  -49.05%  (p=0.000 n=8+8)
AppendFloat64/-1.2345e+02-12               83.8ns ± 1%    43.3ns ± 1%  -48.32%  (p=0.000 n=8+8)
AppendFloat64/6.226662346353213e-309-12    25.5µs ± 1%     0.0µs ± 1%  -99.84%  (p=0.000 n=8+8)

(That last one is @dvyukov's pathological case.)

While some inputs are faster than others, there aren't any drastically slower fallback paths.

    ryu_test.go:279: after sampling 50000 float64s:
        ryu:               min = 2ns  max = 90ns     median = 41ns   mean = 41ns
        strconv (stdlib):  min = 8ns  max = 25845ns  median = 106ns  mean = 154ns

It's true that Ryu requires some lookup tables. However, in his C implementation @ulfjack (the Ryu author) has a size-optimized version that uses much smaller lookup tables in exchange for a little more CPU cost. I've implemented that as well and it's still faster than the current strconv implementation in all the cases I've looked at:

name                                     old time/op    new time/op    delta
FormatFloat32-12                            129ns ± 2%      49ns ± 1%  -61.72%  (p=0.000 n=8+8)
FormatFloat64-12                            130ns ± 3%      72ns ± 5%  -44.32%  (p=0.000 n=7+8)
AppendFloat32/0e+00-12                     24.5ns ± 2%     3.0ns ± 1%  -87.83%  (p=0.000 n=8+8)
AppendFloat32/1e+00-12                     26.4ns ± 1%    13.1ns ± 1%  -50.26%  (p=0.000 n=7+8)
AppendFloat32/3e-01-12                     52.6ns ± 2%    32.4ns ± 1%  -38.43%  (p=0.000 n=8+8)
AppendFloat32/1e+06-12                     41.3ns ± 2%    17.6ns ± 1%  -57.51%  (p=0.000 n=8+8)
AppendFloat32/-1.2345e+02-12               83.5ns ± 1%    34.4ns ± 1%  -58.82%  (p=0.000 n=8+8)
AppendFloat64/0e+00-12                     24.6ns ± 2%     3.3ns ± 1%  -86.63%  (p=0.000 n=8+8)
AppendFloat64/1e+00-12                     26.7ns ± 1%    14.6ns ± 4%  -45.51%  (p=0.000 n=8+8)
AppendFloat64/3e-01-12                     52.7ns ± 1%    50.0ns ± 1%   -5.17%  (p=0.000 n=8+8)
AppendFloat64/1e+06-12                     41.2ns ± 1%    21.1ns ± 2%  -48.61%  (p=0.000 n=7+8)
AppendFloat64/-1.2345e+02-12               83.7ns ± 1%    50.9ns ± 1%  -39.17%  (p=0.000 n=8+8)
AppendFloat64/6.226662346353213e-309-12    25.8µs ± 2%     0.0µs ± 1%  -99.81%  (p=0.000 n=8+8)

Based on these promising results and the comments by @bmkessler above, it seems like switching strconv to use Ryu is the best option available to us. I've taken the liberty of retitling the issue accordingly. I intend to work on this during the Go 1.13 cycle.

remyoudompheng · 2019-01-13T09:17:36Z

I'm also in favour of switching. I also have some version on my side in the works, but I'm taking a different path by making it look like the old Grisu code. I'll try to show it somewhere.

cyberphone · 2019-01-13T17:23:28Z

#29491 (comment)

A lot of internal functions are exposed in export_test.go in order to test and bench various levels of function calls. Random tests have been run on a few billion values. Implements golang#15672. Change-Id: I028faa4f97c38f51709469f7314bfd7ec12f06dd

remyoudompheng · 2019-01-16T07:24:29Z

I have pushed my own things in the following branch https://github.com/remyoudompheng/go/tree/ryu

A few notes:

- It is implemented in strconv
- It is implemented from scratch, but of course shows great similarity with the reference implementation and @cespare's version
- It does not support float32 in this version
- I have implemented an addition to it: the fixed precision formatting, which is faster in corner cases (notably 16 digits), and can handle float32
- The code is not optimized with particular tricks for production of individual digits
- I would like to add a function for "atof" using this framework, as I did with Grisu3 in the past; I don't know what kind of speedups we can expect there
- I would like to write unit tests which prove the fundamental mathematical statements of @ulfjack's paper. They might be needed to validate the extension of the algorithm to fixed precision or the validity domain of 128-bit arithmetic when converting the other way (atof).

remyoudompheng · 2019-01-17T07:14:10Z

I have also prepared a bunch of denormals very hard to parse when printed in their shortest form (they are also hard for AppendFloat). The last one is "kind of" pathological but since it has few digits, the existing code handles it fine.

BenchmarkAtof/1.68514038588815e-309-4               	   50000	     18384 ns/op
BenchmarkAtof/9.11691642378e-312-4                  	   50000	     19854 ns/op
BenchmarkAtof/1.62420278e-315-4                     	10000000	        71.4 ns/op

BenchmarkAppendFloatHard/341076211242912p-1074-4   	   50000	     32519 ns/op
BenchmarkAppendFloatHard/1845284427387p-1074-4     	   50000	     32516 ns/op
BenchmarkAppendFloatHard/328742302p-1074-4         	20000000	       103 ns/op

The corresponding numbers are:

9.11691642378e-312 = 0x1ada385d67b.7fffffff5d9...p-1074
1.62420278e-315 = 0x1398359e.7fffe022p-1074

remyoudompheng · 2019-01-17T23:19:15Z

Hello,
I have pushed to my branch preliminary support for ParseFloat (commit 0350964).
When feeding in the shortest representations of float64s, only 93% of them can be processed because I have to add support (and prove correctness) for 17-digit strings.

Benchmarks of the computational part (convert uint64 mantissa and power of 10 exponent to float64).
"Old" is the old code which tries simple float64 multiply, then Grisu, then big integers.
"New" tries only Ryu; note the extremely constant running time.

benchmark                                  old ns/op     new ns/op     delta
BenchmarkAtof/33909e0-4                    5.64          14.9          +164.18%
BenchmarkAtof/3397784e-4-4                 6.58          20.3          +208.51%
BenchmarkAtof/509e73-4                     36.9          18.0          -51.22%
BenchmarkAtof/6226662346353213e-324-4      39.8          18.5          -53.52%
BenchmarkAtof/6808957268280643e116-4       5684          18.0          -99.68%
BenchmarkAtof/4334126125515466e-225-4      9829          18.0          -99.82%
BenchmarkAtof/168514038588815e-323-4       18157         18.6          -99.90%
BenchmarkAtof/911691642378e-323-4          19622         18.6          -99.91%
BenchmarkAtof/162420278e-323-4             40.0          18.5          -53.75%
BenchmarkAtof/22250738585072011e-324-4     40.0          18.6          -53.50%

Benchmark of ParseFloat: in this benchmark the Grisu parsing is replaced by the Ryu routine (keeping the "float64 multiply" fast path).
This implementation does not try long mantissas (so it falls back to multi-precision arithmetic). You can notice the high cost of boilerplate compared to the pure computation.

benchmark                                       old ns/op     new ns/op     delta
BenchmarkAtof/33909-4                           24.1          24.0          -0.41%
BenchmarkAtof/339.7784-4                        29.1          29.1          +0.00%
BenchmarkAtof/-5.09e75-4                        63.4          46.2          -27.13%
BenchmarkAtof/123456789123456789123456789-4     98.8          999           +911.13%
BenchmarkAtof/622666234635.3213e-320-4          82.8          64.1          -22.58%
BenchmarkAtof/33909#01-4                        23.5          23.6          +0.43%
BenchmarkAtof/339.778-4                         26.5          26.4          -0.38%
BenchmarkAtof/12.3456e32-4                      66.7          66.2          -0.75%
BenchmarkAtof/100000000000000016777215-4        844           786           -6.87%
BenchmarkAtof/100000000000000016777216-4        693           643           -7.22%
BenchmarkAtof/6808957268280643e116-4            5836          65.5          -98.88%
BenchmarkAtof/4.334126125515466e-210-4          10045         72.3          -99.28%
BenchmarkAtof/1.68514038588815e-309-4           18462         64.8          -99.65%
BenchmarkAtof/9.11691642378e-312-4              19818         58.8          -99.70%
BenchmarkAtof/1.62420278e-315-4                 71.1          54.4          -23.49%
BenchmarkAtof/2.2250738585072011e-308-4         84.2          67.1          -20.31%

ulfjack · 2019-01-18T11:42:14Z

Nice. Great observation, @remyoudompheng. I haven't published my parsing code yet; really need to finish that. I believe it can be made fast regardless of input string length, i.e., scale linearly.

remyoudompheng · 2019-01-20T13:39:33Z

As of commit remyoudompheng@ed351765ae the Atof implementation is now extended to work whenever the decimal digits form a number that fits in a uint64. It can also handle unambiguous very long inputs by additionally converting the rounded-up mantissa and accepting the result if both conversions agree.
This makes it work with more inputs, including a few exceptional corner cases that I have added to the tests (all rejected by the "reverse Grisu3"):

      // Halfway is 500016682268521616.00000000000001e229
      {"500016682268521616e229", "5.000166822685216e+246", nil}, // 18 digits necessary
      // Halfway is 1873795671212201760.9999999999999998e108
      {"1873795671212201761e108", "1.873795671212202e+126", nil}, // 19 digits (61 bits) necessary
      // Halfway is 10027399025072458413.99999999999998e140
      {"10027399025072458414e140", "1.002739902507246e+159", nil}, // 20 digits (64 bits) necessary

@ulfjack I am interested in knowing whether the "TestRyuNoCarry" is exactly the same proof as in your paper or not.
The implementation is now feature complete compared to what Grisu3 was used for (shortest formatting, fixed formatting, parsing).

Final benchmarks:

BenchmarkAtof/33909-4                           23.2          23.0          -0.86%
BenchmarkAtof/339.7784-4                        29.0          28.7          -1.03%
BenchmarkAtof/-5.09e75-4                        63.8          45.2          -29.15%
BenchmarkAtof/18446744073709551608-4            98.0          68.8          -29.80%
BenchmarkAtof/123456789123456789123456789-4     90.7          82.1          -9.48%
BenchmarkAtof/622666234635.3213e-320-4          80.0          59.1          -26.12%
BenchmarkAtof/33909#01-4                        22.9          22.2          -3.06%
BenchmarkAtof/339.778-4                         26.7          26.8          +0.37%
BenchmarkAtof/12.3456e32-4                      66.2          61.4          -7.25%
BenchmarkAtof/2.3399415873862403e69-4           2836          59.0          -97.92%
BenchmarkAtof/500016682268521616e229-4          19610         67.7          -99.65%
BenchmarkAtof/1873795671212201761e108-4         6383          76.2          -98.81%
BenchmarkAtof/10027399025072458414e140-4        7728          79.2          -98.98%
BenchmarkAtof/100000000000000016777215-4        889           833           -6.30%
BenchmarkAtof/100000000000000016777216-4        768           721           -6.12%
BenchmarkAtof/6808957268280643e116-4            5828          57.7          -99.01%
BenchmarkAtof/4.334126125515466e-210-4          9924          59.6          -99.40%
BenchmarkAtof/1.68514038588815e-309-4           18428         58.8          -99.68%
BenchmarkAtof/9.11691642378e-312-4              19888         55.0          -99.72%
BenchmarkAtof/1.62420278e-315-4                 73.7          54.3          -26.32%
BenchmarkAtof/2.2250738585072011e-308-4         82.7          61.1          -26.12%

ulfjack · 2019-01-20T14:09:08Z

@remyoudompheng I'm afraid I don't understand your question. The proof in the paper only covers binary to decimal conversion, not the other way round. I have since written down an extended proof that shows that the same concepts apply to all source and target bases with minor changes for certain base pairs. I believe that my implementation closely follows the proof.

remyoudompheng · 2019-01-20T22:27:03Z

@ulfjack I meant that this unit test (https://github.com/remyoudompheng/go/blob/ed351765ae8a8307e4df08a24511ec058b7f7ccc/src/strconv/extfloat2_test.go#L478) aims at embedding a proof of what is paragraph 3.2.3 of your paper, to make the code self-contained. I believe it is essentially equivalent.

cespare · 2019-03-04T01:06:52Z

@remyoudompheng and I discussed this and he's going to work on creating the CLs for his implementation in strconv. I may help review the changes. I'm changing the assignment accordingly.

remyoudompheng · 2019-03-24T23:23:04Z

Status update:

I spent some time creating a library of iterators over numbers that are "very hard" to round (https://github.com/remyoudompheng/fptest). That is, numbers that you cannot format by doing simple fixed-precision arithmetic (Grisu3 is 64-bit, Errol is about 104-bit, Ryū is 128-bit). It is an implementation of Farey sequences in Go. It can detect nearly any mistake in rounding or a simple corruption in powers of 10 table (typically not detected by strconv test suite).
I prepared commits nearly ready for submitting CL. I will submit them soon.
- ParseFloat : remyoudompheng@c228d1c
- FormatFloat (fixed digits) : remyoudompheng@7080c0c

Shortest formatting is not yet cleaned up.

remyoudompheng · 2019-03-28T07:52:44Z

Shortest formatting is now included and I added a fix to avoid all long divisions on 32-bit platforms (especially arm where performance was awful). ARM now gets performance gains as well.

The Ryū algorithm as described in a paper by Ulf Adams, "Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369) is better than Grisu3 because it handles all edge cases properly. In Grisu3, about 0.5% of float64 numbers fall back to the slow algorithm, which can be 10-200 times slower.

The core property used by the Ryū algorithm is that using sufficiently large precision for powers of 10 can eliminate all rounding edge cases. Such edge cases can be characterized by an equation of shape:

    m * P <= n * 2^k <= m * (P+1)

where P is the fixed precision truncated version of the power of 10. Solving this equation can be done using Farey sequences to enumerate rationals n/m in the interval [P/2^k, (P+1)/2^k].

The original algorithm describes formatting to shortest decimal representation. This patch implements a variant of this algorithm for atof functions, using the properties:
- 64-bit powers of 10 are enough to handle 31-bit decimal mantissas to parse float32 values
- 128-bit powers of 10 are enough to handle 64-bit decimal mantissas to parse float64 values

Since Grisu3 already uses 64-bit powers of ten, the difference in atof32 is hard to notice, but rather resides in much clearer logic.

Powers of 10 are tabulated and will be reused for the ftoa implementation.

AMD64 benchmarks:

benchmark                       old ns/op     new ns/op     delta
BenchmarkAtof64Decimal          38.6          38.5          -0.26%
BenchmarkAtof64Float            49.9          49.6          -0.60%
BenchmarkAtof64FloatExp         78.5          69.9          -10.96%
BenchmarkAtof64FloatExact       125           141           +12.80%
BenchmarkAtof64Big              148           161           +8.78%
BenchmarkAtof64Hard             9946          120           -98.79%
BenchmarkAtof64RandomBits       70.7          69.1          -2.26%
BenchmarkAtof64RandomFloats     70.4          70.5          +0.14%
BenchmarkAtof32Decimal          40.1          37.5          -6.48%
BenchmarkAtof32Float            48.4          45.3          -6.40%
BenchmarkAtof32FloatExp         87.1          74.1          -14.93%
BenchmarkAtof32FloatHard        951           104           -89.06%
BenchmarkAtof32Random           113           97.1          -14.07%

ARM benchmarks:

benchmark                       old ns/op     new ns/op     delta
BenchmarkAtof64Decimal          670           659           -1.64%
BenchmarkAtof64Float            2082          2050          -1.54%
BenchmarkAtof64FloatExp         1137          1044          -8.18%
BenchmarkAtof64FloatExact       1007          1623          +61.17%
BenchmarkAtof64Big              1179          1361          +15.44%
BenchmarkAtof64Hard             61099         1097          -98.20%
BenchmarkAtof64RandomBits       646           634           -1.86%
BenchmarkAtof64RandomFloats     639           627           -1.88%
BenchmarkAtof32Decimal          823           824           +0.12%
BenchmarkAtof32Float            2398          2364          -1.42%
BenchmarkAtof32FloatExp         1294          1195          -7.65%
BenchmarkAtof32FloatHard        6168          965           -84.35%
BenchmarkAtof32Random           1175          1100          -6.38%

Updates golang#15672

Change-Id: I297f2ffb038d7c4598e1365b61c13b30e9bdd7fc

This patch implements a simplified version of Ulf Adams, "Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369) for formatting floating-point numbers with a fixed number of decimal digits.

It uses the same principles but does not need to handle the complex task of finding a shortest representation.

This allows to handle a few more cases than Grisu3, notably formatting with up to 18 significant digits.

AMD64 benchmarks

benchmark                              old ns/op     new ns/op     delta
BenchmarkAppendFloat/32Fixed8Hard      74.2          47.8          -35.58%
BenchmarkAppendFloat/32Fixed9Hard      77.1          57.6          -25.29%
BenchmarkAppendFloat/64Fixed1          62.1          48.9          -21.26%
BenchmarkAppendFloat/64Fixed2          69.6          49.3          -29.17%
BenchmarkAppendFloat/64Fixed3          63.4          50.6          -20.19%
BenchmarkAppendFloat/64Fixed4          71.5          49.1          -31.33%
BenchmarkAppendFloat/64Fixed12         95.7          71.5          -25.29%
BenchmarkAppendFloat/64Fixed16         1608          63.1          -96.08%
BenchmarkAppendFloat/64Fixed12Hard     1276          60.3          -95.27%
BenchmarkAppendFloat/64Fixed17Hard     4128          68.6          -98.34%
BenchmarkAppendFloat/64Fixed18Hard     4155          4146          -0.22%

ARM benchmarks

benchmark                              old ns/op     new ns/op     delta
BenchmarkAppendFloat/32Fixed8Hard      1045          575           -44.98%
BenchmarkAppendFloat/32Fixed9Hard      1178          996           -15.45%
BenchmarkAppendFloat/64Fixed1          781           786           +0.64%
BenchmarkAppendFloat/64Fixed2          806           694           -13.90%
BenchmarkAppendFloat/64Fixed3          765           723           -5.49%
BenchmarkAppendFloat/64Fixed4          815           648           -20.49%
BenchmarkAppendFloat/64Fixed12         1292          1039          -19.58%
BenchmarkAppendFloat/64Fixed16         20045         1103          -94.50%
BenchmarkAppendFloat/64Fixed12Hard     16041         979           -93.90%
BenchmarkAppendFloat/64Fixed17Hard     50489         1200          -97.62%
BenchmarkAppendFloat/64Fixed18Hard     53500         53630         +0.24%

Updates golang#15672

Change-Id: I160963e141dd48287ad8cf57bcc3c686277788e8

This patch implements the algorithm from Ulf Adams, "Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369) for formatting floating-point numbers with a fixed number of decimal digits.

It is not a direct translation of the reference C implementation but still follows the original paper. In particular, it uses full 128-bit powers of 10, which allows for more precision in the other modes (fixed ftoa, atof).

AMD64 benchmarks

benchmark                                     old ns/op     new ns/op     delta
BenchmarkAppendFloat/Decimal-4                49.9          57.0          +14.23%
BenchmarkAppendFloat/Float-4                  121           89.0          -26.45%
BenchmarkAppendFloat/Exp-4                    89.4          96.4          +7.83%
BenchmarkAppendFloat/NegExp-4                 88.7          93.0          +4.85%
BenchmarkAppendFloat/LongExp-4                142           108           -23.94%
BenchmarkAppendFloat/Big-4                    144           112           -22.22%
BenchmarkAppendFloat/BinaryExp-4              43.0          43.1          +0.23%
BenchmarkAppendFloat/32Integer-4              51.4          56.4          +9.73%
BenchmarkAppendFloat/32ExactFraction-4        95.3          79.4          -16.68%
BenchmarkAppendFloat/32Point-4                121           77.2          -36.20%
BenchmarkAppendFloat/32Exp-4                  87.3          103           +17.98%
BenchmarkAppendFloat/32NegExp-4               87.1          85.2          -2.18%
BenchmarkAppendFloat/32Shortest-4             106           76.2          -28.11%
BenchmarkAppendFloat/Slowpath64-4             1016          95.3          -90.62%
BenchmarkAppendFloat/SlowpathDenormal64-4     32013         86.2          -99.73%

ARM benchmarks

benchmark                                     old ns/op     new ns/op     delta
BenchmarkAppendFloat/Decimal-4                829           678           -18.21%
BenchmarkAppendFloat/Float-4                  1367          1259          -7.90%
BenchmarkAppendFloat/Exp-4                    1100          1338          +21.64%
BenchmarkAppendFloat/NegExp-4                 1097          1336          +21.79%
BenchmarkAppendFloat/LongExp-4                1852          1367          -26.19%
BenchmarkAppendFloat/Big-4                    1885          1621          -14.01%
BenchmarkAppendFloat/BinaryExp-4              1000          966           -3.40%
BenchmarkAppendFloat/32Integer-4              892           737           -17.38%
BenchmarkAppendFloat/32ExactFraction-4        1201          1134          -5.58%
BenchmarkAppendFloat/32Point-4                1439          1085          -24.60%
BenchmarkAppendFloat/32Exp-4                  1130          1372          +21.42%
BenchmarkAppendFloat/32NegExp-4               1128          1126          -0.18%
BenchmarkAppendFloat/32Shortest-4             1368          1069          -21.86%
BenchmarkAppendFloat/Slowpath64-4             28468         1333          -95.32%
BenchmarkAppendFloat/SlowpathDenormal64-4     378975        1291          -99.66%

Fixes golang#15672

Change-Id: Ib90dfa245f62490a6666671896013cf3f9a1fb22

gopherbot · 2019-03-29T07:23:31Z

Change https://golang.org/cl/170078 mentions this issue: strconv: implement Ryū-like algorithm for atof

gopherbot · 2019-03-29T07:23:32Z

Change https://golang.org/cl/170079 mentions this issue: strconv: implement Ryū-like algorithm for fixed precision ftoa

gopherbot · 2019-03-29T07:23:32Z

Change https://golang.org/cl/170080 mentions this issue: strconv: Implement Ryū algorithm for ftoa shortest mode

smasher164 · 2020-02-20T07:55:36Z

Reading @rsc's comment on CL 170078, it looks like these changes were intended to be reviewed early-in-cycle for 1.15. @remyoudompheng is this still the case?

This patch implements a simplified version of Ulf Adams, "Ryū: Fast Float-to-String Conversion" (doi:10.1145/3192366.3192369) for formatting floating-point numbers with a fixed number of decimal digits.

It uses the same principles but does not need to handle the complex task of finding a shortest representation.

This allows to handle a few more cases than Grisu3, notably formatting with up to 18 significant digits.

name                         old time/op  new time/op  delta
AppendFloat/32Fixed8Hard-4   72.0ns ± 2%  56.0ns ± 2%  -22.28%  (p=0.000 n=10+10)
AppendFloat/32Fixed9Hard-4   74.8ns ± 0%  64.2ns ± 2%  -14.16%  (p=0.000 n=8+10)
AppendFloat/64Fixed1-4       60.4ns ± 1%  54.2ns ± 1%  -10.31%  (p=0.000 n=10+9)
AppendFloat/64Fixed2-4       66.3ns ± 1%  53.3ns ± 1%  -19.54%  (p=0.000 n=10+9)
AppendFloat/64Fixed3-4       61.0ns ± 1%  55.0ns ± 2%   -9.80%  (p=0.000 n=9+10)
AppendFloat/64Fixed4-4       66.9ns ± 0%  52.0ns ± 2%  -22.20%  (p=0.000 n=8+10)
AppendFloat/64Fixed12-4      95.5ns ± 1%  76.2ns ± 3%  -20.19%  (p=0.000 n=10+9)
AppendFloat/64Fixed16-4      1.62µs ± 0%  0.07µs ± 2%  -95.69%  (p=0.000 n=10+10)
AppendFloat/64Fixed12Hard-4  1.27µs ± 1%  0.07µs ± 1%  -94.83%  (p=0.000 n=9+9)
AppendFloat/64Fixed17Hard-4  3.68µs ± 1%  0.08µs ± 2%  -97.86%  (p=0.000 n=10+9)
AppendFloat/64Fixed18Hard-4  3.67µs ± 0%  3.72µs ± 1%   +1.44%  (p=0.000 n=9+10)

Updates #15672

Change-Id: I160963e141dd48287ad8cf57bcc3c686277788e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/170079
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Trust: Nigel Tao <nigeltao@golang.org>
Trust: Robert Griesemer <gri@golang.org>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>

bradfitz added the Performance label May 13, 2016

bradfitz added this to the Go1.8Maybe milestone May 13, 2016

rsc changed the title ~~strconv: AppendFloat is super slow for some numbers~~ strconv: implement Errol algorithm to speed float64->decimal conversion Sep 30, 2016

rsc modified the milestones: Unplanned, Go1.8Maybe Sep 30, 2016

cespare changed the title ~~strconv: implement Errol algorithm to speed float64->decimal conversion~~ strconv: implement the Ryu algorithm to speed up float64->decimal conversion Jan 13, 2019

cespare self-assigned this Jan 13, 2019

cespare mentioned this issue Jan 13, 2019

strconv: FormatFloat rounding error #29491

Closed

cespare assigned remyoudompheng and unassigned cespare Mar 4, 2019

vlazar mentioned this issue Nov 6, 2019

Use Ryu algorithm for floating point to string conversation crystal-lang/crystal#8441

Closed

smasher164 mentioned this issue Jan 24, 2020

strconv: inaccurate string to float64 conversion ParseFloat #36657

Closed

bufdev mentioned this issue May 14, 2020

encoding: provide canonical output format golang/protobuf#1121

Open

gopherbot closed this as completed in 61a08fc Apr 15, 2021

bcmills modified the milestones: Unplanned, Go1.17 Jun 17, 2021

wangkechun mentioned this issue Jul 1, 2021

implement the Ryu algorithm to speed up float64->decimal conversion bytedance/sonic#40

Closed

nikolaydubina mentioned this issue May 3, 2022

Should this be merged to Go core? alphadose/ZenQ#1

Open

golang locked and limited conversation to collaborators Jun 17, 2022

gopherbot added the FrozenDueToAge label Jun 17, 2022

rsc unassigned remyoudompheng Jun 23, 2022
