Implement floating point conversion with ryu #222

la-wu · 2020-05-17T22:05:37Z

This PR implements floatDec and doubleDec using the Ryu algorithm described in Ulf Adams' paper. The majority of the decimal-conversion logic is pulled directly from his C implementation, with some wrappers for fixed-format printing to match GHC's show.

The relevant benchmarks (on my i7-8700k) are as follows:

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time                 1.105 ms   (1.097 ms .. 1.111 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.103 ms   (1.100 ms .. 1.108 ms)
std dev              13.34 μs   (10.34 μs .. 20.32 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show float (10000)
time                 9.280 ms   (9.205 ms .. 9.350 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.225 ms   (9.197 ms .. 9.261 ms)
std dev              88.34 μs   (69.95 μs .. 111.7 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 1.135 ms   (1.117 ms .. 1.164 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 1.124 ms   (1.119 ms .. 1.133 ms)
std dev              22.21 μs   (13.56 μs .. 37.51 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show double (10000)
time                 17.69 ms   (17.24 ms .. 18.04 ms)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 17.33 ms   (17.10 ms .. 17.58 ms)
std dev              549.9 μs   (437.0 μs .. 724.7 μs)

Bodigrim · 2020-05-18T00:45:19Z

Nice!

Please take care of compatibility with ancient GHCs.
Huge precomputed tables make me nervous. Are existing property tests strong enough to catch a typo in them?
I personally cannot review C code and, to be honest, would prefer Haskell, even if it means taking a moderate performance hit. But let's wait for others to opine on this matter.

- instead of relying on semigroup for append (since semigroup is not supported until base 4.9.0

- CBool was introduced in 4.10.0

- top bits used in 64-bit multiplication (timesWord2#) are not exposed until then

This reverts commit 9f00d7f.

la-wu · 2020-05-18T03:19:40Z

Done. This was more painful than anticipated. What do you recommend for testing old builds locally? I started trying to build GHC-7.0.4 myself and quickly felt the pain.
The tables themselves are code-gen'd. I'll take a look at adding some tests to specifically exercise all the indices.
Sure. To me, the advantage of the C code is that it has been more thoroughly verified (systematically across floats instead of unit tests / quickcheck). This might be a non-issue once I get through 2. though. On semi-random cases, the performance improvement is roughly 15%, but may diverge on corner cases.

Bodigrim · 2020-05-18T22:46:14Z

I feel your pain. I'm too lazy to setup GHC 7.0 locally, so usually just force-push until it works (and swear when it does not).
Sure, I expect such tables to be generated and not god sent, but it does not really help to validate. Developing a bulletproof test suite for double-to-string conversions is beneficial for wider audience.
To be honest, I would readily sacrifice 15% and even more to avoid non-trivial C code.

la-wu · 2020-05-19T00:21:27Z

How important is maintaining exact conformance to show? I started some validation against it as a first-step for correctness and quickly discovered:

show does round-up instead of round-even, which I believe introduces a rounding bias. This behavior is easy to match.
show does not return the shortest-possible representation, which is arguably incorrect. Specifically, I dug into the floatToDigits method in Numeric that I believe implements FP show and that uses the grisu algorithm without fallback (variant 2, which 'fails' on some small percentage of cases).

Bodigrim · 2020-05-19T00:48:38Z

Good points! It would be quite unexpected to diverge from show, but if the latter is arguably incorrect, it can potentially be fixed in base. Could you possibly raise these shortcomings of instance Show Double on a wider distribution list? Haskell subreddit and/or Libraries maillist, I suggest.

la-wu · 2020-05-19T06:36:14Z

I had too many papers and sources open at that same time and confused myself... In correction: floatToDigits uses a variation on dragon but (perhaps intentionally?) omits the round-to-even step and that ultimately results in sub-optimal encoding length. I modified Ryu (locally) to use the same rounding method, and confirmed through exhaustive comparison of all floats that the outputs match. I still intend to bring up the shortcoming on the maillist. What are your thoughts @Bodigrim

Bodigrim · 2020-05-19T22:04:49Z

I am no expert in this area, unfortunately.

There are several attack vectors:

I would probably start with raising questions about show :: Double -> String in base. It is worth exploring whether folks are happy to change it (including round-to-even step or whatever is desirable) or no.
Submit as a separate PR comprehensive tests. Whatever happens next, such tests will be helpful for this and any future implementations of float-to-string.
Another option is to make a companion package bytestring-ryu to provide fast floatDec :: Float -> Builder and doubleDec :: Double -> Builder. Are there enough internals exposed? Being a separate package allows to put a big red warning that this is not exactly the same conversion as show does.

I don't want to discourage you from contributing to bytestring. But as you might guess from #115, submitting a huge chunk of code makes maintainers nervous and takes ages to review and merge. That's because the price of a mistake is extremely high, and it is difficult to roll out a hot fix of the package, tied to GHC.

This is why I would go with bytestring-ryu, promote it, propose other packages to use it, gather feedback, make couple of iterations. (I'm happy to help at any stage, if needed.) This will give a much better ground to adopt it as a standard algorithm in bytestring later.

Data/ByteString/Builder/RealFloat.hs

vdukhovni · 2020-05-25T20:54:24Z

cbits/ftoa.c

+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)


Are either of the licenses acceptable for inclusion in bytestring

Prefacing that I am not an expert in licenses but they seem to both be quite permissive. I believe the conditions for Apache are just that we include the license and note that copyright and modifications, which have been done.

cbits/dtoa.c

- avoid clash with Builder.Prim.toB (aka liftFixedToBounded)

la-wu · 2020-05-28T21:45:26Z

Thanks @vdukhovni for the first review. I agree with most of your points and have made the relevant changes. Unfortunately, it looks like RealFloat.hsc file is not picked up by cabal's test suite anymore. I spent a while trying to poke around and fix it but could not find a solution. Do you know what a solution might be?

Data/ByteString/Builder/RealFloat.hsc

vdukhovni · 2020-05-28T22:11:59Z

cbits/ryu.h

@@ -0,0 +1,4 @@
+
+#define F2S_MAX_DIGITS 16
+#define D2S_MAX_DIGITS 25


Does the C code use these anywhere? Just moving the magic to a C header does not entirely solve the problem.

Not currently, since the buffers are provided to it. Would you prefer they be spelled out in terms of the components?

I was looking for evidence in the C code that to_chars() will write at most that much output... How do we know that 16 or 25 are the right limits?

Good point... I'm not sure they easily extractible from the algorithm but they rely on the following (which I can put in a comment?)

The maximal number of decimal digits needed to encode an n-bit floating point (which is 9 and 17 for float and decimal respectively). These bounds are 'well known' but are derived from In-and-Out conversions. which state the binary-decimal-binary round-trip can be satisfied for an n digit binary number in ceil(n*(log(2)/log(10))) + 1.

That Ryu is correct and produces the shortest representation (in general, but especially for max-length values)

The decimalLength9 and decimalLength17 functions also implicitly rely on these facts. The other digits come from punctuation (-, ., e, - again in exponent) and the exponent (-38<->+38 for floats and -308<->+308 for doubles)

Updated 4db3078

Data/ByteString/Builder.hs

Data/ByteString/Builder/RealFloat.hsc

vdukhovni · 2020-05-28T23:45:38Z

Thanks @vdukhovni for the first review. I agree with most of your points and have made the relevant changes. Unfortunately, it looks like RealFloat.hsc file is not picked up by cabal's test suite anymore. I spent a while trying to poke around and fix it but could not find a solution. Do you know what a solution might be?

For me, the tests ran once I added 'Data.ByteString.Builder.RealFloat' to other-modules (in tests/bytestring-test.cabal under test-builder), but see #227, which rebased to and ran the tests via cabal configure; cabal test in the test directory. That may not be needed, but I also wanted to check that it would work...

vdukhovni · 2020-05-28T23:46:54Z

I am still concerned about the license issue. It is one thing to say the licenses are compatible, and quite another to say bytestring is covered by the BSD-3-Clause license if it includes Apache2 licensed code...

- solution by vdukhovni

- duplicate definitions in RealFloat instead to avoid side-effects

- not much gained, but requires PatternGuards extension and Haskell2010

la-wu · 2020-05-29T02:17:24Z

I see. I will have to defer to others on the licensing issue. FWIW, the C-code is also licensed under the BSL.

vdukhovni · 2020-05-29T02:30:43Z

Your CI will work much better once #227 is merged. Perhaps wait until that happens and rebase.

I'd like to hear from others about the license. @hvr, @cartazio ?

vdukhovni · 2020-05-30T00:05:38Z

For me, the tests ran once I added 'Data.ByteString.Builder.RealFloat' to other-modules (in tests/bytestring-test.cabal under test-builder), but see #227, which rebased to and ran the tests via cabal configure; cabal test in the test directory. That may not be needed, but I also wanted to check that it would work...

I've reworked #227 per feedback from @sjakobi and now all required modules once again need to be listed in both .cabal files (i.e. in each of the top and tests/ directories). Otherwise, the tests should go fine. I do see that the BSL license is quite permissive, worst case if this prevents inclusion in bytestring you can create a parallel library with essentially the same content, but bytestring as a dependency.

I hope you'll find the review feedback helpful either way.

fumieval · 2020-07-13T15:23:10Z

Data/ByteString/Builder/RealFloat.hsc

+{-# INLINE ryu_d2s_to_chars #-}
+ryu_d2s_to_chars :: Word64 -> Int32 -> Bool -> ByteString
+ryu_d2s_to_chars m e s = unsafeDupablePerformIO $ do
+    fp <- mallocByteString d2s_max_digits :: IO (ForeignPtr Word8)


Something like boundedPrim 20 would make it even faster, because it doesn't allocate a ByteString. Same goes for ryu_f2s, ryu_d2s and ryu_f2s_to_chars.

@Lumaere Can you make them return BoundedPrim instead of ByteString? I think it's going to make a big difference

Sure. Added in b3a033d. The new results on my PC are

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000) time 992.7 μs (984.3 μs .. 1.001 ms) 0.999 R² (0.999 R² .. 1.000 R²) mean 987.9 μs (982.7 μs .. 998.0 μs) std dev 22.92 μs (12.41 μs .. 36.71 μs) variance introduced by outliers: 12% (moderately inflated) benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000) time 1.071 ms (1.066 ms .. 1.076 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.073 ms (1.070 ms .. 1.076 ms) std dev 11.26 μs (8.056 μs .. 15.52 μs)

fumieval · 2020-09-11T02:21:06Z

Data/ByteString/Builder/RealFloat.hsc

+{-# INLINE ryu_d2s_to_chars #-}
+ryu_d2s_to_chars :: Word64 -> Int32 -> Bool -> ByteString
+ryu_d2s_to_chars m e s = unsafeDupablePerformIO $ do
+    fp <- mallocByteString d2s_max_digits :: IO (ForeignPtr Word8)


@Lumaere Can you make them return BoundedPrim instead of ByteString? I think it's going to make a big difference

- ~5-10% speedup on foldMap *Dec cases

fumieval · 2020-09-16T04:09:09Z

bench/bench-bytestring.cabal

  include-dirs:     ../include
+                    ../cbits


I think the same change needs to be applied to bench-builder-csv and bench-strict-indices below.

@Lumaere Ping. I think you need to update include-dirs for bench-builder-csv and bench-strict-indices to make CI green.

Bodigrim · 2021-01-06T19:25:01Z

Good news are that bytestring development has less latency now, and, I think, we'll be able to merge faster floating point conversions directly into master, once ready.

Bad news are that I cannot review or maintain such amount of C code. Neither CLC can be made responsible to provide maintainers with C skills in future. As noted above, I'm absolutely happy to lose 20-25% of performance in comparison to C version.

@Lumaere how does it sound?

fumieval · 2021-01-20T01:22:28Z

I don't think you have to maintain it; it's a static, independent code. If you strongly object, we could release it as separate package; this is a very useful gem and it's a shame to leave it here for a long time.

Bodigrim · 2021-01-20T02:15:45Z

If you strongly object, we could release it as separate package; this is a very useful gem and it's a shame to leave it here for a long time.

I agree that it is a shame, and believe me I do not like to see such amount of effort wasted. However, my position about large amounts of C was declared in the first reply and, unfortunately, has not changed.

@Lumaere has originally written a native Haskell implementation: https://github.com/Lumaere/ryu. From my perspective productionalizing it would be an ideal solution, which could make way into bytestring.

I don't mind (and don't have any right or reason to mind) if someone releases this PR as a separate package; I have encouraged to do so above.

Bodigrim · 2021-02-26T18:53:58Z

Closing, superseded by #365.

Implement floating point conversion with ryu

f552c28

la-wu mentioned this pull request May 17, 2020

Reimplement floatDec/doubleDec #115

Closed

la-wu added 8 commits May 17, 2020 19:53

Use Monoid builder instance

f599009

- instead of relying on semigroup for append (since semigroup is not supported until base 4.9.0

Remove CBool dependency

025655f

- CBool was introduced in 4.10.0

Pick correct import module

ea673ef

Import monoid and applicative as needed

6330666

Remove dependency on c-types constructors

719c169

Constrain ghc-prim version to >= 0.3

9f00d7f

- top bits used in 64-bit multiplication (timesWord2#) are not exposed until then

Revert "Constrain ghc-prim version to >= 0.3"

e7c82ed

This reverts commit 9f00d7f.

Use prelude quotRem for older GHCs

05a292d

vdukhovni reviewed May 25, 2020

View reviewed changes

la-wu added 8 commits May 28, 2020 13:41

Restrict input types to indicate expected bounds

96f30c7

Use mconcat where canonical

2202a4a

Rename local function

44336fc

- avoid clash with Builder.Prim.toB (aka liftFixedToBounded)

Note function prerequisites

285d60b

Simplify Bool-to-C conversion

0c9f43b

Name constants for max formatted float lengths

bf9c410

Consolidate common ryu c functions

5bbad32

Guard fast 64-bit division

e672ee4

vdukhovni suggested changes May 28, 2020

View reviewed changes

vdukhovni reviewed May 28, 2020

View reviewed changes

Data/ByteString/Builder.hs Outdated Show resolved Hide resolved

vdukhovni reviewed May 28, 2020

View reviewed changes

Data/ByteString/Builder/RealFloat.hsc Outdated Show resolved Hide resolved

la-wu added 4 commits May 28, 2020 19:58

Fix test build

5c3c2cd

- solution by vdukhovni

Revert removal of char7 and string7 from Builder

7b6bc51

- duplicate definitions in RealFloat instead to avoid side-effects

Add magic constant explanation, use non-null-terminated values

4db3078

Remove PatternGuards dependency

12ac021

- not much gained, but requires PatternGuards extension and Haskell2010

fumieval reviewed Jul 13, 2020

View reviewed changes

Bodigrim added performance floating-point labels Aug 21, 2020

fumieval suggested changes Sep 11, 2020

View reviewed changes

la-wu added 4 commits September 13, 2020 17:48

Add missing deps to fix bench build

e6b701b

Wrap formatting calls with BoundedPrim

b3a033d

- ~5-10% speedup on foldMap *Dec cases

Merge remote-tracking branch 'upstream/master'

6c88a2c

Rename boudedPrim usage to boundedPrim

3094eee

fumieval suggested changes Sep 16, 2020

View reviewed changes

Bodigrim added the blocked: patch-needed somebody needs to write a patch or contribute code label Jan 10, 2021

la-wu mentioned this pull request Feb 22, 2021

Implement floating point conversion with ryu #365

Merged

Bodigrim closed this Feb 26, 2021

la-wu mentioned this pull request Jun 18, 2021

Floating point conversion tests #402

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement floating point conversion with ryu #222

Implement floating point conversion with ryu #222

la-wu commented May 17, 2020

Bodigrim commented May 18, 2020

la-wu commented May 18, 2020

Bodigrim commented May 18, 2020

la-wu commented May 19, 2020 •

edited

Loading

Bodigrim commented May 19, 2020

la-wu commented May 19, 2020

Bodigrim commented May 19, 2020

vdukhovni May 25, 2020

la-wu May 28, 2020

la-wu commented May 28, 2020

vdukhovni May 28, 2020

la-wu May 28, 2020

vdukhovni May 28, 2020

la-wu May 28, 2020

la-wu May 29, 2020

vdukhovni commented May 28, 2020

vdukhovni commented May 28, 2020

la-wu commented May 29, 2020

vdukhovni commented May 29, 2020

vdukhovni commented May 30, 2020

fumieval Jul 13, 2020

fumieval Sep 11, 2020

la-wu Sep 13, 2020

fumieval Sep 16, 2020

fumieval Sep 11, 2020

fumieval Sep 16, 2020

fumieval Dec 30, 2020

Bodigrim commented Jan 6, 2021

fumieval commented Jan 20, 2021

Bodigrim commented Jan 20, 2021

Bodigrim commented Feb 26, 2021

		@@ -0,0 +1,4 @@

		#define F2S_MAX_DIGITS 16
		#define D2S_MAX_DIGITS 25

Implement floating point conversion with ryu #222

Implement floating point conversion with ryu #222

Conversation

la-wu commented May 17, 2020

Bodigrim commented May 18, 2020

la-wu commented May 18, 2020

Bodigrim commented May 18, 2020

la-wu commented May 19, 2020 • edited Loading

Bodigrim commented May 19, 2020

la-wu commented May 19, 2020

Bodigrim commented May 19, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

la-wu commented May 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vdukhovni commented May 28, 2020

vdukhovni commented May 28, 2020

la-wu commented May 29, 2020

vdukhovni commented May 29, 2020

vdukhovni commented May 30, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bodigrim commented Jan 6, 2021

fumieval commented Jan 20, 2021

Bodigrim commented Jan 20, 2021

Bodigrim commented Feb 26, 2021

la-wu commented May 19, 2020 •

edited

Loading