Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement floating point conversion with ryu #222

Closed
wants to merge 25 commits into from

Conversation

la-wu
Copy link
Contributor

@la-wu la-wu commented May 17, 2020

This PR implements floatDec and doubleDec using the Ryu algorithm described in Ulf Adams' paper. The majority of the decimal-conversion logic is pulled directly from his C implementation, with some wrappers for fixed-format printing to match GHC's show.

The relevant benchmarks (on my i7-8700k) are as follows:

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time                 1.105 ms   (1.097 ms .. 1.111 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.103 ms   (1.100 ms .. 1.108 ms)
std dev              13.34 μs   (10.34 μs .. 20.32 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show float (10000)
time                 9.280 ms   (9.205 ms .. 9.350 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.225 ms   (9.197 ms .. 9.261 ms)
std dev              88.34 μs   (69.95 μs .. 111.7 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 1.135 ms   (1.117 ms .. 1.164 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 1.124 ms   (1.119 ms .. 1.133 ms)
std dev              22.21 μs   (13.56 μs .. 37.51 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show double (10000)
time                 17.69 ms   (17.24 ms .. 18.04 ms)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 17.33 ms   (17.10 ms .. 17.58 ms)
std dev              549.9 μs   (437.0 μs .. 724.7 μs)

@Bodigrim
Copy link
Contributor

Nice!

  1. Please take care of compatibility with ancient GHCs.
  2. Huge precomputed tables make me nervous. Are existing property tests strong enough to catch a typo in them?
  3. I personally cannot review C code and, to be honest, would prefer Haskell, even if it means taking a moderate performance hit. But let's wait for others to opine on this matter.

la-wu added 8 commits May 17, 2020 19:53
- instead of relying on semigroup for append (since semigroup is not
  supported until base 4.9.0
- CBool was introduced in 4.10.0
- top bits used in 64-bit multiplication (timesWord2#) are not exposed
  until then
@la-wu
Copy link
Contributor Author

la-wu commented May 18, 2020

  1. Done. This was more painful than anticipated. What do you recommend for testing old builds locally? I started trying to build GHC-7.0.4 myself and quickly felt the pain.
  2. The tables themselves are code-gen'd. I'll take a look at adding some tests to specifically exercise all the indices.
  3. Sure. To me, the advantage of the C code is that it has been more thoroughly verified (systematically across floats instead of unit tests / quickcheck). This might be a non-issue once I get through 2. though. On semi-random cases, the performance improvement is roughly 15%, but may diverge on corner cases.

@Bodigrim
Copy link
Contributor

  1. I feel your pain. I'm too lazy to setup GHC 7.0 locally, so usually just force-push until it works (and swear when it does not).

  2. Sure, I expect such tables to be generated and not god sent, but it does not really help to validate. Developing a bulletproof test suite for double-to-string conversions is beneficial for wider audience.

  3. To be honest, I would readily sacrifice 15% and even more to avoid non-trivial C code.

@la-wu
Copy link
Contributor Author

la-wu commented May 19, 2020

How important is maintaining exact conformance to show? I started some validation against it as a first-step for correctness and quickly discovered:

  1. show does round-up instead of round-even, which I believe introduces a rounding bias. This behavior is easy to match.
  2. show does not return the shortest-possible representation, which is arguably incorrect. Specifically, I dug into the floatToDigits method in Numeric that I believe implements FP show and that uses the grisu algorithm without fallback (variant 2, which 'fails' on some small percentage of cases).

@Bodigrim
Copy link
Contributor

Good points! It would be quite unexpected to diverge from show, but if the latter is arguably incorrect, it can potentially be fixed in base. Could you possibly raise these shortcomings of instance Show Double on a wider distribution list? Haskell subreddit and/or Libraries maillist, I suggest.

@la-wu
Copy link
Contributor Author

la-wu commented May 19, 2020

I had too many papers and sources open at that same time and confused myself... In correction: floatToDigits uses a variation on dragon but (perhaps intentionally?) omits the round-to-even step and that ultimately results in sub-optimal encoding length. I modified Ryu (locally) to use the same rounding method, and confirmed through exhaustive comparison of all floats that the outputs match. I still intend to bring up the shortcoming on the maillist. What are your thoughts @Bodigrim

@Bodigrim
Copy link
Contributor

I am no expert in this area, unfortunately.

There are several attack vectors:

  1. I would probably start with raising questions about show :: Double -> String in base. It is worth exploring whether folks are happy to change it (including round-to-even step or whatever is desirable) or no.

  2. Submit as a separate PR comprehensive tests. Whatever happens next, such tests will be helpful for this and any future implementations of float-to-string.

  3. Another option is to make a companion package bytestring-ryu to provide fast floatDec :: Float -> Builder and doubleDec :: Double -> Builder. Are there enough internals exposed? Being a separate package allows to put a big red warning that this is not exactly the same conversion as show does.

I don't want to discourage you from contributing to bytestring. But as you might guess from #115, submitting a huge chunk of code makes maintainers nervous and takes ages to review and merge. That's because the price of a mistake is extremely high, and it is difficult to roll out a hot fix of the package, tied to GHC.

This is why I would go with bytestring-ryu, promote it, propose other packages to use it, gather feedback, make couple of iterations. (I'm happy to help at any stage, if needed.) This will give a much better ground to adopt it as a standard algorithm in bytestring later.

Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved
// Alternatively, the contents of this file may be used under the terms of
// the Boost Software License, Version 1.0.
// (See accompanying file LICENSE-Boost or copy at
// https://www.boost.org/LICENSE_1_0.txt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are either of the licenses acceptable for inclusion in bytestring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefacing that I am not an expert in licenses but they seem to both be quite permissive. I believe the conditions for Apache are just that we include the license and note that copyright and modifications, which have been done.

cbits/dtoa.c Show resolved Hide resolved
@la-wu
Copy link
Contributor Author

la-wu commented May 28, 2020

Thanks @vdukhovni for the first review. I agree with most of your points and have made the relevant changes. Unfortunately, it looks like RealFloat.hsc file is not picked up by cabal's test suite anymore. I spent a while trying to poke around and fix it but could not find a solution. Do you know what a solution might be?

Data/ByteString/Builder/RealFloat.hsc Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hsc Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hsc Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hsc Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hsc Show resolved Hide resolved
Data/ByteString/Builder/RealFloat.hsc Show resolved Hide resolved
cbits/ryu.h Outdated
@@ -0,0 +1,4 @@

#define F2S_MAX_DIGITS 16
#define D2S_MAX_DIGITS 25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the C code use these anywhere? Just moving the magic to a C header does not entirely solve the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently, since the buffers are provided to it. Would you prefer they be spelled out in terms of the components?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for evidence in the C code that to_chars() will write at most that much output... How do we know that 16 or 25 are the right limits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... I'm not sure they easily extractible from the algorithm but they rely on the following (which I can put in a comment?)

  1. The maximal number of decimal digits needed to encode an n-bit floating point (which is 9 and 17 for float and decimal respectively). These bounds are 'well known' but are derived from In-and-Out conversions. which state the binary-decimal-binary round-trip can be satisfied for an n digit binary number in ceil(n*(log(2)/log(10))) + 1.
  2. That Ryu is correct and produces the shortest representation (in general, but especially for max-length values)

The decimalLength9 and decimalLength17 functions also implicitly rely on these facts. The other digits come from punctuation (-, ., e, - again in exponent) and the exponent (-38<->+38 for floats and -308<->+308 for doubles)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 4db3078

@vdukhovni
Copy link
Contributor

Thanks @vdukhovni for the first review. I agree with most of your points and have made the relevant changes. Unfortunately, it looks like RealFloat.hsc file is not picked up by cabal's test suite anymore. I spent a while trying to poke around and fix it but could not find a solution. Do you know what a solution might be?

For me, the tests ran once I added 'Data.ByteString.Builder.RealFloat' to other-modules (in tests/bytestring-test.cabal under test-builder), but see #227, which rebased to and ran the tests via cabal configure; cabal test in the test directory. That may not be needed, but I also wanted to check that it would work...

@vdukhovni
Copy link
Contributor

I am still concerned about the license issue. It is one thing to say the licenses are compatible, and quite another to say bytestring is covered by the BSD-3-Clause license if it includes Apache2 licensed code...

la-wu added 4 commits May 28, 2020 19:58
- solution by vdukhovni
- duplicate definitions in RealFloat instead to avoid side-effects
- not much gained, but requires PatternGuards extension and Haskell2010
@la-wu
Copy link
Contributor Author

la-wu commented May 29, 2020

I see. I will have to defer to others on the licensing issue. FWIW, the C-code is also licensed under the BSL.

@vdukhovni
Copy link
Contributor

Your CI will work much better once #227 is merged. Perhaps wait until that happens and rebase.

I'd like to hear from others about the license. @hvr, @cartazio ?

@vdukhovni
Copy link
Contributor

For me, the tests ran once I added 'Data.ByteString.Builder.RealFloat' to other-modules (in tests/bytestring-test.cabal under test-builder), but see #227, which rebased to and ran the tests via cabal configure; cabal test in the test directory. That may not be needed, but I also wanted to check that it would work...

I've reworked #227 per feedback from @sjakobi and now all required modules once again need to be listed in both .cabal files (i.e. in each of the top and tests/ directories). Otherwise, the tests should go fine. I do see that the BSL license is quite permissive, worst case if this prevents inclusion in bytestring you can create a parallel library with essentially the same content, but bytestring as a dependency.

I hope you'll find the review feedback helpful either way.

{-# INLINE ryu_d2s_to_chars #-}
ryu_d2s_to_chars :: Word64 -> Int32 -> Bool -> ByteString
ryu_d2s_to_chars m e s = unsafeDupablePerformIO $ do
fp <- mallocByteString d2s_max_digits :: IO (ForeignPtr Word8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like boundedPrim 20 would make it even faster, because it doesn't allocate a ByteString. Same goes for ryu_f2s, ryu_d2s and ryu_f2s_to_chars.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lumaere Can you make them return BoundedPrim instead of ByteString? I think it's going to make a big difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Added in b3a033d. The new results on my PC are

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time                 992.7 μs   (984.3 μs .. 1.001 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 987.9 μs   (982.7 μs .. 998.0 μs)
std dev              22.92 μs   (12.41 μs .. 36.71 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 1.071 ms   (1.066 ms .. 1.076 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.073 ms   (1.070 ms .. 1.076 ms)
std dev              11.26 μs   (8.056 μs .. 15.52 μs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

{-# INLINE ryu_d2s_to_chars #-}
ryu_d2s_to_chars :: Word64 -> Int32 -> Bool -> ByteString
ryu_d2s_to_chars m e s = unsafeDupablePerformIO $ do
fp <- mallocByteString d2s_max_digits :: IO (ForeignPtr Word8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lumaere Can you make them return BoundedPrim instead of ByteString? I think it's going to make a big difference

include-dirs: ../include
../cbits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same change needs to be applied to bench-builder-csv and bench-strict-indices below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lumaere Ping. I think you need to update include-dirs for bench-builder-csv and bench-strict-indices to make CI green.

@Bodigrim
Copy link
Contributor

Bodigrim commented Jan 6, 2021

Good news are that bytestring development has less latency now, and, I think, we'll be able to merge faster floating point conversions directly into master, once ready.

Bad news are that I cannot review or maintain such amount of C code. Neither CLC can be made responsible to provide maintainers with C skills in future. As noted above, I'm absolutely happy to lose 20-25% of performance in comparison to C version.

@Lumaere how does it sound?

@Bodigrim Bodigrim added the blocked: patch-needed somebody needs to write a patch or contribute code label Jan 10, 2021
@fumieval
Copy link
Contributor

I don't think you have to maintain it; it's a static, independent code. If you strongly object, we could release it as separate package; this is a very useful gem and it's a shame to leave it here for a long time.

@Bodigrim
Copy link
Contributor

If you strongly object, we could release it as separate package; this is a very useful gem and it's a shame to leave it here for a long time.

I agree that it is a shame, and believe me I do not like to see such amount of effort wasted. However, my position about large amounts of C was declared in the first reply and, unfortunately, has not changed.

@Lumaere has originally written a native Haskell implementation: https://github.com/Lumaere/ryu. From my perspective productionalizing it would be an ideal solution, which could make way into bytestring.

I don't mind (and don't have any right or reason to mind) if someone releases this PR as a separate package; I have encouraged to do so above.

@Bodigrim
Copy link
Contributor

Closing, superseded by #365.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked: patch-needed somebody needs to write a patch or contribute code floating-point performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants