Implement floating point conversion with ryu #222
Nice!
- instead of relying on semigroup for append (since semigroup is not supported until base 4.9.0)
- CBool was introduced in 4.10.0
- top bits used in 64-bit multiplication (timesWord2#) are not exposed until then
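For readers unfamiliar with the "top bits" issue in the commit message above: the high 64 bits of a 64x64-bit product can be computed portably from four 32-bit partial products. The following is only a C sketch of that general technique; the name `mul_hi64` is hypothetical and is not taken from this PR or from the Ryu sources.

```c
#include <stdint.h>

/* Hypothetical helper: returns the upper 64 bits of the 128-bit product a * b,
   using only 64-bit arithmetic on 32-bit halves. */
uint64_t mul_hi64(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p00 = a_lo * b_lo;  /* contributes to bits 0..63   */
    uint64_t p01 = a_lo * b_hi;  /* contributes to bits 32..95  */
    uint64_t p10 = a_hi * b_lo;  /* contributes to bits 32..95  */
    uint64_t p11 = a_hi * b_hi;  /* contributes to bits 64..127 */

    /* Sum everything that lands in bits 32..63; the carry out of this
       sum (at most 2) is what spills into the high word. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

The intermediate `mid` cannot overflow, because it is the sum of three values that are each below 2^32.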
This reverts commit 9f00d7f.
How important is maintaining exact conformance to
Good points! It would be quite unexpected to diverge from |
I had too many papers and sources open at the same time and confused myself... In correction: |
I am no expert in this area, unfortunately. There are several attack vectors:
I don't want to discourage you from contributing to This is why I would go with |
// Alternatively, the contents of this file may be used under the terms of
// the Boost Software License, Version 1.0.
// (See accompanying file LICENSE-Boost or copy at
// https://www.boost.org/LICENSE_1_0.txt)
Is either of the licenses acceptable for inclusion in bytestring?
Prefacing that I am not an expert in licenses, but they both seem to be quite permissive. I believe the conditions for Apache are just that we include the license and note the copyright and any modifications, which has been done.
- avoid clash with Builder.Prim.toB (aka liftFixedToBounded)
Thanks @vdukhovni for the first review. I agree with most of your points and have made the relevant changes. Unfortunately, it looks like |
cbits/ryu.h
Outdated
@@ -0,0 +1,4 @@
#define F2S_MAX_DIGITS 16
#define D2S_MAX_DIGITS 25
Does the C code use these anywhere? Just moving the magic to a C header does not entirely solve the problem.
Not currently, since the buffers are provided to it. Would you prefer they be spelled out in terms of the components?
I was looking for evidence in the C code that `to_chars()` will write at most that much output... How do we know that `16` or `25` are the right limits?
Good point... I'm not sure they are easily extractable from the algorithm, but they rely on the following (which I can put in a comment?):
- The maximal number of decimal digits needed to encode an n-bit floating point (which is 9 and 17 for float and double respectively). These bounds are 'well known' but are derived from "In-and-Out Conversions", which states that the binary-decimal-binary round-trip can be satisfied for an `n`-digit binary number in `ceil(n*(log(2)/log(10))) + 1` decimal digits.
- That Ryu is correct and produces the shortest representation (in general, but especially for max-length values).
The `decimalLength9` and `decimalLength17` functions also implicitly rely on these facts. The other digits come from punctuation (`-`, `.`, `e`, `-` again in exponent) and the exponent (-38<->+38 for floats and -308<->+308 for doubles).
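To make the arithmetic in this comment concrete, here is a small C sketch that recomputes the budget from the stated components. The function names are hypothetical and are not part of the PR. It assumes `n` is the significand width (24 bits for float, 53 for double) and approximates log10(2) as 30103/100000, which is an assumption that holds for these small inputs but is not a general-purpose `ceil`.

```c
/* ceil(n * log10(2)) + 1, with log10(2) approximated as 30103/100000
   (assumption: adequate for the small significand widths used here). */
int max_round_trip_digits(int significand_bits) {
    int ceil_part = (significand_bits * 30103 + 99999) / 100000;
    return ceil_part + 1;
}

/* Worst-case characters in scientific notation:
   sign, digits, '.', 'e', exponent sign, exponent digits. */
int max_output_chars(int significand_bits, int exponent_digits) {
    return 1                                   /* leading '-'       */
         + max_round_trip_digits(significand_bits)
         + 1                                   /* '.'               */
         + 1                                   /* 'e'               */
         + 1                                   /* '-' in exponent   */
         + exponent_digits;                    /* 2 for float, 3 for double */
}
```

Under these assumptions the components add up to 15 characters for float and 24 for double, one less than `F2S_MAX_DIGITS` (16) and `D2S_MAX_DIGITS` (25). The comment does not say what the remaining byte is for; a terminator byte would be one plausible reading, but that is a guess.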
Updated 4db3078
For me, the tests ran once I added 'Data.ByteString.Builder.RealFloat' to |
I am still concerned about the license issue. It is one thing to say the licenses are compatible, and quite another to say |
- solution by vdukhovni
- duplicate definitions in RealFloat instead to avoid side-effects
- not much gained, but requires PatternGuards extension and Haskell2010
I see. I will have to defer to others on the licensing issue. FWIW, the C code is also licensed under the BSL.
I've reworked #227 per feedback from @sjakobi and now all required modules once again need to be listed in both .cabal files (i.e. in each of the top and I hope you'll find the review feedback helpful either way. |
{-# INLINE ryu_d2s_to_chars #-}
ryu_d2s_to_chars :: Word64 -> Int32 -> Bool -> ByteString
ryu_d2s_to_chars m e s = unsafeDupablePerformIO $ do
    fp <- mallocByteString d2s_max_digits :: IO (ForeignPtr Word8)
Something like `boundedPrim 20` would make it even faster, because it doesn't allocate a ByteString. Same goes for `ryu_f2s`, `ryu_d2s` and `ryu_f2s_to_chars`.
@Lumaere Can you make them return `BoundedPrim` instead of `ByteString`? I think it's going to make a big difference.
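The idea behind this suggestion, as I understand it, is to pair a fixed upper bound on the output size with a function that writes straight into a buffer the caller has already reserved, returning the new end pointer, so no intermediate string is allocated per value. The C sketch below is only an analogy of that pattern; `write_u32_dec` and `U32_MAX_DIGITS` are hypothetical names and this is not the bytestring API.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bound on bytes written: 4294967295 has 10 decimal digits. */
#define U32_MAX_DIGITS 10

/* Writes the decimal digits of v starting at out and returns a pointer
   just past the last byte written. The caller must guarantee that at
   least U32_MAX_DIGITS bytes are available at out. */
char *write_u32_dec(uint32_t v, char *out) {
    char tmp[U32_MAX_DIGITS];
    size_t n = 0;

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    /* Copy them out in the right order. */
    while (n > 0) {
        *out++ = tmp[--n];
    }
    return out;
}
```

A caller checks once that `U32_MAX_DIGITS` bytes are free in its output buffer, then calls the writer and continues from the returned pointer.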
Sure. Added in b3a033d. The new results on my PC are
benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time 992.7 μs (984.3 μs .. 1.001 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 987.9 μs (982.7 μs .. 998.0 μs)
std dev 22.92 μs (12.41 μs .. 36.71 μs)
variance introduced by outliers: 12% (moderately inflated)
benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time 1.071 ms (1.066 ms .. 1.076 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.073 ms (1.070 ms .. 1.076 ms)
std dev 11.26 μs (8.056 μs .. 15.52 μs)
Nice!
include-dirs: ../include
              ../cbits
I think the same change needs to be applied to `bench-builder-csv` and `bench-strict-indices` below.
@Lumaere Ping. I think you need to update `include-dirs` for `bench-builder-csv` and `bench-strict-indices` to make CI green.
Good news are that Bad news are that I cannot review or maintain such an amount of C code. Nor can CLC be made responsible for providing maintainers with C skills in the future. As noted above, I'm absolutely happy to lose 20-25% of performance in comparison to the C version. @Lumaere how does it sound?
I don't think you have to maintain it; it's static, independent code. If you strongly object, we could release it as a separate package; this is a very useful gem and it's a shame to leave it here for a long time.
I agree that it is a shame, and believe me I do not like to see such an amount of effort wasted. However, my position about large amounts of C was declared in the first reply and, unfortunately, has not changed. @Lumaere originally wrote a native Haskell implementation: https://github.com/Lumaere/ryu. From my perspective productionalizing it would be an ideal solution, which could make way into I don't mind (and don't have any right or reason to mind) if someone releases this PR as a separate package; I have encouraged doing so above.
Closing, superseded by #365.
This PR implements `floatDec` and `doubleDec` using the Ryu algorithm described in Ulf Adams' paper. The majority of the decimal-conversion logic is pulled directly from his C implementation, with some wrappers for fixed-format printing to match GHC's `show`.
The relevant benchmarks (on my i7-8700k) are as follows: