Reimplement floatDec/doubleDec #115

winterland1989 · 2017-02-14T13:08:08Z

Sorry for the delay, the floating format problem is definitely not as simple as i thought, but i finally made it forward somehow, in short this patch:

Implement doubleDec/floatDec using grisu3 with dragon4 fallback with reference here.
~~Implement floatDec using dragon4 with floatToDigits in GHC.Float.~~

Benchmarks are uploaded, please give a review any time you want.

winterland1989 · 2017-02-14T17:38:56Z

I have made the benchmark running again. Grisu3 give ~3x boost to floatDec/doubleDec , which is reasonable according to this.

Run on my MacBook Pro 2015(2.7GHz i5):

...
benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time                 6.393 ms   (6.283 ms .. 6.506 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 6.473 ms   (6.411 ms .. 6.544 ms)
std dev              196.3 μs   (158.5 μs .. 256.4 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show float (10000)
time                 17.74 ms   (17.55 ms .. 17.95 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 17.83 ms   (17.63 ms .. 18.32 ms)
std dev              740.7 μs   (304.3 μs .. 1.398 ms)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 10.99 ms   (10.90 ms .. 11.10 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 11.20 ms   (11.07 ms .. 11.40 ms)
std dev              415.4 μs   (176.1 μs .. 645.1 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show double (10000)
time                 30.69 ms   (30.27 ms .. 31.14 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 30.70 ms   (30.55 ms .. 30.87 ms)
std dev              352.9 μs   (285.3 μs .. 450.5 μs)


...

dcoutts · 2017-02-16T12:54:51Z

This looks very interesting. I'll have to review it in detail, and I invite anyone else to do so too.

winterland1989 · 2017-02-16T13:32:01Z

Yes, please ; )

neongreen · 2017-02-16T15:01:10Z

How does it compare to double-conversion?

winterland1989 · 2017-02-16T15:28:36Z

The C/C++ part should be similar since they're basically same. But it's quite hard to make a comparison because this patch use primitive builder to fill the buffer, while double-conversion directly fill the buffer in C. I'll add double-conversion to the builder benchmark locally and post the result.

Update: It seems double-conversion is doing a better job here. i think we should find the reason.

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap double conversion
time                 2.210 ms   (2.204 ms .. 2.216 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.217 ms   (2.211 ms .. 2.227 ms)
std dev              25.88 μs   (19.04 μs .. 38.26 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 7.766 ms   (7.732 ms .. 7.814 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 7.770 ms   (7.745 ms .. 7.797 ms)
std dev              74.86 μs   (57.71 μs .. 101.1 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show double (10000)
time                 21.65 ms   (21.31 ms .. 21.89 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 21.60 ms   (21.48 ms .. 21.72 ms)
std dev              279.3 μs   (233.4 μs .. 369.3 μs)

winterland1989 · 2017-02-17T02:34:17Z

Further benchmark indeed show that the time spend on C part is comparable to google's double conversion, much time is spent on FFI peeking, itoa, builder filling etc. If we want to go faster while keep backward-compatible format, we have to fill the buffer in C, and port doFmt to C to get exactly same formatting logic(which is not very fun).

Note that unlike this patch, double-conversion won't give you the same result as show(which is required by bytestring's test), for example:

Prelude Data.Double.Conversion.ByteString> toShortest 0.0012
"0.0012"
Prelude Data.Double.Conversion.ByteString> show 0.0012
"1.2e-3"

So for now, this patch is the best i can do. If you have any other ideas please tell me, i'm happy to give a try.

dcoutts · 2017-02-17T12:35:46Z

@winterland1989 there's no problem in principle with writing directly into the buffer in C, the only thing to worry about is that it's a client allocate buffer: so while you can ensure there's a fixed amount of space in advance, you cannot reallocate it half way through. Is that OK? Can we determine in advance the maximum size and then return the number of bytes we wrote? If so then no problem. There's a specific class of builder primitives like that (the prim ones that are not the fixed-size ones).

Indeed, the main thing I noticed when I first skimmed the code is that it's not just writing to the buffer on the C side, which I would have assumed is the way to go.

winterland1989 · 2017-02-17T13:11:08Z

Can we determine in advance the maximum size and then return the number of bytes we wrote?

Yes, Bos use 26, which works fine.

What stops me from doing so is the compatibility with GHC's show format. What i'm doing in this patch is perfectly explained by aseipp in this post's comment:

Second, the Haskell code and the C code are doing different things, which is why it's all a bit verbose. The Haskell code has to both use Grisu3 and also fall back to Dragon4 (AKA the existing floatToDigits function, written in Haskell), because Grisu3 is not complete: it may detect that its output is not "the shortest possible" for some small percent of inputs, and thus it will fail. Furthermore, despite the somewhat opaque naming -- these functions are NOT functions of type Float -> String -- they are effectively implementations of the floatsToDigits function which only gives you the list of digits and exponent as a pair of ([Int],Int). That means you still have to actually convert this to a string in the proper representation.

The code inside the patch, aside from adding Grisu, is effectively just a copy of the source code from Float's Show instance inside GHC.Float, here, but instead using the Grisu/Dragon combination, and specialized to Builder from ByteString. That's why the patch replaces floatDec and doubleDec which were previously just implemented as string7 . show (because show for Float and Double in fact was literally defined as (show x = formatRealFloat ...), and it replaces them with a copy of our already well-used formatting code.

knupfer · 2017-11-03T07:30:11Z

I'd really like this to be merged. This would remove my libs dependence on double-conversion.

bgamari · 2017-08-01T18:09:34Z

cbits/dtoa.c

+    int16_t b_exp, d_exp;
+} power;
+
+static const power pow_cache[] =


A comment here is needed. How is this array indexed? How are the elements computed?

Argh, I forget to submit this review comment in time, sorry for the delay. To see how this array works, please check section 4 Cache Powers from the paper in the code comment. This array is a cache table for converting between 2-based power into 10-based, it store data in power struct:

typedef struct power { uint64_t fract; int16_t b_exp, d_exp; } power;

The data from the table satisfy this equation: fract * power(2, b_exp) ~= power(10, d_exp) . It can be calculated in such manner:

Decide step value and range for d_exp which satisfy the condition from the paper(Section 5). In this patch the original author choose following (which exceeded the requirement from the paper):

define MIN_CACHED_EXP -348 define CACHED_EXP_STEP 8

For each d_exp from [MIN_CACHED_EXP, MIN_CACHED_EXP + CACHED_EXP_STEP, -MIN_CACHED_EXP], convert power(10, d_exp) to IEEE representation, record its fract and b_exp part.

Now we get the table.

If I understand @bgamari correctly, he wanted your comment to be in the source code. (I agree that that would be useful).

bgamari · 2018-02-18T16:35:31Z

I agree that it would be nice to have this merged. However, it's hard to review without proper commenting.

knupfer · 2018-04-29T18:45:42Z

ping @winterland1989

winterland1989 · 2018-05-01T17:11:57Z

I added notes on table generating, is there any problems?

alexbiehl · 2018-05-17T09:14:11Z

+1 for faster floating point encoding!

fumieval · 2020-04-02T07:39:21Z

I was trying out Grisu3 for decimal encoding on mason, my alternative bytestring builder library. It seems to work perfectly as far as I can see, and is ~20x faster than show! I'd like this change to be merged

Bodigrim · 2020-05-10T15:59:50Z

I wonder how this compares against Ryu algorithm. @Lumaere would you possibly take a look?

@sergv could you possibly weigh in and review?

la-wu · 2020-05-11T02:52:49Z

Ulf Adams did an analysis of Ryu vs Grisu3 in his paper. Regarding performance

Grisu3 has several outliers, not shown in the graphs, which correspond to cases where it has to fall back to a slower arbitrary-precision implementation. The C implementation
of Ryu has better performance than Grisu3 and is also less volatile. On average, Ryu is roughly three times faster than Grisu for 32-bit and 64-bit floating point numbers

My implementation is in pure Haskell put does use malloc and directly writes bits for performance.

Ryu on this benchmark

[ bgroup "fold"
    [ benchB "foldMap f2s       " floatData $ foldMap (BB.string7 . f2s)
    , benchB "foldMap show float" floatData $ foldMap (BB.string7 . show)
    ]
]

on my machine (i7-8700k):

ryu> benchmarks
Running 1 benchmarks...
Benchmark ryu-bench: RUNNING...
benchmarking fold/foldMap f2s        (10000)
time                 1.783 ms   (1.759 ms .. 1.807 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 1.785 ms   (1.775 ms .. 1.802 ms)
std dev              44.22 μs   (28.67 μs .. 69.66 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarking fold/foldMap show float (10000)
time                 8.932 ms   (8.841 ms .. 9.003 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 8.862 ms   (8.825 ms .. 8.899 ms)
std dev              103.9 μs   (87.95 μs .. 123.6 μs)

This branch:

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time                 3.662 ms   (3.623 ms .. 3.710 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 3.646 ms   (3.632 ms .. 3.665 ms)
std dev              53.30 μs   (38.45 μs .. 71.58 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show float (10000)
time                 9.931 ms   (9.793 ms .. 10.08 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 9.776 ms   (9.717 ms .. 9.857 ms)
std dev              186.8 μs   (132.5 μs .. 260.9 μs)

My double implementation is not complete (missing fixed-point) but the exponential format is

benchmarking fold/foldMap d2s         (10000)
time                 1.908 ms   (1.899 ms .. 1.917 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.904 ms   (1.901 ms .. 1.908 ms)
std dev              11.77 μs   (9.075 μs .. 15.81 μs)

Bodigrim · 2020-05-11T22:01:42Z

@knupfer @alexbiehl @fumieval is anyone up to review this PR?

@Lumaere would it be possible to port you implementation of Ryu, avoiding extra dependencies?

la-wu · 2020-05-12T01:55:58Z

Sure, I could take a look at bringing my implementation into Bytestring. However, for performance, using the C implementation and adding Haskell bindings would ultimately be faster. The C implementation is also more feature complete and has better support for different platforms, word sizes, etc.

la-wu · 2020-05-17T22:09:30Z

Here is the PR with the Ryu implementation: #222.

With a slightly less naive benchmark of the Haskell implementation, the performance difference is closer.

benchB "foldMap f2s       " floatData $ foldMap (BB.byteString . f2sGeneral)

results in

benchmarking fold/foldMap f2s        (10000)
time                 1.355 ms   (1.348 ms .. 1.363 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.357 ms   (1.350 ms .. 1.365 ms)
std dev              24.90 μs   (19.70 μs .. 30.80 μs)

But the difference was significant enough that I opted to use the C implementation.

fumieval · 2020-09-11T02:18:47Z

@nikita-volkov pointed out that this algorithm produces a different result from Show Double: fumieval/mason#5

As discussed in the issue, it's an improvement, but it might break some tests in the ecosystem.

winterland1989 · 2020-09-11T06:01:29Z

@nikita-volkov pointed out that this algorithm produces a different result from Show Double: fumieval/mason#5

As discussed in the issue, it's an improvement, but it might break some tests in the ecosystem.

It's up to you to implement the logic here to match base's show. I definitely added the logic to match grisu's output with show's, otherwise, the tests won't pass.

fumieval · 2020-09-11T07:14:11Z

Ah I see, great.

neongreen · 2020-12-24T15:11:49Z

Do we need this given that there is a Ryu pull-request available? #222

My impression is that Ryu is better — am I wrong?

Bodigrim · 2020-12-24T18:19:54Z

@neongreen yes, potentially we could abandon this PR, if would be enough to proceed with #222 alone. But if #222 is stalled, finishing this one would be also a decent step forward.

I'd appreciate if someone steps in to shepherd this area of improvements.

fumieval · 2020-12-30T05:12:33Z

I think #222 is good to go; just some benchmarks are not compiling due to a few missing lines in bytestring.cabal. I asked the author to fix them but we could as well just merge it and fix them in a succeeding commit.

Who's responsible for merging PRs?

Bodigrim · 2020-12-30T19:35:09Z

As I mentioned in #222, I am sorry, but I cannot review or maintain such amount of C code. Neither CLC can be made responsible to provide maintainers with C skills in future. I am not particularly happy even with existing cbits/itoa.c, but at least it is very straightforward.

I will not particularly oppose, if other maintainers (CC @sjakobi @hsyl20) are willing to sign off and agree a sustainable way of future support.

Besides this, both PRs lack a comprehensive test suite with a coverage report.

Bodigrim · 2021-02-26T18:54:13Z

Closing, superseded by #365.

winterland1989 and others added 6 commits February 14, 2017 20:26

add grisu3 implementation

8c907ad

reimplement formatRealFloat/formatRealDouble

7db22fb

fix travis building

4da5012

use fmap instead of <$> for older ghc

fb3974e

fix benchmarks, fix Foreign.C.Types imports on old ghc

2dd1aff

remove dependency on CInt/CDouble constructors

d2386c2

winter and others added 3 commits February 15, 2017 02:18

remove wrong comments, add more INLINE

adec880

add grisu3 for single precision float

c2ed1c9

use more precise limit

81aa681

winterland1989 mentioned this pull request Feb 15, 2017

Improve doubleDec/floatDec #101

Closed

rename format functions

d7c0f4b

bgamari reviewed Nov 10, 2017

View reviewed changes

Bodigrim added the blocked: needs-review label May 11, 2020

Bodigrim mentioned this pull request May 19, 2020

Implement floating point conversion with ryu #222

Closed

sjakobi linked an issue Jun 25, 2020 that may be closed by this pull request

Improve doubleDec/floatDec #101

Closed

Bodigrim added performance floating-point labels Aug 21, 2020

Bodigrim added blocked: patch-needed somebody needs to write a patch or contribute code and removed blocked: needs-review labels Jan 10, 2021

Bodigrim closed this Feb 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reimplement floatDec/doubleDec #115

Reimplement floatDec/doubleDec #115

winterland1989 commented Feb 14, 2017 •

edited

Loading

winterland1989 commented Feb 14, 2017 •

edited

Loading

dcoutts commented Feb 16, 2017

winterland1989 commented Feb 16, 2017

neongreen commented Feb 16, 2017

winterland1989 commented Feb 16, 2017 •

edited

Loading

winterland1989 commented Feb 17, 2017 •

edited

Loading

dcoutts commented Feb 17, 2017 •

edited

Loading

winterland1989 commented Feb 17, 2017 •

edited

Loading

knupfer commented Nov 3, 2017

bgamari Aug 1, 2017

winterland1989 Feb 23, 2018 •

edited

Loading

sjakobi Dec 19, 2019

bgamari commented Feb 18, 2018

knupfer commented Apr 29, 2018

winterland1989 commented May 1, 2018

alexbiehl commented May 17, 2018

fumieval commented Apr 2, 2020

Bodigrim commented May 10, 2020

la-wu commented May 11, 2020

Bodigrim commented May 11, 2020

la-wu commented May 12, 2020

la-wu commented May 17, 2020

fumieval commented Sep 11, 2020

winterland1989 commented Sep 11, 2020 •

edited

Loading

fumieval commented Sep 11, 2020

neongreen commented Dec 24, 2020

Bodigrim commented Dec 24, 2020 •

edited

Loading

fumieval commented Dec 30, 2020

Bodigrim commented Dec 30, 2020

Bodigrim commented Feb 26, 2021

Reimplement floatDec/doubleDec #115

Reimplement floatDec/doubleDec #115

Conversation

winterland1989 commented Feb 14, 2017 • edited Loading

winterland1989 commented Feb 14, 2017 • edited Loading

dcoutts commented Feb 16, 2017

winterland1989 commented Feb 16, 2017

neongreen commented Feb 16, 2017

winterland1989 commented Feb 16, 2017 • edited Loading

winterland1989 commented Feb 17, 2017 • edited Loading

dcoutts commented Feb 17, 2017 • edited Loading

winterland1989 commented Feb 17, 2017 • edited Loading

knupfer commented Nov 3, 2017

bgamari Aug 1, 2017

Choose a reason for hiding this comment

winterland1989 Feb 23, 2018 • edited Loading

Choose a reason for hiding this comment

sjakobi Dec 19, 2019

Choose a reason for hiding this comment

bgamari commented Feb 18, 2018

knupfer commented Apr 29, 2018

winterland1989 commented May 1, 2018

alexbiehl commented May 17, 2018

fumieval commented Apr 2, 2020

Bodigrim commented May 10, 2020

la-wu commented May 11, 2020

Bodigrim commented May 11, 2020

la-wu commented May 12, 2020

la-wu commented May 17, 2020

fumieval commented Sep 11, 2020

winterland1989 commented Sep 11, 2020 • edited Loading

fumieval commented Sep 11, 2020

neongreen commented Dec 24, 2020

Bodigrim commented Dec 24, 2020 • edited Loading

fumieval commented Dec 30, 2020

Bodigrim commented Dec 30, 2020

Bodigrim commented Feb 26, 2021

winterland1989 commented Feb 14, 2017 •

edited

Loading

winterland1989 commented Feb 14, 2017 •

edited

Loading

winterland1989 commented Feb 16, 2017 •

edited

Loading

winterland1989 commented Feb 17, 2017 •

edited

Loading

dcoutts commented Feb 17, 2017 •

edited

Loading

winterland1989 commented Feb 17, 2017 •

edited

Loading

winterland1989 Feb 23, 2018 •

edited

Loading

winterland1989 commented Sep 11, 2020 •

edited

Loading

Bodigrim commented Dec 24, 2020 •

edited

Loading