Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement floatDec/doubleDec #115

Open
wants to merge 10 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@winterland1989
Copy link

winterland1989 commented Feb 14, 2017

Sorry for the delay, the floating format problem is definitely not as simple as i thought, but i finally made it forward somehow, in short this patch:

  • Implement doubleDec/floatDec using grisu3 with dragon4 fallback with reference here.

  • Implement floatDec using dragon4 with floatToDigits in GHC.Float.

Benchmarks are uploaded, please give a review any time you want.

@winterland1989

This comment has been minimized.

Copy link
Author

winterland1989 commented Feb 14, 2017

I have made the benchmark running again. Grisu3 give ~3x boost to floatDec/doubleDec , which is reasonable according to this.

Run on my MacBook Pro 2015(2.7GHz i5):

...
benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap floatDec (10000)
time                 6.393 ms   (6.283 ms .. 6.506 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 6.473 ms   (6.411 ms .. 6.544 ms)
std dev              196.3 μs   (158.5 μs .. 256.4 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show float (10000)
time                 17.74 ms   (17.55 ms .. 17.95 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 17.83 ms   (17.63 ms .. 18.32 ms)
std dev              740.7 μs   (304.3 μs .. 1.398 ms)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 10.99 ms   (10.90 ms .. 11.10 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 11.20 ms   (11.07 ms .. 11.40 ms)
std dev              415.4 μs   (176.1 μs .. 645.1 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show double (10000)
time                 30.69 ms   (30.27 ms .. 31.14 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 30.70 ms   (30.55 ms .. 30.87 ms)
std dev              352.9 μs   (285.3 μs .. 450.5 μs)


...

winter and others added some commits Feb 14, 2017

@dcoutts

This comment has been minimized.

Copy link
Member

dcoutts commented Feb 16, 2017

This looks very interesting. I'll have to review it in detail, and I invite anyone else to do so too.

@winterland1989

This comment has been minimized.

Copy link
Author

winterland1989 commented Feb 16, 2017

Yes, please ; )

@neongreen

This comment has been minimized.

Copy link

neongreen commented Feb 16, 2017

How does it compare to double-conversion?

@winterland1989

This comment has been minimized.

Copy link
Author

winterland1989 commented Feb 16, 2017

The C/C++ part should be similar since they're basically same. But it's quite hard to make a comparison because this patch use primitive builder to fill the buffer, while double-conversion directly fill the buffer in C. I'll add double-conversion to the builder benchmark locally and post the result.

Update: It seems double-conversion is doing a better job here. i think we should find the reason.

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap double conversion
time                 2.210 ms   (2.204 ms .. 2.216 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.217 ms   (2.211 ms .. 2.227 ms)
std dev              25.88 μs   (19.04 μs .. 38.26 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap doubleDec (10000)
time                 7.766 ms   (7.732 ms .. 7.814 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 7.770 ms   (7.745 ms .. 7.797 ms)
std dev              74.86 μs   (57.71 μs .. 101.1 μs)

benchmarking Data.ByteString.Builder/Non-bounded encodings/foldMap show double (10000)
time                 21.65 ms   (21.31 ms .. 21.89 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 21.60 ms   (21.48 ms .. 21.72 ms)
std dev              279.3 μs   (233.4 μs .. 369.3 μs)
@winterland1989

This comment has been minimized.

Copy link
Author

winterland1989 commented Feb 17, 2017

Further benchmark indeed show that the time spend on C part is comparable to google's double conversion, much time is spent on FFI peeking, itoa, builder filling etc. If we want to go faster while keep backward-compatible format, we have to fill the buffer in C, and port doFmt to C to get exactly same formatting logic(which is not very fun).

Note that unlike this patch, double-conversion won't give you the same result as show(which is required by bytestring's test), for example:

Prelude Data.Double.Conversion.ByteString> toShortest 0.0012
"0.0012"
Prelude Data.Double.Conversion.ByteString> show 0.0012
"1.2e-3"

So for now, this patch is the best i can do. If you have any other ideas please tell me, i'm happy to give a try.

@dcoutts

This comment has been minimized.

Copy link
Member

dcoutts commented Feb 17, 2017

@winterland1989 there's no problem in principle with writing directly into the buffer in C, the only thing to worry about is that it's a client allocate buffer: so while you can ensure there's a fixed amount of space in advance, you cannot reallocate it half way through. Is that OK? Can we determine in advance the maximum size and then return the number of bytes we wrote? If so then no problem. There's a specific class of builder primitives like that (the prim ones that are not the fixed-size ones).

Indeed, the main thing I noticed when I first skimmed the code is that it's not just writing to the buffer on the C side, which I would have assumed is the way to go.

@winterland1989

This comment has been minimized.

Copy link
Author

winterland1989 commented Feb 17, 2017

Can we determine in advance the maximum size and then return the number of bytes we wrote?

Yes, Bos use 26, which works fine.

What stops me from doing so is the compatibility with GHC's show format. What i'm doing in this patch is perfectly explained by aseipp in this post's comment:

Second, the Haskell code and the C code are doing different things, which is why it's all a bit verbose. The Haskell code has to both use Grisu3 and also fall back to Dragon4 (AKA the existing floatToDigits function, written in Haskell), because Grisu3 is not complete: it may detect that its output is not "the shortest possible" for some small percent of inputs, and thus it will fail. Furthermore, despite the somewhat opaque naming -- these functions are NOT functions of type Float -> String -- they are effectively implementations of the floatsToDigits function which only gives you the list of digits and exponent as a pair of ([Int],Int). That means you still have to actually convert this to a string in the proper representation.

The code inside the patch, aside from adding Grisu, is effectively just a copy of the source code from Float's Show instance inside GHC.Float, here, but instead using the Grisu/Dragon combination, and specialized to Builder from ByteString. That's why the patch replaces floatDec and doubleDec which were previously just implemented as string7 . show (because show for Float and Double in fact was literally defined as (show x = formatRealFloat ...), and it replaces them with a copy of our already well-used formatting code.

@knupfer

This comment has been minimized.

Copy link

knupfer commented Nov 3, 2017

I'd really like this to be merged. This would remove my libs dependence on double-conversion.

int16_t b_exp, d_exp;
} power;

static const power pow_cache[] =

This comment has been minimized.

@bgamari

bgamari Nov 10, 2017

Contributor

A comment here is needed. How is this array indexed? How are the elements computed?

This comment has been minimized.

@winterland1989

winterland1989 Feb 23, 2018

Author

Argh, I forget to submit this review comment in time, sorry for the delay. To see how this array works, please check section 4 Cache Powers from the paper in the code comment. This array is a cache table for converting between 2-based power into 10-based, it store data in power struct:

typedef struct power
{
    uint64_t fract;
    int16_t b_exp, d_exp;
} power;

The data from the table satisfy this equation: fract * power(2, b_exp) ~= power(10, d_exp) . It can be calculated in such manner:

  • Decide step value and range for d_exp which satisfy the condition from the paper(Section 5). In this patch the original author choose following (which exceeded the requirement from the paper):
define MIN_CACHED_EXP -348
define CACHED_EXP_STEP 8
  • For each d_exp from [MIN_CACHED_EXP, MIN_CACHED_EXP + CACHED_EXP_STEP, -MIN_CACHED_EXP], convert power(10, d_exp) to IEEE representation, record its fract and b_exp part.

Now we get the table.

@bgamari

This comment has been minimized.

Copy link
Contributor

bgamari commented Feb 18, 2018

I agree that it would be nice to have this merged. However, it's hard to review without proper commenting.

@knupfer

This comment has been minimized.

Copy link

knupfer commented Apr 29, 2018

@winterland1989

This comment has been minimized.

Copy link
Author

winterland1989 commented May 1, 2018

I added notes on table generating, is there any problems?

@alexbiehl

This comment has been minimized.

Copy link
Member

alexbiehl commented May 17, 2018

+1 for faster floating point encoding!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.