Implement floating point conversion with ryu #365

la-wu · 2021-02-22T04:16:23Z

This PR implements floatDec and doubleDec using the Ryu algorithm described in Ulf Adams' paper. Compared to #222, this is a native Haskell implementation based on https://github.com/Lumaere/ryu and the performance difference is relatively minimal.

However, this does add dependencies on template-haskell (which can probably be removed but will result in the hardcoding of large tables) and array.

Relevant benchmarks for this implementation on my i7-8700k

      foldMap floatDec (10000):                               OK (0.28s)
        1.0 ms ±  42 μs
      foldMap doubleDec (10000):                              OK (0.15s)
        1.1 ms ±  91 μs

And current

      foldMap floatDec (10000):                               OK (0.15s)
        9.5 ms ± 861 μs
      foldMap doubleDec (10000):                              OK (0.12s)
         17 ms ± 1.6 ms

Bodigrim · 2021-02-22T07:53:31Z

Wonderful!

We will drop older GHCs soon (see #265), do not worry about correspondent CI failures.

Could you please submit tests as a separate PR?

la-wu · 2021-02-23T02:13:44Z

Moving off of those older GHCs is great. Any thoughts on the windows build failure

C:\Users\RUNNER~1\AppData\Local\Temp\ghcCE41.o.tmp: renameFile:renamePath:MoveFileEx "\\\\?\\C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\ghcCE41.o.tmp" Just "\\\\?\\C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\ghcCE41.o": does not exist (The system cannot find the file specified.)

Some light searching found https://gitlab.haskell.org/ghc/ghc/-/issues/18274 which suggests that there is some issue with process but I'm not very familiar with windows builds

sjakobi · 2021-02-23T12:40:11Z

Some light searching found https://gitlab.haskell.org/ghc/ghc/-/issues/18274 which suggests that there is some issue with process but I'm not very familiar with windows builds

IIUC we could work around this by using a different cabal executable. Maybe we could simply upgrade to using cabal-install-3.4?! Unfortunately I'm not very familiar with our use of GitHub Actions.

Bodigrim · 2021-02-24T19:32:29Z

An issue with a temporary file used to happen before, but became much more prominent when Windows build switched to GHC 9.0. FIled as haskell/actions#36.

Not much can be done on our side, I believe, except just rerunning tests twice or thrice cabal test || cabal test || cabal test. I'll raise a PR.

Bodigrim · 2021-02-25T01:16:23Z

Let's wait for haskell/actions#39 to land, maybe it will alleviate our issues with Windows build.

Bodigrim · 2021-02-25T01:22:41Z

However, this does add dependencies on template-haskell (which can probably be removed but will result in the hardcoding of large tables) and array.

template-haskell is OK, #330 adds it to dependencies as well.
array could be OKish, but I'd prefer to avoid. Could you please point me where it is used? I wonder if we can use Addr# instead. Not only it eliminates a dependency, but also is usually significantly faster.

la-wu · 2021-02-25T02:02:42Z

The precomputed tables of multipliers and the formatting code both use arrays. Since they aren't exposed anywhere, I'll take a look at converting to raw address lookups

la-wu · 2021-02-28T23:14:13Z

@Bodigrim the array dependency has been removed but the build ran into the windows issue again.

Bodigrim · 2021-03-01T00:23:38Z

@Lumaere I rerun, all green now :)

I'm a bit confused: f65dde3 had tests, but they disappeared after push force. Could you please submit tests as a separate PR to be merged before this one?

la-wu · 2021-03-01T03:22:53Z

Can do. Some of the tests are more specific to internals (checking e.g round-trip and rounding). Should I stub those to the reference implementations or bring them back to this PR?

hsyl20 · 2021-03-01T09:11:00Z

template-haskell is OK, #330 adds it to dependencies as well.

Is it OK? bytestring is built with the stage1 compiler which doesn't support template-haskell.

Bodigrim · 2021-03-01T19:03:22Z

@hsyl20 Thanks, good to know. How does template-haskell work for other boot libraries? I see that text and exceptions depend on it.

We can review this with template-haskell, and replace it with generated splices as the very last step.

hsyl20 · 2021-03-02T10:18:17Z

@Bodigrim The issue is that we can't link/execute generated code with GHC stage1, hence no TH splice.

exceptions depends on template-haskell package but only to define MonadThrow for the Q monad, i.e. it doesn't use the TemplateHaskell extension
text only uses TH quotes to define the Lift instances for {Lazy.}Text.

We can review this with template-haskell, and replace it with generated splices as the very last step.

Indeed. Or perhaps write a little utility program to generate the tables?

Also I think we could improve the generated tables by avoiding creating ByteArray at runtime. We can't create static ByteArray but we could generate primitive strings (which have type Addr#) and directly use indexWord64OffAddr#. For example double_pow5_inv_split would look like

double_pow5_inv_split :: Addr#
double_pow5_inv_split = "\0\1\123\45..."#

We have to be careful about endianness though. Perhaps always generate little-endian tables and perform a byte-swap at runtime if the host is big-endian.

Bodigrim · 2021-05-16T16:20:19Z

Can do. Some of the tests are more specific to internals (checking e.g round-trip and rounding). Should I stub those to the reference implementations or bring them back to this PR?

@Lumaere sorry, missed your question. Do you mean that some tests from your branch fail under existing, show-based implementation (because of its infelicities)? In such case still submit them, wrapped into expectFail.

I'm keen to get this merged into the next release (which is September).

la-wu · 2021-06-17T16:23:15Z

@Bodigrim The tests were largely ported from the original C implementation of Ryu by Ulf Adams. I made the modifications to them so that they agree with show based implementation regarding rounding, etc. However, they still look at a new internal function (exposed for testing... they don't handle precision for all formats) called formatFloat / formatDouble that is based loosely on formatRealFloat. I can make the test PR expose these as

formatFloat :: FFFormat -> Maybe Int -> Float -> Builder
formatFloat fmt prec = string7 . formatRealFloat fmt prec

formatDouble :: FFFormat -> Maybe Int -> Double -> Builder
formatDouble fmt prec = string7 . formatRealFloat fmt prec

or modify all the tests to only use floatDec / doubleDec

Bodigrim · 2021-06-17T17:54:42Z

From my perspective, it is better to modify tests to use only floatDec / doubleDec.

la-wu · 2021-06-17T18:31:05Z

Works for me. The PR is #402. Let me know if you wanted it in a different format. Thanks

Bodigrim · 2021-08-01T12:27:15Z

@la-wu could you please rebase?

Bodigrim

@la-wu thanks for your astonishing work! It will take some time to review, here are my first suggestions.

Data/ByteString/Builder/ASCII.hs

Data/ByteString/Builder/RealFloat/Internal.hs

Bodigrim · 2021-08-07T23:32:29Z

Data/ByteString/Builder/RealFloat/Internal.hs

+--    doubles: 1 (sign) + 17 (mantissa) + 1 (.) + 1 (e) + 4 (exponent) = 24
+--
+maxEncodedLength :: Int
+maxEncodedLength = 32


So why it it 32 instead of 24?

IIRC, I liked rounding to the nearest power of 2 for whatever reason

Bodigrim · 2021-08-07T23:35:05Z

Data/ByteString/Builder/RealFloat/Internal.hs

+boundString s = boundedPrim maxEncodedLength $ const (pokeAll s)
+
+--         Sign -> Exp  -> Mantissa
+special :: Bool -> Bool -> Bool -> BoundedPrim ()


Could we possibly define data Foo = Foo { fooSign :: !Bool, fooExpr :: !Bool, fooMantissa :: !Bool }, so that the meaning of arguments is self-explanatory, and their ordering is less error-prone?

Is https://github.com/haskell/bytestring/pull/365/files#diff-e91607876ac26bb3b25c8b060668d06eb50becfd5a682bd59636e8de5bb9f0d3R214 clear enough?

Data/ByteString/Builder/RealFloat/Internal.hs

sjakobi

First off, apologies for the late review! I hadn't realized that this PR had been waiting for a review for so long.

I've made a first pass over the code. So far I'm pretty impressed by the clean and well-documented internals! 👍

I should point out that I'm not very experienced with floating-point encodings, so I probably won't be able to comment on much of the internals. For correctness, I'll have to rely on the test suite, which I'm grateful you have improved so much! :)

I have a few requests:

Please add module header haddocks to all the added modules.
Since I have encountered several bugs that involved fromIntegral, I have a somewhat irrational fear of it. (I also dislike that I often can't tell the types involved just from looking at it.) Would it be possible to replace its use with specialized intToFloat-style helpers? Alternatively, we could consider using -XTypeApplications, but I think that would require dropping support for GHC < 8 a bit earlier than planned so far.
I would also prefer if the types (.>>) and (.<<) were constrained a bit, if that can be done ergonomically.

Data/ByteString/Builder/RealFloat.hs

Data/ByteString/Builder/RealFloat/Internal.hs

la-wu · 2021-09-17T04:24:07Z

Thanks for the review. If you're interested in reading about the floating point string conversion, I think part 2 of the ryu paper has a good intro to it.

* [ ]  Please add module header haddocks to all the added modules.

Will do. What do you suggest I put in the fields Copyright, Maintainer, Stability?

On stability (and wrt some of your other comments that I'll get to individually as I start making edits), perhaps separating the current RealFloat.hs into a stable portion for just floatDec / doubleDec and a internal/experimental portion that has the formatFloat / formatDouble (which are also new to bytestring) could be clearer. What are your thoughts?

* [ ]  Since I have encountered several bugs that involved `fromIntegral`, I have a somewhat irrational fear of it. (I also dislike that I often can't tell the types involved just from looking at it.) Would it be possible to replace its use with specialized `intToFloat`-style helpers? Alternatively, we could consider using `-XTypeApplications`, but I think that would require dropping support for GHC < 8 a bit earlier than planned so far.

* [ ]  I would also prefer if the types `(.>>)` and `(.<<)` were constrained a bit, if that can be done ergonomically.

I think these can both be at least partially achieved. Most (all?) of the fromIntegral usages are actually just signed-to-unsigned or 32-to-64-bit conversion (and vice versa). Some of it can be elided by a more careful handling of integral types and others can get explicit annotations of the whole expression (like in Builder/Prim/Binary.hs)

sjakobi · 2021-09-17T12:28:45Z

Will do. What do you suggest I put in the fields Copyright, Maintainer, Stability?

Oh, I was mostly looking for brief module descriptions. I'm not sure we need all that other data. I usually skip it in other packages that I maintain. (CC @Bodigrim)

Do feel free to add your copyright in the modules that you've written of course!

On stability (and wrt some of your other comments that I'll get to individually as I start making edits), perhaps separating the current RealFloat.hs into a stable portion for just floatDec / doubleDec and a internal/experimental portion that has the formatFloat / formatDouble (which are also new to bytestring) could be clearer. What are your thoughts?

I do think that formatFloat / formatDouble are very desirable additions. If there is particular reason to believe that these functions will change in the near future, it might be better to to export them from an Internal module, so we don't have to do a major version bump. D.B.B.RealFloat.Internal would be the obvious candidate, but then you'd probably want to move the other contents of that module elsewhere…

I'm not overly concerned about breaking changes though. As long as there's a general sense of "improvement", I think we can do a major version bump every 2 or 3 years.

Data/ByteString/Builder/RealFloat.hs

- More intuitive name for scientific notation since we're moving away from GHC.Float.FFFormat anyway

- while precision handling is not fully implemented

sjakobi

Thanks, @la-wu. I like this direction! Needs just a bit more polishing.

sjakobi · 2021-11-15T14:54:27Z

Data/ByteString/Builder/RealFloat.hs

+-- "12.345"
+{-# INLINABLE formatFloat #-}
+formatFloat :: FloatFormat -> Float -> Builder
+formatFloat (MkFloatFormat fmt prec) f =


I think we might want this to inline when applied to only a single argument, roughly like this:

Suggested change

formatFloat (MkFloatFormat fmt prec) f =

formatFloat (MkFloatFormat fmt prec) = \f ->

@Bodigrim, what do you think?

Yes, it will be better for inlining this way.

I updated this. Might there be problems with assuming https://github.com/haskell/bytestring/pull/365/files#diff-3016af887ac9c7d42c56d7358eacf8168698724938c79cf8239cdd4957eaccdaR114 is lazy?

Can you clarify what kind of problems you're concerned about? The easiest thing might be to take a look at the Core…

Just that running f2Intermediate is unnecessary work in case of FScientific. I took a look at the dump-simpl output but it wasn't obvious to me what I should be looking for...

Data/ByteString/Builder/RealFloat.hs

- `fixed` was confusing terminology adopted from FFFormat - `FormatFloat'` -> `FormatMode` - some doc tweaks

sjakobi

Just a few more documentation tweaks.

Could you possibly generate the haddocks and check whether the markup comes out right? Especially that identifiers are hyperlinked and non-identifiers aren't accidentally rendered in mono-space font?

Data/ByteString/Builder/RealFloat.hs

sjakobi · 2021-11-18T17:44:16Z

Data/ByteString/Builder/RealFloat.hs

+-- NB: this implementation matches `show`'s output (specifically with respect
+-- to default rounding and length). In particular, there are boundary cases
+-- where the closest and 'shortest' string representations are not used.
+-- Mentions of 'shortest' in the docs below are with this caveat.


This paragraph is a bit hard to understand I think. An example would be very helpful. Please also clarify what "this implementation" refers to. Maybe create a named chunk and refer to it from the relevant places.

Also note that single quotes (') are used for marking identifiers in Haddock. Escaping the quotes might work?! E.g. \'shortest\'.

I added an example / handwavy explanation about the rounding / boundaries and pointed to the paper for details. TBH at this point I am far enough from the writing this code that I would need to redigest the exact semantics.

WRT quotes, thanks for pointing that out. I'm used to normal MD.

sjakobi · 2021-11-18T17:47:56Z

Data/ByteString/Builder/RealFloat.hs

+standard :: Int -> FloatFormat
+standard n = MkFloatFormat FStandard (Just n)
+
+-- | Standard notation with the default precision (number of decimal places)


"default precision" seems like another term that could be explained in a named chunk that can be referenced from here.

It just means matching show / formatRealFloat. LMK if I should elaborate further...

… paper

Bodigrim · 2021-11-22T20:20:54Z

Data/ByteString/Builder/RealFloat.hs

+           in mk0 ls `mappend` mkDot rs
+      | otherwise ->
+          let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
+              (b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is')


This raises Pattern match(es) are non-exhaustive with GHC 9.2.

Hm is there a way to convince GHC that the list is non-empty? Looking at the implementation of roundTo, the non-throwing return values are (0, _) and (1, 1:_). Before the call to digitsToZero, we prepend 0 to the former case's snd value so AFIACT this is 'correct'.

We could also just mimic the case above or add some redundant matches e.g

diff --git a/Data/ByteString/Builder/RealFloat.hs b/Data/ByteString/Builder/RealFloat.hs index cd0d39d..0dadf45 100644 --- a/Data/ByteString/Builder/RealFloat.hs +++ b/Data/ByteString/Builder/RealFloat.hs @@ -258,8 +258,8 @@ showStandard m e prec = in mk0 ls `mappend` mkDot rs | otherwise -> let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds) - (b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is') - in b `mappend` mkDot bs + (ls, rs) = splitAt 1 $ digitsToBuilder (if ei > 0 then is' else 0:is') + in mk0 ls `mappend` mkDot rs where p' = max p 0 where mk0 ls = case ls of [] -> char7 '0'; _ -> mconcat ls

I'm not saying that your code is wrong, I'm saying that we need to get rid of a warning. Boot packages cannot have warnings. Throwing an error for an empty list would be fine, for instance.

I agree that this warning needs to be silenced. Maybe this would be a good fit for Data.List.NonEmpty?!

I added the snippet above. A lot of the code in GHC.Float makes the same assumption though so do they just turn the warning off?

Yes. That module uses the pragma

{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}

Bodigrim · 2021-11-22T22:33:48Z

@sjakobi sorry, I lost track of the discussion. Is there anything left unresolved?

sjakobi

I'm happy with the API at this stage.

The haddocks could use some editing and more examples, but I don't think this needs to happen within this PR.

The GHC warning should be addressed before merging though.

sjakobi · 2021-11-27T05:25:33Z

Data/ByteString/Builder/RealFloat.hs

+-- "9.999999999999999e22"
+--
+-- Simplifying, we can build a shorter, lossless representation by just using
+-- @"1.0e23"@ since the floating point values that are 1 ULP away are


What is "ULP"?

Unit in Last Place

Good to know! Could you link to the article?

Note BTW that haddock supports Markdown-style hyperlinks, e.g. [ULP](https://en.wikipedia.org/wiki/Unit_in_the_last_place).

sjakobi · 2021-11-27T05:29:00Z

Data/ByteString/Builder/RealFloat.hs

+-- NB: The float-to-string conversions exposed by this module match `show`'s
+-- output (specifically with respect to default rounding and length). In
+-- particular, there are boundary cases where the closest and \'shortest\'
+-- string representations are not used.  Mentions of \'shortest\' in the docs
+-- below are with this caveat.
+--
+-- For example, for fidelity, we match `show` on the output below.
+--
+-- >>> show (1.0e23 :: Float)
+-- "1.0e23"
+-- >>> show (1.0e23 :: Double)
+-- "9.999999999999999e22"
+-- >>> floatDec 1.0e23
+-- "1.0e23"
+-- >>> doubleDec 1.0e23
+-- "9.999999999999999e22"
+--
+-- Simplifying, we can build a shorter, lossless representation by just using
+-- @"1.0e23"@ since the floating point values that are 1 ULP away are
+--
+-- >>> showHex (castDoubleToWord64 1.0e23) []
+-- "44b52d02c7e14af6"
+-- >>> castWord64ToDouble 0x44b52d02c7e14af5
+-- 9.999999999999997e22
+-- >>> castWord64ToDouble 0x44b52d02c7e14af6
+-- 9.999999999999999e22
+-- >>> castWord64ToDouble 0x44b52d02c7e14af7
+-- 1.0000000000000001e23
+--
+-- In particular, we could use the exact boundary if it is the shortest
+-- representation and the original floating number is even. To experiment with
+-- the shorter rounding, refer to
+-- `Data.ByteString.Builder.RealFloat.Internal.acceptBounds`. This will give us
+--
+-- >>> floatDec 1.0e23
+-- "1.0e23"
+-- >>> doubleDec 1.0e23
+-- "1.0e23"
+--
+-- For more details, please refer to the
+-- <https://dl.acm.org/doi/10.1145/3192366.3192369 Ryu paper>.


I think this section is a bit too technical for most users. If we want to keep it, we should probably move it out of the module header and to the bottom of the page.

Is there such a thing as a footer? This was added per one of our discussions above about clarifying 'shortest'.

You can insert chunks of documentation at selected places in the export list. See e.g. Data.ByteString.Lazy.

Also see the relevant haddock documentation at https://haskell-haddock.readthedocs.io/en/latest/markup.html#controlling-the-documentation-structure.

sjakobi · 2021-11-27T05:33:33Z

Data/ByteString/Builder/RealFloat.hs

+           in mk0 ls `mappend` mkDot rs
+      | otherwise ->
+          let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
+              (b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is')


I agree that this warning needs to be silenced. Maybe this would be a good fit for Data.List.NonEmpty?!

- add redundant pattern matching. now symmetrical with the e >= 0 case

Bodigrim · 2021-12-04T00:03:16Z

@sjakobi thanks for extensive reviews!

@la-wu if you get inspiration to fix documentation suggestions in a separate PR, it would be awesome. Otherwise thank you very much for all your heroic efforts to make this state-of-the-art algorithm available to Haskell users. This is a long-awaited improvement.

* Implement floating point conversion with ryu * Use manual strictness - Strict extension is from ghc >= 8.0.1 * Use checked shifts - ghc 9.0.1 doesn't shiftL on negative unsafeShiftR and this is more straightforward regardless * Use builtin float-to-word conversion functions * Use builtin conversion to Bool * Remove dependency on array package * Handle non-exhaustive patterns * Try using prim conversions directly * Revert "Try using prim conversions directly" This reverts commit 10809d3. * Dispatch to slow cast when builtin unavailable - GHC.Float exports them starting from base-4.10.0.0 which starts with with ghc 8.2.1 * Try bumping min version to 8.4.x * Fix log10pow5 approximation and add unit test - expose RealFloat.Internal so that tests can import and verify that the approximations match in the expected ranges * Re-export floatDec and doubleDec to maintain public API * Improve documentation and fixes for initial code review - make double-polymorphic functions singly-polymorphic for clarity - use canonical Control.Arrow.first - name boolean values in special string formatting and add explanation - explain magic numbers in logarithm approximations and reciprocal multiplication - other misc comments * Improve table generation documentation and clean-up - add general overview of floating point conversion algorithm and idea behind ryu algorithm - remove unused Num Word128 instance * Improve documentation of f2s and d2s and cleanup - deduplicate shortest and rounding handling - add some comments and explanations of algorithm steps inline * Use stricter integral types and annotate fromIntegral usages - a closer inspection of `fromIntegral` usages shows that a lot of that conversion scaffolding is unnecessary if types are chose correctly - also fixes a delayed to-signed-int conversion that caused unsigned wraparound on subtraction * Add module descriptions and fix typos * Use internal FloatFormat instead of GHC.Float.FFFormat - avoids dependency especially while we don't actually support the full API of GHC.Float.formatRealFloat * Use monomorphic helpers for remaining integral conversions used by RealFloat * Remove usage of TemplateHaskell in RealFloat * Fix LUT usage on big-endian systems - Do a runtime byteswap. bswap64 is ~1 cycle on most architectures - NB: multiline string literals are parsed differently with language CPP * Add header for endianness detection * Fix big-endian word16 packing in fast digit formatting * Fix big-endian word128 read from raw addr * Clean up unused functions - finv and fnorm kept as comments for reference to table computation * Fix incorrect reciprocal function usage - Doesn't actually affect correctness vs show since round-even is not implemented (acceptBounds is always False) - Adds a couple explorative test cases and a comment anyways * Add more test coverage and fix doc example - Fixed-format fixed-precision tests - Re-exports TableGenerator constants to allow sanity checks for computed bit-range constants * Use quickcheck equality property in tests * Format haddock headers more similarly to existing ones * Use simpler reciprocal math for 32-bit words - Clarity and marginal performance improvement * Use boxed arithmetic in logic flow - more portable wrt internal ghc prim and 32- vs 64-bit - more readable (less syntax cruft in hashes and verbose operation names) - not much of a performance difference measured * Support ghc 9.2 prim word changes - Data.Word wraps fixed-sized word types - array operations now use fixed-sized word types * Fix 32-bit support - Removes most of the raw Word# usage to facilitate support of fixed-size sub-word types and 32-bit systems. Benchmarking shows little difference (<5%) - Implements manual multiplication of 64-bit words for 32-bit systems * Skip conversion to Double before fixed Float formatting - otherwise produces unnecessarily long results since imprecise representations get much more significance with Double precision * Tweak doc wording and add examples - per sjakobi suggestions * Rename FExponent to FScientific - More intuitive name for scientific notation since we're moving away from GHC.Float.FFFormat anyway * Use an opaque FloatFormat type for compatibility - while precision handling is not fully implemented * Rename float fixed-format to standard-format and other naming tweaks - `fixed` was confusing terminology adopted from FFFormat - `FormatFloat'` -> `FormatMode` - some doc tweaks * Encourage inlining by removing partial application * Fix some haddock links and accidental monospacing * Add explanation about difference between implementation and reference paper * Clarify default precision * Point to ryu paper for more details * Fix non-exhaustive warning for ghc 9.2 - add redundant pattern matching. now symmetrical with the e >= 0 case

sjakobi · 2021-12-04T18:30:37Z

@la-wu Thanks for enduring this marathon of a review! :)

You may want to add your implementation to the list at https://github.com/ulfjack/ryu#ryu, BTW!

* Implement floating point conversion with ryu * Use manual strictness - Strict extension is from ghc >= 8.0.1 * Use checked shifts - ghc 9.0.1 doesn't shiftL on negative unsafeShiftR and this is more straightforward regardless * Use builtin float-to-word conversion functions * Use builtin conversion to Bool * Remove dependency on array package * Handle non-exhaustive patterns * Try using prim conversions directly * Revert "Try using prim conversions directly" This reverts commit 10809d3. * Dispatch to slow cast when builtin unavailable - GHC.Float exports them starting from base-4.10.0.0 which starts with with ghc 8.2.1 * Try bumping min version to 8.4.x * Fix log10pow5 approximation and add unit test - expose RealFloat.Internal so that tests can import and verify that the approximations match in the expected ranges * Re-export floatDec and doubleDec to maintain public API * Improve documentation and fixes for initial code review - make double-polymorphic functions singly-polymorphic for clarity - use canonical Control.Arrow.first - name boolean values in special string formatting and add explanation - explain magic numbers in logarithm approximations and reciprocal multiplication - other misc comments * Improve table generation documentation and clean-up - add general overview of floating point conversion algorithm and idea behind ryu algorithm - remove unused Num Word128 instance * Improve documentation of f2s and d2s and cleanup - deduplicate shortest and rounding handling - add some comments and explanations of algorithm steps inline * Use stricter integral types and annotate fromIntegral usages - a closer inspection of `fromIntegral` usages shows that a lot of that conversion scaffolding is unnecessary if types are chose correctly - also fixes a delayed to-signed-int conversion that caused unsigned wraparound on subtraction * Add module descriptions and fix typos * Use internal FloatFormat instead of GHC.Float.FFFormat - avoids dependency especially while we don't actually support the full API of GHC.Float.formatRealFloat * Use monomorphic helpers for remaining integral conversions used by RealFloat * Remove usage of TemplateHaskell in RealFloat * Fix LUT usage on big-endian systems - Do a runtime byteswap. bswap64 is ~1 cycle on most architectures - NB: multiline string literals are parsed differently with language CPP * Add header for endianness detection * Fix big-endian word16 packing in fast digit formatting * Fix big-endian word128 read from raw addr * Clean up unused functions - finv and fnorm kept as comments for reference to table computation * Fix incorrect reciprocal function usage - Doesn't actually affect correctness vs show since round-even is not implemented (acceptBounds is always False) - Adds a couple explorative test cases and a comment anyways * Add more test coverage and fix doc example - Fixed-format fixed-precision tests - Re-exports TableGenerator constants to allow sanity checks for computed bit-range constants * Use quickcheck equality property in tests * Format haddock headers more similarly to existing ones * Use simpler reciprocal math for 32-bit words - Clarity and marginal performance improvement * Use boxed arithmetic in logic flow - more portable wrt internal ghc prim and 32- vs 64-bit - more readable (less syntax cruft in hashes and verbose operation names) - not much of a performance difference measured * Support ghc 9.2 prim word changes - Data.Word wraps fixed-sized word types - array operations now use fixed-sized word types * Fix 32-bit support - Removes most of the raw Word# usage to facilitate support of fixed-size sub-word types and 32-bit systems. Benchmarking shows little difference (<5%) - Implements manual multiplication of 64-bit words for 32-bit systems * Skip conversion to Double before fixed Float formatting - otherwise produces unnecessarily long results since imprecise representations get much more significance with Double precision * Tweak doc wording and add examples - per sjakobi suggestions * Rename FExponent to FScientific - More intuitive name for scientific notation since we're moving away from GHC.Float.FFFormat anyway * Use an opaque FloatFormat type for compatibility - while precision handling is not fully implemented * Rename float fixed-format to standard-format and other naming tweaks - `fixed` was confusing terminology adopted from FFFormat - `FormatFloat'` -> `FormatMode` - some doc tweaks * Encourage inlining by removing partial application * Fix some haddock links and accidental monospacing * Add explanation about difference between implementation and reference paper * Clarify default precision * Point to ryu paper for more details * Fix non-exhaustive warning for ghc 9.2 - add redundant pattern matching. now symmetrical with the e >= 0 case

hsyl20 · 2021-12-14T09:12:39Z

@la-wu Out of curiosity, did you make benchmarks against the double-conversion package? [We have had troubles using this package with GHCJS recently (it needs to be linked with libstdc++...) so a pure Haskell solution is very appealing if performance is comparable]

la-wu · 2022-05-01T18:37:33Z

@la-wu Out of curiosity, did you make benchmarks against the double-conversion package? [We have had troubles using this package with GHCJS recently (it needs to be linked with libstdc++...) so a pure Haskell solution is very appealing if performance is comparable]

Hey @hsyl20, not sure if this is still relevant but I just ran a quick bench for this and I got

foldMap floatDec (10000):                               OK (0.15s)
  1.17 ms ± 114 μs
foldMap doubleDec (10000):                              OK (0.37s)
  1.42 ms ±  45 μs
foldMap floatShow (10000):                              OK (0.33s)
  10.5 ms ± 541 μs
foldMap doubleShow (10000):                             OK (0.27s)
  18.0 ms ± 1.0 ms

And with double-conversion,

foldMap floatDC (10000):    OK (0.16s)
  5.17 ms ± 450 μs
foldMap doubleDC (10000):   OK (0.21s)
  6.55 ms ± 494 μs

With a setup of

import qualified Data.Double.Conversion.Convertable    as DC

...
, benchB "foldMap floatShow"       floatData          $ foldMap (string7 . show)
, benchB "foldMap doubleShow"      doubleData         $ foldMap (string7 . show)
, benchB "foldMap floatDC"         floatData          $ foldMap DC.toShortest
, benchB "foldMap doubleDC"        doubleData         $ foldMap DC.toShortest

Not sure if I was using the library correctly... but directly converting with double-conversion to Bytestring (instead of builder) was much slower fwiw.

hsyl20 · 2022-05-02T09:43:39Z

@la-wu Thanks a lot for this report! We should definitely use your code instead of having to deal with double-conversion. I've added it to our task list.

la-wu force-pushed the ryu-native branch from e68f42f to 567e8a6 Compare February 22, 2021 21:36

This was referenced Feb 26, 2021

Implement floating point conversion with ryu #222

Closed

Reimplement floatDec/doubleDec #115

Closed

la-wu force-pushed the ryu-native branch from 425ceac to 34291e3 Compare February 28, 2021 22:41

Bodigrim mentioned this pull request May 27, 2021

Move sources into src/ #391

Closed

la-wu mentioned this pull request Jun 17, 2021

Floating point conversion tests #402

Merged

la-wu force-pushed the ryu-native branch from 5383873 to 1097671 Compare August 1, 2021 15:30

Bodigrim reviewed Aug 7, 2021

View reviewed changes

Bodigrim mentioned this pull request Sep 13, 2021

Update changelog and since pragmas #422

Merged

sjakobi reviewed Sep 16, 2021

View reviewed changes

sjakobi reviewed Nov 11, 2021

View reviewed changes

Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved

Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved

Data/ByteString/Builder/RealFloat.hs Outdated Show resolved Hide resolved

la-wu added 2 commits November 14, 2021 22:30

Rename FExponent to FScientific

415ac6f

- More intuitive name for scientific notation since we're moving away from GHC.Float.FFFormat anyway

Use an opaque FloatFormat type for compatibility

0474332

- while precision handling is not fully implemented

sjakobi reviewed Nov 15, 2021

View reviewed changes

Data/ByteString/Builder/RealFloat.hs Show resolved Hide resolved

Bodigrim mentioned this pull request Nov 16, 2021

Number to Text conversion haskell/text#218

Open

la-wu added 2 commits November 16, 2021 23:28

Rename float fixed-format to standard-format and other naming tweaks

9be8170

- `fixed` was confusing terminology adopted from FFFormat - `FormatFloat'` -> `FormatMode` - some doc tweaks

Encourage inlining by removing partial application

f67df50

sjakobi reviewed Nov 18, 2021

View reviewed changes

la-wu added 4 commits November 21, 2021 23:46

Fix some haddock links and accidental monospacing

a01cb00

Add explanation about difference between implementation and reference…

0cc5417

… paper

Clarify default precision

12436a2

Point to ryu paper for more details

d8dac2a

Bodigrim reviewed Nov 22, 2021

View reviewed changes

sjakobi reviewed Nov 27, 2021

View reviewed changes

Fix non-exhaustive warning for ghc 9.2

b70918b

- add redundant pattern matching. now symmetrical with the e >= 0 case

sjakobi approved these changes Nov 28, 2021

View reviewed changes

Bodigrim merged commit 36874c3 into haskell:master Dec 4, 2021

Bodigrim linked an issue Feb 16, 2022 that may be closed by this pull request

Improve doubleDec/floatDec #101

Closed

sjakobi mentioned this pull request Apr 27, 2022

FloatFormat should allow scientific notation with given precision #511

Open

Bodigrim added this to the 0.11.2.0 milestone May 4, 2022

vmchale mentioned this pull request Feb 20, 2024

Pretty-print floats and doubles efficiently. quchen/prettyprinter#249

Open

	formatFloat (MkFloatFormat fmt prec) f =
	formatFloat (MkFloatFormat fmt prec) = \f ->

Implement floating point conversion with ryu #365

Implement floating point conversion with ryu #365

Conversation

la-wu commented Feb 22, 2021

Bodigrim commented Feb 22, 2021

la-wu commented Feb 23, 2021

sjakobi commented Feb 23, 2021

Bodigrim commented Feb 24, 2021

Bodigrim commented Feb 25, 2021

Bodigrim commented Feb 25, 2021 • edited Loading

la-wu commented Feb 25, 2021

la-wu commented Feb 28, 2021

Bodigrim commented Mar 1, 2021

la-wu commented Mar 1, 2021

hsyl20 commented Mar 1, 2021

Bodigrim commented Mar 1, 2021

hsyl20 commented Mar 2, 2021 • edited Loading

Bodigrim commented May 16, 2021

la-wu commented Jun 17, 2021

Bodigrim commented Jun 17, 2021

la-wu commented Jun 17, 2021

Bodigrim commented Aug 1, 2021

Bodigrim left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

la-wu Aug 8, 2021 • edited Loading

Choose a reason for hiding this comment

sjakobi left a comment

Choose a reason for hiding this comment

la-wu commented Sep 17, 2021

sjakobi commented Sep 17, 2021

sjakobi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sjakobi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bodigrim Nov 22, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bodigrim commented Nov 22, 2021 • edited Loading

sjakobi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bodigrim commented Dec 4, 2021

sjakobi commented Dec 4, 2021

hsyl20 commented Dec 14, 2021

la-wu commented May 1, 2022

hsyl20 commented May 2, 2022

Bodigrim commented Feb 25, 2021 •

edited

Loading

hsyl20 commented Mar 2, 2021 •

edited

Loading

la-wu Aug 8, 2021 •

edited

Loading

Bodigrim Nov 22, 2021 •

edited

Loading

Bodigrim commented Nov 22, 2021 •

edited

Loading