Convert decnum to binary64 (double) instead of decimal64 #2949

wader · 2023-11-23T17:20:22Z

This is what the JSON spec suggests, and it will also be less confusing compared to other jq implementations and languages.

Related to #2939

wader · 2023-11-23T17:20:40Z

@leonid-s-usov something like this?

wader · 2023-11-23T17:51:28Z

src/jv.c

+      ctx->digits = DEC_NUBMER_DOUBLE_PRECISION;
+      ctx->emax = 1023;
+      ctx->emin = -1022;
+      ctx->round = DEC_ROUND_HALF_EVEN;


Is this correct?

wader · 2023-11-23T17:52:57Z

src/jv.c

+      ctx->emin = -1022;
+      ctx->round = DEC_ROUND_HALF_EVEN;
+      ctx->traps = 0;
+      ctx->clamp = 1;


I think this is about IEEE 754 clamping, which Wikipedia describes as:

> Clamped: a result's exponent is too large for the destination format. By default, trailing zeros will be added to the coefficient to reduce the exponent to the largest usable value. If this is not possible (because this would cause the number of digits needed to be more than the destination format) then an overflow exception occurs.

I'm not sure what disabling it would mean. Our current decimal64 setup also enables it.

wader · 2023-11-25T18:49:05Z

src/jv.c

+      ctx->emax = 1023;
+      ctx->emin = -1022;
+      ctx->round = DEC_ROUND_HALF_EVEN;
+      ctx->traps = 0;


As I understand the decNumber code, traps are used to raise SIGFPE on some error conditions. That's probably not what we want, and it's not what we do for dec_ctx_key above.

leonid-s-usov · 2023-11-26T10:30:53Z

I have spent some time today diving deeper into the topic. I discovered an unfortunate confusion I had about the min and max exponents which I might have passed on with my former comments. This is so unexpectedly complicated!

When the binary64 format talks about an exponent range of -1022..1023, it implies base 2, hence the "binary" in the name. However, decContext exponents are always decimal, which explains why the numbers are -383 and 384 for decimal64.

Another thing I've been researching is the included decDouble module, which supposedly converts to and from IEEE double precision. However, its header file still sets the maximum precision to 16:

  /* parameters for decDoubles */
  #define DECDOUBLE_Bytes   8      /* length                          */
  #define DECDOUBLE_Pmax    16     /* maximum precision (digits)      */
  #define DECDOUBLE_Emin   -383    /* minimum adjusted exponent       */
  #define DECDOUBLE_Emax    384    /* maximum adjusted exponent       */
  #define DECDOUBLE_EmaxD   3      /* maximum exponent digits         */
  #define DECDOUBLE_Bias    398    /* bias for the exponent           */
  #define DECDOUBLE_String  25     /* maximum string length, +1       */
  #define DECDOUBLE_EconL   8      /* exponent continuation length    */
  #define DECDOUBLE_Declets 5      /* count of declets                */
  /* highest biased exponent (Elimit-1) */
  #define DECDOUBLE_Ehigh (DECDOUBLE_Emax + DECDOUBLE_Bias - (DECDOUBLE_Pmax-1))

It's also obvious from the code that conversion to a decDouble is done by casting to decimal64 and calling the decimal64 conversion functions:

  /* decNumber conversions; these are implemented as macros so as not  */
  /* to force a dependency on decimal64 and decNumber in decDouble.    */
  /* decDoubleFromNumber returns a decimal64 * to avoid warnings.      */
  #define decDoubleToNumber(dq, dn) decimal64ToNumber((decimal64 *)(dq), dn)
  #define decDoubleFromNumber(dq, dn, set) decimal64FromNumber((decimal64 *)(dq), dn, set)

I'm still unsure how decDouble differs, but apparently its maximum string length is one character longer than decimal64's:

  /* parameters for decimal64s                                        */
...
  #define DECIMAL64_String 24           /* maximum string length, +1  */
---
  /* parameters for decDoubles */
...
  #define DECDOUBLE_String  25     /* maximum string length, +1       */

I think more research is needed; in particular, it seems our prior approach was close.
I suspect that we need to properly formulate our requirements before we can decide on the best (i.e. the correct) implementation.

src/jv.c

leonid-s-usov · 2023-11-26T11:02:06Z

What if instead of changing the definitions we just use the decimal64FromNumber conversion utility rather than the decNumberReduce we're using now?

  // reduce the number to the shortest possible form
  // that fits into the 64 bit floating point representation
  decNumberReduce(&dec_double.number, p_dec_number, DEC_CONTEXT_TO_DOUBLE());

leonid-s-usov · 2023-11-26T13:33:44Z

It could be that our use case doesn't exactly fit a pure decimal64 application, where the life cycle is to consume string number representation, work with a decimal64 format, and then render it back into a string or serialize into a container format to later unpack as decimal64 again.

We are trying to make the use of decimal numbers "invisible" to someone who expects jq to use IEEE double-precision binary floating point, and it's clearly stated that preserving a binary64 value through a conversion to decimal and back requires 17 digits of decimal precision:

> If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.[1]

So, maybe what we'd like to do is initialize the context as DECIMAL64, but then increase the precision to 17 digits.

leonid-s-usov · 2023-11-26T14:43:29Z

I'm convinced now that the proper way to look at the task at hand is to render our number into a decimal string that can be fed into jvp_strtod.

Stated this way, we can even argue that the "reduce" step is optional: we can just take the result of jvp_literal_number_literal and run strtod on it. The only reason to perform this kind of reduction is to exploit the fact that no more than 17 significant decimal digits are needed to represent any binary64 (double) number. This can save us from allocating space for insignificant digits. Whether this optimization is worth the trouble is questionable, but since jvp_literal_number_literal allocates the space on the heap, I'd be reluctant to use that function.

From this perspective, I think that the right course of action would be to call the decNumberReduce function with a local on-stack context that is initialized with DEC_INIT_DECIMAL64 but with an increased precision of 17. I also think that since 17 is an upper bound of the actual decimal precision of 53 binary digits, we should not worry about the rounding, so let's just leave it default.

wader · 2023-11-26T17:11:27Z

> I have spent some time today diving deeper into the topic. I discovered an unfortunate confusion I had about the min and max exponents which I might have passed on with my former comments. This is so unexpectedly complicated!
>
> When the binary64 format talks about an exponent range of -1022..1023, it implies base 2, hence the "binary" in the name. However, decContext exponents are always decimal, which explains why the numbers are -383 and 384 for decimal64.

Makes sense. I somehow overlooked that, even though decContext has no way to configure the base.

> I suspect that we need to properly formulate our requirements before we can decide on the best (i.e. the correct) implementation.

Ideally, if possible, I think we want something that behaves like most major JavaScript implementations when doing integer operations. That would cause the least confusion and seems to be what most users expect.

> It could be that our use case doesn't exactly fit a pure decimal64 application, where the life cycle is to consume string number representation, work with a decimal64 format, and then render it back into a string or serialize into a container format to later unpack as decimal64 again.
>
> We are trying to make the use of decimal numbers "invisible" to someone who expects jq to use IEEE double-precision binary floating point, and it's clearly stated that preserving a binary64 value through a conversion to decimal and back requires 17 digits of decimal precision:
>
> > If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.[1]
>
> So, maybe what we'd like to do is initialize the context as DECIMAL64, but then increase the precision to 17 digits.

Can a decimal64 represent all exact integers that a binary64 can represent? Or is a decimal64 modified to 17 digits exactly that?

> I'm convinced now that the proper way to look at the task at hand is to render our number into a decimal string that can be fed into jvp_strtod.
>
> Stated this way, we can even argue that the "reduce" step is optional: we can just take the result of jvp_literal_number_literal and run strtod on it. The only reason to perform this kind of reduction is to exploit the fact that no more than 17 significant decimal digits are needed to represent any binary64 (double) number. This can save us from allocating space for insignificant digits. Whether this optimization is worth the trouble is questionable, but since jvp_literal_number_literal allocates the space on the heap, I'd be reluctant to use that function.
>
> From this perspective, I think that the right course of action would be to call the decNumberReduce function with a local on-stack context that is initialized with DEC_INIT_DECIMAL64 but with an increased precision of 17. I also think that since 17 is an upper bound of the actual decimal precision of 53 binary digits, we should not worry about the rounding, so let's just leave it default.

OK, so if I understand correctly, we could remove dec_ctx_dbl_key, since it's only used by jvp_literal_number_to_double, and instead use a local on-stack decContext initialized as decimal64 but with digits set to 17?

wader · 2023-11-26T17:19:10Z

@leonid-s-usov I pushed a variant that I hope is close to what you suggested.

src/jv.c

This is what the JSON spec suggests and will also be less confusing compared to other jq implementations and languages. Related to jqlang#2939

leonid-s-usov · 2023-11-26T17:36:22Z

> Can a decimal64 represent all exact integers that a binary64 can represent? Or is a decimal64 modified to 17 digits exactly that?

As I understand it, while decimal64 and binary64 are very close in their capabilities, they differ in both precision and dynamic range: decimal64 has a wider exponent range but slightly worse precision.

When one converts a binary64 to decimal and back, one won't arrive at the same binary64 unless at least 17 significant decimal digits are used, along with a large enough exponent range. While decimal64 has more than enough exponent range, its precision of 16 digits is not enough for the task.

leonid-s-usov

I hope we got it right this time 😅

wader · 2023-11-26T17:43:36Z

> I hope we got it right this time 😅

Hope so too, lots of fiddling with this 😅

Are there more tests we should add? Currently we test the boundaries of the "exact integer" range and the random big (?) number from before.

wader · 2023-11-27T09:10:12Z

Reminder: do any documentation or wiki pages need updating? e.g. https://github.com/jqlang/jq/wiki/FAQ#numbers

leonid-s-usov · 2023-11-27T09:34:56Z

> Are there more tests we should add? Currently we test the boundaries of the "exact integer" range and the random big (?) number from before.

I thought maybe we could compare the results with a node script output, like you did in the other ticket, but otherwise the boundaries tests you've added seem sufficient. I don't quite remember if our testing system allows for arbitrary shell commands.

> Reminder: do any documentation or wiki pages need updating? e.g. https://github.com/jqlang/jq/wiki/FAQ#numbers

I don't think so; I believe we've just fixed a regression introduced by our earlier fix.

wader · 2023-11-27T11:19:17Z

> I thought maybe we could compare the results with a node script output, like you did in the other ticket, but otherwise the boundaries tests you've added seem sufficient. I don't quite remember if our testing system allows for arbitrary shell commands.

There is https://github.com/jqlang/jq/blob/master/tests/shtest if we install node. But I'm thinking maybe we can rely on this not changing anytime soon in node? :) Then some jq.test tests should be enough.

> Reminder: do any documentation or wiki pages need updating? e.g. https://github.com/jqlang/jq/wiki/FAQ#numbers
>
> I don't think so; I believe we've just fixed a regression introduced by our earlier fix.

Ok!

emanuele6 · 2023-11-29T08:36:38Z

Thank you! =)

wader force-pushed the decnum-double-fix branch from 559ff54 to af9551e November 23, 2023 17:47

wader commented Nov 23, 2023

wader commented Nov 25, 2023

leonid-s-usov reviewed Nov 26, 2023

leonid-s-usov mentioned this pull request Nov 26, 2023

Normalize and correct tests #2939

Closed

wader force-pushed the decnum-double-fix branch from af9551e to b8249e6 November 26, 2023 17:18

leonid-s-usov reviewed Nov 26, 2023

Convert decnum to binary64 (double) instead of decimal64

f3610e7

This is what the JSON spec suggests and will also be less confusing compared to other jq implementations and languages. Related to jqlang#2939

wader force-pushed the decnum-double-fix branch from b8249e6 to f3610e7 November 26, 2023 17:34

leonid-s-usov approved these changes Nov 26, 2023

emanuele6 approved these changes Nov 29, 2023

emanuele6 added ieee754 libjq labels Nov 29, 2023

emanuele6 merged commit 98a2069 into jqlang:master Nov 29, 2023
28 checks passed

wader deleted the decnum-double-fix branch November 29, 2023 09:42

wader mentioned this pull request Dec 3, 2023

Wrong math for big integers #2962

Closed
