Convert decnum to binary64 (double) instead of decimal64 #2949

Merged
merged 1 commit into from Nov 29, 2023

wader
Member

@wader wader commented Nov 23, 2023

This is what the JSON spec suggests and will also be less confusing compared to other jq implementations and languages.

Related to #2939

@wader
Member Author

wader commented Nov 23, 2023

@leonid-s-usov something like this?

src/jv.c Outdated
ctx->digits = DEC_NUBMER_DOUBLE_PRECISION;
ctx->emax = 1023;
ctx->emin = -1022;
ctx->round = DEC_ROUND_HALF_EVEN;
Member Author

Is this correct?

src/jv.c Outdated
ctx->emin = -1022;
ctx->round = DEC_ROUND_HALF_EVEN;
ctx->traps = 0;
ctx->clamp = 1;
Member Author

@wader wader Nov 23, 2023

I think this is about IEEE 754 clamping, which Wikipedia describes as:

> Clamped: a result's exponent is too large for the destination format. By default, trailing zeros will be added to the coefficient to reduce the exponent to the largest usable value. If this is not possible (because this would cause the number of digits needed to be more than the destination format) then an overflow exception occurs.

I'm not sure what disabling it would mean. Our current decimal64 setup also enables it.

src/jv.c Outdated
ctx->emax = 1023;
ctx->emin = -1022;
ctx->round = DEC_ROUND_HALF_EVEN;
ctx->traps = 0;
Member Author

As I understand the decNumber code, traps are used to raise SIGFPE on some error conditions, which is probably not what we want and not what we do for dec_ctx_key above.

@leonid-s-usov
Contributor

I have spent some time today diving deeper into the topic. I discovered an unfortunate confusion I had about the min and max exponents which I might have passed on with my former comments. This is so unexpectedly complicated!

When binary64 format is talking about an exponent range of -1022..1023, it implies base 2 - hence the "binary" in the name. However, the decContext is always about decimal exponents, which explains why the numbers are -383 and 384 for decimal64.

Another thing I've been researching is the included decDouble module, which supposedly should convert to and from the IEEE double number precision. However, in that header file we still see the maximum precision of 16:

  /* parameters for decDoubles */
  #define DECDOUBLE_Bytes   8      /* length                          */
  #define DECDOUBLE_Pmax    16     /* maximum precision (digits)      */
  #define DECDOUBLE_Emin   -383    /* minimum adjusted exponent       */
  #define DECDOUBLE_Emax    384    /* maximum adjusted exponent       */
  #define DECDOUBLE_EmaxD   3      /* maximum exponent digits         */
  #define DECDOUBLE_Bias    398    /* bias for the exponent           */
  #define DECDOUBLE_String  25     /* maximum string length, +1       */
  #define DECDOUBLE_EconL   8      /* exponent continuation length    */
  #define DECDOUBLE_Declets 5      /* count of declets                */
  /* highest biased exponent (Elimit-1) */
  #define DECDOUBLE_Ehigh (DECDOUBLE_Emax + DECDOUBLE_Bias - (DECDOUBLE_Pmax-1))

It's also obvious from the code that conversion to a decDouble is done by calling the decimal64 routines.

  /* decNumber conversions; these are implemented as macros so as not  */
  /* to force a dependency on decimal64 and decNumber in decDouble.    */
  /* decDoubleFromNumber returns a decimal64 * to avoid warnings.      */
  #define decDoubleToNumber(dq, dn) decimal64ToNumber((decimal64 *)(dq), dn)
  #define decDoubleFromNumber(dq, dn, set) decimal64FromNumber((decimal64 *)(dq), dn, set)

I'm still confused as to how decDouble will differ, but apparently, it defines 1 more digit in string representation compared to decimal64:

  /* parameters for decimal64s                                        */
...
  #define DECIMAL64_String 24           /* maximum string length, +1  */
---
  /* parameters for decDoubles */
...
  #define DECDOUBLE_String  25     /* maximum string length, +1       */

I think that more research is needed. Specifically, it seems like our prior approach was close.
I suspect that we need to properly formulate our requirements before we can decide on the best (i.e. the correct) implementation.

@leonid-s-usov
Contributor

What if instead of changing the definitions we just use the decimal64FromNumber conversion utility rather than the decNumberReduce we're using now?

  // reduce the number to the shortest possible form
  // that fits into the 64 bit floating point representation
  decNumberReduce(&dec_double.number, p_dec_number, DEC_CONTEXT_TO_DOUBLE());

@leonid-s-usov
Contributor

leonid-s-usov commented Nov 26, 2023

It could be that our use case doesn't exactly fit a pure decimal64 application, where the life cycle is to consume string number representation, work with a decimal64 format, and then render it back into a string or serialize into a container format to later unpack as decimal64 again.

We are trying to make use of the decimal number "invisible" from the perspective of someone who expects JQ to utilize IEEE double precision binary floating point representation, and it's clearly stated that to preserve the binary64 through a conversion to decimal and back one would need 17 digits of decimal precision.

> If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.[1]

So, maybe what we'd like to do is initialize the context as DECIMAL64, but then increase the precision to 17 digits.

@leonid-s-usov
Contributor

I'm convinced now that the proper way to look at the task at hand is to render our number into a decimal string that can be fed into jvp_strtod.

Stated this way, we can even argue that the "reduce" step is optional - we can just take the result of the jvp_literal_number_literal and run the strtod on that one. The only reason to perform this kind of reduction is to employ the fact that no more than 17 decimal places are needed to represent any binary64 (double) number. This can save us from allocating space for digits that are insignificant. Whether this optimization is worth the trouble is questionable, but since jvp_literal_number_literal allocates the space on the heap I'd be reluctant to use that function.

From this perspective, I think that the right course of action would be to call the decNumberReduce function with a local on-stack context that is initialized with DEC_INIT_DECIMAL64 but with an increased precision of 17. I also think that since 17 is an upper bound of the actual decimal precision of 53 binary digits, we should not worry about the rounding, so let's just leave it default.

@wader
Member Author

wader commented Nov 26, 2023

> I have spent some time today diving deeper into the topic. I discovered an unfortunate confusion I had about the min and max exponents which I might have passed on with my former comments. This is so unexpectedly complicated!
>
> When binary64 format is talking about an exponent range of -1022..1023, it implies base 2 - hence the "binary" in the name. However, the decContext is always about decimal exponents, which explains why the numbers are -383 and 384 for decimal64.

Makes sense. I somehow overlooked that, even though the context has no setting for the base.

> I suspect that we need to properly formulate our requirements before we can decide on the best (i.e. the correct) implementation.

Ideally, if possible, I think we want something that behaves like most major JavaScript implementations when doing integer operations. That would cause the least confusion and seems to be what most users expect.

> It could be that our use case doesn't exactly fit a pure decimal64 application, where the life cycle is to consume string number representation, work with a decimal64 format, and then render it back into a string or serialize into a container format to later unpack as decimal64 again.
>
> We are trying to make use of the decimal number "invisible" from the perspective of someone who expects JQ to utilize IEEE double precision binary floating point representation, and it's clearly stated that to preserve the binary64 through a conversion to decimal and back one would need 17 digits of decimal precision.
>
> > If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.[1]
>
> So, maybe what we'd like to do is initialize the context as DECIMAL64, but then increase the precision to 17 digits.

Can a decimal64 represent all the exact integers that a binary64 can represent? Or is the decimal64 modified to 17 digits exactly that?

> I'm convinced now that the proper way to look at the task at hand is to render our number into a decimal string that can be fed into jvp_strtod.
>
> Stated this way, we can even argue that the "reduce" step is optional - we can just take the result of the jvp_literal_number_literal and run the strtod on that one. The only reason to perform this kind of reduction is to employ the fact that no more than 17 decimal places are needed to represent any binary64 (double) number. This can save us from allocating space for digits that are insignificant. Whether this optimization is worth the trouble is questionable, but since jvp_literal_number_literal allocates the space on the heap I'd be reluctant to use that function.
>
> From this perspective, I think that the right course of action would be to call the decNumberReduce function with a local on-stack context that is initialized with DEC_INIT_DECIMAL64 but with an increased precision of 17. I also think that since 17 is an upper bound of the actual decimal precision of 53 binary digits, we should not worry about the rounding, so let's just leave it default.

Ok, so if I understand correctly we could remove dec_ctx_dbl_key as it's only used by jvp_literal_number_to_double and then add a local stack decContext initialised as a decimal64 but with digits set to 17 and use that?

@wader
Member Author

wader commented Nov 26, 2023

@leonid-s-usov I pushed a variant that I hope is close to what you suggested.

@leonid-s-usov
Contributor

> Can a decimal64 represent all the exact integers that a binary64 can represent? Or is the decimal64 modified to 17 digits exactly that?

As I understood, while decimal64 and binary64 are very close in their capabilities, they differ both in precision and dynamic range: decimal has a wider exponent range but a little worse precision.

When one wants to convert binary64 to decimal and back, they won't arrive at the same binary64 unless they use at least 17 decimal places of precision, and obviously a large enough exponent range. While decimal64 has more than enough exponent range, its precision of 16 places is not enough for the task.

Contributor

@leonid-s-usov leonid-s-usov left a comment

I hope we got it right this time 😅

@wader
Member Author

wader commented Nov 26, 2023

> I hope we got it right this time 😅

Hope so too, lots of fiddling with this 😅

Are there some more tests we should add? Currently we test the boundaries of the "exact integer" range and the random big (?) number from before.

@wader
Member Author

wader commented Nov 27, 2023

Reminder: do any documentation or wiki pages need updating? E.g. https://github.com/jqlang/jq/wiki/FAQ#numbers

@leonid-s-usov
Contributor

leonid-s-usov commented Nov 27, 2023

> Are there some more tests we should add? Currently we test the boundaries of the "exact integer" range and the random big (?) number from before.

I thought maybe we could compare the results with a node script output, like you did in the other ticket, but otherwise the boundaries tests you've added seem sufficient. I don't quite remember if our testing system allows for arbitrary shell commands.

> Reminder: do any documentation or wiki pages need updating? E.g. https://github.com/jqlang/jq/wiki/FAQ#numbers

I don't think so, I believe we've just fixed a regression introduced by our earlier fix

@wader
Member Author

wader commented Nov 27, 2023

> I thought maybe we could compare the results with a node script output, like you did in the other ticket, but otherwise the boundaries tests you've added seem sufficient. I don't quite remember if our testing system allows for arbitrary shell commands.

There is https://github.com/jqlang/jq/blob/master/tests/shtest if we install node. But I'm thinking maybe we can rely on this not changing anytime soon for node? :) Then some jq.test tests should be enough.

> > Reminder: do any documentation or wiki pages need updating? E.g. https://github.com/jqlang/jq/wiki/FAQ#numbers
>
> I don't think so, I believe we've just fixed a regression introduced by our earlier fix

Ok!

@emanuele6 emanuele6 merged commit 98a2069 into jqlang:master Nov 29, 2023
28 checks passed
@emanuele6
Member

Thank you! =)

@wader wader deleted the decnum-double-fix branch November 29, 2023 09:42