bignum/bigreal number representation needed #218

jmblog · 2013-11-23T11:34:14Z

In some cases jq converts large numbers to scientific (exponent) notation.

$ cat test.json
{"income":10000000,"total":11111111}

$ cat test.json | jq '.'
{
  "total": 11111111,
  "income": 1e+07
}

nicowilliams · 2013-11-29T19:08:58Z

I've started on implementing delayed parsing of numbers so as to preserve their original form wherever possible (i.e., whenever the actual number isn't needed as a double in the jq program). It turns out that this will require a lot of work :( Even then we'll need a bignum library to support bignum math in jq.

stedolan · 2013-12-04T16:41:50Z

I'm not entirely sure it's wrong to do so. What format would you prefer?

tischwa · 2013-12-04T20:51:52Z

In terms of math, it is not wrong to write 1e+07 instead of 10000000.

But in terms of software it makes a big difference. Consider a Unix pipe (my use case) like

% some program | jq 'a cool filter' | some other program B

Now jq's output is fed to program B, which can deal with int64 or even int128 but not with floats, because the original data contains no floats.

When jq makes a conversion like above, program B bails out.

See: #143 (comment)

To answer your question: I would prefer that jq not change the representation of a number if that number is just passed through from jq's input to its output.

(Like "sort -g": it interprets the input numbers but still outputs the original lines.)

jmblog · 2013-12-05T00:24:14Z

@tischwa +1

nicowilliams · 2013-12-05T06:10:41Z

It'd be a lot easier to provide options for how numbers are formatted on output than to preserve input form (when not touched by arithmetic).

dfkoh · 2013-12-05T21:42:20Z

> I've started on implementing delayed parsing of numbers so as to preserve their original form wherever possible (i.e., whenever the actual number isn't needed as a double in the jq program). It turns out that this will require a lot of work :( Even then we'll need a bignum library to support bignum math in jq.

@nicowilliams I already did this, in my fork: https://github.com/airfrog/jq
I can send you a pull request if you want to incorporate it into the main branch.

tischwa · 2013-12-05T21:52:26Z

Wouldn't it be possible to keep for each parsed number not only the numeric value, but also the input string?
If nothing is assigned to that numeric field during filtering, the input string could be output as is.
If a number is created by an arithmetic expression, the string is empty and the number is output according to some formatting option.
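
The scheme described above can be sketched in a few lines of Python. This is only an illustration, not jq's implementation: `Num`, `parse`, and `dump` are hypothetical names, and `repr` stands in for "some formatting option".

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Num:
    """A number that remembers the literal it was parsed from (hypothetical type)."""
    value: float
    literal: Optional[str] = None  # None means "produced by arithmetic"

    def __add__(self, other: "Num") -> "Num":
        # Arithmetic yields a fresh number with no original literal.
        return Num(self.value + other.value)


def parse(text: str):
    """Parse JSON, wrapping every number literal in a Num."""
    def keep(literal: str) -> Num:
        return Num(float(literal), literal)
    return json.loads(text, parse_int=keep, parse_float=keep)


def dump(obj) -> str:
    """Serialize compactly, echoing untouched number literals verbatim."""
    if isinstance(obj, Num):
        return obj.literal if obj.literal is not None else repr(obj.value)
    if isinstance(obj, dict):
        return "{" + ",".join(json.dumps(k) + ":" + dump(v) for k, v in obj.items()) + "}"
    if isinstance(obj, list):
        return "[" + ",".join(dump(v) for v in obj) + "]"
    return json.dumps(obj)  # strings, booleans, null


doc = parse('{"income":10000000,"total":111111111111111111}')
print(dump(doc))                            # literals pass through unchanged
print(dump(doc["income"] + doc["income"]))  # computed value, formatted by repr
```

The trade-off is memory: every number carries its source text until arithmetic replaces it.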

I remember this #143 (comment) where airfrog seemed to have something similar working. (Ahh, I just saw he sent a pull request.)

I think jq would gain a lot in general usability if it followed the philosophy of classic Unix filter programs, which modify their input only when they have to. Examples:

% cat num.txt
111111111111111111
222222222222222222

% jq '.' num.txt
111111111111111100
222222222222222200

% awk '{print $1, 1*$1}' num.txt
111111111111111111 111111111111111104
222222222222222222 222222222222222208

% sort -n num.txt
111111111111111111
222222222222222222

% sort -g num.txt
111111111111111111
222222222222222222

So usually the input is passed through unchanged. Only when awk has to compute 1*$1 does it switch internally to a numeric representation; the plain $1 is printed exactly as given in the input. Likewise, sort -n/-g has to interpret the lines numerically but still outputs the original input.
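
The awk column above can be reproduced with a short Python sketch, assuming (as is the case for awk's arithmetic) that the computed value passes through an IEEE 754 double. The helper name `through_double` is made up for this illustration.

```python
def through_double(text: str) -> int:
    """Round-trip an integer literal through an IEEE 754 double."""
    return int(float(text))


for line in ("111111111111111111", "222222222222222222"):
    # Left: the untouched input; right: the value after a trip through a double.
    print(line, through_double(line))

# Integers survive the round trip only while they fit in the 53-bit significand.
print(through_double(str(2**53)) == 2**53)          # True
print(through_double(str(2**53 + 1)) == 2**53 + 1)  # False
```

The loop prints the same pairs as the awk example: 111111111111111104 and 222222222222222208 are the doubles nearest to the two inputs.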

mericano1 · 2014-06-06T09:24:46Z

+1

nicowilliams · 2014-06-18T23:07:51Z

jq does have David M. Gay's bigint code in jv_dtoa.c. Perhaps it should use more of it. It's thread-safe, and the jv_dtoa_context stuff is really for caching reusable things -- an optimization we could remove if it made things easier. This is clearly more complete than libtomfloat for some things, namely: parsing and formatting numbers, as well as big2double and double2big conversions (which will be needed for API backwards compatibility reasons, and to be able to use libm functions). But it's also less complete for other things: fewer arithmetic operations are implemented (e.g., there's no divide, just a ratio() that returns a double). Either there's a lot of work to do on either codebase, or we find another, more complete library. Ideas?

OTOH, jq maybe doesn't need bignum operations, just a bignum representation falling back to doubles for arithmetic (and comparison?). But I'd prefer to only fall back to doubles for libm functions for which we find no better alternative.
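
One way to picture "bignum arithmetic, with doubles only for libm functions" is the following Python sketch. It uses the standard `decimal` module as a stand-in for whatever bignum library jq might adopt; the function names are hypothetical, and the two conversions play the role of the big2double and double2big steps mentioned above.

```python
import math
from decimal import Decimal


def big_add(a: Decimal, b: Decimal) -> Decimal:
    """Arithmetic stays in the bignum domain (within the decimal context precision)."""
    return a + b


def big_sqrt(a: Decimal) -> Decimal:
    """Fallback path: bignum -> double, libm-style call, double -> bignum."""
    as_double = float(a)            # "big2double": may lose precision
    result = math.sqrt(as_double)   # stands in for any libm function
    return Decimal(result)          # "double2big": exact value of the double


n = Decimal("111111111111111111")
print(big_add(n, Decimal(1)))  # 111111111111111112
print(big_sqrt(Decimal(16)))   # 4
```

Only the fallback path is exposed to double rounding; addition keeps all eighteen digits.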

pkoppstein · 2014-06-18T23:46:21Z

@nicowilliams wrote:

OTOH, jq maybe doesn't need bignum operations, just a bignum representation ....

bignum operations for the medium or long term; bignum representation for the short term (or tomorrow :-)

nicowilliams · 2014-06-19T00:09:54Z

Well, it's early days and there's still research to be done.

http://www.eskimo.com/~eresrch/float/ looks promising, though I've no idea what the license on it would be (I sent the author email about this). It's very complete, but a) it's fixed-precision (probably easy to change to be dynamic) and b) it doesn't handle normal string representation of numbers (probably also easy to fix). I haven't looked but I suspect it also doesn't do double2big and big2double conversion.

nicowilliams · 2014-06-19T18:32:05Z

The author of Big Float (http://www.eskimo.com/~eresrch/float/) has agreed to let us use it under friendly terms. I'll take a look at it and see how suitable it is.

nicowilliams · 2014-06-19T20:54:07Z

FYI: https://github.com/nicowilliams/bigfloat

kutzi · 2017-05-30T08:05:16Z

Would be really really nice to have some progress here. I've just been bitten by this bug and it's really hard to catch, as the numbers jq spits out look totally legit, i.e. within the same magnitude.

nicowilliams · 2017-05-30T15:59:38Z

@kutzi (and anyone else interested in bignum support in jq) We have a PR (that I need to find time to finish) for 64-bit integer support (in addition to IEEE754 doubles). We're not likely to add any kind of bignums unless someone submits a PR. If you or anyone else wants to work on bignum support for jq, you'll need to be aware of a couple of things:

- no GPL, no LGPL welcomed
- jq's source code does not always jv_free() values known to be numbers, and valgrind can't check for that anyways given that numbers today are never allocated

I'd start by adding a compile-time option to use allocated numbers so that numeric jvs point to a malloc()ed double, that way failures to jv_free() numbers can be caught by valgrind and fixed.

Next I'd look for a suitable bignum library. There are quite a number of them, but they'd have to be a) C-coded or otherwise have a C API, b) licensed in a way that's friendly to jq's license and jq's users.

Lastly, I'd integrate such a bignum library much like Oniguruma: as a [git] submodule that is used if ./configure can't find it installed or if the user wants the submodule used.

gcsfred2 · 2022-01-28T15:50:54Z

Any updates?

mjustin · 2023-05-19T20:22:03Z

Would this be the right issue to ask about having jq preserve zeros after the decimal point as well, e.g. not converting 5.0 to 5? JSON is agnostic about the semantics of numbers, but the programs using the JSON may very well care to differentiate between 500, 500.0, and 5e2 (it might be doing precision-based calculations, for instance).

As a concrete example, Java's BigDecimal class keeps track of both the unscaled value and the scale, so these could reasonably be differentiated as different values when read by a Java program:

        System.out.println(new BigDecimal("500"));
        System.out.println(new BigDecimal("500.0"));
        System.out.println(new BigDecimal("5e2"));
        System.out.println();
        System.out.println(new BigDecimal(BigInteger.valueOf(500), 0));
        System.out.println(new BigDecimal(BigInteger.valueOf(5000), 1));
        System.out.println(new BigDecimal(BigInteger.valueOf(5), -2));

=>

$ echo '[500, 500.0, 5e2]' | jq -c
[500,500,500]
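
For comparison, a minimal Python sketch of a consumer that does keep the three spellings apart. It relies on the standard `json` module's `parse_int` and `parse_float` hooks together with `decimal.Decimal`; `load_preserving` is a name invented for this example, and none of it reflects how jq itself handles numbers.

```python
import json
from decimal import Decimal


def load_preserving(text: str):
    """Parse JSON so that every number keeps its scale, like Java's BigDecimal."""
    return json.loads(text, parse_int=Decimal, parse_float=Decimal)


source = "[500, 500.0, 5e2]"

# Default parsing: the two non-integer spellings both become the double 500.0.
print(json.loads(source))

# Decimal parsing: three distinguishable values.
print([str(v) for v in load_preserving(source)])  # ['500', '500.0', '5E+2']
```

The exponent is respelled as 5E+2, so the original text is not preserved byte for byte, but the unscaled value and scale are.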

emanuele6 · 2023-09-08T20:22:50Z

jq 1.7 released with support for literal large numbers. closing

tischwa · 2023-09-09T16:26:40Z

Awesome, thank you!

dfkoh mentioned this issue Dec 5, 2013

Make numbers use their original string representation if possible #229

Closed

DRMacIver mentioned this issue Dec 9, 2013

Integer representations for numbers #234

Closed

nicowilliams added this to the 1.5 release milestone Jun 6, 2014

nicowilliams added the feature request label Jun 6, 2014

nicowilliams mentioned this issue Jun 17, 2014

Breaking on 64bit ids? #178

Closed

nicowilliams changed the title ~~Extra large numbers with scientific (exponent) notation~~ bignum/bigreal number representation needed Jun 17, 2014

This was referenced Aug 4, 2014

enhancement request: avoid displaying floating point approximations to integers as integers #529

Closed

jq change 64 bit integers #545

Closed

nicowilliams modified the milestones: 1.5 release, 2.0 release Aug 9, 2014

nicowilliams mentioned this issue Nov 26, 2014

jq mangles floats #627

Closed

pkoppstein mentioned this issue Dec 10, 2017

Floating-point strings have trailing zeroes removed. #1550

Closed

pkoppstein mentioned this issue Jun 7, 2018

Fails to parse big integer #1652

Closed

itchyny removed this from the 2.0 release milestone Jun 25, 2023

itchyny added the fixed in master label Jun 25, 2023

emanuele6 closed this as completed Sep 8, 2023
