Breaking on 64bit ids? #178

Closed
pablomendes opened this issue Sep 7, 2013 · 5 comments

Comments

@pablomendes

If one parses a JSON object in which a large number appears both as an int and as a string (e.g. Twitter IDs), jq seems to break.

$ curl https://gist.github.com/gnip/764239/raw/6c6a2297f3e4e29a626f07db0c57b45af7d7e5d7/Twitter+%28json+format%29.js | \
 jq -r '"\(.id)\t\(.id_str)"'

Works fine.

28039652140     28039652140

But with 64bit Twitter IDs:

$ curl https://gist.github.com/gnip/764239/raw/6c6a2297f3e4e29a626f07db0c57b45af7d7e5d7/Twitter+%28json+format%29.js | \
 sed -r 's|28039652140|308125194256523266|' | \
 jq -r '"\(.id)\t\(.id_str)"'

It does not:

308125194256523>>260<<      308125194256523>>266<<

(Emphasis added.) A bug?

$ jq --version
jq version 1.3             

PS: just in case you arrived here searching for a solution for the specific case of Twitter JSON, just using id_str instead of id is probably the way to go. Just thought it would be worth mentioning the issue.
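
For anyone wondering where the last digits go, here is a minimal Python sketch of the rounding. It assumes (this is not shown in the report itself) that the parser stores every JSON number as an IEEE 754 double and prints it back with 17 significant digits. The names `TWEET_ID`, `as_double`, `nearest` and `printed` are only illustrative.

```python
from decimal import Decimal

TWEET_ID = 308125194256523266  # the 64-bit ID used in the sed command above

# Assumption: the parser keeps numbers as IEEE 754 doubles (53-bit significand).
as_double = float(TWEET_ID)

# Exact integer value of the double closest to the ID.
# Between 2**58 and 2**59 doubles are spaced 64 apart, so the ID gets rounded.
nearest = int(as_double)

# repr() gives the shortest decimal string that maps back to the same double
# (17 significant digits here); padding it with zeros yields an 18-digit integer.
printed = int(Decimal(repr(as_double)))

print("original:", TWEET_ID)
print("nearest double:", nearest)
print("printed from 17 digits:", printed)
print("exact integers guaranteed up to:", 2**53)
```

The smaller ID 28039652140 is well below 2**53, which would explain why the first command prints it unchanged.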

@lluchs

lluchs commented Sep 10, 2013

See #143

@pablomendes
Author

Thanks for the reference. In this particular case I would have loved to at least get a warning, if not an error message, so that the caller knows that the processing of that JSON piece was "not safe."
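
As a rough illustration of the kind of warning meant here, the following Python sketch checks integers while parsing, before handing the data to a tool that may store numbers as doubles. It relies on the `parse_int` hook of Python's standard `json.loads`; the helper names `checked_int` and `load_with_warnings` and the `MAX_SAFE` constant are hypothetical, and nothing like this exists in jq as far as this thread shows.

```python
import json
import warnings

# Every integer with magnitude up to 2**53 is exactly representable as a double.
MAX_SAFE = 2**53


def checked_int(literal: str) -> int:
    """parse_int hook: warn when an integer literal may not survive a double."""
    value = int(literal)
    if abs(value) > MAX_SAFE:
        warnings.warn(
            f"integer {literal} exceeds 2**53 and may lose precision as a double"
        )
    return value


def load_with_warnings(text: str):
    """Parse JSON and return (document, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        doc = json.loads(text, parse_int=checked_int)
    return doc, [str(w.message) for w in caught]


sample = '{"id": 308125194256523266, "id_str": "308125194256523266"}'
tweet, messages = load_with_warnings(sample)
for message in messages:
    print("warning:", message)
```

The check is conservative: some integers above 2**53 (powers of two, for instance) are still exact as doubles, so it flags values that may be unsafe rather than values that definitely are.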

@nicowilliams
Contributor

Well, you should see the recent 1,000+ mail threads in the IETF JSON WG
mailing list about: what JSON strings may contain, whether there may be
duplicate names (keys) in objects and what should happen if there are, the
range and precision of JSON numbers, whether there may be values other than
arrays or objects at the top-level, and what a JSON text is. Suffice it to
say that there's mostly only consensus about explaining what has been
implemented and giving some idea as to what can be expected to interop.

There are implementations that don't handle decimals very well at all (they
range from ones that only deal with C ints, to 32-bit floats, to non-IEEE 754
doubles, ...). This is just one of those things.

In general you really must read the docs for any JSON implementations you
intend to use. It's quite obnoxious that this is so. But it is so.

@stedolan
Contributor

Yes, this is annoying. I don't really want to add 64-bit ints, because that just moves the screwup further away (64bit int overflow still exists). I would quite like to add proper bigint support at some stage, though.

I know you don't have control over this data format, but in general IDs should be strings, not numbers.
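
A small Python sketch of the trade-off described here, under the assumption that the three candidate representations are an IEEE 754 double, a signed 64-bit integer, and a string. The function names are illustrative only and do not correspond to anything in jq.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def exact_as_double(n: int) -> bool:
    """True if n survives a round trip through an IEEE 754 double."""
    return int(float(n)) == n


def fits_int64(n: int) -> bool:
    """True if n fits in a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX


def exact_as_string(n: int) -> bool:
    """True if n survives a round trip through its decimal string."""
    return int(str(n)) == n


# The two IDs from the report, plus one value just past the int64 range.
samples = (28039652140, 308125194256523266, 2**63 + 1)

for n in samples:
    print(n, exact_as_double(n), fits_int64(n), exact_as_string(n))
```

In this sketch the 64-bit Twitter ID fails only the double check, while the third value fails both the double and the int64 checks. Only the string form survives in every case.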

@nicowilliams
Contributor

Close as dup of #218.
