Option to silently ignore any lines with invalid JSON #682

Closed
kennethjor opened this issue Jan 29, 2015 · 10 comments

@kennethjor

commented Jan 29, 2015

I primarily use jq to extract and filter messages from our application logs, which are JSON. However, the system occasionally interleaves two messages, which turns the line into useless junk. When jq encounters such a line it complains, rightfully, but I would find it immensely useful if I could tell it to silently ignore that line and carry on.

@nicowilliams

Collaborator

commented Jan 29, 2015

@kennethjor Please see:

a) http://tools.ietf.org/html/draft-ietf-json-text-sequence-13 (soon to be an RFC), and

b) the --seq option to jq in the master branch on GitHub.

That's likely the closest that jq will come to ignoring failures. This is designed with loggers in mind.

If you follow the discussions in the JSON WG, you'll understand the problem. Say you have an error in a JSON text, so you discard it; now you need to know where the next text starts, and that turns out to be hard to determine.
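
A minimal Python sketch of the idea behind the draft (later published as RFC 7464): every JSON text is preceded by an RS byte (0x1E) and followed by a newline, so a reader that hits a damaged text can drop it and resume at the next RS. The function name `parse_json_seq` is hypothetical, and this illustrates the framing only. It is not how jq's --seq is implemented.

```python
import json
from typing import Any, Iterator

RS = b"\x1e"  # ASCII Record Separator: RFC 7464 puts one before every JSON text


def parse_json_seq(data: bytes) -> Iterator[Any]:
    """Yield each valid JSON text from an RFC 7464-style byte stream,
    silently discarding texts that fail to parse."""
    for chunk in data.split(RS):
        text = chunk.strip()  # drop the trailing LF and any other whitespace
        if not text:
            continue  # nothing before the first RS, or an empty record
        try:
            yield json.loads(text)
        except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass it
            # A damaged text can only extend to the next RS, so skipping it
            # leaves the following texts intact.
            continue


# The middle record is two interleaved, truncated messages.
stream = b'\x1e{"a":1}\n\x1e{"b":2}{"c"\n\x1e{"d":4}\n'
print(list(parse_json_seq(stream)))  # [{'a': 1}, {'d': 4}]
```

The RS byte is what makes recovery cheap: without it, the reader would have to guess where the broken text ends.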

@kennethjor

Author

commented Jan 30, 2015

@nicowilliams Thanks for that, that's excellent. Yes, it did occur to me to wonder how a processor like jq would decide where invalid JSON had actually ended. This seems to fit the bill exactly for what I need!

@kennethjor

Author

commented Jan 30, 2015

@nicowilliams I think there's a bug in the sequence parsing. I would submit a pull request with at least a test, but my C is not the strongest. As I understand JSON text sequences, I'm surrounding each JSON text with 0x1e and 0x0a characters. jq parses it, so technically it works, but it complains on every line. Terminal output below:

kenn@klaatu:/tmp$ xxd test.json 
0000000: 1e7b 2261 223a 317d 0a                   .{"a":1}.
kenn@klaatu:/tmp$ cat test.json | jq --seq .
ignoring parse error: Truncated value at line 1, column 1
{
  "a": 1
}
kenn@klaatu:/tmp$ jq --version
jq-1.5rc1-15-g2e92c3e
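
For reference, the framing described above can be reproduced with a small Python sketch (`to_json_seq_record` is a hypothetical helper). Its output matches the xxd dump byte for byte, which suggests the warning came from the parser in that build rather than from the input layout.

```python
import json


def to_json_seq_record(value) -> bytes:
    """Frame one value as an RFC 7464 record: RS, the JSON text, then LF."""
    text = json.dumps(value, separators=(",", ":"))  # compact, like {"a":1}
    return b"\x1e" + text.encode("utf-8") + b"\n"


record = to_json_seq_record({"a": 1})
print(record.hex())  # 1e7b2261223a317d0a
```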

@nicowilliams

Collaborator

commented Jan 30, 2015

Hey, thanks for the report, and for trying it out! I'll take a look as soon as possible.

@kennethjor

Author

commented Jan 30, 2015

@nicowilliams No worries; jq is my new favourite tool and will forever be a standard part of my arsenal.

@nicowilliams

Collaborator

commented Jan 30, 2015

@kennethjor So is --seq sufficient? Should we close this issue?

@kennethjor

Author

commented Feb 1, 2015

@nicowilliams I haven't had a chance to check out your fix, but as for this issue, --seq is exactly what I needed. Thank you.

@kennethjor kennethjor closed this Feb 1, 2015

@dtolnay dtolnay added the support label Jul 27, 2015

@rjurney

commented Dec 17, 2017

I wish this option existed, and I don't understand why it can't work. JSON data is usually (or at least often) one record per line, as in a JSON Lines file. So in a certain mode, say --validate-filter, jq could simply fail on a bad line and emit nothing for it. Then I could grep out the empty lines.

Maybe that behavior isn't what everyone needs, but many of us need it! Having it as an option would be great, and it would make jq much more powerful for validating JSON Lines data. --seq doesn't help here, because JSON Lines data contains no record separator character.
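
The line-oriented mode requested here can be sketched in Python, assuming the input really is JSON Lines (exactly one text per line). The newline then plays the role RS plays in RFC 7464: it marks where the next text starts, so a bad line can be dropped without losing the lines after it. `filter_json_lines` and `keep_placeholder` are hypothetical names, not jq options.

```python
import json
from typing import Iterable, Iterator


def filter_json_lines(lines: Iterable[str], keep_placeholder: bool = False) -> Iterator[str]:
    """Re-emit each line that parses as JSON and drop the others.

    With keep_placeholder=True, an empty string stands in for each bad
    line, so the output stays aligned line-for-line with the input.
    """
    for line in lines:
        try:
            value = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a ValueError
            if keep_placeholder:
                yield ""
            continue
        yield json.dumps(value)


lines = ['{"a":1}', '{"a":2}{"b"', '{"a":3}']
print(list(filter_json_lines(lines)))  # ['{"a": 1}', '{"a": 3}']
print(list(filter_json_lines(lines, keep_placeholder=True)))  # ['{"a": 1}', '', '{"a": 3}']
```

This only works because the one-text-per-line assumption gives a reliable resynchronization point; a pretty-printed object spanning several lines would be rejected line by line.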

@nicowilliams

Collaborator

commented Dec 17, 2017

@rjurney Some kinds of invalidity in JSON texts can be handled easily (e.g., extra commas, trailing commas, missing commas, noise between texts...), but others can't be handled easily, or at all. E.g., how should {"foo":["bar"}true be parsed?

What I'd like to do at some point is add lots of input/output formats, including some not-quite-JSON formats. But this is a volunteer project, so it's a matter of who has the time to contribute this.
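
To see why an input like {"foo":["bar"}true is hard to handle, here is a Python sketch of one plausible recovery rule: on a parse error, skip one character and retry. `naive_resync` is hypothetical and is not anything jq does. It "recovers" three values, none of which the writer emitted as a separate text.

```python
import json
from typing import Any, List


def naive_resync(text: str) -> List[Any]:
    """Hypothetical recovery heuristic: whenever decoding fails, skip one
    character and try again, keeping whatever happens to decode."""
    decoder = json.JSONDecoder()
    pos, found = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        try:
            value, end = decoder.raw_decode(text, pos)
        except ValueError:
            pos += 1  # guess that the error is confined to this character
            continue
        found.append(value)
        pos = end
    return found


print(naive_resync('{"foo":["bar"}true'))  # ['foo', 'bar', True]
```

Any heuristic of this kind produces output that looks valid but has lost the original structure, which is why framing (RS, or one text per line) is the safer fix.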

@pkoppstein

Contributor

commented Dec 18, 2017

For those who arrive here before reading the FAQ, see:

  1. "Q: Is there a way to have jq keep going after it hits an error in the input file?"

  2. the section "Processing not-quite-valid JSON"

5 participants