How to make jq ignore invalid JSON and keep the original (non-JSON) output? #1547

Closed
gajus opened this issue Dec 7, 2017 · 11 comments

@gajus

gajus commented Dec 7, 2017

$ jq --version
jq-1.5

e.g.

I have a program that outputs:

test
{"context":{"package":"@applaudience/showtime-api","namespace":"server","logLevel":20},"message":"started service on port 8001","sequence":0,"time":1512648653294,"version":"1.0.0"}
{"context":{"package":"@applaudience/showtime-api","namespace":"createHttpClientIdentifier","logLevel":20},"message":"created HttpClientIdentifier","sequence":1,"time":1512648653302,"version":"1.0.0"}

I want jq to parse the JSON lines and leave the non-JSON output untouched.

First, I tried using --seq:

$ echo 'test
{"context":{"package":"@applaudience/showtime-api","namespace":"server","logLevel":20},"message":"started service on port 8001","sequence":0,"time":1512648715728,"version":"1.0.0"}
{"context":{"package":"@applaudience/showtime-api","namespace":"createHttpClientIdentifier","logLevel":20},"message":"created HttpClientIdentifier","sequence":1,"time":1512648715735,"version":"1.0.0"}' | jq --seq

Using --seq produces no output.

Then I tried using fromjson:

$ echo 'test
{"context":{"package":"@applaudience/showtime-api","namespace":"server","logLevel":20},"message":"started service on port 8001","sequence":0,"time":1512648715728,"version":"1.0.0"}
{"context":{"package":"@applaudience/showtime-api","namespace":"createHttpClientIdentifier","logLevel":20},"message":"created HttpClientIdentifier","sequence":1,"time":1512648715735,"version":"1.0.0"}' | jq -cRM 'fromjson?'
{"context":{"package":"@applaudience/showtime-api","namespace":"server","logLevel":20},"message":"started service on port 8001","sequence":0,"time":1512648715728,"version":"1.0.0"}
{"context":{"package":"@applaudience/showtime-api","namespace":"createHttpClientIdentifier","logLevel":20},"message":"created HttpClientIdentifier","sequence":1,"time":1512648715735,"version":"1.0.0"}

This method excludes the non-JSON output.

Then I tried the try .. catch approach:

$ echo 'test
{"context":{"package":"@applaudience/showtime-api","namespace":"server","logLevel":20},"message":"started service on port 8001","sequence":0,"time":1512648715728,"version":"1.0.0"}
{"context":{"package":"@applaudience/showtime-api","namespace":"createHttpClientIdentifier","logLevel":20},"message":"created HttpClientIdentifier","sequence":1,"time":1512648715735,"version":"1.0.0"' | jq -cRM '. as $line | try fromjson catch $line'
"test"
{"context":{"package":"@applaudience/showtime-api","namespace":"server","logLevel":20},"message":"started service on port 8001","sequence":0,"time":1512648715728,"version":"1.0.0"}
"{\"context\":{\"package\":\"@applaudience/showtime-api\",\"namespace\":\"createHttpClientIdentifier\",\"logLevel\":20},\"message\":\"created HttpClientIdentifier\",\"sequence\":1,\"time\":1512648715735,\"version\":\"1.0.0\""

The problem with the output here is that the original non-JSON output received quotes, i.e. test became "test".

How to make jq ignore invalid JSON and keep the original (non-JSON) output?

@pkoppstein
Contributor

In your particular case, you could simply add -r to the command-line switches:

jq -Rrc '. as $line | fromjson? // $line' gh-mixed-json-string.json
test
{"context":{"package":"@applaudience ...

The -r option causes top-level JSON strings to be printed as "raw" strings, so if you only want the erroneous lines to be printed in "raw" mode, a more complex solution will be required, but it is easily doable.
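A minimal sketch of what such a "more complex solution" might look like, assuming the goal is that only lines that fail to parse are printed raw, while every valid JSON value (including a top-level JSON string) is re-emitted as JSON text. The function name `passthrough_raw` is made up for illustration; `fromjson`, `tojson`, and `try`/`catch` are jq builtins. Re-encoding the parsed value with `tojson` means `-r` strips only the outer string layer.

```shell
# Hypothetical helper: parse each line as JSON where possible, else print it verbatim.
passthrough_raw() {
  # -R reads each input line as a string; -r prints result strings without outer quotes.
  # Parsed values are re-serialized with tojson, so a JSON string keeps its own quotes.
  jq -Rr '. as $line | try (fromjson | tojson) catch $line'
}

printf '%s\n' 'test' '"a JSON string"' '{"message": "started"}' | passthrough_raw
```

With this input, `test` should come through unquoted, the JSON string should keep its quotes, and the object should be printed in compact form.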

For future reference, please ask usage questions at stackoverflow.com with the jq tag:
https://stackoverflow.com/questions/tagged/jq

@gajus
Author

gajus commented Jul 20, 2018

[..] a more complex solution will be required, but it is easily doable.

Can you clarify this?

Example,

$ echo '[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`' | jq -crRC 'fromjson? | select(.context.logLevel > 20)'

does not print anything, whereas I expect it to simply print the input.

@gajus
Author

gajus commented Jul 20, 2018

The closest thing I got to printing raw text and JSON is:

$ echo '[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar"}
' | jq -crRC '. as $raw | try fromjson catch $raw'
[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar"}

But now there appears to be no way to terminate the output at the catch, i.e. I want to be able to do:

$ echo '[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar"}
' | jq -crRC '. as $raw | try fromjson catch $raw | .? | select(.foo = "bar")'

The expected output is:

[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar"}

The actual output is:

{"foo":"bar"}

How do I force print at the catch clause?

@pkoppstein
Contributor

pkoppstein commented Jul 20, 2018

@gajus - Unfortunately you seem to have completely misunderstood the sentence in which the clauses you quote ("a more complex solution will be required, but it is easily doable") appear.

In any case, you might like to try this:

jq -ncrR 'inputs as $raw | (fromjson? | select(.foo = "bar")) // $raw '

For future reference, please ask usage questions (including followup questions regarding the above) at stackoverflow.com using the jq tag: https://stackoverflow.com/questions/tagged/jq
That way, others will be more likely to benefit, and you will probably have more and/or better answers from which to choose.

@haizaar

haizaar commented Jul 20, 2018

@gajus - I'm seconding @pkoppstein, his example works great:

$ echo '[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar"}
' | jq -Rrc '. as $line | try (fromjson | . + {"baz":.foo}) catch $line' 
[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar","baz":"bar"}

@gajus gajus closed this as completed Jul 20, 2018
@gajus
Author

gajus commented Jul 20, 2018

I don't like this solution because it requires wrapping everything in a try..catch clause, i.e. all other errors will fail silently.

I think that jq should have a keyword/function such as yield that would force printing a particular message and break out of the current loop, i.e. jq -crRC '. as $raw | try fromjson catch yield($raw) | select(.foo = "bar")' would produce the result that I expect without forcing everything to be wrapped in a try..catch block.
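One way to address the concern about errors failing silently is to restrict `try` to the parsing step alone, so that errors raised by later filters still surface. This is a sketch rather than an established idiom: the function name `filter_json_lines` and the `{parsed: .}` wrapper are illustrative choices, and the `logLevel` threshold is borrowed from the earlier example in this thread.

```shell
# Hypothetical helper: only the fromjson step is guarded by try/catch.
filter_json_lines() {
  jq -Rr '
    . as $line
    # Wrap the parsed value so a JSON null or false is not mistaken for a parse failure.
    | (try (fromjson | {parsed: .}) catch null) as $result
    | if $result == null
      then $line                             # not JSON: print the original text
      else $result
           | .parsed
           | select(.context.logLevel > 20)  # errors raised here are not caught
           | tojson
      end
  '
}

printf '%s\n' \
  '[nodemon] 1.18.2' \
  '{"context":{"logLevel":20},"message":"debug"}' \
  '{"context":{"logLevel":30},"message":"info"}' | filter_json_lines
```

Under these assumptions the nodemon line passes through untouched and only the entry with `logLevel` 30 survives the filter. A line that is valid JSON but not an object, such as `[4177]`, should then produce a visible jq error instead of being passed through silently.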

@pkoppstein
Contributor

pkoppstein commented Jul 20, 2018

@gajus - Thank you for closing this trouble ticket.

From your last comment, it is evident that you are still having trouble understanding the flow of data in jq filters. Also, I think your comments about try/catch might reflect some misunderstanding of this feature. Perhaps these examples will help:

$ jq -n '1 | . as $in | try .a catch $in'
1

$ jq -n '1 | . as $in | try .a catch error( {"error": $in })'
jq: error (at <unknown>) (not a string): {"error":1}

$ jq -n '1 | . as $in | try (try .a catch error( {"error": $in })) catch {"xyzzy": .}'
{
  "xyzzy": {
    "error": 1
  }
}

However, you are right that this solution is limited -- in effect, it assumes that each valid JSON entity occurs on a single line. That is, its applicability is effectively limited to JSONL that has been interspersed with lines of text.

@bitti

bitti commented Aug 8, 2020

I see this is an old issue, but it still makes me nervous that adding bogus features to solve the wrong problem is even being discussed. As a reminder, the problem statement is:

I want jq to parse the JSON lines and leave the non-JSON output untouched.

Taken at face value, there is probably no way around relying on jq itself to distinguish between valid and invalid JSON. But looking at the examples, and probably 99% of the use cases behind this issue, the question should be more like:

How do I skip parsing diagnostic output and only feed actual json logs to jq?

For probably most reasonably written software, valid json logs should be distinguishable by just looking at the first character: if it's not a { it's not a log. One line which exemplifies this is:

[4177]

That's valid json but probably not what the opener of this issue wanted to parse as a valid log entry. So I suggest doing what is unix standard and what proper programs should do in the first place: redirect diagnostic messages to stderr. Ok, one can argue that logs are "diagnostic messages" by definition, but I think we can agree that there is still a semantic difference between startup/shutdown or other exceptional output and actual log messages which are usually processed further down the pipeline (which is why we use json in the first place). But what if you can't fix the original program? Then you can work around it by doing the right thing yourself, e.g. awk can easily redirect non-matching lines to stderr:

$ echo '[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{"foo":"bar"}' | awk '/^{/{print; fflush(); next} {print >"/dev/stderr" }'| jq
[nodemon] 1.18.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: /Users/gajus/Documents/dev/applaudience/showtime-api/src/**/*
[nodemon] starting `"babel-node" src/bin/server.js`
{
  "foo": "bar"
}

This also has the advantage that you can still distinguish between json logs and other output and redirect the stderr stream to somewhere else (or you do it directly in awk in which case you may want to add another fflush()).
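As a sketch of the redirection described in this comment: once non-JSON lines go to stderr, they can be captured separately with an ordinary shell redirect. The function name `split_json_lines` and the file name `diagnostics.log` are placeholders, and the awk program is the one shown earlier in this comment with the brace escaped.

```shell
# Hypothetical helper: lines starting with "{" go to stdout, all others to stderr.
split_json_lines() {
  awk '/^\{/ { print; fflush(); next } { print > "/dev/stderr" }'
}

# JSON lines continue down the pipe; everything else lands in diagnostics.log.
printf '%s\n' '[nodemon] 1.18.2' '{"foo":"bar"}' \
  | split_json_lines 2> diagnostics.log \
  | jq -c .
```

Keeping the split in a separate step means jq only ever sees lines that look like JSON objects, so its own error reporting stays meaningful.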

@justinabrahms

For future folks who come by (including myself), this is what I did:

cat my-file-which-is-mostly.json | grep -E "^\{" | jq

It filters out anything that doesn't start with a {. Might not work for your use case.. but it might.

@bitti

bitti commented Nov 12, 2020

Sure that's an easy way which should cover a lot of usecases. But the question was about how to keep the original output, hence my suggestion of redirecting non-matching lines to stderr.

A combination of shell functions I like to use is

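# rederr: run a command and color its stderr red; its stdout passes through via fd 3.
rederr()(set -o pipefail;"$@" 2>&1 >&3|sed $'s,.*,\e[31m&\e[m,'>&2)3>&1
# j: pass lines starting with { to stdout and send every other line to stderr.
j() { awk '/^{/{print; fflush(); next} {print >"/dev/stderr" }'; }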

which you can use like this

some-program-with-json-and-other-log-output | rederr j | jq

and if your terminal supports color, you can easily distinguish the non-JSON output by its red color.

@DenisShalaevSetronica

Code:

$ echo 'Some non-json string
Other non-json sring
{"prg":"test", "level":"info", "message":"Parsed message from Json format"}
{"prg":"test" "message":"Comma is omitted"}' | jq -Rr '. as $line | (fromjson? | .message) // $line'

Output:

Some non-json string
Other non-json sring
Parsed message from Json format
{"prg":"test" "message":"Comma is omitted"}
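One caveat with the `//` form, shown as a small sketch: the alternative operator falls back whenever the left side produces only `null` or `false`, so a valid JSON line without a `message` key is printed as the raw line rather than as `null`. The function names and the `(no message)` placeholder below are illustrative only.

```shell
# Same filter as above: valid JSON without .message falls through to the raw line.
message_or_line() {
  jq -Rr '. as $line | (fromjson? | .message) // $line'
}

# Variant with an inner default, so parsed objects never fall back to the raw text.
message_or_placeholder() {
  jq -Rr '. as $line | (fromjson? | .message // "(no message)") // $line'
}

printf '%s\n' '{"prg":"test"}' | message_or_line
printf '%s\n' '{"prg":"test"}' | message_or_placeholder
```

Which behavior is preferable depends on whether a parsed line without a message should stay visible in full or be reduced to a marker.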
