insert fails on JSONL with whitespace #417
Comments
I've not really thought about standards as much here as I should. It looks like there are two competing specs for newline-delimited JSON! http://ndjson.org/ is the one I've been using in
https://jsonlines.org/ is the other one. It is slightly less clear, but it does say this:
My interpretation of both of these is that newlines in the middle of a JSON object shouldn't be allowed. So what's The The thing I like about newline-delimited JSON is that it's really trivial to parse - loop through each line, run it through Unless someone has written a robust Python implementation of a
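The "loop through each line" approach described above can be sketched roughly as follows. This is an illustration only, not the code sqlite-utils actually uses; the function name `parse_ndjson` is hypothetical, and the sketch assumes each line is handed to the standard library's `json.loads`.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: exactly one complete JSON value per line.

    Hypothetical helper for illustration; not part of sqlite-utils.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        # Raises json.JSONDecodeError if a record spans several lines
        records.append(json.loads(line))
    return records


if __name__ == "__main__":
    print(parse_ndjson('{"id": 1}\n{"id": 2}\n'))
```

The simplicity is the point: because a strict parser never needs to look past the current line, a pretty-printed object whose first line is just an opening brace is rejected rather than reassembled.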
That makes sense; just a little hint that points folks towards doing the right thing might be helpful! fwiw, the reason I was using jq in the first place was just a quick way to extract one attribute from an actual JSON array. When I initially imported it, I got a table with a bunch of embedded JSON values, rather than a native table, because each array entry had two attributes, one with the data I actually wanted. Not sure how common a use-case this is, though (and easily fixed, aside from the jq weirdness!)
Updated documentation: https://sqlite-utils.datasette.io/en/latest/cli.html#inserting-newline-delimited-json
Any JSON that is newline-delimited and has whitespace (newlines) between the start of a JSON object and an attribute fails due to a parse error.
e.g. given the valid JSONL:
I would expect that
sqlite-utils insert --nl my.db mytable file.jsonl
would properly import the data into mytable. However, the following error is thrown instead:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
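The failure can be reproduced in isolation with the standard library, assuming the first line of the file (an opening brace followed by its newline) is passed to `json.loads` on its own. The helper name `decode_first_line` is hypothetical and exists only for this illustration.

```python
import json


def decode_first_line(line):
    """Try to decode a single line as JSON and return the error, if any.

    Hypothetical helper for illustration; returns None on success.
    """
    try:
        json.loads(line)
    except json.JSONDecodeError as err:
        return err
    return None


if __name__ == "__main__":
    # First line of a pretty-printed object, including its trailing newline
    err = decode_first_line("{\n")
    print(type(err).__name__, err.lineno, err.colno)
```

The decoder reads the brace, skips the newline, and then runs out of input where it expects a property name, which is why the reported position is on line 2 even though only one line was supplied.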
It makes sense that, since the file is intended to be newline separated, the thing being parsed is "{" (which obviously fails); however, the default newline-separated output of jq isn't compact. Using jq -c avoids this problem, but the fix is unintuitive and undocumented.
Proposed solutions:
jq -c filter ahead of the insert step.
jq -c instead" error message.
It might just have been too early in the morning when I was playing with this, but running pipes of data through sqlite-utils without the 'knack' of it led to some false starts.
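Both ideas can be sketched in Python under stated assumptions. The first function approximates what a jq -c pass does to pretty-printed output by decoding whitespace-separated JSON values with `json.JSONDecoder.raw_decode` and re-emitting them one per line; the second wraps a strict per-line parse with a hint. All function names and the wording of the hint are hypothetical, and none of this is sqlite-utils' actual behavior.

```python
import json


def iter_json_values(text):
    """Yield JSON values that are separated by arbitrary whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def to_compact_lines(text):
    """Re-serialize every value on its own line, roughly like jq -c."""
    return "\n".join(
        json.dumps(value, separators=(",", ":"), ensure_ascii=False)
        for value in iter_json_values(text)
    )


def loads_line_with_hint(line):
    """Strict per-line parse that points the user towards jq -c on failure."""
    try:
        return json.loads(line)
    except json.JSONDecodeError as err:
        # Hypothetical message text, for illustration only
        raise ValueError(
            "Could not parse a line as JSON. If the input is pretty-printed, "
            "try piping it through jq -c first."
        ) from err


if __name__ == "__main__":
    pretty = '{\n  "id": 1\n}\n{\n  "id": 2\n}\n'
    print(to_compact_lines(pretty))
```

The trade-off is the one raised in the comments: the tolerant reader has to hold the whole input (or a buffered window of it) rather than one line at a time, whereas the hint keeps strict parsing simple and only improves the error.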