Add multiple=true() option to fn:parse-json and fn:json-doc #235

michaelhkay · 2022-11-10T09:36:27Z

It is common practice (though not, I believe, covered by any standard) to have files that contain multiple JSON objects. Often these will be arranged one per line, as in our own qt3tests use case R31 at https://github.com/w3c/qt3tests/blob/master/app/UseCaseR31/sales.json . In that example, the file can be parsed using unparsed-text-lines()!parse-json(). But in the more general case, where each object may itself be multi-line, there's no easy way of handling this.

I propose an option multiple=true() on fn:parse-json and fn:json-doc that enables parsing of an input containing multiple (zero or more) concatenated JSON texts. When this option is present, the result will always be delivered as an array, containing one member for each JSON text in the input. The wrapper array will be present even if the number of JSON texts in the input is zero or one.

If a JSON text ends with a letter or digit and the next JSON text starts with a letter or digit then they must be separated by whitespace.
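To make the proposed semantics concrete, here is a minimal sketch in Python rather than XPath. The function name `parse_multiple` is hypothetical; it only illustrates the behaviour described above, using the standard library's `json.JSONDecoder.raw_decode`, and is not the proposed implementation of `fn:parse-json`.

```python
import json

_WS = " \t\n\r"  # the four whitespace characters allowed by the JSON grammar


def parse_multiple(text: str) -> list:
    """Parse zero or more concatenated JSON texts into a list (hypothetical helper)."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    prev_end = None  # index just past the previous JSON text, if any
    while True:
        # Skip whitespace between JSON texts (raw_decode does not skip it itself).
        while pos < len(text) and text[pos] in _WS:
            pos += 1
        if pos == len(text):
            return results  # always a list, even for zero or one JSON texts
        # Rule from the proposal: a text ending in a letter or digit, followed
        # directly by a text starting with a letter or digit, needs whitespace.
        if (prev_end == pos
                and text[pos - 1].isalnum() and text[pos].isalnum()):
            raise ValueError(f"whitespace required at offset {pos}")
        value, pos = decoder.raw_decode(text, pos)
        results.append(value)
        prev_end = pos
```

Under these assumptions, `parse_multiple('{"a":\n 1}\n[2,\n 3]')` yields a two-member list even though each JSON text spans several lines, and `parse_multiple("")` yields an empty list.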


ndw · 2022-11-10T09:41:40Z

I think this is a good idea, but I don't understand the last paragraph. What does it mean for a JSON text to end with a letter or digit if there isn't whitespace after it? I infer that

three
four

(or, I suppose, three four on a single line)

would parse as ["three", "four"]. But the only interpretation I can make of the last paragraph is that the input is

threefour

which must surely parse as ["threefour"].

MHK: The only JSON texts that can end in a letter or digit are (a) numbers, or (b) true, false, and null. So you can write 12 true true 5 but you can't write 12truetrue5.

ChristianGruen · 2022-11-10T09:59:51Z

We should also clarify whether "A""B" is a legal input, or whether it must be "A" "B".
Other cases: "A"{"B":"C"}, []1, …

Maybe multiple JSON fragments should always be separated by at least one whitespace character?

MHK: that would certainly be simpler.
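A minimal Python sketch of this simpler rule, where every JSON text after the first must be preceded by at least one whitespace character. The helper name `parse_whitespace_separated` is hypothetical and the code only illustrates the rule; it is not taken from any implementation.

```python
import json


def parse_whitespace_separated(text: str) -> list:
    """Parse JSON texts that must be separated by whitespace (hypothetical helper)."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        start = pos
        # Consume any run of JSON whitespace.
        while pos < len(text) and text[pos] in " \t\n\r":
            pos += 1
        if pos == len(text):
            return results
        # Every JSON text after the first must be preceded by whitespace.
        if results and pos == start:
            raise ValueError(f"whitespace required before offset {pos}")
        value, pos = decoder.raw_decode(text, pos)
        results.append(value)
```

With this rule, `"A" "B"` and `[] 1` are accepted, while `"A""B"`, `"A"{"B":"C"}` and `[]1` are all rejected with the same error, so no per-token special cases are needed.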

dnovatchev · 2022-11-10T20:45:39Z

So, what separator will be used between the JSON texts (objects)?

Just assuming "whitespace separation" seems to be rather error-prone. If the input contains a syntactically invalid JSON text, then this could be parsed, quite misleadingly, as two or more JSON objects.

Or am I completely misunderstanding this?

michaelhkay · 2022-11-10T21:34:02Z

There's no ambiguity. The top-level construct in the JSON grammar is an object, array, string, number, or one of the keywords true, false, or null. When you've read one of those, the only thing that can follow at present (apart from whitespace) is EOF. This proposal changes that: instead of EOF, you can have another top-level construct.

You'll find lots of people asking how to parse files that contain multiple JSON objects. The nearest thing to a standard is "JSON Lines" (https://jsonlines.org), which holds one JSON value per line. But there's no reason to restrict it that way: it's just as easy to allow multiple JSON values (which may contain newlines) separated by arbitrary whitespace.

It's true that erroneous JSON might be mis-parsed. But this mode of parsing won't be the default, so people will only use it if this is the input format they need to handle.

benibela · 2022-11-10T21:53:40Z

JSONiq had a jsoniq-multiple-top-level-items option for that

liberal could parse anything

dnovatchev · 2022-11-11T02:21:01Z

> It's true that erroneous JSON might be mis-parsed. But this mode of parsing won't be the default, so people will only use it if this is the input format they need to handle.

Maybe it would be even better to allow the user to specify a particular delimiter string between JSON texts (defaulting to whitespace), so that the chance of such accidental errors is minimized?

ChristianGruen · 2023-10-18T11:14:25Z

In terms of coherence, a dedicated fn:parse-json-fragments may be the better choice (unless we add a multiple option for fn:parse-xml and fn:doc).

michaelhkay · 2024-04-21T23:29:24Z

I propose to drop this issue, on the grounds that parsing of JSON Lines input can be readily achieved using

array{unparsed-text-lines($input) =!> parse-json()}

Note that JSON Lines does NOT allow multiple arbitrary JSON texts to be simply concatenated with newline separators as suggested in the original proposal. Each line of the input has to be a JSON text, which means newlines can only be used to separate JSON texts, not to separate tokens within a JSON text. It's therefore possible to start by splitting the input into lines, and then parsing each line.
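The split-then-parse approach can be sketched in Python as follows. The helper name `parse_json_lines` is hypothetical, and splitting on "\n" only is a simplifying assumption; the sketch is meant to show why the approach works for JSON Lines but not for JSON texts that span several lines.

```python
import json


def parse_json_lines(text: str) -> list:
    """Split the input into lines, then parse each line as one JSON text."""
    lines = text.split("\n")
    # A trailing newline produces one empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Each line must be a complete JSON text; json.loads raises otherwise.
    return [json.loads(line) for line in lines]
```

An input such as `'{"a": 1}\n[2, 3]\n'` parses to a two-member list, whereas `'{"a":\n 1}\n'` fails on its first line, because `{"a":` on its own is not a complete JSON text.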

ndw · 2024-04-23T15:59:38Z

The CG agreed to close this issue without further action at meeting 074.

ChristianGruen added the labels XQFO (An issue related to Functions and Operators) and Enhancement (A change or improvement to an existing feature) on Nov 10, 2022

michaelhkay mentioned this issue on Oct 18, 2023 in: JSON serialization: Sequences, INF/NaN, function items #576 (Open)

michaelhkay added the label Propose Closing with No Action (The WG should consider closing this issue with no action) on Apr 21, 2024

ndw closed this as completed on Apr 23, 2024
