Add multiple=true() option to fn:parse-json and fn:json-doc #235

Closed
michaelhkay opened this issue Nov 10, 2022 · 9 comments
Labels
Enhancement A change or improvement to an existing feature Propose Closing with No Action The WG should consider closing this issue with no action XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Contributor

It is common practice (though not, I believe, covered by any standard) to have files that contain multiple JSON objects. Often these will be arranged one per line, as in our own qt3tests use case R31 at https://github.com/w3c/qt3tests/blob/master/app/UseCaseR31/sales.json . In that example, the file can be parsed using unparsed-text-lines()!parse-json(). But in the more general case, where each object may itself be multi-line, there's no easy way of handling this.

I propose an option multiple=true() on fn:parse-json and fn:json-doc that enables parsing of an input containing multiple (zero or more) concatenated JSON texts. When this option is present, the result will always be delivered as an array, containing one member for each JSON text in the input. The wrapper array will be present even if the number of JSON texts in the input is zero or one.

If a JSON text ends with a letter or digit and the next JSON text starts with a letter or digit then they must be separated by whitespace.
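As a rough illustration of the proposed semantics (not of any XPath implementation), the behaviour can be sketched in Python with the standard `json` module. The function name `parse_multiple` is hypothetical. The sketch assumes JSON whitespace is space, tab, LF, and CR, and note that Python's decoder also accepts the non-standard constants `NaN` and `Infinity`, which a conforming parser would reject.

```python
import json

JSON_WS = " \t\n\r"  # the four whitespace characters allowed by the JSON grammar


def _is_letter_or_digit(ch):
    # ASCII letters and digits only: numbers, true, false and null
    # are the only JSON texts that can start or end with one.
    return ch.isascii() and ch.isalnum()


def parse_multiple(text):
    """Parse zero or more concatenated JSON texts; always return a list."""
    decoder = json.JSONDecoder()
    values = []
    pos, n = 0, len(text)
    while True:
        start = pos
        while pos < n and text[pos] in JSON_WS:
            pos += 1
        if pos == n:
            return values  # wrapper list is returned even for 0 or 1 texts
        # Rule from the proposal: two adjacent texts need whitespace between
        # them when one ends and the next starts with a letter or digit.
        if (values and pos == start
                and _is_letter_or_digit(text[pos - 1])
                and _is_letter_or_digit(text[pos])):
            raise ValueError(f"whitespace required at offset {pos}")
        # raw_decode parses one JSON text and reports where it ended.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
```

Under these assumptions, `parse_multiple("12 true true 5")` yields four members, while `parse_multiple("12truetrue5")` raises an error. Each text may span several lines, which is the case that line-by-line parsing cannot handle.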

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Enhancement A change or improvement to an existing feature labels Nov 10, 2022
@ndw
Contributor

ndw commented Nov 10, 2022

I think this is a good idea, but I don't understand the last paragraph. What does it mean for a JSON text to end with a letter or digit if there isn't whitespace after it? I infer that

three
four

(Or, I suppose, three four on a single line.)

Would parse as ["three", "four"]. But the only interpretation I can make of the last paragraph is that the input is

threefour

which must surely parse as ["threefour"].

MHK: The only JSON texts that can end in a letter or number are (a) numbers, or (b) true, false, and null. So you can write 12 true true 5 but you can't write 12truetrue5.

@ChristianGruen
Contributor

ChristianGruen commented Nov 10, 2022

We should also clarify if "A""B" is a legal input, or if it must be "A" "B".
Other cases: "A"{"B":"C"}, []1, …

Maybe multiple JSON fragments should always be separated by at least one whitespace character?

MHK: that would certainly be simpler.
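The simpler rule suggested here, requiring at least one whitespace character between any two JSON texts, could look roughly like the following Python sketch. The function name `parse_whitespace_separated` is hypothetical, and the code uses Python's standard `json` module rather than any XPath processor.

```python
import json

JSON_WS = " \t\n\r"  # whitespace characters permitted by the JSON grammar


def parse_whitespace_separated(text):
    """Parse JSON texts that must be separated by at least one whitespace char."""
    decoder = json.JSONDecoder()
    values = []
    pos, n = 0, len(text)
    while True:
        start = pos
        while pos < n and text[pos] in JSON_WS:
            pos += 1
        if pos == n:
            return values
        # After the first text, some whitespace must have been skipped.
        if values and pos == start:
            raise ValueError(f"expected whitespace before JSON text at offset {pos}")
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
```

With this rule, `"A" "B"` is accepted, whereas `"A""B"`, `"A"{"B":"C"}`, and `[]1` are all rejected, so the cases listed above need no special treatment.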

@dnovatchev
Contributor

So, what separator will be used between the JSON (objects') texts?

Just assuming "whitespace separation" seems to be rather error-prone. If the input contains a syntactically invalid JSON text, then this could be parsed, quite misleadingly, as two or more JSON objects.

Or am I completely misunderstanding this?

@michaelhkay
Contributor Author

michaelhkay commented Nov 10, 2022

There's no ambiguity. The top-level construct in the JSON grammar is an object, array, string, number, or the keyword true, false, or null. When you've read one of those, the only thing that can follow at present is EOF. This proposal changes this so that instead of EOF, you can have another top-level construct.

You'll find lots of people asking how to parse files that contain multiple JSON objects. The nearest thing to a standard is "json lines" - https://jsonlines.org - which holds one JSON value per line; but there's no reason to restrict it that way, it's just as easy to allow multiple JSON values (which may contain newlines) separated by arbitrary whitespace.

It's true that erroneous JSON might be mis-parsed. But this mode of parsing won't be the default, so people will only use it if this is the input format they need to handle.
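The "EOF or another top-level construct" distinction can be observed with Python's standard `json` module, used here purely as an illustration. The helper names `single_text_error_offset` and `next_top_level_value` are hypothetical.

```python
import json


def single_text_error_offset(text):
    """Return the offset at which a single-text parse fails, or None on success."""
    try:
        json.loads(text)  # standard mode: only EOF may follow the first value
    except json.JSONDecodeError as err:
        return err.pos
    return None


def next_top_level_value(text, offset):
    """Parse one top-level construct starting at offset; return (value, end)."""
    return json.JSONDecoder().raw_decode(text, offset)
```

For an input such as `{"id": 1}` followed by a newline and `{"id": 2}`, the standard parse stops at the start of the second object, and a parser in the proposed mode would simply continue from that offset. The same input is therefore an error in the default mode and a two-member result in the multiple mode, which is why the mode is opt-in.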

@benibela

JSONiq had a jsoniq-multiple-top-level-items option for that

liberal could parse anything

@dnovatchev
Contributor

> It's true that erroneous JSON might be mis-parsed. But this mode of parsing won't be the default, so people will only use it if this is the input format they need to handle.

Maybe it would be even better to allow the user to specify a particular delimiter string between JSON documents (defaulting to whitespace), so that the chance of such accidental errors could be minimized?
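One possible reading of this suggestion, sketched in Python with the standard `json` module: every pair of adjacent JSON texts must be separated by an explicit, caller-supplied delimiter, with optional whitespace around it. The function name `parse_delimited` and its `delimiter` parameter are hypothetical, and the sketch assumes the delimiter is a non-empty string containing no whitespace.

```python
import json

JSON_WS = " \t\n\r"  # whitespace characters permitted by the JSON grammar


def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in JSON_WS:
        pos += 1
    return pos


def parse_delimited(text, delimiter):
    """Parse JSON texts separated by an explicit delimiter string."""
    decoder = json.JSONDecoder()
    values = []
    pos = _skip_ws(text, 0)
    if pos == len(text):
        return values  # empty input: zero JSON texts
    while True:
        # The delimiter is only looked for between texts, so occurrences
        # inside string values are never mistaken for separators.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
        pos = _skip_ws(text, pos)
        if pos == len(text):
            return values
        if not text.startswith(delimiter, pos):
            raise ValueError(f"expected delimiter {delimiter!r} at offset {pos}")
        pos = _skip_ws(text, pos + len(delimiter))
```

With `;` as the delimiter, an input in which two objects are separated only by a space is reported as an error instead of being read as two documents, which is the kind of accidental mis-parse the suggestion aims to catch.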

@ChristianGruen
Contributor

In terms of coherence, a dedicated fn:parse-json-fragments may be the better choice (unless we add a multiple option for fn:parse-xml and fn:doc).

@michaelhkay
Contributor Author

michaelhkay commented Apr 21, 2024

I propose to drop this issue, on the grounds that parsing of JSON Lines input can be readily achieved using

array{unparsed-text-lines($input) =!> parse-json()}

Note that JSON Lines does NOT allow multiple arbitrary JSON texts to be simply concatenated with newline separators as suggested in the original proposal. Each line of the input has to be a JSON text, which means newlines can only be used to separate JSON texts, not to separate tokens within a JSON text. It's therefore possible to start by splitting the input into lines, and then parsing each line.
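For comparison, the split-then-parse approach can be sketched in Python with the standard `json` and `re` modules. The function name `parse_json_lines` is hypothetical. The line splitting follows the definition of `unparsed-text-lines`: split on CRLF, CR, or LF, and drop a final empty line.

```python
import json
import re


def parse_json_lines(text):
    """Parse JSON Lines input: split into lines first, then parse each line."""
    lines = re.split(r"\r\n|\r|\n", text)
    # Like unparsed-text-lines, ignore the empty string after a trailing newline.
    if lines and lines[-1] == "":
        lines.pop()
    # Every line must be a complete JSON text; a value spread over several
    # lines fails here because its first line is not valid JSON on its own.
    return [json.loads(line) for line in lines]
```

Because each line must be a complete JSON text, an object spread across two lines raises an error in this sketch instead of being reassembled.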

@michaelhkay michaelhkay added the Propose Closing with No Action The WG should consider closing this issue with no action label Apr 21, 2024
@ndw
Contributor

ndw commented Apr 23, 2024

The CG agreed to close this issue without further action at meeting 074.

@ndw ndw closed this as completed Apr 23, 2024

5 participants