Add multiple=true() option to fn:parse-json and fn:json-doc #235
Comments
I think this is a good idea, but I don't understand the last paragraph. What does it mean for a JSON text to end with a letter or digit if there isn't whitespace after it? I infer that
(Or, I suppose, Would parse as
which must surely parse as
MHK: The only JSON texts that can end in a letter or digit are (a) numbers, or (b) true, false, and null. So you can write
We should also clarify if
Maybe multiple JSON fragments should always be separated by at least one whitespace character?
MHK: That would certainly be simpler.
So, what separator will be used between the JSON texts (objects)? Just assuming "whitespace separation" seems rather error-prone. If the input contains a syntactically invalid JSON text, it could be parsed, quite misleadingly, as two or more JSON objects. Or am I completely misunderstanding this?
There's no ambiguity. The top-level construct in the JSON grammar is an object, array, string, number, or the keyword true, false, or null. When you've read one of those, the only thing that can follow at present is EOF. This proposal changes that: instead of EOF, you can have another top-level construct. You'll find lots of people asking how to parse files that contain multiple JSON objects. The nearest thing to a standard is "JSON Lines" - https://jsonlines.org - which holds one JSON value per line; but there's no reason to restrict it that way, as it's just as easy to allow multiple JSON values (which may contain newlines) separated by arbitrary whitespace. It's true that erroneous JSON might be mis-parsed. But this mode of parsing won't be the default, so people will only use it if this is the input format they need to handle.
JSONiq had a
Maybe it would be even better to let the user specify a particular delimiter string between JSON documents (defaulting to whitespace), so that the chance of such accidental errors is minimized?
In terms of coherence, a dedicated
I propose to drop this issue, on the grounds that parsing of JSON Lines input can be readily achieved using
Note that JSON Lines does NOT allow multiple arbitrary JSON texts to be simply concatenated with newline separators as suggested in the original proposal. Each line of the input has to be a JSON text, which means newlines can only be used to separate JSON texts, not to separate tokens within a JSON text. It's therefore possible to start by splitting the input into lines, and then parse each line.
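The split-then-parse approach described above can be sketched in Python for illustration. This is not the XPath solution referred to in the thread; the function name parse_json_lines is hypothetical, and skipping blank lines is an assumption of this sketch rather than something the comment specifies.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines input: one complete JSON text per line.

    Hypothetical helper for illustration only. Blank lines are skipped,
    which is an assumption of this sketch.
    """
    values = []
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as U+2028, which may legally appear inside a JSON string.
    for line in text.split("\n"):
        line = line.strip(" \t\r")
        if line:
            # json.loads raises json.JSONDecodeError (a ValueError) when a
            # line is not a complete JSON text on its own.
            values.append(json.loads(line))
    return values


if __name__ == "__main__":
    print(parse_json_lines('{"a": 1}\n[2, 3]\n'))
```

Because each line must be a complete JSON text, an object whose tokens are spread over several lines is rejected by this approach, which is the limitation the original proposal set out to address.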
The CG agreed to close this issue without further action at meeting 074.
It is common practice (though not, I believe, covered by any standard) to have files that contain multiple JSON objects. Often these will be arranged one per line, as in our own qt3tests use case R31 at https://github.com/w3c/qt3tests/blob/master/app/UseCaseR31/sales.json . In that example, the file can be parsed using unparsed-text-lines()!parse-json(). But in the more general case, where each object may itself be multi-line, there's no easy way of handling this.
I propose an option multiple=true() on fn:parse-json and fn:json-doc that enables parsing of an input containing multiple (zero or more) concatenated JSON texts. When this option is present, the result will always be delivered as an array, containing one member for each JSON text in the input. The wrapper array will be present even if the number of JSON texts in the input is zero or one.
If a JSON text ends with a letter or digit and the next JSON text starts with a letter or digit then they must be separated by whitespace.
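To make the proposed semantics concrete, here is a minimal Python sketch of how such a parse could behave. It is an illustration under stated assumptions, not the specified behaviour of fn:parse-json: the function name parse_json_multiple is hypothetical, a Python list stands in for the wrapper array, and the standard library's json.JSONDecoder.raw_decode is used to read one top-level value at a time.

```python
import json

JSON_WHITESPACE = " \t\n\r"


def parse_json_multiple(text):
    """Parse zero or more concatenated JSON texts into a list.

    Hypothetical illustration of the proposed multiple=true() option.
    The list plays the role of the wrapper array and is returned even
    when the input holds zero or one JSON text.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        start = pos
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            break
        # Separation rule from the proposal: a text ending in a letter or
        # digit must not be followed directly by one starting with a
        # letter or digit.
        if values and pos == start and text[pos - 1].isalnum() and text[pos].isalnum():
            raise ValueError(f"whitespace required between JSON texts at offset {pos}")
        # Read exactly one top-level value; pos becomes the offset after it.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    print(parse_json_multiple('{"a": 1}\n{"b":\n 2}'))
```

In this sketch, objects, arrays, and strings are self-delimiting, so an input such as 1[2] yields two values without any separator, whereas truefalse is rejected by the whitespace rule. Note that Python's decoder also accepts non-standard tokens such as NaN, which a conforming JSON parser would not.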