Unions revisited #88

Closed
goessner opened this issue Mar 26, 2021 · 22 comments

Comments

@goessner
Collaborator

Consider the following JSON value:

{
  "id": "v",
  "a": { "u": [1,2], "v": [3,4] },
  "b": { "u": [11,12], "v": [13,14] },
  "c": [ [21,22], [23,24] ]
}

1. Basic Usage

  • JSONPath allows alternate names or array indices as a set.
  • Names or indices refer to their immediate parent object or array.

Examples:

expression      result
$['a','b'].u    [[1,2],[11,12]]
$.c[0,1]        [[21,22],[23,24]]
$.a..['u',1]    [[1,2],2,4]
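
The three results above can be reproduced with a small evaluator. This is only a minimal sketch, not taken from any JSONPath implementation: the pre-parsed step format and the helper names `select`, `descendants` and `evaluate` are assumptions made for illustration, and so is the output order (nodes in document order, union entries left to right for each node).

```python
DOC = {
    "id": "v",
    "a": {"u": [1, 2], "v": [3, 4]},
    "b": {"u": [11, 12], "v": [13, 14]},
    "c": [[21, 22], [23, 24]],
}

def select(node, key):
    """Apply one name or index to one node; return a list of 0 or 1 matches."""
    if isinstance(key, str) and isinstance(node, dict) and key in node:
        return [node[key]]
    if isinstance(key, int) and isinstance(node, list) and 0 <= key < len(node):
        return [node[key]]
    return []

def descendants(node):
    """Yield the node itself and every nested value, in document order."""
    yield node
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        yield from descendants(child)

def evaluate(doc, steps):
    """steps is a list of (kind, keys); kind is 'child' or 'desc' (hypothetical format)."""
    nodes = [doc]
    for kind, keys in steps:
        if kind == "desc":
            nodes = [d for n in nodes for d in descendants(n)]
        # Every key in the union refers to the same immediate parent node.
        nodes = [m for n in nodes for k in keys for m in select(n, k)]
    return nodes

# $['a','b'].u
print(evaluate(DOC, [("child", ["a", "b"]), ("child", ["u"])]))
# $.c[0,1]
print(evaluate(DOC, [("child", ["c"]), ("child", [0, 1])]))
# $.a..['u',1]
print(evaluate(DOC, [("child", ["a"]), ("desc", ["u", 1])]))
```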

2. Unions are Syntactic Sugar

The example expressions above can easily be singularized, i.e.

union expression    singularized expressions
$['a','b'].u        $['a'].u | $['b'].u
$.c[0,1]            $.c[0] | $.c[1]
$.a..['u',1]        $.a..['u'] | $.a..[1]

using the XPath operator | for "or".
So if all union expressions are only syntactic sugar, why do they exist? Possible answers are:

  • Saving keystrokes.
  • Better performance.

The last point is the more serious one. Only implementors can show, each for their own implementation, whether processing a union expression is faster than invoking their JSONPath command / function several times with singularized expressions, or the other way round.

Maybe this is motivation enough to slightly change the spec from

An implementation of this specification, from now on referred to
simply as "an implementation", SHOULD takes two inputs, a JSONPath
and a JSON value, and produce ...

to

An implementation of this specification, from now on referred to
simply as "an implementation", SHOULD takes two inputs, a single JSONPath or a list of JSONPath queries
and a JSON value, and produce ...

in order to allow implementors to apply a list of queries to a JSON value and thus improve performance.

3. Duplicates and Ordering

Accepting the equivalence of a union expression and its set of singularized expressions according to Ch. 2, neither duplicates nor ordering need separate discussion for unions, since

  • $[0,0] yields the same result as $[0] | $[0], which is obviously twice the same value.
  • $[1,2] should yield the same result as $[1] | $[2]. The exact order might still not be deterministic.
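
The first bullet can be illustrated with a short sketch. It assumes that `|` means plain concatenation of the individual results, which is a definitional choice rather than something the spec states; `pick` and `pick_each` are hypothetical helper names.

```python
def pick(array, indices):
    """Union of index selectors: one lookup per listed index, in listed order."""
    return [array[i] for i in indices if 0 <= i < len(array)]

def pick_each(array, indices):
    """Singularized form: evaluate each index on its own, then concatenate."""
    result = []
    for i in indices:
        result.extend(pick(array, [i]))
    return result

data = ["x", "y", "z"]

# $[0,0] versus $[0] | $[0]: the same value twice under this reading.
print(pick(data, [0, 0]))       # ['x', 'x']
print(pick_each(data, [0, 0]))  # ['x', 'x']

# $[1,2] versus $[1] | $[2]
print(pick(data, [1, 2]) == pick_each(data, [1, 2]))  # True
```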

4. Variable Expressions

I know of only a handful of examples using the root selector that are of practical value. One of them is internal referencing à la $.a[($.id)] === $.a['v'] => [[3,4]]. Examples with the current node selector are questionable at best, like $.a.u[(@[0])] === $.a.u[1] => [2].

The results of those expressions are always interpreted as names or indices (the JSON literals string and number), according to the nature of their parents.
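
A minimal sketch of how such an expression could be resolved to a name or index before the lookup. The tuple encoding of the inner expression (`"$"` or `"@"` plus a list of keys) and the helper names `get_path` and `resolve_key` are assumptions made for illustration only.

```python
DOC = {
    "id": "v",
    "a": {"u": [1, 2], "v": [3, 4]},
    "b": {"u": [11, 12], "v": [13, 14]},
    "c": [[21, 22], [23, 24]],
}

def get_path(node, path):
    """Follow a list of names/indices starting at node and return the value found."""
    for key in path:
        node = node[key]
    return node

def resolve_key(root, current, expr):
    """Evaluate an inner expression and insist that it yields a name or an index."""
    origin, path = expr
    start = root if origin == "$" else current
    key = get_path(start, path)
    if isinstance(key, bool) or not isinstance(key, (str, int)):
        raise TypeError(f"expression result {key!r} is not a name or index")
    return key

# $.a[($.id)]: the key comes from the root, the lookup happens on $.a
a = DOC["a"]
key_from_root = resolve_key(DOC, a, ("$", ["id"]))
print(key_from_root, [a[key_from_root]])        # v [[3, 4]]

# $.a.u[(@[0])]: the key comes from the current node $.a.u itself
u = DOC["a"]["u"]
key_from_current = resolve_key(DOC, u, ("@", [0]))
print(key_from_current, [u[key_from_current]])  # 1 [2]
```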

This raises some questions:

  • Do we allow those expressions, even though comments say that their practical value is quite low?
  • If we do, wouldn't it be safe (as has also been proposed) to leave out the parentheses, giving $.a[$.id] and $.a.u[@[0]]?
  • If we do, how do we deal with nesting, as in $.a[@.u[@[0]]]?
  • If we do, wouldn't it be consistent to also allow them within unions, as in $.a['u',$.id]?

5. Path Expressions

@danielaparker linked to an interesting discussion regarding unions containing paths, as in $..[id,a.u], where the intent is to get $.id | $.a.u. Applying the singularization principle yields

  • $..[id] ... ok, if 'id' was used.
  • $..[a.u] ... nonsense, since $.a.u was meant.

Using $..['id',@.a.u] according to Ch.4 is completely different.

6. Summary

  1. Union expressions can always be replaced by a set of singularized expressions.
  2. So union expressions might become obsolete if parallel evaluation of multiple query expressions is promoted instead.
  3. I propose to support union expressions for historical reasons and to allow string or number literals exclusively.
  4. If variable expressions are allowed, only the root selector should be used. The current node selector as well as nested expressions should be disallowed.
  5. Unions shouldn't be first-class citizens.

I simply took the term "union" from XPath 1.0. I now agree with most others here that "union" should be replaced by a better term.

@cabo
Member

cabo commented Mar 26, 2021 via email

@glyn
Collaborator

glyn commented Mar 26, 2021

An implementation of this specification, from now on referred to
simply as "an implementation", SHOULD takes two inputs, a single JSONPath or a list of JSONPath queries
and a JSON value, and produce ...

in order to allow implementors to apply a list of queries to a JSON value and thus improve performance.

Alternatively, we could spec the behaviour for a single JSONPath query and leave it up to implementations to support lists of queries. Since we don't seem to be in the business of standardising the API to implementations, that keeps our job a bit simpler. And we could do away with or at least simplify "unions".

  3. I propose to support union expressions for historical reasons and allow string or number literals exclusively.

Note that whether to remove duplicate nodes would still need to be decided for cases such as [0,0].

@danielaparker

@cabo wrote:

On 2021-03-26, at 17:25, Stefan Goessner wrote:
$['a','b'].u $['a'].u | $['b'].u

So
$[‘a’,’b’][‘u’,’v’]
➔ $[‘a’][‘u’,’v’] | $[’b’][‘u’,’v’]
➔ $[‘a’][‘u’] | $[‘a’][‘v’] | $[‘b’][‘u’] | $['b’][‘v’]
(i.e., multiplying out)? (Please excuse the smartquotes)
Regards, Carsten

Indeed.

Saving keystrokes ... and not having to mentally compute what all the single paths are. I don't see "unions" as merely syntactic sugar.
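
The multiplying out that @cabo describes is a Cartesian product over the entries of each bracket. A minimal sketch, assuming a query is already parsed into a list of unions (one list of keys per bracket); `singularize` and `to_text` are hypothetical helper names.

```python
from itertools import product

def singularize(steps):
    """Expand a list of unions into every path that has one key per step."""
    return [list(combo) for combo in product(*steps)]

def to_text(path):
    """Render a single-key path in bracket notation."""
    return "$" + "".join(f"[{key!r}]" for key in path)

# $['a','b']['u','v']
paths = singularize([["a", "b"], ["u", "v"]])
print(" | ".join(to_text(p) for p in paths))
# $['a']['u'] | $['a']['v'] | $['b']['u'] | $['b']['v']
```

The number of single paths is the product of the union sizes, so it grows quickly with each additional bracket.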

Daniel

@danielaparker

danielaparker commented Mar 26, 2021

@goessner wrote:

6. Summary

  1. Union expressions can always be replaced by a set of singularized expressions.
  2. So union expressions might become obsolete if parallel evaluation of multiple query expressions is promoted instead.
  3. I propose to support union expressions for historical reasons and to allow string or number literals exclusively.
  4. If variable expressions are allowed, only the root selector should be used. The current node selector as well as nested expressions should be disallowed.
  5. Unions shouldn't be first-class citizens.

I don't agree with the conclusion, because the notational convenience of unions has value. Not having to repeat the first part of the path is a significant convenience, especially with combinations such as in @cabo's example. I don't think the performance concerns are as important: singularized expression evaluation can be optimized, and unions can also execute in parallel. With wildcards in the path, though, I think the edge goes to unions.

But I think this way of looking at the problem is helpful, because it suggests what the allowed items in a union expression should be. The proposal notes that union expressions can always be replaced by a set of singularized expressions. If, conversely, we require that a set of singularized expressions can always be replaced by a single expression with unions, it suggests that the allowed items in a union should include all of indices, identifiers, slices, wildcards, and relative path expressions beginning with @.

For example, given the root value,

{
          "firstName": "John",
          "lastName" : "doe",
          "age"      : 26,
          "address"  : {
            "streetAddress": "naist street",
            "city"         : "Nara",
            "postalCode"   : "630-0192"
          }
}

and single expressions

$..'firstName'
$..address.city

the result would be

[
    "John",
    "Nara"
]

A corresponding expression with unions could be

$..['firstName',@.address.city]

and the result would be the same.
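
A minimal sketch of this reading, in which every union element is a relative path applied to each descendant. It is not the jsoncons implementation; the tuple encoding of relative paths and the helper names `descendants`, `follow` and `descendant_union` are assumptions made for illustration.

```python
PERSON = {
    "firstName": "John",
    "lastName": "doe",
    "age": 26,
    "address": {
        "streetAddress": "naist street",
        "city": "Nara",
        "postalCode": "630-0192",
    },
}

def descendants(node):
    """Yield the node itself and every nested value, in document order."""
    yield node
    if isinstance(node, dict):
        for child in node.values():
            yield from descendants(child)
    elif isinstance(node, list):
        for child in node:
            yield from descendants(child)

def follow(node, path):
    """Follow a relative path of names; return [value], or [] if a name is missing."""
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return []
        node = node[name]
    return [node]

def descendant_union(doc, elements):
    """Apply every union element (a relative path) to every descendant of doc."""
    return [m for n in descendants(doc) for path in elements for m in follow(n, path)]

# The two single expressions, evaluated separately and concatenated
singular = (descendant_union(PERSON, [("firstName",)])
            + descendant_union(PERSON, [("address", "city")]))

# $..['firstName',@.address.city]
union = descendant_union(PERSON, [("firstName",), ("address", "city")])

print(singular)  # ['John', 'Nara']
print(union)     # ['John', 'Nara']
```

For this document both forms also agree on order; with other data the union visits node by node, so the order of results could differ from the concatenated single expressions.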

An alternative equivalent union would be

$[@..'firstName',@..address.city]

which suggests that a convenient way to provide a set of singular paths is through a union.

This understanding of unions is supported in the jsoncons implementation, and its author thinks it's a natural generalization of the union concept. "Variable Expressions" don't really fit into this dual view (although the jsoncons implementation supports them with the parentheses providing disambiguation). Personally, I think "Variable Expressions" could be dropped, or kept for historical reasons only.

Daniel

@danielaparker

@goessner wrote:

3. Duplicates and Ordering

Accepting the equivalence of a union expression and its set of singularized expressions according to Ch. 2, neither duplicates nor ordering need separate discussion for unions, since

  • $[0,0] yields the same result as $[0] | $[0], which is obviously twice the same value.
  • $[1,2] should yield the same result as $[1] | $[2]. The exact order might still not be deterministic.

I'm not convinced. I think the issue of duplicates is orthogonal to looking at unions in this way.

Consider the root value

[
    "first",
    "second",
    "third",
    "forth",
    "fifth"
]

and singular paths

$[1]
$[0:3]

The resulting values and paths are

["second","first","second","third"]
["$[1]","$[0]","$[1]","$[2]"]

The issue of whether to remove the duplicate item "$[1]" would be the same for operator | as it would be for the union operator. It depends entirely on how the operators are defined.

Note that it would be possible for an implementation to provide an option to return results with duplicates or without duplicates. The implementation jsoncons supports both options.
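
A minimal sketch of both options, keeping the path next to each value so that duplicates can be recognised by path rather than by value. This is not how jsoncons does it; `index_nodes`, `slice_nodes`, `combine` and the `unique` flag are hypothetical names.

```python
DATA = ["first", "second", "third", "forth", "fifth"]

def index_nodes(array, i):
    """Result of $[i] as a list of (path, value) pairs."""
    return [(f"$[{i}]", array[i])] if 0 <= i < len(array) else []

def slice_nodes(array, start, stop):
    """Result of $[start:stop] as a list of (path, value) pairs."""
    indices = range(*slice(start, stop).indices(len(array)))
    return [(f"$[{i}]", array[i]) for i in indices]

def combine(*node_lists, unique=False):
    """Concatenate results; with unique=True keep only the first node per path."""
    seen, out = set(), []
    for nodes in node_lists:
        for path, value in nodes:
            if unique and path in seen:
                continue
            seen.add(path)
            out.append((path, value))
    return out

with_dups = combine(index_nodes(DATA, 1), slice_nodes(DATA, 0, 3))
no_dups = combine(index_nodes(DATA, 1), slice_nodes(DATA, 0, 3), unique=True)

print([value for _, value in with_dups])  # ['second', 'first', 'second', 'third']
print([path for path, _ in with_dups])    # ['$[1]', '$[0]', '$[1]', '$[2]']
print([path for path, _ in no_dups])      # ['$[1]', '$[0]', '$[2]']
```

The same `combine` step serves both a union and an explicit `|`, which is the sense in which the duplicate question does not depend on which notation is used.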

@gregsdennis
Collaborator

I simply took the term "union" from XPath 1.0. Now I also agree with most of others here, that "union" should be replaced by a better term. - @goessner

I think "union" is fine for "multiple indices combined into a single bracket-notation selector." Previous usages of this term had been for what we now call a "selector." This was my primary argument in #21.

$..['firstName',@.address.city] - @danielaparker

While I'm happy to have multiple indices, I think each index needs to be valid unto itself. The current syntax wouldn't accept $[@.address.city], so I don't think the union should.

Additionally, I'm not sure I like the idea of paths being indices, whether or not they use @. I think I would prefer $..['firstName','city'], though it doesn't do quite the same thing since yours looks specifically for address.city. We should probably take this syntax to another issue.

The issue of whether to remove the duplicate item "$[1]" would be the same for operator | as it would be for the union operator. It depends entirely on how the operators are defined. - @danielaparker

Agreed. $[0,0] would follow suit.

@danielaparker

@gregsdennis wrote:

$..['firstName',@.address.city] - @danielaparker

While I'm happy to have multiple indices, I think each index needs to be valid unto itself. The current syntax wouldn't accept $[@.address.city], so I don't think the union should.

I don't fully understand this point (putting aside concerns about the @ notation.) The grammar as currently presented in the draft doesn't distinguish between bracketed expressions with one entry and unions. Bracketed expressions are defined entirely in terms of union elements. The grammar is currently incomplete, but I'm interested in what the grammar does with "*" and filters.

@glyn
Collaborator

glyn commented Mar 27, 2021

As another data point on unions and filters, @cburgmer's Proposal A has a clever restricted syntax within filters which disallows "*" and ensures that comparisons inside filters are only operating on single values.

@gregsdennis
Collaborator

I'm perfectly happy with the union (more than one index in a bracket). My concern was the @ syntax inside the bracket (which you put aside). That's not defined anywhere, and I think it should be discussed separately.

@danielaparker

@glyn wrote:

As another data point on unions and filters, @cburgmer's Proposal A has a clever restricted syntax within filters which disallows "*" and ensures that comparisons inside filters are only operating on single values.

@glyn, Thanks for the link. As far as I can tell the grammar in the draft hasn't changed since your original upload, it would be nice to see it move forward :-) Or do you feel that it's gone as far as it can before other issues are resolved?

@glyn
Collaborator

glyn commented Mar 27, 2021

@glyn, Thanks for the link. As far as I can tell the grammar in the draft hasn't changed since your original upload, it would be nice to see it move forward :-) Or do you feel that it's gone as far as it can before other issues are resolved?

I'd like to see a PR for .. soon as well as script expressions and then probably a series of PRs for filter expressions, picking off the consensus first before we move on to the more contentious items.

For the record, I agreed with the WG chairs to focus on the compliance test suite and reference implementation (and thereby provide a counterbalance to, and critique of, the "pure" spec work) rather than doing more spec work. Also, I'd like the spec details to genuinely be a product of multiple minds and I don't have much written evidence that many others have yet engaged with the details of how selectors are combined into JSONPaths. When I see the PRs I just mentioned start to appear, I'll be very happy...

@danielaparker

@goessner wrote:

2. Unions are Syntactic Sugar

The example expressions above can easily be singularized, i.e.

union expression    singularized expressions
$['a','b'].u        $['a'].u | $['b'].u
$.c[0,1]            $.c[0] | $.c[1]
$.a..['u',1]        $.a..['u'] | $.a..[1]

using the XPath operator | for "or".
So if all union expressions are only syntactic sugar, why do they exist? Possible answers are:

But also note that in XPath, the union and | operators are equivalent, and parentheses are supported, so I think the XPath style | operator equivalent of $.a..['u',1] would be

$.a..('u' | 1)

@goessner
Collaborator Author

hmm ... if we agreed, that would break existing implementations ... !?

@danielaparker

danielaparker commented Mar 27, 2021

@goessner wrote:

hmm ... if we agreed, that would break existing implementations ... !?

Assuming this is replying to this, of course, I'm not proposing that notation :-) That notation would change the JSONPath parse tree from a simple list of selectors to a full tree with operands and operator precedence, as it is in XPath and JMESPath. I think the existing union notation is fine. I'm only suggesting that $.a..('u' | 1) would be the equivalent JSONPath notation if JSONPath did support the | operator in the same way as XPath, and you mentioned both in your comment.

On this point, there would be no break to existing implementations, it would be a generalization only.

@danielaparker

danielaparker commented Mar 27, 2021

@gregsdennis wrote:

I'm perfectly happy with the union (more than one index in a bracket). My concern was the @ syntax inside the bracket (which you put aside). That's not defined anywhere, and I think it should be discussed separately.

Okay, putting aside the specific notation, I'll just note that the union in XPath that inspired the union in JSONPath allows expressions as union elements (in the same way as I suggested with the @ notation, they're evaluated against the current item), and the somewhat analogous multi-select-list in JMESPath allows expressions (again evaluated against the current item).

These are some motivations. I think @goessner's thoughts about the equivalence of an | operator and a union provide additional motivation if interpreted in the right way, meaning the way in which I want them to be interpreted :-)

There are a few implementations in JSONPath comparisons that support an example of this but without the leading '@'. There have been requests for this feature on stackoverflow and elsewhere.

I think that covers the reasons in favour.

@danielaparker

danielaparker commented Mar 27, 2021

@gregsdennis wrote:

Additionally, I'm not sure I like the idea of paths being indices, whether or not they use @. I think I would prefer $..['firstName','city'], though it doesn't do quite the same thing since yours looks specifically for address.city. We should probably take this syntax to another issue.

I think the issue here is with the bracket notation being overloaded in JSONPath for both indexes, on the one hand, and XPath style unions, on the other. For "indexes" interpreted broadly, it's natural to restrict to numbers, slices and wildcards, and perhaps identifiers. For unions, it's natural to allow paths, as XPath does.

JMESPath also uses brackets for both indices and multi-select-list (analogous to unions), but in the grammar distinguishes between them. It distinguishes between a bracket specifier, with one element,

bracket-specifier = "[" (number / "*" / slice-expression) "]" / "[]"
bracket-specifier =/ "[?" expression "]" ; analogous to JSONPath filter

and a multi-select-list, which only allows comma separated expressions (paths). In JMESPath, identifiers cannot start with a number, so there is no ambiguity.

But even without that grammatical distinction, there would be no ambiguity to allow both indexes and paths as union elements in JSONPath unions.

The reason for raising this in this issue is that it fits naturally with @goessner's discussion about the equivalence of the 'or' operator and unions, and clearly paths are allowed as operands of the 'or' operator. In XPath, the 'or' operator is explicitly equivalent to the union, and any union can be replaced anywhere in the path by an equivalent 'or' operator, with parentheses.

@gregsdennis
Collaborator

I think we need to put what qualifies as an "index" (used here to describe a single element inside the bracket notation, e.g. [1] or ['foo']) in another issue. This issue seems to center around supporting more than one "index", i.e. a "union".

I am fine with combining more than one "index" separated by commas. Further, whatever is decided to qualify as an "index" should be unionable in this way. For example, these should all be valid:

  • $[1,'foo']
  • $['foo',-1]
  • $[1,'foo',3:5]

where 1, foo, -1, and 3:5 are all examples of an "index" as used in this comment. (We should probably have a generic term for this. Also see @goessner's comment, point 3.)
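
A minimal sketch of a union whose entries each stand on their own as an "index". The Python encoding (int for an array index, str for a name, `slice` for a slice) and the helpers `apply_selector` and `union` are assumptions for illustration; counting negative indices from the end is also an assumption, since the thread does not define it.

```python
def apply_selector(node, sel):
    """Apply one 'index' (name, array index or slice) to one node."""
    if isinstance(sel, slice):
        return node[sel] if isinstance(node, list) else []
    if isinstance(sel, str):
        return [node[sel]] if isinstance(node, dict) and sel in node else []
    if isinstance(sel, int):
        # Assumption: negative indices count from the end of the array.
        if isinstance(node, list) and -len(node) <= sel < len(node):
            return [node[sel]]
        return []
    raise TypeError(f"unsupported selector: {sel!r}")

def union(node, selectors):
    """Evaluate each entry independently and concatenate in listed order."""
    return [m for sel in selectors for m in apply_selector(node, sel)]

letters = ["a", "b", "c", "d", "e", "f"]

print(union(letters, [1, "foo", slice(3, 5)]))  # $[1,'foo',3:5] -> ['b', 'd', 'e']
print(union(letters, ["foo", -1]))              # $['foo',-1]    -> ['f']
print(union({"foo": 42}, [1, "foo"]))           # $[1,'foo']     -> [42]
```

An entry that does not fit the node type (a name on an array, an index on an object) simply contributes nothing here; an implementation could equally choose to report an error.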

@gregsdennis
Collaborator

Putting this here b/c I'm not sure where else to put it.

A StackOverflow question regarding support for dotted paths inside brackets.

The main point of confusion here is the idea that a dotted path could legitimately represent a key.

Consider the following JSON:

{"foo.bar": 0, "foo": {"bar": 1}}

What would the path $['foo.bar'] return? My implementations would return the 0, but not the 1. I'm not necessarily pushing for this behavior, though.
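
The two readings can be put side by side in a short sketch. Both helper names are hypothetical, and neither function is meant as the behaviour the spec should adopt.

```python
doc = {"foo.bar": 0, "foo": {"bar": 1}}

def by_literal_key(node, key):
    """Treat the bracketed string as one member name, dots included."""
    return [node[key]] if isinstance(node, dict) and key in node else []

def by_dotted_path(node, dotted):
    """Treat the bracketed string as a path and split it on dots."""
    for name in dotted.split("."):
        if not isinstance(node, dict) or name not in node:
            return []
        node = node[name]
    return [node]

print(by_literal_key(doc, "foo.bar"))  # [0]
print(by_dotted_path(doc, "foo.bar"))  # [1]
```

With the path reading, the member named "foo.bar" can no longer be reached through this string at all, which is one concrete cost of interpreting index strings as paths.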

The same argument applies whether a path starts with a @ or not.

I think this is something that we need to cover in the spec, even if we decide not to support it.

(This would be one of the options for a key or index or whatever we end up calling a thing inside the brackets.)

@cabo
Member

cabo commented Mar 29, 2021

Starting to interpret indexing strings as if they were JSONPath syntax leads to a slippery slope. Strong opinion against that.

@goessner
Collaborator Author

Following @gregsdennis' hint, I raised a new issue, bracket notation #92. Consensus there will directly influence this issue and hopefully lead to closing both.

PS: Excuse me for introducing '|' as a shortcut for 'or' (one character saved ... minimization principle :-) and for the confusion that followed. It was never meant to propose a new syntax here à la XPath.

PPS: The slightly provocative
"... if we agreed, that would break existing implementations ... !?"
should have been written as
"... if we agreed, that would break user expected behavior of existing implementations ... !?"

@glyn
Collaborator

glyn commented Mar 29, 2021

I'm happy to close this issue.

@cabo
Member

cabo commented Jan 17, 2022

I'm happy to close this issue.

I don't think anything actionable remains.

Closing.

@cabo cabo closed this as completed Jan 17, 2022