Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support general union sequence types #122

Closed
rhdunn opened this issue Aug 9, 2022 · 13 comments · Fixed by #1132
Closed

Support general union sequence types #122

rhdunn opened this issue Aug 9, 2022 · 13 comments · Fixed by #1132
Labels
Feature A change that introduces a new feature PR Pending A PR has been raised to resolve this issue XPath An issue related to XPath

Comments

@rhdunn
Copy link
Contributor

rhdunn commented Aug 9, 2022

Use Case

There are a number of cases where it is beneficial to define a type more precisely (specifically in parameters and return types) as a union of item or sequence types, for example:

  1. a binary type over xs:hexBinary and xs:base64Binary;
  2. an element that accepts ol or ul html list element names;
  3. an options parameter that accepts strings (xs:string*) an element (element(options)) or a map;
  4. a function that takes JSON types (map, array, xs:integer, xs:decimal, xs:string).

There are a number of MarkLogic APIs that make use of this. Several EXPath and EXQuery specifications can take advantage of this. I've also used this in my XQuery IntelliJ plugin when defining vendor APIs that have changed over the different versions.

Examples

(: BaseX API change in 8.5 :)
declare function archive:options($archive as xs:base64Binary)
     as (element(archive:options) | map(*)) external;

declare function html:list($list as (element(ol) | element(ul))) { ... };

(: https://docs.marklogic.com/cts:classify -- MarkLogic defines this as `(element() | map:map)?` :)
declare function cts:classify($data-nodes as node()*,
                              $classifier as element(cts:classifier),
                              $options as (element()? | map:map?))
     as element(cts:label)* external;

(: https://docs.marklogic.com/cts:search :)
declare function cts:search($expression as node()*,
                            $query as cts:query?,
                            $options as (cts:order* | xs:string*))
     as node()* external;

Existing Support

  1. Local Union Types -- This handles support for unions over atomic types.
  2. Extending element and attribute tests to NameTest unions #23 -- This provides a more concise syntax for unions over element or attribute names.
  3. Types -- The Formal Semantics specification defines union types.
  4. Sequence Type Union -- This is the definition in my XQuery IntelliJ plugin.

Note: Due to SequenceTypeUnion being present in typeswitch expressions XQuery implementations will have existing code to handle matching these unioned types.

Syntax

3.4 Sequence Types

SequenceTypeUnion ::= SequenceType  ("|"  SequenceType)*
SequenceType ::= EmptySequenceType | (ItemType OccurrenceIndicator?) | ParenthesizedSequenceType
EmptySequenceType ::= "empty-sequence" "(" ")"
ParenthesizedSequenceType ::= "(" SequenceTypeUnion ")"
ItemType ::= AnyItemTest | TypeName | KindTest | FunctionTest | MapTest | ArrayTest |
             AtomicOrUnionType | RecordTest | LocalUnionType | EnumerationType

Design Note: SequenceTypeUnion is an existing BNF symbol used in typeswitch expressions that is unchanged in this issue.

3.6 Item Types

ItemTypeUnion ::= ItemType  ("|"  ItemType)*
ParenthesizedItemType ::= "("  ItemTypeUnion  ")"
ParenthesizableItemType ::= ItemType | ParenthesizedItemType

Design Note: ItemTypeUnion mirrors SequenceTypeUnion, allowing the non-sequence unions to be used in the contexts where only item types are allowed. Implementations can make use of the SequenceTypeUnion logic after the syntax/parser validates the item type restriction in those contexts.

Design Note: An alternative to this -- in order to minimize grammar changes -- would be to replace the ItemType with an ItemTypeBase symbol (or appropriately named alternative), and then define ItemType accordingly:
ItemTypeBase ::= AnyItemTest | TypeName | KindTest | ...
ItemTypeUnion ::= ItemTypeBase ("|" ItemTypeBase)*
ItemType ::= ItemTypeBase | ParenthesizedItemType
SequenceType ::= EmptySequenceType | (ItemTypeBase OccurrenceIndicator?) | ParenthesizedSequenceType

Other Changes

Design Notes: If ItemType is changed to ParenthesizableItemType, these are the other areas in the current XPath/XQuery 4.0 grammar that need changing.

ContextItemDecl ::= "declare"  "context"  "item"  ("as"  ParenthesizableItemType)?
                    ((":="  VarValue)  |  ("external"  (":="  VarDefaultValue)?))
ItemTypeDecl ::= "item-type" EQName "as" ParenthesizableItemType
TypedMapTest ::= "map" "(" ParenthesizableItemType "," SequenceType ")"
LocalUnionType ::= "union" "(" ParenthesizableItemType ("," ParenthesizableItemType)* ")"

Text

4.22.2 Typeswitch

The effective case definition is defined as:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches a SequenceType in the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

In order to make that fit this proposal, the wording should be updated to something like:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

3.7.2 The judgement subtype-itemtype(A, B)

Section (2) Conditions for atomic and union types: should add the following rules:

  1. A is an ItemTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype-itemType(T, B).
  2. B is an ItemTypeUnion in the form (T1 | T2 | ...) and any type T in (T1, T2, ...) satisfies subtype-itemType(A, T).

3.7.1 The judgement subtype(A, B)

The first paragraph in this section shall be replaced by:

The judgement subtype(A, B) determines if the sequence type A is a subtype of the sequence type B. A can either be empty-sequence(), xs:error, an ItemType, Ai, possibly followed by an occurrence indicator, or a SequenceTypeUnion. Similarly B can either be empty-sequence(), xs:error, an ItemType, Bi, possibly followed by an occurrence indicator, or a SequenceTypeUnion.

The result of the subtype(A, B) judgement can be determined as follows:

  1. If A is a SequenceTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype(T, B), then subtype(A, B) is true.
  2. If B is a SequenceTypeUnion in the form (T1 | T2 | ...) and any type T in (T1, T2, ...) satisfies subtype(A, T), then subtype(A, B) is true.
  3. Otherwise, the result of the subtype(A, B) judgement can be determined from the table below, which makes use of the auxiliary judgement subtype-itemtype(Ai, Bi) defined in 3.7.2 The judgement subtype-itemtype(A, B) .
@michaelhkay
Copy link
Contributor

What is the effect on the type subsumption rules, that is the judgements subtype(A, B) and subtype-itemType(A, B)?

What is the effect on the function conversion rules?

What is the effect on XSLT patterns?

Generally I'm sympathetic to the proposal. The current proposal to allow unions only over atomic types is a relatively timid compromise, motivated by the fact that schema-defined union types are already supported. This more ambitious proposal certainly meets additional requirements. I'm slightly concerned though at the amount of detail that will need to be worked through, for example the effect on the function conversion rules.

@rhdunn
Copy link
Contributor Author

rhdunn commented Aug 10, 2022

The effective case in the typeswitch section is defined as:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches a SequenceType in the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

In order to make that fit this proposal, the wording should be updated to something like:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

3.7.2 The judgement subtype-itemtype(A, B)

For ItemType unions, the rules should be the same as for LocalTypeUnion values as they follow the same behaviour. Specifically, in section (2) Conditions for atomic and union types: you would have:

  1. A is an ItemTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype-itemType(T, B).

3.7.1 The judgement subtype(A, B)

This would be supported by something like:

If A is a SequenceTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype(T, B), then subtype(A, B) is true.

Design Note: If the subtype judgement is reworded into a set of rules instead of the current table, then this would be a similar rule like is done for the other union types.

@michaelhkay
Copy link
Contributor

What about (A|B) being a subtype of (A|B|C)?

Also need to consider cast and castable. Which raises the question of whether the order is significant, consider cast 1987 to (xs:integer|xs:gYear). As I say, a lot of detail to work through.

@rhdunn
Copy link
Contributor Author

rhdunn commented Aug 10, 2022

I'm writing a new issue about that as the rules are currently broken when B is a pure or local union type.

@rhdunn
Copy link
Contributor Author

rhdunn commented Aug 10, 2022

To make it easier to follow, I've edited the proposal above to include the text changes. This includes adding the case when B is the union type for this proposal. Issue #124 covers the issue I noted above which I found when drafting these additional rules. That should cover the case you mentioned, and also make the effective case logic work correctly with the updated text that makes SequenceTypeUnion a fully supported type.

@rhdunn
Copy link
Contributor Author

rhdunn commented Aug 10, 2022

Note also that with the way LocalUnionType is specified using subtype-itemtype instead of derives-from, it is equivalent to ItemTypeUnion.

@cedporter cedporter added XPath An issue related to XPath XQuery An issue related to XQuery Feature A change that introduces a new feature labels Sep 14, 2022
@dnovatchev dnovatchev removed the XQuery An issue related to XQuery label Sep 22, 2022
@michaelhkay
Copy link
Contributor

I would prefer to focus on unions of item types, excluding sequence types. I think that would remove a whole area of complexity, without significant loss of functionality.

But are unions enough? For example, to represent the type J representing the result of fn:parse-json() accurately you need (xs:string | xs:boolean | xs:double | map(xs:string, J) | array(J))? - the definition is recursive, and any attempt to make the type system properly closed (even just for item types) is probably going to run into that.

@rhdunn
Copy link
Contributor Author

rhdunn commented Sep 30, 2022

I'm happy to split this into two issues -- unions of item types and unions of sequence types.

As for the other part, I think I have an issue here around self-referencing/recursive item-types in the context of the rng definition that has the same problem w.r.t. the next property. If not, it should be tracked.

I think that like with the variadic feature we should keep the successively more complex capabilities separate so we can at least get some of the functionality in place even if we don't get to cover all the use cases in this release.

@ChristianGruen ChristianGruen changed the title [XPath] [XQuery] Support general union sequence types Support general union sequence types Apr 27, 2023
@michaelhkay
Copy link
Contributor

michaelhkay commented Feb 5, 2024

I'd like to refresh this issue and see if we can make progress. There are definitely use cases, for example defining a function that works on both maps and arrays.

I'm a lot more comfortable defining general unions of item types rather than unions of sequence types. Mixing multiple cardinalities seems very disruptive in terms of type inference, subsumption rules, etc, and it's hard to identify convincing use cases.

For general unions of item types, there's a decision to be made: should this generalize the existing concept of union types (and thus reuse the syntax union(A,B)) or should it be something new and different? From the user perspective, reusing union(A, B) would seem a lot clearer than having two different mechanisms. The only real obstacle is that our union types are then even further removed from XSD union types. But it's already the case that they are not quite the same thing, so I think this should be manageable. Perhaps we should be very pedantic with the terminology and refer consistently to a "union item type" or a "union simple type" to distinguish them.

Whereas it makes sense for atomic union types to have a constructor function that creates an instance of the first one of the member types "that works", this doesn't make sense for generalised union item types. However, the matching semantics are straightforward (X matches union(A, B) if X matches A or X matches B), and the itemType-subtype rules also seem to need no change except in terminology and examples: if A is a union type, then A ⊆ B if every type T in the transitive membership of A satisfies T ⊆ B .

@ChristianGruen
Copy link
Contributor

I'm a lot more comfortable defining general unions of item types

+1

From the user perspective, reusing union(A, B) would seem a lot clearer than having two different mechanisms.

+1

Whereas it makes sense for atomic union types to have a constructor function that creates an instance of the first one of the member types "that works"

What’s the reason for having a constructor function of union types? Is it simply backward-compatibility? It feels confusing to me to be able to cast a value to a type that actually represents multiple times.

@michaelhkay
Copy link
Contributor

What’s the reason for having a constructor function of union types?

Good question. I think the answer is that you want types defined using union to look as much like types defined by restriction as possible. For example if you define my:nonZeroInteger as union(positiveInteger, negativeInteger) then you want to be able to say my:nonZeroInteger(23) just as if it were a type derived by restriction.

@ChristianGruen
Copy link
Contributor

Good question. I think the answer is that you want types defined using union to look as much like types defined by restriction as possible. For example if you define my:nonZeroInteger as union(positiveInteger, negativeInteger) then you want to be able to say my:nonZeroInteger(23) just as if it were a type derived by restriction.

Maybe we could discuss that again. I’m still uncomfortable with having a xs:numeric constructor, but maybe that’s just personal (I compared union types with abstract types, with was certainly misleading).

@michaelhkay
Copy link
Contributor

I've been doing more work on this. I started off thinking that sequence type unions were sufficient, and we could manage without item type unions, but working it through, I don't think that's the case. There are definite cases where we need something like (choice(array(*), map(*))*.

@michaelhkay michaelhkay added the PR Pending A PR has been raised to resolve this issue label Apr 5, 2024
@ndw ndw closed this as completed in #1132 Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature PR Pending A PR has been raised to resolve this issue XPath An issue related to XPath
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants