Support general union sequence types #122

rhdunn · 2022-08-09T12:20:01Z

Use Case

There are a number of cases where it is beneficial to define a type more precisely (specifically in parameters and return types) as a union of item or sequence types, for example:

a binary type over xs:hexBinary and xs:base64Binary;
an element that accepts ol or ul html list element names;
an options parameter that accepts strings (xs:string*) an element (element(options)) or a map;
a function that takes JSON types (map, array, xs:integer, xs:decimal, xs:string).

There are a number of MarkLogic APIs that make use of this. Several EXPath and EXQuery specifications can take advantage of this. I've also used this in my XQuery IntelliJ plugin when defining vendor APIs that have changed over the different versions.

Examples

(: BaseX API change in 8.5 :)
declare function archive:options($archive as xs:base64Binary)
     as (element(archive:options) | map(*)) external;

declare function html:list($list as (element(ol) | element(ul))) { ... };

(: https://docs.marklogic.com/cts:classify -- MarkLogic defines this as `(element() | map:map)?` :)
declare function cts:classify($data-nodes as node()*,
                              $classifier as element(cts:classifier),
                              $options as (element()? | map:map?))
     as element(cts:label)* external;

(: https://docs.marklogic.com/cts:search :)
declare function cts:search($expression as node()*,
                            $query as cts:query?,
                            $options as (cts:order* | xs:string*))
     as node()* external;

Existing Support

Local Union Types -- This handles support for unions over atomic types.
Extending element and attribute tests to NameTest unions #23 -- This provides a more concise syntax for unions over element or attribute names.
Types -- The Formal Semantics specification defines union types.
Sequence Type Union -- This is the definition in my XQuery IntelliJ plugin.

Note: Due to SequenceTypeUnion being present in typeswitch expressions XQuery implementations will have existing code to handle matching these unioned types.

Syntax

3.4 Sequence Types

SequenceTypeUnion ::= SequenceType  ("|"  SequenceType)*
SequenceType ::= EmptySequenceType | (ItemType OccurrenceIndicator?) | ParenthesizedSequenceType
EmptySequenceType ::= "empty-sequence" "(" ")"
ParenthesizedSequenceType ::= "(" SequenceTypeUnion ")"
ItemType ::= AnyItemTest | TypeName | KindTest | FunctionTest | MapTest | ArrayTest |
             AtomicOrUnionType | RecordTest | LocalUnionType | EnumerationType

Design Note: SequenceTypeUnion is an existing BNF symbol used in typeswitch expressions that is unchanged in this issue.

3.6 Item Types

ItemTypeUnion ::= ItemType  ("|"  ItemType)*
ParenthesizedItemType ::= "("  ItemTypeUnion  ")"
ParenthesizableItemType ::= ItemType | ParenthesizedItemType

Design Note: ItemTypeUnion mirrors SequenceTypeUnion, allowing the non-sequence unions to be used in the contexts where only item types are allowed. Implementations can make use of the SequenceTypeUnion logic after the syntax/parser validates the item type restriction in those contexts.

Design Note: An alternative to this -- in order to minimize grammar changes -- would be to replace the ItemType with an ItemTypeBase symbol (or appropriately named alternative), and then define ItemType accordingly:
ItemTypeBase ::= AnyItemTest | TypeName | KindTest | ...
ItemTypeUnion ::= ItemTypeBase ("|" ItemTypeBase)*
ItemType ::= ItemTypeBase | ParenthesizedItemType
SequenceType ::= EmptySequenceType | (ItemTypeBase OccurrenceIndicator?) | ParenthesizedSequenceType

Other Changes

Design Notes: If ItemType is changed to ParenthesizableItemType, these are the other areas in the current XPath/XQuery 4.0 grammar that need changing.

ContextItemDecl ::= "declare"  "context"  "item"  ("as"  ParenthesizableItemType)?
                    ((":="  VarValue)  |  ("external"  (":="  VarDefaultValue)?))
ItemTypeDecl ::= "item-type" EQName "as" ParenthesizableItemType
TypedMapTest ::= "map" "(" ParenthesizableItemType "," SequenceType ")"
LocalUnionType ::= "union" "(" ParenthesizableItemType ("," ParenthesizableItemType)* ")"

Text

4.22.2 Typeswitch

The effective case definition is defined as:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches a SequenceType in the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

In order to make that fit this proposal, the wording should be updated to something like:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

3.7.2 The judgement subtype-itemtype(A, B)

Section (2) Conditions for atomic and union types: should add the following rules:

A is an ItemTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype-itemType(T, B).
B is an ItemTypeUnion in the form (T1 | T2 | ...) and any type T in (T1, T2, ...) satisfies subtype-itemType(A, T).

3.7.1 The judgement subtype(A, B)

The first paragraph in this section shall be replaced by:

The judgement subtype(A, B) determines if the sequence type A is a subtype of the sequence type B. A can either be empty-sequence(), xs:error, an ItemType, Ai, possibly followed by an occurrence indicator, or a SequenceTypeUnion. Similarly B can either be empty-sequence(), xs:error, an ItemType, Bi, possibly followed by an occurrence indicator, or a SequenceTypeUnion.

The result of the subtype(A, B) judgement can be determined as follows:

If A is a SequenceTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype(T, B), then subtype(A, B) is true.
If B is a SequenceTypeUnion in the form (T1 | T2 | ...) and any type T in (T1, T2, ...) satisfies subtype(A, T), then subtype(A, B) is true.
Otherwise, the result of the subtype(A, B) judgement can be determined from the table below, which makes use of the auxiliary judgement subtype-itemtype(Ai, Bi) defined in 3.7.2 The judgement subtype-itemtype(A, B) .

The text was updated successfully, but these errors were encountered:

michaelhkay · 2022-08-10T17:10:11Z

What is the effect on the type subsumption rules, that is the judgements subtype(A, B) and subtype-itemType(A, B)?

What is the effect on the function conversion rules?

What is the effect on XSLT patterns?

Generally I'm sympathetic to the proposal. The current proposal to allow unions only over atomic types is a relatively timid compromise, motivated by the fact that schema-defined union types are already supported. This more ambitious proposal certainly meets additional requirements. I'm slightly concerned though at the amount of detail that will need to be worked through, for example the effect on the function conversion rules.

rhdunn · 2022-08-10T18:55:33Z

The effective case in the typeswitch section is defined as:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches a SequenceType in the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

In order to make that fit this proposal, the wording should be updated to something like:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

3.7.2 The judgement subtype-itemtype(A, B)

For ItemType unions, the rules should be the same as for LocalTypeUnion values as they follow the same behaviour. Specifically, in section (2) Conditions for atomic and union types: you would have:

A is an ItemTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype-itemType(T, B).

3.7.1 The judgement subtype(A, B)

This would be supported by something like:

If A is a SequenceTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype(T, B), then subtype(A, B) is true.

Design Note: If the subtype judgement is reworded into a set of rules instead of the current table, then this would be a similar rule like is done for the other union types.

michaelhkay · 2022-08-10T19:07:24Z

What about (A|B) being a subtype of (A|B|C)?

Also need to consider cast and castable. Which raises the question of whether the order is significant, consider cast 1987 to (xs:integer|xs:gYear). As I say, a lot of detail to work through.

rhdunn · 2022-08-10T19:09:02Z

I'm writing a new issue about that as the rules are currently broken when B is a pure or local union type.

rhdunn · 2022-08-10T19:39:33Z

To make it easier to follow, I've edited the proposal above to include the text changes. This includes adding the case when B is the union type for this proposal. Issue #124 covers the issue I noted above which I found when drafting these additional rules. That should cover the case you mentioned, and also make the effective case logic work correctly with the updated text that makes SequenceTypeUnion a fully supported type.

rhdunn · 2022-08-10T19:42:05Z

Note also that with the way LocalUnionType is specified using subtype-itemtype instead of derives-from, it is equivalent to ItemTypeUnion.

michaelhkay · 2022-09-30T17:14:20Z

I would prefer to focus on unions of item types, excluding sequence types. I think that would remove a whole area of complexity, without significant loss of functionality.

But are unions enough? For example, to represent the type J representing the result of fn:parse-json() accurately you need (xs:string | xs:boolean | xs:double | map(xs:string, J) | array(J))? - the definition is recursive, and any attempt to make the type system properly closed (even just for item types) is probably going to run into that.

rhdunn · 2022-09-30T18:44:42Z

I'm happy to split this into two issues -- unions of item types and unions of sequence types.

As for the other part, I think I have an issue here around self-referencing/recursive item-types in the context of the rng definition that has the same problem w.r.t. the next property. If not, it should be tracked.

I think that like with the variadic feature we should keep the successively more complex capabilities separate so we can at least get some of the functionality in place even if we don't get to cover all the use cases in this release.

michaelhkay · 2024-02-05T11:03:44Z

I'd like to refresh this issue and see if we can make progress. There are definitely use cases, for example defining a function that works on both maps and arrays.

I'm a lot more comfortable defining general unions of item types rather than unions of sequence types. Mixing multiple cardinalities seems very disruptive in terms of type inference, subsumption rules, etc, and it's hard to identify convincing use cases.

For general unions of item types, there's a decision to be made: should this generalize the existing concept of union types (and thus reuse the syntax union(A,B)) or should it be something new and different? From the user perspective, reusing union(A, B) would seem a lot clearer than having two different mechanisms. The only real obstacle is that our union types are then even further removed from XSD union types. But it's already the case that they are not quite the same thing, so I think this should be manageable. Perhaps we should be very pedantic with the terminology and refer consistently to a "union item type" or a "union simple type" to distinguish them.

Whereas it makes sense for atomic union types to have a constructor function that creates an instance of the first one of the member types "that works", this doesn't make sense for generalised union item types. However, the matching semantics are straightforward (X matches union(A, B) if X matches A or X matches B), and the itemType-subtype rules also seem to need no change except in terminology and examples: if A is a union type, then A ⊆ B if every type T in the transitive membership of A satisfies T ⊆ B .

ChristianGruen · 2024-02-05T11:25:29Z

I'm a lot more comfortable defining general unions of item types

+1

From the user perspective, reusing union(A, B) would seem a lot clearer than having two different mechanisms.

+1

Whereas it makes sense for atomic union types to have a constructor function that creates an instance of the first one of the member types "that works"

What’s the reason for having a constructor function of union types? Is it simply backward-compatibility? It feels confusing to me to be able to cast a value to a type that actually represents multiple times.

michaelhkay · 2024-02-05T11:56:27Z

What’s the reason for having a constructor function of union types?

Good question. I think the answer is that you want types defined using union to look as much like types defined by restriction as possible. For example if you define my:nonZeroInteger as union(positiveInteger, negativeInteger) then you want to be able to say my:nonZeroInteger(23) just as if it were a type derived by restriction.

ChristianGruen · 2024-02-05T12:25:21Z

Good question. I think the answer is that you want types defined using union to look as much like types defined by restriction as possible. For example if you define my:nonZeroInteger as union(positiveInteger, negativeInteger) then you want to be able to say my:nonZeroInteger(23) just as if it were a type derived by restriction.

Maybe we could discuss that again. I’m still uncomfortable with having a xs:numeric constructor, but maybe that’s just personal (I compared union types with abstract types, with was certainly misleading).

michaelhkay · 2024-04-05T08:52:48Z

I've been doing more work on this. I started off thinking that sequence type unions were sufficient, and we could manage without item type unions, but working it through, I don't think that's the case. There are definite cases where we need something like (choice(array(*), map(*))*.

rhdunn mentioned this issue Aug 10, 2022

[XPath] [XQuery] Incorrect subtype-itemtype rules for pure and local union types #124

Closed

cedporter added XPath An issue related to XPath XQuery An issue related to XQuery Feature A change that introduces a new feature labels Sep 14, 2022

dnovatchev removed the XQuery An issue related to XQuery label Sep 22, 2022

ChristianGruen changed the title ~~[XPath] [XQuery] Support general union sequence types~~ Support general union sequence types Apr 27, 2023

michaelhkay referenced this issue Jul 22, 2023

Type of $options parameter updated; obsolete error dropped

9a78506

michaelhkay mentioned this issue Apr 5, 2024

122 Choice item types (generalizing local union types) #1132

Merged

michaelhkay added the PR Pending A PR has been raised to resolve this issue label Apr 5, 2024

ndw closed this as completed in #1132 Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support general union sequence types #122

Support general union sequence types #122

rhdunn commented Aug 9, 2022 •

edited

Loading

michaelhkay commented Aug 10, 2022

rhdunn commented Aug 10, 2022

michaelhkay commented Aug 10, 2022

rhdunn commented Aug 10, 2022

rhdunn commented Aug 10, 2022

rhdunn commented Aug 10, 2022 •

edited

Loading

michaelhkay commented Sep 30, 2022

rhdunn commented Sep 30, 2022

michaelhkay commented Feb 5, 2024 •

edited

Loading

ChristianGruen commented Feb 5, 2024

michaelhkay commented Feb 5, 2024

ChristianGruen commented Feb 5, 2024

michaelhkay commented Apr 5, 2024

Support general union sequence types #122

Support general union sequence types #122

Comments

rhdunn commented Aug 9, 2022 • edited Loading

Use Case

Examples

Existing Support

Syntax

3.4 Sequence Types

3.6 Item Types

Other Changes

Text

4.22.2 Typeswitch

3.7.2 The judgement subtype-itemtype(A, B)

3.7.1 The judgement subtype(A, B)

michaelhkay commented Aug 10, 2022

rhdunn commented Aug 10, 2022

3.7.2 The judgement subtype-itemtype(A, B)

3.7.1 The judgement subtype(A, B)

michaelhkay commented Aug 10, 2022

rhdunn commented Aug 10, 2022

rhdunn commented Aug 10, 2022

rhdunn commented Aug 10, 2022 • edited Loading

michaelhkay commented Sep 30, 2022

rhdunn commented Sep 30, 2022

michaelhkay commented Feb 5, 2024 • edited Loading

ChristianGruen commented Feb 5, 2024

michaelhkay commented Feb 5, 2024

ChristianGruen commented Feb 5, 2024

michaelhkay commented Apr 5, 2024

rhdunn commented Aug 9, 2022 •

edited

Loading

rhdunn commented Aug 10, 2022 •

edited

Loading

michaelhkay commented Feb 5, 2024 •

edited

Loading