Type names #397

Closed
michaelhkay opened this issue Mar 14, 2023 · 16 comments
Labels
Editorial (Minor typos, wording clarifications, example fixes, etc.) · Propose Closing with No Action (The WG should consider closing this issue with no action) · XPath (An issue related to XPath)

Comments

@michaelhkay
Contributor

michaelhkay commented Mar 14, 2023

The draft specifications propose the introduction of item type declarations that can associate a name with an item type. The feature probably still needs some work, which this issue aims to explore.

The main purpose of introducing named item types is that the ItemType for a record structure or a function signature can become quite complex and lengthy, and you don't want to repeat it every time it is used, because any change would then have to be made everywhere it appears. Another motivation is to allow type definitions (for example, of records or functions) to be recursive.

I considered allowing named sequence types rather than just item types, but the rules for where you can and can't have an occurrence indicator get complicated, so I pulled back from that.

It seems natural to say:

  • Item type names are QNames
  • In XPath, type names (and their mapping to item types) appear in the static context
  • In XQuery, type names follow the conventions for global variables and function declarations. That suggests they can appear either in the main module or a library module; in a library module they must be in the namespace of the module; they can be annotated as %public or %private; an import module declaration makes the name visible in the importing module.
  • In XSLT, a name declared in a module is automatically available throughout the stylesheet package, and can be exposed to other packages using the same visibility mechanisms as other stylesheet components. However, I don't think it makes sense to allow a type name to be overridden, either using import precedence or using xsl:override.
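As a rough illustration of the XQuery conventions listed above, the following Python sketch models QNames as (namespace URI, local name) pairs and checks two of the rules: a library module may only declare names in its own namespace, and an import exposes only declarations not marked %private. The classes `TypeDecl` and `LibraryModule`, the function `visible_types`, and the example namespace are hypothetical; this is a toy model, not how any processor implements module import.

```python
from dataclasses import dataclass, field

# Hypothetical representation: a QName is a (namespace URI, local name) pair.
QName = tuple[str, str]


@dataclass
class TypeDecl:
    name: QName
    definition: str        # the item type, kept as source text in this sketch
    private: bool = False  # True models a %private annotation


@dataclass
class LibraryModule:
    namespace: str
    decls: list[TypeDecl] = field(default_factory=list)

    def declare(self, decl: TypeDecl) -> None:
        # Names declared in a library module must be in the module's namespace.
        if decl.name[0] != self.namespace:
            raise ValueError(f"{decl.name} is not in namespace {self.namespace}")
        self.decls.append(decl)


def visible_types(own: list[TypeDecl], imports: list[LibraryModule]) -> dict[QName, str]:
    """Type names visible in a module: its own declarations plus imported public ones."""
    scope: dict[QName, str] = {}
    candidates = list(own) + [d for m in imports for d in m.decls if not d.private]
    for d in candidates:
        if d.name in scope:
            # Treated as a static error in this sketch.
            raise ValueError(f"duplicate type name {d.name}")
        scope[d.name] = d.definition
    return scope


geo = LibraryModule("http://example.com/geo")
geo.declare(TypeDecl(("http://example.com/geo", "point"), "record(x as xs:double, y as xs:double)"))
geo.declare(TypeDecl(("http://example.com/geo", "helper"), "xs:double", private=True))
print(sorted(local for _, local in visible_types([], [geo])))  # ['point']
```

The private `helper` declaration stays inside the library module, while `point` becomes visible to any module that imports it.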

The question then arises, should item type names be in the same "symbol space" as named atomic and union types? There seem to be several options here:

(a) Item type names are in a different symbol space from atomic types; there are no rules barring the same name being used for a named item type and an atomic type, and they are disambiguated by requiring item type names to be distinguished using some kind of marker syntax such as type(name), rather than just a bare name.
(b) Item type names are in the same symbol space as atomic types, which means there must be a rule that an item type name must not be the same as an atomic type name that is visible in the same place. We could try and define this rule for individual names, or at the level of namespaces (if there are any atomic/union types in a particular namespace in the static context of any module, then there must be no declared type names in that namespace in that module, either declared in that module or imported from another module).
(c) Atomic type names "shadow" item type names, or vice versa: if the same name is used for both, then one of them takes precedence. Probably not a good idea.

I'm inclined to go for (b). Note that a simple rule that item type names can't be in a reserved namespace will prevent conflict for all non-schema-aware applications, since those applications only access atomic types in the xs namespace.
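Option (b) can be checked at either of the two granularities mentioned. The Python sketch below, assuming QNames are (namespace URI, local name) pairs, contrasts the per-name check with the per-namespace check and adds the reserved-namespace rule. The contents of `RESERVED` are an assumption (only the xs and fn namespaces are listed), not the specification's list, and the `example.com` namespace is hypothetical.

```python
XS = "http://www.w3.org/2001/XMLSchema"
FN = "http://www.w3.org/2005/xpath-functions"

# Assumed reserved namespaces for this sketch; the real list would come from the spec.
RESERVED = {XS, FN}


def name_conflicts(item_type_names, atomic_type_names):
    """Per-name rule: an item type name must differ from every visible atomic type name."""
    return sorted(set(item_type_names) & set(atomic_type_names))


def namespace_conflicts(item_type_names, atomic_type_names):
    """Per-namespace rule: no item type name in a namespace that holds atomic/union types."""
    atomic_namespaces = {ns for ns, _ in atomic_type_names}
    return sorted(n for n in item_type_names if n[0] in atomic_namespaces)


def reserved_violations(item_type_names):
    """Item type names that fall in a reserved namespace."""
    return sorted(n for n in item_type_names if n[0] in RESERVED)


# Non-schema-aware case: the only atomic types are in the xs namespace.
builtin_atomics = {(XS, "integer"), (XS, "string")}
my_types = {("http://example.com/my", "list")}
print(name_conflicts(my_types, builtin_atomics))       # []
print(namespace_conflicts(my_types, builtin_atomics))  # []

# Schema-aware case: a schema puts an atomic type in the same namespace as an item type.
schema_atomics = builtin_atomics | {("http://example.com/my", "size")}
print(name_conflicts(my_types, schema_atomics))        # []
print(namespace_conflicts(my_types, schema_atomics))   # [('http://example.com/my', 'list')]
```

In the non-schema-aware case, keeping item type names out of the reserved namespaces is enough to rule out any clash; only an imported schema can make the per-name and per-namespace variants disagree.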

Now, what about circular definitions?

There are legitimate circular definitions, like declare item type LIST = record(payload as item()*, next? as LIST), and there are "impossible" definitions, like declare item type THING = THING. Do we have to define the rules needed to ban "impossible" definitions, or can we just leave it that the determination of whether something is an instance of THING is non-terminating? I think we probably need to define the rules, which will require careful thought.
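One candidate rule for banning "impossible" definitions, offered here as an assumption rather than the draft's rule, is that every cycle of references must pass through a record(), array() or map() constructor: checking an instance against those descends into a component value and so makes progress, whereas a reference reached directly or through a union does not. The Python sketch below encodes item types as tuples (a hypothetical encoding; function types are left out) and reports the names that sit on such an unguarded cycle.

```python
# Hypothetical encoding of item types as tuples:
#   ("ref", name)              reference to a declared item type name
#   ("union", [t1, t2, ...])   choice between item types
#   ("record", {field: t})     record type; field values are component values
#   ("array", t), ("map", k, v)
#   ("item",), ("atomic", qname)  leaves with no references


def unguarded_refs(t):
    """Yield names reachable from t without descending into a record, array or map."""
    if t[0] == "ref":
        yield t[1]
    elif t[0] == "union":
        for member in t[1]:
            yield from unguarded_refs(member)
    # Any other kind has either no references or only guarded ones.


def impossible_names(decls):
    """Return the declared names that lie on a cycle of unguarded references."""
    graph = {name: set(unguarded_refs(t)) for name, t in decls.items()}
    bad = set()
    for start in graph:
        # Depth-first search: start is bad if it can reach itself again.
        stack, seen = list(graph[start]), set()
        while stack:
            n = stack.pop()
            if n == start:
                bad.add(start)
                break
            if n in seen or n not in graph:
                continue
            seen.add(n)
            stack.extend(graph[n])
    return bad


decls = {
    # declare item type LIST = record(payload as item()*, next? as LIST)
    "LIST": ("record", {"payload": ("item",), "next?": ("ref", "LIST")}),
    # declare item type THING = THING
    "THING": ("ref", "THING"),
    # Rejected too: checking A can return to A without descending into a value.
    "A": ("union", [("ref", "B"), ("atomic", "xs:integer")]),
    "B": ("ref", "A"),
}
print(sorted(impossible_names(decls)))  # ['A', 'B', 'THING']
```

This only ensures that instance-of tests terminate, not that a type has any instances: if `next` were a mandatory field, every LIST value would need infinitely nested records, so the type would be empty, and a separate rule would be needed to detect that.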

Where can item type names be used? The simple answer is: anywhere an ItemType is allowed. But what about contexts that only allow some ItemTypes and not others? For example, (a) "cast as", (b) as arguments of a LocalUnionType, (c) as the key type in a map type. (The solution in the current draft is that the syntax allows any ItemType to be used in these contexts, and there are semantic rules to constrain what kind of item types are allowed).

If we allow $v cast as my:X where my:X is a declared item type name, should we also allow the constructor function my:X($v)? That would presumably also mean that item type names and function names cannot overlap.
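The "syntax allows any ItemType, semantic rules constrain it" approach for contexts such as `cast as` can be sketched as a check performed after expanding item type names. The Python below uses a hypothetical tuple encoding of item types and treats a name as a permitted cast target if it expands to an atomic type or to a union whose members are all, recursively, atomic; that test is an assumption for illustration. It also shows the consequence raised above: if such names got arity-1 constructor functions, they would collide with any same-named function of arity 1. The `my:` names are hypothetical.

```python
# Hypothetical tuple encoding: ("ref", name), ("atomic", qname),
# ("union", [members]), ("record", {field: type}).


def is_generalized_atomic(t, aliases, expanding=frozenset()):
    """True if t, after expanding item type names, is atomic or a union of atomic types."""
    if t[0] == "ref" and t[1] in aliases:
        if t[1] in expanding:
            raise ValueError(f"circular alias {t[1]}")
        return is_generalized_atomic(aliases[t[1]], aliases, expanding | {t[1]})
    if t[0] == "atomic":
        return True
    if t[0] == "union":
        return all(is_generalized_atomic(m, aliases, expanding) for m in t[1])
    return False


def constructor_clashes(aliases, functions):
    """Names usable in 'cast as' whose implied constructor my:X($v) would clash with a
    declared function of the same name and arity 1. functions: set of (name, arity)."""
    return sorted(
        name for name in aliases
        if is_generalized_atomic(("ref", name), aliases) and (name, 1) in functions
    )


aliases = {
    "my:size": ("union", [("atomic", "xs:integer"), ("atomic", "xs:string")]),
    "my:count": ("ref", "my:size"),
    "my:point": ("record", {"x": ("atomic", "xs:double"), "y": ("atomic", "xs:double")}),
}
print(is_generalized_atomic(("ref", "my:count"), aliases))  # True
print(is_generalized_atomic(("ref", "my:point"), aliases))  # False
print(constructor_clashes(aliases, {("my:count", 1), ("my:point", 1)}))  # ['my:count']
```

Only `my:count` clashes: `my:point` is a record type, so it would never be a cast target and would get no constructor.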

Should we define any "built-in" item type names? We've been defining built-in functions (such as build-uri and parse-uri) whose signatures use record type definitions. Should we define built-in names for these record definitions?

An editorial issue: I think it's becoming increasingly difficult to get away with overloading the word ItemType to mean both the abstract concept of an item type, and the specific BNF construct used to define it. Same for SequenceType. I think we should probably move to having a defined term "item type" and a BNF construct such as ItemTypeDesignator to represent the two separate meanings.

@dnovatchev
Contributor

dnovatchev commented Mar 17, 2023

@michaelhkay , Thank you for submitting this.

> The draft specifications propose the introduction of item type declarations that can associate a name with an item type.

I searched the XPath 4.0 Spec (https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-types) but couldn't find anywhere any definition or introduction to the concept of "type name".

Could you, please, provide a link?

Also, it seems a little bit strange to introduce "type name" while we still don't have type as a first-class object (like belonging to the XDM) of the language.

So, should we be talking at length about what must be the name of something ("type") that is not a part of the language?

Could you, please, tell us what is the type of the right-hand-side of the instance of operator? Not just its syntax definition in terms of grammar rules, but what really is a type? And is instance of really an operator of the language if its RHS is something that is not a first-class object of the language?

In other programming languages, such as C#, type is a first-class object of the language and just one of its properties is its "name".

It seems that it may be premature to try to define "type name" if we haven't a good definition of "type". If type was a well-defined member of the XDM (like map and array, and, ...), or we could define a type as a specific kind of map or record, then it would be natural for this to have a name.

Aren't we putting the cart before the horse?

Please, don't get me wrong, I am not opposed to having this feature, we just need to specify this properly.

@michaelhkay
Contributor Author

> I searched the XPath 4.0 Spec (https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-types) but couldn't find anywhere any definition or introduction to the concept of "type name".

Sorry, I should have provided a reference. See section 5.19 of the XQuery spec ("item type declarations") and section 5.5 of the XSLT specification ("defining named item types"). Also §2.2.1 Static Context in the XPath specification (section "item type aliases").

> Also, it seems a little bit strange to introduce "type name" while we still don't have type as a first-class object

I think the two things are quite orthogonal: either can be done without the other.

> It seems that it may be premature to try to define "type name" if we haven't a good definition of "type".

I think the specs are quite clear about what a type is (well, they could be clearer, of course). It's true that making types into first-class objects would give some benefits. But I'm not convinced the benefit would exceed the cost in added complexity. In any case, it's a separate issue that has no bearing on this one.

@dnovatchev
Contributor

> Also §2.2.1 Static Context in the XPath specification (section "item type aliases").

Thanks, I am reading there:

"Item type aliases can be defined in XQuery 4.0 and in XSLT 4.0, but not in XPath 4.0 itself."

Why not defined in XPath?

> See section 5.19 of the XQuery spec ("item type declarations")

Again, in this section ItemType is only defined as a non-terminal symbol of the language grammar, and nothing more intrinsic:

[214] ItemType ::= AnyItemTest | TypeName | KindTest | FunctionTest
         | MapTest | ArrayTest | AtomicOrUnionType
         | RecordTest | LocalUnionType | EnumerationType
         | ParenthesizedItemType

Yes, we do have a whole chapter, chapter 3, in the documents that explains Types, but these are rather verbal and thus maybe not quite strict enough.

Also, this chapter in both specs for XPath and XQuery, starts with the statement:

The type system of XQuery 4.0 is based on [XML Schema 1.0] or [XML Schema 1.1].

However, function items in general, and maps, arrays, and records (and their accessors) in particular, are not covered in XML Schema, and again it seems that something is missing. In this respect there is no mention of the XDM (or of any of its "Other Items").

And because of this incompleteness of the XPath language spec, instance of actually cannot be defined as a true operator, because an operator has arguments that are objects of the language, but the RHS of instance of isn't a first-class object of the language and is defined just with grammar rules. So a question like: "What is the type of the right-hand-side of instance of ?" does not have an answer at present.

Also, the examples for instance of seem to contain a circular reference, for example:

  • 5 instance of xs:integer
    This example returns true because the given value is an instance of the given type.

It feels concerning that we haven't defined precisely some of the central concepts of the language.

@michaelhkay
Contributor Author

> Why not defined in XPath?

Same reason XPath doesn't have all the other paraphernalia of XQuery: it's designed to be a small expression language intended for use as a sublanguage of XSLT, XQuery, XSD, XForms, etc. If you want that kind of machinery, use XQuery.

> that explains Types, but these are rather verbal and thus maybe not quite strict enough.

If you would like to try and define a more formal exposition of the type system, I'm sure that would be a great contribution. Of course at one time we had the Formal Semantics, which failed in my view for two reasons: (a) it advocated the "strict static typing" approach, which I think experience has shown is the wrong choice for XML processing, and (b) the formalisms were only understood by a small elite community, which made it difficult to maintain and meant that everything it said had to be repeated in plain English.

> the examples for instance of seem to contain a circular reference

The relevant section starts "The boolean operator instance of returns true if the value of its first operand matches the [SequenceType] in its second operand, according to the rules for [SequenceType matching]; otherwise it returns false." The link is to section 3.5, where the concept is explained. The language used in explaining examples is informal.

I'd really be grateful if you could comment on this specific issue: does the proposal improve the specification, or not? Comments on other deficiencies of the type system or its exposition really aren't helpful; if you feel the spec can be improved in other areas, please make a proposal to improve it.

@dnovatchev
Contributor

>> Why not defined in XPath?

> Same reason XPath doesn't have all the other paraphernalia of XQuery: it's designed to be a small expression language intended for use as a sublanguage of XSLT, XQuery, XSD, XForms, etc. If you want that kind of machinery, use XQuery.

I find this particular decision (not to allow type names to be defined in XPath) rather arbitrary and lacking a more thorough justification. It makes little sense, if any at all, to be able to define types in XPath but only be allowed to give names to these XPath-defined types in another language. To do so, the same type, already defined in XPath, has to be defined once again in the outer language, just to give it a name. Should we list here all the negative impacts of such redundancy?

I would like this to be discussed by all members of the group.

@michaelhkay
Contributor Author

I think that the typical use case for XPath is for a single application to invoke multiple XPath expressions; and those multiple XPath expressions will typically use the same types. It's not possible (well, not desirable) for one XPath expression to do anything that changes the static or dynamic context for other independently invoked XPath expressions, which is why we don't have any XPath expressions that modify the static context - the scope of such changes would be too local. We could introduce such constructs (the "with" expression has been proposed for 4.0) but they would be of no use in the typical use case I have described.

I think the idea that the static context for evaluating an XPath expression is established by some host language, and not in the expression itself, is a defining characteristic of the XPath design philosophy, and is probably the primary thing that distinguishes it architecturally from XQuery. I think it's a distinction that should be maintained; if we were to add a construct to XPath for declaring type names, there would be no good argument for not adding the whole of the XQuery prolog; in fact there would be no case for keeping XPath as a separate language.

As a practical test, I think asking the question "is this feature needed/useful in XPath expressions that are embedded in XSLT" is a good test. If it isn't (because the feature works better at the XSLT level) then I think it's usually a valid conclusion that it shouldn't be in XPath at all.

@ChristianGruen
Contributor

I like option (b). And I agree that XPath should be kept minimal whenever possible.

@Arithmeticus
Contributor

Unstated benefits of this proposal are that it augments the declarative apparatus of the languages, and that it makes much more tractable the task of deeply defining a map or array. Providing an example of each, when this proposal develops into a spec revision, would be useful.

@dnovatchev
Contributor

dnovatchev commented Sep 11, 2023

> I think that the typical use case for XPath is for a single application to invoke multiple XPath expressions; and those multiple XPath expressions will typically use the same types. It's not possible (well, not desirable) for one XPath expression to do anything that changes the static or dynamic context for other independently invoked XPath expressions, which is why we don't have any XPath expressions that modify the static context - the scope of such changes would be too local.

As 6 months have passed since March, let us revisit this topic.

During this time the Balisage conference happened and some people changed their strong opinions about what should and what should not be done in pure XPath. In particular, to quote:

"It is also motivated by other issues that have been raised here proposing improved capabilities for writing applications in pure XPath."

If we have a "pure XPath application", it obviously has the same need for type-names (same use cases) as applications written in XSLT and/or XQuery.

> I think the idea that the static context for evaluating an XPath expression is established by some host language, and not in the expression itself, is a defining characteristic of the XPath design philosophy

In the case of a "pure XPath application" there is no "host language". Also, type-names could by design have local scope (the scope of the let expression in which they are defined), which leaves any outer scopes, including the static context, intact, if this is desired.

Here are some of the advantages of defining type-names within an XPath expression:

  • The same type-name definition will be used whenever the expression is evaluated - be it standalone or under XSLT, XQuery or any other host.
  • Stand-alone XPath applications are possible
  • This eliminates the problem (redundancy) of needing to rewrite the same type-name definition once for each different possible host (XSLT, XQuery or any other). Repeated manual effort and possibilities for errors are eliminated.
  • It becomes impossible to have the definitions of the same type-name out of sync across different hosts (XSLT, XQuery or any other) that need this type-definition.
  • Locally-scoped type-names allow different authors - in space and time - who are not aware of each other to use the same type-names without having to be concerned about potential name conflicts.

Conclusion:
Based on this I propose to provide the type-name definition capability in XPath.

@michaelhkay
Contributor Author

Are you proposing that a type name declared in one XPath expression should be usable in other XPath expressions? How does that work? At present the only lasting effect of an XPath expression is the result it returns.

@dnovatchev
Contributor

> Are you proposing that a type name declared in one XPath expression should be usable in other XPath expressions? How does that work? At present the only lasting effect of an XPath expression is the result it returns.

What I proposed was to add the capability of defining a type-name within an XPath expression.

This in no way means banning the ability of a host language (which can evaluate different and independent XPath expressions) to have its own type-name definition facility, which could be evaluated statically and made available in all contained XPath expressions.

Another way could be to have in XPath a keyword "global" (as in Python) that would make a type-name definition globally-accessible.

Something related is to have a variable's value be a type-name, and this is why in my initial comments I mentioned the idea of making types first-class objects of the language. These are not so "orthogonal" after all.

@ndw
Contributor

ndw commented Sep 12, 2023

I sometimes find the discussion threads about "pure XPath applications" confusing.

My mental model of how XPath works outside of a host language is that every expression is utterly independent. It is the host language (if there is one) that provides an environment where different XPath expressions are visible to each other.
Are there other models at play that attempt to allow multiple XPath expressions to interact? How is that model not an example of a host language?

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Sep 12, 2023

I can certainly imagine such environments. For example, Xidel, xmlstarlet, and Saxon's Gizmo all allow you to execute a sequence of XPath expressions interactively, and allow one expression (or command) to change the context for subsequent commands or expressions. In such a tool, it makes excellent sense to build the static context incrementally e.g. by declaring namespace prefixes, variables, functions -- and indeed types -- on the fly. But (a) I'm not at all sure that's what we are talking about here, and (b) if it is, then I think that standardising this kind of language is outside our scope. The whole point about XPath is that it's a core expression language that can be integrated into a wide variety of environments, and the static context is the key mechanism that architecturally separates what's in the language from what's in the calling environment.

@dnovatchev
Contributor

> I can certainly imagine such environments. For example, Xidel, xmlstarlet, and Saxon's Gizmo all allow you to execute a sequence of XPath expressions interactively, and allow one expression (or command) to change the context for subsequent commands or expressions. In such a tool, it makes excellent sense to build the static context incrementally e.g. by declaring namespace prefixes, variables, functions -- and indeed types -- on the fly. But (a) I'm not at all sure that's what we are talking about here, and (b) if it is, then I think that standardising this kind of language is outside our scope. The whole point about XPath is that it's a core expression language that can be integrated into a wide variety of environments, and the static context is the key mechanism that architecturally separates what's in the language from what's in the calling environment.

Not exactly.

The purpose of an XPath application and also of an XPath function library is to contain code that only uses the common subset of both XSLT and XQuery. As such, it has several advantages over being entirely locked in either XSLT or XQuery (or in any other host-language):

  • It is immediately executable under XSLT without needing any change
  • It is immediately executable under XQuery without needing any change
  • It is immediately executable under any other host of XPath without needing any change
  • Only a single code-base is maintained - thus no redundancy, no out-of-sync code-bases
  • A lot of maintenance labor is eliminated.
  • One doesn't need to learn N languages; it is enough to be good at a single language.

As for the context, maybe we need to learn from other languages.

For example, in Python there are 4 scopes (the LEGB rule):

  • Local
  • Enclosing
  • Global
  • Built-ins
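The LEGB lookup order maps naturally onto nested scopes for type names. As a minimal Python sketch (assuming type names are plain strings and definitions are opaque descriptions), `collections.ChainMap` can model built-ins, a host-supplied global static context, and let-style local bindings: lookup searches the innermost scope first, and adding a local scope leaves the outer ones untouched. The helpers `host_context`, `with_local_types` and `lookup` and the `my:` names are hypothetical; nothing here is proposed XPath syntax.

```python
from collections import ChainMap

# Built-in names visible everywhere (a single sample entry for this sketch).
BUILTINS = {"xs:integer": "xs:integer"}


def host_context(global_aliases):
    """Global scope, e.g. supplied by a host language, layered over the built-ins."""
    return ChainMap(dict(global_aliases), BUILTINS)


def with_local_types(scope, local_aliases):
    """A let-style binding of type names: a new innermost scope over `scope`."""
    return scope.new_child(dict(local_aliases))


def lookup(scope, name):
    """Search local, then enclosing, then global, then built-in scopes."""
    try:
        return scope[name]
    except KeyError:
        raise LookupError(f"type name {name} is not in scope") from None


globals_ = host_context({"my:point": "2-D point record"})
enclosing = with_local_types(globals_, {"my:pair": "array(item())"})
local = with_local_types(enclosing, {"my:point": "3-D point record"})

# The local binding shadows the global one:
print(lookup(local, "my:point"))     # prints: 3-D point record
# Names fall through to the enclosing and built-in scopes:
print(lookup(local, "my:pair"))      # prints: array(item())
print(lookup(local, "xs:integer"))   # prints: xs:integer
# The outer scope is unchanged by the inner binding:
print(lookup(globals_, "my:point"))  # prints: 2-D point record
```

Because each binding only adds a new innermost map, two authors can reuse the same local name without affecting each other or the host-supplied context.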

Commonality between several independent XPath applications is achieved by the fact that all of them (import and) use the same functions and bindings from the same XPath function libraries.

If we had type as a first-class citizen of the XDM, then types and (accessed from them) type-names could also be contained in a central XPath function library and distributed to any XPath application that uses (imports) that function library, without even needing to redefine the same type in each of the XPath applications.

@michaelhkay
Contributor Author

This thread has been open for over a year and despite extensive discussion there have been no concrete suggestions for changes to the status quo specification, so I am proposing it for closure with no action.

@michaelhkay michaelhkay added the Propose Closing with No Action The WG should consider closing this issue with no action label Apr 11, 2024
@ndw
Contributor

ndw commented Apr 16, 2024

The CG agreed to close this issue without further action at meeting 073.

@ndw ndw closed this as completed Apr 16, 2024
5 participants