Simplifying the language - types have behaviour. #899

Closed
MarkNicholls opened this issue Dec 13, 2023 · 9 comments
Labels
Abandoned: PR was rejected, withdrawn, or superseded
Enhancement: A change or improvement to an existing feature
Propose Closing with No Action: The WG should consider closing this issue with no action
XSLT: An issue related to XSLT


@MarkNicholls

MarkNicholls commented Dec 13, 2023

I may be misunderstanding something, but I always find the use of types and "as" counterintuitive (I'd prefer to be able to run an XSLT 3+ script in some sort of 'strict' mode that was a bit more rigid, and thus simpler), e.g.

consider

      <xsl:variable name="foo1">
         <foo/>
      </xsl:variable>

question - what is the type of foo1?
answer - (according to my saxon/oxygen setup the answer is) "document-node"

consider

      <xsl:variable name="foo2" as="element(foo)">
         <foo/>
      </xsl:variable>

question - is this code valid, then? (As someone not used to XSLT 2+, I would assume not.)
answer - yes

but surely this code is identical to foo1, so the 'type' of variable is actually changing the interpretation of the expression.

For me that's quite confusing

It would appear that these 2 values are not two different views (interfaces) of the same underlying value, else this

<xsl:variable name="foo3" as="element(foo)" select="$foo1"/>

would be valid.

It isn't (i.e. this doesn't appear to be some subtle OO-style scenario where an evaluation can have multiple interfaces; here 'document-node' and 'element' are presumably disjoint types).
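
A minimal sketch of the three cases above, assuming an XSLT 3.0 processor started at the `xsl:initial-template` entry point. The `instance of` checks and the `$foo1/foo` path step are additions for illustration and are not part of the original report; the path step is one way to obtain the element from the implicitly created document node.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- No "as": the content is wrapped in an implicit document node. -->
    <xsl:variable name="foo1">
      <foo/>
    </xsl:variable>

    <!-- With "as": no document node is built; the value is the element itself. -->
    <xsl:variable name="foo2" as="element(foo)">
      <foo/>
    </xsl:variable>

    <!-- Illustrative workaround: select the foo child of the document node. -->
    <xsl:variable name="foo3" as="element(foo)" select="$foo1/foo"/>

    <!-- Each of the three checks is expected to evaluate to true. -->
    <xsl:value-of select="$foo1 instance of document-node(),
                          $foo2 instance of element(foo),
                          $foo3 instance of element(foo)"/>
  </xsl:template>
</xsl:stylesheet>
```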

For me, conceptually, types are descriptions of expressions; they have no behaviour, yet here they appear (to me) to affect the interpretation, not simply describe it.

For me, I'd prefer a 'strict' mode where either

      <xsl:variable name="foo2" as="element(foo)">
         <foo/>
      </xsl:variable>

is a type error, because the expression is clearly a document-node OR some other mechanism to clarify the ambiguity without this conceptual wrinkle.

An expression should either have one interpretation, or, if it's ambiguous, that should be an error. I don't think the language should default to preferring one over another.

P.S.
Why doesn't this work? I genuinely don't know how to explicitly declare something as a document-node.

  <xsl:variable name="foo1" as="document-node()">
     <foo/>
  </xsl:variable>

ChristianGruen added the XSLT and Enhancement labels on Dec 13, 2023

@michaelhkay
Contributor

The use of the "as" attribute on xsl:variable to inhibit construction of a document node (or result tree fragment) was a difficult and not very attractive solution to a backwards compatibility problem with XSLT 1.0. Indeed, the presence of the attribute changes the semantics of the expression, which isn't nice, but it's the kind of thing you sometimes have to do for compatibility reasons. This was hotly debated at the time (I still think of it as "Issue 99"), and I think it's unlikely that we can now come up with any improvements that preserve compatibility.

To construct a document node, you can use

<xsl:variable name="foo1" as="document-node()">
    <xsl:document>
       <foo/>
    </xsl:document>
</xsl:variable>

(which will in fact construct a document node whether or not the "as" attribute is present).
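
As a minimal sketch of that last point, assuming an XSLT 3.0 processor started at `xsl:initial-template`, the following compares the two forms. The variable names `with-as` and `without-as` are hypothetical and chosen only for this illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Explicit document node, with the type declared. -->
    <xsl:variable name="with-as" as="document-node()">
      <xsl:document>
        <foo/>
      </xsl:document>
    </xsl:variable>

    <!-- Same content without "as": the value is still a document node containing foo. -->
    <xsl:variable name="without-as">
      <xsl:document>
        <foo/>
      </xsl:document>
    </xsl:variable>

    <!-- Each of the three checks is expected to evaluate to true. -->
    <xsl:value-of select="$with-as instance of document-node(),
                          $without-as instance of document-node(),
                          deep-equal($with-as, $without-as)"/>
  </xsl:template>
</xsl:stylesheet>
```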

@MarkNicholls
Author

MarkNicholls commented Dec 13, 2023

Backwards compatibility was my suspicion, and in fact it's something I've been very grateful for.

This was why I wondered whether it would be sensible to have a 'strict' mode, where these wrinkles were ironed out.
'Non-strict' mode would effectively be a preprocessing step that rewrites a stylesheet into the canonical 'strict' format before it is run (easy for me to say, as I don't have to implement it, but conceptually that seems attractive). The canonical format could then be simplified and enhanced, with the legacy behaviour handled as an optional add-on outside the core definitions.

The language would then have fewer wrinkles and fewer unexpected semantics.

Visual Basic used to have 'strict' and 'non-strict' modes that I think did something similar (though for different reasons), and Haskell has all sorts of pragma options to extend and modify the interpretation of the language (though again for different reasons). I expect there are multiple other examples.

P.S.

I haven't ever considered putting xsl:document inside an element, but it all becomes clear, thanks.

@michaelhkay
Contributor

michaelhkay commented Dec 14, 2023

We could certainly attempt to introduce a strict mode, in which the types of all variables and parameters must be declared. I fear that it would be unpopular, requiring unwanted verbosity such as <xsl:variable name="one" select="1" as="xs:integer"/>, and explicit addition of the <xsl:document> instruction in cases where wrapping temporary trees in a document node is wanted behaviour.

Another bit of tidying up we could attempt would be to add a coercion rule such that if the required type is document-node(), the supplied value is wrapped in a document node. This would mean that omitting the "as" attribute on an <xsl:variable> with no select attribute would be equivalent to specifying as="document-node()", and constructs like

<xsl:variable name="doc" as="document-node()">
  <e/>
</xsl:variable>

would become legal rather than throwing a type error. But we would need to check for any adverse effect on XQuery.

It's not clear that any changes in this area would really deliver any benefit. It's very hard in these cases to make the language less complex; fixes for problems like this have a nasty tendency to make the rules more complex rather than less complex.

@MarkNicholls
Author

MarkNicholls commented Dec 14, 2023

Visual Basic strict mode did just that, but whilst I can see that it may have some value as a canonical format, it's not really the sweet spot for me; as you say, I fear that would be unpopular.

The wrinkle for me is not the lack of explicit typing, but the same expression having different behaviour if it is "assigned" a different type, and the automatic defaulting of this type when this scenario arises.

I'm perfectly happy with <xsl:variable name="one" select="1"/> and I wouldn't want to see that as an error. I think all valid type assignments to this expression would yield the same interpretation?

I tend to approach XSLT in 2 modes

a) I have a legacy XSLT 1.0 stylesheet that needs a change, that change warrants using XSLT 3.0 but I don't want the massive task of a significant upgrade, so strict = false.
b) I have a green field XSLT, which I choose to do in XSLT 3.0 (really by default) but get tripped up by type rules.

I would never actively want a completely strict XSLT where everything has to be declared.

My suspicion, though, is that this is a bigger ask. I suspect my element/document issue may be the tip of an iceberg?

I notice I used "and" above in my description of the gripe, so various options would be

  1. everything needs a declaration,
  2. expressions that could have multiple types that impact behaviour are an error (and thus require an explicit type)
  3. those expressions that currently fit into these fuzzy zones are 'cleaned up', their interpretation is formalised and uniquely defined (if valid).

Option 1 is easy, I presume, but painful and probably unpopular.
Option 2 is harder: something has to detect the ambiguous scenarios and raise the error, but I think I would use this mode if I were writing green field code today.
Option 3 is where I genuinely think you should aim, maybe as an aspiration. Backwards compatibility is a very valuable thing, but I personally think there are too many wrinkles, and these wrinkles are a significant obstacle to learning the language and, for an intermediate developer, to using it (I did waste 30 mins today wrestling with just this issue and ended up undoing some changes and starting again because I couldn't work out what types my expressions were and why).

@michaelhkay
Contributor

Unless we can translate this issue into something that is a specific proposal to change the specifications in a particular way, I think we should probably close it.

ChristianGruen added the Propose Closing with No Action label on Jan 3, 2024

@MarkNicholls
Author

My specific gripe (if you like) is the element/document ambiguity. It's quite confusing even to a relatively middling XSLT developer (me): you effectively have to force the XSLT engine to 'cast' the data into the intended type using what looks like a type declaration, and that in itself feels very unnatural.

So my understanding is literal elements inside an xsl:variable construct are implied to be document-nodes (even now I'm not sure), and ideally they would always be elements, unless there was some sort of explicit document-node element construct.

I think this would break backwards compatibility with XSLT 1.0? So the only way to do it would be to have some sort of 'strict' mode.

Whether that is worthwhile is debatable. I would turn it on, but that's not an overwhelming argument.
If there are other wrinkles in the language that exist for backwards compatibility, but in 'normal' usage are better turned off, then it would seem to be a reasonable way to separate a clean utopian spec from the ugly reality of having to cope with legacy.

My view is that of someone with relatively little exposure to the language (compared to everyone else here): it IS quite a difficult language to master.

@michaelhkay
Contributor

> My specific gripe (if you like) is the element/document ambiguity,

Indeed, this causes users a lot of problems. See for example https://stackoverflow.com/questions/53023985/whats-the-rationale-behind-result-tree-fragments and many other SO issues if you search for "XSLT result tree fragment".

We all know this is a problem, the question is whether anyone is able to come up with a better solution than the one we have now.

David Wheeler is credited with the phrase "backwards compatibility means deliberately repeating other people's mistakes".

It would be lovely to get rid of the implicit document node creation in xsl:variable, but it is used so widely that a switch to disable it would be very rarely used. Equally, a switch to require all variables to have an "as" attribute would be widely ignored. And in particular, such facilities would be ignored by all those people who fall into the trap because they are unaware that the trap exists. So I don't hold out much hope.

@MarkNicholls
Author

Fair enough.

The very least, which really has nothing to do with the spec, would be for this to be flagged as a warning.
If I could configure my Saxon engine to make this warning terminal, then I would.

@ndw
Contributor

ndw commented Jan 15, 2024

As the discussion shows, this is a complex problem. No solution has materialized that appears, on balance, to be an improvement over the status quo. At meeting 060, the group decided to close this issue without further action.

ndw closed this as completed on Jan 15, 2024
michaelhkay added the Abandoned label on Apr 13, 2024