Wrong base-uri(/*) returned with Saxon 9.7 and 9.8 under certain conditions #281

Open
gimsieke opened this Issue Oct 1, 2018 · 3 comments

Contributor

gimsieke commented Oct 1, 2018

Many of our own and our customers’ pipelines could be migrated from Calabash 1.1.15 with Saxon 9.6 to Calabash 1.1.21 with Saxon 9.8, but I noticed a regression in one specific project. After hours of debugging, I managed to reproduce it with a minimal example.

The source in this example, Untitled2.xml, is

<?xml version="1.0" encoding="UTF-8"?>
<doc xml:base="file:/foo/bar.xml">
  <foo/>
</doc>

The pipeline, Untitled4.xpl, is

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
  xmlns:cx="http://xmlcalabash.com/ns/extensions"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0" name="mystep">

  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>
  
  <cx:message>
    <p:with-option name="message"
      select="'before:   base-uri(): ',   base-uri(),
                     ',  /*/@xml:base: ', /*/@xml:base,
                     ',  base-uri(/*): ', base-uri(/*)"/> 
  </cx:message>
  
  <p:xslt name="xslt">
    <p:input port="parameters">
      <p:empty/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="Untitled3.xsl"/>
    </p:input>
  </p:xslt>
  
  <cx:message>
    <p:with-option name="message"
      select="' after:   base-uri(): ',   base-uri(),
                     ',  /*/@xml:base: ', /*/@xml:base,
                     ',  base-uri(/*): ', base-uri(/*)"/> 
  </cx:message>
  
</p:declare-step>

The XSLT, Untitled3.xsl, is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math"
  exclude-result-prefixes="xs math"
  version="3.0">
  
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="foo">
    <xsl:result-document href="f">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>
  
  <xsl:template match="@xml:base"/>
  
</xsl:stylesheet>

What happens during the transformation is that /*/@xml:base is removed, and /doc/foo is sent to the secondary port by an xsl:result-document instruction.

Invoking it with Calabash 1.1.22 with Saxon 9.8 or Calabash 1.1.19 with Saxon 9.7 like this:

java -jar xmlcalabash-1.1.22-98.jar -i source=Untitled2.xml Untitled4.xpl

gives the same incorrect results:

Message: before:   
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: file:/foo/bar.xml,
  base-uri(/*): file:/foo/bar.xml
Message:  after: 
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: ,
  base-uri(/*): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled3.xsl
<doc>

</doc>

This is incorrect: the result no longer has an /*/@xml:base attribute, so base-uri(/*) should be the same as base-uri(). Instead, base-uri(/*) is now the URI of the XSLT file. (It is not necessarily the URI of the XSLT file that contains the xsl:result-document instruction. In this example it is, because there is only a single XSLT file.)

The correct output, produced with the Saxon-9.6 versions of XML Calabash 1.1.15 or 1.1.19, is:

Message: before:   
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: file:/foo/bar.xml,
  base-uri(/*): file:/foo/bar.xml
Message:  after:  
  base-uri(): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml,
  /*/@xml:base: ,
  base-uri(/*): file:/C:/cygwin/home/gerrit/…/bugreport_gerrit_2018-10-01/Untitled2.xml
<doc>

</doc>

It doesn’t matter that the attached XSLT is version 3.0; the same error occurs with 2.0.
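
To make the expectation behind this report explicit, here is a minimal Java sketch of the XML Base rule as I understand it: an element's base URI is its own xml:base resolved against the inherited base, and without xml:base it simply inherits from its parent or the document. This is not Calabash or Saxon code. The class name, the helper names, and the document URI file:/tmp/Untitled2.xml are assumptions made for illustration only; it uses nothing beyond the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ExpectedBaseUri {

    // Hypothetical helper: the base URI an element is expected to have,
    // given the base URI of the document that contains it.
    static URI expectedBaseUri(Element element, URI documentBase) {
        Node parent = element.getParentNode();
        // Inherit from the parent element, or from the document at the root.
        URI inherited = (parent instanceof Element)
                ? expectedBaseUri((Element) parent, documentBase)
                : documentBase;
        if (element.hasAttributeNS(XMLConstants.XML_NS_URI, "base")) {
            // xml:base present: resolve it against the inherited base.
            String xmlBase = element.getAttributeNS(XMLConstants.XML_NS_URI, "base");
            return inherited.resolve(xmlBase);
        }
        // No xml:base: the element has the inherited base URI.
        return inherited;
    }

    // Hypothetical helper: parse a string namespace-aware and return the root.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to see xml:base by namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder document URI, standing in for the real path of Untitled2.xml.
        URI documentBase = URI.create("file:/tmp/Untitled2.xml");

        Element before = parseRoot("<doc xml:base=\"file:/foo/bar.xml\"><foo/></doc>");
        Element after = parseRoot("<doc/>"); // xml:base stripped, foo moved elsewhere

        System.out.println("before: " + expectedBaseUri(before, documentBase));
        System.out.println(" after: " + expectedBaseUri(after, documentBase));
    }
}
```

With the placeholder URI, the "before" line shows file:/foo/bar.xml and the "after" line shows the document URI, which mirrors the Saxon 9.6 output above. Nothing in this rule would let the stylesheet URI appear as the base URI of the result's root element.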

ndw Oct 2, 2018

Owner

There’s a lot of complex behavior going on here (thank you 1.0e6 for the small, focused test case). The relevant bit of code is in XSLT.java:

// Before Saxon 9.8, it was possible to simply set the base uri of the
// output document. That became impossible in Saxon 9.8, but I still
// think there might be XProc pipelines that rely on the fact that the
// base URI doesn't change when processed by XSLT. So we're doing it
// the hard way.
TreeWriter fixbase = new TreeWriter(runtime);
fixbase.startDocument(document.getBaseURI());
fixbase.addSubtree(xformed);
fixbase.endDocument();
xformed = fixbase.getResult();

For some reason, that doesn’t work for your stylesheet. Deep in the guts of the TinyTree implementation, there’s a systemIdMap with two entries in it, Untitled2.xml and Untitled3.xsl, and the second one is used.

While I was still misunderstanding the issue, I discovered that you can “fix” this bug by adding an explicit template for the document node to your stylesheet:

<xsl:template match="/">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

With that explicit copy, the systemIdMap has only a single value, Untitled2.xml.

Is that enough of a workaround for you?

(I’ll pass this along to Saxonica, but I have no idea if it’s a bug or not.)

raducoravu Oct 3, 2018

@ndw Maybe this issue is connected to this one: #255

gimsieke added a commit to transpect/xproc-util that referenced this issue Oct 18, 2018
