
Problems with catalog resolver since v1.5.4 (or xmlresolver 2.1.1->4+) #345

Closed
nkutsche opened this issue May 23, 2023 · 3 comments

@nkutsche

Hi,

I'm observing different behavior with different combinations of Calabash/xmlresolver versions. I'm not sure which changes were made intentionally and which have to be fixed, so I'm just posting what I see.

I've created a Maven-based GitHub project to reproduce the behavior in CI.

The Module

I've packaged a jar containing three files:

  1. /catalog.xml (at the top level, so that Calabash finds it)
  2. /xpl/module.xpl
  3. /txt/unparsed-text.txt

The catalog contains something like this:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <rewriteURI uriStartString="http://www.data2type.de/" rewritePrefix="./"/>
</catalog>
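For reference, the OASIS XML Catalogs specification defines rewriteURI as a prefix replacement: if uriStartString is a prefix of the requested URI, that prefix is replaced by rewritePrefix, and a relative rewritePrefix such as "./" is taken relative to the catalog's own location (here, inside the jar). The following Java sketch illustrates only that mapping with plain string operations; the jar path and the class and method names are hypothetical, and a real resolver such as xmlresolver also handles longest-match selection among several entries and other entry types.

```java
// Hypothetical sketch of rewriteURI semantics; not xmlresolver's implementation.
class RewriteUriSketch {
    // Applies one rewriteURI entry. The rewrite prefix is assumed to be
    // already absolute (resolved against the catalog's location).
    static String rewrite(String requested, String uriStartString, String absoluteRewritePrefix) {
        if (!requested.startsWith(uriStartString)) {
            return null; // no match; a resolver would try other entries
        }
        return absoluteRewritePrefix + requested.substring(uriStartString.length());
    }

    // rewritePrefix="./" means "the directory that contains the catalog".
    // String-based on purpose: java.net.URI cannot resolve against jar: URIs.
    static String catalogDirectory(String catalogUri) {
        return catalogUri.substring(0, catalogUri.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String catalog = "jar:file:/tmp/module.jar!/catalog.xml"; // hypothetical jar path
        String prefix = catalogDirectory(catalog);
        // prints jar:file:/tmp/module.jar!/xpl/module.xpl
        System.out.println(rewrite("http://www.data2type.de/xpl/module.xpl",
                "http://www.data2type.de/", prefix));
    }
}
```

Note that only URIs starting with exactly http://www.data2type.de/ match; an https: URI would fall through to the web.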

The module.xpl is a library containing one p:declare-step, which just does:

<p:declare-step type="d2t:module">
    <cx:message>
        <p:with-option name="message" select="'static-base-uri: ' || static-base-uri()"/>
        <p:input port="source">
            <p:empty/>
        </p:input>
    </cx:message>
    <cx:message>
        <p:with-option name="message" select="'unparsed-text-available: ' || unparsed-text-available('../txt/unparsed-text.txt')"/>
    </cx:message>
    <p:sink/>
</p:declare-step>
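The relative reference '../txt/unparsed-text.txt' is resolved against the module's static base URI, so what the function actually looks up depends entirely on that base. A minimal Java check, using java.net.URI as a stand-in for the processor's own resolution (the class name is hypothetical), shows that an http: base yields an http: URI for the text file, which only leads back into the jar if the catalog is consulted again:

```java
import java.net.URI;

class RelativeTextHref {
    // Resolves a relative href against a static base URI (RFC-style resolution).
    static URI resolveAgainst(String staticBase, String href) {
        return URI.create(staticBase).resolve(href);
    }

    public static void main(String[] args) {
        // prints http://www.data2type.de/txt/unparsed-text.txt
        System.out.println(resolveAgainst(
                "http://www.data2type.de/xpl/module.xpl", "../txt/unparsed-text.txt"));
    }
}
```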

The unparsed text contains just some character data.

The Main Project

Now I try to import the module into a main.xpl via its catalog URI:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:d2t="http://www.data2type.de"
    xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
    
    <p:import href="http://www.data2type.de/xpl/module.xpl"/>
    
    <d2t:module/>
    
</p:declare-step>

I call Calabash with the module jar file on the classpath, and I get different output with different Calabash or xmlresolver versions:

Calabash 1.5.3 with xmlresolver 4.6.0

Message: static-base-uri: http://www.data2type.de/xpl/module.xpl
Message: unparsed-text-available: true

See GitHub logs

Calabash 1.5.4 with xmlresolver 5.1.2

Message: static-base-uri: http://www.data2type.de/xpl/module.xpl
Message: unparsed-text-available: false

See GitHub logs

Calabash 1.5.3 with xmlresolver 2.1.1

Message: static-base-uri: jar:file:/home/runner/.m2/repository/de/data2type/xml-resolver-issues-module/1.0.0-SNAPSHOT/xml-resolver-issues-module-1.0.0-SNAPSHOT.jar!/xpl/module.xpl
Message: unparsed-text-available: true

See GitHub logs

Calabash 1.5.4 with xmlresolver 2.1.1

Message: static-base-uri: jar:file:/home/runner/.m2/repository/de/data2type/xml-resolver-issues-module/1.0.0-SNAPSHOT/xml-resolver-issues-module-1.0.0-SNAPSHOT.jar!/xpl/module.xpl
Message: unparsed-text-available: true

See GitHub logs

Summary:

  • If I use xmlresolver version 4+ (as recommended), the static base URI of module.xpl is no longer resolved to a jar: URI when the module is packaged in a jar archive. (This has been happening since Calabash v1.4.0.)
  • Since Calabash v1.5.4, resolving unparsed text files through the catalog seems to be broken. Using an older xmlresolver seems to be a workaround.
@nkutsche
Author

Ahh, the same old story of URLs and their lookups into the web...!

Calabash 1.5.3 & xmlresolver 4.6.0 returns Message: unparsed-text-available: true, but the unparsed text that is returned comes from the HTTP request, for example:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.data2type.de/txt/unparsed-text.txt">here</a>.</p>
</body></html>

So it turns out that my first guess was correct: my problems with the catalog resolver were introduced by version 1.4.0 or by xmlresolver 4.2.0. I extended module.xpl and added new CI steps for 1.4.0 and 1.3.2 to make this clear.

The root cause seems to be that the static base URI is no longer resolved when the imported module is inside a jar archive. But maybe the main issue is that the unparsed text resolver ignores the catalog?

@ndw
Owner

ndw commented Aug 26, 2023

What we have here is a nasty confluence of issues.

Part of the problem is that a new resolver option was added: Mask jar URIs. Because that option is true by default, the URI returned from the catalog when we attempt to look up the text file is http://www.data2type.de/txt/unparsed-text.txt. If you set that option to false, then the jar: URI for the text file will be returned and everything works the way it used to.

But why mask them? Well, it turns out that java.net.URI doesn't consider a jar: URI to be hierarchical. Consider the case where you resolve a document to a jar: URI. If that document contains a relative URI reference, then it will be resolved against the base URI, but because java.net.URI doesn't do the right thing with jar: URIs, that will fail. And unless you put the jar URIs in the catalog, they won't get resolved correctly by the catalog.
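The java.net.URI behavior described above is easy to observe. The sketch below uses a hypothetical jar path: a jar: URI is opaque (its scheme-specific part does not start with "/"), and for an opaque base URI.resolve() simply returns the reference unchanged. The resolveInJar helper is a hypothetical illustration of one way around this (resolving the entry path after "!/" separately); it is not what XML Calabash or xmlresolver do.

```java
import java.net.URI;

class JarUriOpacity {
    // Hypothetical workaround: resolve href against the entry path inside the jar.
    static URI resolveInJar(URI jarUri, String href) {
        URI ref = URI.create(href);
        if (ref.isAbsolute()) {
            return ref; // absolute references need no base
        }
        String s = jarUri.toString();
        int sep = s.indexOf("!/");
        if (sep < 0) {
            throw new IllegalArgumentException("not a jar entry URI: " + s);
        }
        URI entry = URI.create(s.substring(sep + 1)); // e.g. /xpl/module.xpl
        return URI.create(s.substring(0, sep + 1) + entry.resolve(ref));
    }

    public static void main(String[] args) {
        URI base = URI.create("jar:file:/tmp/module.jar!/xpl/module.xpl");
        System.out.println(base.isOpaque()); // prints true
        // Opaque base: the relative reference comes back unresolved.
        System.out.println(base.resolve("../txt/unparsed-text.txt")); // prints ../txt/unparsed-text.txt
        // prints jar:file:/tmp/module.jar!/txt/unparsed-text.txt
        System.out.println(resolveInJar(base, "../txt/unparsed-text.txt"));
    }
}
```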

So the idea behind masking the jar URIs is a compromise. You get back the original URI but the content from the local resource.

Unfortunately, XML Calabash wasn't using the XML Resolver for unparsed text URIs. That meant you got back the wrong thing even though, in this case, the catalog would resolve things correctly.
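To make the two paths concrete, here is a hedged Java sketch for the text file's absolute URI. With the resolver consulted, the masked http: URI is mapped to the local jar entry; without it, the http: URI is read as-is, which is how the 301 page quoted earlier in this thread ended up as the "unparsed text". The Map standing in for the catalog, the jar path, and the method name are hypothetical; this is not Calabash's code.

```java
import java.util.Map;

class UnparsedTextLookup {
    // Stand-in for the catalog (hypothetical jar path); a real resolver
    // would derive this mapping from the rewriteURI entry.
    static final Map<String, String> CATALOG = Map.of(
            "http://www.data2type.de/txt/unparsed-text.txt",
            "jar:file:/tmp/module.jar!/txt/unparsed-text.txt");

    // Returns the location that would actually be read for an absolute href.
    static String locationToRead(String absoluteHref, boolean consultResolver) {
        if (consultResolver) {
            return CATALOG.getOrDefault(absoluteHref, absoluteHref);
        }
        return absoluteHref; // goes straight to the network
    }

    public static void main(String[] args) {
        String href = "http://www.data2type.de/txt/unparsed-text.txt";
        System.out.println(locationToRead(href, false)); // the web URI
        System.out.println(locationToRead(href, true));  // the local jar entry
    }
}
```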

I think I've fixed this issue for Saxon 10, 11, and 12.

@ndw
Owner

ndw commented Aug 26, 2023

Please reopen this issue if it isn't resolved in 1.5.7.

ndw closed this as completed Aug 26, 2023