Calabash runs out of memory in combination with streaming in saxon #60

Closed
j-maly opened this issue Oct 11, 2012 · 4 comments
Comments

@j-maly j-maly commented Oct 11, 2012

As part of my pipeline, I call an XSLT stylesheet, split.xsl, which uses streaming.
split.xsl consumes one large XML file and splits it into many smaller ones through xsl:result-document.
split.xsl is in fact called repeatedly, once for each of a number of those large XML files. Something like:

foreach large file f
   split file f into smaller files
   save each smaller file

This process quickly eats up all free physical memory.
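
For readers following along, a streaming split stylesheet of the kind described above might look roughly like the sketch below. This is not the actual split.xsl, which is not shown in this issue. It is written against the final XSLT 3.0 syntax (xsl:source-document and a streamable xsl:mode), which postdates this report. The parameter names input-uri and output-folder, the part-N.xml naming, and the assumption that each child element of the root becomes one output file are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch only; NOT the split.xsl from this issue. -->
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumed parameters: URI of the large input file and the target folder. -->
    <xsl:param name="input-uri" required="yes"/>
    <xsl:param name="output-folder" required="yes"/>

    <!-- Mode "s" is declared streamable; nodes without a matching
         template are skipped and their children are processed. -->
    <xsl:mode name="s" streamable="yes" on-no-match="shallow-skip"/>

    <!-- Entry point, invoked by name (template-name="main"), so no
         source document has to be supplied by the caller. -->
    <xsl:template name="main">
        <xsl:source-document href="{$input-uri}" streamable="yes">
            <xsl:apply-templates mode="s"/>
        </xsl:source-document>
    </xsl:template>

    <!-- Each child element of the root is copied into its own result
         document. position() gives a distinct number per element; the
         numbers need not be consecutive if whitespace text nodes sit
         between the elements. -->
    <xsl:template match="/*/*" mode="s">
        <xsl:result-document href="{$output-folder}/part-{position()}.xml">
            <xsl:copy-of select="."/>
        </xsl:result-document>
    </xsl:template>

</xsl:stylesheet>
```

With a stylesheet like this, only one child element at a time needs to be held while it is copied. When it is run through p:xslt, the xsl:result-document outputs arrive on the secondary port, which is what the pipeline fragment further down iterates over and stores.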

This is what the saving part looks like:

<p:xslt name="xslt-split" template-name="main" initial-mode="s">
    <p:input port="source">
        <p:empty/>
    </p:input>
    <p:input port="stylesheet">
        <p:document href="split.xsl"/>
    </p:input>
    <p:input port="parameters">
        <p:pipe step="prepare-params-for-xslt-split" port="result"/>
    </p:input>
</p:xslt>

<p:store name="store-xslt-result">
    <p:with-option name="href" select="concat($split-folder,'/result-', $bare-filename)"/>
</p:store>

<p:for-each name="store-xslt-secondary-results">
    <p:iteration-source>
        <p:pipe step="xslt-split" port="secondary"/>
    </p:iteration-source>
    <p:store>
        <p:with-option name="href" select="p:base-uri()"/>
    </p:store>
</p:for-each>

I am not sure what causes the problem, but I do not think it is Saxon: the streaming really works, and the big document is never loaded into memory in its entirety.

Is it because Calabash still holds the documents coming out of the secondary port in memory? Do you have any suggestions on how to make this process work? I can provide a complete example if need be.

Contributor

@innovimax innovimax commented Nov 8, 2012

Jakub,

You should probably have a look at QuiXProc, which is based on Calabash: http://code.google.com/p/quixproc/

Mohamed

Author

@j-maly j-maly commented Nov 8, 2012

Hi Mohamed,
I think I will, but can you tell me in advance how QuiXProc would behave in this scenario?

Contributor

@innovimax innovimax commented Nov 18, 2012

It should be able to handle it.

You will probably need to rewrite the pipeline to avoid some non-streamable constructs.

Let's continue this discussion on xproc-dev?

Owner

@ndw ndw commented Jan 28, 2013

I hope to support streaming in XML Calabash V2; there's nothing I can do about this in 1.x.

@ndw ndw closed this Jan 28, 2013
ndw added a commit that referenced this issue Jan 29, 2013