Calabash runs out of memory in combination with streaming in Saxon #60

Closed
j-maly opened this Issue Oct 11, 2012 · 4 comments

3 participants

j-maly commented Oct 11, 2012

As part of my pipeline, I call an XSLT stylesheet, split.xsl, that uses streaming.
split.xsl consumes one large XML file and splits it into many smaller ones via xsl:result-document.
In fact, split.xsl is called repeatedly, once for each of several large XML files. Something like:

foreach large file f
   split file f into smaller files
   save each smaller file

This process quickly eats up all free physical memory.

This is what the saving part looks like:

<p:xslt name="xslt-split" template-name="main" initial-mode="s">
    <p:input port="source">
        <p:empty/>
    </p:input>
    <p:input port="stylesheet">
        <p:document href="split.xsl"/>
    </p:input>
    <p:input port="parameters">
        <p:pipe step="prepare-params-for-xslt-split" port="result"/>
    </p:input>
</p:xslt>

<p:store name="store-xslt-result">
    <p:with-option name="href" select="concat($split-folder,'/result-', $bare-filename)"/>
</p:store>

<p:for-each name="store-xslt-secondary-results">
    <p:iteration-source>
        <p:pipe step="xslt-split" port="secondary"/>
    </p:iteration-source>
    <p:store>
        <p:with-option name="href" select="p:base-uri()"/>
    </p:store>
</p:for-each>
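For context, here is a rough sketch of how the fragment above might sit inside the outer loop from the pseudocode, written as one complete XProc 1.0 pipeline. The options `input-folder` and `split-folder`, the `include-filter` value, and the `input-uri` stylesheet parameter (standing in for the unshown `prepare-params-for-xslt-split` step) are all hypothetical; split.xsl would need a matching `xsl:param`. This only restructures the loop; it does not change how XML Calabash 1.x handles the documents on the `secondary` port, so it should not be expected to fix the memory growth by itself.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0" name="split-all">
  <!-- Hypothetical options: where the large files live, where the pieces go -->
  <p:option name="input-folder" required="true"/>
  <p:option name="split-folder" required="true"/>

  <!-- One c:file entry per large input file (filter is an assumption) -->
  <p:directory-list name="list-large-files" include-filter=".*\.xml">
    <p:with-option name="path" select="$input-folder"/>
  </p:directory-list>

  <!-- foreach large file f -->
  <p:for-each name="split-each-file">
    <p:iteration-source select="/c:directory/c:file"/>
    <p:variable name="bare-filename" select="/c:file/@name"/>

    <!-- split file f into smaller files; split.xsl reads f itself, so source is empty -->
    <p:xslt name="xslt-split" template-name="main" initial-mode="s">
      <p:input port="source"><p:empty/></p:input>
      <p:input port="stylesheet"><p:document href="split.xsl"/></p:input>
      <p:input port="parameters"><p:empty/></p:input>
      <!-- Hypothetical parameter telling split.xsl which file to stream -->
      <p:with-param name="input-uri"
                    select="concat($input-folder, '/', $bare-filename)"/>
    </p:xslt>

    <!-- Store the primary result -->
    <p:store name="store-xslt-result">
      <p:with-option name="href"
                     select="concat($split-folder, '/result-', $bare-filename)"/>
    </p:store>

    <!-- save each smaller file (the xsl:result-document outputs) -->
    <p:for-each name="store-xslt-secondary-results">
      <p:iteration-source>
        <p:pipe step="xslt-split" port="secondary"/>
      </p:iteration-source>
      <p:store>
        <p:with-option name="href" select="p:base-uri()"/>
      </p:store>
    </p:for-each>
  </p:for-each>
</p:declare-step>
```

The explicit `<p:empty/>` binding on the `parameters` port is there because a top-level `p:declare-step` without its own parameter port cannot leave that port of `p:xslt` unbound.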

I am not sure what causes the problem, but I don't think it is Saxon: the streaming really works, and the large document is never loaded into memory all at once.

Is it because Calabash keeps all the documents coming out of the secondary port in memory? Do you have any suggestions on how to make this process work? I can provide a complete example if needed.

Contributor

innovimax commented Nov 8, 2012

Jakub,

You should probably have a look at QuiXProc, which is based on Calabash: http://code.google.com/p/quixproc/

Mohamed

j-maly commented Nov 8, 2012

Hi Mohamed,
I think I will, but can you tell me in advance how QuiXProc would behave in this scenario?

Contributor

innovimax commented Nov 18, 2012

It should be able to handle it.

You will probably need to rewrite the pipeline to avoid some non-streamable constructs.

Let's continue this discussion on xproc-dev?

Owner

ndw commented Jan 28, 2013

I hope to support streaming in XML Calabash V2; there's nothing I can do about this in 1.x.

ndw closed this Jan 28, 2013

ndw added a commit that referenced this issue Jan 29, 2013
