Major optimization in parser code #307

bertfrees · 2020-05-27T11:27:36Z

This is a long-standing issue that I finally had to tackle: because of our increasingly complex pipelines, it had become impossible to run our code without throwing a huge amount of memory at it.

The problem was that when parsing an XProc step declaration, every single import resulted in parsing that file plus all of its imports, so the same files could end up being parsed many times over. When you work, as we do, with libraries that collect a lot of steps that in turn depend on other libraries, you are soon dealing with tens of thousands of files being parsed. This causes huge memory consumption before execution of the pipeline has even started.
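To make the blow-up concrete, here is a toy model (not XML Calabash code). It assumes a hypothetical layered import graph: `main.xpl` imports 3 libraries, and each library imports all 3 libraries of the next layer, 4 layers deep. The names `buildGraph`, `countParses` and `countFiles` are illustrative only. Resolving imports naively, re-parsing a file every time it is imported, shows how the parse count outgrows the number of distinct files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveImportDemo {
    // Hypothetical layered import graph: main.xpl imports every library in layer 1,
    // and each library in layer k imports every library in layer k + 1.
    static Map<String, List<String>> buildGraph(int layers, int width) {
        Map<String, List<String>> graph = new HashMap<>();
        List<String> previous = List.of("main.xpl");
        for (int layer = 1; layer <= layers; layer++) {
            List<String> current = new ArrayList<>();
            for (int i = 0; i < width; i++) {
                current.add("lib-" + layer + "-" + i + ".xpl");
            }
            for (String uri : previous) {
                graph.put(uri, current);
            }
            previous = current;
        }
        return graph;
    }

    // Naive resolution: every import triggers a fresh parse of that file and,
    // recursively, of all of its imports. Assumes the graph is acyclic
    // (with a circular import this recursion would never terminate).
    static int countParses(String uri, Map<String, List<String>> graph) {
        int parses = 1; // this file is parsed again on every visit
        for (String imported : graph.getOrDefault(uri, List.of())) {
            parses += countParses(imported, graph);
        }
        return parses;
    }

    // Number of distinct files that appear anywhere in the graph.
    static int countFiles(Map<String, List<String>> graph) {
        Set<String> files = new HashSet<>(graph.keySet());
        graph.values().forEach(files::addAll);
        return files.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = buildGraph(4, 3);
        System.out.println(countFiles(graph) + " distinct files, "
                + countParses("main.xpl", graph) + " parses"); // 13 distinct files, 121 parses
    }
}
```

With only 13 distinct files the naive walk already performs 121 parses, and the count grows roughly as width to the power of depth, which is how a real tree of step libraries can reach tens of thousands of parses.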

Actually, I'm a bit surprised that no one else has noticed the issue before. Nevertheless, I think my fix will be interesting for others too.

In the end the solution that I came up with is pretty simple: I cache the PipelineLibrary objects. But in order to be able to do this I had to move some things around. I hope you like it.
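A minimal sketch of the caching idea, assuming each imported library can be identified by its URI string: a map from URI to the already-built library object is consulted before anything is parsed. `Library`, `load` and the `graph` map are hypothetical stand-ins, not the actual `PipelineLibrary` API or the code in this pull request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CachedImportDemo {
    // Hypothetical stand-in for a parsed library (not XML Calabash's PipelineLibrary).
    static final class Library {
        final String uri;
        final List<Library> imports = new ArrayList<>();

        Library(String uri) {
            this.uri = uri;
        }
    }

    private final Map<String, List<String>> graph; // URI -> URIs it imports
    private final Map<String, Library> cache = new HashMap<>(); // URI -> already-parsed library
    private int parseCount = 0;

    CachedImportDemo(Map<String, List<String>> graph) {
        this.graph = graph;
    }

    Library load(String uri) {
        Library cached = cache.get(uri);
        if (cached != null) {
            return cached; // parsed before: reuse the same object instead of re-parsing
        }
        parseCount++; // stands in for actually reading and building the document
        Library library = new Library(uri);
        cache.put(uri, library); // register before resolving imports so circular imports terminate
        for (String imported : graph.getOrDefault(uri, List.of())) {
            library.imports.add(load(imported));
        }
        return library;
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        // Hypothetical layered graph: main.xpl imports 3 libraries, each of which
        // imports the same 3 libraries of the next layer, 4 layers deep.
        Map<String, List<String>> graph = new HashMap<>();
        List<String> previous = List.of("main.xpl");
        for (int layer = 1; layer <= 4; layer++) {
            List<String> current = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                current.add("lib-" + layer + "-" + i + ".xpl");
            }
            for (String uri : previous) {
                graph.put(uri, current);
            }
            previous = current;
        }
        CachedImportDemo demo = new CachedImportDemo(graph);
        demo.load("main.xpl");
        System.out.println(demo.parseCount() + " parses"); // 13 parses
    }
}
```

Each of the 13 files is parsed exactly once, and libraries imported from several places share one object. Registering the object before resolving its imports is one way to keep circular imports from recursing forever. A plain `HashMap` is not thread-safe, so a parser used concurrently would need synchronization; note also that a recursive `computeIfAbsent` on a `HashMap` is not a safe substitute for the explicit get/put shown here.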


ndw · 2020-10-11T17:09:58Z

Applied to version 1.2.5 for Saxon 9.9 and Saxon 10.x

bertfrees · 2020-10-11T17:17:28Z

Thanks!

bertfrees added 2 commits May 27, 2020 13:00

Cleanup: remove some unused code

b3f7a5b

Don't parse same XProc file twice

281f0dc

by caching PipelineLibrary objects.

bertfrees added a commit to daisy/xmlcalabash1 that referenced this pull request Aug 19, 2020

Major optimization in parser code

e675a4f

see ndw#307

ndw closed this Oct 11, 2020

bertfrees deleted the parser-optimization branch March 4, 2021 17:31
