Is it possible to get the parse results lazily? #18

Closed
airblade opened this Issue Jun 1, 2011 · 3 comments


airblade commented Jun 1, 2011

I'd like to use sax-machine to parse pretty big XML documents. Its SAX nature is appealing, but it seems to build all the output in memory at once instead of "streaming" the results out. For example:

class Document
  include SAXMachine
  # etc
end

# other classes etc

records = Document.parse File.read(large_xml_file)
records.each do |record|
  # etc
end

At the moment, if I understand sax-machine correctly, the parsing step parses the entire document there and then. From a memory point of view, this seems to negate the benefit of using a SAX parser.

Instead I would like to keep memory down by parsing one record at a time in the enumeration. Is this possible?
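To make the streaming idea concrete, here's roughly what I mean, hand-rolled with plain Nokogiri SAX rather than sax-machine (the record element name and large_xml_file are just placeholders):

require 'nokogiri'

# Hand-rolled SAX handler that hands each <record>'s text to a block as soon
# as the closing tag is seen, instead of accumulating every record in memory.
class RecordStreamer < Nokogiri::XML::SAX::Document
  def initialize(&block)
    @block  = block
    @buffer = nil
  end

  def start_element(name, attrs = [])
    @buffer = "" if name == "record"
  end

  def characters(string)
    @buffer << string if @buffer
  end

  def end_element(name)
    return unless name == "record" && @buffer
    @block.call(@buffer)  # process one record...
    @buffer = nil         # ...then drop it so memory stays flat
  end
end

handler = RecordStreamer.new { |record| puts record }
Nokogiri::XML::SAX::Parser.new(handler).parse(File.open(large_xml_file))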

archiloque (Contributor) commented Jul 4, 2011

You could probably subclass SAXHandler and override #end_element, replacing the part where it sets the element on the parent with a callback call that receives the element as a parameter. What do you think?

airblade commented Jul 5, 2011

Hmm, wouldn't that still lead to large memory consumption?

Anyway, shortly after opening this issue, I found an article by Greg Weber on this exact topic. He solved the problem with fibers and the code is now in ezkl's fork.
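For reference, the gist of the fiber approach as I understand it is to wrap the callback-style parse in an Enumerator (Ruby backs external iteration with a Fiber) so records can be pulled one at a time. A minimal sketch, reusing the hypothetical RecordStreamer from the example above:

def each_record(path)
  Enumerator.new do |yielder|
    handler = RecordStreamer.new { |record| yielder << record }
    Nokogiri::XML::SAX::Parser.new(handler).parse(File.open(path))
  end
end

each_record(large_xml_file).each do |record|
  # only the current record is held in memory
end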

@airblade airblade closed this Jul 5, 2011

archiloque (Contributor) commented Jul 5, 2011

I think it would achieve the same thing as what Greg did, except I thought about using callbacks instead of simply relying on enumerators. I'll send the article to Paul to see if he is interested. Thanks for the link!
