Parse the output of thin_dump line by line #19

AnchorCat · 2014-06-02T06:04:44Z

For large volumes, the output of thin_dump can be tens of megabytes in
size, resulting in enormous memory usage when attempting to parse the
entire document as XML at once. This solution is less elegant, but it
gets the job done much more efficiently by regex matching one line at a
time.
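
As a rough illustration of the line-by-line approach (a minimal sketch, not the code in this pull request), the following reads the dump one line at a time and never builds a document tree. The element and attribute names (`device`, `dev_id`, `single_mapping`, `range_mapping`, and so on) are assumed from typical thin_dump XML output, and `parse_thin_dump_lines` and `SAMPLE_DUMP` are hypothetical names used only for this example.

```ruby
require "stringio"

# Matches name="value" attribute pairs on a single line.
ATTR_RE = /(\w+)="([^"]*)"/

# Hypothetical helper: returns { dev_id => [[origin, data, length], ...] }.
# Assumes each element sits on its own line, as in thin_dump's XML output.
def parse_thin_dump_lines(io)
  devices = Hash.new { |hash, key| hash[key] = [] }
  current = nil

  io.each_line do |line|
    case line
    when /<device\s/
      current = line.scan(ATTR_RE).to_h.fetch("dev_id").to_i
    when /<\/device>/
      current = nil
    when /<single_mapping\s/
      next if current.nil?
      a = line.scan(ATTR_RE).to_h
      # A single mapping covers exactly one block.
      devices[current] << [a["origin_block"].to_i, a["data_block"].to_i, 1]
    when /<range_mapping\s/
      next if current.nil?
      a = line.scan(ATTR_RE).to_h
      devices[current] << [a["origin_begin"].to_i, a["data_begin"].to_i, a["length"].to_i]
    end
  end

  devices
end

# Made-up sample input for illustration only.
SAMPLE_DUMP = <<~XML
  <superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="101" transaction="0" creation_time="0" snap_time="0">
      <range_mapping origin_begin="0" data_begin="0" length="100" time="0"/>
      <single_mapping origin_block="200" data_block="300" time="0"/>
    </device>
  </superblock>
XML

mappings = parse_thin_dump_lines(StringIO.new(SAMPLE_DUMP))
```

Memory use here is bounded by the longest line plus the collected mappings, rather than by the size of the whole dump. The trade-off is that the regexes rely on the one-element-per-line layout and would break if the output were reformatted.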

This pull request contains a reimplementation of the code changed in [0],
so I will be closing that pull request shortly.

[0] #18

mpalmer · 2014-09-03T02:53:52Z

While I didn't consider the non-trivial thin_dump case, and rexml/document isn't the right solution here, I'm loath to throw regexes at this problem. I think that REXML::StreamListener should work reasonably well for this. I'd definitely accept a modified patch using that class, or the REXML SAX2 API if you're feeling masochistic.
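
For reference, a REXML::StreamListener version might look roughly like the sketch below. It is only an outline under assumptions, not a proposed patch: the element and attribute names are assumed from typical thin_dump XML output, and `ThinDumpListener`, `parse_thin_dump_stream`, and `STREAM_SAMPLE` are hypothetical names invented for the example.

```ruby
require "rexml/document"
require "rexml/streamlistener"
require "stringio"

# Hypothetical listener: collects { dev_id => [[origin, data, length], ...] }
# from parser events, so no document tree is ever built.
class ThinDumpListener
  include REXML::StreamListener

  attr_reader :devices

  def initialize
    @devices = Hash.new { |hash, key| hash[key] = [] }
    @current = nil
  end

  # Called for every opening (or self-closing) tag; attrs is a Hash of strings.
  def tag_start(name, attrs)
    case name
    when "device"
      @current = attrs["dev_id"].to_i
    when "single_mapping"
      return if @current.nil?
      # A single mapping covers exactly one block.
      @devices[@current] << [attrs["origin_block"].to_i, attrs["data_block"].to_i, 1]
    when "range_mapping"
      return if @current.nil?
      @devices[@current] << [attrs["origin_begin"].to_i, attrs["data_begin"].to_i, attrs["length"].to_i]
    end
  end

  def tag_end(name)
    @current = nil if name == "device"
  end
end

# Hypothetical wrapper around REXML's stream parser.
def parse_thin_dump_stream(io)
  listener = ThinDumpListener.new
  REXML::Document.parse_stream(io, listener)
  listener.devices
end

# Made-up sample input for illustration only.
STREAM_SAMPLE = <<~XML
  <superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="101" transaction="0" creation_time="0" snap_time="0">
      <range_mapping origin_begin="0" data_begin="0" length="100" time="0"/>
      <single_mapping origin_block="200" data_block="300" time="0"/>
    </device>
  </superblock>
XML

stream_mappings = parse_thin_dump_stream(StringIO.new(STREAM_SAMPLE))
```

Unlike a per-line regex, this does not depend on how the XML is laid out across lines, since the parser delivers one event per tag regardless of whitespace.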

AnchorCat mentioned this pull request Jun 2, 2014

Fix the block maps for data ranges #18

Closed

mpalmer closed this Jan 20, 2023
