Parse the output of thin_dump line by line #19

Closed
AnchorCat wants to merge 1 commit

AnchorCat

For large volumes, the output of thin_dump can be tens of megabytes in
size, resulting in enormous memory usage when attempting to parse the
entire document as XML at once. This solution is less elegant, but it
gets the job done much more efficiently by regex matching one line at a
time.
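
As a rough illustration of the line-by-line idea (not the code in this pull request), a minimal Ruby sketch might look like the following. The tag names `single_mapping` and `range_mapping`, the attribute names in the sample, and the helper `each_mapping` are assumptions made for the example, and the sketch assumes each element sits on a single line.

```ruby
require 'stringio'

# Yields [tag_name, attributes_hash] for each mapping element, reading one
# line at a time so the whole dump is never held in memory at once.
# Assumption: every element starts and ends on a single line, and the tag
# names below match what thin_dump emits.
def each_mapping(io)
  return enum_for(:each_mapping, io) unless block_given?

  io.each_line do |line|
    match = /\A\s*<(single_mapping|range_mapping)\s/.match(line)
    next unless match

    # Collect name="value" pairs from the line into a Hash of strings.
    yield match[1], line.scan(/(\w+)="([^"]*)"/).to_h
  end
end

# Made-up sample input, for illustration only.
sample_dump = <<~XML
  <superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="3" transaction="0" creation_time="0" snap_time="0">
      <range_mapping origin_begin="0" data_begin="10" length="2" time="0"/>
      <single_mapping origin_block="5" data_block="20" time="0"/>
    </device>
  </superblock>
XML

each_mapping(StringIO.new(sample_dump)) do |tag, attrs|
  puts "#{tag}: #{attrs.inspect}"
end
```

Any object that responds to `each_line` (for example a pipe reading from the dump command) could stand in for the `StringIO` here. The trade-off is that the regexes silently skip anything that does not match the assumed layout.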

This pull request contains a reimplementation of the code changed in [0],
so I will be closing that pull request shortly.

[0] #18

Owner

mpalmer commented Sep 3, 2014

While I didn't consider the non-trivial thin_dump case, and rexml/document isn't the right solution here, I'm loath to throw regexes at this problem. I think that REXML::StreamListener should work reasonably for this. I'd definitely accept a modified patch using that class, or the REXML SAX2 API if you're feeling masochistic.
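
For reference, a streaming parse along those lines could be sketched roughly as below. This is only an illustration under assumptions, not a patch for this project: the class name `MappingListener`, the tag names, and the sample input are hypothetical. `REXML::StreamListener` and `REXML::Document.parse_stream` are the standard REXML streaming entry points.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hands each mapping element to a callback as the parser streams through
# the input, so no document tree is built in memory.
# Assumption: the tag names below match what thin_dump emits.
class MappingListener
  include REXML::StreamListener

  def initialize(&on_mapping)
    @on_mapping = on_mapping
  end

  # Called by REXML for every opening (or self-closing) tag; attrs is a
  # Hash of attribute names to string values.
  def tag_start(name, attrs)
    return unless %w[single_mapping range_mapping].include?(name)

    @on_mapping.call(name, attrs)
  end
end

# Made-up sample input, for illustration only.
sample_dump = <<~XML
  <superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="3" transaction="0" creation_time="0" snap_time="0">
      <range_mapping origin_begin="0" data_begin="10" length="2" time="0"/>
      <single_mapping origin_block="5" data_block="20" time="0"/>
    </device>
  </superblock>
XML

listener = MappingListener.new { |name, attrs| puts "#{name}: #{attrs.inspect}" }
REXML::Document.parse_stream(StringIO.new(sample_dump), listener)
```

Because `parse_stream` accepts an IO-like source, the same listener should work on a pipe from the dump command. Unlike the regex approach, it does not depend on how the XML is laid out across lines.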

mpalmer closed this Jan 20, 2023