Adding a minify option in Parser, so the parsed Document occupies less memory #2003
When parsing large HTML, the memory usage of the parsed Document is huge. So maybe we can add a minify option in Parser, so that the parsed Document has fewer nodes and occupies less memory.

Comments
I'm always interested in new ideas on how to use less memory! Can you tell me more about your idea and your use case?
One feature I've been thinking about for a while is to implement a streaming-type parser that would just emit tokens / nodes, and perhaps the current stack, but not retain a full DOM. But I see this as being a difficult interface for users of the library to work with. So I am interested in hearing your use case (and everyone's! Others, please feel free to comment also) and what developer experience would be best.
Thanks for the reply. It's rare, but I sometimes have a Document that occupies around 300 MB. For example, compare the parsed documents of the following two lines: the first line has five more empty nodes than the second, so minifying the HTML while parsing can save memory.
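(The original example lines didn't come through here, so below is a minimal stand-in with made-up markup, just to illustrate how whitespace-only text nodes between tags inflate jsoup's node count:)

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class NodeCountDemo {
    // Recursively count every node (elements, text nodes, comments, ...) in the tree.
    static int countNodes(Node node) {
        int count = 1;
        for (Node child : node.childNodes()) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical inputs standing in for the missing example: the same markup,
        // once with whitespace between tags and once minified.
        String pretty   = "<div>\n  <p>One</p>\n  <p>Two</p>\n</div>";
        String minified = "<div><p>One</p><p>Two</p></div>";

        Document prettyDoc   = Jsoup.parse(pretty);
        Document minifiedDoc = Jsoup.parse(minified);

        // The pretty-printed input keeps extra whitespace-only TextNodes inside the div,
        // so its tree has more nodes than the minified one.
        System.out.println("pretty:   " + countNodes(prettyDoc));
        System.out.println("minified: " + countNodes(minifiedDoc));
    }
}
```

Those extra whitespace-only TextNodes are the kind of overhead a minify option would avoid retaining.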
Thanks -- it makes sense. Have you measured the memory impact of removing those nodes? Would you be able to share an example of the document? Please contact me directly (jonathan@hedley.net) if you can. Or, is there an example file I could use as a proxy? I have been thinking of adding a streaming parser along the lines described above, and would be keen to get real-world examples and beta testers for this functionality.
I've built out a new feature -- StreamParser -- that should address this. Take a look at the examples in #2096. It would be great if you could give it a try.
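For reference, a rough sketch of how a StreamParser could be used to process a large document without keeping the whole DOM resident; the exact method names and signatures should be checked against the examples in #2096, so treat this as an illustration rather than canonical usage:

```java
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

import java.io.IOException;

public class StreamParseSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in input; in practice this would come from a Reader over a large file.
        String html = "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>";

        // Elements are emitted as they complete, rather than only after
        // the whole document has been parsed.
        StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "");
        streamer.stream()
            .filter(el -> el.tagName().equals("tr"))
            .forEach(row -> {
                System.out.println(row.text());
                row.remove(); // discard processed rows so the retained tree stays small
            });
    }
}
```

Pruning each element once it has been handled is what keeps the retained tree small for documents like the 300 MB case described above.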