Excessive RAM usage on startup. #52
Comments
Yes, parse5 loads the HTML entities data into memory, and this is a requirement of the HTML5 parsing algorithm. BTW, it's 2015; how can 30 MB of RAM usage be called "excessive" (especially for a sandboxed VM app)? Are you running it on an Apple Watch or something? =)
@inikulin Do you have the script for the generation of the trie still lying around, or could you put it in the repo? I was thinking of playing with the structure to try and reduce the memory requirement somewhat.
@Sebmaster Seems like the generator died with my previous desktop. I'm ultra-busy right now; give me a couple of days and I'll roll out a new one. The only optimization idea I have is to keep the trie in JSON form, so V8 will not generate an AST representation for it. But I'm not sure we will win a lot with this approach. Moreover, I have concerns about the browserified version: as far as I know, there is no easy way to mock FS with browserify.
I was thinking of either moving it to an array-only structure or trying a trie-compacting algorithm (although that would probably slow down lookups a tiny bit).
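To make the "array-only structure" idea concrete, here is a minimal sketch of one way it could look, not the layout parse5 actually adopted. The `ENTITIES` sample, the node layout, and the function names are all assumptions made for illustration; real named entities can also map to two code points, which this sketch ignores.

```javascript
// Hypothetical sample data; the real table holds over 2,000 named entities.
const ENTITIES = { 'amp;': 38, 'lt;': 60, 'gt;': 62, 'lambda;': 955 };

// Build an ordinary nested trie: { value, children: Map<charCode, node> }.
// A value of -1 means "no entity ends at this node".
function buildTrie(entries) {
  const root = { value: -1, children: new Map() };
  for (const [name, codePoint] of Object.entries(entries)) {
    let node = root;
    for (let i = 0; i < name.length; i++) {
      const ch = name.charCodeAt(i);
      if (!node.children.has(ch)) {
        node.children.set(ch, { value: -1, children: new Map() });
      }
      node = node.children.get(ch);
    }
    node.value = codePoint;
  }
  return root;
}

// Flatten the trie into a single Int32Array. Assumed node layout:
//   [value, childCount, ch0, offset0, ch1, offset1, ...]
// where each offset is the array index at which that child node starts.
function flatten(root) {
  const out = [];
  (function write(node) {
    const start = out.length;
    const keys = [...node.children.keys()].sort((a, b) => a - b);
    out.push(node.value, keys.length);
    for (const ch of keys) out.push(ch, 0); // offsets are patched below
    keys.forEach((ch, i) => {
      out[start + 3 + 2 * i] = out.length; // child begins at the current end
      write(node.children.get(ch));
    });
  })(root);
  return Int32Array.from(out);
}

// Walk the flat array; returns the code point, or -1 if the name is unknown.
function lookup(flat, name) {
  let pos = 0; // the root node starts at index 0
  for (let i = 0; i < name.length; i++) {
    const ch = name.charCodeAt(i);
    const count = flat[pos + 1];
    let next = -1;
    for (let j = 0; j < count; j++) {
      if (flat[pos + 2 + 2 * j] === ch) {
        next = flat[pos + 3 + 2 * j];
        break;
      }
    }
    if (next === -1) return -1;
    pos = next;
  }
  return flat[pos];
}

const flat = flatten(buildTrie(ENTITIES));
console.log(lookup(flat, 'amp;'), lookup(flat, 'nope;'));
```

The point of the layout is that the whole structure lives in one typed array instead of thousands of small objects, so the per-object overhead disappears. The linear scan over children is the lookup cost being traded against memory.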
Yeah, and this is not an option. CPU performance is the priority.
What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still not implement a theoretical change with proportions like those?
No. Let's look at the problem using big O notation. The trie is constructed once at startup and never modified, so in our case its space complexity is constant: O(1). Currently the time complexity of a trie lookup is O(m), where m is the average length of the word. If the time increases by 1%, we get O(1.01m). Meanwhile, if we decrease the trie size by 80%, we still get constant space complexity: O(1). Put plainly, we gain nothing but lose speed. A constant 22.5 MB of consumption (7.5 MB is consumed by the runtime itself for me) doesn't seem like a big deal to me nowadays.
O(1.01m) = O(m)
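For readers who want the step spelled out, the identity follows directly from the definition of big O, which discards constant factors. A short derivation, where $f$ and $c$ are symbols introduced here for illustration:

```latex
f(m) = 1.01\,m \;\le\; c \cdot m \quad \text{for all } m \ge 1,\ \text{with } c = 1.01
\;\Longrightarrow\; f(m) \in O(m),
\qquad \text{and since } m \le 1.01\,m,\quad O(1.01\,m) = O(m).
```

The same reasoning applies to the space side: a fixed table that is 80% smaller is still O(1). Big O hides exactly the constant factors being debated here, so it cannot decide the trade-off either way.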
@domenic Argh, yes, my bad, up to constants. Never do math in the morning =0.
Taking my wrong math into account, it's worth a try.
@Sebmaster Here is the trie generator: https://github.com/inikulin/parse5/tree/master/tools BTW, the parse5 bootstrap consumed 15 MB for me.
@Sebmaster Any progress on this?
I have no complaints, but one of our users does, so if that user (or someone else) wants to do a PR that helps, and you're willing to review it, it seems like it would be a nice thing to do.
I'm trying to figure out what we want to accomplish here. I mean, what memory usage can be considered non-excessive, and whether the current memory consumption causes any real-life problems (actually it does, but only in quite exotic environments: #54).
As small as possible without compromising the main objective, which is performance.
Well, the most significant part of parse5's memory footprint comes from the named entities trie. The only optimization that comes to mind is to replace it with a Patricia tree. But I'm quite sceptical about it: it might work well with long suffixes, but that's not the case for named entities. I don't think we can win more than 10-20%. I'm wondering if it's even worth a try.
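To illustrate what a Patricia (radix) tree changes, here is a small sketch that merges chains of single-child nodes into one edge labelled with a multi-character string. The sample names and all function names are assumptions for illustration, not code from parse5. On a toy sample like this the node count drops sharply; as the comment above notes, the real entity table is unlikely to see savings on that scale.

```javascript
// Plain character trie: each node is { value, children: { char: node } }.
function insert(root, word, value) {
  let node = root;
  for (const ch of word) {
    if (!node.children[ch]) node.children[ch] = { value: null, children: {} };
    node = node.children[ch];
  }
  node.value = value;
}

// Radix compression: follow each chain of valueless single-child nodes and
// collapse it into a single edge whose label is the concatenated characters.
function compress(node) {
  const children = {};
  for (const [label, child] of Object.entries(node.children)) {
    let edge = label;
    let cur = child;
    while (cur.value === null && Object.keys(cur.children).length === 1) {
      const [nextLabel] = Object.keys(cur.children);
      edge += nextLabel;
      cur = cur.children[nextLabel];
    }
    children[edge] = compress(cur);
  }
  return { value: node.value, children };
}

// Lookup in the compressed tree. Sibling edges start with distinct
// characters, so at most one edge can be a prefix of the remaining word.
function lookup(node, word) {
  if (word === '') return node.value;
  for (const [edge, child] of Object.entries(node.children)) {
    if (word.startsWith(edge)) return lookup(child, word.slice(edge.length));
  }
  return null;
}

function countNodes(node) {
  return 1 + Object.values(node.children)
    .reduce((sum, child) => sum + countNodes(child), 0);
}

// Hypothetical sample data.
const plain = { value: null, children: {} };
insert(plain, 'amp;', 38);
insert(plain, 'lt;', 60);
insert(plain, 'gt;', 62);
insert(plain, 'lambda;', 955);

const radix = compress(plain);
console.log(countNodes(plain), countNodes(radix), lookup(radix, 'lambda;'));
```

Each lookup step now compares a whole substring rather than a single character, which is part of why the speed impact has to be measured and cannot simply be assumed.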
Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)
I noticed parse5 uses 30 MB of RAM just by being required. I managed to pin down the usage, and it seems to come from the file lib/tokenization/named_entity_trie.js.