Excessive RAM usage on startup. #52

Closed
ghost opened this issue Jun 18, 2015 · 16 comments

Comments

@ghost

ghost commented Jun 18, 2015

I noticed parse5 uses 30 MB of RAM just by being required. I managed to pin down the usage, and it seems to come from the file lib/tokenization/named_entity_trie.js.
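
For anyone who wants to reproduce a measurement like this, here is a minimal sketch, assuming Node.js and CommonJS `require`. The helper name `heapCostOfRequire` is hypothetical, and the numbers are only steady when the script is run with `node --expose-gc`.

```javascript
// Hypothetical helper: estimates how much heap a module adds when required.
// With `node --expose-gc`, global.gc exists and a collection is forced
// before each reading; without the flag those calls are skipped.
function heapCostOfRequire(moduleName) {
  if (global.gc) global.gc();
  const before = process.memoryUsage().heapUsed;
  const mod = require(moduleName);
  if (global.gc) global.gc();
  const after = process.memoryUsage().heapUsed;
  return { mod, bytes: after - before };
}

// Usage (assumes parse5 is installed in node_modules):
// const { bytes } = heapCostOfRequire('parse5');
// console.log((bytes / 1024 / 1024).toFixed(1) + ' MB');
```

A module that is already in the require cache will report a cost near zero, so the measurement should be taken in a fresh process.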

@inikulin
Owner

Yes, parse5 loads the HTML entity data into memory, and this is a requirement of the HTML5 parsing algorithm. BTW it's 2015, how can 30 MB of RAM usage be called "excessive" (especially for a sandboxed VM app)? Are you running it on an Apple Watch or something? =)

@Sebmaster
Contributor

@inikulin Do you still have the script that generates the trie lying around? Could you put it in the repo? I was thinking of playing with the structure to try to reduce the memory requirement somewhat.

@inikulin inikulin reopened this Jun 29, 2015
@inikulin
Owner

@Sebmaster Seems like the generator died with my previous desktop. I'm ultra-busy right now; give me a couple of days and I'll roll out a new one. The only optimization idea I have is to keep the trie in JSON form, so v8 will not generate an AST representation for it. But I'm not sure we will win a lot with this approach. Moreover, I have concerns about the browserified version: as far as I know, there is no easy way to mock FS with browserify.
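
A minimal sketch of the JSON idea, assuming the data is embedded as a string literal (which would also sidestep the FS concern, since nothing is read from disk). The tiny `TRIE_JSON` payload and the `lookup` helper are hypothetical illustrations, not parse5's actual data or API.

```javascript
// Hypothetical miniature of the entity data; the real trie is far larger.
// Each key is one character; a "value" key marks a complete entity name.
const TRIE_JSON =
  '{"a":{"m":{"p":{";":{"value":"&"}}}},"l":{"t":{";":{"value":"<"}}}}';

// The data stays a plain string until it is parsed at startup.
const trie = JSON.parse(TRIE_JSON);

// Walk the trie one character at a time; return null when there is no match.
function lookup(root, name) {
  let node = root;
  for (const ch of name) {
    node = node[ch];
    if (node === undefined) return null;
  }
  return node.value !== undefined ? node.value : null;
}
```

Whether this actually reduces memory would need to be measured; the parsed objects occupy the heap either way.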

@Sebmaster
Contributor

I was thinking of either moving it to an array-only structure or trying a trie compacting algorithm (although that would probably slow down lookups a tiny bit).
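
To make the array-only idea concrete, here is a rough sketch under stated assumptions. The node layout and the names `insert`, `flatten`, and `lookupFlat` are invented for illustration and are not parse5's format. Each node is stored as `[value, childCount, (charCode, childOffset)...]` inside one flat array of numbers.

```javascript
// Build a nested-object trie first (hypothetical shape for this sketch).
function insert(root, name, codePoint) {
  let node = root;
  for (const ch of name) {
    node = node.children[ch] || (node.children[ch] = { value: 0, children: {} });
  }
  node.value = codePoint;
}

// Flatten it into a single array. Node layout (an assumption):
//   [value or 0, childCount, charCode, childOffset, charCode, childOffset, ...]
function flatten(root) {
  const out = [];
  function write(node) {
    const offset = out.length;
    const keys = Object.keys(node.children).sort();
    out.push(node.value || 0, keys.length);
    const slots = out.length;
    for (const k of keys) out.push(k.charCodeAt(0), 0); // reserve child slots
    keys.forEach((k, i) => {
      out[slots + 2 * i + 1] = write(node.children[k]); // patch in the offset
    });
    return offset;
  }
  write(root);
  return out;
}

// Look a name up in the flat array; returns 0 when there is no match.
function lookupFlat(arr, name) {
  let offset = 0;
  for (let i = 0; i < name.length; i++) {
    const count = arr[offset + 1];
    const code = name.charCodeAt(i);
    let next = -1;
    for (let j = 0; j < count; j++) {
      if (arr[offset + 2 + 2 * j] === code) {
        next = arr[offset + 3 + 2 * j];
        break;
      }
    }
    if (next === -1) return 0;
    offset = next;
  }
  return arr[offset];
}
```

The linear scan over a node's children is the lookup cost of this layout; a binary search over the sorted char codes would be one way to reduce it.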

@inikulin
Owner

Yeah and this is not an option. CPU performance is the priority.

@ghost
Author

ghost commented Jun 29, 2015

What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still refuse to implement a theoretical change with proportions like those?

@inikulin
Owner

> What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still refuse to implement a theoretical change with proportions like those?

No. Let's look at the problem using big O notation. The trie is constructed once at startup and never modified, so in our case its space complexity is constant: O(1). Currently the time complexity of a trie lookup is O(m), where m is the average length of a word. If the time increases by 1%, we get O(1.01m). Meanwhile, if we decrease the trie size by 80%, we still get constant space complexity: O(1). To put it plainly, we gain nothing, but we lose speed. A constant 22.5 MB of consumption (7.5 MB is consumed by the runtime itself for me) doesn't seem like a big deal to me nowadays.

@domenic

domenic commented Jun 30, 2015

O(1.01m) = O(m)
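
Spelled out from the standard definition of big O (the constant c = 1.01 is simply the factor from the example being discussed):

```latex
1.01\,m \le c \cdot m \quad \text{for all } m \ge 0,\ \text{with } c = 1.01
\;\Longrightarrow\; 1.01\,m \in O(m),
\qquad
m \le 1.01\,m \;\Longrightarrow\; m \in O(1.01\,m),
\qquad \text{hence } O(1.01\,m) = O(m).
```

The same reasoning applies to space: a trie that is 80% smaller is still O(1). Big O therefore cannot rank either change, and the trade-off has to be judged on measured constants.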

@inikulin
Owner

@domenic Argh, yes, my bad, up to constants. Never do math in the morning =0.

@inikulin
Owner

Taking my wrong math into account, it's worth a try.

@inikulin
Owner

@Sebmaster Here is the trie generator: https://github.com/inikulin/parse5/tree/master/tools

The parse5 bootstrap consumed 15 MB for me, BTW.

@inikulin
Owner

@Sebmaster Any progress on this?
@domenic Do you have any complaints about excessive RAM usage in jsdom? I mean, is it even worth the discussion and effort?

@domenic

domenic commented Jul 23, 2015

I have no complaints, but one of our users does, so if that user (or someone else) wants to do a PR that helps, and you're willing to review it, it seems like it would be a nice thing to do.

@inikulin
Owner

I'm trying to figure out what we want to accomplish here. I mean, what memory usage can be considered non-excessive, and does the current memory consumption cause any real-life problems? (Actually it does, but only in quite exotic environments: #54.)

@ghost
Author

ghost commented Jul 25, 2015

As small as possible without compromising the main objective, which is performance.

@inikulin
Owner

Well, the most significant part of parse5's memory footprint comes from the named entities trie. The only optimization that comes to mind is to replace it with a Patricia tree. But I'm quite sceptical about it: it might work well with long suffixes, but that's not the case for named entities. I don't think we can win more than 10-20%. I'm wondering if it's even worth a try.
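
One way to estimate the potential gain before committing to a rewrite is to count nodes in a plain trie and in its path-compressed (Patricia-style) form. The sketch below does that for an arbitrary word list. The function names and node shape are hypothetical, and node count is only a rough proxy for memory.

```javascript
// Plain character-per-node trie.
function buildTrie(words) {
  const root = { children: {}, terminal: false };
  for (const w of words) {
    let node = root;
    for (const ch of w) {
      node = node.children[ch] ||
        (node.children[ch] = { children: {}, terminal: false });
    }
    node.terminal = true;
  }
  return root;
}

// Path compression: merge each chain of single-child, non-terminal
// nodes into one edge whose label is the concatenated characters.
function compress(node) {
  const out = { children: {}, terminal: node.terminal };
  for (const key of Object.keys(node.children)) {
    let label = key;
    let child = node.children[key];
    let grand = Object.keys(child.children);
    while (grand.length === 1 && !child.terminal) {
      label += grand[0];
      child = child.children[grand[0]];
      grand = Object.keys(child.children);
    }
    out.children[label] = compress(child);
  }
  return out;
}

// Count all nodes, including the root.
function countNodes(node) {
  let n = 1;
  for (const key of Object.keys(node.children)) {
    n += countNodes(node.children[key]);
  }
  return n;
}
```

Running `countNodes` on both forms built from the full named entity list would show how much of the trie consists of mergeable chains, which is the quantity the 10-20% estimate depends on.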

@inikulin inikulin added this to the 3.0 milestone Apr 20, 2016
@inikulin inikulin changed the title Excessive RAM usage. Excessive RAM usage on startup. May 18, 2016
@inikulin inikulin self-assigned this Sep 2, 2016
inikulin added a commit that referenced this issue Nov 1, 2016
Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)