Excessive RAM usage on startup. #52

ghost · 2015-06-18T15:19:12Z

I noticed parse5 uses 30 MB of RAM just by being required. I managed to pin down the usage, and it seems to come from the file lib/tokenization/named_entity_trie.js.
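A rough way to reproduce this kind of measurement is to compare heap usage before and after loading the module. This is only a sketch: `heapDeltaBytes` is a hypothetical helper (not part of parse5), `heapUsed` is just one of several memory figures Node reports, and the numbers vary between runs and Node versions.

```javascript
// Returns the change in V8 heap usage (in bytes) caused by running `load`.
// `load` is any zero-argument function, e.g. () => require('parse5').
function heapDeltaBytes(load) {
  // global.gc only exists when node is started with --expose-gc
  if (typeof globalThis.gc === 'function') globalThis.gc();
  const before = process.memoryUsage().heapUsed;
  const loaded = load(); // keep a reference so the result stays alive
  const after = process.memoryUsage().heapUsed;
  return { loaded, delta: after - before };
}

// Usage (assumes parse5 is installed):
// const { delta } = heapDeltaBytes(() => require('parse5'));
// console.log((delta / 1024 / 1024).toFixed(1) + ' MB');
```

Running with `--expose-gc` makes the result less noisy, since a garbage collection in the middle of the measurement can otherwise skew the delta.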

inikulin · 2015-06-18T19:58:51Z

Yes, parse5 loads HTML entity data into memory, and this is a requirement of the HTML5 parsing algorithm. BTW, it's 2015: how can 30 MB of RAM usage be called "excessive" (especially for a sandboxed VM app)? Are you running it on an Apple Watch or something? =)

Sebmaster · 2015-06-19T07:31:07Z

@inikulin Do you have the script for the generation of the trie still lying around/could you put it in the repo? I was thinking of playing with the structure to try and reduce the memory requirement somewhat.

inikulin · 2015-06-29T11:05:40Z

@Sebmaster Seems like the generator died with my previous desktop. I'm ultra-busy right now; give me a couple of days and I'll roll out a new one. The only optimization idea I have is to keep the trie in JSON form, so V8 will not generate an AST representation for it. But I'm not sure we will win a lot with this approach. Moreover, I have concerns about the browserified version: as far as I know, there is no easy way to mock FS with browserify.

Sebmaster · 2015-06-29T11:39:46Z

I was thinking of trying to either move it to an array-only structure or try a trie compacting algorithm (although that'd slow down lookups a tiny bit probably).
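One way the array-only idea could look, as a minimal sketch. This is not parse5's actual data layout: `buildTrie`, `flatten`, `lookup`, and the node encoding are hypothetical and chosen only for illustration.

```javascript
// Object-based trie: each node is { value, children: Map<charCode, node> }.
function buildTrie(entities) {
  const root = { value: null, children: new Map() };
  for (const [name, value] of Object.entries(entities)) {
    let node = root;
    for (let i = 0; i < name.length; i++) {
      const code = name.charCodeAt(i);
      if (!node.children.has(code)) {
        node.children.set(code, { value: null, children: new Map() });
      }
      node = node.children.get(code);
    }
    node.value = value;
  }
  return root;
}

// Flattened layout per node:
//   [valueIndex or -1, childCount, code0, offset0, code1, offset1, ...]
function flatten(root) {
  const data = [];
  const values = [];
  function write(node) {
    const offset = data.length;
    const valueIndex = node.value === null ? -1 : values.push(node.value) - 1;
    data.push(valueIndex, node.children.size);
    const slots = data.length;
    // Reserve (code, offset) pairs; offsets are patched once children are written.
    for (const code of node.children.keys()) data.push(code, 0);
    let i = 0;
    for (const child of node.children.values()) {
      data[slots + 2 * i + 1] = write(child);
      i++;
    }
    return offset;
  }
  write(root);
  return { data: Int32Array.from(data), values };
}

// Walks the flat array one character at a time; returns undefined on a miss.
function lookup(flat, name) {
  const { data, values } = flat;
  let offset = 0; // the root node is always written first
  for (let i = 0; i < name.length; i++) {
    const code = name.charCodeAt(i);
    const count = data[offset + 1];
    let next = -1;
    for (let j = 0; j < count; j++) {
      if (data[offset + 2 + 2 * j] === code) {
        next = data[offset + 3 + 2 * j];
        break;
      }
    }
    if (next === -1) return undefined;
    offset = next;
  }
  const valueIndex = data[offset];
  return valueIndex === -1 ? undefined : values[valueIndex];
}

const flat = flatten(buildTrie({ 'amp;': '&', amp: '&', 'lt;': '<', 'gt;': '>' }));
console.log(lookup(flat, 'amp;')); // prints "&"
```

The trade-off is the one discussed here: a single typed array avoids per-node object overhead, while each step scans a node's children linearly instead of doing a property lookup.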

inikulin · 2015-06-29T11:42:32Z

Yeah and this is not an option. CPU performance is the priority.

ghost · 2015-06-29T22:13:05Z

What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still refuse to implement a theoretical change with proportions like those?

inikulin · 2015-06-30T08:52:37Z

> What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still refuse to implement a theoretical change with proportions like those?

No. Let's look at the problem using big O notation. The trie is constructed once at startup and never modified, so in our case its space complexity is constant: O(1). Currently the time complexity of a trie lookup is O(m), where m is the average length of a word. If time increases by 1%, we get O(1.01m). Meanwhile, if we decrease the trie size by 80%, we still get constant space complexity: O(1). Put plainly, we gain nothing, but we lose speed. A constant 22.5 MB of consumption (7.5 MB is consumed by the runtime itself for me) doesn't seem like a big deal to me nowadays.

domenic · 2015-06-30T09:09:55Z

O(1.01m) = O(m)
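
Spelled out, assuming the usual definition of big O (the symbols `c`, `m_0`, and `S` are introduced here only for illustration, with `S` standing for the fixed size of the trie):

```latex
% f(m) \in O(g(m)) iff there exist c > 0 and m_0 such that
% f(m) \le c \cdot g(m) for all m \ge m_0.
\begin{align*}
1.01\,m &\le c \cdot m \quad \text{with } c = 1.01,\ m \ge 1
  &&\Longrightarrow\quad O(1.01\,m) = O(m), \\
0.2\,S  &\le c \cdot 1 \quad \text{with } c = 0.2\,S
  &&\Longrightarrow\quad O(0.2\,S) = O(1).
\end{align*}
```

In other words, big O hides exactly the constant factors under discussion, so it cannot separate a 1% slowdown from no slowdown, nor an 80% memory reduction from none.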

inikulin · 2015-06-30T09:15:39Z

@domenic Argh, yes, my bad, up to constants. Never do math in the morning =0.

inikulin · 2015-06-30T09:24:12Z

Taking my wrong math into account, it's worth a try.

inikulin · 2015-06-30T12:08:44Z

@Sebmaster Here is the trie generator https://github.com/inikulin/parse5/tree/master/tools

BTW, parse5 bootstrap consumed 15 MB for me.

inikulin · 2015-07-23T13:10:02Z

@Sebmaster any progress on this?
@domenic Do you have any complaints about excessive RAM usage in jsdom? I mean, is it even worth the discussion and effort?

domenic · 2015-07-23T16:01:13Z

I have no complaints, but one of our users does, so if that user (or someone else) wants to do a PR that helps, and you're willing to review it, it seems like it would be a nice thing to do.

inikulin · 2015-07-24T08:10:36Z

I'm trying to figure out what we want to accomplish here. I mean, what memory usage can be considered non-excessive, and whether the current memory consumption causes any real-life problems (actually it does, but only in quite exotic environments - #54).

ghost · 2015-07-25T01:31:55Z

As small as possible without compromising the main objective, that is performance.

inikulin · 2016-01-11T19:51:04Z

Well, the most significant part of parse5's memory footprint comes from the named entities trie. The only optimization that comes to mind is to replace it with a Patricia tree. But I'm quite sceptical about it: it might work well with long suffixes, but that's not the case for named entities. I don't think we can win more than 10-20%. I'm wondering if it's even worth a try.
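
A quick way to sanity-check such an estimate is to count how many nodes would survive path compression. The sketch below is illustrative only: `buildCharTrie`, `countNodes`, and `countCompressedNodes` are hypothetical helpers, the word list is a tiny made-up sample rather than the real entity table, and node count is only a proxy for memory, since a Patricia tree still has to store its edge labels.

```javascript
// Plain character trie: each node is { terminal, children: Map<char, node> }.
function buildCharTrie(words) {
  const root = { terminal: false, children: new Map() };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { terminal: false, children: new Map() });
      }
      node = node.children.get(ch);
    }
    node.terminal = true;
  }
  return root;
}

// Total number of nodes in the plain trie.
function countNodes(node) {
  let total = 1;
  for (const child of node.children.values()) total += countNodes(child);
  return total;
}

// Nodes that would remain after merging single-child chains:
// the root, every node that ends a word, and every branching node.
function countCompressedNodes(node, isRoot = true) {
  const keep = isRoot || node.terminal || node.children.size > 1;
  let total = keep ? 1 : 0;
  for (const child of node.children.values()) {
    total += countCompressedNodes(child, false);
  }
  return total;
}

const sample = ['amp', 'amp;', 'lt', 'lt;', 'gt', 'gt;', 'lambda;'];
const trie = buildCharTrie(sample);
console.log(countNodes(trie), countCompressedNodes(trie)); // prints "17 9"
```

Running the same count over the real entity names would show how much compression is available for that data set.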

Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)

inikulin closed this as completed Jun 18, 2015

inikulin reopened this Jun 29, 2015

inikulin mentioned this issue Jun 29, 2015

Problem running in Nashorn (JVM) #54

Closed

inikulin added enhancement performance requires-investigation hard labels Aug 11, 2015

inikulin added this to the 3.0 milestone Apr 20, 2016

RReverser mentioned this issue May 18, 2016

Streaming implementation buffers text infinitely #130

Closed

inikulin changed the title ~~Excessive RAM usage.~~ Excessive RAM usage on startup. May 18, 2016

inikulin removed the requires-investigation label Jun 6, 2016

inikulin self-assigned this Sep 2, 2016

inikulin added a commit that referenced this issue Oct 29, 2016

Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)

92efdac

inikulin added a commit that referenced this issue Oct 29, 2016

Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)

328e67c

inikulin added a commit that referenced this issue Oct 31, 2016

Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)

8b050ce

inikulin closed this as completed in 9fef5eb Nov 1, 2016

inikulin added a commit that referenced this issue Nov 1, 2016

Merge pull request #162 from inikulin/gh52

85ba050

Reduce in-memory size of named entities data from ~8.5Mb to ~250Kb (closes #52)
