
pubsuffix.js using a lot of memory #48

Closed
luanmuniz opened this issue Jul 26, 2015 · 31 comments

Comments

@luanmuniz

Hi Guys,

I had a memory leak in my application, and while debugging it I found a few problems.
Besides the leak in question, I found that pubsuffix.js accounts for a lot of memory allocation.

I don't know if I read the logs correctly, but there is a big allocation in arrays and strings.

Tracking this down, I found the following issues that I think are related:
request/request#792
#12

Is there any way we can improve the code?
If you want, I can send a heap snapshot from my application.

@inikulin
Contributor

Hi,

Can you provide the numbers, please? Is it leaking or just constantly consuming some huge amount of memory?

@luanmuniz
Author

I think the latter. Can I send the heap snapshot to your email (the one on your profile page)?

@Myztiq

Myztiq commented Jul 31, 2015

So, I wrote a quick little script to try to figure this out, because it came up in my memory leak tests.

https://github.com/Myztiq/cookie-leak

I got here from bunyan-loggly -> loggly -> request -> tough-cookie.

Please let me know if there is anything obvious I might be missing here.

@luanmuniz
Author

@Myztiq I got pretty much the same path from request to tough-cookie,
and I think your test shows it nicely. The Array is full and that never goes away.

Thank you!

@Myztiq

Myztiq commented Aug 1, 2015

It looks like a super tiny memory leak, though that might add up.

@Myztiq

Myztiq commented Aug 1, 2015

On second thought, my cookie in the test is a=b. When I change the cookie to a more reasonable 2024 characters, it leaks 1.4 MB over 1 million calls, up from 0.3 MB over 10 million calls with a cookie of a=b.

@luanmuniz
Author

I see almost 13 MB in an application that has 80 MB in total, just at startup (before running the heavy calls).

@Myztiq

Myztiq commented Aug 1, 2015

With memwatch I actually found that removing the console logs I had in there for progress decreased the memory usage more than removing the item I was testing, which leads me to believe memwatch is flawed. I then noticed memwatch is incredibly old and out of date.

On further review with node-inspector I am not seeing the memory leak. I took a profile before running 10 million iterations, then took another afterwards and saw no noticeable difference; we are talking <0.3 MB for 10 million iterations! And that 0.3 MB was made up of the timeouts used to keep node alive for the profiling and had nothing to do with this library; all the memory used was within node native modules.

To verify, you can run it from the branch I made to test it: https://github.com/Myztiq/cookie-leak/tree/node-inspector. Instructions are in the readme.md.

@luanmuniz
Author

I ran your 2 branches in node-inspector. I can see the leak on master, but I could not find the leak on the node-inspector branch.

Here is the heap snapshot:
https://s3-us-west-2.amazonaws.com/open-souce-assets/20150801T125416.heapsnapshot

Look at the first 3 entries under (array) and the first one under (string).
They are the same size as in my application, but instead of 2 items I have 8.

@Myztiq

Myztiq commented Aug 1, 2015

I think it's just a matter of a single large object that pubsuffix.js creates. I pre-warm the cache in both branches by making a fake call before starting profiling, but it depends on when you hit it from node-inspector on master.

@luanmuniz Also, if you remove the console.logs I have throughout the code, you will notice the memwatch-reported leak actually gets significantly better; almost half of the leak goes away.

@Myztiq

Myztiq commented Aug 1, 2015

I'd be pretty surprised if there was in fact a leak, because this is a core dependency of request, which is used by quite a few people; all their machines would be leaking memory and they would have ended up here months ago. In the master branch I even tried creating my own cookie store that just ignored whatever it was passed and returned immediately, and there was no effect on the leak.

@luanmuniz
Author

The snapshot I sent is from the master branch at almost 90%.

Well, it could be request, but I found the (array) with a lot of items using node-inspector on the master branch, so I'm not sure.

@Myztiq

Myztiq commented Aug 1, 2015

@luanmuniz Yes, but did you snapshot it after it had been sitting at 100% long enough to garbage collect? While it's pounding the machine as fast as possible creating new objects, you will see a high amount of memory churn; the real question is whether the memory goes back to normal once the churn is over. If it doesn't, that's a leak; if it does, it's just a memory-intensive application.

The problem is when it doesn't go back down; in that case we know the system won't be stable for extended periods of time.
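
For reference, a minimal sketch of that kind of check, assuming Node is started with --expose-gc so that global.gc() is available (this is not part of the linked test repo):

var tough = require('tough-cookie');

// Force a GC, record heap usage, hammer the parser, force another GC, and
// compare. If the delta stays small across runs, it's churn rather than a leak.
global.gc();
var before = process.memoryUsage().heapUsed;

for (var i = 0; i < 1000000; i++) {
  tough.parse('a=' + i + '; domain=xyz.google.com; path=/');
}

global.gc();
var after = process.memoryUsage().heapUsed;
console.log('heap delta after GC:', after - before, 'bytes');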

@luanmuniz
Author

I have just sent the PR above with a forced GC, and I got the same result :(
But what you said before about pubsuffix.js creating a large object at the beginning got my interest; I will run some tests related to that too.

@stash-sfdc
Contributor

Hey folks. I finally have a bit of time to look at this today. I'll try to review what's been found and see if I can reproduce it.

@stash-sfdc
Contributor

So, I made this script and played with node-inspector:

var tough;

var N = 1000000;

console.log('waiting');
process.stdin.once('data', load);

function load() {
  tough = require('./lib/cookie.js');
  console.log('loaded, waiting');
  process.stdin.once('data', phaseOne);
}

function parseABunch(seed) {
  var result = seed;
  for (var i = 0; i < N; i++) {
    var numbers = ("" + Math.random()).replace(/^\d\./,'');
    var cookie = tough.parse('a='+numbers+'; domain=xyz.google.com; path=/');
    result ^= parseInt(cookie.value, 10);
  }
  return result;
}

function phaseOne() {
  var result = parseABunch(42);
  console.log('done '+N+' parses, waiting (ignore:' + result + ')');
  process.stdin.once('data', phaseTwo);
}

function phaseTwo() {
  var result = parseABunch(4242);
  console.log('done '+N+' parses, waiting (ignore:' + result + ')');
  process.stdin.once('data', done);
}

function done() {
  console.log('exiting', tough);
  process.exit(0);
}

Here's a screenshot of the memory profile:

screenshot removed

Loading the library is the big spike; the first million parses is the tiny blip around ~11s. The second million cookie parses (and the prompt to hit a key) only adds 4.1KB to the heap.

As far as I can tell, looking at retainers, loading tough-cookie seems to retain 670KB. I think almost half of that is the source code (lib/pubsuffix.js) and another almost-half is the data structure in memory (strings and symbols of the big hashtable object). I guess I could cut that in half if I loaded from JSON instead of a static lib, but I'm not convinced yet that this is worth it.

Meta: there seems to be a new MIT-licensed public suffix library on the block: https://github.com/wrangr/psl -- I wonder how this would change things (it uses the JSON-loading technique, not a source-generator technique like tough-cookie does).
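
As a rough illustration of the JSON-loading idea mentioned above (the pubsuffix.json file name and isPublicSuffix helper are hypothetical, not tough-cookie's actual code):

var fs = require('fs');
var path = require('path');

// Ship the suffix table as data and parse it at load time, so only the parsed
// object is retained rather than a large generated .js source file as well.
var suffixTable = JSON.parse(
  fs.readFileSync(path.join(__dirname, 'pubsuffix.json'), 'utf8')
);

function isPublicSuffix(domain) {
  return Object.prototype.hasOwnProperty.call(suffixTable, domain);
}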

@stash-sfdc
Contributor

lol, wrong screenshot

@stash-sfdc
Contributor

here we go:

[screenshot: memory profile, 2015-09-10]

@stash-sfdc
Contributor

@luanmuniz @Myztiq can you folks still reproduce this? Is there something significantly different between the script I used and the ones you used to test? Something to do with Loggly or request?

@Myztiq

Myztiq commented Sep 11, 2015

@stash-sfdc There is no memory leak for me, as I mentioned earlier in the thread. Essentially there is big memory usage upfront, but no leak. It was just showing up at the top of my stack because it was getting destroyed and re-created in my system, so it appeared as a big new memory addition (I later found the matching big memory removal).

The node-inspector branch that I linked to above does not show a memory leak. 👍

@owenallenaz

I came here with a similar idea as other people who have reached this point. I think the reason we believe pubsuffix.js is the culprit for a lot of memory is that we see it multiple times within a heapdump analysis. For our application this is entirely caused by the fact that a lot of major npm packages include tough-cookie (by way of request), and each one loads its own copy of pubsuffix due to the node_modules dependency model. Collectively they add up. In newer versions of npm this is resolved by npm dedupe, which is now standard with npm install and causes each package to be installed at the highest level it can, according to the version range each module declares.
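
For example, a quick sketch (not from the thread) for counting how many separate copies of pubsuffix.js a running process has loaded, using Node's module cache:

// Run inside the application after it has warmed up: each entry corresponds
// to a distinct tough-cookie installation pulled in by the dependency tree.
var loaded = Object.keys(require.cache).filter(function (p) {
  return /tough-cookie[\/\\]lib[\/\\]pubsuffix\.js$/.test(p);
});
console.log('copies of pubsuffix.js loaded:', loaded.length);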

@coty-crg

Adding to this: in Node v0.12.7, my app has a huge memory leak. After a few minutes of a few thousand people hitting it, it will die from memory consumption. node-inspector led me to pubsuffix.js, which led me to this page. If I hit any route, including one serving static content through express, the memory leak occurs.

I upgraded to Node v4.x and I don't see the problem, which is weird. Hopefully this helps narrow down the issue.

@bradisbell

I upgraded to Node v4.x and I don't see the problem, which is weird. Hopefully this helps narrow down the issue.

For me, in either Node v4.2.2 or v0.12.7, I have the exact same problem: a ton of repeated memory consumption by pubsuffix.js. Memory usage is comparable in both versions.

@luanmuniz
Author

Hi guys, sorry it took so long to respond here.
Just by upgrading from v0.10.28 to v4.2.1 the problem is gone.
Just so you can see, this is the memwatch diff for v0.10.28:

{ before:
   { nodes: 67533,
     time: Wed Nov 18 2015 09:47:34 GMT-0200 (BRST),
     size_bytes: 9585920,
     size: '9.14 mb' },
  after:
   { nodes: 4031655,
     time: Wed Nov 18 2015 09:51:45 GMT-0200 (BRST),
     size_bytes: 669053552,
     size: '638.06 mb' },
  change:
   { size_bytes: 659467632,
     size: '628.92 mb',
     freed_nodes: 6706,
     allocated_nodes: 3970828,
     details: [...] } }
{ what: 'Array',
  size_bytes: 127078160,
  size: '121.19 mb',
  '+': 342345,
  '-': 2106 }
{ what: 'Buffer',
  size_bytes: 2783760,
  size: '2.65 mb',
  '+': 57995,
  '-': 0 }
{ what: 'BufferList',
  size_bytes: 112,
  size: '112 bytes',
  '+': 1,
  '-': 0 }
{ what: 'Caseless',
  size_bytes: 192,
  size: '192 bytes',
  '+': 2,
  '-': 0 }

And this is for v4.2.1

{ before: { nodes: 78877, size_bytes: 11975308, size: '11.42 mb' },
  after: { nodes: 79333, size_bytes: 11867820, size: '11.32 mb' },
  change:
   { size_bytes: -107488,
     size: '-104.97 kb',
     freed_nodes: 5604,
     allocated_nodes: 6060,
     details: [...] } }
{ what: 'Array',
  size_bytes: 123608,
  size: '120.71 kb',
  '+': 2321,
  '-': 1941 }
{ what: 'ArrayBuffer',
  size_bytes: 0,
  size: '0 bytes',
  '+': 1,
  '-': 1 }
{ what: 'Closure',
  size_bytes: 6696,
  size: '6.54 kb',
  '+': 108,
  '-': 15 }
{ what: 'Code',
  size_bytes: -427904,
  size: '-417.88 kb',
  '+': 918,
  '-': 1432 }

This is the same application; the only difference is the Node version, and I had to use memwatch-next on v4.2.1.
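
For context, a minimal sketch of how a diff like the ones above is typically produced, assuming memwatch-next's HeapDiff API (the 60-second delay is arbitrary):

var memwatch = require('memwatch-next');

// Snapshot the heap, run the workload, then diff the two snapshots.
var hd = new memwatch.HeapDiff();

// ... run the application's heavy calls here ...

setTimeout(function () {
  var diff = hd.end(); // { before: ..., after: ..., change: { details: [...] } }
  console.log(JSON.stringify(diff, null, 2));
}, 60000);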

@ianpogi5

@luanmuniz were you able to deploy to production? How? As of today's release, Meteor 1.3 only supports Node v0.10.41. Deploying to Node 4.x.x or higher will result in a fibers@1.0.5 dependency error.

@luanmuniz
Author

@ianpogi5 The module I was using was request, but now I'm using got as a replacement.
https://www.npmjs.com/package/got

got doesn't use this module, so this problem does not exist there.

@rfink

rfink commented Jun 30, 2016

Having this issue as well: Node 4.2.4, and also Node 4.4.7.

@marko-jankovic

I have the same problem in v6.6.0.

What is the purpose of pubsuffix, and why does it need to include ~8k domains such as google, yahoo, yandex, zipoo, amazon, etc.?

Thank you!

@stash-sfdc
Contributor

@marko-jankovic the use of public suffixes is important for security. Without them, different domains could alter each other's cookies; for example, a cookie scoped to a public suffix like co.uk would apply to every site under co.uk.
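
A minimal sketch of that behavior, assuming the getPublicSuffix and CookieJar APIs that tough-cookie exposes (the output comments are expectations, not captured output):

var tough = require('tough-cookie');

// getPublicSuffix returns the shortest domain a cookie may be scoped to.
console.log(tough.getPublicSuffix('xyz.google.com')); // expect 'google.com'
console.log(tough.getPublicSuffix('example.co.uk'));  // expect 'example.co.uk'

// Setting a cookie whose Domain is a bare public suffix should be rejected.
var jar = new tough.CookieJar();
jar.setCookie('a=b; Domain=co.uk', 'http://example.co.uk/', function (err) {
  console.log(err ? 'rejected: ' + err.message : 'accepted');
});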

@yocontra

yocontra commented Mar 9, 2018

Can that file be imported lazily so it's only using memory when it is actually needed (when either permuteDomain or getPublicSuffix is called)? This should be a really straightforward fix IMO; unless I'm missing something, it seems like you can avoid loading it in most cases. This is a common pattern for things like this.
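
A rough sketch of that lazy-loading pattern; the wrapper and module path are hypothetical, not tough-cookie's actual code:

var _pubsuffix = null;

// Defer requiring the large generated table until the first lookup, so
// applications that never evaluate public suffixes don't pay the memory cost.
function getPublicSuffix(domain) {
  if (!_pubsuffix) {
    _pubsuffix = require('./pubsuffix'); // hypothetical path
  }
  return _pubsuffix.getPublicSuffix(domain);
}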

@stash
Collaborator

stash commented Jul 30, 2018

tough-cookie moved to psl so probably this isn't an issue anymore. If it is, perhaps this is an improvement that could be made to psl itself?
