@sentry/node memory leaks on node 10.13.0 #1762

Closed · 2 tasks done
abraxxas opened this issue Nov 22, 2018 · 53 comments · Fixed by #2108

@abraxxas commented Nov 22, 2018

Package + Version

  • @sentry/browser
  • @sentry/node

Version:

4.3.4

Description

I tried integrating Sentry into a Next.js project using this template https://github.com/sheerun/next.js/tree/with-sentry-fix/examples/with-sentry and discovered what seems to be a memory leak in Sentry. If you check out this project and add memwatch:

// assuming a memwatch package (e.g. memwatch-next) and Node's util module are required above:
const memwatch = require("memwatch-next");
const Util = require("util");

if (!dev) {
  memwatch.on("leak", info => {
    console.log("memwatch::leak");
    console.error(info);
  });

  memwatch.on("stats", stats => {
    console.log("memwatch::stats");
    console.error(Util.inspect(stats, true, null));
  });
}

and bombard the server with requests. I requested the following resources:

    "/",
    "/_next/static/r1zovjaZ1TujaA0hJEp91/pages/_app.js",
    "/_next/static/r1zovjaZ1TujaA0hJEp91/pages/_error.js",
    "/_next/static/r1zovjaZ1TujaA0hJEp91/pages/index.js",
    "/_next/static/runtime/main-1eaa6d1d0c8e7d048efd.js",
    "/_next/static/chunks/commons.b34d260fee0c4a698139.js",
    "/_next/static/runtime/webpack-42652fa8b82c329c0559.js"

With this, memory usage grows endlessly for me. As soon as I remove the request and error handlers from server.js, the memory leak stops. So it seems to be connected to those two.

@kamilogorek (Member) commented Nov 29, 2018

@abraxxas can you help me reproduce this issue? How exactly are you triggering this memleak?
I tried your description with no luck; it just produces stats events, never a leak one.

@abraxxas (Author) commented Nov 29, 2018

@kamilogorek Thanks for your response. What I did was the following: I checked out this example https://github.com/sheerun/next.js/tree/with-sentry-fix/examples/with-sentry and added memwatch to server.js:

if (!dev) {
  memwatch.on("leak", info => {
    console.log("memwatch::leak");
    console.error(info);
  });

  memwatch.on("stats", stats => {
    console.log("memwatch::stats");
    console.error(Util.inspect(stats, true, null));
  });
}

Then I ran the example using Node 10.x (with 8.x I observed no memory issues) and requested the following resources using our Gatling test suite:

    "/",
    "/_next/static/r1zovjaZ1TujaA0hJEp91/pages/_app.js",
    "/_next/static/r1zovjaZ1TujaA0hJEp91/pages/_error.js",
    "/_next/static/r1zovjaZ1TujaA0hJEp91/pages/index.js",
    "/_next/static/runtime/main-1eaa6d1d0c8e7d048efd.js",
    "/_next/static/chunks/commons.b34d260fee0c4a698139.js",
    "/_next/static/runtime/webpack-42652fa8b82c329c0559.js"

(note the hashes might change for you)

but you should be able to achieve the same results very simply with this approach: https://www.simonholywell.com/post/2015/06/parallel-benchmark-many-urls-with-apachebench/

After a few thousand requests, our memory usage was at almost 1 GB and never went down, even when idling. As soon as I remove the request and error handlers from server.js, the memory leak stops, so it seems to be connected to those two. Maybe you either had too few requests or used Node 8.x?

@kamilogorek (Member) commented Nov 29, 2018

@abraxxas confirmed. Node 10 + ~300req for each resource does the job. Will investigate further. Thanks!

@abraxxas (Author) commented Nov 29, 2018

@kamilogorek (Member) commented Nov 29, 2018

@abraxxas ignore my previous comment (the one I removed), it's totally on our side :)

It appears to be an issue in Node's core itself. Please refer to this comment: #1762 (comment)

@tpbowden commented Dec 2, 2018

@kamilogorek Have you had a chance to look into this yet? It is causing a huge memory leak for us. After looking at the heap, this line looks like it could be the root of the problem:

scope.addEventProcessor(async (event: SentryEvent) => parseRequest(event, req, options)),

The inspector showed thousands of entries in the eventProcessors list:
[screenshot: heap inspector showing the eventProcessors entries]

I don't have any context around how things are architected, but we have noticed that requests are not correctly scoped and are given the wrong metadata (see #1773), so it appears everything is being managed in global state and not cleaned up when a request ends.

@kamilogorek (Member) commented Dec 4, 2018

@abraxxas @tpbowden there's an issue with the domain module leaking in Node's core itself. We'll keep monitoring it and try to come up with a temporary solution before it's fixed in core. Related issue: nodejs/node#23862

@tpbowden commented Dec 10, 2018

@kamilogorek do you have any ideas for a workaround or temporary fix for this? Progress on the node issue looks quite slow

@alinchican commented Dec 17, 2018

We're currently using PM2 to restart Node.js processes when memory reaches a certain threshold. https://pm2.io/doc/en/runtime/features/memory-limit/#max-memory-threshold-auto-reload
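For reference, the PM2 workaround mentioned above is a one-flag change (a config sketch; the 300M threshold is an arbitrary example value, not a recommendation):

```shell
# Restart the Node process automatically once it exceeds a memory threshold.
pm2 start server.js --max-memory-restart 300M
```

This doesn't fix the leak, but it caps the damage by recycling the process before memory pressure becomes a problem.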

@sunknudsen commented Feb 13, 2019

We're using lab for unit testing, and the leaks are still present. I know leaks can be painful to debug. Is there an ETA for a fix?

1 tests complete
Test duration: 1832 ms
The following leaks were detected:__extends, __assign, __rest, __decorate, __param, __metadata, __awaiter, __generator, __exportStar, __values, __read, __spread, __await, __asyncGenerator, __asyncDelegator, __asyncValues, __makeTemplateObject, __importStar, __importDefault

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! theregulars-reviews-backend@0.0.1 test: lab build/test
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the theregulars-reviews-backend@0.0.1 test script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/sunknudsen/.npm/_logs/2019-02-13T14_41_28_595Z-debug.log

@garthenweb commented Feb 13, 2019

@sunknudsen The issue is actually fixed in NodeJS, see nodejs/node#23862 and nodejs/node#25993. We probably need to wait for a release.

@sunknudsen commented Feb 13, 2019

@garthenweb Does this affect @sentry/node because of how the package was developed? One of the projects that I am developing on hapi (which relies on many other dependencies) doesn't produce leaks (at least none are caught by lab).

@garthenweb commented Feb 13, 2019

@sunknudsen More or less. It's the combination of the (deprecated) domain package and promises as far as I understood. See https://github.com/getsentry/sentry-javascript/blob/master/packages/node/src/handlers.ts#L233

In my case, I just removed the sentry middlewares from my (express) server to fix it.

@MartijnHols commented Mar 17, 2019

This isn't an issue in Node but in Sentry, as disabling the Sentry handlers fixes it:

[screenshot: memory usage graph with Sentry handlers disabled]

@HazAT (Member) commented Mar 19, 2019

@MartijnHols We are currently working on a major release that should significantly reduce the memory footprint of our SDK. If you feel adventurous, you could try it out: #1919

@MartijnHols commented Mar 22, 2019

@HazAT Thanks, I installed it in production last night (at 23:10 in the graph) and re-enabled the handlers with the results below. There's normally a slight spike in CPU usage around 23:00-24:00 (as you can see from the previous day), but this seemed higher. The standard CPU usage is also a lot more spiky than without the handlers. Not sure if this is due to the changes in the new version or it's the handlers being enabled. I'll try disabling the handlers again in a few hours. There are about 2.5 errors caught per hour.

[screenshot: CPU and memory usage graphs]

@HazAT (Member) commented Mar 22, 2019

@MartijnHols Thanks for trying it out!

Two things to keep in mind: the memory leak fix for domains landed in Node only recently, in 11.10.
Also, we had to unpublish 5.0.0-beta1 because it was mistakenly tagged as latest; 5.0.0-rc.1 is now the latest next version.
Please try 5.0.0-rc.1; we made a small change to how events are queued that should improve load/memory by a lot.

@MartijnHols commented Mar 22, 2019

Updating to node 11.12 seems to have stabilized the memory and CPU usage. It seems there isn't any discernible difference now in resource usage when comparing it to having the handlers disabled, maybe even slightly better. It seems to also catch errors fine with all the info I need (it might have more console "breadcrumbs" which is nice). Not sure what else I can check for 5.0.0. I'll let you know if I run into any issues, but probably not.

LGTM. Thanks!

@adriaanmeuris commented Mar 22, 2019

I'd be happy to give this a try as well. @HazAT do you know if the fix in 11.10 is already backported to the active LTS release 10.x?

@sibelius commented May 30, 2019

https://nodejs.org/en/blog/release/v10.16.0/

This release fixed some memory leaks; can someone test it?

@julienfouilhe commented May 31, 2019

I have the same problem on node 11.10, but it seems to be fixed on Node 12.3.1

@BYK (Contributor) commented Jun 4, 2019

There's a PR open for the agent-base issue: TooTallNate/node-agent-base#25

We can either fork this and override the dependency in Yarn resolutions or find a way to get it merged.

@jardakotesovec commented Jun 6, 2019

Might be a bit off-topic, but could the lack of domain cleanup in Handlers.ts (only domain.create is called) also cause some leaks?
A Medium article and Stack Overflow suggest there should be removeListener calls and a domain.exit, but I did not find a definitive answer.

UPDATE: Now I see that the handlers use domain.run, which internally calls domain.exit. Removing the listeners/emitters might still make a difference, but I honestly have no idea.

@tstirrat15 (Contributor) commented Jun 19, 2019

I upgraded to Node 12.4.0 and I'm still seeing bad behavior that seems to be tied to Sentry.

Here are some node-clinic doctor runs, done with the --autocannon option over 90 seconds.

It only seems really leaky when the request handlers are in place. If you look at the bottoms of the GC troughs in the run without the handlers, they're all at about the same level (65-70mb), where the run with the handlers seems to be climbing about 5mb with each GC cycle.

@BYK (Contributor) commented Jun 20, 2019

@tstirrat15 this fix has not been released yet, so that's probably why you are still having this issue. Can you try from the latest master if that's an option?

@kamilogorek (Member) commented Jun 20, 2019

@tstirrat15 5.4.2 with the fix included has been released, give it a try :)

@tstirrat15 (Contributor) commented Jun 20, 2019

Here's another run with v5.4.2 in place. It still seems a bit leaky...

@kamilogorek (Member) commented Jun 20, 2019

GC always kicks in correctly and restores memory to the baseline. The memory usage increase is caused by breadcrumbs collected from events and the event queue, but it will stop at 100 breadcrumbs and won't grow further. It'd be nice to see a ~15-30 min dump to check whether peak memory stops at some point.
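If that breadcrumb-driven baseline matters for a deployment, the cap is configurable at init time. A sketch of the relevant options object (the DSN is a placeholder; `maxBreadcrumbs` is a documented Sentry option whose default is 100):

```javascript
// Options object passed to Sentry.init(). Lowering maxBreadcrumbs reduces
// how much per-event context the SDK retains in memory.
const sentryOptions = {
  dsn: process.env.SENTRY_DSN, // placeholder; set to your project's DSN
  maxBreadcrumbs: 20,          // default is 100
};

// Usage (assuming @sentry/node is installed):
//   const Sentry = require("@sentry/node");
//   Sentry.init(sentryOptions);
```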

@tstirrat15 (Contributor) commented Jun 20, 2019

Hmm... sounds good. I'll get this PR through to production and see if the behavior changes. Thank you!

@Madumo (Contributor) commented Mar 23, 2020

> Looks like it's related to bundling Sentry using webpack on the server - only happens when the app has been built by Webpack. Adding @sentry/node to externals has fixed our memory problem. Any idea why this happens?

@tpbowden You are right about this; I have the same issue. I was running SDK v5.15.0 and Node v12.3.1, both supposed to include all the required fixes mentioned here.

I am bundling all dependencies within my server bundle with webpack. This way I can ship a docker image without node_modules, but something is messing up the sentry SDK and it leaks memory when bundled this way.

It might be a bug introduced by some optimization process; my wild guess is terser. Some optimization is probably messing up the usage of the domain module, so the closure of the callback passed to scope.addEventProcessor is no longer garbage collected, and every request leaks a good amount of memory.

I'm also using razzle.js, which is a bit behind on the webpack/terser versions; maybe it's already fixed.

This doesn't seem to be a bug on sentry's side anymore. I will continue to investigate this and open an issue where appropriate and keep this thread updated.

@kamilogorek (Member) commented Mar 23, 2020

> This doesn't seem to be a bug on sentry's side anymore. I will continue to investigate this and open an issue where appropriate and keep this thread updated.

Keep us posted, thank you!

@Madumo (Contributor) commented Mar 23, 2020

@kamilogorek Could you point me to where in the code the event-processor callbacks added to the _eventProcessors array within the Scope instance are removed? I can't find it. It seems that all requests add an event-processor callback to this array and they are never removed. If I knew how they are supposed to be removed, it might help me understand the bug better.

[screenshot: _eventProcessors array in the debugger]

@Madumo (Contributor) commented Mar 23, 2020

Or maybe it's the entire scope that is supposed to be unique and garbage collected for each request? It seems that each request is getting the same scope instance 🤔

@kamilogorek (Member) commented Mar 24, 2020

> Or maybe it's the entire scope that is supposed to be unique and garbage collected for each request?

That is correct. However, scope should be cloned for every new domain instance 🤔

See this call stack:

const local = domain.create();
local.add(req);
local.add(res);
local.on('error', next);
local.run(() => {
  getCurrentHub().configureScope(scope =>
    scope.addEventProcessor((event: Event) => parseRequest(event, req, options)),
  );
  next();
});

export function getCurrentHub(): Hub {
  // Get main carrier (global for every environment)
  const registry = getMainCarrier();
  // If there's no hub, or it's an old API, assign a new one
  if (!hasHubOnCarrier(registry) || getHubFromCarrier(registry).isOlderThan(API_VERSION)) {
    setHubOnCarrier(registry, new Hub());
  }
  // Prefer domains over global if they are there (applicable only to Node environment)
  if (isNodeEnv()) {
    return getHubFromActiveDomain(registry);
  }
  // Return hub that lives on a global object
  return getHubFromCarrier(registry);
}

function getHubFromActiveDomain(registry: Carrier): Hub {
  try {
    // We need to use `dynamicRequire` because `require` on its own will be optimized by webpack.
    // We do not want this to happen, we need to try to `require` the domain node module and fail if we are in browser
    // for example so we do not have to shim it and use `getCurrentHub` universally.
    const domain = dynamicRequire(module, 'domain');
    const activeDomain = domain.active;
    // If there is no active domain, just return global hub
    if (!activeDomain) {
      return getHubFromCarrier(registry);
    }
    // If there's no hub on current domain, or it's an old API, assign a new one
    if (!hasHubOnCarrier(activeDomain) || getHubFromCarrier(activeDomain).isOlderThan(API_VERSION)) {
      const registryHubTopStack = getHubFromCarrier(registry).getStackTop();
      setHubOnCarrier(activeDomain, new Hub(registryHubTopStack.client, Scope.clone(registryHubTopStack.scope)));
    }
    // Return hub that lives on a domain
    return getHubFromCarrier(activeDomain);
  } catch (_Oo) {
    // Return hub that lives on a global object
    return getHubFromCarrier(registry);
  }
}


@Madumo (Contributor) commented Mar 24, 2020

Ha! I think I found something.

We use dynamicRequire:

// We need to use `dynamicRequire` because `require` on it's own will be optimized by webpack.
// We do not want this to happen, we need to try to `require` the domain node module and fail if we are in browser
// for example so we do not have to shim it and use `getCurrentHub` universally.
const domain = dynamicRequire(module, 'domain');

But when I step into the dynamicRequire code:

export function dynamicRequire(mod: any, request: string): any {
  // tslint:disable-next-line: no-unsafe-any
  return mod.require(request);
}

require is undefined on mod 🤯

So it enters the catch block of the getHubFromActiveDomain function and instead uses getHubFromCarrier()!

Since in my setup everything is bundled by webpack, there are probably some assumptions about the mod object that webpack breaks. Do you have an idea how this could be fixed? 🤔

Recap

We use dynamicRequire:
[screenshot]

mod.require is undefined:
[screenshot]

What the mod object looks like:
[screenshot]

We end up using getHubFromCarrier:
[screenshot]

@Madumo (Contributor) commented Mar 24, 2020

I manually patched the Hub module directly in my node_modules folder. I removed the line using the dynamicRequire and just added import domain from 'domain'; at the top of the file and... it now works perfectly! No more leaks! 🎉

Maybe the dynamicRequire hack was needed before but is no longer needed with newer versions of webpack? 🤔

I also tried to replace:

const domain = dynamicRequire(module, 'domain');

with:

const domain = require('domain');

and it also works fine. I don't know which of those two solutions you would prefer.

Would you like me to open a PR with this fix?

@wojtekelvis commented May 21, 2022

Hello,

I'm trying to understand whether what I have found is another memory leak, but it seems so.

After upgrading our @sentry/node from 5.19.1 to the newest version, 6.19.7, I discovered that the SessionFlusher constructor assigns a timer via setInterval, but as far as I tested, I never saw the close function that clears that interval being invoked.
As a result, after issuing fewer than 100k requests to my application, memory hits 700 MB and almost 70% CPU and keeps growing exponentially. Disabling the autoSessionTracking flag completely resolves the issue.

@wojtekelvis commented May 21, 2022

Just as a side note: as far as I can see, after our upgrade (even with autoSessionTracking set to false), our CPU load is around 35% higher. Is this the desired behavior? Our current Node.js version is 16.15.0, running on Ubuntu 18 on an AWS EC2 t3.micro.

@lforst (Member) commented May 23, 2022

Hi @wojtekelvis, the interval in the session flusher should only be created once and it should be cleaned up when the sentry client is closed. Do you happen to call Sentry.Handlers.requestHandler() continuously in your application? Can you share your stack and Sentry setup? Thanks!
