Memory Leak #134

Closed
mattmcla opened this issue Mar 22, 2014 · 103 comments

@mattmcla

I've been hunting memory leaks for the past 4 days, but this one stumped me for a bit. Even while the application was idle, memory consumption kept growing at a steady rate. That's when I noticed New Relic appeared to be crashing and not recovering.

Logs:
https://gist.github.com/mattmcla/958c26fb8e8374981016

packages:

"dependencies": {
"express": "~3.2.5",
"jade": "~0.35.0",
"versionator": "~0.4.0",
"oauth": "~0.9.10",
"pg": "~2.8.2",
"hat": "0.0.3",
"knox": "~0.8.8",
"mime": "~1.2.11",
"slugify": "~0.1.0",
"dateutil": "~0.1.0",
"newrelic": "~1.4.0",
"orchestrate": "0.0.4",
"bluebird": "~1.0.4",
"dotenv": "~0.2.4",
"less-middleware": "~0.2.0-beta"
},
"engines": {
"node": "0.10.22",
"npm": "1.3.x"
}

It's only been an hour or so since I removed New Relic, but I'm not seeing any leaky behavior.

@rwky

rwky commented Mar 24, 2014

I've been experiencing the same issue.

@groundwater
Contributor

Hi @mattmcla, we definitely do not want the module causing memory leaks. Our tracing module patches Node core pretty deeply, so it's totally possible that we're causing a leak.

I would like to gather a bit more information here. It is expected that overhead will increase when using our module, but are you seeing a steady growth of memory over time? I have tried to reproduce the leak locally based on descriptions, but have been unable to show any significant growth of memory, even when placing a high load on our sample apps.

I am fairly sure we're capable of causing a leak, I just don't have enough information to reproduce the issue. If you have some sample code, I would be happy to try running it.

Alternatively, if you are able to core-dump the leaky process, I should be able to inspect the core for leaks. This would only work with Linux or SmartOS cores.

I hope this isn't causing too much extra overhead on your application, but I really appreciate any help you can provide here.

@mattmcla
Author

[chart: memory usage]

Here's a chart from you guys that outlines what was going on. As you can see, after each restart memory consumption would go through the roof. The app consumes 42M just after start and typically settles at around 82M per instance after it's warmed up. With NR installed it would just consume memory until Heroku restarted the application, the site would get sluggish, etc. All the things that point to a memory leak.

Our application is very new at this point and there are long periods of idle time, but during those times memory would just get consumed and never released.

Here is how we're kicking our express app off:

https://gist.github.com/mattmcla/b82b064a639efa4b7e00

Other than that it's a pretty straightforward Express app. I did try using it with just one CPU (still using cluster but overriding numCPUs to 1) to the same effect. As for getting a core dump, that would be a bit of work since we're on Heroku. I'll look into it if it's necessary.
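
Roughly, the startup looks like the sketch below (a simplified illustration of the shape, not the exact gist; file layout and names are placeholders):

// Simplified sketch of a clustered Express startup that loads newrelic first.
// This is illustrative only, not the code from the gist above.
require('newrelic'); // must be the first require so it can instrument core

var cluster = require('cluster');
var express = require('express');

// Overriding the worker count to 1, as mentioned above, reproduced the same behavior.
var numCPUs = 1; // normally require('os').cpus().length

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', function (worker) {
    cluster.fork(); // replace a worker if it dies
  });
} else {
  var app = express();
  app.get('/', function (req, res) {
    res.send('ok');
  });
  app.listen(process.env.PORT || 3000);
}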

What I can tell you is that with New Relic removed, we're running smoothly.

@mattmcla
Author

Also, in my first comment I linked to a gist of our logs. It does show a stack trace coming from the New Relic module. When those errors occur, memory starts climbing until there is none left.

@groundwater
Contributor

@mattmcla this is great information, thank you!

The errors in those logs are enough to suggest that there is a leaky way we're handling re-connection attempts. This may or may not be related to SSL, but given that this coincides with 1.4.0, I'm going to start looking there.

I still need to generate a reproducible case, but I believe I have enough information to go on. If you have any other information, logs or code samples I would gladly accept them. You can email me directly at jacob@newrelic.com.

@rwky if you have any logs to share, I would also love to see them.

Sorry for the spikes! I hope we can get this sorted out soon, and I appreciate all the detailed information you're providing.

@mattmcla
Author

Cool, yeah, I can replicate it every time, but it sometimes takes an hour before the failure happens. It led me astray a number of times. Let me know if there is anything else I can do.

@rwky

rwky commented Mar 25, 2014

@groundwater here's our trace:

New Relic for Node.js was unable to start due to an error:
 Error: socket hang up
 at createHangUpError (http.js:1472:15)
 at CleartextStream.socketCloseListener (http.js:1522:23)
 at CleartextStream.EventEmitter.emit (events.js:117:20)
 at tls.js:696:10
 at node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/glue.js:177:31
 at process._tickDomainCallback (node.js:459:13)
 at process.<anonymous> (node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/index.js:18:15)

The same error is repeated throughout the logs, it only happened since upgrading to 1.4.0 (I've now pinned it at 1.3.2) and it takes a couple of hours to see a problem.

@groundwater
Contributor

I just wanted to update everyone here on what's been going on this last week.

We are focusing on the socket hang up error that certain customers have been experiencing, but it has been difficult to reproduce. It may be intimately related to the network where the application runs, so it may be situational. However, through some artificial means I have been able to trigger the above hang up errors intermittently, and I am using that to debug the problem. I will try to keep this thread updated.

This may or may not be related to the memory leak issues reported; I don't yet have a sample app that shows the same kind of leaks. Given the lack of data in this area, I'm focusing on fixing the socket error first. Once that's cleared, I will be diving deeper into the memory leak issue.

In the meantime, if anyone has an app they can share which reliably leaks memory with the newrelic module installed, I would be grateful for a solid repro case.

@rwky

rwky commented Apr 1, 2014

Our app is far too large (and confidential) to share but once you've fixed the socket hang up I can upgrade new relic and see if the issue persists.

@mattmcla
Author

mattmcla commented Apr 1, 2014

Our app is our secret sauce so it's not something I'm willing to share. As mentioned by rwky, if you fix the socket hang up issue I'll be more than happy to give NR another shot.

@groundwater
Contributor

@rwky @mattmcla much appreciated! I'll keep you updated.

@sebastianhoitz
Contributor

I just wanted to jump in and say that we are experiencing the same issues.
Memory consumption grows steadily.

Edit: We are running on dedicated servers. No Heroku or similar platform.

@brettcave

We've also been experiencing this issue. Our Express app dies with the following error message:

Allocation failed - process out of memory

We've just adopted NR into our node app. I'm busy trying to replicate the issue. Previously, we've been running our application for over a year, and it has been stable in terms of memory and CPU usage and general availability. Since adding in NR, we have noticed instability.

We're using express 2.5.x, jade 0.25.0 and nr 1.3.x with node 0.10.x and a handful of other modules. We're running in AWS, and using NodeJS clustering.

  • Update: I've just run some small load tests against our app: one with NR and one without. The difference in memory and CPU is negligible, and I was not able to replicate the out-of-memory error.

@brettcave

System memory usage per host. Hosts change after each deployment. We deployed the NR integration on March 31. The app crashed after about 54h with the "Allocation failed" error.

[chart: Node.js memory usage per host]

@brettcave

I have just boosted the Node.js memory limit from 512m for more stability, with --max-old-space-size=1024.

Edit: I have upgraded to NR 1.4.0, and it's more unstable than 1.3.x: it consistently crashes within 20 minutes of starting the application.

@groundwater
Contributor

We just released version 1.5.0 of the newrelic module, which includes a fix for the socket hangup error I've noticed in a number of user logs.

Unfortunately I do not know if this was causing the memory leak or not. I very much believe we have caused a leak somewhere, but I don't know where yet. We have not been able to reproduce the problem, which makes tracking it down difficult.

I would be grateful to anyone here who can try version 1.5.0 and let us know if the memory leak is still happening. At the very least, we will know definitively that it is not related to the socket hangup error. At best, we will have fixed the leak issue.

Thanks again for your wonderful support!

@brettcave

Hi groundwater,

Thanks for the release. I've been running 1.5.0 for over 5 hours now and have not had a crash. 1.4.0 was crashing within 20 minutes, so on first look, the new version looks a lot more stable. I am going to add some load to the app and see how it handles over the next couple of days.

@rwky

rwky commented Apr 12, 2014

Unfortunately we're still experiencing memory leaks with 1.5.0. The socket hangup has been fixed though.

@mattmcla
Author

Memory still leaks with 1.5.0

[chart: memory usage]

This happened well after the memory started leaking:

07:35:48.416  2014-04-13 14:35:48.217255+00:00 app web.1     - - {"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"component":"collector_api","level":50,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Calling metric_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","time":"2014-04-13T14:35:48.199Z","v":0}
{"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"level":30,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Error on submission to New Relic (data held for redelivery):","time":"2014-04-13T14:35:48.217Z","v":0}
{"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"component":"collector_api","level":50,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Calling metric_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","time":"2014-04-13T14:35:48.228Z","v":0}
{"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"level":30,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Error on submission to New Relic (data held for redelivery):","time":"2014-04-13T14:35:48.229Z","v":0}

In the picture you can see the line plateau. This is due to throttling on Heroku. All of this also happened during the night, when we were receiving no traffic. It's also worth noting we're running 4 instances on 2 dynos.

@jmdobry

jmdobry commented Apr 14, 2014

My Node app used to run with stable memory usage in the 70-80 MB range. Over the last couple of weeks I've noticed that memory usage has started to grow steadily until it maxes out the server, requiring a restart of the app. I've been debugging memory leaks for days, banging my head against this wall. I just recently tried removing newrelic-node from my app, and memory is stable again.

newrelic_agent.log looks fine except for the following error which appears periodically:

{"name":"newrelic","hostname":"www","pid":13499,"component":"collector_api","level":50,"err":{"message":"socket hang up","name":"Error","stack":"Error: socket hang up
  at createHangUpError (http.js:1472:15)
  at CleartextStream.socketCloseListener (http.js:1522:23)
  at CleartextStream.EventEmitter.emit (events.js:117:20)
  at tls.js:692:10
  at /var/www/app/node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/glue.js:177:31
  at process._tickCallback (node.js:415:13)
  ","code":"ECONNRESET"},"msg":"Calling metric_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","time":"2014-04-14T17:04:23.872Z","v":0}
{"name":"newrelic","hostname":"www","pid":13499,"level":30,"err":{"message":"socket hang up","name":"Error","stack":"Error: socket hang up
  at createHangUpError (http.js:1472:15)
  at CleartextStream.socketCloseListener (http.js:1522:23)
  at CleartextStream.EventEmitter.emit (events.js:117:20)
  at tls.js:692:10
  at /var/www/app/node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/glue.js:177:31
  at process._tickCallback (node.js:415:13)
  ","code":"ECONNRESET"},"msg":"Error on submission to New Relic (data held for redelivery):","time":"2014-04-14T17:04:23.873Z","v":0}

@othiym23
Contributor

@jmdobry that issue, at least, is fixed in v1.5.0 of New Relic for Node, which we released on Friday (see @groundwater's comment upthread), but for at least a few people this hasn't fixed the memory leak issue, which means the two probably aren't correlated. We're actively investigating this issue, but it's only happening for some people, and we haven't been able to reproduce the problem locally. Sorry, and thanks for your report and your patience while we figure this out!

@jmdobry

jmdobry commented Apr 14, 2014

@othiym23 Good to hear about the socket hangup. Still have the leak though. For what it's worth, the leak seemed worse the more verbose the logging level was.

@brettcave

We're also still seeing the memory leak + instability (though I'm not sure whether it's related to the socket hangup). We're using a local port monitor that restarts the service when it stops responding (3s timeout), and we start experiencing non-responsiveness within an hour. However, we run 2 Node apps side by side, and only 1 of the apps ever becomes unresponsive, even though they have similar architectures.

Here's some info that might help with reproducing. As with others, we are unfortunately unable to share our app, but maybe something in here can be used to help narrow down the issue. Some libraries that are in the unstable app, but not in the stable app:

  • handlebars
  • events
  • requirejs
  • prettyjson
  • mandrill-api

Middleware configured in unstable app, not in stable app:

  • i18n (0.3.x)

The graph below shows host memory usage with NR enabled until about 5:30pm, after which we disabled it and restarted the services due to unresponsiveness.

[chart: host memory usage with NR enabled, then disabled around 5:30pm]

@sebastianhoitz
Contributor

Just want to jump in here and say that we are also using

  • handlebars
  • i18n

Do the other people experiencing this also use these modules?


@jfeust

jfeust commented Apr 18, 2014

We were in communication with @groundwater as soon as 1.4.0 was released because we were experiencing this same memory leak. We've been running 1.3.2 due to this issue. I just want to add our voice that this is a major issue for us because the business is starting to require RUM data. I just tried 1.5.0 and still have the leak. Within about an hour we reach the Heroku dyno memory limit.

"dependencies": {
"analytics-node": "0.5.0",
"connect-flash": "0.1.1",
"express": "3.4.6",
"jade": "~1.3.1",
"logfmt": "0.18.1",
"mongodb": "1.4.x",
"mongoose": "3.8.3",
"request": "2.30.0",
"underscore": "1.5.2",
"newrelic": "1.3.2",
"express-cdn": "https://registry.npmjs.org/express-cdn/-/express-cdn-0.1.9.tgz"
},

We're running over https on heroku and doing a lot of https api requests using the request http client. We're also using cluster.

@jfeust

jfeust commented Apr 18, 2014

@sebastianhoitz nope, we're not using handlebars or i18n and are still experiencing the memory leak with 1.5.0.

@groundwater
Contributor

I think there is a memory leak, but we still haven't been able to reproduce it. I think the fastest path to a solution from here is getting our hands on some hard evidence.

I completely understand if you cannot share your app, but perhaps there are other solutions.

  1. We can do a screen share, and go hunting. There are several directions we can take here depending on what you're comfortable with.
  2. You can send a heapdump of the application after it's been leaking memory for a while (the longer the better); see the sketch after this list for one way to capture one
  3. You can send a Linux or SmartOS core dump, which we can inspect using Joyent's handy mdb tool.
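
For option 2, one way to capture heapdumps is sketched below (an illustrative approach that assumes the third-party heapdump module; it is just one option, not a New Relic requirement):

// Illustrative sketch: capture V8 heap snapshots with the third-party
// `heapdump` module (npm install heapdump). Diff an early snapshot against a
// late one in Chrome DevTools to see what is accumulating.
var heapdump = require('heapdump');

// Write a snapshot every 30 minutes while the process leaks.
setInterval(function () {
  heapdump.writeSnapshot('/tmp/app-' + Date.now() + '.heapsnapshot');
}, 30 * 60 * 1000);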

If you'd like to email me directly at jacob@newrelic.com we can talk about details.

Thanks to everyone for their great help so far. It sucks when we cause problems for your apps, and we really really appreciate you helping us fix these things.

@secretfader

I just finished preparing heapdumps that should help resolve this issue. I'm emailing them to you, @groundwater, and the New Relic support team.

@secretfader

The latest theory that I've heard indicates that the memory leak has something to do with wrapping MongoDB queries. I created a simple proof-of-concept app that seems to verify this. If anyone has changes or tweaks that might help shake out this bug, feel free to fork it.

https://github.com/nicholaswyoung/new-relic-leak
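
Roughly, the proof of concept is shaped like the sketch below (a simplified illustration, not the exact code in the repo; the connection string and route are placeholders):

// Simplified repro sketch: load newrelic first, then run a MongoDB query on
// every request so the agent's query instrumentation is exercised heavily.
// Not the actual code from the repository linked above.
require('newrelic');

var express = require('express');
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/leaktest', function (err, db) {
  if (err) throw err;

  var app = express();
  var items = db.collection('items');

  app.get('/', function (req, res) {
    // Drive traffic with a load tool and watch process memory over time.
    items.find({}).limit(10).toArray(function (err, docs) {
      if (err) return res.send(500);
      res.json(docs);
    });
  });

  app.listen(process.env.PORT || 3000);
});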

@groundwater
Contributor

@nicholaswyoung I ran your demo app and drove 500k requests to it. I did not get a memory leak. Neither memwatch nor my external metrics indicated any problems, and memory did not grow beyond a peak of about 80 MB. The memory promptly dropped when I stopped driving traffic to the application.

I've asked my colleague to look at it, just in case there is something about system setup involved. Can you give me the exact command you used to drive traffic, and how long before the leak occurred?

@txase

txase commented Jul 2, 2014

@Chuwiey, hrm, I forgot that I can't see your email address through GitHub. Can you email me at chase@newrelic.com so we can have a more direct dialog?

Thanks!

@Chuwiey

Chuwiey commented Jul 2, 2014

Responding via email...

@Rowno

Rowno commented Jul 5, 2014

This memory leak is pretty insane. Here's the memory usage I'm seeing before and after newrelic is included. The only difference in the code is the newrelic module being required; no other code is changed. The log level is set to warn.

[chart: memory usage before and after requiring newrelic]

@framerate

Not sure if it's the same issue, but running with the logging level set to "trace" or "info" I'm seeing a ~50 MB increase every 30 minutes. Disabling this module (but still using New Relic on the server) shows no increase in RAM over time.

Emailed heather@newrelic (my contact) more info, but I wanted to post here as well.

@txase

txase commented Jul 8, 2014

@Rowno Without y-axis labels, your chart doesn't tell us much. Could you provide the amount of memory usage you are seeing before and after?

We're in the middle of doing some deep inspection of core dumps and other data to try to gather as much information as we can. Stay tuned for more info.

@txase

txase commented Jul 8, 2014

@framerate Yes, using a verbose logging level can cause a serious increase in memory usage. We used to default to "trace" level, but have backed that off to just "info" level at this point.

That said, we believed "info" was unlikely to cause noticeable memory usage. Can you double-check that the "info" level is still problematic?

The reason verbose log levels are problematic for memory usage is due to garbage collection. Normally, one would expect log message data to be ephemeral and be collected quickly during garbage collection scavenge cycles. However, we are finding log data persisting into the old-generational space of memory. The end result is that a lot of log messages end up sitting around in memory waiting for a slower, less frequent mark and sweep cycle to be collected. This means the memory usage in the steady state for a given app is higher than if all log messages were collected immediately by scavenge cycles.
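
For reference, the agent's log level is set in the newrelic.js config file, along the lines of the sketch below (the app name and license key are placeholders):

// newrelic.js -- minimal sketch of the agent config with placeholder values.
// Keeping the level at 'info' (or 'warn') avoids the extra memory pressure
// that verbose 'trace'/'debug' logging can create, as described above.
exports.config = {
  app_name: ['My Application'],         // placeholder
  license_key: 'your-license-key-here', // placeholder
  logging: {
    level: 'info'
  }
};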

We're still investigating. Stay tuned!

@framerate

@txase I let it run overnight and sadly found an actual memory leak in my own API, so my data is tainted, but even with this small sample size it appears that with 'info' logging I still grow by ~2 MB every 5 minutes with newrelic running at the app level. In the initial tests you can see the slight "slope" from running overnight, then the drop-off when I restarted without New Relic on the application and it baselines.

@txase

txase commented Jul 8, 2014

@framerate The small slope could be an indication of a leak, or simply a rise over time that hasn't plateaued yet. Due to how the agent works, we need to isolate memory usage due to a leak from usage due to higher request throughput. For your particular environment, it might be useful to let the app run a few days (going through day/night peak cycles), and then check for continually rising memory usage indicating a memory leak. When we've asked other customers to try this, they eventually see memory usage level off.

Thanks for following up!

@framerate

@txase - This is running on a micro AWS instance. The reason this is on my radar is that my server seems to hit 100% RAM (micro has no swap) and become non-responsive. It may or may not be related to this issue, but it seems to be :(.

So running for a few days and watching becomes an issue, since the 100% RAM never drops back down. Granted, some of the leaks were mine, but the graph above is from a clean app with/without the newrelic agent.

I'm going to keep investigating, but I have to turn off newrelic until I have time to circle back next sprint.

@txase

txase commented Jul 8, 2014

@framerate We're preparing a document with things you can do to mitigate memory usage. The gist is that you can try one of the following:

  • If your app doesn't do any crypto (HTTPS, SSL, etc.), reduce tls.SLAB_BUFFER_SIZE to 128 KB
  • Make sure you log at "info" level or higher
  • Reduce the number of cluster workers if you use them (cluster probably isn't helpful on AWS micro instances anyway)
  • Upgrade to an instance with more memory

To a certain extent, we simply record a lot of data in order to provide our customers with as much info as possible. Using our product will entail a certain overhead in memory, and you may need to increase the available memory.
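
For the first item above, the sketch below shows one way to do it (assuming your app itself serves no TLS; loading the agent after the change, so its collector connections also use the smaller slab, is an assumption rather than an official recommendation):

// Sketch only: shrink the TLS slab buffer before anything opens a TLS
// connection. tls.SLAB_BUFFER_SIZE exists on Node 0.8/0.10 and defaults to a
// much larger value, so this trades some TLS throughput for memory.
var tls = require('tls');
tls.SLAB_BUFFER_SIZE = 128 * 1024; // 128 KB, as suggested above

// Load the agent afterwards (assumption: only do this if the app serves no TLS).
require('newrelic');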

Thanks!

@framerate

Thanks Chase! I'm upgrading to a medium soon. I don't mind the overhead, I just need to make sure my app stays stable, and obviously a micro instance is part of the problem.

I look forward to seeing this doc! Thanks!



@supergrilo

I've been experiencing the same issue.

I have two servers with no requests; on the server with the newrelic module, the memory leak is visible.

#Server with newrelic module
[chart: memory usage on the server with the newrelic module]

#Server without newrelic module
[chart: memory usage on the server without the newrelic module]

$ node --version
v0.10.29

├─┬ newrelic@1.8.0
│ ├── bunyan@0.14.6
│ ├─┬ continuation-local-storage@3.0.0
│ │ ├─┬ async-listener@0.4.5
│ │ │ └── shimmer@1.0.0
│ │ └─┬ emitter-listener@1.0.1
│ │ └── shimmer@1.0.0
│ └── yakaa@1.0.0

@framerate

Thanks @supergrilo. I had to remove newrelic for now. They suggested that I run it on a server with more RAM and that it will eventually plateau. How much RAM is on your machine? (Mine was a micro AWS instance with ~600 MB.)

@supergrilo

@framerate

My machine has 4 GB of RAM, but V8 only uses 1.4 GB by default.

@jmaitrehenry

Hello, I have the same problem with my Node.js apps.
My apps run without any SSL/TLS; I have a load balancer in front of the Node.js apps.
My logs are set to 'info'.
I have 2 processes in my cluster and run on a 2 GB instance.

One of my nodes has newrelic, the other one does not.
The first one uses 1.3 GB of memory for the Node.js process, and the other one 300 MB.

$ node --version
v0.10.26

$ npm list
https://gist.github.com/Precea/702a4f0bb62ed110acd5

@etiennea

Still have this!

@rictorres

Yep, same here. Running Ghost 0.4.

At first I thought it was related to pm2, but then I tried with forever and also plain node filename.js.

@ericsantos

Hi, I have the same problem with my Node.js apps.
Please increase the priority of this issue ;)

@xavigil

xavigil commented Jul 23, 2014

+1

Some weeks ago I rolled back the New Relic deployment and I have been watching this issue since then. I understand it is not an easy problem to solve, but hopefully you can give it higher priority.

Thanks.

@wraithan
Contributor

The best way to get the priority bumped on a problem you are having is to contact our support: https://support.newrelic.com/

The GitHub issue tracker is being phased out, and all of our internal tools for tracking issue priority depend on going through the support system.


@ruimarinho

I was experiencing the same issue running on Node 0.11. I had to remove it because the memory increase was huge, from ~170 MB to ~2 GB. Like @rictorres, I originally thought this could be related to pm2, but the issue still presented itself when running with node --harmony.

@wraithan I don't think that at this point it should be up to us, as users and/or customers, to report this issue on yet another tracker, considering that a discussion is already ongoing here.

@wraithan
Contributor

@ruimarinho I get that. But without account data, module lists, and the ability to correlate things across those, our job is harder. On top of that, product management uses support tickets to decide what is important.

@ruimarinho Also of note, we don't really support --harmony.

We can't reproduce an unbounded leak. We can find cases of higher-than-desired memory usage, but nothing that actually shows a leak. Most of the memory usage appears to be in-flight objects, especially in the higher-memory cases. That large number of in-flight objects causes V8 to allocate a lot of memory (it is greedy) and puts pressure on the GC.

@ruimarinho

I understand that node --harmony is not fully supported yet and that is one of the reasons why I was following this issue closely but had not reported anything back yet. My only purpose, like many other users involved in this discussion, is to find a fix that will allow us to continue using the newrelic service as part of our instrumentation tools.

Indeed, I can't really reproduce a memory leak but the observed memory usage is much, much higher than what you would normally get without the module installed. Like you said, this may not exactly be a bug in the module, but it is a tradeoff that some of us are not willing to make.

Nevertheless, the workarounds mentioned above to limit this issue did not work for me, but I'll gladly try other suggestions if you have any available. My help may be limited to Node with the harmony flag enabled, but right now the symptoms are similar to what others are experiencing on Node 0.10.

@victorquinn

+1 here, we are seeing the same memory leak.

Hope this is resolved quickly, but honestly I'm just glad to know about it. I've been waking up at odd hours of the night for a couple of months to restart our Node processes to keep the memory leak from overwhelming our servers, and we didn't know the culprit until today. We have a few clusters of servers running on AWS with a handful of different Node apps, all with New Relic, and all with sawtooth memory usage graphs. Disabling New Relic's Node module solved it immediately.

Just submitted a NewRelic support request as @wraithan suggested. Looking forward to a fix here.

@etiennea

A big warning should be put somewhere! This agent should not be used in production, as it will almost certainly decrease the performance of your app substantially due to this leak!!!! The stable version is 1.3.2.

@zamiang

zamiang commented Jul 31, 2014

@wraithan I will follow up with New Relic support, but I do want to add some learnings to this more public forum since it is very active.

In short, we have tried newrelic 1.9 with RUM enabled, 1.3.2 with RUM disabled (not supported), and no newrelic at all. We did see some improvements when going from 1.9 to 1.3.2, but when removing newrelic entirely we saw a significant drop in memory usage over time.

Here is a screenshot of our Heroku dashboard with newrelic-node 1.9 installed vs. no newrelic. Note that throughput is about the same. We are an HTTPS-only app serving mostly web pages. Any sudden drops in app memory are from a deploy, which restarts the app. I understand that monitoring isn't free, but we saw significant improvements across the board when removing newrelic, and we are looking at other, smaller monitoring solutions now.

[chart: Heroku memory usage with newrelic-node 1.9 vs. without newrelic]

@txase

txase commented Aug 1, 2014

Hi folks,

First off, we've received a lot of very helpful information in this thread. We appreciate the amount of time and effort people have put into helping us determine potential issues in our agent. Our greatest concern is ensuring that we do not negatively impact our customers, so we take this issue very seriously.

We've spent a lot of time behind the scenes looking into the memory usage of our agent. We worked with a small number of customers who provided core dumps of their apps, and this has led to a few discoveries:

http://docs.newrelic.com/docs/agents/nodejs-agent/troubleshooting/large-memory-usage

However, continuing this issue on GitHub will not help us. If, after consulting the documentation above, you continue to experience memory usage issues, please follow up with us at node-github-issue-134@newrelic.com. If possible, please contact us using the email address you use to log into New Relic, and include your account # and application name(s). This is a temporary address specifically set up to help us create a direct support ticket for you. Creating a dedicated support ticket will allow us to work with you on an individual basis to gather the information we need. We highly encourage you to follow up there, and we will be locking this issue.

Relatedly, we are winding down our use of GitHub issues. It can be difficult to support our customers through GitHub because we can’t share confidential information. Instead, please contact us through our dedicated portal at http://support.newrelic.com for any other issues you encounter. We are better equipped to support you there, and issues filed there are resolved more quickly.

Towards this end, we will soon be turning off GitHub issues. Once we flip the switch, all access to issues, both active and closed, will be gone. This is an unfortunate limitation of how GitHub handles issues once the feature is disabled.

Thank you for your understanding as we undergo this transition.

@txase txase closed this as completed Aug 1, 2014
@newrelic newrelic locked and limited conversation to collaborators Aug 1, 2014