Memory leak when connection dropped on emulator #574

Closed
ThomWright opened this issue Apr 17, 2019 · 8 comments
Labels
  • api: pubsub - Issues related to the googleapis/nodejs-pubsub API.
  • priority: p2 - Moderately-important priority. Fix may not be included in next release.
  • triaged for GA
  • type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@ThomWright

If my service has created a subscription and then loses its connection to Pub/Sub, the CPU spikes to >100% and, within ~10 seconds, memory grows from ~100 MB to >1 GB until Node.js runs out of memory.

I see this when my service is connected to the pubsub emulator locally in my Docker environment and I stop the pubsub emulator container.

I create a subscription like so: await topic.subscription(subscriptionName).get({autoCreate: true})
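For reference, a fuller sketch of that setup (placeholder names; this is an illustration of the pattern above, not the exact service code):

```js
const {PubSub} = require('@google-cloud/pubsub');

async function main() {
  const pubsub = new PubSub({projectId: 'my-project'});
  const topic = pubsub.topic('my-topic');

  // Creates the subscription if it does not exist yet.
  const [subscription] = await topic
    .subscription('my-subscription')
    .get({autoCreate: true});

  subscription.on('message', message => {
    console.log('received', message.id);
    message.ack();
  });

  subscription.on('error', err => {
    console.error('subscription error', err);
  });
}

main().catch(console.error);
```

The crash then looks like this: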

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[81:0x55b3a043d260]    60390 ms: Mark-sweep 1317.4 (1442.6) -> 1308.5 (1445.6) MB, 1678.1 / 0.8 ms  (average mu = 0.161, current mu = 0.074) allocation failure scavenge might not succeed
[81:0x55b3a043d260]    62150 ms: Mark-sweep 1321.6 (1445.6) -> 1312.6 (1447.6) MB, 1636.3 / 0.8 ms  (average mu = 0.117, current mu = 0.070) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x36427a8b9021]
Security context: 0x04297dc1d9d1 <JSObject>
    1: start [0xe2ed8390ef9] [/app/node_modules/grpc/src/client_interceptors.js:~1096] [pc=0x36427a8badc9](this=0x2274239ba661 <Object map = 0x27478d995581>,0x2274239b1f29 <Metadata map = 0x27478d985c21>,0x2274239ba8a1 <InterceptingListener map = 0x27478d988ba1>)
    2: arguments adaptor frame: 3->2
    3: start [0x20c97d8de871] [/app/node_modules/grpc/src/c...

Environment details

  • OS: Alpine Linux (Docker image: mhart/alpine-node:11)
  • Node.js version: Latest v11
  • npm version: v6.9.0
  • @google-cloud/pubsub version: 0.28.1

Steps to reproduce

  1. Use the pubsub-emulator (see the sketch after this list for pointing the client at it)
  2. Create a subscription to a topic on the emulator
  3. Stop the emulator
  4. Bye bye memory
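For step 1, the client can be pointed at a local emulator through the PUBSUB_EMULATOR_HOST environment variable; a minimal sketch, assuming the emulator's default port 8085 (adjust to whatever your Docker setup exposes):

```js
// Point the client at a local emulator instead of the live API.
// Must be set before the client is constructed.
process.env.PUBSUB_EMULATOR_HOST = 'localhost:8085';

const {PubSub} = require('@google-cloud/pubsub');

// Any project id works against the emulator.
const pubsub = new PubSub({projectId: 'test-project'});
```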

Thanks!

@callmehiphop callmehiphop added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Apr 17, 2019
@callmehiphop callmehiphop changed the title Memory leak when connection dropped Memory leak when connection dropped on emulator Apr 17, 2019

sduskis commented May 2, 2019

It sounds like we would have to add some sort of throttling to stop this behavior. That sounds like a feature request to me.


sduskis commented May 3, 2019

It looks like there is also some sort of multiplier on the total bytes used per message, as per this other issue.

@nmiddendorff

I'm also seeing this error when running the emulator. I occasionally see a warning indicating a memory leak as well.

(node:7694) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 channel.ready listeners added. Use emitter.setMaxListeners() to increase limit
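For what it's worth, Node prints that warning whenever more than the default 10 listeners get attached to a single event on one emitter, which is what you would expect if each failed reconnect attempt adds another channel.ready listener. A contrived illustration (not the grpc internals, just the warning mechanism):

```js
const EventEmitter = require('events');

const channel = new EventEmitter();

// The 11th listener on the same event crosses the default limit of 10
// and triggers MaxListenersExceededWarning.
for (let i = 0; i < 11; i++) {
  channel.on('ready', () => {});
}

// Raising the limit silences the warning but only hides the symptom:
channel.setMaxListeners(20);
```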


sduskis commented May 22, 2019

The emulator connection was never retried, and the RPCs were constantly retried on the broken connection without any back-off. That prevented the event loop from moving on to its next phase, since it was stuck constantly trying to resolve the failed promise and then retrying. @ajaaym will add a link to an article about this problem.

This should have been fixed in the 0.29.* release, which added back-offs. The back-off lets the event loop move on to other events, including GC.
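Roughly, the difference between the two behaviors looks like this (a simplified sketch for illustration only, not the actual grpc/client code):

```js
// Without back-off: if rpc() rejects immediately (as it would on a broken
// connection), every retry is queued as a microtask. The microtask queue
// never drains, so the event loop never reaches timers, I/O, or idle time,
// and the ever-growing chain of pending promises eats memory.
async function retryForever(rpc) {
  try {
    return await rpc();
  } catch (err) {
    return retryForever(rpc);
  }
}

// With back-off: each retry is scheduled on a timer, so between attempts the
// event loop can service other callbacks and garbage collection can keep up.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function retryWithBackoff(rpc, attempt = 0) {
  try {
    return await rpc();
  } catch (err) {
    await delay(Math.min(1000 * 2 ** attempt, 60000));
    return retryWithBackoff(rpc, attempt + 1);
  }
}
```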


ajaaym commented May 22, 2019


sduskis commented May 22, 2019

@ThomWright and @nmiddendorff, can you please upgrade to the latest version of the client? That should fix the problem.

@bcoe, I found this issue to be fascinating. I'd appreciate your perspective on this topic.


sduskis commented May 28, 2019

I'm closing this. If this is still a problem, please open up a new issue.

@sduskis sduskis closed this as completed May 28, 2019

bcoe commented May 28, 2019

@sduskis agreed, this bug was nasty and interesting. I'm not shocked that grpc retrying in an unbounded way, without actually scheduling any IO, could saturate resources ... adding a back-off seems like the right solution 😄

In practice, my experience at npm was that we rarely had to dig quite so deeply into the nitty-gritty of the event loop except when debugging edge cases like this. As an example, the route that served package meta-information handled many hundreds of concurrent requests at any given time, serving them in sub-50ms time, and I don't remember bumping into an issue like this.

Sounds like what was probably happening in this case was that async tasks were being scheduled pretty much as fast as the CPU could toss them on the stack?
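A contrived demo of that last point (not the client's code; it pegs a CPU core, so run with care): a microtask that keeps re-queueing itself never lets the event loop advance, so even a short timer never fires.

```js
let spins = 0;

function spin() {
  spins++;
  Promise.resolve().then(spin); // re-queued as a microtask every time
}

setTimeout(() => {
  // Never reached: the microtask queue keeps refilling itself, so the
  // event loop never gets to the timer phase.
  console.log(`timer fired after ${spins} spins`);
}, 100);

spin();
```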
