Memory leak when connection dropped on emulator #574

Closed
ThomWright opened this issue Apr 17, 2019 · 8 comments
Labels
  • api: pubsub - Issues related to the googleapis/nodejs-pubsub API.
  • priority: p2 - Moderately-important priority. Fix may not be included in next release.
  • triaged for GA
  • type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@ThomWright

If my service has created a subscription and then loses its connection to Pub/Sub, the CPU spikes to >100% and, within ~10 seconds, memory grows from ~100 MB to >1 GB until Node.js runs out of memory.

I see this when my service is connected to the pubsub emulator locally in my Docker environment and I stop the pubsub emulator container.

I create a subscription like so: await topic.subscription(subscriptionName).get({autoCreate: true})
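For reference, a fuller sketch of that setup (placeholder names; this is an illustration of the pattern above, not the exact service code):

```js
const {PubSub} = require('@google-cloud/pubsub');

async function main() {
  const pubsub = new PubSub({projectId: 'my-project'});
  const topic = pubsub.topic('my-topic');

  // Creates the subscription if it does not exist yet.
  const [subscription] = await topic
    .subscription('my-subscription')
    .get({autoCreate: true});

  subscription.on('message', message => {
    console.log('received', message.id);
    message.ack();
  });

  subscription.on('error', err => {
    console.error('subscription error', err);
  });
}

main().catch(console.error);
```

The crash then looks like this: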

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[81:0x55b3a043d260]    60390 ms: Mark-sweep 1317.4 (1442.6) -> 1308.5 (1445.6) MB, 1678.1 / 0.8 ms  (average mu = 0.161, current mu = 0.074) allocation failure scavenge might not succeed
[81:0x55b3a043d260]    62150 ms: Mark-sweep 1321.6 (1445.6) -> 1312.6 (1447.6) MB, 1636.3 / 0.8 ms  (average mu = 0.117, current mu = 0.070) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x36427a8b9021]
Security context: 0x04297dc1d9d1 <JSObject>
    1: start [0xe2ed8390ef9] [/app/node_modules/grpc/src/client_interceptors.js:~1096] [pc=0x36427a8badc9](this=0x2274239ba661 <Object map = 0x27478d995581>,0x2274239b1f29 <Metadata map = 0x27478d985c21>,0x2274239ba8a1 <InterceptingListener map = 0x27478d988ba1>)
    2: arguments adaptor frame: 3->2
    3: start [0x20c97d8de871] [/app/node_modules/grpc/src/c...

Environment details

  • OS: Alpine Linux (Docker image: mhart/alpine-node:11)
  • Node.js version: Latest v11
  • npm version: v6.9.0
  • @google-cloud/pubsub version: 0.28.1

Steps to reproduce

  1. Use the pubsub-emulator (see the sketch after this list for pointing the client at it)
  2. Create a subscription to a topic on the emulator
  3. Stop the emulator
  4. Bye bye memory
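For step 1, the client can be pointed at a local emulator through the PUBSUB_EMULATOR_HOST environment variable; a minimal sketch, assuming the emulator's default port 8085 (adjust to whatever your Docker setup exposes):

```js
// Point the client at a local emulator instead of the live API.
// Must be set before the client is constructed.
process.env.PUBSUB_EMULATOR_HOST = 'localhost:8085';

const {PubSub} = require('@google-cloud/pubsub');

// Any project id works against the emulator.
const pubsub = new PubSub({projectId: 'test-project'});
```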

Thanks!

@callmehiphop callmehiphop added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Apr 17, 2019
@callmehiphop callmehiphop changed the title Memory leak when connection dropped Memory leak when connection dropped on emulator Apr 17, 2019

sduskis commented May 2, 2019

It sounds like we would have to add some sort of throttling to stop this behavior. That sounds like a feature request to me.


sduskis commented May 3, 2019

It looks like there is also some sort of multiplier on the total bytes used per message, as per this other issue.

@nmiddendorff

I'm also seeing this error when running the emulator. I occasionally see a warning indicating a memory leak as well.

(node:7694) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 channel.ready listeners added. Use emitter.setMaxListeners() to increase limit
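For what it's worth, Node prints that warning whenever more than the default 10 listeners get attached to a single event on one emitter, which is what you would expect if each failed reconnect attempt adds another channel.ready listener. A contrived illustration (not the grpc internals, just the warning mechanism):

```js
const EventEmitter = require('events');

const channel = new EventEmitter();

// The 11th listener on the same event crosses the default limit of 10
// and triggers MaxListenersExceededWarning.
for (let i = 0; i < 11; i++) {
  channel.on('ready', () => {});
}

// Raising the limit silences the warning but only hides the symptom:
channel.setMaxListeners(20);
```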


sduskis commented May 22, 2019

The emulator connection was never retried, and the RPCs were constantly retried on the broken connection without any back-off. That prevented the event loop from moving on to its next phase, since it was stuck constantly trying to resolve the failed promise and then retrying. @ajaaym will add a link to an article about this problem.

This should have been fixed in the 0.29.* release, which added back-offs. The back-off lets the event loop move on to other events, including GC.
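Roughly, the difference between the two behaviors looks like this (a simplified sketch for illustration only, not the actual grpc/client code):

```js
// Without back-off: if rpc() rejects immediately (as it would on a broken
// connection), every retry is queued as a microtask. The microtask queue
// never drains, so the event loop never reaches timers, I/O, or idle time,
// and the ever-growing chain of pending promises eats memory.
async function retryForever(rpc) {
  try {
    return await rpc();
  } catch (err) {
    return retryForever(rpc);
  }
}

// With back-off: each retry is scheduled on a timer, so between attempts the
// event loop can service other callbacks and garbage collection can keep up.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function retryWithBackoff(rpc, attempt = 0) {
  try {
    return await rpc();
  } catch (err) {
    await delay(Math.min(1000 * 2 ** attempt, 60000));
    return retryWithBackoff(rpc, attempt + 1);
  }
}
```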


ajaaym commented May 22, 2019


sduskis commented May 22, 2019

@ThomWright and @nmiddendorff, can you please upgrade to the latest version of the client? That should fix the problem.

@bcoe, I found this issue to be fascinating. I'd appreciate your perspective on this topic.


sduskis commented May 28, 2019

I'm closing this. If this is still a problem, please open up a new issue.

@sduskis sduskis closed this as completed May 28, 2019

bcoe commented May 28, 2019

@sduskis agreed, this bug was nasty and interesting. I'm not shocked that grpc retrying in an unbounded way, without actually scheduling any IO, could saturate resources ... adding a back-off seems like the right solution 😄

In practice, my experience at npm was that we rarely had to dig quite so deeply into the nitty-gritty of the event loop except when debugging edge cases like this. As an example, the route that served package meta-information handled many hundreds of concurrent requests at any given time, serving them in sub-50ms time, and I don't remember bumping into an issue like this.

Sounds like what was probably happening in this case was that async tasks were being scheduled pretty much as fast as the CPU could toss them on the stack?
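A contrived demo of that last point (not the client's code; it pegs a CPU core, so run with care): a microtask that keeps re-queueing itself never lets the event loop advance, so even a short timer never fires.

```js
let spins = 0;

function spin() {
  spins++;
  Promise.resolve().then(spin); // re-queued as a microtask every time
}

setTimeout(() => {
  // Never reached: the microtask queue keeps refilling itself, so the
  // event loop never gets to the timer phase.
  console.log(`timer fired after ${spins} spins`);
}, 100);

spin();
```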
