
Parse Server version 2.5.1 and newer leaking memory and eventually dying #4235

Closed
mman opened this issue Oct 3, 2017 · 24 comments

Comments

@mman commented Oct 3, 2017

Issue Description

Running 2.5.0 in production since July 1st with flat CPU and memory consumption around 200 MB. I tried to upgrade to 2.6.2, 2.6.1, 2.6.0, 2.5.3, 2.5.2, and 2.5.1, and all of them leak memory rather fast, with a very similar pattern: 1 GB consumed in about 10 minutes. Looking at the changelog, it might be related to parse-server-push-adapter being bumped to version 2.0 in 2.5.1.

Steps to reproduce

Not sure yet. Need help figuring out what is happening.

Expected Results

Stable memory consumption.

Actual Outcome

Parse Server leaking memory until it dies (gets killed by docker).

Environment Setup

  • Server
    • parse-server version 2.5.1, 2.5.2, 2.5.3, 2.6.0
    • Operating System: Ubuntu 16.04
    • Hardware: N/A
    • Localhost or remote server? Digital Ocean

Logs/Trace

No errors or strange messages reported; everything works as usual, just with memory increasing.

@mman (Author) commented Oct 3, 2017

Just to add: I reverted to 2.5.0 and memory consumption is back to a stable ~180 MB.

@flovilmart (Contributor) commented Oct 3, 2017

@mman (Author) commented Oct 3, 2017

I'm using a Docker node image based on Ubuntu 16.04 to rebuild all Parse Servers fresh:

# node --version
v7.9.0
@flovilmart (Contributor) commented Oct 3, 2017

We’ll need heapdumps to measure the growth and see if it’s related to Parse itself or something else.

@mman (Author) commented Oct 3, 2017

Here are the dependencies from my package.json. I'm using a fairly simple setup with only the S3 adapter, Mailgun for mail, and the default push adapter.

Looking at the recent changes, I'm aware of push getting up-revved, and the S3 adapter is under heavy changes as well...

: "index.js",
  "dependencies": {
    "express": "~4.2.x",
    "mailgun-js": "*",
    "stathat": "*",
    "parse-server": "2.5.0",
    "parse-server-s3-adapter": "*",
    "parse-server-mailgun": "^2.0.0"
  },
@mman (Author) commented Oct 6, 2017

I have checked this again, especially in relation to #4238, and figured out that my Docker env got somewhat out of sync and has been running 8.6.0 in some places along with 7.9.0 elsewhere. Not sure where 7.9.0 came from, as the official node page does not list it anymore...

I have now streamlined all my custom Docker images to be based on node v6.11.4, which I understand is what most folks should use (https://hub.docker.com/_/node/). I'll keep it running over the weekend to observe memory consumption on 2.5.0, and on Monday I'll upgrade to 2.6.3 to see how it behaves on node 6.11.4.

What node version are you using, @flovilmart?

@flovilmart (Contributor) commented Oct 6, 2017

We're currently running 6.11.1.

@flovilmart (Contributor) commented Oct 6, 2017

And this is our memory usage. As you can see, we have spikes etc., but usual consumption is around 150 MB:

(screenshot: memory usage graph, 2017-10-06 08:48)

@mman (Author) commented Oct 9, 2017

Just to confirm: node v6.11.4 and parse-server 2.5.0 were super stable over the weekend. I just tried to rebuild everything with parse-server 2.6.3, and both my experimental instances died after about 20 minutes with memory exhausted.

The only suspicious error message I was able to find was:

Oct  9 17:27:13 XXXX 2b641e9a52e3[1316]: (node:15) Warning: Possible EventEmitter memory leak detected. 11 wakeup listeners added. Use emitter.setMaxListeners() to increase limit
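
As an aside, that warning is Node's built-in EventEmitter guard, which fires once more than the default 10 listeners are attached to a single event. A minimal, illustrative sketch (not a fix for the leak itself; raising the limit only silences the warning, the listeners still accumulate):

const EventEmitter = require('events');

const emitter = new EventEmitter();
emitter.setMaxListeners(20); // default limit is 10; the 'wakeup' listeners above exceeded it

for (let i = 0; i < 15; i++) {
  emitter.on('wakeup', () => {}); // with the default limit, adding the 11th logs the warning
}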
@flovilmart (Contributor) commented Oct 9, 2017

Are you able to capture heapdumps in order to identify where the memory is getting swallowed?

@mman (Author) commented Oct 9, 2017

I will be happy to. Please copy-paste a command here for me to execute and I'll send the result immediately... (read: I'm a Node newbie)...

@flovilmart (Contributor) commented Oct 9, 2017

@mman there are a few steps involved:

  • Add heapdump as a dependency (npm install --save heapdump)
  • Add heapdump to your cloud code main (require('heapdump'))
  • Trigger a snapshot with kill -USR2 PID, where PID is the PID of the node process
  • Take a few of those and use the Chrome inspector to see the memory growth.

More info here: https://www.npmjs.com/package/heapdump
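
A minimal sketch of how that wires together in the cloud code main (the interval-based snapshot below is optional and purely illustrative; the signal handler comes for free with the require):

// Requiring heapdump registers a SIGUSR2 handler, so `kill -USR2 <pid>`
// writes a heapdump-<timestamp>.heapsnapshot file into the working directory.
const heapdump = require('heapdump');

// Optional: also snapshot on a timer so consecutive dumps can be diffed
// in the Chrome DevTools Memory tab (interval and path are illustrative).
setInterval(() => {
  heapdump.writeSnapshot('/tmp/' + Date.now() + '.heapsnapshot', (err, filename) => {
    if (err) console.error('heapdump failed:', err);
    else console.log('heap snapshot written to', filename);
  });
}, 10 * 60 * 1000); // every 10 minutes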

I don't recommend sending them over, as they will contain the contents of the memory (dbURL, masterKey, etc.), so it's unsafe to post them publicly :)

@mman (Author) commented Oct 9, 2017

@flovilmart Got the snapshots a few minutes apart, one at 119 MB, the other at 279 MB. I'm trying to make sense of them; looking at objects allocated between the two snapshots, the Chrome DevTools show me a lot of Object and (array) instances that are referenced from InMemoryCache.CacheController.InMemoryCacheAdapter.LRUCache.Map.Node.Entry.

Looking at 2.5.1 and your change #3979, I see a weak connection here, but I don't have enough knowledge to speculate.

Would you be willing to take a look at the dumps if I share them with you personally?

thanks,
Martin

@flovilmart (Contributor) commented Oct 9, 2017

So this is the in-memory cache, and it makes sense that it grows; however, it should naturally stabilize depending on the parameters (max number of keys, TTL, etc.). What is the RAM allowance on your instances?

@mman (Author) commented Oct 9, 2017

Here is the detailed screenshot:

(screenshot: heap snapshot comparison, 2017-10-09 20:52)

@mman (Author) commented Oct 9, 2017

I was able to kill two instances, both at 1 GB physical, which typically look like this (mem limit at ~900 MB):

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
249220d87afe        0.00%               345.3MiB / 992.3MiB   34.79%              2.11GB / 129MB      37.9MB / 0B         21

The other two, which I have at 2 GB physical, I didn't have the heart to let keep growing, so I reimaged them back to 2.5.0 :))

@flovilmart (Contributor) commented Oct 9, 2017

You can have a look at cacheMaxSize; by default we let 10000 objects in, which can make your memory grow, though it still doesn’t explain why we don’t notice it in our setup. Can you try using enableSingleSchemaCache? This may help, as it will prevent caching the schema on a per-request basis.
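
For reference, both are plain ParseServer constructor options; a minimal sketch assuming parse-server 2.6.x (the connection values are placeholders):

const ParseServer = require('parse-server').ParseServer;

const api = new ParseServer({
  databaseURI: process.env.DATABASE_URI, // placeholder
  appId: process.env.APP_ID,             // placeholder
  masterKey: process.env.MASTER_KEY,     // placeholder
  serverURL: 'http://localhost:1337/parse',

  // In-memory cache limits: at most cacheMaxSize entries, each kept for cacheTTL ms.
  cacheMaxSize: 10000, // the default mentioned above
  cacheTTL: 5000,

  // Keep a single schema cache in memory instead of one copy per request.
  enableSingleSchemaCache: true,
});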

@mman (Author) commented Oct 9, 2017

Looks like 2.6.3 with enableSingleSchemaCache: true has stabilized at around ~150 MB and is not growing any further... What does enableSingleSchemaCache do anyway? Is it safe to keep it on? If yes, should it be on by default?

Thanks @flovilmart for helping me with this one; looks like I can be on the latest release with you again :)

@flovilmart (Contributor) commented Oct 9, 2017

Instead of fetching/caching the schema on a per-request basis, this will cache it in memory and update it on mutations. You could encounter a very, very rare race condition where two requests try to update the schema in different ways, but in practice this should never happen.

I’ll try to provide a fix that manually drops schemas after the request is flushed/ended, as LRUCache doesn’t actively prune old values.
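
To make the LRUCache point concrete, a minimal sketch assuming the lru-cache 4.x package (the one visible in the retainer chain above): expired entries stay resident until they are looked up or explicitly pruned.

const LRU = require('lru-cache');

// Roughly the shape of the in-memory cache: 10000 entries max, 5 second TTL.
const cache = new LRU({ max: 10000, maxAge: 5000 });
cache.set('someKey', { some: 'large object' });

setTimeout(() => {
  // The entry is past its maxAge, but lru-cache only evicts lazily:
  console.log(cache.itemCount); // 1 -- the stale entry is still retained in memory
  cache.prune();                // walk the cache and drop expired entries
  console.log(cache.itemCount); // 0
}, 6000);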

@flovilmart (Contributor) commented Oct 9, 2017

It could probably be on by default, actually. I should remove this per-request cache and see if it yields any errors from the integration tests.

@mman (Author) commented Oct 10, 2017

Just to confirm: after nearly 20 hours up and running, Parse Server 2.6.3 with enableSingleSchemaCache: true works like a charm, with stable memory consumption.

Feel free to keep this bug open and reference it from a potential upcoming PR that makes enableSingleSchemaCache: true the default, or just close it.

It's resolved for me now.

@flovilmart (Contributor) commented Oct 10, 2017

That's great news @mman! As a bonus, you now know how to debug heap memory consumption on running processes!

@mman (Author) commented Oct 10, 2017

Absolutely correct @flovilmart, thanks for teaching me new tricks :)

@flovilmart (Contributor) commented Oct 11, 2017

@mman closing, as we're tracking the issue in #4247! Thanks for the debugging effort!

@flovilmart closed this Oct 11, 2017
