
Memory mismanagement (leak) since version 3.9 #18934

Closed · Durisvk opened this issue Apr 25, 2023 · 10 comments

Labels: bug/1-unconfirmed, kind/bug, team/client, tech/engines/query engine, tech/typescript, topic: performance/memory, topic: performance

Durisvk commented Apr 25, 2023

Bug description

There is an issue with how Prisma allocates memory when a single query is called multiple times.

Running the same query 10000 times produces the following process.memoryUsage().heapUsed readings on the latest version of Prisma:

Memory usage before: 8.776458740234375 MB
Average memory usage after 2000: 12.91 MB (+1.39 MB)
Average memory usage after 3000: 13.03 MB (+0.12 MB)
Average memory usage after 4000: 15.29 MB (+2.26 MB)
Average memory usage after 5000: 17.31 MB (+2.02 MB)
Average memory usage after 6000: 22.54 MB (+5.23 MB)
Average memory usage after 7000: 23.18 MB (+0.64 MB)
Average memory usage after 8000: 25.82 MB (+2.64 MB)
Average memory usage after 9000: 28.01 MB (+2.19 MB)
Memory usage after: 31.48168182373047 MB
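
For reference, the measurement loop looks roughly like this (a minimal sketch, not the exact reproduction code; the User model name is a placeholder):

const { PrismaClient } = require('@prisma/client')

const prisma = new PrismaClient()
const mb = (bytes) => bytes / 1024 / 1024

async function main() {
  console.log(`Memory usage before: ${mb(process.memoryUsage().heapUsed)} MB`)

  // Run the same query against an empty table over and over,
  // sampling the JS heap every 1000 iterations.
  for (let i = 1; i <= 10000; i++) {
    await prisma.user.findMany()
    if (i % 1000 === 0) {
      console.log(`Heap used after ${i}: ${mb(process.memoryUsage().heapUsed).toFixed(2)} MB`)
    }
  }

  console.log(`Memory usage after: ${mb(process.memoryUsage().heapUsed)} MB`)
}

main().finally(() => prisma.$disconnect())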

I recommend checking https://github.com/Durisvk/prisma-memory-mismanagement-reproduction/blob/main/README.md for more details.

A little context:
I am currently debugging a memory leak issue in production; memory leaks between requests. However, when I use Node.js's --inspect mode, I can't tell whether the memory is actually leaking or whether it's just Prisma's odd memory management.

I believe this is also related to #17925, but we can't technically call it a leak, because after a few tens of thousands of cycles the memory drops again without a heap-out-of-memory error (even though it grows by a factor of 3.34x at its worst in the meantime).

How to reproduce

  1. Clone the following repository:

     https://github.com/Durisvk/prisma-memory-mismanagement-reproduction

     git clone git@github.com:Durisvk/prisma-memory-mismanagement-reproduction.git

  2. Follow the steps in the README.

  3. See how the memory management differs.

Expected behavior

No response

Prisma information

// Add your schema.prisma
// Add your code using Prisma Client

Environment & setup

  • OS: macOS
  • Database: PostgreSQL
  • Node.js version:

Prisma Version

prisma                  : 3.9.2
@prisma/client          : 3.9.2
Current platform        : darwin
Query Engine (Node-API) : libquery-engine bcc2ff906db47790ee902e7bbc76d7ffb1893009 (at node_modules/@prisma/engines/libquery_engine-darwin.dylib.node)
Migration Engine        : migration-engine-cli bcc2ff906db47790ee902e7bbc76d7ffb1893009 (at node_modules/@prisma/engines/migration-engine-darwin)
Introspection Engine    : introspection-core bcc2ff906db47790ee902e7bbc76d7ffb1893009 (at node_modules/@prisma/engines/introspection-engine-darwin)
Format Binary           : prisma-fmt bcc2ff906db47790ee902e7bbc76d7ffb1893009 (at node_modules/@prisma/engines/prisma-fmt-darwin)
Default Engines Hash    : bcc2ff906db47790ee902e7bbc76d7ffb1893009
Studio                  : 0.457.0

But also with

prisma                  : 4.3.1
@prisma/client          : 4.3.1
Current platform        : darwin
Query Engine (Node-API) : libquery-engine c875e43600dfe042452e0b868f7a48b817b9640b (at node_modules/@prisma/engines/libquery_engine-darwin.dylib.node)
Migration Engine        : migration-engine-cli c875e43600dfe042452e0b868f7a48b817b9640b (at node_modules/@prisma/engines/migration-engine-darwin)
Introspection Engine    : introspection-core c875e43600dfe042452e0b868f7a48b817b9640b (at node_modules/@prisma/engines/introspection-engine-darwin)
Format Binary           : prisma-fmt c875e43600dfe042452e0b868f7a48b817b9640b (at node_modules/@prisma/engines/prisma-fmt-darwin)
Format Wasm             : @prisma/prisma-fmt-wasm 4.3.0-32.c875e43600dfe042452e0b868f7a48b817b9640b
Default Engines Hash    : c875e43600dfe042452e0b868f7a48b817b9640b
Studio                  : 0.473.0

With that version the deltas become smaller, but the memory still mostly grows rather than going down (even though a single query is being called repeatedly).

Durisvk (Author) commented Apr 25, 2023

Here's a diff between the two releases across which the issue was introduced:

3.8.0...3.9.0

And here's a commit I suspect introduced the issue: f395aba, but don't take that for granted (it's just a hunch). I've spent the last few weeks in node --inspect heap snapshots, and the terms dmmf, getCallSite (which is a clear memory leak), and executingQueryPromise sound all too familiar.

Regarding getCallSite, I worked around it using the { errorFormat: 'minimal' } option, but that wasn't too big of a leak.
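
For reference, that workaround is just the documented client option (minimal sketch):

const { PrismaClient } = require('@prisma/client')

// 'minimal' skips the pretty error formatting, which is where the
// call-site capture mentioned above comes into play.
const prisma = new PrismaClient({ errorFormat: 'minimal' })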

jkomyno (Contributor) commented Apr 26, 2023

Hi @Durisvk, thanks for opening this issue: we're working hard on improving the performance of Prisma as a whole.
In fact, the newest releases (combined with the new jsonProtocol preview feature) should already lead to faster CPU times and a smaller memory footprint. Could you please include prisma@4.13.0 in your reproduction and report your findings here, so we can keep this issue up to date? Thank you!
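
For anyone following along, enabling the preview feature means adding it to the generator block in schema.prisma, roughly like this:

generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["jsonProtocol"]
}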

jkomyno added the topic: performance/memory, tech/engines/query engine, team/client, and bug/1-unconfirmed labels on Apr 26, 2023
Durisvk (Author) commented Apr 26, 2023

Hey @jkomyno. Here's the memory progression with prisma@4.13.0 and jsonProtocol added to the preview features:

Memory usage before: 7.3544158935546875 MB
Average memory usage after 2000: 10.51 MB (+1.95 MB)
Average memory usage after 3000: 11.38 MB (+0.87 MB)
Average memory usage after 4000: 12.27 MB (+0.89 MB)
Average memory usage after 5000: 10.53 MB (-1.75 MB)
Average memory usage after 6000: 11.71 MB (+1.19 MB)
Average memory usage after 7000: 12.25 MB (+0.54 MB)
Average memory usage after 8000: 12.98 MB (+0.73 MB)
Average memory usage after 9000: 14.13 MB (+1.16 MB)
Memory usage after: 17.440841674804688 MB

Just to add more context: the Prisma schema contains one model with two fields. The database table is empty, and I am calling the same findMany() query against that empty table 10000 times. I don't see a reason why the memory usage should more than double.
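
For completeness, the schema is essentially this shape (model and field names here are placeholders, not the exact ones from the reproduction):

model User {
  id   Int    @id @default(autoincrement())
  name String
}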

I see that tracing is disabled by default and only available as a preview feature. Any idea what could cause the drastic change in memory progression between versions 3.8 and 3.9 (3.8.0...3.9.0)?

In our application we have multiple scheduled jobs that query the database every minute; this is how the memory consumption looks over 7 days:

[Screenshot: production memory consumption over 7 days]

not great, right?

aqrln added the tech/typescript label on Apr 26, 2023
Durisvk (Author) commented Apr 26, 2023

@aqrln TypeScript is not relevant in this case; the reproduction is set up in plain JavaScript.

aqrln (Member) commented Apr 26, 2023

@Durisvk the label doesn't mean the problem is related to TypeScript itself. It means the issue concerns the part of the Prisma codebase written in TypeScript (as opposed to the engines, which are written in Rust) and needs TypeScript engineers to look into it; it covers anything that happens in JavaScript land. I added the label because your report shows JavaScript heap usage growth.

Durisvk (Author) commented Apr 28, 2023

Hey, an update here.

I've pulled your repo and added memory logging around the following line, measuring the heap before and after the call and calculating the delta:
https://github.com/prisma/prisma/blob/main/packages/engine-core/src/library/LibraryEngine.ts#L478
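
The logging itself was nothing fancy, along these lines (a sketch; callNativeEngine is just a placeholder for whatever call sits at the linked line):

const before = process.memoryUsage().heapUsed
const result = await callNativeEngine(query) // the actual engine call being measured
const after = process.memoryUsage().heapUsed
console.log(`heap delta for this call: ${after - before} bytes`)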

I believe this line forwards the query to the native Rust libquery-engine here:
https://github.com/prisma/prisma-engines/blob/main/query-engine/query-engine-node-api/src/engine.rs#L328

And it allocates an extra 858 bytes on every single cycle (we're talking about a query that returns 0 rows), this time against a SQLite database (so the issue is not specific to Postgres).

Then after a few thousand queries, it drops some memory, fortunately.

It appears the problem is happening in the native part of the system. I'm not a Rust expert, and I don't know how the V8 process manages the memory of a natively bound module, but I believe it's shared, since the engine is imported as a regular Node module.

Should I open this issue in https://github.com/prisma/prisma-engines repo so that Rust engineers can write some memory tests around that query function?

@SevInf @jkomyno I see you two are quite involved in the native part of the system. Can you assist here? Our production is crashing over and over again, and I've fixed all the other leaks involved, so the only one left is Prisma.

Durisvk (Author) commented Apr 28, 2023

Another update. I just added a jemalloc package to prisma-engines, and the memory doesn't seem to grow on the Rust side between identical calls:

...
Allocated: 2883488, active: 3334144, metadata: 7060880 (n_thp 0), resident: 9744384, mapped: 47374336, retained: 0
...
Allocated: 2955168, active: 3420160, metadata: 7061136 (n_thp 0), resident: 9830400, mapped: 47460352, retained: 0
...
Allocated: 2955168, active: 3420160, metadata: 7061136 (n_thp 0), resident: 9830400, mapped: 47460352, retained: 0
...
Allocated: 2955168, active: 3420160, metadata: 7061136 (n_thp 0), resident: 9830400, mapped: 47460352, retained: 0
..
Allocated: 2955168, active: 3420160, metadata: 7061136 (n_thp 0), resident: 9830400, mapped: 47460352, retained: 0
...

and so on.

The only thing that keeps increasing is the ctl n_lock_ops stat:

[Screenshot: jemalloc ctl mutex statistics]

Anyway, the problem does not appear to be in the native prisma-engines.

However, measured before and after the call into the native engine, the JS heap usage increases by 853 bytes per call, even though the result coming from the query engine is only 23 bytes. It only goes up from there until it reaches some point (double or triple the initial memory usage, with a query that returns nothing) and then it drops. Could it be some strange caching mechanism in the N-API bindings?

janpio (Member) commented Apr 28, 2023

Something to verify that: add engineType = "binary" to your generator block in the schema, regenerate the Client, and try again. The Rust code then runs in a completely different way, without Node-API, which should isolate the problem nicely.
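
That change would look roughly like this in schema.prisma:

generator client {
  provider   = "prisma-client-js"
  engineType = "binary"
}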

SevInf (Contributor) commented Apr 29, 2023

@Durisvk could you also try running your reproduction with a lower --max_old_space_size? From the symptoms you are describing (the heap grows for a while, then drops back to the old value) it looks like normal GC behaviour to me: the process simply has enough memory that it never needs to run a major GC.
We also have an almost identical memory-leak test here; one major difference is that it forces a GC after every iteration, and it shows no leak.
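
For anyone who wants to separate GC timing from a real leak themselves, a sketch of what "forced GC after every iteration" looks like (run with node --expose-gc; prisma.user.findMany() is a placeholder query):

const { PrismaClient } = require('@prisma/client')
const prisma = new PrismaClient()

async function main() {
  for (let i = 1; i <= 10000; i++) {
    await prisma.user.findMany()
    // global.gc() exists only when the process is started with --expose-gc;
    // forcing a full collection each iteration makes heapUsed readings comparable.
    global.gc()
    if (i % 1000 === 0) {
      console.log(`heap after ${i}: ${(process.memoryUsage().heapUsed / 1048576).toFixed(2)} MB`)
    }
  }
}

main().finally(() => prisma.$disconnect())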

Durisvk (Author) commented Apr 29, 2023

Yeah, you were right, the memory progression looks much better with --max-old-space-size=8. Almost as good as in version 3.8. I wonder what has changed between 3.8 and 3.9.

$ node --max-old-space-size=8 index.js

Memory usage before: 6.721260070800781 MB
Average memory usage after 2000: 7.02 MB (-0.09 MB)
Average memory usage after 3000: 6.99 MB (-0.03 MB)
Average memory usage after 4000: 6.96 MB (-0.03 MB)
Average memory usage after 5000: 6.99 MB (+0.03 MB)
Average memory usage after 6000: 7.75 MB (+0.76 MB)
Average memory usage after 7000: 7.98 MB (+0.24 MB)
Average memory usage after 8000: 7.85 MB (-0.14 MB)
Average memory usage after 9000: 7.94 MB (+0.09 MB)
Average memory usage after 10000: 7.93 MB (-0.01 MB)
Average memory usage after 11000: 7.88 MB (-0.05 MB)
Average memory usage after 12000: 7.95 MB (+0.07 MB)
Average memory usage after 13000: 7.81 MB (-0.14 MB)
Average memory usage after 14000: 7.91 MB (+0.09 MB)
Average memory usage after 15000: 7.85 MB (-0.06 MB)
Average memory usage after 16000: 7.85 MB (0 MB)
Average memory usage after 17000: 8.2 MB (+0.35 MB)
Average memory usage after 18000: 8.13 MB (-0.07 MB)
Average memory usage after 19000: 8.27 MB (+0.14 MB)
Average memory usage after 20000: 8.02 MB (-0.25 MB)
Average memory usage after 21000: 8.26 MB (+0.24 MB)
Average memory usage after 22000: 8.03 MB (-0.24 MB)
Average memory usage after 23000: 8.16 MB (+0.14 MB)
Average memory usage after 24000: 8.12 MB (-0.04 MB)
Average memory usage after 25000: 8.26 MB (+0.13 MB)
Average memory usage after 26000: 8.11 MB (-0.14 MB)
Average memory usage after 27000: 7.98 MB (-0.14 MB)
Average memory usage after 28000: 8.22 MB (+0.24 MB)
Average memory usage after 29000: 8.05 MB (-0.17 MB)
Average memory usage after 30000: 8.21 MB (+0.16 MB)
Average memory usage after 31000: 9.72 MB (+1.51 MB)
Average memory usage after 32000: 9.71 MB (-0.01 MB)
Average memory usage after 33000: 9.96 MB (+0.25 MB)
Average memory usage after 34000: 10.16 MB (+0.2 MB)
Average memory usage after 35000: 9.65 MB (-0.51 MB)
Average memory usage after 36000: 9.72 MB (+0.07 MB)
Memory usage after: 8.478965759277344 MB

Thank you for clarifying. You can close this issue.
