Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Likely memory leak in internal-channel.ts #2436

Closed
avermeil opened this issue Apr 28, 2023 · 5 comments
Closed

Likely memory leak in internal-channel.ts #2436

avermeil opened this issue Apr 28, 2023 · 5 comments

Comments

@avermeil
Copy link

Problem description

Hi there! I'm a maintainer of the google-ads-node library, which depends on @grpc/grpc-js.

Every call to the google ads API leaves behind a bit of unreclaimed memory. Looking into it, it seems to be related to the internal-channel.ts file's setInterval() call. In the screenshot below, there are 13 calls to the google ads API, and indeed 13 instances of ClientHttp2Session hanging around well after the call is over.

image

I took a look at the file, and from my limited understanding, this could be the culprit: https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/internal-channel.ts#L191. The setInterval ID is never cleared, causing V8 to keep a reference to this forever, which includes the entire InternalChannel class. Unref'ing the setInterval doesn't change that behaviour as far as I can tell.

Reproduction steps

This isn't super tight, but here's how I reproduce the problem:

  • Install google-ads-api from NPM
  • In a while(true) loop, make a .report([params]) or .query([params]) call to the Google Ads API, making sure to await so it only does one at a time
  • Run the code with node --inspect script.js & attach the chrome debugger
  • Give it a few iterations to warm up, then start recording an allocation timeline
  • The timeline should reveal data that never gets garbage collected coming from grpc-js

Environment

  • OS name, version and architecture: MacOS 13.2.1 (ARM), as well as Ubuntu 20.04.4 (x86_64)
  • Node version: v16.15.1
  • Node installation method: MacOS installer, and apt-get for Ubuntu
  • If applicable, compiler version: N/A
  • Package name and version: gRPC@1.8.13

Additional context

There are a number of other objects in the heap that aren't able to be garbage collected due to intenal-channel. Here's another one:
image

In production, this causes our nodejs processes to crash every few hours once they fill the available memory:
image

I can provide the full heap timeline file, as well as any other information upon request :)

@avermeil
Copy link
Author

Just noticed this issue, which is very similar.

I'm going to study that issue to see if we're falling into a similar trap.

@murgatroid99
Copy link
Member

murgatroid99 commented Apr 28, 2023

I will just note that the ClientHttp2Session is intended to be reused. If you are seeing that one of those is created for each request that you make, the most likely explanation is that you are creating a new client object for each request. The fact that they are sticking around most likely indicates that you are not closing those clients, as in the issue you linked. In fact, if you create multiple identical client objects, it will reuse a single ClientHttp2Session, so this also indicates that you are constructing the client objects in a way that actively prevents that.

@avermeil
Copy link
Author

avermeil commented May 2, 2023

You're right @murgatroid99 -- I've tried closing the clients and that has helped enormously.

But even then, we still do have a lot of ClientHttp2Sessions hanging around permanently even after all requests are over. Now it seems to stem from channelz.ts:

image

You are constructing the client objects in a way that actively prevents reusing a single ClientHttp2Session

That's entirely possible. Our system makes calls using many different refresh_tokens -- out of 1000 api calls, we'll be using perhaps 200 different refresh tokens. Is it possible to re-use a ClientHttp2Session for different refresh_tokens?

We're using google-gax as a wrapper, and it's possible that its generated code doesn't use grpc.js optimally for our use-case.

Here's an example of google-gax's generated code (the part that initialises the GRPC client):

And ultimately, this is how we load the GRPC client services: https://github.com/Opteo/google-ads-api/blob/master/src/service.ts#L94 (the only option we pass in is sslCreds).

Does anything stick out as bad practice here? Perhaps we should be passing in the refresh_tokens differently, with something other than sslCreds, to avoid making so many new connections? Or perhaps we should only be calling a grpc.credentials.createSsl() once, and re-using that for every request regardless of refresh_token?

Thanks so much for your help!

@murgatroid99
Copy link
Member

It seems like the serviceCache in service.ts should result in you only constructing a few clients. Am I missing something there?

It is possible that there is a bug with channelz here. If you construct the client with the option 'grpc.enable_channelz': 0, it will disable that, which should stop it from retaining that reference.

@avermeil
Copy link
Author

avermeil commented May 5, 2023

It seems like the serviceCache in service.ts should result in you only constructing a few clients. Am I missing something there?

Indeed, but we're instantiating the whole Service class on most requests, so the cache only does so much.

If you construct the client with the option 'grpc.enable_channelz': 0, it will disable that, which should stop it from retaining that reference.

I'm not sure how we would set 'grpc.enable_channelz': 0 in the context of a bazelisk-generated API... But I've released a version of the library that closes clients properly, and it's pretty much eliminated the memory growth problem.

I consider this problem solved 👍 . Thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants