Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc-js 1.4.1 - Channelz tracing uses unbounded memory which causes OOM #1941

Closed
tom-engineering opened this issue Oct 17, 2021 · 14 comments · Fixed by #1943
Closed

grpc-js 1.4.1 - Channelz tracing uses unbounded memory which causes OOM #1941

tom-engineering opened this issue Oct 17, 2021 · 14 comments · Fixed by #1943

Comments

@tom-engineering
Copy link

Problem description

Up front: I wouldn't necessarily call this a bug but it has caused an issue in our code.

We use a single grpc-js client to service calls to the Google Ads API. This is instantiated when the server is initialized and used throughout the life of the program to try and reduce memory leakage. This works quite well, memory use is relatively stable, but when grpc-js is upgraded to 1.4.1 (from 1.3.7), memory usage starts to increase linearly even though our implementation is the same. This seems to be caused by Date objects (and others) being created when events are appended to the ChannelzTrace object associated with our channel.

This may otherwise be a small amount of memory but we service a very large amount of traffic through this server and it eventually causes an OOM event.

Should we cease caching our grpc client? What would be the recommended lifecycle of a client in that case? Or is there a way to opt out of tracing?

Reproduction steps

Create a single grpc Client object and fire traffic to it at a rate of 5 requests per second over 10 minutes and observe memory usage climbing without reaching a cap.

Environment

  • OS name, version and architecture - Tested on Linux ARM64 and macOS ARM64
  • Node version - 16
  • Node installation method - Docker image on server and nvm locally
  • Package name and version - 1.4.1

Additional context

I'm not sure what would be most helpful but I am happy to provide anything required.

@tom-engineering tom-engineering changed the title 1.4.1 - Channelz tracing uses unbounded memory which causes OOM grpc-js 1.4.1 - Channelz tracing uses unbounded memory which causes OOM Oct 17, 2021
@murgatroid99
Copy link
Member

It's not supposed to do traces on a per-request basis. The traces are all about connection state, which shouldn't change more quickly when there are more requests. Because of that, if anything, caching the client should decrease the number of traces that occur, so I wouldn't recommend changing that.

It might help to see what information is actually getting stored in channelz in your case. You can run a channelz server with code like this:

const server = new grpc.Server();
grpc.addAdminServicesToServer(server);
// Modify the first argument to suit your environment
server.bindAsync('localhost:0', ServerCredentials.createInsecure(), (error, port) => {
  console.log(`Serving channelz on port ${port}`);
  server.start();
});

Then you can access it using the grpcdebug tool. I would be interested to see the output of grpcdebug <server address> channelz channels and grpcdebug <server address> channelz channel <channel id> where "server address" is the address of the channelz server, and "channel id" is the ID of one of the channels from the first command that has a significant number of calls started.

In the meantime, I will push out an update that rotates out old channelz trace events on a per-channel/subchannel/server basis.

@tom-engineering
Copy link
Author

Thank you for the swift and helpful response! We have had issues with pushing too many concurrent requests down the same channel which seems to (at least implicitly) affect channel status, and we force round-robin on as well - I don't suppose that may be adding to the volume of trace events? I will find some time to inspect the channelz logs.

@murgatroid99
Copy link
Member

I would be interested to see a trace log for one of your tests, which can be output by setting the environment variables GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG. They should show the details of the things you are describing.

@murgatroid99
Copy link
Member

murgatroid99 commented Oct 26, 2021

This took a little longer than I expected, but grpc-js version 1.4.2 is out with two relevant changes: Each channel, subchannel, and server will now store only a limited number of trace events at any one time, and channelz stats tracking can be disabled entirely for a channel and its subchannels, or a server, with the channel option 'grpc.enable_channelz': 0.

@murgatroid99
Copy link
Member

If that change does solve the problem, it would still help to see a log to understand what caused the problem in the first place.

@tom-engineering
Copy link
Author

tom-engineering commented Nov 5, 2021

Posting this here as a potential FYI as it may be a bug with Google Ads API (or us messing something up) but setting the 'grpc.enable_channelz': 0 option using grpc-js 1.4.2 makes all our calls to the Google Ads API exceed deadline (set to 60s). I've tried to condense down the code we actually use to make the request but I've had to cut quite a bit, hopefully it may provide some clues. I'll post any updates if we find them.

Not setting the "grpc.enable_channelz" option at all works seemingly fine, and setting a random unknown option of "grpc.unknown_option": 0 works fine as well (was testing to see if it passes the value to the GA API which potentially doesn't like it).

const Client = grpc.makeGenericClientConstructor({}, "", {});

const auth = new JWT(authOptions);
const callCredentials = grpc.credentials.createFromGoogleCredential(auth);
const channelCredentials = grpc.credentials.createSsl();
const serviceConfig = JSON.stringify({
  loadBalancingConfig: [{ round_robin: {} }],
});

const client = new Client(
  GOOGLE_ADS_ENDPOINT,
  grpc.credentials.combineChannelCredentials(
    channelCredentials,
    callCredentials
  ),
  { "grpc.service_config": serviceConfig, "grpc.enable_channelz": 0 }
);

client.makeUnaryRequest(
  `/google.ads.googleads.${GOOGLE_ADS_VERSION}.services.${serviceName}/${method.name}`,
  (value: Uint8Array) => Buffer.from(value),
  (value: Buffer) => value,
  requestData,
  this.metadata,
  {
    deadline: timeout ? Date.now() + timeout : undefined,
  },
  function (err, value) {
    if (isServiceError(err)) {
      err = new GaClientError(err);
    }
    callback(err, value);
  }
);

@murgatroid99
Copy link
Member

No, now that you pointed it out, I found a bug with the handling of that option in grpc-js. I will fix that. In the meantime, without that option, do you still see the unbounded memory growth?

@tom-engineering
Copy link
Author

We are still seeing significant additional memory usage over the baseline (so far about 2.5x memory usage - 800MiB to 2GiB), but we do now utilise a static pool of grpc clients so there may be many multiples of channels and subchannels logging. I'm just waiting to see if it tops out or keeps going in prod but we may hit our memory limit still so bear with me on that. I'll try and get a sample of some logs in the meantime.

@murgatroid99
Copy link
Member

The bug with that option is fixed in version 1.4.3.

@Gilwe
Copy link

Gilwe commented Nov 7, 2021

Hi, we're in the same boat but we see also an increase in memory with 1.4.3 (also node 16).

Our case is an express server that serves graphql and for each graphql will call an internal grpc service. when we used grpc library we had no memory problems but with @grpc/grpc-node the memory is climbing on every request.
When I took a heap snapshot I noticed that we had a lot of ChannelImplementation rows so I explicitly ran close for all the grpc clients (client = all services created from grpc.loadPackageDefinition()) which made the memory increment not to be as steep but the memory leak was still there.

I tried to specify enable_channelz:0 but saw no improvement.

@murgatroid99 logs from what would be helpful for you?

@tom-engineering
Copy link
Author

tom-engineering commented Nov 8, 2021

@murgatroid99 I am casting around in the dark so I really apologise if this is way off and a distraction but is this line the culprit?

registerChannelzSocket(this.subchannelAddressString, () => this.getChannelzSocketInfo()!);

There's a guarded invocation in the same function that assigns the output to a ref, but then this unguarded invocation that does nothing with the output at the end of the function. I'm wondering if the unguarded version should be there at all?

private createSession(proxyConnectionResult: ProxyConnectionResult) {
  [...]
  if (this.channelzEnabled) {
    this.channelzSocketRef = registerChannelzSocket(this.subchannelAddressString, () => this.getChannelzSocketInfo()!);
    this.childrenTracker.refChild(this.channelzSocketRef);
  }
  [...]
  registerChannelzSocket(this.subchannelAddressString, () => this.getChannelzSocketInfo()!);
}

I don't seem to get a memory leak if I comment out the unguarded line when enable_channelz:0 is set, so it's possible that this may be the cause of the actual leak and why turning enable_channelz off doesn't seem to help.

@murgatroid99
Copy link
Member

You're right, that line shouldn't be there at all, which explains why I missed it when adding the guards, and the output never gets recorded, so it never gets unregistered. I have published a new version 1.4.4 that removes it.

@Gilwe A lot of ChannelImplementation objects is a big red flag here. If you are creating a new client object for every request you make, that would leak a lot of memory. We discourage that usage pattern, instead you should create one client, and use it to make all of your requests. If you do have to create many clients, you should close each one when you are done using it.

Even with the extra line mentioned in the last comment, setting the channel option to disable channelz should have had a significant impact on the rate of the memory leak, if you set the option 'grpc.enable_channelz': 0 as stated in my earlier comment. If you actually tried to set an option called enable_channelz, that would do nothing because it's not a recognized option.

If you still have problems with the new version, as stated in my earlier comment, the logs that would help here are client trace logs, which you can get by running your code with the environment variables GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG. Just the first few hundred lines would probably be enough to understand what is happening.

@tom-engineering
Copy link
Author

tom-engineering commented Nov 8, 2021

1.4.4 has completely fixed the memory leak for me with channelz tracing disabled, thank you! Not sure if the above was to me or @Gilwe but I had been setting "grpc.enable_channelz" correctly, I had just abbreviated in my comment.

In terms of the actual channelz tracing itself, in our case it looks like it might be frequent TRANSIENT_FAILURE state transitions that fill up the logs. This is a set of entries from a completely idle application I left running by accident on my local machine just now with no traffic going through it:

D 2021-11-08T20:47:51.384Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:819::200a:443
D 2021-11-08T20:47:51.385Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 connection closed
D 2021-11-08T20:47:51.385Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:47:51.385Z | round_robin | READY -> READY
D 2021-11-08T20:47:51.385Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:47:51.385Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:47:51.385Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:47:51.385Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:47:51.385Z | round_robin | READY -> READY
D 2021-11-08T20:47:51.385Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:47:51.386Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:47:51.386Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 creating HTTP/2 session
D 2021-11-08T20:47:51.387Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:819::200a:443,142.250.179.234:443]
D 2021-11-08T20:47:51.387Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 2 -> 1
D 2021-11-08T20:47:51.387Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 2 -> 1
D 2021-11-08T20:47:51.387Z | round_robin | Connect to address list 2a00:1450:4009:819::200a:443,142.250.179.234:443
D 2021-11-08T20:47:51.387Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 1 -> 2
D 2021-11-08T20:47:51.387Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 1 -> 2
D 2021-11-08T20:47:51.387Z | round_robin | READY -> READY
D 2021-11-08T20:47:51.387Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:47:51.387Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:06.385Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:819::200a:443
D 2021-11-08T20:49:06.385Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 connection closed
D 2021-11-08T20:49:06.385Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:49:06.386Z | round_robin | READY -> READY
D 2021-11-08T20:49:06.386Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:06.386Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:06.386Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:49:06.386Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:49:06.386Z | round_robin | READY -> READY
D 2021-11-08T20:49:06.386Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:06.386Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:06.386Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 creating HTTP/2 session
D 2021-11-08T20:49:06.388Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:819::200a:443,142.250.179.234:443]
D 2021-11-08T20:49:06.388Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 2 -> 1
D 2021-11-08T20:49:06.388Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 2 -> 1
D 2021-11-08T20:49:06.388Z | round_robin | Connect to address list 2a00:1450:4009:819::200a:443,142.250.179.234:443
D 2021-11-08T20:49:06.388Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 1 -> 2
D 2021-11-08T20:49:06.388Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 1 -> 2
D 2021-11-08T20:49:06.388Z | round_robin | READY -> READY
D 2021-11-08T20:49:06.388Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:06.388Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:49:21.456Z | subchannel | (-1) 142.250.179.234:443 connection closed by GOAWAY with code 0
D 2021-11-08T20:49:21.456Z | subchannel | (-1) 142.250.179.234:443 READY -> IDLE
D 2021-11-08T20:49:21.457Z | round_robin | READY -> CONNECTING
D 2021-11-08T20:49:21.457Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:49:21.457Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:49:21.457Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:49:21.457Z | subchannel | (-1) 142.250.179.234:443 IDLE -> CONNECTING
D 2021-11-08T20:49:21.457Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:49:21.457Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:49:21.457Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:49:21.457Z | subchannel | (-1) 142.250.179.234:443 creating HTTP/2 session
D 2021-11-08T20:49:21.459Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:819::200a:443,142.250.179.234:443]
D 2021-11-08T20:49:21.459Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 2 -> 1
D 2021-11-08T20:49:21.459Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 2 -> 1
D 2021-11-08T20:49:21.459Z | round_robin | Connect to address list 2a00:1450:4009:819::200a:443,142.250.179.234:443
D 2021-11-08T20:49:21.459Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 1 -> 2
D 2021-11-08T20:49:21.459Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 1 -> 2
D 2021-11-08T20:49:21.459Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:49:21.459Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:49:21.459Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:49:21.513Z | subchannel | (-1) 142.250.179.234:443 CONNECTING -> READY
D 2021-11-08T20:49:21.513Z | round_robin | CONNECTING -> READY
D 2021-11-08T20:49:21.514Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:49:21.514Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:50:21.387Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:819::200a:443
D 2021-11-08T20:50:21.387Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 connection closed
D 2021-11-08T20:50:21.387Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:50:21.388Z | round_robin | READY -> READY
D 2021-11-08T20:50:21.388Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:50:21.388Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:50:21.388Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:50:21.388Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:50:21.388Z | round_robin | READY -> READY
D 2021-11-08T20:50:21.388Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:50:21.388Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:50:21.388Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 creating HTTP/2 session
D 2021-11-08T20:50:21.390Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:823::200a:443,172.217.169.74:443]
D 2021-11-08T20:50:21.390Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 2 -> 1
D 2021-11-08T20:50:21.390Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 2 -> 1
D 2021-11-08T20:50:21.390Z | round_robin | Connect to address list 2a00:1450:4009:823::200a:443,172.217.169.74:443
D 2021-11-08T20:50:21.390Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 Subchannel constructed with options {
  "grpc.service_config": "{\"loadBalancingConfig\":[{\"round_robin\":{}}]}",
  "grpc.enable_channelz": 0
}
D 2021-11-08T20:50:21.390Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 0 -> 1
D 2021-11-08T20:50:21.390Z | subchannel | (-1) 172.217.169.74:443 Subchannel constructed with options {
  "grpc.service_config": "{\"loadBalancingConfig\":[{\"round_robin\":{}}]}",
  "grpc.enable_channelz": 0
}
D 2021-11-08T20:50:21.390Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 0 -> 1
D 2021-11-08T20:50:21.390Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 2
D 2021-11-08T20:50:21.391Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 IDLE -> CONNECTING
D 2021-11-08T20:50:21.391Z | round_robin | READY -> CONNECTING
D 2021-11-08T20:50:21.391Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:50:21.391Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:50:21.391Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 2
D 2021-11-08T20:50:21.391Z | subchannel | (-1) 172.217.169.74:443 IDLE -> CONNECTING
D 2021-11-08T20:50:21.391Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:50:21.391Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:50:21.391Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:50:21.391Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:50:21.391Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:50:21.391Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:50:21.391Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 creating HTTP/2 session
D 2021-11-08T20:50:21.394Z | subchannel | (-1) 172.217.169.74:443 creating HTTP/2 session
D 2021-11-08T20:50:21.449Z | subchannel | (-1) 172.217.169.74:443 CONNECTING -> READY
D 2021-11-08T20:50:21.449Z | round_robin | CONNECTING -> READY
D 2021-11-08T20:50:21.449Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:50:21.449Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:50:26.887Z | subchannel_refcount | (-1) 2a00:1450:4009:819::200a:443 refcount 1 -> 0
D 2021-11-08T20:50:26.887Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:50:26.888Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 1 -> 0
D 2021-11-08T20:50:26.888Z | subchannel | (-1) 142.250.179.234:443 READY -> TRANSIENT_FAILURE
D 2021-11-08T20:50:26.889Z | subchannel | (-1) 2a00:1450:4009:819::200a:443 TRANSIENT_FAILURE -> IDLE
D 2021-11-08T20:50:26.889Z | subchannel | (-1) 142.250.179.234:443 TRANSIENT_FAILURE -> IDLE
D 2021-11-08T20:51:36.394Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:823::200a:443
D 2021-11-08T20:51:36.394Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed
D 2021-11-08T20:51:36.394Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:51:36.395Z | round_robin | READY -> READY
D 2021-11-08T20:51:36.395Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:51:36.395Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:51:36.395Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:51:36.395Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:51:36.395Z | round_robin | READY -> READY
D 2021-11-08T20:51:36.395Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:51:36.395Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:51:36.395Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 creating HTTP/2 session
D 2021-11-08T20:51:36.398Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:823::200a:443,172.217.169.74:443]
D 2021-11-08T20:51:36.398Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 2 -> 1
D 2021-11-08T20:51:36.398Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 2 -> 1
D 2021-11-08T20:51:36.398Z | round_robin | Connect to address list 2a00:1450:4009:823::200a:443,172.217.169.74:443
D 2021-11-08T20:51:36.398Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 2
D 2021-11-08T20:51:36.398Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 2
D 2021-11-08T20:51:36.398Z | round_robin | READY -> READY
D 2021-11-08T20:51:36.398Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:51:36.398Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:52:51.396Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:823::200a:443
D 2021-11-08T20:52:51.396Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed
D 2021-11-08T20:52:51.396Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:52:51.397Z | round_robin | READY -> READY
D 2021-11-08T20:52:51.397Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:52:51.397Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:52:51.397Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:52:51.397Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:52:51.397Z | round_robin | READY -> READY
D 2021-11-08T20:52:51.397Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:52:51.397Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:52:51.398Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 creating HTTP/2 session
D 2021-11-08T20:52:51.399Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:823::200a:443,172.217.169.74:443]
D 2021-11-08T20:52:51.399Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 2 -> 1
D 2021-11-08T20:52:51.399Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 2 -> 1
D 2021-11-08T20:52:51.399Z | round_robin | Connect to address list 2a00:1450:4009:823::200a:443,172.217.169.74:443
D 2021-11-08T20:52:51.399Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 2
D 2021-11-08T20:52:51.399Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 2
D 2021-11-08T20:52:51.399Z | round_robin | READY -> READY
D 2021-11-08T20:52:51.399Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:52:51.399Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:06.406Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:823::200a:443
D 2021-11-08T20:54:06.407Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed
D 2021-11-08T20:54:06.407Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:54:06.407Z | round_robin | READY -> READY
D 2021-11-08T20:54:06.407Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:06.407Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:06.407Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:54:06.408Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:54:06.408Z | round_robin | READY -> READY
D 2021-11-08T20:54:06.408Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:06.408Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:06.408Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 creating HTTP/2 session
D 2021-11-08T20:54:06.410Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:823::200a:443,172.217.169.74:443]
D 2021-11-08T20:54:06.411Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 2 -> 1
D 2021-11-08T20:54:06.411Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 2 -> 1
D 2021-11-08T20:54:06.411Z | round_robin | Connect to address list 2a00:1450:4009:823::200a:443,172.217.169.74:443
D 2021-11-08T20:54:06.411Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 2
D 2021-11-08T20:54:06.411Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 2
D 2021-11-08T20:54:06.411Z | round_robin | READY -> READY
D 2021-11-08T20:54:06.411Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:06.411Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:54:21.474Z | subchannel | (-1) 172.217.169.74:443 connection closed by GOAWAY with code 0
D 2021-11-08T20:54:21.474Z | subchannel | (-1) 172.217.169.74:443 READY -> IDLE
D 2021-11-08T20:54:21.474Z | round_robin | READY -> CONNECTING
D 2021-11-08T20:54:21.474Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:54:21.474Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:54:21.474Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:54:21.474Z | subchannel | (-1) 172.217.169.74:443 IDLE -> CONNECTING
D 2021-11-08T20:54:21.474Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:54:21.475Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:54:21.475Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:54:21.475Z | subchannel | (-1) 172.217.169.74:443 creating HTTP/2 session
D 2021-11-08T20:54:21.475Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:823::200a:443,172.217.169.74:443]
D 2021-11-08T20:54:21.476Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 2 -> 1
D 2021-11-08T20:54:21.476Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 2 -> 1
D 2021-11-08T20:54:21.476Z | round_robin | Connect to address list 2a00:1450:4009:823::200a:443,172.217.169.74:443
D 2021-11-08T20:54:21.476Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 2
D 2021-11-08T20:54:21.476Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 2
D 2021-11-08T20:54:21.476Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:54:21.476Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:54:21.476Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:54:21.528Z | subchannel | (-1) 172.217.169.74:443 CONNECTING -> READY
D 2021-11-08T20:54:21.528Z | round_robin | CONNECTING -> READY
D 2021-11-08T20:54:21.528Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:54:21.528Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:55:21.410Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:823::200a:443
D 2021-11-08T20:55:21.410Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed
D 2021-11-08T20:55:21.410Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:55:21.410Z | round_robin | READY -> READY
D 2021-11-08T20:55:21.410Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:55:21.410Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:55:21.410Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:55:21.411Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:55:21.411Z | round_robin | READY -> READY
D 2021-11-08T20:55:21.411Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:55:21.411Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:55:21.411Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 creating HTTP/2 session
D 2021-11-08T20:55:21.412Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:823::200a:443,172.217.169.74:443]
D 2021-11-08T20:55:21.412Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 2 -> 1
D 2021-11-08T20:55:21.412Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 2 -> 1
D 2021-11-08T20:55:21.412Z | round_robin | Connect to address list 2a00:1450:4009:823::200a:443,172.217.169.74:443
D 2021-11-08T20:55:21.412Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 2
D 2021-11-08T20:55:21.412Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 2
D 2021-11-08T20:55:21.412Z | round_robin | READY -> READY
D 2021-11-08T20:55:21.412Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:55:21.412Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:56:36.413Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed with error connect ETIMEDOUT 2a00:1450:4009:823::200a:443
D 2021-11-08T20:56:36.414Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 connection closed
D 2021-11-08T20:56:36.414Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:56:36.415Z | round_robin | READY -> READY
D 2021-11-08T20:56:36.415Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:56:36.415Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:56:36.415Z | dns_resolver | Resolution update requested for target dns:googleads.googleapis.com:443
D 2021-11-08T20:56:36.415Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 TRANSIENT_FAILURE -> CONNECTING
D 2021-11-08T20:56:36.416Z | round_robin | READY -> READY
D 2021-11-08T20:56:36.416Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:56:36.416Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> READY
D 2021-11-08T20:56:36.416Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 creating HTTP/2 session
D 2021-11-08T20:56:36.419Z | dns_resolver | Resolved addresses for target dns:googleads.googleapis.com:443: [2a00:1450:4009:815::200a:443,142.250.179.234:443]
D 2021-11-08T20:56:36.419Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 2 -> 1
D 2021-11-08T20:56:36.419Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 2 -> 1
D 2021-11-08T20:56:36.419Z | round_robin | Connect to address list 2a00:1450:4009:815::200a:443,142.250.179.234:443
D 2021-11-08T20:56:36.419Z | subchannel | (-1) 2a00:1450:4009:815::200a:443 Subchannel constructed with options {
  "grpc.service_config": "{\"loadBalancingConfig\":[{\"round_robin\":{}}]}",
  "grpc.enable_channelz": 0
}
D 2021-11-08T20:56:36.419Z | subchannel_refcount | (-1) 2a00:1450:4009:815::200a:443 refcount 0 -> 1
D 2021-11-08T20:56:36.419Z | subchannel | (-1) 142.250.179.234:443 Subchannel constructed with options {
  "grpc.service_config": "{\"loadBalancingConfig\":[{\"round_robin\":{}}]}",
  "grpc.enable_channelz": 0
}
D 2021-11-08T20:56:36.419Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 0 -> 1
D 2021-11-08T20:56:36.419Z | subchannel_refcount | (-1) 2a00:1450:4009:815::200a:443 refcount 1 -> 2
D 2021-11-08T20:56:36.419Z | subchannel | (-1) 2a00:1450:4009:815::200a:443 IDLE -> CONNECTING
D 2021-11-08T20:56:36.420Z | round_robin | READY -> CONNECTING
D 2021-11-08T20:56:36.420Z | resolving_load_balancer | dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:56:36.420Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 READY -> CONNECTING
D 2021-11-08T20:56:36.420Z | subchannel_refcount | (-1) 142.250.179.234:443 refcount 1 -> 2
D 2021-11-08T20:56:36.420Z | subchannel | (-1) 142.250.179.234:443 IDLE -> CONNECTING
D 2021-11-08T20:56:36.420Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:56:36.420Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:56:36.420Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:56:36.420Z | round_robin | CONNECTING -> CONNECTING
D 2021-11-08T20:56:36.420Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:56:36.420Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> CONNECTING
D 2021-11-08T20:56:36.420Z | subchannel | (-1) 2a00:1450:4009:815::200a:443 creating HTTP/2 session
D 2021-11-08T20:56:36.420Z | subchannel | (-1) 142.250.179.234:443 creating HTTP/2 session
D 2021-11-08T20:56:36.478Z | subchannel | (-1) 142.250.179.234:443 CONNECTING -> READY
D 2021-11-08T20:56:36.478Z | round_robin | CONNECTING -> READY
D 2021-11-08T20:56:36.478Z | resolving_load_balancer | dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:56:36.478Z | connectivity_state | (-1) dns:googleads.googleapis.com:443 CONNECTING -> READY
D 2021-11-08T20:56:36.929Z | subchannel_refcount | (-1) 2a00:1450:4009:823::200a:443 refcount 1 -> 0
D 2021-11-08T20:56:36.929Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 CONNECTING -> TRANSIENT_FAILURE
D 2021-11-08T20:56:36.930Z | subchannel_refcount | (-1) 172.217.169.74:443 refcount 1 -> 0
D 2021-11-08T20:56:36.930Z | subchannel | (-1) 172.217.169.74:443 READY -> TRANSIENT_FAILURE
D 2021-11-08T20:56:36.930Z | subchannel | (-1) 172.217.169.74:443 TRANSIENT_FAILURE -> IDLE
D 2021-11-08T20:56:46.331Z | subchannel | (-1) 2a00:1450:4009:823::200a:443 TRANSIENT_FAILURE -> IDLE

Is that expected behaviour?

From my side, given that we no longer experience a memory leak and we have the option of disabling channelz tracing, I consider my issue completely resolved - but I'm more than happy to continue to contribute logs or testing if there is more that can be done. Thank you again for your help @murgatroid99

@murgatroid99
Copy link
Member

It looks like the server is telling the client to go away after the connection is idle for 75 seconds. The client behavior when handling that is normal, specifically for the round_robin load balancing policy. round_robin tries to keep all of its connections alive, so it will reconnect every time the server tells it to go away. The default pick_first, on the other hand, will simply let the connection drop until the user makes another request.

However, the total channelz memory usage should be bounded for the duration of that log. There's no mention of any channel construction, so the number of channels is constant. I see 4 instances of subchannels getting constructed, and 4 instances of them getting removed (when the refcount goes to 0), so the total number of subchannels is approximately constant. And each subchannel should register a socket with channelz when it transitions to CONNECTING, and it should unregister it when it transitions to TRANSIENT_FAILURE or IDLE, so the number of registered sockets should be constant as well. And with the changes in 1.4.3, the number of log statements retained for each is strictly limited.

Have you checked whether you still see the memory leak with channelz enabled as of 1.4.4, and if you do, can you share a log with channelz enabled of a timespan in which memory significantly increases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants