Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client leaks memory - repro provided with single getInstances call. #1306

Closed
tomdee opened this issue Feb 10, 2021 · 27 comments
Closed

Client leaks memory - repro provided with single getInstances call. #1306

tomdee opened this issue Feb 10, 2021 · 27 comments
Assignees
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@tomdee
Copy link

tomdee commented Feb 10, 2021

I have a specific use case where I need to be able to create multiple Spanner clients with different credentials inside a single long running process. This process is currently leaking memory and crashing.

Environment details

  • OS: Ubuntu 20.04
  • Node.js version: v14.15.1
  • npm version: 6.14.8
  • @google-cloud/spanner version: v5.4.0

Steps to reproduce

Run this sample code (after filling in some credentials)

index.js

import { Spanner } from "@google-cloud/spanner";
import { HeapDiff } from "@airbnb/node-memwatch";

// Call the spanner fetch 10 times, outputting memeory after each one
const projectId = "FILL_IN";
const credentials = {  client_email: `FILL_IN`,  private_key: `FILL IN`};

const REPETITIONS = 10;
console.log(`Running ${REPETITIONS} times`);
for (var i = 0; i < REPETITIONS; i++) {
  await getSpannerInstances();
}

async function getSpannerInstances() {
  var hd = new HeapDiff();
  const spanner = new Spanner({
    projectId,
    credentials,
  });
  await spanner.getInstances();
  var diff = hd.end();
  console.dir(diff.change.size_bytes);
}

package.json

{
  "type": "module",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "@airbnb/node-memwatch": "^2.0.0",
    "@google-cloud/spanner": "^5.4.0"
  }
}

The output indicates that it's leaking around 700-800KB of RAM on each loop through. There's no global state in this simple repro which is indicating that there's something wrong in @google-cloud/spanner

Running 10 times
2601376
914860
899300
729308
779676
797884
764027
790885
729727
759473

Thanks for creating this library and let me know if there are any more diagnostics that I can provide.

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/nodejs-spanner API. label Feb 10, 2021
@skuruppu skuruppu added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Feb 11, 2021
@skuruppu
Copy link
Contributor

Thanks for creating the issue @tomdee. Assigning to @olavloite to take a look.

@olavloite
Copy link
Contributor

@tomdee Thanks for the detailed report!

It seems that this is not directly caused by the client library itself, but by the generated client that is used to communicate with Spanner (the gapic client). I've been running some different tests, and have slightly changed your reproduction case to the following:

const REPETITIONS = 10;
console.log(`Running ${REPETITIONS} times`);
for (let i = 0; i < REPETITIONS; i++) {
  await useGapic();
  global.gc();
  const used = process.memoryUsage().heapUsed / 1024 / 1024;
  console.log(`Current mem usage ${Math.round(used * 100) / 100} MB`);
}

async function useGapic() {
  const client = new InstanceAdminClient({
    projectId: 'my-project',
    keyFile: '/path/to/my-key.json',
  });
  await client.listInstanceConfigs({
    parent: 'projects/my-project',
  });
  await client.close();
}

The result of the above is similar to your observation:

Running 10 times
Current mem usage 26.63 MB
Current mem usage 26.9 MB
Current mem usage 27.08 MB
Current mem usage 27.28 MB
Current mem usage 27.44 MB
Current mem usage 27.82 MB
Current mem usage 27.77 MB
Current mem usage 27.94 MB
Current mem usage 28.07 MB
Current mem usage 28.23 MB

And running it for a lot more than 10 times shows the same pattern of ever increasing memory usage until it goes out-of-memory.

@tomdee
Copy link
Author

tomdee commented Feb 11, 2021

@olavloite This is great, thanks for adding these additional insights.

Is there another project that I should raise a ticket on or are you following up on this?

@olavloite
Copy link
Contributor

@olavloite This is great, thanks for adding these additional insights.

Is there another project that I should raise a ticket on or are you following up on this?

I'm currently investigating the problem to see if I can find the root cause. If needed, then I'll file an issue with the corresponding project and link it here.

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Feb 16, 2021
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Feb 18, 2021
@skuruppu
Copy link
Contributor

To appease the bots, this issue is being investigated in googleapis/gapic-generator-typescript#823.

@skuruppu
Copy link
Contributor

skuruppu commented May 5, 2021

@olavloite it seems that there may be a fix based on googleapis/gapic-generator-typescript#823 (comment)? Would you be able to try updating google-gax to that version?

@olavloite
Copy link
Contributor

@skuruppu

I've tried it out and added my findings to googleapis/gapic-generator-typescript#823. The TLDR version:

  1. The update to v2.11.0 or later for google-gax helps, but does not solve the entire problem.
  2. Adding a close() method to the operations client as suggested by @Utsav2 also further reduces the amount of memory that seems to be leaded for each iteration.
  3. Even after applying the above two fixes, there still seems to be a small amount of memory being leaked for each iteration of opening and closing a client.

@skuruppu
Copy link
Contributor

skuruppu commented May 5, 2021

@olavloite, that's really great. I'm glad you were able to narrow it down further.

olavloite added a commit to olavloite/nodejs-spanner that referenced this issue May 5, 2021
Version 2.11.0 and later of google-gax contains a fix that should reduce
memory consumption somewhat when opening and closing clients multiple times.

Updates googleapis#1306
gcf-merge-on-green bot pushed a commit that referenced this issue May 5, 2021
Version 2.11.0 and later of google-gax contains a fix that should reduce memory consumption somewhat when opening and closing clients multiple times.

Updates #1306
AVaksman pushed a commit to AVaksman/nodejs-spanner that referenced this issue Jun 29, 2021
Version 2.11.0 and later of google-gax contains a fix that should reduce memory consumption somewhat when opening and closing clients multiple times.

Updates googleapis#1306
@skuruppu skuruppu removed their assignment Jul 2, 2021
@skuruppu
Copy link
Contributor

skuruppu commented Jul 2, 2021

@olavloite is there any other action we can take on this issue?

olavloite added a commit that referenced this issue Jul 2, 2021
Adds a close method to the Spanner client that will (try to) clean up the resources
it has in use.

Towards #1306
skuruppu pushed a commit that referenced this issue Jul 7, 2021
Adds a close method to the Spanner client that will (try to) clean up the resources
it has in use.

Towards #1306
olavloite added a commit that referenced this issue Jul 8, 2021
feat: implement request and transaction tagging

fix: resolve conflicts

build: add generated-files bot config (#1352)

This PR was generated using Autosynth. 🌈

Synth log will be available here:
https://source.cloud.google.com/results/invocations/a8c8456e-ad55-437d-a21c-2a0857049024/targets

- [ ] To automatically regenerate this PR, check this box. (May take up to 24 hours.)

Source-Link: googleapis/synthtool@e6f3d54
Source-Link: googleapis/synthtool@04573fd

chore: add SECURITY.md (#1351)

chore: add SECURITY.md

chore: re-generated to pick up changes from self (#1354)

This PR was generated using Autosynth. 🌈

Synth log will be available here:
https://source.cloud.google.com/results/invocations/af2f31a3-fd48-4502-8b32-6f442ac8825e/targets

- [ ] To automatically regenerate this PR, check this box. (May take up to 24 hours.)

test: enable numeric integration tests on emulator (#1358)

The emulator now supports the `NUMERIC` data type.

deps: bump google-gax to latest (#1359)

Version 2.11.0 and later of google-gax contains a fix that should reduce memory consumption somewhat when opening and closing clients multiple times.

Updates #1306

samples: added comments for timeout settings (#1355)

fix: lint issue (#1372)

chore(deps): update dependency @types/node to v14 (#1374)

[![WhiteSource Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [@types/node](https://togithub.com/DefinitelyTyped/DefinitelyTyped) | [`^10.3.0` -> `^14.0.0`](https://renovatebot.com/diffs/npm/@types%2fnode/10.17.60/14.17.0) | [![age](https://badges.renovateapi.com/packages/npm/@types%2fnode/14.17.0/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/npm/@types%2fnode/14.17.0/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/npm/@types%2fnode/14.17.0/compatibility-slim/10.17.60)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/npm/@types%2fnode/14.17.0/confidence-slim/10.17.60)](https://docs.renovatebot.com/merge-confidence/) |

---

📅 **Schedule**: "after 9am and before 3pm" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻️ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box.

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/nodejs-spanner).

fix: ensure table funcs accept gaxOptions directly (#1371)

chore(deps): update dependency sinon to v11 (#1376)

[![WhiteSource Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [sinon](https://sinonjs.org/) ([source](https://togithub.com/sinonjs/sinon)) | [`^10.0.0` -> `^11.0.0`](https://renovatebot.com/diffs/npm/sinon/10.0.0/11.1.0) | [![age](https://badges.renovateapi.com/packages/npm/sinon/11.1.0/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/npm/sinon/11.1.0/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/npm/sinon/11.1.0/compatibility-slim/10.0.0)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/npm/sinon/11.1.0/confidence-slim/10.0.0)](https://docs.renovatebot.com/merge-confidence/) |

---

<details>
<summary>sinonjs/sinon</summary>

[Compare Source](https://togithub.com/sinonjs/sinon/compare/v11.0.0...31be9a5d5a4762ef01cb195f29024616dfee9ce8)

\==================

-   Add sinon.promise() implementation ([#&#8203;2369](https://togithub.com/sinonjs/sinon/issues/2369))
-   Set wrappedMethod on getters/setters ([#&#8203;2378](https://togithub.com/sinonjs/sinon/issues/2378))
-   \[Docs] Update fake-server usage & descriptions ([#&#8203;2365](https://togithub.com/sinonjs/sinon/issues/2365))
-   Fake docs improvement ([#&#8203;2360](https://togithub.com/sinonjs/sinon/issues/2360))
-   Update nise to 5.1.0 (fixed [#&#8203;2318](https://togithub.com/sinonjs/sinon/issues/2318))

[Compare Source](https://togithub.com/sinonjs/sinon/compare/v10.0.1...v11.0.0)

\==================

-   Explicitly use samsam 6.0.2 with fix for [#&#8203;2345](https://togithub.com/sinonjs/sinon/issues/2345)
-   Update most packages ([#&#8203;2371](https://togithub.com/sinonjs/sinon/issues/2371))
-   Update compatibility docs ([#&#8203;2366](https://togithub.com/sinonjs/sinon/issues/2366))
-   Update packages (includes breaking fake-timers change, see [#&#8203;2352](https://togithub.com/sinonjs/sinon/issues/2352))
-   Warn of potential memory leaks ([#&#8203;2357](https://togithub.com/sinonjs/sinon/issues/2357))
-   Fix clock test errors

[Compare Source](https://togithub.com/sinonjs/sinon/compare/v10.0.0...v10.0.1)

\==================

-   Upgrade sinon components (bumps y18n to 4.0.1)
-   Bump y18n from 4.0.0 to 4.0.1

</details>

---

📅 **Schedule**: "after 9am and before 3pm" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻️ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box.

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/nodejs-spanner).

docs: align timeout values with documentation (#1379)

The timeout values in the sample do not correspond with the official
documentation surrounding the sample:
https://cloud.google.com/spanner/docs/custom-timeout-and-retry#node

chore: migrate to owl bot (#1380)

* chore: migrate to owl bot

* chore: copy files from googleapis-gen 46f25fb1121747b994ff5818963fda84b5e6bfd3

* chore: run the post processor

* 🦉 Updates from OwlBot

* chore: update post processor docker image

* 🦉 Updates from OwlBot

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

docs: add missing JSDoc for table API  (#1382)

feat: support setting `optimizerStatisticsPackage` (#1225)

`optimizerStatisticsPackage` can be set in `QueryOptions` when running Cloud Spanner queries.

Can be configured through the following mechanisms:
1. Through the `SPANNER_OPTIMIZER_STATISTICS_PACKAGE` environment variable.
1. At `Database` level using `spanner.instance('instance-name').database('database-name', sessionPoolOptions, queryOptions)`.
1. At query level using `ExecuteSqlRequest.queryOptions`.

If the options are configured through multiple mechanisms then:
1. Options set at an environment variable level will override options configured at the `Database` level.
1. Options set at a query-level will override options set at either the `Database` or environment variable level.

If no options are set, the optimizer statistics package will default to:
1. The optimizer statistics package the database is pinned to in the backend.
1. If the database is not pinned to a specific statistics package, then the Cloud Spanner backend will use the "latest" version.

docs: improve doc for gaxoptions (#1384)

chore: Report warning on `.github/workflows/ci.yaml` (#1110) (#1385)

* fix: Report warning on `.github/workflows/ci.yaml`

Not all files in `.github/workflows` are managed, only `ci.yaml`.

Related false-positive: googleapis/repo-automation-bots#1952 (comment)

* fix: Report warning on `.github/workflows/ci.yaml`

Not all files in `.github/workflows` are managed, only `ci.yaml`.
Source-Link: googleapis/synthtool@2430f8d
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:14aaee566d6fc07716bb92da416195156e47a4777e7d1cd2bb3e28c46fe30fe2

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

chore: release 5.8.0 (#1383)

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>

fix: unknown errors should not be retried (#1388)

Unknown errors that are returned during a transaction should not cause the entire
transaction to be automatically retried, as there is no absolute guarantee that the
initial attempt did not succeed. A `Transaction Outcome Unknown` (or other Unknown)
error could be returned by the `Commit` RPC, and in that case there is a possibility
that the transaction was actually committed.

Fixes #1387

chore(nodejs): use cloud-rad publication process (#1112) (#1391)

VERSION is used in @google-cloud/cloud-rad to publish ref docs for
a particular version. Pass VERSION in via Stubby or Fusion.
Source-Link: googleapis/synthtool@740366b
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:bbdd52de226c00df3356cdf25460397b429ab49552becca645adbc412f6a4ed5

build: add auto-approve to Node libraries (#1100) (#1393)

* build: add auto-approve to Node libraries

Co-authored-by: Benjamin E. Coe <bencoe@google.com>
Source-Link: googleapis/synthtool@5cae043
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:65aa68f2242c172345d7c1e780bced839bfdc344955d6aa460aa63b4481d93e5

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Benjamin E. Coe <bencoe@google.com>

chore: release 5.8.1 (#1389)

:robot: I have created a release \*beep\* \*boop\*
---

* unknown errors should not be retried ([#1388](https://www.github.com/googleapis/nodejs-spanner/issues/1388)) ([1d6f4e2](https://www.github.com/googleapis/nodejs-spanner/commit/1d6f4e2923bc1ac20c0a73c342332ec2ae259812)), closes [#1387](https://www.github.com/googleapis/nodejs-spanner/issues/1387)
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

fix: lint

chore(nodejs): remove api-extractor dependencies (#1395)

build: remove errant comma (#1113) (#1396)

Source-Link: googleapis/synthtool@41ccd8c
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:c9c7828c165b1985579098978877935ee52dda2b1b538087514fd24fa2443e7a

feat(spanner): add processing_units to Instance resource (#1398)

PiperOrigin-RevId: 378758342

Source-Link: googleapis/googleapis@d869871

Source-Link: googleapis/googleapis-gen@54cfa76

chore: release 5.9.0 (#1399)

:robot: I have created a release \*beep\* \*boop\*
---

* **spanner:** add processing_units to Instance resource ([#1398](https://www.github.com/googleapis/nodejs-spanner/issues/1398)) ([878cd3f](https://www.github.com/googleapis/nodejs-spanner/commit/878cd3f1596526b6e4e2457babd3dc2c2add11ad))
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

fix: make request optional in all cases (#1400)

... chore: update gapic-generator-ruby to the latest commit chore: release gapic-generator-typescript 1.5.0

Committer: @miraleung
PiperOrigin-RevId: 380641501

Source-Link: googleapis/googleapis@076f7e9

Source-Link: googleapis/googleapis-gen@27e4c88

chore(codegen): add JSON type (#1403)

* feat(spanner): add JSON type

PiperOrigin-RevId: 381156241

Source-Link: googleapis/googleapis@fb5c4fb

Source-Link: googleapis/googleapis-gen@b99c170

* 🦉 Updates from OwlBot

See https://github.com/googleapis/repo-automation-bots/blob/master/packages/owl-bot/README.md

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

chore: release 5.9.1 (#1401)

:robot: I have created a release \*beep\* \*boop\*
---

* make request optional in all cases ([#1400](https://www.github.com/googleapis/nodejs-spanner/issues/1400)) ([0b78770](https://www.github.com/googleapis/nodejs-spanner/commit/0b78770bfef6f463abb0f336999f7dfd61b5b2fe))
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

build(node): don't throw on deprecation in unit tests (#1405)

Fixes #1134 🦕

Removes the commit body and relative PR# from the commit message.

For example, for this commit: googleapis/synthtool@9763f20

`post-processor-changes.txt` would contain

```
build: enable npm for php/python builds

Source-Link: googleapis/synthtool@9763f20
```

instead of

```
build: enable npm for php/python builds (#1133)

* build: enable npm for php/python builds

* update comment
Source-Link: googleapis/synthtool@9763f20
```
Source-Link: googleapis/synthtool@e934b93
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:805e2e389eafefa5ed484c30b83a7a875e6b1c7ee125d812e8b01ecc531c3fac

fix: typos

Revert "feat: generate protos"

This reverts commit 0400361.

test: add mock server test for tagging

fix: reset buffered chunked value before retry (#1397)

* fix: reset buffered chunked value before retry

Retrying a stream of PartialResultSets must correctly reset to the last state that it
had when it received a PartialResultSet with a resume token. That means:
1. The pending value that should be merged with the next result should be reset to the
   pending value of the last PartialResultSet with a resume token.
2. The stream must correctly check for the existence of a resume token.
3. The stream must ensure that all values that are already in the pipeline must be
   handled before resetting the stream to the last resume token to ensure that no
   PartialResultSet is added twice to the stream.

Fixes #1392

Co-authored-by: Alex <7764119+AVaksman@users.noreply.github.com>

chore: release 5.9.2 (#1406)

:robot: I have created a release \*beep\* \*boop\*
---

* reset buffered chunked value before retry ([#1397](https://www.github.com/googleapis/nodejs-spanner/issues/1397)) ([da2ca7b](https://www.github.com/googleapis/nodejs-spanner/commit/da2ca7b15539119fada7869c206ad24460d8edfa)), closes [#1392](https://www.github.com/googleapis/nodejs-spanner/issues/1392)
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

docs: fix the broken link and improve jsdocs (#1402)

Fixes #1378

build(node): do not throw on deprecation (#1140) (#1408)

Refs googleapis/nodejs-service-usage#22
Source-Link: googleapis/synthtool@6d26b13
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:e59b73e911585903ee6b8a1c5246e93d9e9463420f597b6eb2e4b616ee8a0fee

fix(deps): require google-gax v2.17.0 (#1409)

chore: release 5.9.3 (#1410)

:robot: I have created a release \*beep\* \*boop\*
---

* **deps:** require google-gax v2.17.0 ([#1409](https://www.github.com/googleapis/nodejs-spanner/issues/1409)) ([080d82f](https://www.github.com/googleapis/nodejs-spanner/commit/080d82f455324d2010187904532032e7905e14ac))
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

feat: create instances with processing units (#1279)

Only one of processing units or nodes should be specified.

samples: indexing samples conform to rubric (#1347)

This PR Contains Indexing Samples that covers the new Canonical format.
It does not remove the old samples.
Towards #1295

fix: replace projectId placeholder in formatted names (#1407)

The `{{projectId}}` placeholder was only replaced in the names of the objects that were included in the requests, and not in the headers and formatted names. This would cause the `CLOUD_RESOURCE_HEADER` to be filled with the `{{projectId}}` placeholder instead of the actual value.

As the actual project ID is gotten through an async method, and the constructor and `instance` methods are synchronous, we can only replace the actual values once it is retrieved for the first request.

Fixes #1375

chore: release 5.10.0 (#1411)

:robot: I have created a release \*beep\* \*boop\*
---

* create instances with processing units ([#1279](https://www.github.com/googleapis/nodejs-spanner/issues/1279)) ([05c2135](https://www.github.com/googleapis/nodejs-spanner/commit/05c213522a32627186ad9b474b416c1b9996df1f))

* replace projectId placeholder in formatted names ([#1407](https://www.github.com/googleapis/nodejs-spanner/issues/1407)) ([4364d2b](https://www.github.com/googleapis/nodejs-spanner/commit/4364d2b25638384a6a1bea2e283b1219b4e5cdf3)), closes [#1375](https://www.github.com/googleapis/nodejs-spanner/issues/1375)
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

build: auto-approve renovate-bot PRs for minor updates (#1145) (#1413)

Source-Link: googleapis/synthtool@39652e3
Post-Processor: gcr.io/repo-automation-bots/owlbot-nodejs:latest@sha256:41d5457ff79c3945782ab7e23bf4d617fd7bf3f2b03b6d84808010f7d2e10ca2

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

feat(spanner): add leader_options to InstanceConfig and default_leader to Database (#1414)

* feat(spanner): add leader_options to InstanceConfig and default_leader to Database

PiperOrigin-RevId: 382409094

Source-Link: googleapis/googleapis@2c6e4dd

Source-Link: googleapis/googleapis-gen@ba248c5

* 🦉 Updates from OwlBot

See https://github.com/googleapis/repo-automation-bots/blob/master/packages/owl-bot/README.md

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

chore: release 5.11.0 (#1415)

:robot: I have created a release \*beep\* \*boop\*
---

* **spanner:** add leader_options to InstanceConfig and default_leader to Database ([#1414](https://www.github.com/googleapis/nodejs-spanner/issues/1414)) ([e67adc2](https://www.github.com/googleapis/nodejs-spanner/commit/e67adc281d603d741af49d957eff05fd4184d38e))
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

fix: add close method to Spanner client (#1416)

Adds a close method to the Spanner client that will (try to) clean up the resources
it has in use.

Towards #1306

chore: release 5.11.1 (#1417)

:robot: I have created a release \*beep\* \*boop\*
---

* add close method to Spanner client ([#1416](https://www.github.com/googleapis/nodejs-spanner/issues/1416)) ([69cd0b4](https://www.github.com/googleapis/nodejs-spanner/commit/69cd0b474ab6c836724813fd8bea88ec2e1ac9f5)), closes [#1306](https://www.github.com/googleapis/nodejs-spanner/issues/1306)
---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

fix: set transaction tag after begin

fix: include transactionTag for single commits
@pascaldelange
Copy link

Hi,

I'm not sure if it's related to this issue (or if it deserves its own), but I have been observing a memory leak with the Spanner client, in particular when working together with the debug agent but also to a lesser extent by itself.

The details are described in this issue about a memory leak with the debug agent.

@olavloite
Copy link
Contributor

Hi @pascaldelange Thanks for the link with the other issue. I've read through the thread and also your example application, and I don't think these two issues are directly related, as the issue in this thread is related to a memory leak in the (generated) Spanner client when a client is opened and closed multiple times during the lifetime of the application. As far as I can tell from your sample application, you only open a single client and keep that one open during the entire lifetime of the application.

It is however interesting to read that you experience a 10MB/hour increase when you run the application without the debug agent. How exactly are you measuring the increase? As in; do you force a garbage collect before measuring, or could it be that the increase is just caused by garbage that has not yet been cleaned up? Will the application eventually run out of memory if you run it long enough in a memory-limited environment?

@pascaldelange
Copy link

Hi @olavloite,
I was measuring it by looking at the memory usage stats of the GAE monitoring. The number 10Mb/hour is based in a test instance (based on the code shared) where only the liveness calls are made, it might be different with real traffic (in production we only observed it with the debug agent on).
In the test case I have not waited until the application runs out of memory (which at this rate might have taken a while), but in prod with debug agent on (same symptoms but faster increase) the instance would run out of memory every few hours and restart, causing transient errors on the service in the event.

@olavloite
Copy link
Contributor

@pascaldelange Thanks for the additional information. My questions are mainly aimed at determining whether there actually is a memory leak when you are only executing the liveness calls without the debug agent. The memory leak when using the debug agent seems quite clear, and also the (small) memory leak when opening and closing a Spanner client multiple times is well documented.
I'm not sure about the case where only a SELECT 1 query is executed. An increase in memory usage over a limited time, without the application actually running out of memory, might be just an indication of normal usage:

  1. Using the Spanner client will automatically initialize a session pool. That session pool will consume some memory.
  2. Executing queries etc. will always require some memory. Not all of that memory will be reclaimed directly by the garbage collector, so a temporary increase in memory consumption is not necessarily an indication of a memory leak.

Would it be possible for you to run the same test script on your environment (the one without the Debug Agent), but with an explicit garbage collect after each query? See this Stack Overflow post for a good explanation on how to run the garbage collector from code.

@pascaldelange
Copy link

Hi,
I can try that (it will have to wait for a couple of weeks due to vacations though).
I'm not sure if that's sufficient to indicate a memory leak, but at least the linear increase in memory usage was only visible after changing to node 12+ (not present with node 10).
I'll keep you posted.

@olavloite
Copy link
Contributor

With the latest changes in the generated clients the memory leak when opening and closing a new client repeatedly has been significantly reduced, but unfortunately not completely eliminated. Running a simple script that opens a simple generated client, executes one request with it and then closes the client will now leak approx 2MB after 100 iterations. That is approx 1/10th of the original problem.

Test script

const {Spanner, v1} = require('@google-cloud/spanner');

const REPETITIONS = 100;
console.log(`Running ${REPETITIONS} times`);

function main() {
    async function test() {
        for (let i = 0; i < REPETITIONS; i++) {
            await useGapic();
            global.gc();
            const used = process.memoryUsage().heapUsed / 1024 / 1024;
            console.log(`${i}: Current mem usage ${Math.round(used * 100) / 100} MB`);
        }
    }

    async function useGapic() {
        const client = new v1.InstanceAdminClient({
            projectId: 'my-project',
            keyFile: '/path/to/my-key.json',
        });
        await client.listInstanceConfigs({
            parent: 'projects/my-project',
        });
        await client.close();
    }
    test();
}

process.on('unhandledRejection', err => {
    console.error(err.message);
    process.exitCode = 1;
});
main();

Results for 100 iterations

Running 100 times
0: Current mem usage 17.28 MB
1: Current mem usage 17.34 MB
2: Current mem usage 17.43 MB
3: Current mem usage 17.53 MB
4: Current mem usage 17.57 MB
5: Current mem usage 17.6 MB
6: Current mem usage 17.63 MB
7: Current mem usage 17.66 MB
8: Current mem usage 17.69 MB
9: Current mem usage 17.71 MB
...
90: Current mem usage 19.01 MB
91: Current mem usage 19.03 MB
92: Current mem usage 19.04 MB
93: Current mem usage 19.06 MB
94: Current mem usage 19.07 MB
95: Current mem usage 19.08 MB
96: Current mem usage 19.1 MB
97: Current mem usage 19.1 MB
98: Current mem usage 19.11 MB
99: Current mem usage 19.13 MB

I've been trying out a number of different things, but I've not been able to exactly pinpoint the problem, although the following is clear:

  1. Opening and closing a generated gapic client without executing any requests will not leak any memory.
  2. Opening a gapic client, executing one single simple request and then closing the client will leak a bit of memory. It does not seem to make any real difference which request is being executed.

@alicejli If you have any additional ideas where the last bit might be coming from, then I'm all ears.

@skuruppu skuruppu added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. labels Sep 8, 2021
@skuruppu
Copy link
Contributor

skuruppu commented Sep 8, 2021

Downgrading to P2 since most of the memory leak is now fixed but there's still some room for improvement.

@alicejli
Copy link

Hi @olavloite,

Sorry I missed this! I have a question out to the grpc team to see if they're able to pinpoint what's going on, as I am able to replicate your findings and the memory leakage seems to be consistent across all methods (so it doesn't seem to be a client library issue). Will update here with any findings.

Thanks,

Alice

@alicejli
Copy link

It looks like grpc is likely not the cause of this memory leak after some discussion and testing with the grpc team (when running a script with a "pure" grpc call a million times, we found that the memory was regularly being garbage collected and was more fluctuating than growing).

The other potential lead on this memory leak issue is it seems there may be an issue with the google-cloud/common package (here: googleapis/nodejs-common#664).

Will comment back here with any updates on findings there.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 3, 2021
@henrybell
Copy link

I've been able to reproduce a similar issue with the following code snippet, heavily "inspired" by that of @olavloite above, and which is intended to roughly emulate what my application is doing:

/*
 * Run with:
 * $ node --expose-gc index.js
 */

'use strict';

const {Spanner} = require('@google-cloud/spanner');

const REPETITIONS = 3600;

function main() {

  async function spannerClient() {
    
    const projectId = 'my-project-id'
    const instanceId = 'autoscale-standalone'
    const databaseName = 'test-database'
    const tableName = 'spannerAutoscaler'

    /*
      Create DB with:

      CREATE TABLE spannerAutoscaler (
        id STRING(MAX),
        lastScalingTimestamp TIMESTAMP,
        createdOn TIMESTAMP,
        updatedOn TIMESTAMP,
      ) PRIMARY KEY (id)

     */

    var row = {
      id: 'testDbEntry',
      lastScalingTimestamp: Spanner.timestamp(Date.now()),
      createdOn: Spanner.timestamp(Date.now()),
      updatedOn: Spanner.timestamp(Date.now()),
    };
    
    const client = new Spanner({projectId: projectId});
    const database = client.instance(instanceId).database(databaseName);
    const table = database.table(tableName);
    await table.upsert(row);
    await database.close();
  }

  async function testSpannerClient() {
    for (let i = 0; i < REPETITIONS; i++) {
      await spannerClient();
      global.gc();
      const used = process.memoryUsage().heapUsed / 1024 / 1024;
      console.log(`${i}: Current mem usage ${Math.round(used * 100) / 100} MB`);
    }
  }

  try {
    testSpannerClient();
  } catch (err) {
    console.error(err.message);
  }
}

main();

In a similar way I see increasing resource usage over time, though the pace does seem to slow somewhat:

0: Current mem usage 24.85 MB
1: Current mem usage 24.99 MB
2: Current mem usage 25.11 MB
3: Current mem usage 25.18 MB
4: Current mem usage 25.29 MB
5: Current mem usage 25.39 MB
.....
595: Current mem usage 40.8 MB
596: Current mem usage 40.82 MB
597: Current mem usage 40.84 MB
598: Current mem usage 40.88 MB
599: Current mem usage 40.88 MB
600: Current mem usage 40.89 MB

Any insight or assistance you might be able to provide would be most helpful, and my apologies if I am missing something.

@surbhigarg92
Copy link
Contributor

Hi @henrybell Thanks for the detailed information. I am able to reproduce this issue and I am working on it.

Please hang on as we are analysing this memory leak. I will update as soon as I have some findings.

@surbhigarg92
Copy link
Contributor

Hi @henrybell Our findings indicate that leak is coming from autogenerated code and there was an old bug in the gapic-generator which needs to be opened.

I will follow up with the gapic team to reopen and accept the issue.

@henrybell
Copy link

Thanks @surbhigarg92 for the update and for the work so far! I subscribed to the bug you linked to follow along.

@surbhigarg92
Copy link
Contributor

@henrybell You can subscribe to the new bug, I have created a new bug for the same issue

@henrybell
Copy link

Thanks @surbhigarg92 -- subscribed.

@surbhigarg92
Copy link
Contributor

surbhigarg92 commented Mar 28, 2023

@henrybell New version of grpc-js has been published which includes a fix for memory leak.

As its a minor release version upgrade, there is no changes required in NodeJs-spanner client library. Delete the node_modules folder and package.lock.json file and do npm install .

There is a minor change required in the code snippet provided by you. We need to close the client.

Add client.close() after await database.close() .

Can you please try above steps and let me know if the issue is resolved ?

@henrybell
Copy link

Here's a chart showing the results -- as you can see it's a dramatic improvement. This is the same code (with the addition of the client.close() call), same version of the NodeJS Spanner client, but with grpc-js versions 1.8.12 (red) vs 1.8.13 (green).

spanner-client

There is still a very slight upward trend over time, but this is small as you can see from the scale on the y axis (this chart shows only 1.8.13 for clarity):

spanner2

I will incorporate the fix into the Spanner Autoscaler and run some soak tests to further validate. Thanks!

@surbhigarg92
Copy link
Contributor

@henrybell Since most of the memory leak issue is resolved. I am closing this issue.
Thanks a lot for reaching out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

8 participants