New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes ingress memory leak #1694

Closed
sgrankin opened this Issue Nov 7, 2017 · 8 comments

Comments

Projects
None yet
4 participants
@sgrankin
Copy link
Contributor

sgrankin commented Nov 7, 2017

Issue Type: Bug report

What happened:
Slow memory leak while using linkerd as a kubernetes ingress.

What you expected to happen:
No memory leak :)

How to reproduce it (as minimally and precisely as possible):
Appears to consistently reproduce in our configuration:

  • Linkerd 1.3.1 docker image (original & modified with updated java8 jdk)
  • Configuration as in https://gist.github.com/sgrankin/360471146e7c273fecce48e859d498e5 and with several 'default' and non-'default' namespace ingress configurations inside kubernetes.
  • Not sure if there's a particular trigger since the start of the leak appears slow.
  • Typical heap size (note the Y axis is logarithmic):

screen shot 2017-11-07 at 17 03 05

Anything else we need to know?:

  • I have heap dumps of a few hours after process start and then 15 hours later that I can provide (directly—I don't think there's anything confidential in there but would prefer to avoid posting publicly).
  • Via the heap dumps, I believe a (the?) potential leak may be here:
    // TODO: this would be much less ugly if the type of the scan was [W, Boolean]
    I think the closure is holding a reference to this, which is holding a reference to as: AsyncStream[T], keeping the full stream in memory. (Not sure why the growth is non-linear though...)

Environment:

  • linkerd/namerd version, config files:
    1.3.1, config attached above.
  • Platform, version, and config files (Kubernetes, DC/OS, etc):
    v1.7.6+coreos.0
  • Cloud provider or hardware configuration:
    AWS
@hawkw

This comment has been minimized.

Copy link
Member

hawkw commented Nov 7, 2017

Hi @sgrankin, thanks for reporting! I think the theory you've proposed seems quite possible, but would love to take a look at your heap dumps to confirm.

@sgrankin

This comment has been minimized.

Copy link
Contributor

sgrankin commented Nov 8, 2017

@hawkw emailed to you @buoyant.io. Thanks!

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 20, 2017

@sgrankin

This comment has been minimized.

Copy link
Contributor

sgrankin commented Nov 20, 2017

@hawkw I ended up trying out some fixes; the AsyncStream seems to have been the leaked object, but the leak I theorized was not the (only?) one. I couldn't track down how it was leaking though, so I hacked up a non-AsyncStream based variant of the watch code. It's been running for 2 days on our dev deployment with steady heap usage and no stale endpoint issues (so far).

The change is here: sgrankin@a454fa0
If this looks like a good approach to you, I can clean it up and submit a PR.

@hawkw

This comment has been minimized.

Copy link
Member

hawkw commented Nov 20, 2017

Hi @sgrankin, once again, thanks for your help. We'll take a look at your changes and get back to you on that as soon as possible!

@wmorgan

This comment has been minimized.

Copy link
Member

wmorgan commented Nov 20, 2017

@sgrankin yes, thank you for digging into this!

@adleong adleong added this to Bug in Linkerd Roadmap Nov 21, 2017

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 25, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694
@adleong

This comment has been minimized.

Copy link
Member

adleong commented Nov 25, 2017

@sgrankin this looks great! I've skimmed your change and I agree with the general approach. I'd like to review it in greater detail so if you could create a PR against Linkerd that would be greatly appreciated and we can go through the nitty-gritty details there.
Thank you!!!

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 27, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 27, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694
@sgrankin

This comment has been minimized.

Copy link
Contributor

sgrankin commented Nov 27, 2017

Sorry for the delay—just finally got some time this weekend to clean up the diff. PR opened at #1714

@hawkw

This comment has been minimized.

Copy link
Member

hawkw commented Nov 27, 2017

🙌 thanks @sgrankin!

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 28, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694

@hawkw hawkw added this to the 1.3.3 milestone Nov 28, 2017

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 29, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 29, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 29, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694

sgrankin added a commit to sgrankin/linkerd that referenced this issue Nov 30, 2017

Fix AsyncStream leak in kubernetes watch (linkerd#1694)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes linkerd#1694

@siggy siggy closed this in #1714 Nov 30, 2017

siggy added a commit that referenced this issue Nov 30, 2017

Fix AsyncStream leak in kubernetes watch (#1694) (#1714)
k8s watch returned an AsyncStream, the head of which was leaking (as
determined via heap dumps), leading to significant leaks over 24-48hrs.

Since the AsyncStream was not strictly necessary, rewrite `watch` as
an update function suitable for Var.async (as used by `activity`).
- The change attempts to preserve previous bug fixes and comments
  documenting them.
- The AsyncStream enrichments and their tests have been removed as they
  are no longer used.
- A `watch` helper was added to ApiTest to hide the more complicated
  usage pattern around .watch.
- Fix orphaned infinite-retry watches in ingress tests by adding
  no-data /watch/ handlers.

Fixes #1694

@adleong adleong removed this from Bug in Linkerd Roadmap Dec 5, 2017

tbrooks8 pushed a commit to tbrooks8/linkerd that referenced this issue Dec 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment