
OCPBUGS-4210: UPSTREAM: 116833: Fix memory leak in kubelet volume_manager populator processedPods #1541

Conversation

mpatlasov

`findAndRemoveDeletedPods()` processes only the pods in the volume_manager cache: `dswp.desiredStateOfWorld.GetVolumesToMount()`. `podWorker` calls the volume_manager's `WaitForUnmount()` asynchronously. If that call arrives after the populator has already cleaned up the pod's resources, an entry is added to `processedPods` that will never be removed. Let's clean up such entries when they no longer have a pod and are marked for deletion.
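
For illustration, here is a minimal, self-contained Go sketch of the cleanup idea. This is not the actual kubelet patch; `populator`, `livePods`, and `deletedPods` are hypothetical stand-ins for the volume_manager's internal state, kept only to show why orphaned `processedPods` entries can be reclaimed:

```go
package main

import "fmt"

// Hypothetical stand-ins for kubelet internals: processedPods is keyed by
// pod UID (mirroring dswp.pods.processedPods); livePods models "pod still
// known to the pod manager"; deletedPods models "pod marked for deletion".
type podUID string

type populator struct {
	processedPods map[podUID]bool
	livePods      map[podUID]struct{}
	deletedPods   map[podUID]struct{}
}

// cleanupOrphanedEntries drops processedPods entries whose pod is gone and
// marked for deletion. findAndRemoveDeletedPods() would never reach these
// entries, because such pods contribute no volumes to GetVolumesToMount().
func (p *populator) cleanupOrphanedEntries() {
	for uid := range p.processedPods {
		_, alive := p.livePods[uid]
		_, deleted := p.deletedPods[uid]
		if !alive && deleted {
			delete(p.processedPods, uid) // reclaim the leaked entry
		}
	}
}

func main() {
	p := &populator{
		processedPods: map[podUID]bool{"pod-a": true, "pod-b": true},
		livePods:      map[podUID]struct{}{"pod-a": {}},
		deletedPods:   map[podUID]struct{}{"pod-b": {}},
	}
	p.cleanupOrphanedEntries()
	fmt.Println(len(p.processedPods)) // 1 -- the orphaned "pod-b" entry is gone
}
```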

What type of PR is this?

/kind bug
/sig storage

What this PR does / why we need it:

Removes a memory leak in the kubelet volume manager.

Which issue(s) this PR fixes:

Fixes kubernetes#116831

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@openshift-ci-robot

@mpatlasov: Jira Issue OCPBUGS-4210 is in a security level that is not in the allowed security levels for this repo.
Allowed security levels for this repo are:

  • Red Hat Employee
  • default


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Apr 12, 2023
@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 12, 2023
@openshift-ci

openshift-ci bot commented Apr 12, 2023

@mpatlasov: The label(s) sig/storage cannot be applied, because the repository doesn't have them.


@openshift-ci-robot

@mpatlasov: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci

openshift-ci bot commented Apr 12, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mpatlasov
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mpatlasov
Author

/retest

3 similar comments

@mpatlasov mpatlasov force-pushed the fix-memleak-in-kubelet-volumemanager-openshift branch from 3c089da to 18baf8a on April 20, 2023 21:32
@openshift-ci-robot openshift-ci-robot added the backports/validated-commits Indicates that all commits come to merged upstream PRs. label Apr 20, 2023
@openshift-ci-robot

@mpatlasov: the contents of this pull request could be automatically validated.

The following commits are valid:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot openshift-ci-robot removed the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Apr 20, 2023
@mpatlasov
Author

/retest

10 similar comments

@openshift-ci

openshift-ci bot commented May 1, 2023

@mpatlasov: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade | 18baf8a | link | false | `/test 4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade` |



@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 30, 2023
@mpatlasov
Author

/test 4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade

@openshift-ci

openshift-ci bot commented Aug 1, 2023

@mpatlasov: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test artifacts
  • /test configmap-scale
  • /test e2e-aws-jenkins
  • /test e2e-aws-ovn-cgroupsv2
  • /test e2e-aws-ovn-crun
  • /test e2e-aws-ovn-downgrade
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-serial
  • /test e2e-aws-ovn-upgrade
  • /test e2e-azure-ovn-upgrade
  • /test e2e-gcp
  • /test e2e-gcp-ovn-upgrade
  • /test images
  • /test integration
  • /test k8s-e2e-conformance-aws
  • /test k8s-e2e-gcp-ovn
  • /test k8s-e2e-gcp-serial
  • /test unit
  • /test verify
  • /test verify-commits

The following commands are available to trigger optional jobs:

  • /test e2e-agnostic-ovn-cmd
  • /test e2e-aws
  • /test e2e-aws-csi
  • /test e2e-aws-disruptive
  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-aws-single-node
  • /test e2e-azure
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-csi-cinder
  • /test e2e-openstack-csi-manila
  • /test e2e-vsphere
  • /test k8s-e2e-aws
  • /test k8s-e2e-aws-ovn-serial
  • /test k8s-e2e-gcp-five-control-plane-replicas

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-kubernetes-master-e2e-agnostic-ovn-cmd
  • pull-ci-openshift-kubernetes-master-e2e-aws-csi
  • pull-ci-openshift-kubernetes-master-e2e-aws-ovn-cgroupsv2
  • pull-ci-openshift-kubernetes-master-e2e-aws-ovn-crun
  • pull-ci-openshift-kubernetes-master-e2e-aws-ovn-fips
  • pull-ci-openshift-kubernetes-master-e2e-aws-ovn-serial
  • pull-ci-openshift-kubernetes-master-e2e-gcp
  • pull-ci-openshift-kubernetes-master-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-kubernetes-master-images
  • pull-ci-openshift-kubernetes-master-integration
  • pull-ci-openshift-kubernetes-master-k8s-e2e-aws-ovn-serial
  • pull-ci-openshift-kubernetes-master-k8s-e2e-conformance-aws
  • pull-ci-openshift-kubernetes-master-k8s-e2e-gcp-ovn
  • pull-ci-openshift-kubernetes-master-k8s-e2e-gcp-serial
  • pull-ci-openshift-kubernetes-master-unit
  • pull-ci-openshift-kubernetes-master-verify
  • pull-ci-openshift-kubernetes-master-verify-commits


@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 1, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci

openshift-ci bot commented Oct 1, 2023

@openshift-bot: Closed this PR.


@openshift-ci openshift-ci bot closed this Oct 1, 2023
Labels
  • backports/validated-commits — Indicates that all commits come to merged upstream PRs.
  • kind/bug — Categorizes issue or PR as related to a bug.
  • lifecycle/rotten — Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Development

Successfully merging this pull request may close these issues.

Memory leak in kubelet volume_manager populator processedPods