for every huge page resource, we need to remove it from allocatable memory when Updating Node Allocatable limit across pods #96255

mysunshine92 · 2020-11-05T09:31:29Z

What type of PR is this?
/kind bug

What this PR does / why we need it:

From the operating system's cgroup perspective, memory usage does not include huge page usage, so for every huge page resource we need to subtract it from allocatable memory when updating the Node Allocatable limit across pods.
See:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/node_container_manager_linux.go#L184
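A minimal sketch of the adjustment described above, not the code in this PR. It assumes a simplified `resourceList` type (a plain map of byte counts standing in for `v1.ResourceList`) and a hypothetical helper named `subtractHugePages`; only the `hugepages-` resource name prefix is taken from Kubernetes itself.

```go
package main

import (
	"fmt"
	"strings"
)

// hugePagesPrefix mirrors the "hugepages-" prefix used by Kubernetes
// huge page resource names such as "hugepages-2Mi".
const hugePagesPrefix = "hugepages-"

// resourceList is a simplified stand-in for v1.ResourceList: resource
// name mapped to a quantity in bytes. It is an assumption of this sketch.
type resourceList map[string]int64

// subtractHugePages (hypothetical helper) returns a copy of rl in which
// the "memory" entry is reduced by the sum of all huge page entries,
// clamped at zero. The input map is left unmodified.
func subtractHugePages(rl resourceList) resourceList {
	out := make(resourceList, len(rl))
	var hugePages int64
	for name, v := range rl {
		out[name] = v
		if strings.HasPrefix(name, hugePagesPrefix) {
			hugePages += v
		}
	}
	if mem, ok := out["memory"]; ok {
		mem -= hugePages
		if mem < 0 {
			mem = 0
		}
		out["memory"] = mem
	}
	return out
}

func main() {
	const gi = int64(1) << 30
	in := resourceList{
		"memory":        16 * gi,
		"hugepages-2Mi": 2 * gi,
		"hugepages-1Gi": 2 * gi,
	}
	out := subtractHugePages(in)
	fmt.Println(out["memory"] / gi) // prints 12
}
```

Returning a copy rather than mutating the input keeps the sketch free of side effects on the caller's capacity map; whether the real change should mutate in place is a separate design decision.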

Which issue(s) this PR fixes:

Fixes #84426

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2020-11-05T09:31:37Z

@mysunshine92: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mysunshine92 · 2020-11-05T10:26:35Z

/test pull-kubernetes-e2e-kind

mysunshine92 · 2020-11-05T10:26:39Z

/test pull-kubernetes-e2e-gce-ubuntu-containerd

mysunshine92 · 2020-11-05T12:39:25Z

cc @derekwaynecarr

mysunshine92 · 2020-11-05T14:34:13Z

/assign @derekwaynecarr

mysunshine92 · 2020-11-06T03:59:27Z

cc @vishh

mysunshine92 · 2020-11-06T10:59:54Z

cc @derekwaynecarr

mysunshine92 · 2020-11-08T13:51:10Z

cc @Random-Liu

AlexeyPerevalov · 2020-11-12T14:07:34Z

/cc

For pods in the Guaranteed and Burstable QoS classes this change fixes a minor issue, since the bug is hidden by kube-scheduler, which takes v1.Node.Status.Allocatable into account, as mentioned in #84426 (comment).
But if a pod with BestEffort QoS starts on a worker node, only the top-level cgroup limits its memory usage, in our case memory/kubepod.slice/. That limit is not the allocatable value but the capacity (which is higher and includes the huge page amount). So the workload in the pod is more likely to run out of memory because memory is exhausted on the host than because it hit the cgroup limit.

So it's better to fix it.
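To put numbers on the BestEffort case, here is a rough illustration with made-up figures. The `node` type and its methods are hypothetical, and the sketch assumes the top-level limit is currently computed as capacity minus reservations, without subtracting huge pages.

```go
package main

import "fmt"

const gi = int64(1) << 30

// node holds hypothetical figures for one worker node, in bytes.
type node struct {
	capacity  int64 // total memory reported by the node
	hugePages int64 // memory pre-allocated as huge pages
	reserved  int64 // system-reserved plus kube-reserved
}

// limitWithoutFix is the top-level pod cgroup memory limit when huge
// pages are not subtracted (the behaviour assumed in this sketch).
func (n node) limitWithoutFix() int64 { return n.capacity - n.reserved }

// limitWithFix is the same limit once huge pages are subtracted.
func (n node) limitWithFix() int64 { return n.capacity - n.reserved - n.hugePages }

// overcommit is how far BestEffort pods could exceed the regular memory
// really available to them before the cgroup limit is reached.
func (n node) overcommit() int64 { return n.limitWithoutFix() - n.limitWithFix() }

func main() {
	n := node{capacity: 16 * gi, hugePages: 4 * gi, reserved: 1 * gi}
	fmt.Println(n.limitWithoutFix() / gi) // prints 15
	fmt.Println(n.limitWithFix() / gi)    // prints 11
	fmt.Println(n.overcommit() / gi)      // prints 4
}
```

With these figures the cgroup would allow 15Gi while only 11Gi of regular memory exists for pods, so the host can run out of memory before the limit is ever reached.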

AlexeyPerevalov · 2020-11-12T14:13:29Z

pkg/kubelet/cm/node_container_manager_linux.go

@@ -197,6 +198,20 @@ func (cm *containerManagerImpl) getNodeAllocatableAbsoluteImpl(capacity v1.Resou
 		}
 		result[k] = value
 	}
+
+	for k, v := range result {


nodestatus already does the same job in its setters.go. Since nodestatus uses the cm package, maybe move this logic into common code in cm, because the duplication does not look good as it stands.

k8s-ci-robot · 2020-11-12T14:14:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mysunshine92
To complete the pull request process, please assign derekwaynecarr after the PR has been reviewed.
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/kubelet/cm/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ehashman · 2021-02-04T00:28:24Z

/hold

while waiting on author

fejta-bot · 2021-05-05T00:43:02Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

fejta-bot · 2021-06-04T01:12:45Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

fejta-bot · 2021-07-04T01:49:51Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

k8s-ci-robot · 2021-07-04T01:49:56Z

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Commit 018b465: for every huge page resource, we need to remove it from allocatable memory when Updating Node Allocatable limit across pods

k8s-ci-robot added the needs-priority, area/kubelet, and sig/node labels and removed the do-not-merge/needs-sig label Nov 5, 2020

k8s-ci-robot requested review from Random-Liu and vishh November 5, 2020 09:32

k8s-ci-robot assigned derekwaynecarr Nov 5, 2020

k8s-ci-robot requested a review from AlexeyPerevalov November 12, 2020 14:07

AlexeyPerevalov suggested changes Nov 12, 2020

ehashman added this to Needs Reviewer in SIG Node PR Triage Jan 6, 2021

ehashman moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Jan 12, 2021

k8s-ci-robot added the do-not-merge/hold label Feb 4, 2021

odinuge mentioned this pull request Mar 26, 2021

Delete the hugePages capacity from the memory resource capacity #99943

Closed

k8s-ci-robot added the lifecycle/stale label May 5, 2021

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 4, 2021

k8s-ci-robot closed this Jul 4, 2021

SIG Node PR Triage automation moved this from Waiting on Author to Done Jul 4, 2021
