
Fix exclusive CPU allocations being deleted at container restart #90377

Merged
1 commit merged into kubernetes:master on Apr 27, 2020

Conversation

@cbf123 (Contributor) commented Apr 22, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

The expectation is that exclusive CPU allocations happen at pod
creation time. When a container restarts, it should not have its
exclusive CPU allocations removed, and it should not need to
re-allocate CPUs.

There are a few places in the current code that look for containers that have exited and call CpuManager.RemoveContainer() to clean them up. This deletes any exclusive CPU allocations for that container, and if the container restarts within the same pod it ends up running in the default cpuset rather than on the CPUs that should be exclusively its own.

Removing those calls and adding resource cleanup at allocation time should get rid of the problem.
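To illustrate the failure mode and the direction of the fix, here is a simplified sketch. It uses stand-in types only, not the kubelet's actual InternalContainerLifecycle or CPUManager code:

// Sketch only: simplified stand-ins for the kubelet's container lifecycle
// hooks and CPUManager, illustrating the behavior change described above.
package main

import "fmt"

type cpuManager interface {
    RemoveContainer(containerID string) error
}

type lifecycleHooks struct {
    cpuManager cpuManager
}

// Old behavior: the post-stop hook eagerly freed the container's exclusive
// CPUs, even when the container was just restarting inside a still-running
// pod, so the restarted container fell back to the default cpuset.
func (i *lifecycleHooks) postStopContainerOld(containerID string) error {
    return i.cpuManager.RemoveContainer(containerID)
}

// New behavior: the post-stop hook leaves CPU assignments alone; stale
// assignments are reclaimed lazily at allocation/reconcile time once the
// owning pod is actually gone.
func (i *lifecycleHooks) postStopContainer(containerID string) error {
    return nil
}

type loggingCPUManager struct{}

func (m *loggingCPUManager) RemoveContainer(containerID string) error {
    fmt.Println("freeing exclusive CPUs for", containerID)
    return nil
}

func main() {
    hooks := &lifecycleHooks{cpuManager: &loggingCPUManager{}}
    _ = hooks.postStopContainerOld("ctr-1") // old: frees CPUs on every exit
    _ = hooks.postStopContainer("ctr-1")    // new: no-op; cleanup is deferred
}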

Which issue(s) this PR fixes:

Fixes #90303

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixes regression in CPUManager that caused freeing of exclusive CPUs at incorrect times 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 22, 2020
@k8s-ci-robot (Contributor)

Hi @cbf123. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cbf123 (Contributor, Author) commented Apr 22, 2020

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. area/kubelet and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 22, 2020
@klueska (Contributor) commented Apr 22, 2020

/assign @klueska

@klueska (Contributor) commented Apr 22, 2020

We had a long discussion about this here for context:
https://kubernetes.slack.com/archives/C0BP8PW9G/p1587155932390500

@klueska (Contributor) left a comment

Minor changes requested. However, it would be nice to see a test or two added that trigger the bug before the change but pass after it. That way regressions like this won't happen in the future.
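Something along these lines, for instance. This is only a toy sketch of the shape of such a test, with a plain map standing in for the CPUManager's assignment state rather than the actual kubelet test fixtures:

package main

import "testing"

// Toy stand-in for exclusive CPU assignments: podUID -> container -> cpuset.
type toyAssignments map[string]map[string]string

// Buggy behavior being guarded against: freeing assignments whenever a
// container exits, even though its pod is still running.
func onContainerExitBuggy(a toyAssignments, podUID, container string) {
    delete(a[podUID], container)
}

// Fixed behavior: a container exit leaves assignments untouched; they are
// only freed once the owning pod itself is gone.
func onContainerExitFixed(a toyAssignments, podUID, container string) {}

func TestExclusiveCPUsSurviveContainerRestart(t *testing.T) {
    a := toyAssignments{"pod-a": {"ctr-1": "2-3"}}

    // Simulate a container restart within the same, still-running pod.
    onContainerExitFixed(a, "pod-a", "ctr-1")

    if got := a["pod-a"]["ctr-1"]; got != "2-3" {
        t.Fatalf("exclusive CPUs lost across restart: got %q, want %q", got, "2-3")
    }
}

func TestBuggyCleanupLosesExclusiveCPUs(t *testing.T) {
    a := toyAssignments{"pod-a": {"ctr-1": "2-3"}}

    // With the old behavior the same restart drops the assignment.
    onContainerExitBuggy(a, "pod-a", "ctr-1")

    if _, ok := a["pod-a"]["ctr-1"]; ok {
        t.Fatalf("expected the buggy cleanup to drop the assignment")
    }
}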

}
// We can't safely call i.cpuManager.RemoveContainer(containerID)
// here. Regular containers could be in the process of restarting, and
// RemoveContainer() would remove any allocated exclusive CPUs that the
Contributor

Can you fix the indenting here?

Contributor Author

Oops... my editor was set to use spaces instead of tabs.

}
// We can't safely call i.cpuManager.RemoveContainer(containerID)
// here. Regular containers could be in the process of restarting, and
// RemoveContainer() would remove any allocated exclusive CPUs that the
@klueska (Contributor) commented Apr 22, 2020

I wouldn't bother with this comment here. I know we used to have code here that called RemoveContainer(), but it's more confusing to see the comment out of context than to not see it at all -- especially since there is no path for calling RemoveContainer() from any external hooks anymore.

I actually plan to go through the TopologyManager after this and remove its callout as well -- thus removing the need for this hook (and the InternalContainerLifecycle interface) altogether.

@klueska (Contributor) commented Apr 22, 2020

As some more background, this is a regression due to the refactoring of the CPUManager that happened as part of:

https://github.com/kubernetes/kubernetes/pull/87759/commits

As part of that refactoring, the CPUManager moved to a model where CPUs are now allocated across all containers at pod admission time rather than as each individual container comes online. Since all CPUs are allocated at pod admission time, they can only properly be freed back to the shared pool at pod deletion time (or lazily after the pod is already gone).

In the old model, CPUs were allocated to each container as it came online (as part of a container pre-start-hook) so we were free to (and in fact required to) free them as each container exited (as part of a post-stop-hook). This is problematic in the new model, however, since CPUs are now assumed to retain their assignment to a container for the lifetime of a pod. As an oversight, the logic was left in place to do this freeing on each container exit instead of waiting for pod deletion. This causes problems (for example) when a container restarts without causing its bounding pod to be restarted.

This patch updates the CPUManager to make sure that CPUs are only ever freed back to the shared pool after a pod has been deleted. It does this by lazily calling the existing removeStaleState() function at appropriate times instead of directly calling RemoveContainer() at container exit. The removeStaleState() function itself walks through the CPUManager state and frees any CPUs not bound to actively running pods.

We now make this call at three locations in the code:

  1. At the top of the GetTopologyHints() call, just before a new pod runs its logic to generate hints for the TopologyManager. This ensures it will have access to any "newly" available CPUs from terminated pods when generating these hints.

  2. At the top of the Allocate() call, just before a new pod runs its logic to allocate CPUs. This ensures it will have access to any "newly" available CPUs from terminated pods when performing new allocations.

  3. Periodically, as part of the existing reconcileState() function. This guarantees that CPUs will be freed from terminated pods at least once per reconcile period (currently 10 seconds) in the case that no new pods enter the system and trigger the removeStaleState() function as part of Allocate().

In theory, we don't need (2) when the TopologyManager is enabled, because (1) and (2) are in the same synchronous loop. However, not all setups enable the TopologyManager, so it is required in both places for now.

In the future we should consider adding a hook for pod deletion instead of doing the lazy cleanup as part of (1) and (2). We will likely always do the lazy cleanup as part of (3), however, just to make sure we are always in a sane state.
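For illustration, the lazy-cleanup pattern described above boils down to something like the following sketch. This is not the actual CPUManager implementation; the state, policy, and pod-tracking details are heavily simplified stand-ins:

// Sketch of the lazy stale-state sweep described above. Types and helpers
// here (assignments map, activePods) are simplified stand-ins for the real
// CPUManager state and policy, not the actual kubelet code.
package main

import "fmt"

type manager struct {
    sourcesReadyAllReady func() bool
    activePods           func() []string            // active pod UIDs
    assignments          map[string]map[string]bool // podUID -> container -> has exclusive CPUs
}

func (m *manager) removeStaleState() {
    // Until all pod sources are ready we might not see every active pod,
    // so freeing anything now could strip CPUs from a pod we simply have
    // not observed yet.
    if !m.sourcesReadyAllReady() {
        return
    }
    active := map[string]bool{}
    for _, uid := range m.activePods() {
        active[uid] = true
    }
    // Free assignments belonging to pods that are no longer active.
    for podUID, containers := range m.assignments {
        if active[podUID] {
            continue
        }
        for name := range containers {
            fmt.Printf("freeing exclusive CPUs for %s/%s back to the shared pool\n", podUID, name)
            delete(containers, name)
        }
        delete(m.assignments, podUID)
    }
}

// The sweep is invoked lazily: before generating topology hints, before a new
// allocation, and periodically from the reconcile loop.
func (m *manager) GetTopologyHints() { m.removeStaleState() /* ...then compute hints... */ }
func (m *manager) Allocate()         { m.removeStaleState() /* ...then allocate CPUs... */ }
func (m *manager) reconcileState()   { m.removeStaleState() /* ...then reconcile containers... */ }

func main() {
    m := &manager{
        sourcesReadyAllReady: func() bool { return true },
        activePods:           func() []string { return []string{"pod-a"} },
        assignments: map[string]map[string]bool{
            "pod-a": {"ctr-1": true}, // still active: kept even if ctr-1 restarts
            "pod-b": {"ctr-2": true}, // pod deleted: freed on the next sweep
        },
    }
    m.Allocate()
}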

@klueska (Contributor) commented Apr 23, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 23, 2020
Comment on lines 43 to 48
type mockSourcesReady struct{}

func (s *mockSourcesReady) AddSource(source string) {}

func (s *mockSourcesReady) AllReady() bool { return false }

Contributor

There already exists a sourcesReadyStub you can use here instead of this.

Contributor

Removing this should fix your gofmt error in the pull-kubernetes-verify test as well.

Contributor Author

sourcesReadyStub.AllReady() returns true, which causes the new call to removeStaleState() to actually try to do real cleanup, which causes all sorts of grief in these tests.
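For context, the difference between the two helpers is just what AllReady() reports. A minimal sketch, assuming the existing stub mirrors the mock removed in the diff below except for that return value:

package main

import "fmt"

// Assumption for illustration: sourcesReadyStub looks like the mock above,
// except that AllReady() reports true (per the comment above), so
// removeStaleState() no longer bails out early in the tests.
type sourcesReadyStub struct{}

func (s *sourcesReadyStub) AddSource(source string) {}
func (s *sourcesReadyStub) AllReady() bool          { return true }

func main() {
    s := &sourcesReadyStub{}
    fmt.Println("sources ready:", s.AllReady()) // true -> the cleanup path actually runs
}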

Contributor

I see. I'd rather not artificially turn off valid code paths, though, since exercising them might uncover other underlying problems that are lurking.

Below is a diff to your current patch that deals with the issue you are seeing in a way that's more consistent with the rest of the tests in this file:

diff --git a/pkg/kubelet/cm/cpumanager/cpu_manager_test.go b/pkg/kubelet/cm/cpumanager/cpu_manager_test.go
index 7a6724d76c2..bd18b2c95eb 100644
--- a/pkg/kubelet/cm/cpumanager/cpu_manager_test.go
+++ b/pkg/kubelet/cm/cpumanager/cpu_manager_test.go
@@ -40,12 +40,6 @@ import (
 	"k8s.io/kubernetes/pkg/kubelet/cm/topologymanager"
 )

-type mockSourcesReady struct{}
-
-func (s *mockSourcesReady) AddSource(source string) {}
-
-func (s *mockSourcesReady) AllReady() bool { return false }
-
 type mockState struct {
 	assignments   state.ContainerCPUAssignments
 	defaultCPUSet cpuset.CPUSet
@@ -275,14 +269,14 @@ func TestCPUManagerAdd(t *testing.T) {
 				err: testCase.updateErr,
 			},
 			containerMap:      containermap.NewContainerMap(),
-			activePods:        func() []*v1.Pod { return nil },
 			podStatusProvider: mockPodStatusProvider{},
+			sourcesReady:      &sourcesReadyStub{},
 		}

-		mgr.sourcesReady = &mockSourcesReady{}
-
 		pod := makePod("fakePod", "fakeContainer", "2", "2")
 		container := &pod.Spec.Containers[0]
+		mgr.activePods = func() []*v1.Pod { return []*v1.Pod{pod} }
+
 		err := mgr.Allocate(pod, container)
 		if !reflect.DeepEqual(err, testCase.expAllocateErr) {
 			t.Errorf("CPU Manager Allocate() error (%v). expected error: %v but got: %v",
@@ -495,12 +489,13 @@ func TestCPUManagerAddWithInitContainers(t *testing.T) {
 			state:             state,
 			containerRuntime:  mockRuntimeService{},
 			containerMap:      containermap.NewContainerMap(),
-			activePods:        func() []*v1.Pod { return nil },
 			podStatusProvider: mockPodStatusProvider{},
+			sourcesReady:      &sourcesReadyStub{},
+			activePods: func() []*v1.Pod {
+				return []*v1.Pod{testCase.pod}
+			},
 		}

-		mgr.sourcesReady = &mockSourcesReady{}
-
 		containers := append(
 			testCase.pod.Spec.InitContainers,
 			testCase.pod.Spec.Containers...)
@@ -1031,14 +1026,14 @@ func TestCPUManagerAddWithResvList(t *testing.T) {
 				err: testCase.updateErr,
 			},
 			containerMap:      containermap.NewContainerMap(),
-			activePods:        func() []*v1.Pod { return nil },
 			podStatusProvider: mockPodStatusProvider{},
+			sourcesReady:      &sourcesReadyStub{},
 		}

-		mgr.sourcesReady = &mockSourcesReady{}
-
 		pod := makePod("fakePod", "fakeContainer", "2", "2")
 		container := &pod.Spec.Containers[0]
+		mgr.activePods = func() []*v1.Pod { return []*v1.Pod{pod} }
+
 		err := mgr.Allocate(pod, container)
 		if !reflect.DeepEqual(err, testCase.expAllocateErr) {
 			t.Errorf("CPU Manager Allocate() error (%v). expected error: %v but got: %v",

Contributor Author

Thanks for the test changes, applied.

@klueska (Contributor) commented Apr 23, 2020

Also, can you update the release note to:

Fixes regression in CPUManager that caused freeing of exclusive CPUs at incorrect times 

Even though this is not a user-facing change, we plan to backport this to the 1.18 branch, and having a release note in the original PR eases this process.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Apr 23, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 23, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cbf123, klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2020
@k8s-ci-robot (Contributor)

@cbf123: The following test failed, say /retest to rerun all failed tests:

Test name: pull-kubernetes-e2e-kind
Commit: ab5870d
Rerun command: /test pull-kubernetes-e2e-kind

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cbf123 (Contributor, Author) commented Apr 27, 2020

/test pull-kubernetes-e2e-kind

@k8s-ci-robot k8s-ci-robot merged commit 7fdc127 into kubernetes:master Apr 27, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Apr 27, 2020
@cbf123 cbf123 deleted the container_cpuset_fixup_2 branch April 27, 2020 20:50
k8s-ci-robot added a commit that referenced this pull request May 30, 2020
…7-upstream-release-1.18

Automated cherry pick of #90377: Fix exclusive CPU allocations being deleted at container
cynepco3hahue pushed a commit to cynepco3hahue/kubernetes that referenced this pull request Jun 2, 2020
Fix exclusive CPU allocations being deleted at container restart
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jun 10, 2020
…ainer restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jun 10, 2020
…ainer restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jun 10, 2020
…iner restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jun 11, 2020
…iner restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jul 12, 2020
…iner restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jul 20, 2020
…iner restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
cynepco3hahue pushed a commit to cynepco3hahue/origin that referenced this pull request Jul 23, 2020
…iner restart

ref: kubernetes/kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Aug 5, 2020
…iner restart

ref: kubernetes#90377

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>

Origin-commit: 3b9312345f11741b1ce1779bc644bf5441cae2c4
@hex108 (Contributor) commented Nov 18, 2020

Could we cherry pick it to release-1.17? Thanks! @klueska @cbf123

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cloudprovider area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Container cpuset lost, apparently due to race between PostStopContainer() and new container creation
5 participants