Fix exclusive CPU allocations being deleted at container restart #90377
Conversation
Hi @cbf123. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig node
/assign @klueska
We had a long discussion about this here for context:
Minor changes requested. However, it would be nice to see a test or two added that trigger the bug before the change but pass after it. That way regressions like this won't happen in the future.
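For illustration only, here is a minimal sketch of the kind of regression test being asked for. The toy types below are hypothetical stand-ins for the real CPUManager; an actual test would exercise Allocate() and the new cleanup path in cpu_manager_test.go.

// Toy sketch; toyManager, allocate, and removeStale are made-up stand-ins.
package cpumanager

import "testing"

// toyManager tracks exclusive CPU assignments keyed by podUID and container name.
type toyManager struct {
	assignments map[string]map[string][]int
}

func (m *toyManager) allocate(podUID, container string, cpus []int) {
	if m.assignments[podUID] == nil {
		m.assignments[podUID] = map[string][]int{}
	}
	m.assignments[podUID][container] = cpus
}

// removeStale frees assignments only for pods that are no longer active.
func (m *toyManager) removeStale(activePods map[string]bool) {
	for podUID := range m.assignments {
		if !activePods[podUID] {
			delete(m.assignments, podUID)
		}
	}
}

func TestExclusiveCPUsSurviveContainerRestart(t *testing.T) {
	m := &toyManager{assignments: map[string]map[string][]int{}}
	m.allocate("pod-1", "ctr", []int{2, 3})

	// A container restart while the pod is still active must not drop the assignment.
	m.removeStale(map[string]bool{"pod-1": true})
	if _, ok := m.assignments["pod-1"]["ctr"]; !ok {
		t.Fatal("exclusive CPUs were released on container restart")
	}

	// Once the pod is deleted, the assignment must be released.
	m.removeStale(map[string]bool{})
	if _, ok := m.assignments["pod-1"]; ok {
		t.Fatal("exclusive CPUs were not released after pod deletion")
	}
}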
}
// We can't safely call i.cpuManager.RemoveContainer(containerID)
// here. Regular containers could be in the process of restarting, and
// RemoveContainer() would remove any allocated exclusive CPUs that the
Can you fix the indenting here?
oops...editor was set to use spaces instead of tabs.
}
// We can't safely call i.cpuManager.RemoveContainer(containerID)
// here. Regular containers could be in the process of restarting, and
// RemoveContainer() would remove any allocated exclusive CPUs that the
I wouldn't bother with this comment here. I know we used to have code here that called RemoveContainer(), but it's more confusing to see the comment out of context than to not see it at all -- especially since there is no path for calling RemoveContainer() from any external hooks anymore.
I actually plan to go through the TopologyManager after this and remove its callout as well -- thus removing the need for this hook (and the InternalContainerLifecycle interface) altogether.
As some more background, this is a regression due to the refactoring of the CPUManager that happened as part of: https://github.com/kubernetes/kubernetes/pull/87759/commits

As part of that refactoring, the CPUManager moved to a model where CPUs are now allocated across all containers at pod admission time rather than as each individual container comes online. Since all CPUs are allocated at pod admission time, they can only properly be freed back to the shared pool at pod deletion time (or lazily after the pod is already gone). In the old model, CPUs were allocated to each container as it came online (as part of a container pre-start hook), so we were free to (and in fact required to) free them as each container exited (as part of a post-stop hook). This is problematic in the new model, however, since CPUs are now assumed to retain their assignment to a container for the lifetime of a pod.

As an oversight, the logic was left in place to do this freeing on each container exit instead of waiting for pod deletion. This causes problems (for example) when a container restarts without causing its bounding pod to be restarted. This patch updates the CPUManager to make sure that CPUs are only ever freed back to the shared pool after a pod has been deleted. It does this by lazily calling the existing cleanup logic; we now make this call at three locations in the code.

In theory, we don't need (2) when the … In the future we should consider adding a hook for pod deletion instead of doing the lazy cleanup as part of (1) and (2). We will likely always do the lazy cleanup as part of (3), however, just to make sure we are always in a sane state.
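To make the lazy-cleanup idea above concrete, here is a small, self-contained sketch. The types, the shared pool, and activePods below are toy stand-ins, not the actual kubelet code: CPUs assigned to containers of deleted pods go back to the shared pool, while assignments belonging to still-active pods are left untouched across container restarts.

// Toy illustration of the lazy cleanup described above; all names are stand-ins.
package main

import "fmt"

// assignments maps podUID -> containerName -> exclusively allocated CPU IDs.
type assignments map[string]map[string][]int

// removeStaleState returns the CPUs of any pod that is no longer active to
// the shared pool; containers of active pods keep their CPUs across restarts.
func removeStaleState(st assignments, activePods map[string]bool, sharedPool *[]int) {
	for podUID, containers := range st {
		if activePods[podUID] {
			continue // pod still exists: keep its exclusive CPUs
		}
		for _, cpus := range containers {
			*sharedPool = append(*sharedPool, cpus...)
		}
		delete(st, podUID)
	}
}

func main() {
	st := assignments{
		"pod-a": {"ctr": {2, 3}},
		"pod-b": {"ctr": {4, 5}},
	}
	shared := []int{0, 1}
	// pod-b has been deleted; pod-a is merely restarting one of its containers.
	removeStaleState(st, map[string]bool{"pod-a": true}, &shared)
	fmt.Println(st, shared) // pod-a keeps CPUs 2,3; CPUs 4,5 return to the shared pool
}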
/ok-to-test
type mockSourcesReady struct{}

func (s *mockSourcesReady) AddSource(source string) {}

func (s *mockSourcesReady) AllReady() bool { return false }
There already exists a sourcesReadyStub you can use here instead of this.
Removing this should fix your gofmt error in the pull-kubernetes-verify test as well.
sourcesReadyStub.AllReady() returns "true", which causes the new call to removeStaleState() to try to actually do stuff, which causes all sorts of grief
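For reference, the two stubs differ only in what AllReady() returns; roughly (approximate, from-memory definitions for contrast, not copied verbatim from cpu_manager.go or the test file):

// Approximate definitions, shown only to contrast the two stubs.
type sourcesReadyStub struct{}

func (s *sourcesReadyStub) AddSource(source string) {}
func (s *sourcesReadyStub) AllReady() bool          { return true } // cleanup paths run

type mockSourcesReady struct{}

func (s *mockSourcesReady) AddSource(source string) {}
func (s *mockSourcesReady) AllReady() bool          { return false } // cleanup paths stay dormant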
I see. I'd rather not artificially turn off valid code paths that might uncover other underlying problems that are lurking, though.
Below is a diff to your current patch that deals with the issue you are seeing in a way that's more consistent with the rest of the tests in this file:
diff --git a/pkg/kubelet/cm/cpumanager/cpu_manager_test.go b/pkg/kubelet/cm/cpumanager/cpu_manager_test.go
index 7a6724d76c2..bd18b2c95eb 100644
--- a/pkg/kubelet/cm/cpumanager/cpu_manager_test.go
+++ b/pkg/kubelet/cm/cpumanager/cpu_manager_test.go
@@ -40,12 +40,6 @@ import (
"k8s.io/kubernetes/pkg/kubelet/cm/topologymanager"
)
-type mockSourcesReady struct{}
-
-func (s *mockSourcesReady) AddSource(source string) {}
-
-func (s *mockSourcesReady) AllReady() bool { return false }
-
type mockState struct {
assignments state.ContainerCPUAssignments
defaultCPUSet cpuset.CPUSet
@@ -275,14 +269,14 @@ func TestCPUManagerAdd(t *testing.T) {
err: testCase.updateErr,
},
containerMap: containermap.NewContainerMap(),
- activePods: func() []*v1.Pod { return nil },
podStatusProvider: mockPodStatusProvider{},
+ sourcesReady: &sourcesReadyStub{},
}
- mgr.sourcesReady = &mockSourcesReady{}
-
pod := makePod("fakePod", "fakeContainer", "2", "2")
container := &pod.Spec.Containers[0]
+ mgr.activePods = func() []*v1.Pod { return []*v1.Pod{pod} }
+
err := mgr.Allocate(pod, container)
if !reflect.DeepEqual(err, testCase.expAllocateErr) {
t.Errorf("CPU Manager Allocate() error (%v). expected error: %v but got: %v",
@@ -495,12 +489,13 @@ func TestCPUManagerAddWithInitContainers(t *testing.T) {
state: state,
containerRuntime: mockRuntimeService{},
containerMap: containermap.NewContainerMap(),
- activePods: func() []*v1.Pod { return nil },
podStatusProvider: mockPodStatusProvider{},
+ sourcesReady: &sourcesReadyStub{},
+ activePods: func() []*v1.Pod {
+ return []*v1.Pod{testCase.pod}
+ },
}
- mgr.sourcesReady = &mockSourcesReady{}
-
containers := append(
testCase.pod.Spec.InitContainers,
testCase.pod.Spec.Containers...)
@@ -1031,14 +1026,14 @@ func TestCPUManagerAddWithResvList(t *testing.T) {
err: testCase.updateErr,
},
containerMap: containermap.NewContainerMap(),
- activePods: func() []*v1.Pod { return nil },
podStatusProvider: mockPodStatusProvider{},
+ sourcesReady: &sourcesReadyStub{},
}
- mgr.sourcesReady = &mockSourcesReady{}
-
pod := makePod("fakePod", "fakeContainer", "2", "2")
container := &pod.Spec.Containers[0]
+ mgr.activePods = func() []*v1.Pod { return []*v1.Pod{pod} }
+
err := mgr.Allocate(pod, container)
if !reflect.DeepEqual(err, testCase.expAllocateErr) {
t.Errorf("CPU Manager Allocate() error (%v). expected error: %v but got: %v",
Thanks for the test changes, applied.
Also, can you update the release note to:
Even though this is not a user-facing change, we plan to backport this to the 1.18 branch, and having a release note in the original PR eases this process.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cbf123, klueska. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@cbf123: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/test pull-kubernetes-e2e-kind
…7-upstream-release-1.18 Automated cherry pick of #90377: Fix exclusive CPU allocations being deleted at container
Fix exclusive CPU allocations being deleted at container restart
…ainer restart ref: kubernetes/kubernetes#90377 Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
…iner restart ref: kubernetes/kubernetes#90377 Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
…iner restart ref: kubernetes#90377 Signed-off-by: Artyom Lukianov <alukiano@redhat.com> Origin-commit: 3b9312345f11741b1ce1779bc644bf5441cae2c4
…ing deleted at container restart
What type of PR is this?
/kind bug
What this PR does / why we need it:
The expectation is that exclusive CPU allocations happen at pod
creation time. When a container restarts, it should not have its
exclusive CPU allocations removed, and it should not need to
re-allocate CPUs.
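For context: under the static CPU manager policy, exclusive CPUs are only handed to containers in Guaranteed pods that request a whole number of CPUs. Below is a rough, hypothetical sketch of such a pod built in Go the way the kubelet tests construct pods; the pod name, container name, image, and resource values are illustrative only.

// Illustrative only: a Guaranteed-QoS pod whose container requests an integer
// number of CPUs, which is what the static CPU manager policy treats as an
// exclusive allocation. All names and values here are made up for the example.
package main

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func exclusiveCPUPod() *v1.Pod {
	res := v1.ResourceList{
		v1.ResourceCPU:    resource.MustParse("2"), // integer CPU count => exclusive CPUs
		v1.ResourceMemory: resource.MustParse("1Gi"),
	}
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "exclusive-cpu-pod"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "app",
				Image: "k8s.gcr.io/pause:3.2", // placeholder image
				Resources: v1.ResourceRequirements{
					Requests: res,
					Limits:   res, // requests == limits => Guaranteed QoS
				},
			}},
		},
	}
}

func main() {
	_ = exclusiveCPUPod()
}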
There are a few places in the current code that look for containers
that have exited and call CpuManager.RemoveContainer() to clean up
the container. This will end up deleting any exclusive CPU
allocations for that container, and if the container restarts within
the same pod it will end up using the default cpuset rather than
what should be exclusive CPUs.
Removing those calls and adding resource cleanup at allocation
time should get rid of the problem.
Which issue(s) this PR fixes:
Fixes #90303
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: