
Fix bug in TopologyManager hint generation after kubelet restart #84525

Conversation

@klueska (Contributor) commented Oct 29, 2019

What type of PR is this?

/kind bug
What this PR does / why we need it:
Previously, the HintProviders for the CPUManager and devicemanager would attempt to generate (new) hints for already-running containers after a kubelet restart.

This patch adds logic to both the CPUManager and the devicemanager to regenerate hints for containers that already have CPUs and/or devices allocated to them, basing those hints on the existing allocations rather than computing fresh ones.
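For readers following along, here is a minimal sketch of the pattern this patch introduces, under simplified assumptions: the hint, provider, and allocatedNUMA names below are illustrative stand-ins, not the actual kubelet or topologymanager types.

// Hypothetical sketch of "regenerate hints from existing allocations"; the
// types here are simplified stand-ins, not the real kubelet APIs.
package main

import "fmt"

// hint stands in for topologymanager.TopologyHint: the NUMA nodes on which a
// request can be satisfied, plus whether that placement is preferred.
type hint struct {
	numaNodes []int
	preferred bool
}

// provider stands in for a HintProvider (CPUManager/devicemanager) that has
// restored its per-container allocations from checkpointed state after a restart.
type provider struct {
	allocatedNUMA map[string][]int // containerName -> NUMA nodes already in use
}

// getTopologyHints returns a single preferred hint derived from an existing
// allocation instead of computing fresh hints for an already-running container.
func (p *provider) getTopologyHints(containerName string) []hint {
	if nodes, ok := p.allocatedNUMA[containerName]; ok {
		return []hint{{numaNodes: nodes, preferred: true}}
	}
	// No existing allocation: fall back to computing hints from the free pool
	// (elided in this sketch).
	return nil
}

func main() {
	p := &provider{allocatedNUMA: map[string][]int{"my-container": {0}}}
	fmt.Println(p.getTopologyHints("my-container")) // [{[0] true}]
}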

Which issue(s) this PR fixes:

Fixes #84479

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot added the release-note-none, kind/bug, size/L, needs-sig, and needs-priority labels on Oct 29, 2019
@klueska (author) commented Oct 29, 2019

/assign @ConnorDoyle

@k8s-ci-robot added the cncf-cla: yes, area/kubelet, and sig/node labels and removed the needs-sig label on Oct 29, 2019
@k8s-ci-robot commented:

@klueska: GitHub didn't allow me to request PR reviews from the following users: cbf123.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @cbf123

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@klueska commented Oct 29, 2019

/cc @lmdaly @nolancon

@cbf123 (Contributor) left a comment:

Looks good to me, appreciate the fix.

@klueska commented Oct 29, 2019

@cbf123 would you mind pulling down this branch and giving it a spin to make sure it addresses your issue in a live setting?

@cbf123 commented Oct 29, 2019

@klueska Yep, I should be able to try it out.

@klueska commented Oct 30, 2019

@cbf123 after sleeping on this solution, I think there may still be a problem.

It's possible that a kubelet goes down after a container has been granted a set of devices but before it has been granted a set of CPUs (or vice versa). When this happens, we want the hints generated for the container after the kubelet restart to be the same as they were before the restart, so that any unallocated resources (either a set of devices or CPUs in this case) can have their hints combined properly by the TopologyManager. Simply returning nil implies "don't care" and can break alignment in this edge case.

I will rework this patch sometime today and let you know once it's done.
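To make the concern above concrete, here is a deliberately simplified sketch of hint merging; the hint type and merge function are hypothetical illustrations, not the real TopologyManager merge logic.

// Simplified illustration of why a nil ("don't care") hint from a restarted
// provider can break alignment; not the actual TopologyManager implementation.
package main

import "fmt"

// hint stands in for a single-NUMA-node topology hint.
type hint []int

// merge keeps the CPU candidates compatible with at least one device hint; a
// nil device hint list means "any placement is fine".
func merge(cpuHints, deviceHints []hint) []hint {
	if deviceHints == nil {
		return cpuHints
	}
	var out []hint
	for _, c := range cpuHints {
		for _, d := range deviceHints {
			if c[0] == d[0] { // single-node hints only, for brevity
				out = append(out, c)
			}
		}
	}
	return out
}

func main() {
	cpuCandidates := []hint{{0}, {1}}

	// Devices were already allocated on NUMA node 0 before the restart. If the
	// device provider regenerates its hint from that allocation, the only
	// merged candidate is node 0, so CPUs land next to the devices.
	fmt.Println(merge(cpuCandidates, []hint{{0}})) // [[0]]

	// If instead the device provider returns nil ("don't care"), node 1 also
	// survives the merge and CPUs may be placed away from the devices.
	fmt.Println(merge(cpuCandidates, nil)) // [[0] [1]]
}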

@klueska force-pushed the upstream-fix-hint-generation-after-kubelet-restart branch from 2e04e11 to 583ca94 on October 30, 2019 16:55
@klueska commented Oct 30, 2019

@cbf123 @lmdaly

OK. Update pushed. It has a dependency on the commit in #81344 which is also included as part of this PR.

@cbf123 commented Oct 30, 2019

Basic initial testing shows the previous version of the fix does seem to resolve the original problem. I'll give this one a try too.

@klueska commented Oct 30, 2019

/retest

@klueska commented Nov 4, 2019

If you're satisfied with this patch, at least for now, can you give it an lgtm so other reviewers know the state of things?

@cbf123 (Contributor) left a comment:

LGTM

@klueska commented Nov 4, 2019

I guess the best we could do is error out at container creation if the CPUs being allocated don't actually align with what was expected from the Hint. At least then you error out without allocating CPUs that don't have the alignment you expected.

We can't actually even do that without intertwining the topologymanager policy with the allocation policy (since for the best-effort policy, for example, we want to continue the allocation even if the NUMA affinity isn't satisfied).
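A rough sketch of the alignment check being described above, under assumed names (policy, cpuToNUMA, checkAlignment); this is illustrative only and not how the kubelet actually ties the TopologyManager policy into CPU allocation.

// Hypothetical sketch: fail an allocation when the CPUs actually handed out
// don't match the NUMA affinity promised by the hint, except under a tolerant
// (best-effort-like) policy. Not the real kubelet logic.
package main

import (
	"errors"
	"fmt"
)

type policy string

const (
	policyRestricted policy = "restricted"  // reject misaligned allocations
	policyBestEffort policy = "best-effort" // keep going even if misaligned
)

// checkAlignment returns an error if any allocated CPU lives on a NUMA node
// outside the hinted set and the policy does not tolerate misalignment.
func checkAlignment(p policy, hintedNUMA map[int]bool, cpuToNUMA map[int]int, allocated []int) error {
	for _, cpu := range allocated {
		if !hintedNUMA[cpuToNUMA[cpu]] {
			if p == policyBestEffort {
				return nil // misaligned, but the policy says continue anyway
			}
			return errors.New("allocated CPUs do not align with the topology hint")
		}
	}
	return nil
}

func main() {
	cpuToNUMA := map[int]int{0: 0, 1: 0, 2: 1, 3: 1}
	hinted := map[int]bool{0: true} // hint said: place everything on NUMA node 0
	fmt.Println(checkAlignment(policyRestricted, hinted, cpuToNUMA, []int{0, 2})) // error
	fmt.Println(checkAlignment(policyBestEffort, hinted, cpuToNUMA, []int{0, 2})) // <nil>
}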

This will become especially important as we move to a model where exclusive CPUs are assigned at pod admission time rather than at pod creation time.

Having this function will allow us to do garbage collection on these CPUs anytime we are about to allocate CPUs to a new set of containers, in addition to reclaiming state periodically in the reconcileState() loop. This ensures that we have the most up-to-date state when generating topology hints for a container. Without this, it's possible that some resources will be seen as allocated when they are actually free.
@klueska force-pushed the upstream-fix-hint-generation-after-kubelet-restart branch from b63fbcb to 585adc8 on November 5, 2019 14:29
@k8s-ci-robot removed the lgtm label on Nov 5, 2019
This patch also includes tests to make sure the newly added logic works as expected.
@klueska commented Nov 5, 2019

To reiterate, I'm looking for a review of the points in the following comment from someone more knowledgeable in this area: #84525 (comment)

/cc @derekwaynecarr
/cc @ConnorDoyle

@ConnorDoyle (Contributor) left a comment:

These changes make sense based on the PR description of the fault. Thanks @klueska for again leaving great comments to follow and @cbf123 for adding your review.

/approve

	// Strip all devices in use from the list of healthy ones.
	return m.healthyDevices[resource].Difference(m.allocatedDevices[resource])
}

func (m *ManagerImpl) getAllocatedDevices(podUID, containerName, resource string) sets.String {
	// Pull the list of device IDs directly from the list of podDevices.
	return m.podDevices[podUID][containerName][resource].deviceIds
A reviewer (Contributor) commented:
It's not expected for the 2nd and 3rd level mappings to be nil, but it would be nicer to check (and log and return an empty set I suppose)

@klueska (author) replied:
One better -- there was already a function for this in the podDevices abstraction:

m.podDevices.containerDevices()
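For context, here is a generic sketch of the defensive lookup being suggested; the types below are simplified stand-ins for the devicemanager's nested podDevices mapping, and the PR ultimately reused the existing podDevices helper mentioned above instead.

// Generic sketch of a nil-safe nested-map lookup that logs and returns an
// empty set; simplified stand-in types, not the actual devicemanager code.
package main

import "fmt"

// deviceSet stands in for sets.String from k8s.io/apimachinery.
type deviceSet map[string]struct{}

// podDevices maps podUID -> containerName -> resourceName -> device IDs.
type podDevices map[string]map[string]map[string]deviceSet

// containerDevices returns the device IDs of one resource for one container,
// logging and returning an empty set when any level of the mapping is missing.
func (p podDevices) containerDevices(podUID, containerName, resource string) deviceSet {
	containers, ok := p[podUID]
	if !ok {
		fmt.Printf("no devices tracked for pod %q\n", podUID)
		return deviceSet{}
	}
	resources, ok := containers[containerName]
	if !ok {
		fmt.Printf("no devices tracked for container %q in pod %q\n", containerName, podUID)
		return deviceSet{}
	}
	devices, ok := resources[resource]
	if !ok {
		fmt.Printf("no %q devices allocated to container %q\n", resource, containerName)
		return deviceSet{}
	}
	return devices
}

func main() {
	p := podDevices{"pod-1": {"ctr-1": {"vendor.com/gpu": {"dev-0": {}, "dev-1": {}}}}}
	fmt.Println(len(p.containerDevices("pod-1", "ctr-1", "vendor.com/gpu"))) // 2
	fmt.Println(len(p.containerDevices("pod-2", "ctr-1", "vendor.com/gpu"))) // 0, with a log line
}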

@@ -28,26 +28,49 @@ import (
// ensures the Device Manager is consulted when Topology Aware Hints for each
// container are created.
func (m *ManagerImpl) GetTopologyHints(pod v1.Pod, container v1.Container) map[string][]topologymanager.TopologyHint {
	deviceHints := make(map[string][]topologymanager.TopologyHint)

	// Garbage collect any stranded device resources before providing TopologyHints
	m.updateAllocatedDevices(m.activePods())
A reviewer (Contributor) commented:
👍 for moving this out of getAvailableDevices.

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorDoyle, klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Nov 6, 2019
@ConnorDoyle commented:

Responding to the specific questions called out by @klueska:

  1. I don't think there's a problem with calling activePods or GetPodStatus under the cpumanager mutex. The issue with holding the lock in removeStaleState may be that we do perform some file I/O within that function by virtue of calling m.policy.RemoveContainer. Ideally we could find a way to avoid that.
  2. Could you walk through the sequence of events that causes the pod status to be missing when GetTopologyHints is called on the cpumanager? As you pointed out in the comment, it seems like avoiding that possibility would have to wait until the cpumanager state is refactored to store (pod.Name, container.Name) composite keys instead of container ID. So even if it is possible, I'm not sure the problem is addressable in this patch.

/cc @derekwaynecarr for a 2nd opinion, he may have more background on pod status guarantees
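As background for item 1 above, here is a hypothetical sketch of the reclaim-before-allocate pattern being discussed; the manager type and its fields are simplified stand-ins, not the cpumanager's actual removeStaleState implementation.

// Hypothetical sketch of reclaiming state for containers that are no longer
// active before generating new hints or allocations.
package main

import (
	"fmt"
	"sync"
)

type manager struct {
	mu sync.Mutex
	// assignments maps containerID -> CPUs exclusively assigned to it.
	assignments map[string][]int
}

// removeStaleState drops assignments for containers that no longer belong to
// any active pod, returning their CPUs to the shared pool.
func (m *manager) removeStaleState(activeContainerIDs map[string]bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for id, cpus := range m.assignments {
		if !activeContainerIDs[id] {
			fmt.Printf("reclaiming CPUs %v from stale container %q\n", cpus, id)
			delete(m.assignments, id)
		}
	}
}

func main() {
	m := &manager{assignments: map[string][]int{"ctr-old": {2, 3}, "ctr-live": {4, 5}}}
	// Called before allocating to new containers and periodically from a
	// reconcile loop, as described in the thread above.
	m.removeStaleState(map[string]bool{"ctr-live": true})
	fmt.Println(m.assignments) // map[ctr-live:[4 5]]
}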

This patch also includes tests to make sure the newly added logic works as expected.
@klueska force-pushed the upstream-fix-hint-generation-after-kubelet-restart branch from 6b0bd6d to 4d4d4bd on November 6, 2019 15:30
@klueska commented Nov 6, 2019

@ConnorDoyle Thanks for the review, and thanks for the response to my questions!

I agree there's nothing more we can do in this PR if a pod status is not ready at the point we need it to be. It would be good to know whether it is at all possible not to have a valid status at this point, though. Let's wait and see what @derekwaynecarr's response is.

With that said, I think all that's missing is an /lgtm now that I've addressed: #84525 (review)

@ConnorDoyle commented:

/lgtm

@k8s-ci-robot added the lgtm label on Nov 6, 2019
@klueska commented Nov 6, 2019

/retest

@k8s-ci-robot merged commit 08e5781 into kubernetes:master on Nov 6, 2019
@k8s-ci-robot added this to the v1.17 milestone on Nov 6, 2019
Successfully merging this pull request may close these issues.

rebooting node (and maybe just restarting kubelet) results in running pod going "Failed"