Fix bug in TopologyManager hint generation after kubelet restart #84525
Conversation
/assign @ConnorDoyle
@klueska: GitHub didn't allow me to request PR reviews from the following users: cbf123. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks good to me, appreciate the fix.
@cbf123 would you mind pulling down this branch and giving it a spin to make sure it addresses your issue in a live setting?
@klueska Yep, I should be able to try it out.
@cbf123 after sleeping on this solution, I think there may still be a problem. It's possible that the kubelet goes down after a container has been granted a set of devices, but before it has been granted a set of CPUs (or vice versa). When this happens, we want the hints generated for the container after the kubelet restart to be the same as they were before the restart, so that any unallocated resources (either a set of devices or CPUs in this case) can have their hints combined properly by the TopologyManager. I will rework this patch sometime today and let you know once it's done.
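To make the intended behavior concrete, here is a minimal sketch in Go. It is illustrative only, not the actual kubernetes code: TopologyHint is a simplified stand-in for the kubelet's topologymanager.TopologyHint, and generateHints and its parameters are invented for this example.

```go
package main

import "fmt"

// TopologyHint pairs a set of NUMA nodes with a "preferred" flag. This is a
// simplified stand-in for the kubelet's topologymanager.TopologyHint.
type TopologyHint struct {
	NUMANodes []int
	Preferred bool
}

// generateHints sketches the post-restart behavior: if a container already
// holds an allocation (recorded in checkpointed state), the provider must
// return exactly the hint implied by that allocation, so the TopologyManager
// can still combine it with hints for resources that were never allocated.
func generateHints(allocatedNUMANodes []int, freshHints []TopologyHint) []TopologyHint {
	if len(allocatedNUMANodes) > 0 {
		// Pre-existing allocation: the only valid hint is the one it implies.
		return []TopologyHint{{NUMANodes: allocatedNUMANodes, Preferred: true}}
	}
	// No prior allocation: fall back to hints computed from free resources.
	return freshHints
}

func main() {
	// Devices were granted on NUMA node 0 before the restart; CPUs were not.
	fmt.Println(generateHints([]int{0}, nil)) // [{[0] true}]
	fmt.Println(generateHints(nil, []TopologyHint{{NUMANodes: []int{0, 1}, Preferred: false}}))
}
```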
Force-pushed from 2e04e11 to 583ca94
Basic initial testing shows the previous version of the fix does seem to resolve the original problem. I'll give this one a try too.
/retest
If you're satisfied with this patch, at least for now, can you give it an lgtm so other reviewers know the state of things?
LGTM
We can't actually do that without intertwining the topologymanager policy with the allocation policy (since for the best-effort policy, for example, we want to continue the allocation even if the NUMA affinity isn't satisfied).
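To illustrate the coupling, here is a hedged sketch. The policy names match the real best-effort and restricted TopologyManager policies, but the admitByPolicy function and its shape are invented for this example, not the actual topologymanager code.

```go
// admitByPolicy sketches why allocation can't be cut short on a bad hint
// without knowing the policy: best-effort keeps going even when the merged
// hint's NUMA affinity is not satisfied, while restricted rejects the pod.
func admitByPolicy(policy string, mergedHintPreferred bool) bool {
	switch policy {
	case "best-effort":
		return true // continue allocating despite unsatisfied NUMA affinity
	case "restricted":
		return mergedHintPreferred // reject unless affinity is satisfied
	default:
		return true // e.g. the "none" policy ignores hints entirely
	}
}
```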
This will become especially important as we move to a model where exclusive CPUs are assigned at pod admission time rather than at pod creation time. Having this function will allow us to do garbage collection on these CPUs anytime we are about to allocate CPUs to a new set of containers, in addition to reclaiming state periodically in the reconcileState() loop.
This ensures that we have the most up-to-date state when generating topology hints for a container. Without this, it's possible that some resources will be seen as allocated when they are actually free.
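A rough sketch of that garbage-collection step, with simplified map types; the real updateAllocatedDevices operates on the device manager's internal podDevices state, and the names below are illustrative.

```go
// reclaimStranded drops allocations belonging to pods that are no longer
// active, so that hint generation immediately afterwards sees those
// resources as free rather than allocated.
func reclaimStranded(allocations map[string][]string, activePods map[string]bool) {
	for podUID := range allocations {
		if !activePods[podUID] {
			delete(allocations, podUID) // safe: Go allows delete during range
		}
	}
}
```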
Force-pushed from b63fbcb to 585adc8
This patch also includes tests to make sure the newly added logic works as expected.
To reiterate, I'm looking for a review of the points in the following comment from someone more knowledgeable in this area: #84525 (comment)
/cc @derekwaynecarr
Force-pushed from 585adc8 to 6b0bd6d
	// Strip all devices in use from the list of healthy ones.
	return m.healthyDevices[resource].Difference(m.allocatedDevices[resource])
}

func (m *ManagerImpl) getAllocatedDevices(podUID, containerName, resource string) sets.String {
	// Pull the list of device IDs directly from the list of podDevices.
	return m.podDevices[podUID][containerName][resource].deviceIds
It's not expected for the 2nd and 3rd level mappings to be nil, but it would be nicer to check (and log and return an empty set I suppose)
One better -- there was already a function for this in the podDevices abstraction: m.podDevices.containerDevices()
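For illustration, a hedged sketch of the nil-safe lookup such a helper can provide. The nested map shape is simplified relative to the device manager's internal types; sets.String comes from k8s.io/apimachinery/pkg/util/sets.

```go
import "k8s.io/apimachinery/pkg/util/sets"

// containerDeviceIDs walks the pod -> container -> resource nesting one
// level at a time and returns an empty set whenever an entry is missing,
// instead of indexing all three levels blindly and handing back a nil set.
func containerDeviceIDs(pods map[string]map[string]map[string]sets.String,
	podUID, containerName, resource string) sets.String {
	containers, ok := pods[podUID]
	if !ok {
		return sets.NewString()
	}
	resources, ok := containers[containerName]
	if !ok {
		return sets.NewString()
	}
	if ids, ok := resources[resource]; ok {
		return ids
	}
	return sets.NewString()
}
```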
@@ -28,26 +28,49 @@ import (
// ensures the Device Manager is consulted when Topology Aware Hints for each
// container are created.
func (m *ManagerImpl) GetTopologyHints(pod v1.Pod, container v1.Container) map[string][]topologymanager.TopologyHint {
	deviceHints := make(map[string][]topologymanager.TopologyHint)
	// Garbage collect any stranded device resources before providing TopologyHints
	m.updateAllocatedDevices(m.activePods())
👍 for moving this out of getAvailableDevices.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ConnorDoyle, klueska. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Responding to the specific questions called out by @klueska:
/cc @derekwaynecarr for a second opinion; he may have more background on pod status guarantees.
Force-pushed from 6b0bd6d to 4d4d4bd
@ConnorDoyle Thanks for the review, and thanks for the response to my questions! I agree there's nothing more we can do in this PR if a pod status is not ready at the point we need it to be. It would be good to know if it is at all possible not to have a valid status at this point, though. Let's wait and see what @derekwaynecarr's response is. With that said, I think all that's missing is an /lgtm now that I've addressed: #84525 (review)
/lgtm
/retest
What type of PR is this?
/kind bug
What this PR does / why we need it:
Previously, the HintProviders for the CPUManager and devicemanager would attempt to generate (new) hints for already-running containers after a kubelet restart.
This patch adds logic to both the CPUManager and the devicemanager to regenerate hints for containers that already have CPUs and/or devices allocated to them.
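For the CPUManager half, a parallel sketch under the same caveat: this is illustrative only, and cpuToNUMANode is a made-up example of the machine topology the kubelet obtains from cadvisor.

```go
package main

import "fmt"

// cpuToNUMANode maps each CPU ID to its NUMA node; the values here are a
// made-up two-socket example.
var cpuToNUMANode = map[int]int{0: 0, 1: 0, 2: 1, 3: 1}

// numaNodesForAssignment derives a hint's NUMA nodes from a container's
// existing exclusive CPU assignment (from checkpointed state) rather than
// from the free pool, which is the regeneration behavior this PR adds.
func numaNodesForAssignment(assignedCPUs []int) []int {
	seen := map[int]bool{}
	var nodes []int
	for _, cpu := range assignedCPUs {
		if node, ok := cpuToNUMANode[cpu]; ok && !seen[node] {
			seen[node] = true
			nodes = append(nodes, node)
		}
	}
	return nodes
}

func main() {
	// CPUs 0 and 1 were assigned before the restart; both live on NUMA node 0.
	fmt.Println(numaNodesForAssignment([]int{0, 1})) // [0]
}
```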
Which issue(s) this PR fixes:
Fixes #84479
Does this PR introduce a user-facing change?: