fix removing pods from podTopologyHints mapping #101615
/assign @derekwaynecarr @Random-Liu @nolancon
/kind bug
/cc @klueska
Hi @aheng-ch, thanks for finding this issue and submitting your PR. This seems like a legit issue that should actually be backported to 1.20 and 1.21 as well. I will review it more formally early next week.
Looks good in general. A few comments.
@@ -94,11 +95,11 @@ func (s *scope) AddHintProvider(h HintProvider) {

 // It would be better to implement this function in topologymanager instead of scope
 // but topologymanager do not track mapping anymore
-func (s *scope) AddContainer(pod *v1.Pod, containerID string) error {
+func (s *scope) AddContainer(pod *v1.Pod, container *v1.Container, containerID string) error {
Since this always returns `nil`, let's remove the `error` return value and make this function consistent with the other `AddContainer()` calls from other components.
Thanks for the review, I will update it
-	podUIDString := s.podMap[containerID]
-	delete(s.podMap, containerID)
-	if _, exists := s.podTopologyHints[podUIDString]; exists {
-		delete(s.podTopologyHints[podUIDString], containerID)
+	podUIDString, containerName, _ := s.podMap.GetContainerRef(containerID)
+	s.podMap.RemoveByContainerID(containerID)
+	if _, err := s.podMap.GetContainerID(podUIDString, containerName); err != nil {
+		delete(s.podTopologyHints[podUIDString], containerName)
 		if len(s.podTopologyHints[podUIDString]) == 0 {
 			delete(s.podTopologyHints, podUIDString)
 		}
Some of this feels redundant. I think the following would be sufficient:

    if podUIDString, containerName, exists := s.podMap.GetContainerRef(containerID); exists {
        s.podMap.RemoveByContainerID(containerID)
        delete(s.podTopologyHints[podUIDString], containerName)
        if len(s.podTopologyHints[podUIDString]) == 0 {
            delete(s.podTopologyHints, podUIDString)
        }
    }
It seems OK to call `delete(s.podTopologyHints[podUIDString], containerName)` immediately after `s.podMap.RemoveByContainerID(containerID)`, because we don't use the hints after the pod is created. But `podMap` associates each ContainerID with a (PodUID, ContainerName) pair, so when a container restarts (and its ContainerID changes), a new record is added to `podMap`. The contents of `podMap` then look like `ContainerIDOld: {PodUID1, ContainerName1}, ContainerIDNew: {PodUID1, ContainerName1}`. If we delete `ContainerName1` from `podTopologyHints` when removing `ContainerIDOld` from `podMap`, the information in `podMap` and `podTopologyHints` becomes inconsistent. So I think it is better to remove the container from `podTopologyHints` only after all records for this container have been deleted from `podMap`.
I see. You are handling the case where a container with the same `podUID` and `containerName` is started with a different `ContainerID` (i.e. after a restart). We clearly don't want to delete that new container's topology hints inadvertently.
I think this deserves a comment, since it is not obvious why we would need to call back into the `podMap` in this way. How about:

    // Get the podUID and containerName associated with the containerID to be removed and remove it
    podUIDString, containerName, err := s.podMap.GetContainerRef(containerID)
    if err != nil {
        return nil
    }
    s.podMap.RemoveByContainerID(containerID)

    // In cases where a container has been restarted, it's possible that the same podUID and
    // containerName are already associated with a *different* containerID now. Only remove
    // the TopologyHints associated with that podUID and containerName if this is not true.
    if _, err := s.podMap.GetContainerID(podUIDString, containerName); err != nil {
        delete(s.podTopologyHints[podUIDString], containerName)
        if len(s.podTopologyHints[podUIDString]) == 0 {
            delete(s.podTopologyHints, podUIDString)
        }
    }
Thanks a lot for your suggestion
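
For readers following the thread, the restart case discussed above can be reproduced with a small standalone model. This is only a sketch, not the kubelet code: the `scope` type, the `containerRef` struct, the helper names, and the use of a plain string as a hint are all assumptions made for illustration, and plain Go maps stand in for the real `podMap` type.

```go
package main

import "fmt"

// containerRef identifies a container by pod UID and container name.
// Hypothetical type; the real kubelet keeps this inside its own container map.
type containerRef struct {
	podUID        string
	containerName string
}

// scope is a simplified stand-in for the topology manager scope.
type scope struct {
	podMap           map[string]containerRef      // containerID -> (podUID, containerName)
	podTopologyHints map[string]map[string]string // podUID -> containerName -> hint (placeholder string)
}

func newScope() *scope {
	return &scope{
		podMap:           map[string]containerRef{},
		podTopologyHints: map[string]map[string]string{},
	}
}

// addContainer records which pod and container name a containerID belongs to.
func (s *scope) addContainer(podUID, containerName, containerID string) {
	s.podMap[containerID] = containerRef{podUID, containerName}
}

// setHint stores a hint for a (podUID, containerName) pair.
func (s *scope) setHint(podUID, containerName, hint string) {
	if s.podTopologyHints[podUID] == nil {
		s.podTopologyHints[podUID] = map[string]string{}
	}
	s.podTopologyHints[podUID][containerName] = hint
}

// stillTracked reports whether any remaining containerID maps to ref.
func (s *scope) stillTracked(ref containerRef) bool {
	for _, r := range s.podMap {
		if r == ref {
			return true
		}
	}
	return false
}

// removeContainer drops the containerID and removes the hints only when no
// other containerID (for example, from a restart) refers to the same container.
func (s *scope) removeContainer(containerID string) {
	ref, ok := s.podMap[containerID]
	if !ok {
		return
	}
	delete(s.podMap, containerID)
	if s.stillTracked(ref) {
		return
	}
	delete(s.podTopologyHints[ref.podUID], ref.containerName)
	if len(s.podTopologyHints[ref.podUID]) == 0 {
		delete(s.podTopologyHints, ref.podUID)
	}
}

func main() {
	s := newScope()
	s.setHint("pod-1", "app", "numa-0")
	s.addContainer("pod-1", "app", "id-old")
	s.addContainer("pod-1", "app", "id-new") // same container after a restart

	s.removeContainer("id-old")
	fmt.Println(len(s.podTopologyHints)) // 1: hints kept for the restarted container

	s.removeContainer("id-new")
	fmt.Println(len(s.podTopologyHints)) // 0: last reference gone, hints removed
}
```

The lookup after the removal is what keeps the two maps consistent: the hints survive as long as at least one containerID still points at the same pod UID and container name.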
    if len1-len2 != 1 || lenHints1-lenHints2 != 1 {
        t.Errorf("Remove Pod resulted in error")
    }
Can we break this into two checks with different errors based on which one was incorrect?
Yeah, you are right, I will update it
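
One possible shape for the split check is sketched below. The helper name `checkRemoval` and its parameters are hypothetical and not taken from the PR; in the actual test each branch would more likely be its own `t.Errorf` call. The point is only that each map gets its own message, so a failure says which mapping was wrong.

```go
package main

import "fmt"

// checkRemoval compares map sizes before and after removing one container and
// returns one message per mapping that did not shrink by exactly one entry.
// Hypothetical helper for illustration only.
func checkRemoval(podMapBefore, podMapAfter, hintsBefore, hintsAfter int) []string {
	var failures []string
	if podMapBefore-podMapAfter != 1 {
		failures = append(failures,
			fmt.Sprintf("podMap: expected one entry removed, size went %d -> %d", podMapBefore, podMapAfter))
	}
	if hintsBefore-hintsAfter != 1 {
		failures = append(failures,
			fmt.Sprintf("podTopologyHints: expected one entry removed, size went %d -> %d", hintsBefore, hintsAfter))
	}
	return failures
}

func main() {
	// podMap shrank correctly, but podTopologyHints did not: only one message is reported.
	for _, msg := range checkRemoval(2, 1, 2, 2) {
		fmt.Println(msg)
	}
}
```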
Looks good. Let's backport this to 1.20 and 1.21 once this is merged.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aheng-ch, klueska
/test pull-kubernetes-e2e-kind
/triage accepted
/priority important-soon
…1615-origin-release-1.21 Automated cherry pick of #101615: fix removing pods from podTopologyHints mapping
…1615-origin-release-1.20 Automated cherry pick of #101615: fix removing pods from podTopologyHints mapping
What type of PR is this?
/kind bug
What this PR does / why we need it:
`podTopologyHints` records `TopologyHints` and is indexed by PodUID and then by ContainerName (kubernetes/pkg/kubelet/cm/topologymanager/scope.go, line 84 in 81dd9d7). However, when a container is removed, the code deletes the record by containerID, so the lookup never matches and the record is never deleted (kubernetes/pkg/kubelet/cm/topologymanager/scope.go, line 115 in 81dd9d7).
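
The effect described above can be shown with a minimal sketch. It assumes a nested map keyed by pod UID and then container name, with plain strings in place of the real hint type; the `hints` type and `removeByKey` helper are made up for this illustration. Deleting with a containerID as the key is a silent no-op in Go, so the inner map never becomes empty and the pod entry is never cleaned up.

```go
package main

import "fmt"

// hints is a simplified stand-in for podTopologyHints:
// podUID -> containerName -> hint (placeholder string).
type hints map[string]map[string]string

// removeByKey deletes key from the pod's inner map and drops the pod entry
// once the inner map is empty. Deleting a missing key does nothing.
func removeByKey(h hints, podUID, key string) {
	delete(h[podUID], key)
	if len(h[podUID]) == 0 {
		delete(h, podUID)
	}
}

func main() {
	h := hints{"pod-1": {"app": "numa-0"}}

	// Buggy behaviour: the inner map is keyed by container name,
	// so deleting by containerID removes nothing.
	removeByKey(h, "pod-1", "container-id-123")
	fmt.Println(len(h)) // 1: the record is still there

	// Fixed behaviour: delete by container name.
	removeByKey(h, "pod-1", "app")
	fmt.Println(len(h)) // 0: the record is gone
}
```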
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: