Query pod status outside loop over containers #77291
Conversation
/test pull-kubernetes-e2e-gce-100-performance
Thanks for the PR :)
One small functionality comment from me, and I would also like us to verify this change is safe to make before proceeding.
```diff
@@ -225,8 +225,8 @@ func (m *manager) reconcileState() (success []reconciledContainer, failure []reconciledContainer) {
 	for _, pod := range m.activePods() {
 		allContainers := pod.Spec.InitContainers
 		allContainers = append(allContainers, pod.Spec.Containers...)
+		status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
```
Can we be confident that the `podStatus` will remain consistent over this entire loop? If we can't, I think we open ourselves up to a possible race condition if we check the `podStatus` at the start of the loop and then it somehow changes in the middle of executing the loop. Although to be fair, that risk also exists even if we check the pod status during each iteration of the loop; it's just smaller because we are checking the pod status more frequently.
@ConnorDoyle
Can you comment on this refactoring?
Thanks
Yes, this makes sense to do. The pod status really only needs to be queried once before we loop through the containers.
```diff
 		for _, container := range allContainers {
-			status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
 			if !ok {
```
If we do make this change (and I'm not yet positive it's safe), I believe we should also move this `if !ok` check outside of the loop.
The variable reference is to the `ok` declared outside the container loop.
The intention of the check here is to include the container name in the failure.
So the check should be kept here.
I agree with @mattjmcnaughton on this one. In the interest of local readability, if `ok` is false you could immediately append all of the containers to the failure list and `continue` the outer pod iteration loop. As is, only the first container is logged; it makes more sense to just log the pod name.
Please confirm that the test needs to be modified when only the pod name is returned:
```go
if !foundFailedContainer {
	t.Errorf("Expected reconciliation failure for container: %s", testCase.expectFailedContainerName)
}
```
Meaning, the `expectFailedContainerName` parameter could be dropped from cpu_manager_test.go.
Let's keep the `reconciledContainer` struct as-is; there are other scenarios that need to test for individual container reconcile failures.
For this change, if the pod status can't be queried, we loop over each container in the pod and add it to the list of failures. I don't believe the tests would need to change beyond expecting all of the containers to be in the failed list in this scenario.
Let me know if that isn't clear.
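For illustration, a minimal sketch of what that expectation could look like in the test (the `expectFailedContainerNames` field below is hypothetical, not an actual cpu_manager_test.go field):
```go
// Hypothetical sketch: when GetPodStatus fails for a pod, every container in
// that pod should appear in the returned failure list.
for _, expected := range testCase.expectFailedContainerNames {
	found := false
	for _, f := range failure {
		if f.containerName == expected {
			found = true
			break
		}
	}
	if !found {
		t.Errorf("Expected reconciliation failure for container: %s", expected)
	}
}
```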
I updated the PR. Let me know what you think, @ConnorDoyle.
In order not to have two `if` blocks with the same condition, I kept the `if` block inside the container loop.
This is no worse than what we had before, so if you don't feel like making further changes it's OK. Just for clarity, I had something like this in mind:
```go
for _, pod := range m.activePods() {
	allContainers := pod.Spec.InitContainers
	allContainers = append(allContainers, pod.Spec.Containers...)
	// Query the pod status once; status stays in scope for the container loop below.
	status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
	if !ok {
		klog.Warningf("[cpumanager] reconcileState: skipping pod; status not found (pod: %s)", pod.Name)
		for _, container := range allContainers {
			failure = append(failure, reconciledContainer{pod.Name, container.Name, ""})
		}
		continue // to next pod
	}
	for _, container := range allContainers {
		// ... reconcile each container using status ...
	}
}
```
The benefit I see would be to include all containers from the pod whose status we failed to get in the failure list, instead of just the first container.
The current code is consistent with the previous behavior.
So please accept the PR if it is okay with you.
Thanks
Usually, for such a small efficiency gain, the readability hit would not be worth it. It's idiomatic to check error conditions immediately in Go. But since it's only one line removed, it is OK.
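As a generic illustration of that idiom (the `lookup` and `process` helpers are hypothetical, not the PR's code), Go favors handling the failure branch immediately so the happy path stays unindented:
```go
// Generic sketch of the early-error-check idiom; lookup and process are hypothetical.
value, ok := lookup(key)
if !ok {
	return fmt.Errorf("key %q not found", key)
}
// Happy path continues, unindented, with value in scope.
process(value)
```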
/assign
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorDoyle, tedyu

The full list of commands accepted by this bot can be found here. The pull request process is described here.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
In `manager#reconcileState`, we call `GetPodStatus` per container. This is not needed, since the status is the same for every container in the pod.
This PR moves the call outside the loop over containers.
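For context, a minimal before/after sketch of the change (simplified from the actual `reconcileState`; the per-container reconciliation bodies are elided):
```go
// Before: GetPodStatus was called once per container.
for _, pod := range m.activePods() {
	allContainers := pod.Spec.InitContainers
	allContainers = append(allContainers, pod.Spec.Containers...)
	for _, container := range allContainers {
		status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
		// ... reconcile container using status and ok ...
	}
}

// After: GetPodStatus is called once per pod, before the container loop.
for _, pod := range m.activePods() {
	allContainers := pod.Spec.InitContainers
	allContainers = append(allContainers, pod.Spec.Containers...)
	status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
	for _, container := range allContainers {
		// ... reconcile container using status and ok ...
	}
}
```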