Query pod status outside loop over containers #77291

Merged · 2 commits · May 2, 2019
4 changes: 2 additions & 2 deletions pkg/kubelet/cm/cpumanager/cpu_manager.go
@@ -225,10 +225,10 @@ func (m *manager) reconcileState() (success []reconciledContainer, failure []rec
 	for _, pod := range m.activePods() {
 		allContainers := pod.Spec.InitContainers
 		allContainers = append(allContainers, pod.Spec.Containers...)
+		status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
Contributor

Can we be confident that the podStatus will remain consistent over this entire loop? If we can't, I think we open ourselves up to a possible race condition: we check the podStatus at the start of the loop, and it could then change in the middle of executing the loop. To be fair, that risk also exists if we check the pod status during each iteration of the loop; it's just smaller because we are checking the pod status more frequently.
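
To make the trade-off concrete, here is a minimal, self-contained sketch; Pod, Container, and statusProvider are simplified stand-ins for illustration, not the kubelet types. Querying the status once per pod means every container in that pod is reconciled against the same snapshot, whereas querying inside the container loop could observe a status that changes partway through.

	package main

	import "fmt"

	// Hypothetical stand-ins for the kubelet types; not the real API.
	type Container struct{ Name string }

	type Pod struct {
		UID        string
		Containers []Container
	}

	type statusProvider map[string]string // pod UID -> status snapshot

	func (p statusProvider) GetPodStatus(uid string) (string, bool) {
		s, ok := p[uid]
		return s, ok
	}

	func main() {
		provider := statusProvider{"pod-1": "Running"}
		pod := Pod{UID: "pod-1", Containers: []Container{{Name: "init"}, {Name: "app"}}}

		// Query the status once per pod: every container is then reconciled
		// against the same snapshot, even if the provider's view changes
		// while the inner loop is running.
		status, ok := provider.GetPodStatus(pod.UID)
		if !ok {
			fmt.Printf("skipping pod %s: status not found\n", pod.UID)
			return
		}
		for _, c := range pod.Containers {
			fmt.Printf("reconciling %s/%s against status %q\n", pod.UID, c.Name, status)
		}
	}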

Contributor Author

@ConnorDoyle
Can you comment on this refactoring?

Thanks

Contributor

Yes, this makes sense to do. The pod status really only needs to be queried once before we loop through the containers.

 		for _, container := range allContainers {
-			status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
 			if !ok {
Contributor

If we do make this change (and I'm not yet positive it's safe), I believe we should also move this if !ok check outside of the loop.

Contributor Author

The variable reference is to the ok declared outside the container loop.
The intention of the check here is to include the container name in the failure.

So the check should be kept here.

Contributor

I agree with @mattjmcnaughton on this one. In the interest of local readability, if ok is false you could immediately append all of the containers to the failure list and continue the outer pod iteration loop. As is, only the first container is logged; it makes more sense to just log the pod name.

Contributor Author

Please confirm that the test needs to be modified when only the pod name is returned:

			if !foundFailedContainer {
				t.Errorf("Expected reconciliation failure for container: %s", testCase.expectFailedContainerName)

Contributor Author

Meaning, the expectFailedContainerName parameter can be dropped from cpu_manager_test.go.

Contributor

Let's keep the reconciledContainer struct as-is; there are other scenarios that need to test for individual container reconcile failures.

For this change, if the pod status can't be queried we loop over each container in the pod and add it to the list of failures. I don't believe the tests would need to change beyond expecting all of the containers to be in the failed list in this scenario.

Let me know if that isn't clear.
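
To make that expectation concrete, here is a hedged sketch of what such a check could look like; the containerName field name and the surrounding test variables (allContainers, failure, t) are assumptions for illustration, not a quote of cpu_manager_test.go. When GetPodStatus reports not-ok, every container of the pod should appear in the failure list.

	// Hypothetical assertion: assumes failure is the []reconciledContainer
	// slice returned by reconcileState and that its container-name field
	// is called containerName.
	for _, c := range allContainers {
		found := false
		for _, r := range failure {
			if r.containerName == c.Name {
				found = true
				break
			}
		}
		if !found {
			t.Errorf("expected reconciliation failure for container: %s", c.Name)
		}
	}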

Contributor Author

I updated the PR. Let me know what you think, @ConnorDoyle

In order not to have two if blocks with the same condition, I kept the if block inside the container loop.

Contributor

This is no worse than what we had before, so if you don't feel like making further changes it's OK. Just for clarity, I had something like this in mind:

	for _, pod := range m.activePods() {
		allContainers := pod.Spec.InitContainers
		allContainers = append(allContainers, pod.Spec.Containers...)
		status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
		if !ok {
			klog.Warningf("[cpumanager] reconcileState: skipping pod; status not found (pod: %s)", pod.Name)
			for _, container := range allContainers {
				failure = append(failure, reconciledContainer{pod.Name, container.Name, ""})
			}
			continue // to next pod
		}

		for _, container := range allContainers {
			// status is still in scope here for the per-container work.
			// ...
		}
	}

The benefit I see would be to include all containers from the pod whose status we failed to get in the failure list, instead of just the first container.

Contributor Author

The current code is consistent with previous behavior.

So please accept the PR if it is okay with you.

Thanks

Contributor

Usually, for such a small efficiency gain, the readability hit would not be worth it. It's idiomatic to check error conditions immediately in Go. But since it's only one line removed, it is OK.

klog.Warningf("[cpumanager] reconcileState: skipping pod; status not found (pod: %s, container: %s)", pod.Name, container.Name)
klog.Warningf("[cpumanager] reconcileState: skipping pod; status not found (pod: %s)", pod.Name)
failure = append(failure, reconciledContainer{pod.Name, container.Name, ""})
break
}
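
Pieced together from the diff fragments above, the merged hunk presumably reads roughly as follows; the per-container reconciliation body is elided here.

	for _, pod := range m.activePods() {
		allContainers := pod.Spec.InitContainers
		allContainers = append(allContainers, pod.Spec.Containers...)
		status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
		for _, container := range allContainers {
			if !ok {
				klog.Warningf("[cpumanager] reconcileState: skipping pod; status not found (pod: %s)", pod.Name)
				failure = append(failure, reconciledContainer{pod.Name, container.Name, ""})
				break
			}
			// ... per-container reconciliation using status continues here.
		}
	}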