
Query pod status outside loop over containers #77291

Merged: 2 commits into kubernetes:master on May 2, 2019

Conversation

@tedyu (Contributor) commented on May 1, 2019

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
In manager#reconcileState, we call GetPodStatus once per container. This is not needed, so this PR moves the call outside the loop over containers.

Does this PR introduce a user-facing change?
NONE
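
For context, a minimal sketch of the shape of the change; the reconcile body and the !ok handling are elided here, and the exact hunks appear in the review comments below:

	// Before: GetPodStatus is looked up once per container.
	for _, pod := range m.activePods() {
		allContainers := pod.Spec.InitContainers
		allContainers = append(allContainers, pod.Spec.Containers...)
		for _, container := range allContainers {
			status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
			// ... handle !ok, then reconcile container against status ...
		}
	}

	// After: GetPodStatus is looked up once per pod, before the container loop.
	for _, pod := range m.activePods() {
		allContainers := pod.Spec.InitContainers
		allContainers = append(allContainers, pod.Spec.Containers...)
		status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
		for _, container := range allContainers {
			// ... handle !ok, then reconcile container against status ...
		}
	}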

@k8s-ci-robot added the release-note-none, kind/cleanup, size/XS, cncf-cla: yes, needs-sig, needs-priority, area/kubelet, and sig/node labels, and removed the needs-sig label, on May 1, 2019
@tedyu (Contributor, Author) commented on May 1, 2019

/test pull-kubernetes-e2e-gce-100-performance

@mattjmcnaughton (Contributor) left a comment

Thanks for the PR :)

One small functionality comment from me, and I would also like us to verify this change is safe to make before proceeding.

@@ -225,8 +225,8 @@ func (m *manager) reconcileState() (success []reconciledContainer, failure []rec
for _, pod := range m.activePods() {
allContainers := pod.Spec.InitContainers
allContainers = append(allContainers, pod.Spec.Containers...)
status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
Contributor:

Can we be confident that the podStatus will remain consistent over this entire loop? If we can't, I think we open ourselves up to a possible race condition: we check the podStatus at the start of the loop, and then it changes somewhere in the middle of executing the loop. To be fair, that risk also exists when we check the pod status during each iteration of the loop; it's just smaller, because we are checking the pod status more frequently.

Contributor (Author):

@ConnorDoyle, can you comment on this refactoring?

Thanks

Contributor:

Yes, this makes sense to do. The pod status really only needs to be queried once before we loop through the containers.

for _, container := range allContainers {
status, ok := m.podStatusProvider.GetPodStatus(pod.UID)
if !ok {
Contributor:

If we do make this change (and I'm not yet positive it's safe), I believe we should also move this if !ok check outside of the loop.

Contributor (Author):

The ok referenced here is the variable now declared outside the container loop. The intention of the check at this point is to include the container name in the failure entry, so the check should be kept here.

Contributor:

I agree with @mattjmcnaughton on this one. In the interest of local readability, if ok is false you could immediately append all of the containers to the failure list and continue the outer pod iteration loop. As it is, only the first container is logged; it makes more sense to just log the pod name.

Contributor (Author):

Please confirm that the test needs to be modified when only the pod name is returned:

			if !foundFailedContainer {
				t.Errorf("Expected reconciliation failure for container: %s", testCase.expectFailedContainerName)

Contributor (Author):

Meaning, the expectFailedContainerName parameter could be dropped from cpu_manager_test.go.

Contributor:

Let's keep the reconciledContainer struct as-is; there are other scenarios that need to test for individual container reconcile failures.

For this change, if the pod status can't be queried we loop over each container in the pod and add it to the list of failures. I don't believe the tests would need to change beyond expecting all of the containers to be in the failed list in this scenario.

Let me know if that isn't clear.
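
For illustration, a hypothetical sketch of the kind of assertion being described; the plural field expectFailedContainerNames and the containerName field on the failure entries are assumptions made up for this sketch, not the actual cpu_manager_test.go structure:

	// Hypothetical: expect every container of the pod whose status lookup
	// failed to show up in the returned failure list.
	for _, want := range testCase.expectFailedContainerNames { // assumed field name
		found := false
		for _, f := range failure {
			if f.containerName == want { // assumed field name on reconciledContainer
				found = true
				break
			}
		}
		if !found {
			t.Errorf("Expected reconciliation failure for container: %s", want)
		}
	}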

Contributor (Author):

I updated the PR. Let me know what you think, @ConnorDoyle.

In order not to have two if blocks with the same condition, I kept the if block inside the container loop.

Contributor:

This is no worse than what we had before, so if you don't feel like making further changes it's OK. Just for clarity, I had something like this in mind:

	for _, pod := range m.activePods() {
		allContainers := pod.Spec.InitContainers
		allContainers = append(allContainers, pod.Spec.Containers...)
		if status, ok := m.podStatusProvider.GetPodStatus(pod.UID); !ok {
			klog.Warningf("[cpumanager] reconcileState: skipping pod; status not found (pod: %s)", pod.Name)
			for _, container := range allContainers {
				failure = append(failure, reconciledContainer{pod.Name, container.Name, ""})
			}
			continue  // to next pod
		}

		for _, container := range allContainers {
			// ...
		}

The benefit I see would be to include all containers from the pod whose status we failed to get in the failure list, instead of just the first container.

Contributor (Author):

The current code is consistent with the previous behavior, so please accept the PR if it is okay with you.

Thanks

Contributor:

Usually, for such a small efficiency gain, the readability hit would not be worth it; it's idiomatic to check error conditions immediately in Go. But since it's only one line removed, it is OK.

@ConnorDoyle (Contributor) commented:

/assign

@ConnorDoyle (Contributor) left a comment

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on May 2, 2019
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: ConnorDoyle, tedyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on May 2, 2019
@k8s-ci-robot merged commit 98c4c1e into kubernetes:master on May 2, 2019
Labels: approved, area/kubelet, cncf-cla: yes, kind/cleanup, lgtm, needs-priority, release-note-none, sig/node, size/XS

4 participants