SetNodeUpdateStatusNeeded whenever nodeAdd event is received #37727

Merged 1 commit on Dec 2, 2016

rkouj
Contributor

@rkouj rkouj commented Nov 30, 2016

What this PR does / why we need it:
Bug fix: call SetNodeStatusUpdateNeeded for a node whenever its API object is added. This ensures that the node's list of attached volumes is not lost when its API object is deleted and recreated.

fixes #37586
#37585
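
A rough sketch of the behavior this PR describes, not the controller's actual code: `stateOfWorld`, `nodeEntry`, `onNodeAdd`, and `nodesToUpdate` are hypothetical names made up for illustration. It models the idea that the controller keeps its own record of attached volumes per node, and that an add event for a (re)created node object should flag that node so its status gets rewritten.

```go
package main

import "fmt"

// nodeEntry is a hypothetical stand-in for the controller's per-node record.
type nodeEntry struct {
	attachedVolumes    []string // volumes the controller believes are attached
	statusUpdateNeeded bool     // true when the node's API status must be rewritten
}

// stateOfWorld is a toy version of the controller's in-memory cache.
type stateOfWorld struct {
	nodes map[string]*nodeEntry
}

// SetNodeStatusUpdateNeeded flags a known node so that the next sync
// writes its attached-volume list back to the node's API object.
func (s *stateOfWorld) SetNodeStatusUpdateNeeded(name string) {
	if n, ok := s.nodes[name]; ok {
		n.statusUpdateNeeded = true
	}
}

// onNodeAdd mimics the add-event handler. A recreated node object starts
// without the attached-volume list, so the node is flagged on every add.
func (s *stateOfWorld) onNodeAdd(name string) {
	s.SetNodeStatusUpdateNeeded(name)
}

// nodesToUpdate returns the names of nodes whose API status is stale.
func (s *stateOfWorld) nodesToUpdate() []string {
	var out []string
	for name, n := range s.nodes {
		if n.statusUpdateNeeded {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	s := &stateOfWorld{nodes: map[string]*nodeEntry{
		"node-1": {attachedVolumes: []string{"pd-1"}},
	}}
	// The node's API object is deleted and recreated; the add event fires.
	s.onNodeAdd("node-1")
	fmt.Println("nodes needing a status update:", s.nodesToUpdate())
}
```

In this toy model an add event for a node the cache does not know about is a no-op, which mirrors the "does not exist" branch discussed in the review below.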

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 30, 2016

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Dec 1, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 1, 2016
@saad-ali saad-ali added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Dec 1, 2016
@saad-ali saad-ali added this to the v1.5 milestone Dec 1, 2016
@saad-ali saad-ali added cherrypick-candidate do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. labels Dec 1, 2016
Member

@saad-ali saad-ali left a comment

Couple small comments

@@ -457,6 +460,7 @@ func (asw *actualStateOfWorld) updateNodeStatusUpdateNeeded(nodeName types.NodeN
"Failed to set statusUpdateNeeded to needed %t because nodeName=%q does not exist",
needed,
nodeName)
return
Member

nit: Errors should not be silently swallowed; consumers should decide whether they want to swallow the error. So I'd suggest modifying this chain to bubble the error up to the caller and let the caller decide whether to ignore it.

Contributor Author

@rkouj rkouj Dec 1, 2016

I agree, and I wanted to change it back to returning fmt.Errorf() as it did previously. However, that would mean changing the function's return type and every call site (two in this case), which might also affect some tests.
It might make sense not to touch production code that has already been tested. It would also add overhead beyond the intended fix in this PR.
I can create a separate issue and PR to refactor this code and take the time to work through the changes it would affect.

Member

Ack. We can do this during normal 1.6 development.

Contributor Author

Created #39056
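
For reference, the refactor deferred in this thread might look roughly like the sketch below. It is an illustration under assumptions, not the change that was eventually made: `actualState` and its map field are hypothetical simplifications, and only the error message is taken from the diff above. The helper returns the error via `fmt.Errorf`, and the caller is the one that decides to log and continue.

```go
package main

import (
	"fmt"
	"log"
)

// actualState is a hypothetical, simplified cache: node name -> statusUpdateNeeded.
type actualState struct {
	nodes map[string]bool
}

// updateNodeStatusUpdateNeeded returns an error for an unknown node
// instead of logging it and returning silently.
func (a *actualState) updateNodeStatusUpdateNeeded(nodeName string, needed bool) error {
	if _, ok := a.nodes[nodeName]; !ok {
		return fmt.Errorf(
			"failed to set statusUpdateNeeded to needed %t because nodeName=%q does not exist",
			needed, nodeName)
	}
	a.nodes[nodeName] = needed
	return nil
}

// SetNodeStatusUpdateNeeded is one possible caller. Here the caller, not the
// helper, makes the decision to ignore the error after logging it.
func (a *actualState) SetNodeStatusUpdateNeeded(nodeName string) {
	if err := a.updateNodeStatusUpdateNeeded(nodeName, true); err != nil {
		log.Printf("ignoring error: %v", err)
	}
}

func main() {
	a := &actualState{nodes: map[string]bool{"node-1": false}}

	a.SetNodeStatusUpdateNeeded("node-1")  // known node: flag is set
	a.SetNodeStatusUpdateNeeded("missing") // unknown node: error is logged by the caller

	fmt.Println("node-1 needs update:", a.nodes["node-1"])
}
```

The cost rkouj mentions is visible here: the helper's return type changes, so each call site has to handle or explicitly discard the error.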

@@ -1094,6 +1094,17 @@ func Test_OneVolumeTwoNodes_TwoDevicePaths(t *testing.T) {
verifyAttachedVolume(t, attachedVolumes, generatedVolumeName2, string(volumeName), node2Name, devicePath2, true /* expectedMountedByNode */, false /* expectNonZeroDetachRequestedTime */)
}

func Test_SetNodeStatusUpdateNeededError(t *testing.T) {
Member

Write a comment before the test summarizing what it is testing (the expected behavior).

Then, inside the test, I like to break it down into three sections:

// Arrange
// Act
// Assert

This makes the test easier to follow. Basically, you set up, do something, then verify it. Up to you if you want to follow this pattern.
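
As an illustration of the pattern suggested above, here is a minimal sketch. The `nodeState` type and `setStatusUpdateNeeded` method are hypothetical stand-ins, not the real cache under test. In a real package the test would live in a `_test.go` file and run under `go test`; the `main` function below exists only so that the sketch also runs on its own.

```go
package main

import (
	"fmt"
	"testing"
)

// nodeState is a hypothetical, much-simplified stand-in for the cache under test.
type nodeState struct {
	updateNeeded map[string]bool
}

// setStatusUpdateNeeded flags a known node and returns an error for an unknown one.
func (s *nodeState) setStatusUpdateNeeded(node string) error {
	if _, ok := s.updateNeeded[node]; !ok {
		return fmt.Errorf("node %q does not exist", node)
	}
	s.updateNeeded[node] = true
	return nil
}

// Test_SetStatusUpdateNeeded_UnknownNode verifies that flagging a node that
// was never added returns an error and leaves the state unchanged.
func Test_SetStatusUpdateNeeded_UnknownNode(t *testing.T) {
	// Arrange
	state := &nodeState{updateNeeded: map[string]bool{}}

	// Act
	err := state.setStatusUpdateNeeded("node-1")

	// Assert
	if err == nil {
		t.Fatal("expected an error for an unknown node, got nil")
	}
	if len(state.updateNeeded) != 0 {
		t.Errorf("expected no nodes to be recorded, got %d", len(state.updateNeeded))
	}
}

// main runs the test without `go test`; drop it when the test is moved
// into a _test.go file.
func main() {
	testing.Main(
		func(pat, str string) (bool, error) { return true, nil },
		[]testing.InternalTest{{
			Name: "Test_SetStatusUpdateNeeded_UnknownNode",
			F:    Test_SetStatusUpdateNeeded_UnknownNode,
		}},
		nil, nil,
	)
}
```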

Contributor Author

Ack

@rkouj
Contributor Author

rkouj commented Dec 1, 2016

After stubbing out code that deletes pods, I was able to verify that the upgrade works and I'm able to see the data that was written before the upgrade.

kubectl describe pods
https://gist.github.com/rkouj/dc642237b390b77f6db891bfb11255f7

@k8s-ci-robot
Contributor

Jenkins GCI GKE smoke e2e failed for commit b25a426. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@saad-ali
Member

saad-ali commented Dec 1, 2016

> After stubbing out code that deletes pods, I was able to verify that the upgrade works and I'm able to see the data that was written before the upgrade.

Awesome.

@rkouj
Contributor Author

rkouj commented Dec 1, 2016

@k8s-bot gci gke e2e test this

Member

@saad-ali saad-ali left a comment

One more nit

}
nodeName := types.NodeName(node.Name)
adc.nodeUpdate(nil, obj)
adc.actualStateOfWorld.SetNodeStatusUpdateNeeded(nodeName)
Member

nit: add a comment explaining why we are doing this, since it is not obvious

Contributor Author

Ack

@saad-ali saad-ali removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Dec 1, 2016
@saad-ali
Member

saad-ali commented Dec 1, 2016

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2016
@saad-ali saad-ali removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2016
@saad-ali
Member

saad-ali commented Dec 1, 2016

Removing LGTM while @rkouj investigates another upgrade failure.

@rkouj
Contributor Author

rkouj commented Dec 1, 2016

I retested it and it's working fine. It looks like the first time around I didn't have the fix in my branch when I ran the test. I'll retest again tomorrow morning to be absolutely sure.

@rkouj
Contributor Author

rkouj commented Dec 1, 2016

Retested. The fix works.

This is when the node object got recreated:

E1201 22:26:42.684050 3486 kubelet_node_status.go:131] Previously node "kubernetes-minion-group-k3b4" had externalID "6192076403268984273"; now it is "6725016659983934100"; will delete and recreate.

This is when SetNodeStatusUpdateNeeded got called:

I1201 22:26:42.977372 5 attach_detach_controller.go:254] [LOGTOREMOVE] about to call SetNodeStatusUpdateNeeded for nodeName kubernetes-minion-group-k3b4

@rkouj
Contributor Author

rkouj commented Dec 1, 2016

Frequency of updates on the nodeAdd

https://gist.github.com/anonymous/bf6ea5f3e465d261f918a7e062966515

@rkouj
Contributor Author

rkouj commented Dec 1, 2016

@saad-ali PTAL

@saad-ali
Member

saad-ali commented Dec 1, 2016

Reviewed 1 of 4 files at r1, 2 of 3 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 3 unresolved discussions.



@saad-ali
Member

saad-ali commented Dec 1, 2016

Thanks for revalidating!

/lgtm



@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Contributor

Jenkins kops AWS e2e failed for commit 638ef1b. Full PR test history.

The magic incantation to run this job again is @k8s-bot kops aws e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE e2e failed for commit 638ef1b. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@rkouj
Contributor Author

rkouj commented Dec 2, 2016

@k8s-bot cvm gce e2e test this

@rkouj
Contributor Author

rkouj commented Dec 2, 2016

@k8s-bot kops aws e2e test this

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit c552f89 into kubernetes:master Dec 2, 2016
@saad-ali saad-ali added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 2, 2016
k8s-github-robot pushed a commit that referenced this pull request Dec 2, 2016
…upstream-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #37727

Cherry pick of #37727 on release-1.5.
@k8s-cherrypick-bot

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

GCE PD unable to attach when node is upgraded to 1.5 from 1.4
6 participants