Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let the service controller retry when presistUpdate returns a conflict error #68087

Merged
merged 1 commit into from
Sep 6, 2018

Conversation

grayluck
Copy link
Contributor

@grayluck grayluck commented Aug 30, 2018

What this PR does / why we need it:
If a load balancer is changed while provisioning, it will fall into an error state and will not self-recover.
This PR picks up the conflict error and let serviceController retry in order to get the load balancer out of error state.

Special notes for your reviewer:
/assign @MrHohn @rramkumar1

Release note:

Let service controller retry creating load balancer when persistUpdate failed due to conflict.

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 30, 2018
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 30, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2018
@MrHohn
Copy link
Member

MrHohn commented Aug 30, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2018
@MrHohn
Copy link
Member

MrHohn commented Aug 30, 2018

@grayluck Can you update the release note to explain what this fixes? Thanks.

@bowei
Copy link
Member

bowei commented Aug 30, 2018

where is the test?

@neolit123
Copy link
Member

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 30, 2018
@grayluck
Copy link
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 31, 2018
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 31, 2018
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 31, 2018
@grayluck
Copy link
Contributor Author

Unit test added.
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 31, 2018
@grayluck
Copy link
Contributor Author

/retest

@grayluck grayluck force-pushed the refetch branch 2 times, most recently from 33fd9e6 to 482a499 Compare August 31, 2018 22:20
@MrHohn
Copy link
Member

MrHohn commented Aug 31, 2018

Might need to run ./hack/update-bazel.sh to update the BUILD file.

@grayluck
Copy link
Contributor Author

Done. Thanks!

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 31, 2018
@grayluck
Copy link
Contributor Author

Release note updated.

@grayluck
Copy link
Contributor Author

/milestone v1.12

@k8s-ci-robot
Copy link
Contributor

@grayluck: You must be a member of the kubernetes/kubernetes-milestone-maintainers github team to set the milestone.

In response to this:

/milestone v1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good, minor comments in test.


svcCache := controller.cache.getOrCreate(svcName)
if err := controller.processServiceUpdate(svcCache, svc, svcName); err == nil {
t.Errorf("controller.processServiceUpdate() = nil, want error")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fatalf here? We won't get event below anyway if this doesn't error out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

retryMsg := "Error creating load balancer (will retry)"
if gotEvent := func() bool {
events := controller.eventRecorder.(*record.FakeRecorder).Events
for len(events) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something similar to

func checkEvent(t *testing.T, recorder *record.FakeRecorder, expected string, shouldMatch bool) bool {
select {
case received := <-recorder.Events:
if strings.HasPrefix(received, expected) != shouldMatch {
t.Errorf(received)
if shouldMatch {
t.Errorf("Should receive message \"%v\" but got \"%v\".", expected, received)
} else {
t.Errorf("Unexpected event \"%v\".", received)
}
}
return false
case <-time.After(2 * time.Second):
if shouldMatch {
t.Errorf("Should receive message \"%v\" but got timed out.", expected)
}
return true
}
}
?

Seems better to have an timeout on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is synchronous in this test. The time processServiceUpdate is returned, all events should be recorded. And "for len(events) > 0" guards that we will not wait for future (and there won't be future events) here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

@grayluck
Copy link
Contributor Author

grayluck commented Sep 4, 2018

Thanks for the review, MrHohn! Commit updated.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 4, 2018
Copy link
Member

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

retryMsg := "Error creating load balancer (will retry)"
if gotEvent := func() bool {
events := controller.eventRecorder.(*record.FakeRecorder).Events
for len(events) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grayluck, MrHohn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 4, 2018
@bowei
Copy link
Member

bowei commented Sep 4, 2018

/milestone v1.12

@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Sep 4, 2018
@grayluck
Copy link
Contributor Author

grayluck commented Sep 5, 2018

Thanks Bowei!
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 5, 2018
@MrHohn
Copy link
Member

MrHohn commented Sep 6, 2018

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 6, 2018
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants