fix endpoints controller del lead-election endpoints #45478

Merged
merged 1 commit into from
May 11, 2017

HardySimpson
Contributor

@HardySimpson HardySimpson commented May 8, 2017

When multiple controller-manager instances are running, we observe that the endpoints controller deletes the leader-election endpoints after 5 minutes, which triggers a re-election. This PR adds a check to avoid that.

Fixes #45585
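
A minimal sketch of the kind of check described above, not the merged code itself. It assumes simplified, hypothetical types (`Endpoints`, `leftoverKeys`) so it compiles without Kubernetes imports, and it hard-codes an annotation key that is assumed to match `resourcelock.LeaderElectionRecordAnnotationKey`. The idea is that any Endpoints object carrying the leader-election annotation is skipped when leftover endpoints are collected, so it is never queued and deleted as an orphan.

```go
package main

import "fmt"

// leaderElectionAnnotation is assumed to mirror the value of
// resourcelock.LeaderElectionRecordAnnotationKey; it is hard-coded here
// only so the sketch builds without Kubernetes packages.
const leaderElectionAnnotation = "control-plane.alpha.kubernetes.io/leader"

// Endpoints is a hypothetical, stripped-down stand-in for the API object.
type Endpoints struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

// leftoverKeys returns the namespace/name keys that would be queued for a
// sync. Endpoints used as leader-election locks are skipped, so they are
// never treated as orphans.
func leftoverKeys(items []Endpoints) []string {
	var keys []string
	for i := range items {
		ep := &items[i]
		// Reading from a nil map is safe in Go and simply reports ok == false.
		if _, ok := ep.Annotations[leaderElectionAnnotation]; ok {
			continue
		}
		keys = append(keys, ep.Namespace+"/"+ep.Name)
	}
	return keys
}

func main() {
	items := []Endpoints{
		{
			Namespace:   "kube-system",
			Name:        "kube-scheduler",
			Annotations: map[string]string{leaderElectionAnnotation: "{}"},
		},
		{Namespace: "default", Name: "orphan"},
	}
	fmt.Println(leftoverKeys(items))
}
```

With this input, only the endpoints without the annotation are returned for further processing; the lock object in kube-system is left alone.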

error log

192.168.0.5 - - [02/May/2017:15:10:13 +0000] "GET /api/v1/endpoints HTTP/1.1" 200 1175 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0/endpoint-controller"
192.168.0.5 - - [02/May/2017:15:10:13 +0000] "DELETE /api/v1/namespaces/kube-system/endpoints/kube-controller-manager HTTP/1.1" 200 46 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0/endpoint-controller"
192.168.0.5 - - [02/May/2017:15:10:13 +0000] "DELETE /api/v1/namespaces/kube-system/endpoints/kube-scheduler HTTP/1.1" 200 46 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0/endpoint-controller"
192.168.0.7 - - [02/May/2017:15:10:14 +0000] "GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler HTTP/1.1" 404 123 "-" "kube-scheduler/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.7 - - [02/May/2017:15:10:14 +0000] "POST /api/v1/namespaces/kube-system/endpoints HTTP/1.1" 201 398 "-" "kube-scheduler/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.6 - - [02/May/2017:15:10:14 +0000] "GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager HTTP/1.1" 404 141 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.6 - - [02/May/2017:15:10:14 +0000] "POST /api/v1/namespaces/kube-system/endpoints HTTP/1.1" 201 416 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.7 - - [02/May/2017:15:10:14 +0000] "GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager HTTP/1.1" 200 416 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) ku

release-note

none

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels May 8, 2017
@HardySimpson
Contributor Author

What's wrong with CI? It shows:

W0508 09:30:26.751] ERROR: /workspace/k8s.io/kubernetes/pkg/controller/endpoint/BUILD:11:1: error executing shell command: 'bazel-out/local-fastbuild/bin/pkg/controller/endpoint/bazel-out/local-fastbuild/bin/pkg/controller/endpoint/go_default_library.a.GoCompileFile.params' failed: bash failed: error executing command 
W0508 09:30:26.751]   (exec env - \
W0508 09:30:26.751]     GOARCH=amd64 \
W0508 09:30:26.752]     GOOS=linux \
W0508 09:30:26.752]   /bin/bash -c bazel-out/local-fastbuild/bin/pkg/controller/endpoint/bazel-out/local-fastbuild/bin/pkg/controller/endpoint/go_default_library.a.GoCompileFile.params)
W0508 09:30:26.752] 
W0508 09:30:26.753] Use --sandbox_debug to see verbose messages from the sandbox.
W0508 09:30:26.755] Use --strategy=GoCompile=standalone to disable sandboxing for the failing actions.
I0508 09:30:26.855] k8s.io/kubernetes/pkg/controller/endpoint/endpoints_controller.go:38: can't find import: "k8s.io/kubernetes/pkg/client/leaderelection/resourcelock"
W0508 09:30:30.025] ____Building complete.
W0508 09:30:30.054] ____Elapsed time: 63.080s, Critical Path: 40.89s

but the import itself looks correct:

"k8s.io/kubernetes/pkg/client/leaderelection/resourcelock"

@MrHohn
Member

MrHohn commented May 8, 2017

what's wrong with CI

Please run ./hack/update-bazel.sh to update the BUILD files.

@MrHohn
Member

MrHohn commented May 8, 2017

I don't have enough context, @wojtek-t do you mind taking a look?

@wojtek-t
Member

wojtek-t commented May 8, 2017

I think this PR is fine, but adding @mikedanese for final confirmation.

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 9, 2017
@HardySimpson
Contributor Author

HardySimpson commented May 9, 2017

Please run ./hack/update-bazel.sh to update the BUILD files.

Yes, I have updated it now.

@mikedanese
Member

mikedanese commented May 9, 2017

This is a fine solution but I will propose an alternative. The leader election client could create a headless service without a selector for the corresponding endpoints if that service does not exist. Cluster operators can later modify this service to select the corresponding pods which seems desirable.
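
A rough sketch of the alternative proposed here, under stated assumptions: `Service` and `ensureLockService` are hypothetical names, and an in-memory map stands in for the API server. It only illustrates the "create a headless, selector-less Service if it does not exist" step; it is not taken from the leader election client.

```go
package main

import "fmt"

// Service is a hypothetical, minimal stand-in for a Kubernetes Service.
type Service struct {
	Namespace string
	Name      string
	ClusterIP string            // "None" marks a headless service
	Selector  map[string]string // nil means the service selects no pods
}

// ensureLockService adds a headless Service without a selector for the lock
// Endpoints if no Service with that name exists yet. It reports whether a
// Service was created. An existing Service is never overwritten, so a
// selector added later by an operator is preserved.
func ensureLockService(services map[string]Service, namespace, name string) bool {
	key := namespace + "/" + name
	if _, exists := services[key]; exists {
		return false
	}
	services[key] = Service{Namespace: namespace, Name: name, ClusterIP: "None"}
	return true
}

func main() {
	services := map[string]Service{}
	fmt.Println(ensureLockService(services, "kube-system", "kube-scheduler"))
	fmt.Println(ensureLockService(services, "kube-system", "kube-scheduler"))
}
```

Because a matching Service then exists, the lock Endpoints would no longer look orphaned, and the endpoints controller would not need any leader-election-specific logic.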

@HardySimpson
Contributor Author

This is a fine solution but I will propose an alternative. The leader election client could create a headless service without a selector for the corresponding endpoints if that service does not exist. Cluster operators can later modify this service to select the corresponding pods which seems desirable.

That's also OK. The advantage of that approach is that the endpoints controller doesn't need to know anything about leader election. The drawback is that users may be confused by two Services that don't route to anything. Maybe you could add a description to these leader-election Services.

@mikedanese
Member

mikedanese commented May 10, 2017

We might want to consider the other approach. For now this is fine.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HardySimpson, mikedanese

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2017
@timothysc timothysc added area/HA kind/bug Categorizes issue or PR as related to a bug. labels May 10, 2017
@timothysc timothysc added this to the v1.7 milestone May 10, 2017
@timothysc
Member

Fixes #45585

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)

@k8s-github-robot k8s-github-robot merged commit fc7ae99 into kubernetes:master May 11, 2017
@jsravn
Contributor

jsravn commented Jun 19, 2017

Any chance of a 1.6 cherry pick?

@@ -461,6 +462,14 @@ func (e *EndpointController) checkLeftoverEndpoints() {
	}
	for i := range list.Items {
		ep := &list.Items[i]
		if _, ok := ep.Annotations[resourcelock.LeaderElectionRecordAnnotationKey]; ok {
Member

@liggitt liggitt Jul 10, 2017

this seems inappropriate... I would have expected an empty service definition rather than changing the endpoints controller (as long as an endpoints object was still being used as a lock object... the move to configmaps makes this less relevant)

Member

We discussed that here: #45478 (comment). I prefer it as well. Happy to switch that over.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/HA cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.