Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle pod addition / removal errors #82209

Merged
merged 1 commit into from Sep 11, 2019

Conversation

@tedyu
Copy link
Contributor

commented Aug 31, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Currently error return from RemovePod / AddPod is ignored in generic_scheduler.

This PR adds log for each error return.

Since reprievePod uses removePod to undo pod addition when fits is false, I think we should treat the non-nil error from removePod as error.
Otherwise selectVictimsOnNode would operate on wrong assumption.

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 3, 2019

@wgliang
Can you review ?

thanks

@wgliang
wgliang approved these changes Sep 4, 2019
Copy link
Member

left a comment

/kind cluenup
/lgtm

Looks like nothing hurts.
@tedyu Thanks.

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2019

@Huang-Wei
Please review

Copy link
Member

left a comment

Some comments.

pkg/scheduler/core/generic_scheduler.go Outdated Show resolved Hide resolved
pkg/scheduler/core/generic_scheduler.go Outdated Show resolved Hide resolved
pkg/scheduler/core/generic_scheduler.go Outdated Show resolved Hide resolved
pkg/scheduler/core/generic_scheduler.go Outdated Show resolved Hide resolved
@Huang-Wei

This comment has been minimized.

Copy link
Member

commented Sep 5, 2019

BTW: it should be a /cleanup instead of /bug.

@yutedz yutedz force-pushed the yutedz:gen-sched-err branch from 1e70a19 to 85acb02 Sep 5, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 5, 2019
@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2019

/kind cleanup

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2019

@Huang-Wei
Please take another look.

Thanks

@yutedz yutedz force-pushed the yutedz:gen-sched-err branch from 85acb02 to 6305fcc Sep 5, 2019
@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2019

@Huang-Wei
Your comment has been addressed.
Please review again.

@yutedz yutedz force-pushed the yutedz:gen-sched-err branch from 6305fcc to fbf0acd Sep 5, 2019
@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 6, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, tedyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2019

Thanks for the review, @Huang-Wei

@yutedz yutedz force-pushed the yutedz:gen-sched-err branch from 6cbe294 to 89a70fa Sep 6, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 6, 2019
@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2019

@Huang-Wei
Please add /lgtm again.

Thanks

@Huang-Wei

This comment has been minimized.

Copy link
Member

commented Sep 9, 2019

/lgtm

BTW: any new changes this force-push commit introduced?

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 9, 2019
@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2019

No new change.
The previous patch got stale due to recent checkin.

@liggitt

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

/test pull-kubernetes-bazel-test

@liggitt

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

/hold cancel

@liggitt

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

all scheduler-related PRs in the failing batches: #82187 #82209 #82222

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

Since TestBindPlugiin failed, I wonder if 82187 was related to the test failure.

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

/test pull-kubernetes-integration

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

I have run integration test locally where the test passes

@tedyu

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

This was what I did with integration test local run:

with 82187 - pass
with 82187 and this - pass
with 82187 and this and 82222 - test failure in TestBindPlugin
82222 alone - test failure in TestBindPlugin

I0911 10:00:52.586024   27758 httplog.go:90] PUT /api/v1/namespaces/default/endpoints/kubernetes: (1.447854ms) 200 [___TestBindPlugin_in_k8s_io_kubernetes_test_integration_scheduler/v0.0.0 (darwin/amd64) kubernetes/$Format 127.0.0.1:53171]
--- FAIL: TestBindPlugin (66.28s)
    framework_test.go:1057: test #3: Waiting for invoke event 2 timeout.
    framework_test.go:1057: test #3: Waiting for invoke event 3 timeout.
@Huang-Wei

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

Thanks @tedyu. I can confirm #82222 is the culprit.

/hold cancel

@k8s-ci-robot k8s-ci-robot merged commit 02433e0 into kubernetes:master Sep 11, 2019
23 of 24 checks passed
23 of 24 checks passed
pull-kubernetes-kubemark-e2e-gce-big Job triggered.
Details
cla/linuxfoundation yutedz authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-conformance-kind-ipv6 Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.