Fix default scheduler crash if scheduler extender filter returns a not found node #79641

Merged
merged 1 commit into from Aug 14, 2019

Conversation

@yqwang-ms
Contributor

commented Jul 2, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
See issue #79640

Which issue(s) this PR fixes:

Fixes #79640

Special notes for your reviewer:
In the future, we may want to further improve the default scheduler's tolerance of scheduler extender misbehavior.

Does this PR introduce a user-facing change?:

If a scheduler extender's filter returns a node that cannot be found, the current scheduling round for this pod is simply skipped.
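
To illustrate the behavior described above, here is a minimal sketch of a name-to-node lookup that reports an unknown node as an error instead of dereferencing a missing map entry. The `Node` and `NodeInfo` types and the `resolveNodes` helper are hypothetical stand-ins invented for this example; this is not the code in pkg/scheduler/core/extender.go.

```go
package main

import "fmt"

// Node and NodeInfo are hypothetical stand-ins for the scheduler's types.
type Node struct{ Name string }

type NodeInfo struct{ node *Node }

// Node returns the wrapped node. In this sketch it would panic on a nil
// receiver, which is what a lookup of a missing map key would produce.
func (n *NodeInfo) Node() *Node { return n.node }

// resolveNodes maps the node names returned by an extender to cached nodes.
// A name that is missing from the cache yields an error rather than a panic.
func resolveNodes(names []string, cache map[string]*NodeInfo) ([]*Node, error) {
	nodes := make([]*Node, 0, len(names))
	for _, name := range names {
		info, ok := cache[name]
		if !ok {
			return nil, fmt.Errorf("extender returned node %q, which is not in the cache", name)
		}
		nodes = append(nodes, info.Node())
	}
	return nodes, nil
}

func main() {
	cache := map[string]*NodeInfo{"node-a": {node: &Node{Name: "node-a"}}}
	if _, err := resolveNodes([]string{"node-a", "node-gone"}, cache); err != nil {
		// The caller can give up on this scheduling round and retry later.
		fmt.Println("skipping this scheduling round:", err)
	}
}
```

With the comma-ok lookup, the caller decides what to do with the error; in this sketch it only logs it and moves on.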

@k8s-ci-robot

Contributor

commented Jul 2, 2019

Hi @yqwang-ms. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yqwang-ms

Contributor Author

commented Jul 2, 2019

@wgliang

Member

commented Jul 2, 2019

/assign
/ok-to-test

pkg/scheduler/core/extender.go (outdated review thread)
@yqwang-ms

Contributor Author

commented Jul 5, 2019

/retest

@hex108
hex108 approved these changes Jul 5, 2019
Member

left a comment

/lgtm

Could you please help squash the commits?

@yqwang-ms

Contributor Author

commented Jul 5, 2019

I am not authorized to merge this pull request. @hex108, could you please click the "squash and merge" button, as in this screenshot?
[image]

@hex108

Member

commented Jul 5, 2019

> I am not authorized to merge this pull request. @hex108, could you please click the "squash and merge" button, as in this screenshot?
> [image]

You can squash the commits on your local machine and force-push them to the remote.

@yqwang-ms yqwang-ms force-pushed the yqwang-ms:yqwang/fix-ds-crash branch from 053807a to d7a8a7f Jul 5, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 5, 2019

@yqwang-ms

Contributor Author

commented Jul 5, 2019

Thanks for the info, @hex108. I have force-pushed; please check :)

@yqwang-ms

Contributor Author

commented Jul 5, 2019

/retest

@hex108

Member

commented Jul 5, 2019

/lgtm

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 5, 2019

@yqwang-ms

Contributor Author

commented Jul 6, 2019

Hi @Huang-Wei @k82cn could you please take a look at this small fix? Thanks!

@yqwang-ms

Contributor Author

commented Jul 8, 2019

Hi @hex108, I am not familiar with the K8s PR process. Could you please tell me what I should do to move this PR toward approval?

@hex108

Member

commented Jul 8, 2019

> Hi @hex108, I am not familiar with the K8s PR process. Could you please tell me what I should do to move this PR toward approval?

Wait for one of the approvers to approve it. :)

@yqwang-ms

Contributor Author

commented Jul 8, 2019

Great! Thanks!

@yqwang-ms

Contributor Author

commented Jul 17, 2019

Hi @Huang-Wei @k82cn, could you please take a look at this small fix when you are free? :)
Or could you please tell me what I should do next?
Thanks again!

@yqwang-ms

Contributor Author

commented Aug 12, 2019

/assign @k82cn

@yqwang-ms

Contributor Author

commented Aug 12, 2019

/assign @Huang-Wei

@Huang-Wei
Member

left a comment

Apologies for the late reply. Some comments below.

- nodeResult = append(nodeResult, nodeNameToInfo[(*result.NodeNames)[i]].Node())
+ for _, nodeName := range *result.NodeNames {
+     if node, ok := nodeNameToInfo[nodeName]; ok {
+         nodeResult = append(nodeResult, node.Node())

@Huang-Wei

Huang-Wei Aug 12, 2019

Member

With this PR, len(nodeResult) is not necessarily the same as len(*result.NodeNames), so it'd be good to change L303 to nodeResult = make([]*v1.Node).

@yqwang-ms

yqwang-ms Aug 13, 2019

Author Contributor

Thanks for your review. :)

Regarding this comment, do you mean changing it to nodeResult = make([]*v1.Node, 0)? However, len(nodeResult) is expected to be the same as len(*result.NodeNames), and in the common case it is. Otherwise, it is a rare "exception", and we already return an error for it.

So, to optimize for the common case, we had better still make the slice with the expected capacity so that the appends that follow are fast.

Is that ok?
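
As a rough illustration of the preallocation argument above, the sketch below creates a slice with length 0 and capacity equal to the number of input names, then appends only the names that are found. The `collectKnown` helper and its inputs are hypothetical and exist only for this example.

```go
package main

import "fmt"

// collectKnown appends only the names present in known. The result is
// preallocated with capacity len(names), so append never has to reallocate.
func collectKnown(names []string, known map[string]bool) []string {
	out := make([]string, 0, len(names)) // len 0, cap len(names)
	for _, name := range names {
		if known[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	out := collectKnown([]string{"a", "b", "c"}, map[string]bool{"a": true, "c": true})
	// One name was skipped, so the length is below the capacity.
	fmt.Println(len(out), cap(out)) // prints: 2 3
}
```

When every name is found, the length ends up equal to the capacity; when some are skipped, the slice is shorter but no reallocation has taken place.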

@Huang-Wei

Huang-Wei Aug 13, 2019

Member

> Regarding this comment, do you mean changing it to nodeResult = make([]*v1.Node, 0)?

Yes, if the length is not fixed.

Given your point that the length is fixed and that we return an error on a mismatch, technically we should do nodeResult = make([]*v1.Node, len(*result.NodeNames)) to ensure that the slice's len and cap are the same, and use nodeResult[i] = node.Node(). (I recall that I ran a test and it showed a perf improvement.)

Can you update the code, as well as L305 to L310, and squash the changes into one commit?
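
The indexed-assignment variant suggested here can be sketched roughly as follows: the slice is created with its final length, each element is written by index, and any missing entry aborts with an error so that a partially filled slice is never returned. The `lookupAll` helper and its `int` values are hypothetical, chosen only to keep the example self-contained.

```go
package main

import "fmt"

// lookupAll resolves every name through values. The result slice is created
// with len == cap == len(names) and filled by index instead of by append.
func lookupAll(names []string, values map[string]int) ([]int, error) {
	out := make([]int, len(names))
	for i, name := range names {
		v, ok := values[name]
		if !ok {
			// Returning early avoids handing back zero-valued slots.
			return nil, fmt.Errorf("unknown name %q", name)
		}
		out[i] = v
	}
	return out, nil
}

func main() {
	values := map[string]int{"a": 1, "b": 2}
	out, err := lookupAll([]string{"a", "b"}, values)
	fmt.Println(out, err)

	_, err = lookupAll([]string{"a", "missing"}, values)
	fmt.Println(err)
}
```

This form is only safe because every iteration either fills its slot or returns an error; if unknown names were skipped silently, the result would contain zero values.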

@yqwang-ms

yqwang-ms Aug 14, 2019

Author Contributor

Thanks! I adjusted the code according to your suggestion, as well as L311 to L314. Please check.

pkg/scheduler/core/extender.go (outdated review thread)
pkg/scheduler/core/extender.go (outdated review thread)

@k8s-ci-robot k8s-ci-robot removed the lgtm label Aug 13, 2019

@yqwang-ms

Contributor Author

commented Aug 13, 2019

/retest

@yqwang-ms yqwang-ms force-pushed the yqwang-ms:yqwang/fix-ds-crash branch from 8602e85 to 5927ec4 Aug 14, 2019

@yqwang-ms

Contributor Author

commented Aug 14, 2019

/retest

@Huang-Wei

Member

commented Aug 14, 2019

/lgtm
/approve
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 14, 2019

@k8s-ci-robot

Contributor

commented Aug 14, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, yqwang-ms

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yqwang-ms

Contributor Author

commented Aug 14, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 2ad2795 into kubernetes:master Aug 14, 2019

23 checks passed

cla/linuxfoundation yqwang-ms authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 14, 2019
