Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taint node when initializing node. #63955

Merged
merged 1 commit into from Jul 27, 2018

Conversation

@k82cn
Copy link
Member

k82cn commented May 17, 2018

Signed-off-by: Da K. Ma klaus1982.cn@gmail.com

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #63897

Release note:

If `TaintNodesByCondition` enabled, taint node with `TaintNodeUnschedulable` when
initializing node to avoid race condition.
@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented May 17, 2018

/sig node

@k82cn k82cn changed the title Taint node when initializing node. WIP: Taint node when initializing node. May 17, 2018

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch from e4af5e4 to 7b6c4df May 17, 2018

@k8s-ci-robot k8s-ci-robot added size/L and removed size/S labels May 17, 2018

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch from 7b6c4df to 6190017 May 17, 2018

@k82cn k82cn changed the title WIP: Taint node when initializing node. Taint node when initializing node. May 17, 2018

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch from 6190017 to 06bc488 May 17, 2018

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented May 17, 2018

/retest

1 similar comment
@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented May 17, 2018

/retest

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented May 21, 2018

@kubernetes/sig-node-pr-reviews , ping for review :)

@feiskyer
Copy link
Member

feiskyer left a comment

LGTM

@tallclair PTAL

})
kubeClient.AddReactor("*", "*", func(action core.Action) (bool, runtime.Object, error) {
return true, nil, fmt.Errorf("no reaction implemented for %s", action)
})

This comment has been minimized.

@tallclair

tallclair May 22, 2018

Member

Can you refactor some or all of this setup to a test helper that is reused by the different tests in this file?

This comment has been minimized.

@k82cn

k82cn May 23, 2018

Author Member

done

// node to avoid race condition; refer to #63897 for more detail.
if utilfeature.DefaultFeatureGate.Enabled(features.TaintNodesByCondition) {
if node.Spec.Unschedulable {
nodeTaints = append(nodeTaints, v1.Taint{

This comment has been minimized.

@tallclair

tallclair May 22, 2018

Member

Should this verify that nodeTaints doesn't already include Unschedulable before adding it?

This comment has been minimized.

@k82cn

k82cn May 23, 2018

Author Member

yes. updated :)

done <- struct{}{}
}()
select {
case <-time.After(wait.ForeverTestTimeout):

This comment has been minimized.

@tallclair

tallclair May 22, 2018

Member

Is it likely that registerWithAPIServer will hang? If not I'd say let the test framework handle the timeout and avoid the complexity.

This comment has been minimized.

@k82cn

k82cn May 23, 2018

Author Member

should not hang, removed :)

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch 2 times, most recently from ee737c7 to c30f2b8 May 23, 2018

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented May 25, 2018

/retest

// node to avoid race condition; refer to #63897 for more detail.
if utilfeature.DefaultFeatureGate.Enabled(features.TaintNodesByCondition) {
if node.Spec.Unschedulable &&
!taintutil.TaintExists(nodeTaints, &unschedulableTaint) {

This comment has been minimized.

@tallclair

tallclair May 25, 2018

Member

If the taint exists with a different effect, this will duplicate it. Granted, that's probably a bug anyway, but I think it's safer to just check for the key here (overwriting as necessary). Although actually, does the NodeRestriction controller allow the taint effect to be changed here?
@liggitt

This comment has been minimized.

@liggitt

liggitt May 25, 2018

Member

initialNode() is only used to create the node in initial registration. it does not update nodes.

This comment has been minimized.

@k82cn

k82cn May 28, 2018

Author Member

If the taint exists with a different effect, this will duplicate it.

Yes, that up to user's setting; it's an valid case that user also add the effect, we only need to handle NoSchedule by kubelet.

Although actually, does the NodeRestriction controller allow the taint effect to be changed here?

As liggitt@ said, NodeRestriction controller only reject node taints updates; this's accepted as creation :)

This comment has been minimized.

@tallclair

tallclair Jun 26, 2018

Member

If that's the case, then why check TaintExists at all here?

This comment has been minimized.

@liggitt

liggitt Jun 26, 2018

Member

They could've specified it explicitly in the --register-with-taints flag

This comment has been minimized.

@tallclair

tallclair Jul 9, 2018

Member

Sorry, I'm confused... then couldn't this duplicate the taint specified from --register-with-taints?

This comment has been minimized.

@k82cn

k82cn Jul 10, 2018

Author Member

If it's duplicated, it'll not add it; that's why we check TaintExists :). And we only check against unschedulableTaint, which is handled by NodeLifeCycleController; the other customized taint will follow the default Taint/Toleration behaviour :)

This comment has been minimized.

@tallclair

tallclair Jul 17, 2018

Member

To clarify:

  1. Kubelet is started with --register-with-taints=node.kubernetes.io/unschedulable=true:PreferNoSchedule
  2. This code checks for a taint with node.kubernetes.io/unschedulable:NoSchedule
  3. No match is found, and the kubelet ends up with the duplicate taint node.kubernetes.io/unschedulable:PreferNoSchedule and node.kubernetes.io/unschedulable:NoSchedule

This comment has been minimized.

@k82cn

k82cn Jul 23, 2018

Author Member

@tallclair , node.kubernetes.io/unschedulable:PreferNoSchedule is used for priorities, and node.kubernetes.io/unschedulable:NoSchedule is used for predicates; so I think they're not duplicated; and node.kubernetes.io/unschedulable:NoSchedule will be removed when .spec.unschedulable is updated to false.

}

kubeClient.AddReactor("*", "*", notImplemented)
}

This comment has been minimized.

@tallclair

tallclair May 25, 2018

Member

Sorry for the confusion, I didn't mean these specific methods so much as all the test setup boilerplate. Especilally the mock cAdvisor, which is unrelated to the function being tested.

This comment has been minimized.

@k82cn

k82cn May 28, 2018

Author Member

Can we handle this in a separate PR? The ImagesFSInfo and RootFSInfo are different; it takes some time to go through all case to have a helper func for it.

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented May 28, 2018

/retest

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch from c30f2b8 to 9870333 Jun 4, 2018

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented Jul 16, 2018

@liggitt , @mikedanese , @tallclair any more comments ?

@k8s-github-robot k8s-github-robot removed this from the v1.12 milestone Jul 20, 2018

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch from 9870333 to b8b07e0 Jul 22, 2018

Taint node when initializing node.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

@k82cn k82cn force-pushed the k82cn:k8s_63897 branch from b8b07e0 to aac9f1c Jul 23, 2018

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented Jul 23, 2018

/retest

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jul 23, 2018

@k82cn: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized aac9f1c link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k82cn

This comment has been minimized.

Copy link
Member Author

k82cn commented Jul 24, 2018

/test

@tallclair

This comment has been minimized.

Copy link
Member

tallclair commented Jul 27, 2018

/lgtm
/milestone v1.12
/status approved-for-milestone

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jul 27, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 27, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@k82cn @tallclair

Pull Request Labels
  • sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.
Help
@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 27, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 27, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit ed58d0d into kubernetes:master Jul 27, 2018

16 of 18 checks passed

pull-kubernetes-local-e2e-containerized Job failed.
Details
Submit Queue Required Github CI test is not green: pull-kubernetes-kubemark-e2e-gce-big
Details
cla/linuxfoundation k82cn authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@k82cn k82cn deleted the k82cn:k8s_63897 branch Jul 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.