Got target node name of pending DaemonSet pod. #61387

Closed
wants to merge 1 commit into from

k82cn
Member

@k82cn k82cn commented Mar 20, 2018

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
part of #59194
Fixes #61050

Release note:

None

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 20, 2018
@k82cn
Member Author

k82cn commented Mar 20, 2018

/assign @janetkuo

@k82cn
Member Author

k82cn commented Mar 20, 2018

There's a race condition: after the DaemonSet controller creates the pods, the scheduler takes time to bind each pod to its node; if the node changes (e.g. a label is updated) during this period, a new pod is created unexpectedly. See the following log: thousands of pods were created :(

I0320 01:25:18.592] Mar 20 01:25:18.385: INFO: 24 / 2723 pods in namespace 'kube-system' are running and ready (608 seconds elapsed)
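
The race described above can be illustrated with a toy reconcile loop. This is a minimal sketch under stated assumptions, not the controller's actual code: `Pod`, `syncNode`, `simulate`, and both counting functions are hypothetical, and `TargetNode` is an assumed field standing in for whatever records the node a pending pod was created for.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type Pod struct {
	Name       string
	NodeName   string // set when the scheduler binds the pod; empty while pending
	TargetNode string // hypothetical: the node the controller created the pod for
}

// countFunc reports how many pods the controller believes belong to a node.
type countFunc func(pods []Pod, node string) int

// countBoundOnly sees only pods that are already bound to the node.
func countBoundOnly(pods []Pod, node string) int {
	n := 0
	for _, p := range pods {
		if p.NodeName == node {
			n++
		}
	}
	return n
}

// countWithPending also counts pending pods that target the node.
func countWithPending(pods []Pod, node string) int {
	n := 0
	for _, p := range pods {
		if p.NodeName == node || (p.NodeName == "" && p.TargetNode == node) {
			n++
		}
	}
	return n
}

// syncNode simulates one reconcile pass: create a pod if none is counted.
func syncNode(pods []Pod, node string, count countFunc) []Pod {
	if count(pods, node) == 0 {
		name := fmt.Sprintf("ds-pod-%d", len(pods))
		pods = append(pods, Pod{Name: name, TargetNode: node})
	}
	return pods
}

// simulate runs several passes (e.g. triggered by node updates) while the
// scheduler has not bound anything yet, and returns the number of pods created.
func simulate(count countFunc, passes int) int {
	var pods []Pod
	for i := 0; i < passes; i++ {
		pods = syncNode(pods, "node-1", count)
	}
	return len(pods)
}

func main() {
	fmt.Println(simulate(countBoundOnly, 3))   // 3: one extra pod per pass
	fmt.Println(simulate(countWithPending, 3)) // 1: the pending pod is recognized
}
```

In this toy model, every node update that triggers a sync before binding completes adds another pod unless pending pods are attributed to their target node.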

@liggitt
Member

liggitt commented Mar 20, 2018

> There's a race condition: after the DaemonSet controller creates the pods, the scheduler takes time to bind each pod to its node; if the node changes (e.g. a label is updated) during this period, a new pod is created unexpectedly. See the following log: thousands of pods were created

Why is the daemonset creating pods that are sensitive to node label changes?


for _, term := range terms {
	for _, exp := range term.MatchExpressions {
		if exp.Key == kubeletapis.LabelHostname &&
Member

@liggitt liggitt Mar 20, 2018

It appears the DaemonSet depends on the value of this label matching the node name. That is not a valid assumption; see #2462 and #10612 for just a couple of the many examples of why.

Member

opened #61410 to track
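
The concern raised in this review thread can be made concrete with a small sketch. It assumes a hypothetical `Node` type and a made-up node whose object name differs from its hostname label; only the label key `kubernetes.io/hostname` (the value of `kubeletapis.LabelHostname`) is taken from Kubernetes itself.

```go
package main

import "fmt"

// labelHostname is the well-known hostname label key,
// i.e. the value of kubeletapis.LabelHostname.
const labelHostname = "kubernetes.io/hostname"

// Node is a simplified, hypothetical stand-in for a Kubernetes node.
type Node struct {
	Name   string
	Labels map[string]string
}

// findByName looks a node up by its object name.
func findByName(nodes []Node, name string) (Node, bool) {
	for _, n := range nodes {
		if n.Name == name {
			return n, true
		}
	}
	return Node{}, false
}

// findByHostnameLabel looks a node up by the value of its hostname label.
func findByHostnameLabel(nodes []Node, hostname string) (Node, bool) {
	for _, n := range nodes {
		if n.Labels[labelHostname] == hostname {
			return n, true
		}
	}
	return Node{}, false
}

func main() {
	// Made-up node: the object name and the hostname label differ.
	nodes := []Node{{
		Name:   "node-a.example.internal",
		Labels: map[string]string{labelHostname: "node-a"},
	}}

	// Treating the label value as a node name fails the lookup by name.
	_, okName := findByName(nodes, "node-a")
	_, okLabel := findByHostnameLabel(nodes, "node-a")
	fmt.Println(okName, okLabel) // false true
}
```

Code that reads a hostname value out of a match expression and then uses it as a node name would miss this node, which is the failure mode the reviewer points at.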

@k82cn
Member Author

k82cn commented Mar 20, 2018

> > There's a race condition: after the DaemonSet controller creates the pods, the scheduler takes time to bind each pod to its node; if the node changes (e.g. a label is updated) during this period, a new pod is created unexpectedly. See the following log: thousands of pods were created
>
> Why is the daemonset creating pods that are sensitive to node label changes?

The DaemonSet controller creates pods on the selected nodes, and one of the selection criteria is a node selector over node labels; so if a node's labels change, the DaemonSet controller should delete or create pods accordingly.
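
The selector behaviour described in this reply can be sketched as follows. This is an illustration only: `matchesSelector` and `reconcile` are hypothetical helpers, not the DaemonSet controller's functions, and the `disk` label is a made-up example.

```go
package main

import "fmt"

// matchesSelector reports whether every key/value pair in selector
// is present in the node's labels. An empty selector matches any node.
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// reconcile returns the action a DaemonSet-like controller would take
// for one node, given the node's current labels and whether it has a pod.
func reconcile(nodeLabels, selector map[string]string, hasPod bool) string {
	shouldRun := matchesSelector(nodeLabels, selector)
	switch {
	case shouldRun && !hasPod:
		return "create"
	case !shouldRun && hasPod:
		return "delete"
	default:
		return "none"
	}
}

func main() {
	selector := map[string]string{"disk": "ssd"}

	// The node gains a matching label: a pod should be created.
	fmt.Println(reconcile(map[string]string{"disk": "ssd"}, selector, false)) // create
	// The label changes so the node no longer matches: the pod should go.
	fmt.Println(reconcile(map[string]string{"disk": "hdd"}, selector, true)) // delete
	// Node matches and already has its pod: nothing to do.
	fmt.Println(reconcile(map[string]string{"disk": "ssd"}, selector, true)) // none
}
```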

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: k82cn
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: janetkuo

Assign the PR to them by writing /assign @janetkuo in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k82cn
Member Author

k82cn commented Apr 10, 2018

/sig scheduling

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Apr 10, 2018
@k82cn
Member Author

k82cn commented Apr 10, 2018

Thanks for @liggitt's comments; I'll continue this PR after #61410 is merged :)

@k82cn
Member Author

k82cn commented May 11, 2018

Handled this in #63223 :)

@k82cn k82cn closed this May 11, 2018
@k82cn k82cn deleted the k8s_59194_6 branch May 11, 2018 12:46
Successfully merging this pull request may close these issues.

[test failed] gci-gce-alpha-features
4 participants