
kubeadm: prevent bootstrap of nodes with known names #81056

Merged
merged 1 commit into from Feb 1, 2020

Conversation

@neolit123 (Member) commented Aug 7, 2019

What this PR does / why we need it:
If a Node name in the cluster is already taken and that Node is Ready,
prevent TLS bootstrap on "kubeadm join" and exit early.

This change requires that a new ClusterRole is granted to the
"system:bootstrappers:kubeadm:default-node-token" group so that it is
able to get Nodes in the cluster. The same group already has access
to obtain objects such as the KubeletConfiguration and kubeadm's
ClusterConfiguration.
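In RBAC terms, the granted permission could look roughly like the following. This is a hedged sketch: the ClusterRole/ClusterRoleBinding names are illustrative, not necessarily the exact manifests kubeadm creates; only the group name and the get-Nodes permission come from the description above.

```yaml
# Illustrative sketch, not kubeadm's exact manifests.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeadm:get-nodes   # illustrative name
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubeadm:get-nodes   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeadm:get-nodes
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:bootstrappers:kubeadm:default-node-token
```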

The motivation for this change is to prevent undefined behavior
and a potential control-plane breakdown if a cluster is left racing
to have two nodes with the same name for long periods of time.

The following values are validated, in order of precedence from
lowest to highest:

  • actual hostname
  • NodeRegistration.Name (or "--node-name") from JoinConfiguration
  • "--hostname-override" passed via kubeletExtraArgs

If the user decides not to let kubeadm know about a custom node name
and instead overrides the hostname from a kubelet systemd unit file,
kubeadm will not be able to detect the problem.
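The precedence above can be sketched as follows. This is a self-contained illustration, not kubeadm's actual helper; the function name is hypothetical, and the lowercasing reflects that node names must be valid DNS subdomains.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveNodeName is a simplified sketch (not kubeadm's real helper) of the
// precedence described above, applied from lowest to highest:
//  1. the machine's actual hostname
//  2. NodeRegistration.Name ("--node-name") from JoinConfiguration
//  3. "--hostname-override" passed via kubeletExtraArgs
func resolveNodeName(nodeRegistrationName, hostnameOverride string) (string, error) {
	name, err := os.Hostname() // lowest precedence: the actual hostname
	if err != nil {
		return "", err
	}
	name = strings.ToLower(name) // node names must be valid DNS subdomains
	if nodeRegistrationName != "" {
		name = nodeRegistrationName // JoinConfiguration overrides the hostname
	}
	if hostnameOverride != "" {
		name = hostnameOverride // kubeletExtraArgs override wins over both
	}
	return name, nil
}

func main() {
	name, _ := resolveNodeName("worker-1", "")
	fmt.Println(name) // worker-1
	name, _ = resolveNodeName("worker-1", "custom-host")
	fmt.Println(name) // custom-host
}
```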

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#1711

Special notes for your reviewer:
NONE

Does this PR introduce a user-facing change?:

kubeadm: reject a node joining the cluster if a node with the same name already exists

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/kind feature
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 7, 2019
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2019
@liggitt (Member) commented Aug 7, 2019

what are the implications for cleaning up a node to set it up again? does this mean that kubeadm reset is no longer sufficient to allow kubeadm join to succeed again?

klog.V(1).Infof("[kubelet-start] Checking for an existing node with name %q in the cluster", nodeName)
nodes, err := bootstrapClient.CoreV1().Nodes().List(metav1.ListOptions{})
if err != nil {
return errors.Wrap(err, "cannot obtain the list of nodes in the cluster")
A Member commented on this diff:
should this be a skippable preflight check instead?

A Contributor replied:

+1 on that, we need to be able to skip this as I can see users running into problems with stale nodes, etc.

@neolit123 (Member, Author) replied on Aug 7, 2019:

not so sure about this; stale nodes are a subject for the administrator calling kubectl delete. kubeadm join failing during bootstrap would indicate that the cluster requires admin intervention (or a node rename).

if the user is allowed to skip this check, it can lead to a catastrophic outcome - e.g. imagine a case where the operator responsible for the worker join is not the cluster admin.

let's talk more about this during the office hours, possibly next week.

@rosti (Contributor) left a comment:

Thanks @neolit123 !
We may also try to nuke the Node object on reset, but permission wise IDK if this is feasible.

cmd/kubeadm/app/cmd/phases/join/kubelet.go (outdated review thread, resolved)
cmd/kubeadm/app/cmd/phases/join/kubelet.go (outdated review thread, resolved)
@liggitt (Member) commented Aug 7, 2019

We may also try to nuke the Node object on reset, but permission wise IDK if this is feasible.

if kubeadm reset is run with the API permissions of the kubelet, it will not have the ability to delete the Node API object (I didn't think it made any API calls today)

I suppose that Get is slightly more efficient, but IDK that for sure. Also, IDK what the difference is, from a security standpoint, in allowing folks with bootstrap tokens to list all nodes vs get nodes.
I'll leave that on the trusted advice of @liggitt .

Get is way more efficient on large clusters.

A bootstrap token can obtain a node client credential that allows get/list/watch of all nodes, so read-only access to Get a node is not concerning.

@neolit123 (Member, Author) commented:

Get is way more efficient on large clusters.

completely forgot about get in the early hours of the day.

We may also try to nuke the Node object on reset, but permission wise IDK if this is feasible.

possibly a separate PR, but also needs permission evaluation.
my initial reaction is to leave this to the admin group.

@neolit123 (Member, Author) commented Aug 7, 2019

what are the implications for cleaning up a node to set it up again? does this mean that kubeadm reset is no longer sufficient to allow kubeadm join to succeed again?

this is a problem that i thought about after sending the PR.

we either have to:

  • extend reset to delete nodes.
  • modify this check to only fail on existing Ready nodes.
  • EDIT: don't allow the node to re-join without admin deleting the stale node.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 7, 2019
@neolit123 neolit123 force-pushed the 1.16-kubeadm-node-names branch 3 times, most recently from 8759416 to 43b1316 Compare October 7, 2019 20:48
@neolit123 (Member, Author) commented Oct 7, 2019

@rosti @liggitt @SataQiu

i have updated this PR:

  • using "GET" for the Node object
  • use the same common utility for node-name precedence both where we write the kubelet flags and where new nodes join
  • still allow a new Node to join if a Node with the same name already exists in the cluster but its status is not Ready. this allows re-joining after a failure.
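The third bullet is the key behavioral nuance. A self-contained sketch of that logic follows; the types and function name here are local stand-ins and illustrative only, since the real code uses client-go (CoreV1().Nodes().Get(...)) and corev1 types.

```go
package main

import "fmt"

// Minimal stand-ins for the corev1 types involved.
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

// checkNodeNameAvailable sketches the join-time check described above: the
// join is rejected only when a node with the requested name already exists
// AND reports Ready; a missing node, or an existing one that is not Ready,
// still permits (re-)joining. "found" models the absence of a NotFound error
// from the GET call. The function name is illustrative, not kubeadm's.
func checkNodeNameAvailable(node *Node, found bool) error {
	if !found {
		return nil // no node with this name: safe to bootstrap
	}
	for _, c := range node.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			return fmt.Errorf("a Node with name %q and status Ready already exists in the cluster", node.Name)
		}
	}
	return nil // node exists but is not Ready: allow re-join after a failed join
}

func main() {
	taken := &Node{
		Name:       "worker-1",
		Conditions: []NodeCondition{{Type: "Ready", Status: "True"}},
	}
	fmt.Println(checkNodeNameAvailable(taken, true) != nil) // true: join rejected
	fmt.Println(checkNodeNameAvailable(nil, false) != nil)  // false: join allowed
}
```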

@neolit123 neolit123 force-pushed the 1.16-kubeadm-node-names branch 2 times, most recently from 27ea933 to e52a155 Compare October 7, 2019 21:00
@neolit123 (Member, Author) commented:

/retest

@SataQiu (Member) commented Oct 8, 2019

/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-e2e-gce-100-performance

@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2020
@neolit123 (Member, Author) commented:

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2020
@rosti (Contributor) left a comment:

Thanks @neolit123 !
Implementation wise, it looks good.
We may want to add a more thorough message to the user along with steps how to fix the problem.
/lgtm

cmd/kubeadm/app/cmd/phases/join/kubelet.go (outdated review thread, resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 26, 2020
@neolit123 (Member, Author) commented:

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2020
@neolit123 neolit123 requested a review from rosti January 26, 2020 16:52
@neolit123 (Member, Author) commented:

/retest

@rosti (Contributor) left a comment:

Thanks @neolit123 !
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2020
@fejta-bot commented:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@rosti (Contributor) commented Jan 31, 2020

/test pull-kubernetes-e2e-gce


@k8s-ci-robot k8s-ci-robot merged commit f812429 into kubernetes:master Feb 1, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 1, 2020
@kvaps (Member) commented Mar 26, 2020

Hi, this change caused two new issues: #89501, #89512

Successfully merging this pull request may close these issues.

reject a node joining the cluster if a node with the same name already exists
7 participants