Kubelet does not update node-labels from flags if they change #59314

Closed
smarterclayton opened this issue Feb 3, 2018 · 13 comments
Assignees: smarterclayton
Labels: sig/auth Categorizes an issue or PR as relevant to SIG Auth. · sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@smarterclayton
Contributor

Once a kubelet has created a node object for itself via registration, the labels aren't updated if the --node-labels flag ever changes. I see the labels being set into the ObjectMeta in the init code, but I don't see the server being updated (the external ID hasn't changed).

Recreate:

  1. Start kubelet with no --node-labels, wait for it to register
  2. Update kubelet to start with --node-labels a=b

Expect:

  1. API object for node to have labels a=b

Actual:

  1. API object labels don't change until node object is deleted

I don't see any obvious place where this has been broken, but it looks like the patch calculated by nodeutil only includes the conditions.
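
Until the kubelet reconciles labels itself, one workaround is to patch the Node object out of band with the same label set. A minimal sketch using client-go follows; the node name `node-1`, the label set, and the kubeconfig path are placeholders, and the context-aware `Patch` signature assumes a reasonably recent client-go:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig; placeholder for whatever credentials apply.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The labels the kubelet was (re)started with; placeholder values.
	labels := map[string]string{"a": "b"}
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"labels": labels},
	})
	if err != nil {
		panic(err)
	}

	// Strategic merge patch adds/updates these labels without touching others.
	node, err := client.CoreV1().Nodes().Patch(context.TODO(), "node-1",
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("labels now:", node.Labels)
}
```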

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 3, 2018
@smarterclayton smarterclayton added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 3, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 3, 2018
@smarterclayton smarterclayton self-assigned this Feb 3, 2018
@jhorwit2
Contributor

jhorwit2 commented Feb 3, 2018

@smarterclayton this is intended functionality. The flag's documentation (see below) says the labels are only added when registering.

https://kubernetes.io/docs/reference/generated/kubelet/

--node-labels mapStringString Labels to add when registering the node in the cluster. Labels must be key=value pairs separated by ','.
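
In other words, labels from the flag only land on the Node object the kubelet builds at registration. A simplified sketch of that behavior (not the actual kubelet code; the function name is illustrative):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// initialNode sketches the registration-only semantics: labels parsed from
// --node-labels are placed on the Node object built when the kubelet first
// registers, and are never re-applied to an already-existing Node afterwards.
func initialNode(nodeName string, nodeLabels map[string]string) *v1.Node {
	labels := map[string]string{}
	for k, v := range nodeLabels {
		labels[k] = v // copied once, at registration time only
	}
	return &v1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   nodeName,
			Labels: labels,
		},
	}
}
```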

@jhorwit2
Contributor

jhorwit2 commented Feb 3, 2018

I think we should probably rename it to --register-with-labels so it matches --register-with-taints, especially since it's still an alpha feature, and that name is clearer.

@smarterclayton
Contributor Author

This is going to be broken with dynamic configuration - as a user, it was unexpected for new labels not to be added. I think it's time to revisit this.

@smarterclayton
Contributor Author

@mtaufen @liggitt re dynamic reconfiguration.

@smarterclayton
Contributor Author

Note: it's silly that we call this an alpha feature when it's been in the kubelet since 1.1. We should own up to it one way or another.

@liggitt
Member

liggitt commented Feb 4, 2018

Kubelet ownership of its own labels is not deterministic and is problematic. The kubelet updating labels on an existing Node API object at startup is not the direction we want to go, since it removes the ability to centrally manage those labels via the API.

@smarterclayton
Contributor Author

I disagree - the kubelet owns its own labels (those specified via --node-labels) at a minimum. I agree that node labels should be manageable via an API, especially those set by humans. I do not agree that a fleet of bootstrapped nodes in scaling groups should be unable to specify their own labels and keep them up to date based on config. And given that the node controller can stomp down nodes, the newly bootstrapped nodes in those scale groups will end up with those labels anyway (once the node object is deleted), so a weird inconsistent state where labels sometimes show up is pointless and confusing.

Since we're not going to remove --node-labels from the kubelet, and we are moving headlong towards bootstrapped and centrally configured nodes (dynamic node config), the node should manage, at a minimum, the set of labels and taints specified on the CLI. I don't actually care whether the old labels are removed, since most of those label changes will be managed by rotating scale groups, but the current state is dumb.
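
A minimal sketch of the "merge, don't remove" reconciliation being argued for here (illustrative only; the function name `reconcileCLILabels` is made up, not proposed kubelet code):

```go
package sketch

// reconcileCLILabels merges the labels given on the kubelet's command line
// into the existing Node labels without touching labels it does not know
// about, and reports whether an update would be needed.
func reconcileCLILabels(existing, cli map[string]string) (map[string]string, bool) {
	merged := map[string]string{}
	for k, v := range existing {
		merged[k] = v
	}
	changed := false
	for k, v := range cli {
		if merged[k] != v {
			merged[k] = v
			changed = true
		}
	}
	return merged, changed
}
```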

@liggitt
Member

liggitt commented Feb 4, 2018

I disagree - the kubelet owns its own labels

That is incompatible with using labels to segment node groups for anything security-related. Why wouldn't the person applying config to nodes also label them?

@smarterclayton
Contributor Author

In that case we should just make the node controller do it based on the config - it can observe the active config correctly. But if we really care about security, then the node shouldn't be able to set taints or labels on create either, and only the node controller should manage that set. That seems strictly better, although disabling node labels ends up being a painful deprecation and security process.

@smarterclayton
Contributor Author

Disabling/preventing the kubelet's --node-labels and --register-with-taints flags, that is.

@liggitt
Member

liggitt commented Feb 6, 2018

if we really care about security, then the node shouldn't be able to set taints or labels on create either, and only the node controller should manage that set. That seems strictly better, although disabling node labels ends up being a painful deprecation and security process.

That is the goal; we're trying to work through how to transition there. It probably looks like whitelisting the existing labels/taints that kubelets set, but in the meantime I want to avoid expanding kubelet control over taints/labels.

cc @mikedanese @kubernetes/sig-auth-misc
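
A rough sketch of the shape such a whitelist check could take (the allowed keys and prefixes below are placeholders, not the actual policy):

```go
package sketch

import "strings"

// Placeholder whitelist: label keys a kubelet would be allowed to self-set.
var (
	allowedLabels = map[string]bool{
		"kubernetes.io/hostname": true,
		"kubernetes.io/os":       true,
		"kubernetes.io/arch":     true,
	}
	allowedPrefixes = []string{"node.kubernetes.io/"}
)

// allowedKubeletLabel reports whether a kubelet may self-set the given label
// key: it must be explicitly allowed or fall under an allowed prefix.
func allowedKubeletLabel(key string) bool {
	if allowedLabels[key] {
		return true
	}
	for _, prefix := range allowedPrefixes {
		if strings.HasPrefix(key, prefix) {
			return true
		}
	}
	return false
}
```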

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Feb 6, 2018
@mikedanese
Member

mikedanese commented Feb 8, 2018

This might be a dup of #18394, but it is more specific.

@liggitt
Member

liggitt commented Feb 10, 2018

this seems like the same issue as #18394
/close

openshift-merge-robot added a commit to openshift/openshift-ansible that referenced this issue Mar 11, 2018
Automatic merge from submit-queue.

Make openshift-ansible use static pods to install the control plane, make nodes prefer bootstrapping

1. Nodes continue to be configured for bootstrapping (as today)
2. For bootstrap nodes, we write a generic bootstrap-node-config.yaml that contains static pod references and any bootstrap config, and then use that to start a child kubelet using `--write-flags` instead of launching the node ourselves.  If a node-config.yaml is laid down in `/etc/origin/node` it takes precedence.
  3. For 3.10 we want dynamic node config from Kubernetes to pull down additional files, but there are functional gaps.  For now, the openshift-sdn container has a sidecar that syncs node config to disk and updates labels (the kubelet doesn't update labels, kubernetes/kubernetes#59314; see the sketch after this list)
4. On the masters, if openshift_master_bootstrap_enabled we generate the master-config.yaml and the etcd config, but we don't start etcd or the masters (no services installed)
5. On the masters, we copy the static files into the correct pod-manifest-path (/etc/origin/node/pods) or similar
6. The kubelet at that point should automatically pick up the new static files and launch the components
7. We wait for them to converge
8. We install openshift-sdn as the first component, which allows nodes to go ready and start installing things.  There is a gap here where the masters are up, the nodes can bootstrap, but the nodes are not ready because no network plugin is installed.
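
For item 3, a minimal sketch of what a label-syncing sidecar loop could look like; the node name, desired label set, and sync interval are placeholders (a real sidecar would take them from the downward API and the synced node config), and the context-aware client-go calls assume a reasonably recent client-go:

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// syncLabels patches the node only when a desired label is missing or stale,
// using the same strategic merge patch as the earlier sketch.
func syncLabels(client kubernetes.Interface, nodeName string, desired map[string]string) error {
	node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	stale := false
	for k, v := range desired {
		if node.Labels[k] != v {
			stale = true
			break
		}
	}
	if !stale {
		return nil
	}
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"labels": desired},
	})
	if err != nil {
		return err
	}
	_, err = client.CoreV1().Nodes().Patch(context.TODO(), nodeName,
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Placeholders: a real sidecar would read these from the downward API and
	// the synced node config rather than hard-coding them.
	nodeName := "node-1"
	desired := map[string]string{"a": "b"}

	for {
		_ = syncLabels(client, nodeName, desired) // a real sidecar would log errors
		time.Sleep(30 * time.Second)
	}
}
```

Patching only when a label is missing or stale keeps the sidecar from fighting anything that manages unrelated labels centrally.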

Challenges at this point:

* The master shims (`master-logs` and `master-restart`) need to deal with CRI-O and systemd.  Ideally this is a temporary shim until we remove systemd for these components and have cri-ctl installed.
* We need to test failure modes of the static pods
* Testing

Further exploration things:

* need to get all the images using image streams or properly replaced into the static pods
* need to look at upgrades and updates
* disk locations become our API (`/var/lib/origin`, `/var/lib/etcd`) - how many customers have fiddled with this?
* may need to make the kubelet halt if it hasn't been able to get server/client certs within a bounded window (5m?) so as to ensure that auto-healing happens (openshift/origin#18430)
* have to figure out whether dynamic kubelet config is a thing we can rely on for 3.10 (@liggitt), and what gaps there are with dynamic reconfig
* client-ca.crt is not handled by bootstrapping or dynamic config.  This needs a solution unless we keep the openshift-sdn sidecar around
* kubelet doesn't send sd notify to systemd (kubernetes/kubernetes#59079)

@derekwaynecarr @sdodson @liggitt @deads2k this is the core of self-hosting.