RFE: kubeadm operator

Closed
timothysc opened this issue Jul 31, 2019 · 23 comments

RFE: kubeadm operator #1698

Labels
kind/design kind/feature priority/important-longterm

@timothysc
Member

timothysc commented Jul 31, 2019

As a Kubernetes Operator I would like to be able to declaratively control configuration changes and upgrades in a systematic fashion.

This will require a KEP.
Entire feature-set is TBD.

WIP KEP: https://hackmd.io/@QlB2bmbhS-aeuDlwOCH9Yw/HkidAVXlS
presentation: https://docs.google.com/presentation/d/1ckEbp_4-9Q90UNV_UwvQQ7MDF6J9Jpl1EdRHUo5pWn0/edit#slide=id.g633cabb4a3_0_5

https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/20190916-kubeadm-operator.md

/kind feature

@timothysc timothysc added this to the Next milestone Jul 31, 2019
@k8s-ci-robot k8s-ci-robot added the kind/feature label Jul 31, 2019
@stealthybox
Contributor

stealthybox commented Jul 31, 2019

questions on scope:

What is the operator's main function / deployment topology?
Is it deployed to a cluster after kubeadm init so that it can self-manage the cluster?
Does it live in a parent management cluster?
Do you run it as a command-line tool? (Does it read config from a file and/or the cluster's API?)

Do we intend for this to be consumed by a Cluster API bootstrap provider?
Is the config for the operator a ConfigMap / CRD?

How does the operator mutate the cluster?
( Perhaps it could schedule privileged Pods, Jobs, or a DaemonSet to mutate each Node )

@fabriziopandini
Member

fabriziopandini commented Aug 1, 2019

WRT scope, IMO the kubeadm operator should be responsible for two things:

  • In place mutations of kubeadm generated artifacts
  • Orchestration of such mutations across nodes

Conversely, I think we should consider out of scope everything that falls under infrastructure management or relates to the management of "immutable" nodes.

I would divide the use cases for the operator into two groups:

  1. Improve the UX for cluster lifecycle activities already supported by kubeadm, e.g.
  • upgrades
  • client certificate renewal
  2. Enable cluster lifecycle activities not yet supported by kubeadm, e.g.
  • certificate rotation
  • "change the cluster"
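
To make the "orchestration of such mutations across nodes" responsibility more concrete, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the KEP or the POC: the `Node` type, the `orchestrate` function, the control-plane-first ordering, and the stop-on-first-failure policy are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Node:
    # Hypothetical minimal view of a cluster node.
    name: str
    control_plane: bool


def orchestrate(nodes: Iterable[Node], mutate: Callable[[Node], bool]) -> List[str]:
    """Apply `mutate` to one node at a time, control-plane nodes first.

    Stops at the first failure so that a bad change does not spread to the
    remaining nodes. Returns the names of the nodes that were mutated.
    Both the ordering and the failure policy are assumptions of this sketch.
    """
    # False sorts before True, so control-plane nodes come first.
    ordered = sorted(nodes, key=lambda n: (not n.control_plane, n.name))
    done: List[str] = []
    for node in ordered:
        if not mutate(node):
            break
        done.append(node.name)
    return done
```

Here `mutate` stands in for whatever in-place change is being rolled out (for example, an upgrade or a certificate renewal on that node); a real operator would also need to decide how to report and resume after a failure.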

@neolit123 neolit123 added the priority/important-longterm label Aug 2, 2019
@neolit123
Member

neolit123 commented Aug 2, 2019

@stealthybox and @fabriziopandini covered some of the questions and comments that i have.

but this overlaps with actions that CAPI performs, so i need to understand more about the demand.
who demands this?

@timothysc is your plan to enable some actions that CAPI now performs on the side of this operator? at the same time allow non-CAPI users to use the same actions.

How does the operator mutate the cluster?
( Perhaps it could schedule privileged Pods, Jobs, or a DaemonSet to mutate each Node )

^ my other top question, this can end up being not-so-secure.
mutating host paths from a priv-Pod is a no-no.

cc @dlipovetsky
for perhaps an interesting topic.

@neolit123
Member

neolit123 commented Aug 2, 2019

mutating host paths from a priv-Pod is a no-no.

@fabriziopandini @timothysc do you remember my partially silly proposal to transfer kubeadm control-plane certs over encrypted sockets? i did it to showcase what may be a way to manage a kubeadm cluster using a socket protocol.

i'd argue that it will be more secure than any host action we try to perform from a privileged Pod.

@neolit123 neolit123 added the kind/design label Aug 6, 2019
@fabriziopandini
Member

fabriziopandini commented Sep 4, 2019

Working on a KEP + POC
/lifecycle active

@fabriziopandini fabriziopandini added the lifecycle/active label Sep 4, 2019
@dlipovetsky

dlipovetsky commented Sep 4, 2019

Conversely, I think we should consider out of scope everything that falls under infrastructure management or relates to the management of "immutable" nodes.

I would divide the use cases for the operator into two groups:

  1. Improve the UX for cluster lifecycle activities already supported by kubeadm, e.g.
  • upgrades
  • client certificate renewal

If nodes are "immutable," does it follow that:

  • Certificates on the node can be rotated only by deploying a new node and removing the old one. (I realize there is some nuance here, because kubelet itself renews its client certificate.)
  • A node must be upgraded by deploying a new node and removing the old one.

@fabriziopandini
Member

fabriziopandini commented Sep 5, 2019

@dlipovetsky agreed.

"Immutable" means that any operation is done by deploying a new node and removing the old one, while "mutable" means that any operation is done via in-place mutations.

kubeadm-operator is meant to support the "mutable" approach, while "immutable" operations IMO are out of scope

but you are right, there is nuance here 😉, e.g. nothing prevents an administrator from mixing "immutable" and "mutable" operations on the same cluster

@dlipovetsky

dlipovetsky commented Sep 5, 2019

kubeadm-operator is meant to support the "mutable" approach

Thanks a lot for clarifying @fabriziopandini! I wasn't aware of kubeadm-operator before seeing this issue, so I didn't have the right context. So kubeadm-operator is not for a CAPI-managed cluster, which requires the "immutable" approach.

@k8s-ci-robot
Contributor

k8s-ci-robot commented Sep 10, 2019

@timothysc: The label(s) kind/ cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

As a Kubernetes Operator I would like to be able to declaratively control configuration changes and upgrades in a systematic fashion.

This will require a KEP.
WIP KEP: https://hackmd.io/@QlB2bmbhS-aeuDlwOCH9Yw/HkidAVXlS

Entire feature-set is TBD.

/kind feature
/assign @fabriziopandini @neolit123

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fabriziopandini
Member

fabriziopandini commented Sep 16, 2019

As per kubeadm office hour discussion, we are considering certificate rotation in the scope of the kubeadm operator. I will open an issue to track this properly

@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 14, 2019

@timothysc: The label(s) kind/ cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

As a Kubernetes Operator I would like to be able to declaratively control configuration changes and upgrades in a systematic fashion.

This will require a KEP.
Entire feature-set is TBD.

WIP KEP: https://hackmd.io/@QlB2bmbhS-aeuDlwOCH9Yw/HkidAVXlS
presentation: https://docs.google.com/presentation/d/1ckEbp_4-9Q90UNV_UwvQQ7MDF6J9Jpl1EdRHUo5pWn0/edit#slide=id.g633cabb4a3_0_5

/kind feature
/assign @fabriziopandini @neolit123

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot

fejta-bot commented Jan 12, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale and removed lifecycle/active labels Jan 12, 2020
@neolit123
Member

neolit123 commented Jan 12, 2020

/remove-lifecycle stale

@k8s-ci-robot
Contributor

k8s-ci-robot commented Apr 1, 2020

@timothysc: The label(s) kind/ cannot be applied, because the repository doesn't have them

In response to this:

As a Kubernetes Operator I would like to be able to declaratively control configuration changes and upgrades in a systematic fashion.

This will require a KEP.
Entire feature-set is TBD.

WIP KEP: https://hackmd.io/@QlB2bmbhS-aeuDlwOCH9Yw/HkidAVXlS
presentation: https://docs.google.com/presentation/d/1ckEbp_4-9Q90UNV_UwvQQ7MDF6J9Jpl1EdRHUo5pWn0/edit#slide=id.g633cabb4a3_0_5

https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/20190916-kubeadm-operator.md

/kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot

fejta-bot commented Jun 30, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jun 30, 2020
@neolit123
Member

neolit123 commented Jun 30, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jun 30, 2020
@jseguillon

jseguillon commented Jul 6, 2020

Would love this operator with this story:
As a Kubernetes Operator, I can declare new hosts where kubeadm is applied

Background:
With SSH access, create kubeadm templates, get tokens and run joins

EDIT: maybe this "Operator" should be a "pure kubeadm" implementation of Cluster API?

@fabriziopandini
Member

fabriziopandini commented Jul 6, 2020

@jseguillon
IMO bootstrapping a new node is out of scope for the kubeadm operator.
Also, I strongly believe that kubeadm is and should be a low-level tool that can be used by Cluster API or by other tools, not a Cluster API implementation.

@fabriziopandini
Member

fabriziopandini commented Oct 1, 2020

/close
in favor of #2317
the initial work on the operator helped in exploring this field; however, we should now focus on defining a clean API surface, and this requires some modeling work and more detailed use cases

@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 1, 2020

@fabriziopandini: Closing this issue.

In response to this:

/close
in favor of #2317
the initial work on the operator helped in exploring this field; however, we should now focus on defining a clean API surface, and this requires some modeling work and more detailed use cases

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pacoxu
Member

pacoxu commented Jun 30, 2022

Updating this issue as well so that everyone watching the kubeadm operator status sees the update.

Not sure if this is the right place to discuss the kubeadm operator. There are some threads in kubernetes/enhancements#2505.

I wrote a simple kubelet-reloader as a tool for the kubeadm operator.

  • kubelet-reloader watches /usr/bin/kubelet-new.
  • once a different version of kubelet-new appears, the reloader replaces /usr/bin/kubelet and restarts kubelet.
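
The reloader idea described above can be sketched roughly as follows. This is an illustration only, not the actual kubelet-reloader code: the function names are hypothetical, "different version" is approximated by comparing SHA-256 content hashes (the real tool may compare versions differently), and the restart action is passed in as a callable so the sketch stays independent of any particular init system.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def reload_if_changed(new: Path, current: Path, restart: Callable[[], None]) -> bool:
    """Replace `current` with `new` and call `restart` if their contents differ.

    Returns True if a replacement happened, False otherwise.
    """
    if not new.exists():
        return False
    if current.exists() and file_digest(new) == file_digest(current):
        return False
    # Copy to a temporary file in the target directory, then rename it over
    # the target, so a partially written binary is never visible there.
    fd, tmp = tempfile.mkstemp(dir=current.parent)
    os.close(fd)
    shutil.copy2(new, tmp)  # copy2 also carries over the permission bits
    os.replace(tmp, current)
    restart()
    return True
```

On a node this might be called periodically with `Path("/usr/bin/kubelet-new")`, `Path("/usr/bin/kubelet")`, and a `restart` callable that runs something like `systemctl restart kubelet`; polling and the systemd command are assumptions here, since the comment only says that the reloader watches the file and restarts kubelet.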

Currently, kubeadm-operator v0.1.0 supports upgrades across versions, e.g. v1.22 to v1.24.

  • kubeadm operator will download kubectl/kubelet/kubeadm and upgrade.
  • kubelet will be placed in /usr/bin/kubelet-new for kubelet reloader.
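
Since kubeadm itself does not support skipping minor versions during an upgrade, a cross-version upgrade such as v1.22 to v1.24 presumably has to pass through each intermediate minor version. The sketch below computes such a path; it is an assumption about how the steps could be planned, not a description of the operator's code, and `parse_minor` and `upgrade_path` are hypothetical names.

```python
from typing import List, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse 'v1.22' or 'v1.22.3' into (major, minor)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def upgrade_path(current: str, target: str) -> List[str]:
    """List the minor versions to step through, one minor version at a time."""
    major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if tgt_major != major:
        raise ValueError("major version changes are not handled in this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"v{major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]
```

For example, `upgrade_path("v1.22", "v1.24")` returns `["v1.23", "v1.24"]`, meaning that the node would be upgraded to v1.23 first and then to v1.24.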

See quick-start.

Some thoughts on the next steps

See #2317 (comment)


9 participants