
Cleaner separation of kubeadm and machine bootstrapping #5294

Open
randomvariable opened this issue Sep 22, 2021 · 40 comments
Labels: area/bootstrap, help wanted, kind/api-change, kind/feature, kind/proposal, priority/backlog, triage/accepted

Comments

@randomvariable
Member

randomvariable commented Sep 22, 2021

User Story

As a cluster operator whose development teams require multiple operating systems, I would like a better machine bootstrapping abstraction.

Detailed Description

Cluster API Bootstrap Provider Kubeadm currently conflates two activities:

  • Generating a kubeadm configuration
  • Generating the machine bootstrapping data that eventually executes kubeadm, which at present supports only cloud-init (a simplified sketch follows this list)
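For context, here is a simplified sketch of how the two activities travel together today; the file path, endpoint and token are illustrative, not CABPK's literal output:

```yaml
#cloud-config
# Simplified, illustrative sketch of CABPK's output for a worker join.
# The kubeadm configuration (activity 1) is embedded inside a cloud-init
# document (activity 2), so the two concerns always travel as one blob.
write_files:
- path: /run/kubeadm/join-config.yaml   # illustrative path
  content: |
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: JoinConfiguration
    discovery:
      bootstrapToken:
        apiServerEndpoint: "10.0.0.10:6443"  # illustrative endpoint
        token: abcdef.0123456789abcdef       # illustrative token
runcmd:
- kubeadm join --config /run/kubeadm/join-config.yaml
```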

The relationship between Cluster API and machine bootstrapping has created a number of challenges:

How to secure kubeadm node joins

How to secure control plane instantiation

  • To instantiate a control plane at machine boot, we must give the machine key material, so the CAPI providers have to have some mechanism to share this private data with the machine, commonly known as "instance metadata".
  • Users who have read-only access to the infrastructure can actually read all the private key material and reconstruct administrative access to both etcd and the API server, which represents a privilege escalation risk (see the sketch after this list).
  • Note that this is separate from kubeadm node joins, and is not resolved by the kubelet authentication plugin proposal.
  • Some providers have a mechanism to secure the data (e.g. AWS), but these are wholly dependent on the inner workings of cloud-init.
    • Specific support needs to be added for each bootstrap mechanism to every cloud provider.
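To make the risk concrete, a hedged sketch of the kind of key material a first control plane machine receives via user data today (paths follow kubeadm's defaults; key contents elided):

```yaml
#cloud-config
# Sketch only: private keys delivered through instance user data when the
# first control plane node boots. Anyone able to read the instance metadata
# can reconstruct cluster-admin and etcd access from these files.
write_files:
- path: /etc/kubernetes/pki/ca.key        # cluster CA private key
  permissions: "0600"
  content: |
    -----BEGIN RSA PRIVATE KEY-----
    ...elided...
    -----END RSA PRIVATE KEY-----
- path: /etc/kubernetes/pki/etcd/ca.key   # etcd CA private key
  permissions: "0600"
  content: |
    -----BEGIN RSA PRIVATE KEY-----
    ...elided...
    -----END RSA PRIVATE KEY-----
```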

How to extensibly support different bootstrappers without increasing the spaghettiness

  • Cluster API currently only supports cloud-init
  • PRs are in progress to add Ignition v2 (Flatcar) support to CABPK and CAPA (with CAPA's secure instance metadata support)
    • Ignition v3 will still not be supported (it is required for Red Hat/Fedora CoreOS 4.6+)
    • Each bootstrapper adds complexity to CABPK and CAPA

Bootstrap reporting

It can be hard to find out what happened when bootstrapping fails. To be fair, the number of requests for this has gone down over time thanks to improvements in CABPK and kubeadm, but better reporting would still be valuable.

Anything else you would like to add:

For completeness, and to avoid folk having to work through an unwieldy closed PR, I'm including the user stories and requirements in their entirety from #4221:

User Stories

| ID | Title | Description |
| --- | --- | --- |
| U1 | Non cloud-init bootstrap processes | Ignition is a user-data-processing Linux bootstrapping system used by Flatcar Container Linux, RHEL Atomic Host and Fedora CoreOS. (cluster-api/3761) |
| U2 | System preparation | Although Flatcar Container Linux is being added to Image Builder, Flatcar is also intended to be used as an immutable distribution, with all additions done at first boot. Flatcar users should be able to use standard Flatcar images with Cluster API. |
| U3 | Active Directory | As a platform operator of a Windows environment, I may require my Kubernetes nodes to be domain-joined so that application workloads operate with appropriate Kerberos credentials to connect to services in the infrastructure. Windows or Linux hosts joining an Active Directory must effectively be given a set of bootstrap credentials to join the directory and persist a Kerberos keytab for the host. |
| U4 | CIS Benchmark Compliance | As a platform operator, I require Kubernetes clusters to pass the CIS Benchmark in order to meet organisation-level security compliance requirements. |
| U5 | DISA STIG Compliance | As a platform operator in a US, UK, Canadian, Australian or New Zealand secure government environment, I require my Kubernetes clusters to be compliant with the DISA STIG. |
| U6 | Kubeadm UX | As a cluster operator, I would like the bootstrap configuration of clusters or machines to be shielded from changes happening in kubeadm (e.g. the v1beta1 to v1beta2 type migration). |
| U7 | Existing Clusters | As a cluster operator with existing clusters, I would like to be able, after enabling the necessary flags or feature gates, to create new clusters or machines using nodeadm. |
| U8 | Air-gapped | As a cluster operator, I need Cluster API to operate independently of an internet connection so that I can provision clusters in an air-gapped environment, i.e. where the data center is not connected to the public internet. |
| U9 | Advanced control plane configuration files | As a cluster operator, I need to configure components of my control plane, such as audit logging policies, KMS encryption and authentication webhooks, to meet organisational requirements. |
| U10 | containerd Configuration | Options such as proxy configuration, registry mirrors, custom certs and cgroup hierarchy (image-builder/471) often need to be customised, and it isn't always suitable to do so at an image level. Cluster operators in an organisation often resort to preKubeadmCommands bash scripts to configure containerd and restart the service. |
| U11 | API Server Auth Reconfiguration | As a cluster operator, I need to reconfigure the API server so that I can deploy a new static pod for authentication and insert an updated API server configuration. |
| U12 | Improving bootstrap reporting | SRE teams often need to diagnose failed nodes; better information about why a node failed to join, or a clearer indication of success, would be helpful. (cluster-api/3716) |
| U13 | Large payloads | Some vendors and advanced cluster operators may need to drop large payloads into bootstrap configuration for a number of tasks, such as dropping CA certificates, bootstrapping network components, etc. Cloud providers often have limited sizes for bootstrap data (e.g. AWS, Azure and vSphere). |
| U14 | External bootstrappers | This captures the current state that Cluster API allows external bootstrappers to exist; this should not change. |

Requirements Specification

We define three modalities of the node bootstrapper:

| Mode | Description |
| --- | --- |
| Provisioning | Expected to run as part of machine bootstrapping (e.g. as part of cloud-* systemd units or Windows OOBE). Only supported when used with Cluster API bootstrapping. Typically executes cluster creation or node join procedures, configures kubelet, etc. |
| Preparation | Could be run as part of machine bootstrapping prior to "provisioning"; "prepares" a machine for use with Kubernetes. We largely keep this out of scope for the initial implementation unless there is a trivial implementation. |
| Post | Parts of the use cases above require ongoing management of a host. We list these as requirements, but they are largely not in scope for the machine bootstrapper and should be dealt with by external systems. |

| ID | Requirement | Mode | Related stories |
| --- | --- | --- | --- |
| R1 | The machine bootstrapper MUST be able to execute kubeadm and report its outcome. | Provisioning | U1 |
| R2 | The machine bootstrapper MUST allow the configuration of Linux sysctl parameters. | Preparation | U2, U4 |
| R3 | The machine bootstrapper COULD allow the application of custom static pods on the control plane. | Provisioning | U4, U9 |
| R4 | The machine bootstrapper MUST NOT directly expose the kubeadm API to the end user. | Provisioning | U6 |
| R5 | The machine bootstrapper MUST be able to be used in conjunction with an OS-provided bootstrapping tool, including but not limited to cloud-init, Ignition, Talos and Windows Answer File. | Provisioning | U1 |
| R6 | The machine bootstrapper/authenticator binary MUST provide cryptographic verification in situations where it is downloaded post-boot. | Preparation | U2 |
| R7 | The machine bootstrapper MUST NOT rely on the use of static pods to operate. | All | U5 |
| R8 | The machine bootstrapper MUST enable a Windows node to be domain-joined. The machine bootstrapper WILL NOT manage the group membership of a Windows node in order to enable Group Managed Service Accounts. | Provisioning | U3 |
| R9 | The node bootstrapping system MUST be opt-in and not affect the operation of existing clusters when Cluster API is upgraded. | Provisioning | U7 |
| R10 | The machine bootstrapper system SHOULD allow the agent to be downloaded from the management cluster. | Preparation | U8 |
| R11 | The machine bootstrapper MUST be able to operate without connectivity to the internet (given proper configuration parameters) or to the management cluster. | Provisioning | U7 |
| R12 | When the machine bootstrapper is downloaded on boot, the location MUST be configurable. | Preparation | U8 |
| R13 | When the machine bootstrapper is downloaded from the public internet, it MUST be downloadable from a location not subject to frequent rate limiting (e.g. a GCS bucket). | Preparation | U9 |
| R14 | The machine bootstrapper MUST be able to configure containerd given a structured configuration input. | Provisioning | U10 |
| R15 | The machine bootstrapper MUST publish a documented contract for operating system maintainers to integrate with the machine bootstrapper. | All | U1 |
| R16 | The machine bootstrapper MUST support pulling payloads from a defined location outside of the cloud provider's instance metadata service in order to cope with large payloads. | All | U13 |
| R17 | The machine bootstrapper MUST NOT preclude the use of external bootstrappers, as is the case today. | All | U14 |

/kind feature

An example of the current flow for AWS is here (courtesy of @PushkarJ):
[Image: diagram of the current AWS bootstrap flow]

@k8s-ci-robot added the kind/feature label Sep 22, 2021
@vincepri
Member

/assign @killianmuldoon

@randomvariable
Member Author

One excellent suggestion from the lengthy discussion in that proposal was that we should use https://github.com/mozilla/sops as the encryption envelope for private key material.
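For illustration, a sops-wrapped bootstrap payload might look roughly like this; the field layout follows sops' documented YAML output, while the key ARN and values are invented:

```yaml
# Rough sketch of a sops-encrypted payload: values are wrapped in ENC[...]
# envelopes, and only a holder of the referenced KMS key can decrypt them.
ca.key: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
sops:
  kms:
  - arn: arn:aws:kms:us-east-1:111122223333:key/example   # invented key ARN
    created_at: "2021-09-22T00:00:00Z"
  version: 3.7.1
```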

@CecileRobertMichon
Contributor

/cc @t-lo

@JoelSpeed
Contributor

Just wanted to clarify what we are talking about here: is the purpose of this issue to define an interface for various providers to implement (for different OSs?), or to define a tool that will be implemented?

Reading through some of this it seemed as if this is talking about building some bootstrap binary that would allow configuration of various OSs, but I initially had assumed this would define an interface.

@randomvariable
Member Author

> Reading through some of this it seemed as if this is talking about building some bootstrap binary that would allow configuration of various OSs, but I initially had assumed this would define an interface.

I think it would be both: An interface with a default implementation.

@vincepri
Member

/milestone v1.0

@k8s-ci-robot added this to the v1.0 milestone Sep 30, 2021
@vincepri
Member

/kind proposal

@randomvariable
Member Author

cc @richardcase @codablock

I've been reviewing some PRs in CAPA, namely kubernetes-sigs/cluster-api-provider-aws#2854 and it looks like EKS has the same challenges for some areas, e.g. User Story 10, and the way it's being tackled there is to add shell scripts to the EKS image builder equivalent and then add an API in CAPA. I wonder if we should consider this as a new project and get some folk together.

@richardcase
Member

> I've been reviewing some PRs in CAPA, namely kubernetes-sigs/cluster-api-provider-aws#2854 and it looks like EKS has the same challenges for some areas, e.g. User Story 10, and the way it's being tackled there is to add shell scripts to the EKS image builder equivalent and then add an API in CAPA. I wonder if we should consider this as a new project and get some folk together.

It does have some of the same challenges for sure. And yes it would be good to get some people together to start discussions on this.

@enxebre
Member

enxebre commented Oct 25, 2021

@randomvariable can we include a story/req here to cover the existing ability for users to plug in their own bootstrapping mechanism? This can be achieved today in two different ways (see the sketch below):
a - Pre-populating a custom bootstrap secret and setting it at machine creation time.
b - Implementing a custom bootstrap provider.
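For reference, option (a) uses the existing Machine bootstrap API, roughly like this (all names are placeholders):

```yaml
# Option (a), sketched: point a Machine at a pre-populated bootstrap data
# secret instead of letting a bootstrap provider generate one.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: my-machine
spec:
  clusterName: my-cluster
  bootstrap:
    dataSecretName: my-prebaked-bootstrap-data  # user-managed secret
  # infrastructureRef omitted for brevity
```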

@randomvariable
Member Author

randomvariable commented Nov 3, 2021

> can we include a story/req here to cover the existing ability for users to plug in their own bootstrapping mechanism? This can be achieved today in two different ways.

Have added as U14 and R17 respectively. Have also captured the comment from #4172 around payload size in U13 / R16.

@randomvariable
Member Author

Another use case in #3782

@enxebre
Member

enxebre commented Jan 3, 2022

/area bootstrap

@k8s-ci-robot added the area/bootstrap label Jan 3, 2022
@fabriziopandini removed this from the v1.1 milestone Feb 3, 2022
@richardcase
Member

@johananl - a few of us discussed this at KubeCon EU earlier this year. There is some prior art from @randomvariable in the form of a proposal (which was closed), and we have started to resurrect this after KubeCon. How would you feel about collaborating on the proposal with a few like-minded people interested in this?

@johananl
Member

johananl commented Aug 31, 2023

Sure, I'm counting on collaboration here @richardcase :-)
Is there a way for me to see the prior art you've mentioned? Is it recorded anywhere? I don't want to do duplicate work of course.

I'm happy to add my notes so far to an existing document. If that doesn't exist, I'm happy to share a new document where people can chime in.

@richardcase
Member

This is the issue with the prior art: #4221. We started a new version of it as a Google doc to start making changes (although not many changes have been added yet 😉).

@johananl
Member

Great, thanks. I'll go through both. Should I "plug into" the Google doc then?

What I have so far is mainly brainstorming with myself on paper with some thoughts about the design, some questions, some potential problems etc.

@richardcase
Member

It might be good to start adding to the doc (which is the original proposal) and then we could update it and move it forward, so we can see how it changes from the original work done by @randomvariable. But if you'd like to work another way, that's all good.

@johananl
Member

johananl commented Sep 1, 2023

OK, I'll add to the doc (I can't see who the owner of the document is and didn't want to step on any toes).

@johananl
Member

johananl commented Sep 1, 2023

I'm adding a bunch of comments/suggestions about the existing state of the document. Hope it's not too much 😬

@johananl
Member

Note to self and to anyone else involved: We should keep #6539 in mind in case we touch k8s object references as part of the proposed design.

@Danil-Grigorev
Member

Hi @johananl, we are currently actively investigating the approaches based on the original CAEP and the doc @richardcase mentioned. Can you share the current state of the design, so the document will reflect it better?

@johananl
Member

Hi @Danil-Grigorev. Glad to see more people are getting involved 🙂

My current impression is that while the original proposal touches some important aspects of this issue, the most important concern described by @randomvariable above -- the separation of bootstrappers such as kubeadm from the provisioning tools (cloud-init, Ignition etc.) -- isn't handled in it. In its current state, the proposal sounds like we're starting from the solution (machineadm) and working our way back to the requirements rather than the other way around.

In addition, in my opinion the original proposal includes a lot of user stories, some of which don't seem directly related to the issue at hand (e.g. Active Directory domain joins) and might be better handled in separate proposals.

So, I'm not saying we shouldn't pursue the machineadm direction or that all of the user stories aren't important, I just think that adding a new binary which runs on the nodes without first solving the conflation we have in the API isn't going to lead us to where we're aiming, at least not on its own.

@richardcase what do you think about the above? Am I missing something?

In the meantime I started working on a separate design proposal which specifically addresses the conflation of bootstrap (e.g. kubeadm) and provisioning (e.g. cloud-init) because this is arguably the main thing we need to solve and I couldn't find any work around that so far. I'm happy to join efforts if there is any existing/prior work around that.

I'm still actively working on the proposal and it's by no means ready for review, but I'll share the WIP so that people can start to follow my train of thought and perhaps provide very early feedback. Here it is: https://docs.google.com/document/d/1Fz5vWwhWA-d25_QDqep0LWF6ae0DnTqd5-8k8N0vDDM/edit?usp=sharing

@richardcase
Member

richardcase commented Feb 14, 2024

Thanks @johananl .

> the separation of bootstrappers such as kubeadm from the provisioning tools (cloud-init, Ignition etc.) -- isn't handled in it

This is actually one of the motivations of the original proposal by @randomvariable. So cloud-init/ignition are only used to transmit a "machine config" file to the machine and then machineadm takes action based on that file (which might be running kubeadm).
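A purely hypothetical sketch of that flow (the machineadm binary, its config path and its schema do not exist yet and are invented here):

```yaml
#cloud-config
# Hypothetical: cloud-init/Ignition shrink to delivering one machine config
# file and invoking the agent; machineadm then decides how to run kubeadm.
write_files:
- path: /etc/machineadm/machine-config.yaml   # invented path and schema
  content: |
    kind: MachineConfig
    bootstrap:
      tool: kubeadm
      joinConfiguration: {}   # elided
runcmd:
- machineadm provision --config /etc/machineadm/machine-config.yaml
```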

I would agree that it covers more than the strict separation of concerns between the commands required to bootstrap a cluster and the means to get those commands executed. The idea of the doc was that the original proposal was the starting point and could be updated.

With quite a few interested parties, it feels like it would be good to start a feature group, like what's been done around in-place upgrades and Karpenter. WDYT?

@richardcase
Member

And forgot to say, thanks for sharing the doc @johananl 🙇 I'll take a read.

@johananl
Member

johananl commented Feb 14, 2024

> So cloud-init/ignition are only used to transmit a "machine config" file to the machine and then machineadm takes action based on that file (which might be running kubeadm).

@richardcase yes, I understand. This basically moves the bootstrap (e.g. kubeadm) and provisioning (e.g. cloud-init) process into the nodes, which it isn't clear to me we want to do. Here are a few problems I can currently see with this approach:

  • If we want to support arbitrary combinations of bootstrappers and provisioners (e.g. "I want a K3s cluster on Flatcar, therefore I need to use K3s bootstrap with Ignition provisioning"), we'd have to either maintain a big machineadm binary containing support for the Cartesian product of bootstrappers and provisioners, or maintain one machineadm flavor for each bootstrapper-provisioner pair.
  • We'd be practically moving a big chunk of the CAPI machinery currently living in k8s controllers into the nodes. While I'm not sure what the pros are, I can certainly see cons such as the bootstrap+provisioning process becoming more opaque to CAPI administrators, making troubleshooting harder and potentially limiting flexibility since we now have two concerns combined in one component (machineadm). I see a huge advantage in relying on the k8s API for "wiring" different processes together, in terms of both flexibility and observability.

While I agree that we want to improve separation of concerns and protect CAPI components (e.g. infra providers) from provisioner API changes (as stated in the original proposal's motivation section), it's not clear to me that we also have to move this logic out of k8s and into the node while we're at it: AFAICT we could solve the same problem by introducing a provisioner contract and isolating provisioning data from other data (bootstrap, infrastructure etc.) in the CAPI types. We can then rely on references in the API for loosely coupling together the relevant bootstrapper and provisioner implementations. Since we'd have contracts for both, any bootstrapper could be used with any provisioner (and also any infra provider since the infra provider would hopefully just receive a blob of text representing the provisioning config and expose it to machines using instance metadata while remaining agnostic about the specific format).

Rather than explicitly supporting every bootstrapper-provisioner combination (which would likely become a serious maintenance burden given that we already have 6 bootstrap providers and at least 2 provisioners and would likely add more in the future), we could isolate the two concerns using contracts which would make them orthogonal -- which they technically are, though currently not in CAPI (more details in my WIP proposal).
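To make the contract idea concrete, a purely hypothetical shape (none of these kinds, fields or API versions exist in CAPI today):

```yaml
# Purely hypothetical: a Machine wiring a bootstrapper and a provisioner
# together by reference, each behind its own contract. The infra provider
# would receive only the opaque rendered blob.
apiVersion: cluster.x-k8s.io/vXalphaY   # invented version
kind: Machine
spec:
  bootstrap:
    configRef:
      kind: KubeadmConfig    # emits provisioner-agnostic instructions
      name: worker-0
  provisioner:               # invented field
    configRef:
      kind: IgnitionConfig   # renders the instructions as Ignition
      name: worker-0
```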

I'm not sure that's a goal in CAPI, but my intuition tells me we might want to move to the nodes only the things that absolutely have to run there, and the rest should live on the management cluster. This way we can use things such as k8s object status fields to track the various stages of workload cluster operations.

I hope that's clear. We certainly have to discuss this further since all of the above are my initial thoughts on the matter and it's very likely I am missing things.

> With quite a few interested parties, it feels like it would be good to start a feature group, like what's been done around in-place upgrades and Karpenter. WDYT?

Sure, we can do that. I'm not quite sure what a "feature group" means and how to form it though. Care to elaborate?

@richardcase
Member

Thanks for taking the time to explain @johananl, that's really helpful. I may not agree with all aspects of the problems listed, but generally I agree. You have highlighted the importance of revisiting the original proposal with fresh eyes. I should say I'm not particularly attached to the original proposal; it was just a starting point.

> I'm not sure that's a goal in CAPI, but my intuition tells me we might want to move to the nodes only the things that absolutely have to run there, and the rest should live on the management cluster.

I agree, and the approach you are taking helps with this. Some of the things that could require running on the node would fall into the day-2 operations area, but as you say these can be kept separate and outside this scope.

Another area the approach you are suggesting helps with (admittedly I've only skim-read so far, so I may be off) is the re-use of control plane logic. Currently other control plane providers (like k3s, RKE2) have very similar requirements around scale up, scale down, upgrades etc., and most providers copy what kubeadm is doing to a greater or lesser extent... this results in differences in functionality and potentially bugs. I see this approach helping with that; it's a pain I feel a lot.

I'm looking forward to having a proper read.

@johananl
Member

Thanks a lot @richardcase. Happy to learn where the disagreement points are (feel free to comment on my proposal if that helps).

I do see parts of the original proposal which directly overlap with my current vision of how things could work. Also, I am not opposed to having an on-node agent if we realize this is necessary. Maybe it is and I trust you and @randomvariable had good reasons for thinking in that direction.

My proposal is an initial thought process, too. I expect it to change quite a lot before it could have a chance at becoming something we implement. We may also realize that we need both proposals since each touches at least some different/unique aspects of the problems at hand.

Looking forward to any feedback you might have on the new proposal. On my end, I'll keep working on it and will advertise it more loudly when I feel it's ready for a wider round of reviews.

@fabriziopandini
Member

/priority backlog

@k8s-ci-robot added the priority/backlog label Apr 11, 2024
@johananl
Member

A quick update on this task:

Got some good initial feedback on the WIP proposal. Thanks!
I have to follow up and update the proposal accordingly. Once the initial feedback is addressed, I'll open a proper CAEP PR.

@fabriziopandini
Member

Also merging requirements from #9631:

> As developers, we want clear separation between bootstrapping (join token) and provisioning (machine startup script) in order not to mix these concerns into what is currently called the "bootstrap data secret".
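Sketching that requirement (names and shapes invented for illustration):

```yaml
# Invented illustration of the requested split: the short-lived join token
# and the reusable startup script live in separate secrets instead of one
# combined "bootstrap data secret".
apiVersion: v1
kind: Secret
metadata:
  name: workers-join-token         # short-lived, rotated per join
stringData:
  token: abcdef.0123456789abcdef   # illustrative
---
apiVersion: v1
kind: Secret
metadata:
  name: workers-startup-script     # token-free, reusable across a MachinePool
stringData:
  userdata: "#cloud-config ..."    # elided
```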

@johananl
Member

johananl commented May 9, 2024

Thanks @fabriziopandini, this seems highly relevant. To clarify, IIUC the requirement is: we shouldn't store plain-text secrets in the provisioning configuration accessible from cloud machines via user data or similar mechanisms. The separation of bootstrap and provisioning is already the main story in this WIP proposal, so the "keep secrets out" part is the new addition. Correct?

@fabriziopandini
Member

@johananl let's see if @AndiDog, the author of the other issue, chimes in.
My understanding is similar to yours, and it boils down to "keep the token out" in order to help the machine pool use case.

Please feel free to decide whether to keep this in scope of your proposal (if not, just list it as a non-goal or a future goal, so we keep track of it).
