📖 [CAEP] Add machineadm bootstrapper proposal #4221
Signed-off-by: Naadir Jeewa <jeewan@vmware.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Around 70% of the CLI portion is already implemented in https://github.com/flanksource/konfigadm which we would be happy to improve/change/butcher into an initial version of the CLI.
status: provisional | ||
see-also: | ||
- "/docs/proposals/2021022-kubelet-authentication-plugin.md" | ||
- "/docs/proposals/2021022-artifacts-management.md" |
Is this in a separate PR?
Yup, will be.
I'm looking forward to 2021022-kubelet-authentication-plugin.md. We are aiming at the use case of using it for Vault PKI integration.
<td>U4</td><td>CIS Benchmark Compliance</td> | ||
<td> | ||
As a platform operator, I require Kubernetes clusters to pass the CIS Benchmark in order to meet organisational level security compliance requirements. | ||
</td> | ||
</tr> | ||
|
||
<tr> | ||
<td>U5</td><td>DISA STIG Compliance</td> | ||
<td> | ||
As a platform operator in a US, UK, Canadian, Australian or New Zealand secure government environment, I require my Kubernetes clusters to be compliant with the DISA STIG. | ||
</td> | ||
</tr> |
I am not sure I understand how this fits into bootstrapping. Is this not an imaging/installation/configuration concern? Or is this meant to be "As a platform operator who needs to comply with CIS/DISA STIG, I require a secure mechanism for providing sensitive secrets at the bootstrap phase"?
A user story might be fulfilled by several things working in tandem. This supports the user story but does not fulfil it completely.
What other parts of CIS/STIG compliance would machineadm be responsible for?
<tr> | ||
<td>R1</td> | ||
<td> | ||
The machine bootstrapper MUST be able to execute kubeadm and report its outcome. |
"report its outcome": to whom, and at what level of detail?
<tr> | ||
<td>R6</td> | ||
<td> | ||
The machine bootstrapper/authenticator binary MUST provide cryptographic verification in situations where it is downloaded post-boot. |
Is this verification of machineadm, or of the binaries / packages / files machineadm might download?
<tr> | ||
<td>R7</td> | ||
<td> | ||
The machine bootstrapper MUST not be reliant on the use of static pods to operate</td> |
Is it the static pods, or the dependency on a running and working containerd/kubelet combination? If the latter, I would rephrase to:
The machine bootstrapper MUST not be reliant on the use of static pods to operate</td> | |
The machine bootstrapper MUST be able to operate without a working containerd/kubelet installation</td> |
No, it's literally static pods which are prohibited in the Kubernetes STIG.
Isn't this then a concern for kubeadm? i.e. kubeadm init --systemd
A controller that ships with the infrastructure provider that can reconcile the storage of machineadm configurations with infrastructure APIs (e.g. Amazon S3/Minio, GCS, custom server etc...). | ||
</td> |
A controller that ships with the infrastructure provider that can reconcile the storage of machineadm configurations with infrastructure APIs (e.g. Amazon S3/Minio, GCS, custom server etc...). | |
</td> | |
Core plugins that reconcile machine configuration during machine boot | |
</td> |
|
||
### Plugin architecture | ||
|
||
Machineadm will use the [go-plugin][go-plugin] architecture used by Hashicorp. Machineadm will expect plugins to |
How is the configuration going to be pluggable?
Machineadm's security model is intended such that secrets associated with machine configuration are stored on the | ||
management Kubernetes cluster and are secured to the level that the backing API server and etcd are secured. | ||
|
||
For machine bootstrapping, secrets are intended to be delivered by an infrastructure provider to a suitable secure | ||
location, and to also provide a mechanism for the machine to report back status. Examples include: |
This seems to be a very weak model considering the prerequisites for setting up encryption and the lack of auditability (due to the verbosity of Kubernetes auditing and the lack of etcd auditing).
Why not just rely on the cloud providers for this through something like sops?
machineadm.yml -> sops encrypt -> MachineConfig -> S3
S3 -> sops decrypt | machineadm -c -
sops is widely tested, easily usable as a library, and supports AWS, GCP, Azure, Hashicorp and PGP.
Noted. The envelope format is one I've used before, based on the original Python version of SOPS. SOPS is now in Golang, so it's worth a revisit. Thanks. It will depend on whether or not we can consume it programmatically.
From a decryption perspective using it as a library just works: https://pkg.go.dev/go.mozilla.org/sops/v3@v3.6.1/decrypt#Data
It will then use the SOPS metadata in the file to try all available decryption mechanisms.
Encryption goes through the MasterKey interface; a key can be created with a single function call to the KMS provider, e.g. https://pkg.go.dev/go.mozilla.org/sops/v3@v3.6.1/kms#NewMasterKeyFromArn
SGTM. I'll take a look and will update.
the use of envelope encryption where a KMS encrypts a random string that is itself used as a key for encryption/data | ||
of data. | ||
|
||
We furthermore protect the KMS issued random string by deriving a key from it using SHA-512 PBKDF2-HMAC hashed to 50,000 rounds, with the result used for AES-256-GCM. This provides FIPS compliance with a suitable level of encryption given the |
Rolling your own encryption is rarely advised; sops with PGP will meet the needs of users without a KMS.
Using AES-GCM and PBKDF2 is "not rolling your own encryption".
Furthermore, from experience, libgcrypt is extremely painful to use programmatically, even more so from cgo. We'd have to use https://github.com/keybase/go-crypto, but it uses independent algorithm implementations. Using the Go crypto libraries with shims for FIPS-approved BoringSSL is a cleaner implementation.
sops already does all the heavy lifting for PGP,
I don't mean rolling your own crypto algorithm, but rather how the algorithm is used - A lot can go wrong even if you are using proven algorithms - https://loup-vaillant.fr/articles/rolling-your-own-crypto / https://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own
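For readers following this thread, the scheme quoted from the proposal (a KMS-issued random string stretched with PBKDF2-HMAC-SHA-512 at 50,000 rounds into an AES-256-GCM key) can be sketched with the Go standard library alone. This is only an illustration of the primitives under discussion, not the proposal's implementation: the function names `deriveKey`, `seal` and `open`, the 16-byte salt, and the `salt || nonce || ciphertext` layout are all assumptions, and the secret is a stand-in for whatever a KMS would return. PBKDF2 is written out by hand only to keep the sketch free of non-stdlib imports; real code would use a vetted library.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha512"
	"errors"
	"fmt"
)

const rounds = 50000 // round count quoted in the proposal text

// deriveKey returns the first 32 bytes of PBKDF2-HMAC-SHA-512 (RFC 8018).
// One PBKDF2 block is enough: SHA-512 yields 64 bytes, AES-256 needs 32.
func deriveKey(secret, salt []byte) []byte {
	mac := hmac.New(sha512.New, secret)
	mac.Write(salt)
	mac.Write([]byte{0, 0, 0, 1}) // big-endian index of block 1
	u := mac.Sum(nil)
	t := make([]byte, len(u))
	copy(t, u)
	for i := 1; i < rounds; i++ {
		mac.Reset()
		mac.Write(u)
		u = mac.Sum(nil)
		for j := range t {
			t[j] ^= u[j]
		}
	}
	return t[:32]
}

// newGCM builds an AES-256-GCM AEAD from the derived key.
func newGCM(secret, salt []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(deriveKey(secret, salt))
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext; the output layout (an assumption of this sketch)
// is salt || nonce || ciphertext-with-tag.
func seal(secret, plaintext []byte) ([]byte, error) {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	gcm, err := newGCM(secret, salt)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append(salt, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the secret is wrong or the data was altered.
func open(secret, sealed []byte) ([]byte, error) {
	const saltLen, nonceLen = 16, 12
	if len(sealed) < saltLen+nonceLen {
		return nil, errors.New("sealed data too short")
	}
	gcm, err := newGCM(secret, sealed[:saltLen])
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, sealed[saltLen:saltLen+nonceLen], sealed[saltLen+nonceLen:], nil)
}

func main() {
	secret := []byte("stand-in for the KMS-issued random string")
	sealed, err := seal(secret, []byte("bootstrap data"))
	if err != nil {
		panic(err)
	}
	plain, err := open(secret, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

The primitives here are standard, but the salt handling and byte layout are choices made by this sketch, and choices like these are where the reviewers' concern about hand-assembled schemes applies.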
} | ||
``` | ||
|
||
##### Encrypted configuration |
See above: machineadm_config.yaml can be encrypted using sops and then decrypted inline by machineadm. There should not be any need to configure decryption parameters, since they can be inferred from the sops envelope.
I +1 here. I would like to keep encryption configuration to a minimum and sops seems like a good fit to me.
(Disclaimer: not an expert on this)
There's still quite a lot of details I want to add here, but there's plenty to think about, so comments welcome. |
* OS Distribution: An OS Distribution refers to a packaged version of an Operating System, in order to primarily distinguish between different Linux distributions such as CentOS vs. Ubuntu vs. Talos, as opposed to differences between Operating Systems as a whole (e.g. Linux and Windows). | ||
* Machineadm : Is a binary CLI that is executed on machines to perform Kubernetes Cluster API bootstrap. | ||
* CABPK: Cluster API Bootstrap Provider Kubeadm is the bootstrap controller that exists from v1alpha2 onwards and generates cloud-init bootstrap data to execute kubeadm for a machine provisioned by Cluster API | ||
* Cloud-Init: Is a first-boot bootstrapper written by Canonical and is widely used across Ubuntu, Amazon Linux 2, and VMware PhotonOS |
Would it make sense to also include other bootstrap providers, such as Ignition or Talos, in the glossary?
<tr> | ||
<td>U1</td><td>Non cloud-init bootstrap processes</td> | ||
<td> | ||
Ignition is a user-data processing Linux bootstrapping system used by Flat Car Linux, RHEL Atomic Host and Fedora CoreOS. (cluster-api/3761) |
Maybe a nitpick but this should be worded as an actual user story (similar to the ones below).
|
||
### Risks and Mitigations | ||
|
||
- What are the risks of this proposal and how do we mitigate? Think broadly. |
This seems to be missing?
</tbody> | ||
</table> | ||
|
||
##### Core plugins |
How was this selection made? What would be the criteria for a core plugin?
I don't have a good metric for this right now. Core plugins should do at least what we expose in cloud-init across the providers today in order to enable upgrades.
Nice to see Windows being included in the design :-) and thanks for tackling this hard topic.
I had a few thoughts from my (limited) understanding of the overall capi designs so far. I have some reservations of adding a totally new component to this system. I will try to explain my reservations:
I thought the idea of pluggable Bootstrap Providers was to provide a way for other systems that don't follow the same bootstrapping mechanism (e.g. Ignition or even Windows) to create their own mechanisms for formatting the data in a way the infra providers could then consume. So in the case of Flatcar, there would be a bootstrap provider that formats the data in the way Ignition knows how to handle. The challenge I've seen is that an infra provider chooses one bootstrap provider and can't easily swap them in and out (kubernetes-sigs/cluster-api-provider-azure#1035).
With this proposal it looks like we are saying CABPK is the bootstrap provider that will provide data to multiple different types of bootstrappers (kubeadm/Ignition/maybe even Windows answer files). Therefore the idea that we needed pluggable bootstrap providers should be revisited, since re-implementing a lot of the logic already in CABPK (cert generation and such) would be duplicative, tedious, and hard. Folks would rather just reuse a lot of CABPK and output it differently.
The challenges in the proposal seem to fall into two categories: (1) different formats for bootstrappers (Ignition/kubeadm/Windows answer files) and (2) getting secrets for those bootstrappers into the right places.
In particular the challenges related to different formats for bootstrappers (1), it seems like having a single bootstrap provider but with different output types would address some of them without adding a new component. I considered this an option for the windows proposal but cloudbase-init was such a nice alternative to cloud-init with fewer changes to CABPK that we went that route. I am not as familiar with Ignition but it seems like it doesn't have the same cloud-init api and re-using the logic with a different output was proposed in #4172. Was this approach considered?
The second set of challenges that this proposal addresses around secrets and security of sensitive information (2) feels like a different set of problems than the ones related to different types of bootstrappers (though closely related). If you designed for multiple types of outputs, the ability to add additional ways to gather secrets might (?) be able to be part of that output.
Hope that makes sense and helps give another perspective.
|
||
Furthermore, certain providers, such as Cluster API Provider AWS are utilising time-limited hacks within cloud-init to secure bootstrap userdata, and this is not sustainable for the health of the project over time. | ||
|
||
Use of an agnostic bootstrapper (machineadm) benefits end users in that they won’t need to closely monitor changes within each system that may have negative side effects on Cluster API. In addition, separating out the processes required to bootstrap a Kubernetes node from the bootstrap mechanism allows for Cluster API Kubeadm Bootstrap Provider (CABPK) to become an independent component. |
How is this going to avoid growing into its own "cloud-init" over time?
<tr> | ||
<td>U7</td><td>Existing Clusters</td> | ||
<td> | ||
As a cluster operator with existing clusters, I would like to be able to, after enabling the necessary flags or feature gates, to create new clusters or machines using nodeadm. |
What is nodeadm?
<tr> | ||
<td>R5</td> | ||
<td> | ||
The machine bootstrapper MUST be able to be used in conjunction with an OS provided bootstrapping tool, not limited to Cloud-Init, Ignition, Talos and Windows Answer File. |
Cloudbase-init was mentioned above but isn't mentioned here. Is it included under cloud-init, or is this suggesting that the preferred way is to provide a Windows answer file that kicks off provisioning scripts? Having an example of an alternative bootstrapping tool would help me grok some of the examples.
<tr> | ||
<td>R8</td> | ||
<td> | ||
The machine bootstrapper MUST enable a Windows node to be domain joined. The machine bootstrapper WILL NOT manage the group membership of a Windows node in order to enable Group Managed Service Accounts |
I think this is saying that the machine bootstrapper will not create the AD groups or the gMSA accounts. That activity is left up to the domain admin, but the bootstrapper will provide the ability to join a node to an existing AD group which has permissions to use the gMSA.
<tr> | ||
<td>U6</td><td>Kubeadm UX</td> | ||
<td> | ||
As a cluster operator, I would like the bootstrap configuration of clusters or machines to be shielded from changes happening in kubeadm (e.g. v1beta1 and v1beta2 type migration)</td> |
Is this still a concern for this proposal?
</tr> | ||
|
||
<tr> | ||
<td>machineadm core plugins</td><td>cluster-api repo</td> |
Could this be part of machineadm directly?
// InfrastructureRef is a reference to an infrastructure provide specific resource | ||
// that holds details of how to store and retrieve bootstrap data securely, | ||
// and how a bootstrapper can report status. | ||
InfrastructureRef *corev1.ObjectReference `json:"infrastructureRef,omitempty"` |
Are we expecting this to be different for each machine/across machines?
|
||
// MachineBootstrapConfigSpec defines the desired state of a MachineBootstrapConfig | ||
type MachineBootstrapConfigSpec struct { | ||
PluginRefs []corev1.ObjectReference `json:"pluginRefs"` |
Are we expecting this to be different for each machine/across machines?
MachineBootstrapConfigSpec `json:",inline"` | ||
// PluginTemplateRefs is a list of machine bootstrap plugin template refs | ||
// which will be instantiated for each machine. | ||
PluginTemplateRefs []corev1.ObjectReference `json:"pluginTemplateRefs,omitempty"` |
I might be missing something, but what is the difference between PluginTemplateRefs and PluginRefs embedded via MachineBootstrapConfigSpec?
#### Serialized data format | ||
|
||
`machineadm_config.yaml` will be a multi-part YAML docment supporting multiple | ||
data types, read and processed in order. |
Why is order important?
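To make the ordering question concrete, here is a minimal sketch of reading a multi-part YAML stream and handing the parts to a consumer in file order. It is not taken from the proposal: `splitDocuments` is a hypothetical helper, the `step:` keys are placeholders rather than the proposal's schema, and the function only splits on `---` separator lines without parsing YAML.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitDocuments splits a multi-document YAML stream on "---" separator lines
// and returns the non-empty documents in the order they appear.
// It is a line-based splitter only; it does not parse or validate YAML.
func splitDocuments(stream string) []string {
	var docs []string
	var cur []string
	flush := func() {
		doc := strings.Trim(strings.Join(cur, "\n"), "\n")
		if strings.TrimSpace(doc) != "" {
			docs = append(docs, doc)
		}
		cur = nil
	}
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimRight(line, " \t") == "---" {
			flush() // a separator ends the current document
			continue
		}
		cur = append(cur, line)
	}
	flush() // the last document has no trailing separator
	return docs
}

func main() {
	// The keys and values below are placeholders, not the proposal's schema.
	stream := "step: first\n---\nstep: second\n---\nstep: third\n"
	for i, doc := range splitDocuments(stream) {
		fmt.Printf("document %d: %s\n", i, doc)
	}
}
```

If later parts depend on side effects of earlier ones (for example, a file written by one part and consumed by the next), processing in slice order preserves that dependency; whether the proposal intends such dependencies is the open question raised here.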
} | ||
|
||
// PluginStatus defines the status of a machineadm plugin | ||
type PluginStatus struct { |
What is the expected usage of this type?
Bootstrap reporting
@@ -134,6 +140,11 @@ The Custom Resource for Kubernetes that represents a request to have a place to | |||
|
|||
See also: [Server](#server) | |||
|
|||
### Machineadm |
I would suggest calling it MachineBootstrapper to indicate a more appropriate scope for this binary. "Adm" for a machine has a traditional and wider connotation.
## Motivation | ||
|
||
Cluster API’s reliance on cloud-init has frequently caused problems: changes in patch releases have caused breaking changes for Cluster API providers, such as vSphere and AWS. It has also made it difficult for other vendors, not using cloud-init, to easily use the core | ||
Cluster API providers, examples include OpenShift and FlatCar Linux which both use Ignition, and Talos with their own system. |
+1 on this. We also used Ignition with an internally built OS (based on Ubuntu).
|
||
### Goals | ||
|
||
* To produce a minimal on-machine bootstrapping mechanism to run kubeadm, and configure cluster related components. |
We find that if we have to use an external CA to issue certificates for control plane components, kubeadm becomes almost useless other than for generating the configurations and control plane manifests. For this reason, binding the machine bootstrapper to kubeadm is a strong coupling. An alternative approach is to define the interface in terms of the artefacts needed by the control plane components; an example is eksctl.
I understand this is out of scope for this proposal; I just want to bring it up for attention/discussion.
I'm leaving it in as a goal because the machine bootstrapper will support kubeadm via one of its plugins, but we should also enable the other use cases. We also want to support external CAs as well.
<tr> | ||
<td>U3</td><td>Active Directory</td> | ||
<td> | ||
As a platform operator of a Windows environment, I may require their Kubernetes nodes to be domain joined such that the application workloads operate with appropriate Kerberos credentials to connect to services in the infrastructure. |
A similar use case: for platforms that use Vault as PKI, the node should be able to join the cluster with the Vault token/app_role credential.
<tr> | ||
<td>Provisioning</td> | ||
<td> | ||
Expected to run as part of machine bootstrapping e.g. (part of cloud-* SystemD units or Windows OOBE). Only supported when used with Cluster API bootstrapping. Typically executes cluster creation or node join procedures, configuring kubelet etc... |
"cluster creation or node join procedures, configuring kubelet "
This responsibility is now (partly) taken by kubeadm. Does this mean machineadm will assume that role?
<tr> | ||
<td>Preparation</td> | ||
<td> | ||
Could be run as part of machine bootstrapping prior to “provisioning”, and “prepares” a machine for use with Kubernetes. We largely keep this out of scope for the initial implementation unless there is a trivial implementation. |
I'm not sure what stage "prior to provisioning" is. Do we have a use case to illustrate it?
/milestone v0.4.x |
Thanks all. Sorry for the delay in replies, but am slowly returning. |
The challenges in the proposal seem to fall into two categories: (1) different formats for bootstrappers (Ignition/kubeadm/Windows answer files) and (2) getting secrets for those bootstrappers into the right places.
Thanks for this. Yassine and I have been going back and forth on this ourselves. I do, however, at least need to explore it in the alternatives section so we can make a decision.
Co-authored-by: Ace Eldeib <alexeldeib@gmail.com>
Hi All, After some discussion and thought, it's clear that this needs a "revise and resubmit", and some more thought about what this actually is. I don't think we can deliver this in v1alpha4, and will focus on #4219 for v1alpha4. If someone wants to take this forward, happy to discuss. Thanks for all your feedback. |
Signed-off-by: Naadir Jeewa jeewan@vmware.com
What this PR does / why we need it:
Defines an OS agnostic machine bootstrapper, compatible with cloud-init, Ignition,
and Windows answer files.