
KubeadmConfig changes should be reconciled for machine pools, triggering instance recreation #8858

Closed
AndiDog opened this issue Jun 15, 2023 · 31 comments
Labels
area/bootstrap Issues or PRs related to bootstrap providers area/machinepool Issues or PRs related to machinepools kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.


@AndiDog
Contributor

AndiDog commented Jun 15, 2023

What would you like to be added (User Story)?

As an operator, I want KubeadmConfig.spec.{files,preKubeadmCommands,...} changes to have an effect on MachinePool-created nodes, resulting in server instance recreation.

Detailed Description

Discussed in office hours 2023-06-14 (notes, main points copied into this issue below).

Situation: MachinePool manifest references AWSMachinePool and KubeadmConfig (a very regular machine pool config)

Expectation: Changing KubeadmConfig.spec.* should lead to recreating (“rolling”) nodes. With infra provider CAPA, nothing happens at the moment. Here's why.

  • Problem 1: CAPI’s KubeadmConfigReconciler does not immediately update the bootstrap secret once KubeadmConfig.spec changes, but only once it rotates the bootstrap token (purpose: new machine pool-created nodes can join the cluster later on). This means several minutes of waiting for reconciliation.

    • Suggestion: Simple bug fix. @AndiDog has a draft implementation that always considers updating the secret, not only if the token must be refreshed. In the meantime, users can work around by creating a new KubeadmConfig object.
  • Problem 2: CAPA (and likely all other infra providers) does not watch the bootstrap secret, so it cannot immediately react to KubeadmConfig.spec changes either.

    • @AndiDog Should it even directly watch the secret? What should the CAPI ⇔ CAPx contract be?
    • @fabriziopandini: Watching secrets can blow up memory [of the kubectl client]. Think of the UX and possible solutions first.
    • @CecileRobertMichon: Maybe change MachinePool.spec.template.spec.bootstrap.dataSecretName every time because that triggers reconciliation for the MachinePool object (machinepool_controller_phases.go code).
    • @sbueringer: For MachinePool support in ClusterClass we have to decide what the “ideal” way to roll out BootstrapConfig is.
  • Problem 3: The bootstrap secret contains both the “how to set up this server init data” (e.g. cloud-init / ignition) and the random bootstrap token by which nodes join the cluster. If only the token gets refreshed (DefaultTokenTTL is 15 minutes), we don’t want nodes to be recreated, since that would recreate all nodes every few minutes.

Anything else you would like to add?

Label(s) to be applied

/kind feature
/area bootstrap
/area machinepool

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/bootstrap Issues or PRs related to bootstrap providers area/machinepool Issues or PRs related to machinepools needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 15, 2023
@killianmuldoon
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 15, 2023
@fabriziopandini
Member

fabriziopandini commented Jun 20, 2023

Frankly speaking, what I found confusing in the UX is that we are changing an externally referenced object in place, and this is inconsistent with everything else in CAPI. Additionally, the reference to a bootstrap object is defined inside a machine template, while e.g. in MD/MS we have similar API modeling but use a bootstrap template instead of a bootstrap object.

However, while designing a solution for problem 1, I think we should aim for an approach where changes to external objects surface to the parent objects, no matter if user-driven (e.g. rotating template references) or machine-driven (by changing the data secret name).

This approach - changing the parent object - is consistent with everything else in CAPI and provides a clean pattern for all the providers, and IMO this could address problem 2.

WRT problem 3, it seems to me that what we are saying is going in the direction of each machine having its own bootstrap token, so machine provisioning is not affected by concurrent changes to the bootstrap object, and this could be somewhat overlapping with @Jont828's work (see e.g. #8828 (comment)).

Last note, what we are deciding

@AndiDog
Contributor Author

AndiDog commented Aug 21, 2023

This approach - changing the parent object - is consistent with everything else in CAPI and provides a clean pattern for all the providers, and IMO this could address problem 2.

But then the providers need to watch KubeadmConfig. Right now, most of them only watch their own types (e.g. AWSCluster) and Cluster. I think it's the only reasonable way to implement good machine pool related reconciliation, so I'm fine with that proposal. But then how does a provider find out when the bootstrap data secret got updated? That's when the provider needs to act on its machine pool, for example CAPA updating the auto-scaling group's latest launch configuration, putting in the latest user data, and rolling out new nodes. We also need to avoid timing bugs – if providers reconcile too early, they still see the old content of the secret.


Here is one idea for adding such a contract with providers:

  • Providers should not checksum the content of the bootstrap data (which my first hacky attempt did: Trigger machine pool instance refresh also on user data change (unless it's only the bootstrap token) cluster-api-provider-aws#4245), but instead CAPI should reason about the data and report whether something changed. Since the bootstrap token is currently part of that data, and gets rotated regularly (is this for security reasons?), the bootstrap data content changes all the time but does not necessarily have relevant changes such as a change in KubeadmConfig.spec.*.

    • Therefore, CAPI could store a KubeadmConfig.status.dataSecretPartialChecksum which hashes the cloud-init/ignition content, excluding the random bootstrap token value. It should first update the secret and then write that field.
    • We can also put the checksum into the secret's labels for comparison.
    • Also, there should be a KubeadmConfig.status.dataSecretFullChecksum including hashing of the token.
  • A provider like CAPA can watch KubeadmConfig objects which are related to AWSMachinePool.

    • Once the KubeadmConfig.status.dataSecretPartialChecksum value changes as compared to the latest rollout (for CAPA, this could be compared to a tag on the launch configuration version), the controller applies the latest bootstrap data to the machine pool, rolling out new nodes (for CAPA, this means to create a new launch configuration version and also trigger instance refresh).
    • If only KubeadmConfig.status.dataSecretFullChecksum changes, but not KubeadmConfig.status.dataSecretPartialChecksum, it means the bootstrap token got rotated and the new bootstrap data should only apply to new nodes without rolling existing nodes (for CAPA, this means to create a new launch configuration version but not triggering instance refresh).
  • Since this only adds new fields, this does not look like a breaking change for providers. They can migrate to this contract when they're ready. For backward compatibility with older CAPI versions, where the checksum fields default to "", providers can fall back to their old behavior. (A minimal sketch of the checksum idea follows below.)
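
To illustrate the idea (this is not an actual API proposal), here is a minimal Go sketch, assuming the bootstrap provider knows the token value it rendered into the bootstrap data; the helper name and the token-masking approach are illustrative only:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// checksums hashes the rendered bootstrap data (cloud-init/ignition) twice:
// once as-is ("full") and once with the bootstrap token masked out ("partial"),
// so that token rotation alone does not change the partial checksum.
func checksums(bootstrapData []byte, bootstrapToken string) (full, partial string) {
	fullSum := sha256.Sum256(bootstrapData)
	masked := strings.ReplaceAll(string(bootstrapData), bootstrapToken, "<token>")
	partialSum := sha256.Sum256([]byte(masked))
	return hex.EncodeToString(fullSum[:]), hex.EncodeToString(partialSum[:])
}

func main() {
	data := []byte("runcmd:\n- kubeadm join --token abcdef.0123456789abcdef ...")
	full, partial := checksums(data, "abcdef.0123456789abcdef")
	fmt.Println(full, partial)
}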

What do you think of this?

@CecileRobertMichon
Contributor

But then the providers need to watch KubeadmConfig

This is a dealbreaker if we're talking about watching specific Kubeadm* types. Infra providers should be bootstrap provider agnostic and work with all bootstrap providers and control plane providers, as kubeadm is not the only one out there. If we're trying to watch the bootstrap reference, that would be fine, since we can do that without referencing KubeadmConfig specifically. In that case, status.dataSecretPartialChecksum would need to become part of the bootstrap provider contract and every bootstrap provider would need to implement it.

@AndiDog
Contributor Author

AndiDog commented Sep 13, 2023

@CecileRobertMichon I agree that infra providers shouldn't directly watch KubeadmConfig/Secret, but then how would AWSMachinePool reconciliation get triggered immediately? Having CAPA reconcile every few minutes, as it does right now (implementation detail!), isn't really enough: it sees that the secret has changed and can react as we desire, but as a user I really expect this to happen within seconds.

The hierarchy is MachinePool > KubeadmConfig > Secret, so if the AWSMachinePool controller should get triggered, we'd need to write something into at least MachinePool.status, I guess.

CAPA's current controller has this:

func (r *AWSMachinePoolReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager, options controller.Options) error {
	return ctrl.NewControllerManagedBy(mgr).
		WithOptions(options).
		For(&expinfrav1.AWSMachinePool{}).
		// Watch MachinePool objects as well and map them to the owning AWSMachinePool,
		// so changes to the parent MachinePool also enqueue a reconcile.
		Watches(
			&source.Kind{Type: &expclusterv1.MachinePool{}},
			handler.EnqueueRequestsFromMapFunc(machinePoolToInfrastructureMapFunc(expinfrav1.GroupVersion.WithKind("AWSMachinePool"))),
		).
		WithEventFilter(predicates.ResourceNotPausedAndHasFilterLabel(logger.FromContext(ctx).GetLogger(), r.WatchFilterValue)).
		Complete(r)
}

Does the Watches mean it reacts to changes in MachinePool.{spec,status} even if no AWSMachinePool object changed? If yes, that would be great, and my proposal could be changed to write the checksums there as well.

@CecileRobertMichon
Contributor

Does the Watches mean it reacts to changes in MachinePool.{spec,status} even if no AWSMachinePool object changed

yes, that's correct

@elmiko
Contributor

elmiko commented Sep 13, 2023

We are discussing this issue at today's community meeting. A point was raised by @CecileRobertMichon about an alternative to using the status field: what if we could use annotations instead?

@fabriziopandini
Member

Adding on top of ^^, we should keep in mind that there are users relying on Machine pools + other bootstrap providers (different from the kubeadm one).

@CecileRobertMichon
Contributor

Here is the current bootstrap provider contract: https://cluster-api.sigs.k8s.io/developer/providers/bootstrap

If we add the field to MachinePoolStatus based on the bootstrap config status field, we would need to either make the status field part of the bootstrap status contract or make it opt-in/optional so it doesn't break other bootstrap providers that don't implement the field.

cc @johananl @richardcase who are working on #5294 which might help simplify this

@AndiDog
Contributor Author

AndiDog commented Sep 14, 2023

If we add the field to MachinePoolStatus based on the bootstrap config status field, we would need to either make the status field part of the bootstrap status contract or make it opt-in/optional so it doesn't break other bootstrap providers that don't implement the field.

I think we can make it an optional part of the contract, mentioning that bootstrap providers SHOULD (not MUST) fill in the new fields. The infrastructure provider can then deal with it backwards-compatibly. For example, CAPA currently creates a new launch configuration version if bootstrap data changed, but does not roll out new nodes (= trigger autoscaling group instance refresh). If it sees the "checksum without random bootstrap token" field filled, it can decide to roll out nodes immediately, and otherwise fall back to the previous behavior. Likewise for other providers.

Regarding #5294, I'm not sure what exactly will change. Some items like "only cloud-init supported" are out of date by now, and my proposal would actually avoid having cloud-init/ignition-specific stuff in CAPA (as in the hack kubernetes-sigs/cluster-api-provider-aws#4245 which parses cloud-init data). @johananl @richardcase could you provide a renewed summary of the high-level changes planned in that issue? I'd like to see 1) whether it can resolve the problems we address here and 2) whether the timeline is too far out, in which case it makes sense to implement solutions for this issue very soon.

Overall, I'd prefer implementing a proposal soon in order to make machine pools actually useful and less surprising right now. The discussed proposal is backward-compatible, optional and can therefore easily be changed or removed later on. Getting it implemented for the KubeadmConfig + CAPA use case should be quite easy. @CecileRobertMichon how would we go ahead if we think the proposal makes sense? Should I then just start implementing or do we want a design document?


Update of my proposal after recent discussions:

  • Add fields {KubeadmConfig,AnyOtherBootstrapConfig}.status.{dataSecretFullChecksum,dataSecretPartialChecksum} (maybe with better names) that contain a hash of the bootstrap data with/without the regularly-rotated, random bootstrap token. Bootstrap providers should first write the new secret content and then set these fields. Adapt the contract to add this as an optional part. Announce in which CAPI version this API extension gets released so that infra providers can adapt.

    I've researched other bootstrap providers (EKS, Microk8s, Talos) and they're all very simple, writing data into a secret and setting status.dataSecretName as desired. Hence, it should be easy to implement hashing in all those providers (@fabriziopandini thanks for raising the concern).

  • Bubble the field values up into MachinePool.status.{dataSecretFullChecksum,dataSecretPartialChecksum}

  • Infra providers normally watch MachinePool objects, so their reconciliation will get triggered immediately on a change. For example, CAPA could then compare the checksum stored in the launch configuration version tags with MachinePool.status.dataSecretPartialChecksum and, if there's a change, roll out new nodes (see the sketch below).
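
To make the comparison concrete, here is a minimal Go sketch of the decision logic an infra provider could apply, assuming the checksums got bubbled up to MachinePool.status as proposed; the field names and the "stored during last rollout" values (e.g. launch template tags) are illustrative, not an existing API:

package main

import "fmt"

// rolloutDecision captures the two distinct reactions an infra provider could take.
type rolloutDecision struct {
	newLaunchTemplateVersion bool // user data differs, e.g. due to a rotated bootstrap token
	refreshInstances         bool // token-independent config changed, so existing nodes should be replaced
}

// decide compares the checksums reported on MachinePool.status with the ones
// recorded during the last rollout.
func decide(statusFull, statusPartial, storedFull, storedPartial string) rolloutDecision {
	return rolloutDecision{
		newLaunchTemplateVersion: statusFull != storedFull,
		refreshInstances:         statusPartial != storedPartial,
	}
}

func main() {
	// Token rotated only: new launch template version, but no instance refresh.
	fmt.Printf("%+v\n", decide("full-v2", "partial-v1", "full-v1", "partial-v1"))
	// KubeadmConfig.spec changed: both a new version and an instance refresh.
	fmt.Printf("%+v\n", decide("full-v3", "partial-v2", "full-v2", "partial-v1"))
}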

@fabriziopandini
Member

fabriziopandini commented Sep 14, 2023

I'm sorry I have very limited time ATM to dig into this rabbit hole, but I lack some knowledge of MachinePools, and catching up with this + this specific problem requires a dedicated slot that I'm struggling to get.

I still have some concerns about one of the assumptions in this thread, more specifically about the fact that we are changing the BootstrapConfig (the KubeadmConfig in this example) in place.

I'm not sure the BootstrapConfig was originally conceived to support in-place changes. If I look at the code, it seems to me that this behavior is supported incidentally, because we are rotating the bootstrap token for security reasons, which is something we added at a later stage, AFAIK without any intent to support this use case (in-place changes).

Now, the assumption above is making this a supported use case, but in a way that is very specific to machine pools, which might create some "unexpected side effects" when the BootstrapConfig is used with Machines (what if users then ask us to support in-place changes on stand-alone machines, given that the BootstrapConfig can be changed in place?).

The assumption above is also driving us to implement two additional layers of complexity in our code.

The first layer of complexity is for handling the change of spec without waiting for bootstrap token rotation & computing two checksums which are based on a very specific Kubeadm feature, the bootstrap token, but which barely make sense for other bootstrap provider implementations.

The second layer of complexity is to bubble up two checksums to the machine pool object.

Given all that, I'm starting to wonder if this is the right path to take because it seems we keep adding complexity and exceptions to solve a problem that we created ourselves with the initial assumption.

If I think about what we need, ultimately we need a way to know that the BootstrapConfig object is changed, which is something we usually do in a very explicit way by rotating the bootstrap object.

If we assume to rotate the BootstrapConfig in the MachinePool, the InfrastructureMachinePool will be immediately notified of the change, because they are already watching for the parent MachinePool object.

If this is correct, probably what is missing is that we don't know which machines have been created with the new and the old bootstrap token. This could probably be addressed in a couple of ways, but this is also where my lack of knowledge shows up.

  • For the implementation with machine pool machines, I assume we can use the individual machines to track this info.
  • For the implementation without machine pool machines, I assume this should be tracked on the InfrastructureMachinePool object or somehow delegated to the underlying infrastructure construct.

But those are just assumptions

@fabriziopandini
Member

Additional consideration.
Another path, more radical but interesting, is to think about replacing the BootstrapConfig with a BootstrapConfigTemplate, and hiding away from users all the complexity around bootstrap secrets etc.

This is interesting in my mind because it can also fit more naturally into the MachinePoolMachine story: the MachinePool implementation in CAPI can create those machines upfront with the right data secret / infra machine etc., and the InfrastructureMachinePool implementation can simply attach the missing info to them when available.

@johananl
Member

johananl commented Sep 21, 2023

Thanks for tagging me @CecileRobertMichon. I'll try to add my thoughts in general as well as in the context of the WIP proposal for addressing #5294.

Terminology disambiguation

First, I think we have to update our terminology to clearly distinguish between two related yet distinct concerns:

  1. Running a "bootstrapper" such as kubeadm to bootstrap a server (namely: turn the server into a k8s node). Let's call this "bootstrap" (we already do so in the glossary).
  2. Interpreting a cloud-init or Ignition blob and customizing a server's OS. Let's call this "provisioning". This is a new term and I'm happy to replace it with another as long as it's distinct from "bootstrap" and doesn't lead to more confusion.

Note that the bootstrap process is based on top of provisioning: We currently use cloud-init or Ignition in order to convey kubeadm configuration to a machine as well as to execute the shell commands which trigger the kubeadm init/join process. However, provisioning isn't the same as bootstrap because provisioning can also contain OS-level customizations which aren’t related to bootstrap (e.g. tweaking a sysctl parameter). Furthermore, even if provisioning contained only bootstrap-related configuration, IMO we'd still need the separate terminology because kubeadm configuration isn’t the same as cloud-init or Ignition configuration and it’s important to distinguish between the two given the different stages performed in order to get from a declarative kubeadm config to a VM with a running kubelet.

My thoughts

With that said, here are my thoughts:

Machine recreation on bootstrap config change

Following what @fabriziopandini said, I think that before we figure out how to recreate machines following a change to the bootstrap config, we should decide whether we want to allow doing so in the first place. Personally, I don't see why not but I don't have a strong opinion here. Regardless of what gets decided, I agree that this should either be supported and working or explicitly unsupported and therefore documented and rejected by a webhook. In any case, I think we can agree that "supported by accident" isn't something we want.

Multiple checksums

I don't think we want to maintain two bootstrap config checksums for the following reasons:

  1. We should avoid baking kubeadm-specific assumptions into core CAPI (or at least avoid adding more such assumptions on top of existing ones). AFAIK the project aims to be extensible and support more bootstrap mechanisms in the future.
  2. The need to do a partial hash suggests to me that two unrelated concerns may have been mixed in a single data structure. I'd rather fix a design problem than add new complexity to work around it.

I'd have to dig deeper into the codebase to get a clearer understanding of the problem, but I'm guessing the thing which shouldn't be hashed (bootstrap token) should have resided elsewhere if it's not a part of { things we watch when deciding whether to replace a machine }. I don't have an alternative suggestion at the moment, but chances are we'll have to address this concern as a byproduct of working on #5294 anyway.

CAPI <> CAPx contract

Great point about the CAPI <> CAPx contract @AndiDog. This should be a core piece of the #5294 work and I've already started thinking about it. I'll try to copy you in relevant discussions around the proposal process.

We already have https://cluster-api.sigs.k8s.io/developer/providers/machine-infrastructure which might be relevant but I need to dig deeper and see if we have an entire contract missing or we just need to update an existing one.

High-level description of #5294

The #5294 proposal is still WIP, but very briefly here is what I envision at the moment:

  • The user specifies their desired cluster config declaratively. The cluster config includes a selected bootstrapper (e.g. kubeadm) but any bootstrapper-specific knobs are hidden from the user to shield the user from changes to the bootstrapper config schema (that is, if kubeadm changes its API, there is no reason to break the CAPI UX).
  • A controller (need to define which) reads the cluster config and generates a bootstrap config, e.g. a kubeadm config document.
  • A controller (need to define which) uses the bootstrap config document to generate a provisioning config for the relevant provisioner (e.g. cloud-init) while potentially merging the generated config with any user-specified customizations which aren't related to the bootstrapper. This allows CAPI to be opinionated about the bootstrap process while still allowing the user to perform arbitrary OS customizations without abusing the bootstrap mechanism.
  • An infra provider watches the final provisioning config and injects it to VMs using provider-specific mechanisms (e.g. AWS user data).

The proposal needs a lot more fleshing out so I don't know about a timeline. I'm still gathering requirements all over the place since #5294 touches a very central area of the codebase which affects a lot of use cases. My hope is that we can cleanly separate the effort into manageable tasks and focus on a narrow change set at a given time without losing the bigger picture. Maybe we should break #5294 down to a couple of smaller proposals - we'll see.

Infra providers watching KubeadmConfig

Following from the above, I don't think infra providers should watch KubeadmConfig because I think it goes against the separation of concerns we want to eventually achieve. Infra providers should watch a location with bootstrapper-agnostic provisioning config (e.g. a cloud-config document) and pass that config to machines.


Any comments on the above are welcome. Sorry about the 📜🙂

@AndiDog
Contributor Author

AndiDog commented Sep 21, 2023

@johananl great input, thanks for being so detailed! The terminology makes sense.

I'd have to dig deeper into the codebase to get a clearer understanding of the problem, but I'm guessing the thing which shouldn't be hashed (bootstrap token) should have resided elsewhere if it's not a part of { things we watch when deciding whether to replace a machine }.

This is a very technical issue: AWS EC2 instances allow 16 KB of user data (essentially one byte blob) which must include everything to set up the instance on top of the base image (e.g. Ubuntu, Flatcar Linux or a custom one). Since the bootstrap token changes over time because it expires, the AWSMachinePool controller has to keep updating the EC2 auto-scaling group's latest launch configuration with lots of cloud-init/ignition content + somewhere in there the random token. A machine pool can by definition create a new instance at any time and therefore always requires a fresh bootstrap token. The only way to split the "fixed" part (from KubeadmConfig.spec) from the random, rotating part (bootstrap token) in the current implementation is to put the token in external storage and reach out to that storage from within the cloud-init/ignition commands. Only then could we compare the whole script content without requiring a checksum, and trigger rolling out new nodes if something has changed. But since we don't have this separation, I had the idea of a partial checksum, which apparently isn't a great long-term solution.

It seems #5294 can separate the data and concerns nicely. I will keep thinking of good solutions and workarounds that may be implementable now. I agree with the concerns and that we need to investigate further (@fabriziopandini @johananl). Unfortunately, this makes machine pools not ready for production at the moment. Please keep us updated about planned steps for #5294 👍

@johananl
Member

Thanks @AndiDog, the issue is clearer to me now.

Actually, we already do have a partial solution which might go in the direction you've outlined. Ignition support on AWS utilizes S3 as a storage for Ignition config:

https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/b90c18cf1bac7308ace428245945a40e3099d758/controllers/awsmachine_controller.go#L750

The rationale for using S3 in that specific context was to bypass the AWS user data size limit, IIRC because things like TLS certificates took a lot of space and caused issues with some cluster configurations. But the user data size limit is a broader problem which affects more use cases, not just Ignition on AWS, and is one of the user stories covered in #5294 (see U13 - "large payloads").

@fabriziopandini
Member

#5294 is a discussion way bigger than this specific issue.

That means that if we put this in the critical path, it will most probably delay the solution to this specific problem and move off the table the more tactical alternatives we were considering until now, like #8858 (comment) or the idea I was mulling over in #8858 (comment) about leveraging object rotation.

I just want to make sure everyone agrees on the implications, starting from @AndiDog who reported this issue.

@AndiDog
Contributor Author

AndiDog commented Sep 27, 2023

I attempted the workaround of swapping the MachinePool -> KubeadmConfig reference which was mentioned as an idea by @fabriziopandini:

If I think about what we need, ultimately we need a way to know that the BootstrapConfig object is changed, which is something we usually do in a very explicit way by rotating the bootstrap object.

If we assume to rotate the BootstrapConfig in the MachinePool, the InfrastructureMachinePool will be immediately notified of the change, because they are already watching for the parent MachinePool object.

Implementation-wise, I use Helm template magic to append a hash of the KubeadmConfig.spec YAML content to KubeadmConfig.metadata.name, so whenever a provisioning-relevant change is made, e.g. an addition to KubeadmConfig.spec.files, the object reference gets changed and the other controllers have a reconciliation party (a sketch of the hash computation follows the example below).

kind: MachinePool
spec:
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: somename-<hash of KubeadmConfig.spec>
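
For illustration, the same kind of name suffix could also be derived outside Helm; a minimal Go sketch, where the serialization format, suffix length, and function name are arbitrary choices:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// nameWithSpecHash appends a short digest of the serialized KubeadmConfig.spec
// to the object name, so any spec change yields a new object name (and thus a
// new reference in the MachinePool).
func nameWithSpecHash(baseName, serializedSpec string) string {
	sum := sha256.Sum256([]byte(serializedSpec))
	return fmt.Sprintf("%s-%s", baseName, hex.EncodeToString(sum[:])[:8])
}

func main() {
	spec := "files:\n- path: /etc/example.conf\n  content: hello\n"
	fmt.Println(nameWithSpecHash("somename", spec)) // e.g. somename-<8 hex chars>
}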

Sadly, it doesn't work at the moment. Here's why (I'm continuing my problem numbering 😂):

  • Problem 4: CAPI MachinePool just remains ready because of this early return.

     func (r *MachinePoolReconciler) reconcileBootstrap(ctx context.Context, cluster *clusterv1.Cluster, m *expv1.MachinePool) (ctrl.Result, error) {
     	// Call generic external reconciler if we have an external reference.
     	var bootstrapConfig *unstructured.Unstructured
     	if m.Spec.Template.Spec.Bootstrap.ConfigRef != nil {
     		bootstrapReconcileResult, err := r.reconcileExternal(ctx, cluster, m, m.Spec.Template.Spec.Bootstrap.ConfigRef)
     		// [...]
     		bootstrapConfig = bootstrapReconcileResult.Result
     	}
    
     	// If the bootstrap data secret is populated, set ready and return.
     	if m.Spec.Template.Spec.Bootstrap.DataSecretName != nil {
     		m.Status.BootstrapReady = true
     		conditions.MarkTrue(m, clusterv1.BootstrapReadyCondition)
     		return ctrl.Result{}, nil
     	}

    This means that the MachinePool.status.dataSecretName remains the same as before. Since I install my stuff with Helm, the old KubeadmConfig/somename-<old hash> and its secret are deleted, but now still referenced by MachinePool.status.dataSecretName! The newly-referenced KubeadmConfig/somename-<new hash> is not used. In short, swapping the object reference is not supported if the machine pool has been reconciled and became BootstrapReady before.

  • Problem 5: It gets worse. CAPA, on next reconciliation, tries to read the old secret (named MachinePool.status.dataSecretName=somename-<old hash> in the example). The old secret is however deleted, and CAPA has a code bug where it just continues after failing to read the secret. Ultimately, it creates a new launch template version with empty user data (= no cloud-init/ignition script). That version is applied immediately on the ASG (auto-scaling group), so any new instances would start but not get provisioned as Kubernetes nodes, causing a disaster.

    The bug:

     func (s *Service) ReconcileLaunchTemplate(
     	scope scope.LaunchTemplateScope,
     	canUpdateLaunchTemplate func() (bool, error),
     	runPostLaunchTemplateUpdateOperation func() error,
     ) error {
     	bootstrapData, err := scope.GetRawBootstrapData()
     	if err != nil {
     		record.Eventf(scope.GetMachinePool(), corev1.EventTypeWarning, "FailedGetBootstrapData", err.Error())
     		// <------------- this continues instead of returning an error!!!
     	}
     	bootstrapDataHash := userdata.ComputeHash(bootstrapData)
    
     	// [...]
    
     	// Create a new launch template version if there's a difference in configuration, tags,
     	// userdata, OR we've discovered a new AMI ID.
     	if needsUpdate || tagsChanged || *imageID != *launchTemplate.AMI.ID || launchTemplateUserDataHash != bootstrapDataHash {
     		// [...]
     		if err := ec2svc.CreateLaunchTemplateVersion(scope.GetLaunchTemplateIDStatus(), scope, imageID, bootstrapData); err != nil {

One bug in CAPI, one in CAPA. We believed that changing the bootstrap object reference (here: KubeadmConfig) should work, but that needs patching as well.
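
For clarity, here is a minimal sketch of the kind of fix the comment in the excerpt above hints at (not the actual CAPA patch): abort reconciliation when the bootstrap data cannot be read, instead of continuing and producing a launch template version with empty user data.

	bootstrapData, err := scope.GetRawBootstrapData()
	if err != nil {
		record.Eventf(scope.GetMachinePool(), corev1.EventTypeWarning, "FailedGetBootstrapData", err.Error())
		// Return the error so no launch template version gets created from empty user data.
		return err
	}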

Do you think these are easier to fix than the other idea (supporting reconciliation of in-place changes to KubeadmConfig.spec, e.g. with the dataSecretPartialChecksum field)? Let's get some opinions on which idea to tackle first. I'm fine with trying to patch the case of swapping the object reference, since that seems easier to fix, is a reasonable use case, and likely doesn't require API/contract changes.

@fabriziopandini
Member

@AndiDog thanks for reporting back your findings

Just for clarification, the idea I was suggesting about template rotation was not a workaround or something I was expecting to work out of the box.

It was an idea about an alternative possible UX that we could aim for to fix this issue, given that, as explained above, it seems to me that the checksum idea adds a lot of complexity instead of addressing the root cause of the problem, which ultimately requires a discussion on the MachinePool API.

@AndiDog
Contributor Author

AndiDog commented Oct 13, 2023

We discussed this in a meeting (@CecileRobertMichon, @fabriziopandini, @vincepri) and decided to try and support the reference rotation (changed MachinePool.spec.template.spec.bootstrap.configRef.name) as described in #8858 (comment). Other solutions didn't seem so natural, or are more complex. I'll give it a shot and report back.

@CecileRobertMichon
Contributor

One additional thought: we might want to think about how to deal with "orphan" configs (for example, avoid rotating the KubeadmConfig bootstrap token if the config isn't actually in use by a MachinePool)

@AndiDog
Contributor Author

AndiDog commented Oct 19, 2023

It seems problem 4 was already taken care of in issue #6563 (PR #8667) and I was referring to older code above. Nevertheless, I have to look into this again since the PR's test change only covers a changed name of <referenced bootstrap config>.status.dataSecretName, not rotation of the bootstrap config.

@AndiDog
Contributor Author

AndiDog commented Oct 25, 2023

The mentioned problems 4+5 (#8858 (comment)) are fixed via new PRs. CAPI was fine by now and I only added a test (#9616). CAPA required a fix (kubernetes-sigs/cluster-api-provider-aws#4589).

With this, changing the KubeadmConfig reference in a CAPA cluster works insofar as the reconciler creates a new launch template version. This however still doesn't create new EC2 instances, i.e. roll out new nodes, since CAPA still has no contract or way of knowing whether something relevant changed inside the user data. We are however now in a better position to discuss that specific issue without running into other trouble. I'll discuss it in one of the next CAPI office hours.

@AndiDog
Contributor Author

AndiDog commented Oct 25, 2023

One idea was confirmed as reasonable by several people in the office hours: consider a change of the bootstrap object reference as the desire to roll out. My doubt was that both the old and new bootstrap config (with different names) could produce the same output, and the user could be surprised that all nodes get replaced. But then again, @CecileRobertMichon and others rightfully commented "Why would you create a new one with the same content?", since that seems unusual.

Next, I will try this solution and see whether other opinions appear as to how this can be solved. And @vincepri suggested factoring out a new issue for splitting off the random, long-lived bootstrap token from KubeadmConfig, since it's intermixed here but really a separate problem. I'll create an issue for that.

@Jont828
Contributor

Jont828 commented Nov 14, 2023

@AndiDog In #8842, we're adding MachinePool Machines to DockerMachinePools. This would affect the ClusterClass rollout e2e test, as we wouldn't be able to roll out changes to the KubeadmConfig to the MachinePool Machines, so I've made a change to exclude them for now. But in the future, we'd want to revert this change. Tagging @willie-yao as well.

@CecileRobertMichon
Contributor

@AndiDog what is the status on this issue? Would love to see MachinePools graduate (#9005) in the coming release cycle. Is this issue still a blocker? If so, do you need any help?

@AndiDog
Contributor Author

AndiDog commented Jan 23, 2024

kubernetes-sigs/cluster-api-provider-aws#4619, which solves the issue for CAPA, will soon be merged. It's already working well in our fork. The CAPI code also seems to be fine by now – I didn't run into more issues.

For CAPZ, kubernetes-sigs/cluster-api-provider-azure#2972 might be a related problem, but I'm not currently involved in investigating that.

Whether the MachinePool API should be stabilized depends on whether we want to stabilize the main infra provider implementations so they support reconciling all changes (as reported in this issue).

@AndiDog
Contributor Author

AndiDog commented Feb 5, 2024

For CAPA, the fix is now merged (kubernetes-sigs/cluster-api-provider-aws#4619).

Given that CAPI code was already fixed earlier (via #8667), I think we can close this.

/close

@k8s-ci-robot
Contributor

@AndiDog: Closing this issue.


@sbueringer
Member

sbueringer commented Apr 12, 2024

@AndiDog Sorry it's a pretty long issue with various linked issues/PRs etc.

Can you please summarize what a user would have to do to roll out a change to a KubeadmConfig with a regular cluster (not using ClusterClass)? Is it as simple as modifying the KubeadmConfig object that is linked in a MachinePool directly?

(I'm trying to figure out if we got the ClusterClass implementation right)

@AndiDog
Contributor Author

AndiDog commented Apr 20, 2024

@sbueringer Indeed, many ideas and suggestions. This is what got implemented so far:

Change MachinePool.spec.template.spec.bootstrap.configRef.name and the KubeadmConfig.spec changes will be reconciled.

In practice, that means a user's KubeadmConfig.metadata.name should be suffixed with a hash (or an increasing number) when making changes, and the reference in the MachinePool object gets changed to this new KubeadmConfig name.

@sbueringer
Member

Thank you very much.

This means as of today ClusterClass will not be able to roll out changes to bootstrap configs for machine pools. I'll open a new issue for that.
