Machine with cloud-init 23.3.0 or newer fails to join cluster #4745
Labels
kind/bug
Categorizes issue or PR as related to a bug.
priority/important-soon
Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
triage/accepted
Indicates an issue or PR is ready to be actively worked on.
/kind bug
What steps did you take and what happened:
I used https://github.com/kubernetes-sigs/image-builder/ to create an Ubuntu 20.04 AMI with the latest available cloud-init package, 23.3.3. The machine fails to join the cluster.
What did you expect to happen:
The machine should join the cluster.
Anything else you would like to add:
In #1490, CAPA began writing sensitive user-data to AWS Secrets Manager (#1924 added support for an alternative, the SSM Parameter Store). CAPA replaced the user-data produced by CABPK with a mechanism to fetch the user-data from the service. This mechanism relied on an "include" that would, by design, fail the first time cloud-init ran. CAPA relied on cloud-init ignoring the failure.
As of canonical/cloud-init#367, cloud-init stopped ignoring the failure by default, but introduced a feature flag that allowed cloud-init to ignore the failure, as it had in the past. The default settings caused the cloud-init boot to fail, and kubernetes-sigs/image-builder#406 used the feature flag as a workaround.
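To illustrate the behavior change described above, here is a minimal toy model in Python. It is not cloud-init's code: `process_include`, `UserDataFetchError`, and the `error_on_failure` parameter are hypothetical names that only mimic the two modes (ignore the failed include, or fail the boot).

```python
import logging


class UserDataFetchError(Exception):
    """Hypothetical error type: an include could not be fetched."""


def process_include(fetch, url, error_on_failure=True):
    """Fetch included user-data from `url` using the `fetch` callable.

    With error_on_failure=False, a failed fetch is logged and ignored
    (the old behavior CAPA relied on). With error_on_failure=True,
    the failure is raised and would abort the boot.
    """
    try:
        return fetch(url)
    except OSError as exc:
        if error_on_failure:
            raise UserDataFetchError(f"include failed for {url}") from exc
        logging.warning("ignoring failed include %s: %s", url, exc)
        return None
```

In this model, the feature flag corresponds to passing `error_on_failure=False`. Once the flag is gone, only the raising path remains, which matches the boot failure reported here.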
More recently, as of canonical/cloud-init#4228, the feature flag itself was removed. Without the feature flag, the existing workaround has no effect, and cloud-init boot fails.
@supershal and I looked into this issue, and filed kubernetes-sigs/image-builder#1333. We finally understand the root cause.
The most recent CAPA-maintained AMIs were created with cloud-init 22.4.2, instead of the default cloud-init version.
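As a rough aid for checking whether an image is affected, the sketch below compares a cloud-init version string against the 23.3.0 boundary from this issue's title. The helper names (`parse_version`, `is_affected`) are hypothetical, and the parsing assumes the version string starts with dotted numeric components; any distro suffix after them is ignored.

```python
import re

# First affected release, taken from the issue title.
FIRST_AFFECTED = (23, 3, 0)


def parse_version(version: str) -> tuple:
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into integers."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing third component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version is 23.3.0 or newer."""
    return parse_version(version) >= FIRST_AFFECTED
```

For example, `is_affected("23.3.3")` is true for the version used in the reproduction, while `is_affected("22.4.2")` is false for the version in the CAPA-maintained AMIs.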
Environment:
Kubernetes version (use kubectl version): v1.27.8
OS (e.g. from /etc/os-release): Ubuntu 20.04