Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow user-controlled naming of Machines in Machine collections #10577

Open
mdbooth opened this issue May 8, 2024 · 34 comments
Open

Allow user-controlled naming of Machines in Machine collections #10577

mdbooth opened this issue May 8, 2024 · 34 comments
Labels
area/machinedeployment Issues or PRs related to machinedeployments area/machineset Issues or PRs related to machinesets area/provider/control-plane-kubeadm Issues or PRs related to KCP kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@mdbooth
Copy link
Contributor

mdbooth commented May 8, 2024

This would be the long-term solution to the problem discussed in #10463

What would you like to be added (User Story)?

As a user, I must ensure all my nodes are named according to a corporate policy which, amongst other things, ensures that alerts from centralised monitoring are directed to the correct team.

Additional Context

Naming of Nodes

Although other configurations are technically possible, the name of a Node will almost always be based on the name of the cloud resource which is hosting the Node. This may be taken from the hostname of the guest OS, which will typically have been served to the OS on startup from a metadata service, or may have been set directly in kubelet configuration based on information queried from the same metadata service.

Naming of cloud resources

CAPI has never had a contract covering the naming of cloud resources by infrastructure providers. Consequently different providers have settled on different conventions. That I am aware of, various different infrastructure providers use some combination of:

  • Namespace
  • Machine name
  • InfrastructureMachine name

In the following, we assume only that cloud resources (and therefore Nodes) are named by some function of these inputs which an end-user of the infrastructure provider can understand intuitively.

Breaking change

#9833 changed the creation of InfrastructureMachine objects to have the same name as their associated Machine object. In general this change is welcome, as now the bootstrap, Machine, and InfrastructureMachine all have the same name. This simplifies operating the cluster.

However, it changed one of the inputs to our naming function. The infrastructure machine was previously named after the template it was generated from. With this change it is now unrelated to the infrastructure machine template. This changes the naming of Nodes on any platform which uses InfrastructureMachine name in its naming function, which includes OpenStack.

MachineDeployments and MachineSets

MachineDeployments are affected because the InfrastructureTemplate name used to be the basis of the InfrastructureMachine name and therefore an input to the cloud resource naming function, but it no longer is.

However, as the MachineDeployment name now serves the same naming function as the InfrastructureTemplate name, a workaround is to create new MachineDeployments with the required name, instead of relying on the name of the InfrastructureTemplate.

KubeadmControlPlane

Control plane machines created by KubeadmControlPlane are affected for the same reason as MachineDeployments.

However, unlike with MachineDeployments, it is not possible to create a new KubeadmControlPlane for an existing cluster. This means that it is not possible to reflect Node naming policy in control planes for:

  • clusters created before CAPI 1.7
  • clusters whose Node naming policy changes after creation

Detailed Description

I propose to make the smallest possible API change to enables flexible Node naming for all Nodes. As the only controller with no reasonable workaround is KubeadmControlPlane, I propose to add a naming field to KubeadmControlPlane.

We will add a new field to KubeadmControlPlaneMachineTemplate:

// NamePrefix will be used as a prefix when generating control plane Machine names. Names will be generated by appending `-` and a random suffix. If not specified, the default is use the name of the containing KubeadmControlPlane.
NamePrefix string

In a yaml template this would look like:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: example
spec:
  ...
  machineTemplate:
    namePrefix: "team_foo-controlplane"
    ...

The code implementing this change will live in:

The latter change will ensure that updating namePrefix will result in a control plane rollout.

Updating MachineDeployment/MachineSet

We had previously discussed applying the same solution to MachineDeployment for consistency. We can still do that, but as it would be a distinct API change I propose we consider it separately. However, on reflection I now believe we should not do this at all.

Firstly, I think we should not do this because it is a completely distinct change which must be made separately to both separate APIs, and separate implementations, and it is not required by the use case.

Secondly, the naming of objects according to their owning collection is a common and convenient pattern in kubernetes, which we would be breaking. Consistent naming was the principal driver for #9833 and despite this specific regression it remains desirable. If the use case can be covered without separating the naming of Machines from their owning MachineDeployment, I think that is preferrable.

Lastly, we cannot be entirely consistent in this in any case, because the proposed API cannot be implemented by MachinePools. Given that there will be inconsistency in any case, I prefer to reduce the scope of the 'special case' to the smallest possible which covers the use case.

Anything else you would like to add?

Relevant slack discussion here: https://kubernetes.slack.com/archives/C8TSNPY4T/p1715173795198159

Label(s) to be applied

/kind feature
/area provider/control-plane-kubeadm
/area machineset
/area machinedeployment

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 8, 2024
@k8s-ci-robot
Copy link
Contributor

@mdbooth: The label(s) area/area/provider/control-plane-kubeadm, area/area/machineset, area/area/machinedeployment cannot be applied, because the repository doesn't have them.

In response to this:

What would you like to be added (User Story)?

As a user, I would like to be able to ensure that Nodes in my cluster conform to a naming scheme I control and can update as required.

Detailed Description

This would be the long-term solution to the problem discussed in #10463

Anything else you would like to add?

Relevant slack discussion here: https://kubernetes.slack.com/archives/C8TSNPY4T/p1715173795198159

This is a placeholder for now. I intend to expand it into a more thorough proposal in the next week or so.

Label(s) to be applied

/kind feature
/area area/provider/control-plane-kubeadm
/area area/machineset
/area area/machinedeployment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 8, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mdbooth
Copy link
Contributor Author

mdbooth commented May 8, 2024

/area provider/control-plane-kubeadm
/area machineset
/area machinedeployment

@k8s-ci-robot k8s-ci-robot added area/provider/control-plane-kubeadm Issues or PRs related to KCP area/machineset Issues or PRs related to machinesets area/machinedeployment Issues or PRs related to machinedeployments labels May 8, 2024
@enxebre
Copy link
Member

enxebre commented May 13, 2024

Can we include some use cases that are impacted/improved by this?

@mdbooth
Copy link
Contributor Author

mdbooth commented May 13, 2024

Can we include some use cases that are impacted/improved by this?

I have 'corporate naming policy' which accounts for a great many sins in practise. Such policy is often independent of the specifics of any particular application, such as CAPI.

Regardless of which provider you are using, the name of the Machine will almost certainly form a component of, quite possibly the entirety of, the name of the cloud resource which is eventually created. If your policy covers the naming of these resources, you will need flexibility in the naming of the Machine.

Similarly, the name of the cloud resource will typically be the basis, or the entirety, of the hostname of the guest OS running on that resource. This will typically be used as the Node name, and is therefore visible to the end-user. If policy, user documentation, or training covers the naming of these user-visible components, you will need flexibility in the naming of the Machine.

Additional tooling may have been built up which assumes that certain kinds of Nodes are named in a particular way.

It may be undesirable to the actor deploying a cluster to leak 'implementation' details such as the name of a specific template to end users of the cluster. In this case they will need flexibility to name created machines independently of the creating template.

@lentzi90 Are you able to add your specifics?

I'll collate into the description when I write it up properly.

@lentzi90
Copy link
Contributor

Sure thing!

The 'corporate naming policy' is pretty much it. It just takes multiple shapes. For example we have downstream requirements on the hostnames to make security and/or network monitoring easier. The same goes for cloud resources (VMs, load balancers, etc.). In many cases the resources created by CAPI/CAPx need to co-exist with other resources and "fit in" with them. The other resource may be managed manually or through third-party tools.

At minimum, we must be able to avoid names clashing. In practice it is also necessary that the resources are named in a way that human operators can understand easily. We cannot have confusion about where resources "belong" or who owns them. It must be possible to tell at a glance what is going on. Existing third-party tooling must also be able to tell the resources apart.

A way to influence the naming of Machines would probably be enough to address all of this. As @mdbooth mentioned, the cloud resources will typically be named according to the Machine/InfraMachines and then trickle through to hostnames and node names. As long as we get at least some kind of control over the prefix in these names it will be enough.

@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label May 13, 2024
@fabriziopandini
Copy link
Member

fabriziopandini commented May 13, 2024

This is a placeholder for now. I intend to expand it into a more thorough proposal in the next week or so

ACK

trying to help, I pick up where I lelf in #10463 (comment):

There are 3 layers of this problem:

  1. How to generate the name for objects in the management cluster
  2. Provide guidelines to infrastructure providers about how infrastructure objects (VMs etc) should be named when provisioning a Machine with a corresponding infrastructure machine object.
  3. Which options exist for Influencing how the node hosted on a machine is named, and eventually extend those options.

If I look at this issue so far, it is not clear which of these three problems are we trying to address.

  • The issue title is about point one ("Allow user-controlled naming of Machines")
  • The issue description talks about point three ("I would like to be able to ensure that Nodes in my cluster ....")
  • The comment above are talking about point two ("the name of the cloud resource...")

So I think we should clarify better what we are trying to achieve to make it easier to follow.
Also, based on the other threads + comments above the proposed solution goes back again to point one - How to generate the name for objects in the management cluster -, and my understanding is that we are talking about this because ultimately as of today there is no consensus for point two and three across providers, and naming CAPI objects (the machine) is the best proxy to have some sort of indirect control on cloud resource names and node names.
If that's the case, what about surfacing this in the issue description instead of talking about point 2 only?

Also, I think we might surface why the desired output cannot achieved by simply naming KCD or the MachineDeployment according to the 'corporate naming policy'.

Last but not least, the main point question that IMO we have to sort out before moving on to implementation details, Is if it is an acceptable trade off for everyone to sacrify the consistency of the UX inside the management cluster because we are avoiding/we cannot to fix point two and three...

@mdbooth
Copy link
Contributor Author

mdbooth commented May 14, 2024

Also, I think we might surface why the desired output cannot achieved by simply naming KCD or the MachineDeployment according to the 'corporate naming policy'.

For MachineDeployments we can do this. However, IIUC a KCP cannot be replaced after the cluster has been created, which means that:

  • The control plane of an existing cluster upgrading to Cluster API 1.7 cannot be fixed
  • A future change of policy cannot be applied to the control plane of an existing cluster

We discussed allowing flexible machine naming for only KCP, but the feeling was that it would be more consistent to apply it to both KCP and MD/MS.

@mdbooth
Copy link
Contributor Author

mdbooth commented May 14, 2024

There are 3 layers of this problem:

  1. How to generate the name for objects in the management cluster
  2. Provide guidelines to infrastructure providers about how infrastructure objects (VMs etc) should be named when provisioning a Machine with a corresponding infrastructure machine object.
  3. Which options exist for Influencing how the node hosted on a machine is named, and eventually extend those options.

If I look at this issue so far, it is not clear which of these three problems are we trying to address.

  • The issue title is about point one ("Allow user-controlled naming of Machines")
  • The issue description talks about point three ("I would like to be able to ensure that Nodes in my cluster ....")
  • The comment above are talking about point two ("the name of the cloud resource...")

I want to leave the specifics of how providers name cloud resources out of this discussion. For the purposes of this issue, the only requirement is that cloud resources are named by a function of at least Machine name.

Some (e.g. CAPO) use InfraMachine name, but as that is now identical to Machine name, those providers are covered. Others use additional inputs such as namespace. As long as Machine name is a component, they are covered.

As you point out, there has never been a provider contract for cloud resource naming. Trying to impose one now may not be practical at all, but at the very least it is likely to take a very long time to implement thoroughly. I also don't think it's necessary if we control the inputs to whatever naming function the providers have chosen to implement.

An example is that the previous effort to standardise these mentioned such issues as name length limitations and special characters. CAPI does not need to concern itself with these matters. As long as CAPI provides a well-understood set of inputs to a naming function, the cloud provider can implement any function which suits their environment, and their users will understand what effects various changes will make.

I agree we could do with some documented guidance, though.

So when the description talks about 'Machine name', 'cloud resource', and 'Node', the assumption is only that these things are related in some manner which an experienced user of a specific provider can easily understand and predict. I believe this is currently true in all cases.

I should definitely clarify that!

@lentzi90
Copy link
Contributor

I think it is also worth noting that each provider can make promises on how the naming works if they like. Having control over the input is essential though and that is the part we can address here.

@fabriziopandini
Copy link
Member

fabriziopandini commented May 14, 2024

Thanks @mdbooth and @lentzi90 for the clarification, from my point of view we now have a clean explanation of what we are trying to achieve and why.

Considering all that, I'm personally ok we the ask, allow user-controlled naming of Machines + corresponding Infra Machine and bootstrap config object in Machine collections, given that this will be an opt-in feature and we can document what trade-off this implies (Implementation details TBD, but we already chatted about the option of doing something similar to naming strategies in CC for consistency).

Let's hear if this is acceptable also for everyone else:

  • to let the users to intentionally trade-in the consistency of the UX inside the management cluster to comply 'corporate naming policy' when this is not possible by naming KCD or the MachineDeployment upfront
  • to accept the fact that whatever happens outside the management cluster (naming cloud resources, naming nodes) is lazily defined as a generic naming function taking in input the name of the objects in Cluster API, and that each cloud provider is ultimately responsible for defining and documenting the specific implementation in details.

@sbueringer
Copy link
Member

Sounds good to me!

@fabriziopandini
Copy link
Member

@mboersma to check this from the MachinePool PoV (if applicable)

@mdbooth
Copy link
Contributor Author

mdbooth commented May 14, 2024

@mboersma to check this from the MachinePool PoV (if applicable)

FYI @sbueringer already had some thoughts on this: https://kubernetes.slack.com/archives/C8TSNPY4T/p1715178789376599?thread_ts=1715173795.198159&cid=C8TSNPY4T

tl;dr My initial suggestion was to add this naming scheme to MachineTemplateSpec so it would be inherited everywhere that machines are created from a template. @sbueringer pointed out that this would introduce it into the MachinePool API, which is unlikely to be able to support it.

@fabriziopandini
Copy link
Member

@lentzi90 @mdbooth what about adding a few detail about implementation + bringing this up at the office hours, so then we can move on with implementation?

@mdbooth
Copy link
Contributor Author

mdbooth commented May 22, 2024

@lentzi90 @mdbooth what about adding a few detail about implementation + bringing this up at the office hours, so then we can move on with implementation?

Unfortunately I don't think I'll make it for today as I've had some fires to put out. I think I can do this for next week, though. Would that work?

@lentzi90
Copy link
Contributor

I'll also try to join next week.
Regarding implementation details, my understanding is that we would add spec.namingStrategy for MachineDeployments (propagated to MachineSets) and KubeadmControlPlanes, mimicking existing functionality in ClusterClass.
I think the templating mechanism could include the MachineSet name for MachineDeployments and the KCP name for KCPs. Both could also include the Cluster name.

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: example
spec:
  namingStrategy:
    template: "{{ .cluster.name }}-{{ .kubeadmControlPlane.name }}-{{ .random }}"
...
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: example
spec:
  namingStrategy:
    template: "{{ .cluster.name }}-{{ .machineSet.name }}-{{ .random }}"
...

@sbueringer
Copy link
Member

sbueringer commented May 23, 2024

Sounds good to me!

The MachinePool topic is a bit complex.

First, I think there are various reasons why we shouldn't add it to MachineTemplateSpec:

  • We can't add it as ObjectMeta.GenerateName, because ObjectMeta is used in other places and it would be inconsistent with the namingStrategies in ClusterClass
  • We can't add it to MachineSpec because that is also used by the Machine. And a naming strategy or something similar within the Machine object doesn't make sense
  • MachineTemplateSpec is also used in MachinePools. I think that already leads to a few fields that MachinePools inherited from MachineDeployments but nobody every implemented them.
    • Taking this one step further maybe wasn't even a good idea to reuse MachineSpec from the Machine in MachineTemplateSpec for MS/MD because this also means that MachineDeployments have fields that don't make sense (e.g. providerID, maybe clusterName)
  • So I think it's better to place it ouside of the MachineTemplateSpec.

Regarding support of this feature in MachinePools specifically. Today we create Machines like this (if an InfraMachinePool supports MachinePool Machines):

  • InfraMachinePool / infra is doing it's thing => server shows up
  • Infra provider creates an InfraMachine
  • core CAPI MachinePool controller creates a Machine with the same name as InfraMachine

So we don't have direct control over the name of the Machine because we just take the name of the InfraMachine. I have no idea:

  • how various infra providers are determining the InfraMachine name
  • if they have control over the InfraMachine name (maybe the use the server / node name and don't have control over that)

So I don't know if it would be possible to establish a contract like "if MachinePool.spec.namingStrategy is set, InfraMachinePools have to calculate InfraMachine names based on the MachinePool.spec.namingStrategy.template".
Please note that the template generates not just a prefix it actually generates the entire name.

@lentzi90
Copy link
Contributor

Thanks @sbueringer for the thorough explanation! I guess this answers the question "why (don't) do it this way".
From my perspective it is not necessary to handle MachinePools so we can just leave them out.

@fabriziopandini
Copy link
Member

First, I think there are various reasons why we shouldn't add it to MachineTemplateSpec etc...

Should we take the opportunity to fix this mess? The field seems really to belong to MachineTemplateSpec IMO
I just want to avoid to add more confusion on top

@sbueringer
Copy link
Member

sbueringer commented May 23, 2024

Where would you want to add it?

// MachineTemplateSpec describes the data needed to create a Machine from a template.
type MachineTemplateSpec struct {
	// Standard object's metadata.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
	// +optional
	ObjectMeta `json:"metadata,omitempty"`

	// Specification of the desired behavior of the machine.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
	// +optional
	Spec MachineSpec `json:"spec,omitempty"`
}

I think that MD.spec.template.namingStrategy is really not a good fit
(it's not a template naming strategy)

@fabriziopandini
Copy link
Member

I need to carve out some time to take a look a this, I was just reacting to the list of problems above (BTW, we should probably track host problems somewhere, those are really good points)

@sbueringer
Copy link
Member

sbueringer commented May 23, 2024

Definitely agree with solving these problems. It would be good if the "right place" to put the namingStrategy is an orthogonal concern and not influenced by some problems elsewhere. Independent of these problems my current take is that spec.namingStrategy is a good fit (but of course open for discussions :))

(I just like the idea that template is basically a "template" for the Machine, in the sense of that we ~ copy & paste ObjectMeta & Spec from there to generate a new Machine. I don't think that namingStrategy field in there fits, just like other fields we have directly in MD.spec vs in MD.spec.template)

@mdbooth
Copy link
Contributor Author

mdbooth commented May 23, 2024

@fabriziopandini With the workaround in place we have some breathing space and I'm not in a rush. How long should I leave it?

@fabriziopandini
Copy link
Member

I will stick to the original plan that assumes the workaround should be deprecated now and go away by EOY (CAPI 1.9).
Otherwise, it is not a workaround anymore...

@mdbooth
Copy link
Contributor Author

mdbooth commented May 28, 2024

Ok. I'll update the description before tomorrow's office hours. It'll be based on @lentzi90 's namingStrategy proposal.

@fabriziopandini
Copy link
Member

I'm not opposed to start the implementation from the control plane (TBD if we consider this something KCP specific or if we want somehow to make this a feature in the contract for all the control plane providers)

WRT to the API, I would prefer consistency with something already exist:

  • CC naming strategy: KCP.spec.template.namingStrategy + .template or .prefix
  • K8s metadata: KCP.spec.template.metadata.generateName

@mdbooth
Copy link
Contributor Author

mdbooth commented May 31, 2024

I'm not opposed to start the implementation from the control plane (TBD if we consider this something KCP specific or if we want somehow to make this a feature in the contract for all the control plane providers)

Definitely out of scope for this, but I wonder if this is a layering issue: perhaps the KCP should have a machine collection rather than be a machine collection.

WRT to the API, I would prefer consistency with something already exist:

  • CC naming strategy: KCP.spec.template.namingStrategy + .template or .prefix
  • K8s metadata: KCP.spec.template.metadata.generateName

My preference would be generate name, but it would require a small amount of refactoring. The current order of creation is:

  • generate but don't create the Machine
  • generate and create the InfraMachine, using Machine.name (which doesn't exist yet)
  • generate and create bootstrap using Machine.name (which still doesn't exist yet)
  • create previously generated Machine

With a little refactoring this could be:

  • generate and create InfraMachine using generateName
  • generate and create bootstrap using InfraMachine.Name
  • generate and create Machine using InfraMachine.Name

This would be equivalent to what we currently do, except the name would be generated server-side.


Incidentally, do we have a resource leak here? We reconcile resources based on Machines, but we create the Machine last. e.g. if we create the InfraMachine and bootstrap but Machine creation fails, e.g. due to an ephemeral API interruption, what is going to cleanup the InfraMachine and the bootstrap? It looks to me like they will leak.

@sbueringer
Copy link
Member

Just a reminder. ObjectMeta is a struct that is used in various places. We cannot just add a new field there.

It also very much makes it impossible to evolve the API going forward (How would we introduce something like template in a consistent way later on if we go with generateName now? That is the reason why we went with namingStrategy.template for ClusterClass)

@sbueringer
Copy link
Member

Definitely out of scope for this, but I wonder if this is a layering issue: perhaps the KCP should have a machine collection rather than be a machine collection.

Can you expand on this?

@mdbooth
Copy link
Contributor Author

mdbooth commented May 31, 2024

Definitely out of scope for this, but I wonder if this is a layering issue: perhaps the KCP should have a machine collection rather than be a machine collection.

Can you expand on this?

This is a half-baked idea and would be a huge architecture change, but...

What if KCP referenced, e.g. a MachineDeployment which created and owned the machines. KCP could manage quorum with, e.g. the pre-drain hook.

@mdbooth
Copy link
Contributor Author

mdbooth commented May 31, 2024

Just a reminder. ObjectMeta is a struct that is used in various places. We cannot just add a new field there.

It also very much makes it impossible to evolve the API going forward (How would we introduce something like template in a consistent way later on if we go with generateName now? That is the reason why we went with namingStrategy.template for ClusterClass)

Was that for me? If it was for me, using generateName would be an implementation detail. It would not be exposed in the API.

@sbueringer
Copy link
Member

sbueringer commented Jun 3, 2024

@mdbooth Yup

WRT to the API, I would prefer consistency with something already exist:

  • CC naming strategy: KCP.spec.template.namingStrategy + .template or .prefix
  • K8s metadata: KCP.spec.template.metadata.generateName

My preference would be generate name, but it would require a small amount of refactoring

I thought this means you would add the field to the KCP type as KCP.spec.template.metadata.generateName

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/machinedeployment Issues or PRs related to machinedeployments area/machineset Issues or PRs related to machinesets area/provider/control-plane-kubeadm Issues or PRs related to KCP kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

No branches or pull requests

6 participants