
Fork the tagging controller into generic node customization controller #833

Closed
mmerkes opened this issue Jan 21, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@mmerkes
Contributor

mmerkes commented Jan 21, 2024

I intend to begin work on this feature on 2/5/24, and I've discussed it in detail with my colleagues, but I wanted to post an issue here for tracking purposes and for feedback from the community.

What would you like to be added?

In 2022, a tagging controller was added to the cloud provider, which is responsible for tagging and untagging node resources as they join and leave the cluster. This controller runs in clusters managed by EKS by default and it can be run elsewhere.

The tagging controller is a very focused controller at this point. While there have been discussions about expanding the resources it supports, it's still very limited. However, its usefulness could be expanded by converting it into a generic node customization controller. It already watches node events, so we could take advantage of the machinery that's already running and extend it further. There are other feature requests, like #300, that this would simplify, and it would let operators run only a single controller.
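
For illustration only (this is not the actual cloud-provider-aws code), a minimal sketch of what a generic node customization controller loop could look like: one shared node informer driving several customization hooks. The `Customizer` type and function names below are hypothetical.

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// Customizer is a hypothetical hook applied to every node the controller sees,
// e.g. one hook for EC2 tagging and another for extra labels.
type Customizer func(node *v1.Node) error

// runNodeCustomizationController wires all customizers to a single node informer.
func runNodeCustomizationController(clientset kubernetes.Interface, customizers []Customizer, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(clientset, 0)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*v1.Node)
			for _, customize := range customizers {
				_ = customize(node) // a real controller would requeue on error
			}
		},
	})
	factory.Start(stopCh)
}
```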

Why is this needed?

There's a feature request to add support for labeling nodes with the AWS zone ID along with the zone name. The zone name is configured as the node's zone today, and that can create complications for users running in multiple AWS accounts. In the largest AWS regions, like us-east-1 and us-west-2, zone names, like us-east-1a, do not always map to the same zone IDs, like use1-az2, across accounts, so if you're trying to assess impact to a particular zone, you first need to normalize the zone names across accounts.
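
To make the normalization problem concrete, here's a small standalone example (not part of the proposal itself) that prints the zone-name-to-zone-ID mapping using the EC2 DescribeAvailabilityZones API via the AWS SDK for Go v2; the same zone name can print a different zone ID depending on which account's credentials you run it with.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := ec2.NewFromConfig(cfg)
	out, err := client.DescribeAvailabilityZones(ctx, &ec2.DescribeAvailabilityZonesInput{})
	if err != nil {
		log.Fatal(err)
	}
	for _, az := range out.AvailabilityZones {
		// The ZoneName -> ZoneId mapping printed here differs across accounts.
		fmt.Printf("%s -> %s\n", *az.ZoneName, *az.ZoneId)
	}
}
```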

There's no hook in the existing cloud provider to provide an additional zone label, and changing the node's zone to be a zone ID rather than a zone name would be a breaking, backwards-incompatible change. Any expectations around the zone name in existing clusters, whether from resources deployed there or from tooling, would break, and the migration could cause unexpected disruption to scheduling, PDBs, etc. As a result, the only practical option is to put this in a controller, and building on the tagging controller, albeit by forking and extending it, means that operators only need to run one controller for both purposes.

We've also heard requests to label nodes with the accelerator family, which would be easy to add with this change. I can imagine other labels being added in the future that could also be useful.

FAQ

What's being added to the tagging controller?

Support for adding additional labels on nodes, specifically zone IDs (#300). The label would be something like topology.k8s.aws/zone-id.
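
As a rough sketch of what applying such a label could look like with client-go (the label key is the example from this issue; the function name and its nodeName/zoneID inputs are hypothetical):

```go
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// labelNodeWithZoneID patches the proposed zone-id label onto a node.
// nodeName and zoneID are assumed to come from the controller's node and EC2 lookups.
func labelNodeWithZoneID(ctx context.Context, clientset kubernetes.Interface, nodeName, zoneID string) error {
	patch := []byte(fmt.Sprintf(`{"metadata":{"labels":{"topology.k8s.aws/zone-id":%q}}}`, zoneID))
	_, err := clientset.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```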

Why not just add it to the tagging controller?
Right now, the tagging controller is pretty lightweight, so we can migrate to a more appropriate name now with little controversy.

Can I just keep using the tagging controller?
Yes. The change would be done in a way that reuses that code but keeps the existing tagging controller command, to avoid breaking clusters that are unaware of the change. I would add a log message recommending migration to the new controller, and possibly deprecate the tagging controller some day, but that can be discussed in the future.

How would cluster admins migrate from the tagging controller to the new controller?
Cluster admins would swap out the tagging controller for the new controller as a direct replacement. The intention is for the operation to be simple and quiet.

Do we need to add any new permissions?
The CCM should already have all of the permissions needed to support the mentioned functionality.

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 21, 2024
@mmerkes
Contributor Author

mmerkes commented Jan 21, 2024

/assign mmerkes

@mmerkes
Contributor Author

mmerkes commented Jan 21, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 21, 2024
@mmerkes
Contributor Author

mmerkes commented Feb 20, 2024

Resolving this issue in favor of a different solution: kubernetes/kubernetes#123223

@mmerkes mmerkes closed this as completed Feb 20, 2024