
Add AdditionalLabels to cloudprovider.InstanceMetadata #123223

Merged
merged 1 commit on Feb 21, 2024

Conversation

mmerkes
Contributor

@mmerkes mmerkes commented Feb 9, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds the AdditionalLabels field to the InstanceMetadata, which can be set by cloud provider implementations to provide custom labels that will be applied to nodes. There's a need in the AWS cloud provider to set labels specific to AWS on the nodes, and I assume that other cloud providers can put this to use as well. The change is backwards compatible because if the cloud providers never set the field, nothing will change. There's not a hook directly in the cloud provider implementations that allows them to customize node objects directly, so they either need to run additional controllers or provide it via the InstanceMetadata struct and instruct the node_controller on how to apply it to the node.

As a side note, this change also requires that cloud providers utilize InstancesV2, which gives them direct access to the instance metadata. The AWS cloud provider does not implement InstancesV2, so I will be doing that migration there directly.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added support for cloud provider integrations to supply optional, per-Node custom labels that will be
applied to Nodes by the node controller.
Extra labels will only be applied where the cloud provider integration implements this.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

To use this feature, cloud providers must use the [InstancesV2 interface](https://github.com/kubernetes/kubernetes/blob/ad19beaa83363de89a7772f4d5af393b85ce5e61/staging/src/k8s.io/cloud-provider/cloud.go#L210) and
update the InstanceMetadata method to set the `AdditionalLabels` field with desired labels.
If no action is taken by cloud providers, the node controller behavior will be no different.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 9, 2024
@k8s-ci-robot k8s-ci-robot added area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 9, 2024
@cartermckinnon
Contributor

So AdditionalLabels can overwrite the standard Zone and Region labels. I think that's fine, just noting it for discussion. 😄

@dims
Member

dims commented Feb 10, 2024

/assign @elmiko @andrewsykim

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

So AdditionalLabels can overwrite the standard Zone and Region labels. I think that's fine, just noting it for discussion. 😄

Ya, that's true, but this is all coming from the cloud providers, so it should be more tightly controlled.

@andrewsykim
Member

Agreed that risk is probably low because cloud providers should know about the other labels.

But I do wonder if we should ignore or drop any kubernetes.io labels. Technically a cloud provider should not be using them anyways.

@andrewsykim
Member

Either way, can we update the API doc for AdditionalLabels to say it ignores (or overwrites) other node labels

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

I think any of the following are reasonable:

  1. If cloud provider hands us labels, just overwrite them.
  2. If labels are already set, ignore them.
  3. If labels are in the well known labels, ignore them.
  4. If labels include kubernetes.io, ignore them.

With 4, we outlaw cloud providers from using kubernetes.io in their labels, which isn't unreasonable and they probably shouldn't, but maybe it's too opinionated? 3 would only target known labels, and then we'd have to maintain them as a set, which isn't a huge deal. I can't think of a scenario where cloud providers would intentionally override them, because they provide all of the inputs that go into them anyway.

2 would be the easiest to maintain and would not allow labels to be overridden. Can nodes already have labels at this point? I guess if kubelet sent them while registering, the node would have those labels? If that's the case, it potentially gets a little more complicated, though cloud providers should have their own label namespace and users shouldn't be setting those via kubelet anyway.

Personally, I'm still inclined to just override and document. If cloud providers are overriding the k8s well known labels, sounds like a bug and seems very unlikely. This would be the simplest approach, and the easiest for cloud providers to reason about. It would potentially be confusing to have scenarios where they pass in a label and we ignore it. That being said, I don't think there's a wrong choice here. It's a new feature.

@sftim
Contributor

sftim commented Feb 10, 2024

Changelog suggestion

-Added support for cloud providers to supply custom labels that will be applied to nodes by the node controller. Labels will only be applied if implemented by the cloud providers.
+Added support for cloud provider plugins to supply optional, per-node custom labels that will be
+applied to Nodes by the node controller.
+Extra labels will only be applied where the cloud provider integration implements this.

We're talking about allowing code you run to set these labels, rather than about granting access for a firm like Google or Microsoft or AWS to do so. If the right term isn't "plugins", please swap it for something more accurate.

@sftim
Contributor

sftim commented Feb 10, 2024

  • Is there a way for a cloud provider integration to specify that a label shouldn't be set, where the node controller normally would set it?
  • How about a similar mechanism for annotations?

@sftim
Contributor

sftim commented Feb 10, 2024

The rule I'd like:

  • if any labels are inside kubernetes.io or k8s.io and aren't known to the code (matching against an allow list), create an Event against the node (so that the problem is more visible)
    • as much as is feasible, only do this once per Node; not once per label!

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

Updated the changelog per @sftim 's suggestion, albeit with some tweaks. I'm not sure whether plugin is the right word, so I just went with the super-generic integrations.

Is there a way for a cloud provider integration to specify that a label shouldn't be set, where the node controller normally would set it?

No. What would be the use case for that? If there's a need, may just need to run a patched version of k8s.

How about a similar mechanism for annotations?

My use case is labels, but I could see use for adding support for annotations or taints. I assume it'd be similar, and I don't think it would impact the model. My preference would be to keep this change focused on labels, and if there's a desire to extend the functionality to annotations and/or taints, would put up a follow up PR for that as well.

if any labels are inside kubernetes.io or k8s.io and aren't known to the code (matching against an allow list), create an Event against the node (so that the problem is more visible)

Interesting idea. In that case, are you suggesting ignoring or overriding the original value? NVM, I see you weighed in here in support of allowing overriding.

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

/retest

@aojea
Member

aojea commented Feb 14, 2024

Overriding labels is probably fine, but I still think we should drop kubernetes.io or k8s.io labels from cloud providers.

I feel overriding is not fine per #123223 (comment)

Not overriding them and logging seems better to me, because it is the cloud provider who owns the problem and can fix it, and it is completely backwards compatible. If we break compatibility by overriding, I expect this to be feature flagged at least.

@elmiko
Contributor

elmiko commented Feb 15, 2024

@mmerkes i'm a little curious about this

If labels include kubernetes.io, ignore them.

my concern is that we run the risk of breaking a cloud provider that might be expecting to use these labels. i'm curious if we, as a community, are moving towards ensuring that the kubernetes.io/k8s.io prefix is only used with the well-known labels?

@sftim
Contributor

sftim commented Feb 15, 2024

if we, as a community, are moving towards ensuring that the kubernetes.io/k8s.io prefix is only used with the well-known labels?

That is something I've seen contributors pushing. My take on what Kubernetes might do in this case is that people would see warnings if they use unrecognized labels / annotations, rather than anything stronger (if we wanted to enforce it, we'd likely use an actual API field). We also might opt to do nothing and silently accept the unrecognized keys (i.e., the current behavior). A final option is to have a warnlist of labels that we know are not registered, are not used by any in-project code, and that folks should stop using.

Another approach is to warn for unrecognized taints only, but leave labels and annotations out. We have a much smaller set of official taints and we make changes less often.

Signed-off-by: Matt Merkes <merkes@amazon.com>

Emits event when overriding labels in node controller

Signed-off-by: Matt Merkes <merkes@amazon.com>

Discard kubernetes.io additional labels in node controller

Signed-off-by: Matt Merkes <merkes@amazon.com>

Exclude kubernetes reserved label namespaces
@mmerkes
Contributor Author

mmerkes commented Feb 16, 2024

There have been a number of conversations about the right behavior here, so I'm going to try to summarize the open topics to work toward resolving them.

Should we allow cloud provider implementations to use the k8s.io or kubernetes.io namespaces?

No. There's been discussion that contributors are pushing to keep those namespaces restricted to k8s well known labels. If we were to allow them, cloud provider implementations could potentially be setting a label that k8s/k8s will implement in the future. Allowing labels in those namespaces would mean that restricting them later would be backwards incompatible. IMO, we should recommend that cloud provider implementations use their own namespace, like k8s.aws, to make it clear where the labels are coming from.

How do we treat labels that are already set in the node?

Labels that are already set in the node will be treated as immutable, and the labels provided by the cloud provider will be ignored in those cases. Per this discussion, overriding the labels could create a hot loop situation where the controller and some other actor keep competing with each other to set the label.

How will users know that labels are not getting applied as expected?

If the cloud provider supplies a label in a k8s/k8s namespace, or a label that's already applied to a node but with a different value, we'll discard those labels and log a warning. Cluster admins will need to use these logs to spot the potentially unexpected issue. When cloud provider implementations are setting new labels via this mechanism, I don't think it's unreasonable to expect that they actually do the bare minimum testing of those changes and verify that the labels get applied.

A more confusing case might be where labels are passed into kubelet and applied before the node controller gets those labels, though if cloud provider implementations are taking ownership of a label namespace, users should be avoiding using those namespaces directly and will need to depend on the logs. We discussed emitting an event when labels don't get applied, but we can't restrict the event to a single occasion and don't want to fill up the node events when the controller runs again.

What about taints and annotations?

This PR is scoped to handling labels. If there's appetite for taints and annotations in the future, we can add support via the same mechanism.

If I missed any other open discussions, let me know. Otherwise, let me know we're close to aligning on this. @dims @cartermckinnon @andrewsykim @aojea @sftim

@sftim
Contributor

sftim commented Feb 16, 2024

Should we allow cloud provider implementations to use the k8s.io or kubernetes.io namespaces?

No. There's been discussion that contributors are pushing to keep those namespaces restricted to k8s well known labels. If we were to allow them, cloud provider implementations could potentially be setting a label that k8s/k8s will implement in the future. Allowing labels in those namespaces would mean that restricting them later would be backwards incompatible. IMO, we should recommend that cloud provider implementations use their own namespace, like k8s.aws, to make it clear where the labels are coming from.

What if a cloud provider integration wants to set, say, topology.kubernetes.io/region, or node.kubernetes.io/instance-type?

@mmerkes
Contributor Author

mmerkes commented Feb 16, 2024

What if a cloud provider integration wants to set, say, topology.kubernetes.io/region, or node.kubernetes.io/instance-type?

There's already a mechanism for that. Region and instance type are passed into the InstanceMetadata as fields and those are set explicitly. If a cloud provider implementation is trying to set those directly in AdditionalLabels, it's trying to circumvent formal mechanisms already in place and shouldn't be allowed to do that.

@dims
Member

dims commented Feb 21, 2024

Thanks! Looks like everyone had a say :) and no new comments for a while. It looks sane to me. I am sure @mmerkes is happy to iterate as needed.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 99bcbe3662437eff362b5a421d57ca1339d75add

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, mmerkes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2024
@dims
Member

dims commented Feb 21, 2024

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 21, 2024
@mmerkes
Contributor Author

mmerkes commented Feb 21, 2024

/retest

@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@aojea
Member

aojea commented Feb 21, 2024

lgtm
/assign @andrewsykim

9 participants