
Add AdditionalLabels to cloudprovider.InstanceMetadata #123223

Merged
merged 1 commit on Feb 21, 2024

Conversation

mmerkes
Contributor

@mmerkes mmerkes commented Feb 9, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds the AdditionalLabels field to the InstanceMetadata, which can be set by cloud provider implementations to provide custom labels that will be applied to nodes. There's a need in the AWS cloud provider to set labels specific to AWS on the nodes, and I assume that other cloud providers can put this to use as well. The change is backwards compatible because if the cloud providers never set the field, nothing will change. There's not a hook directly in the cloud provider implementations that allows them to customize node objects directly, so they either need to run additional controllers or provide it via the InstanceMetadata struct and instruct the node_controller on how to apply it to the node.

As a side note, this change also requires that cloud providers utilize InstancesV2, which gives them direct access to the instance metadata. The AWS cloud provider does not implement InstancesV2, so I will be doing that migration there directly.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added support for cloud provider integrations to supply optional, per-Node custom labels that will be
applied to Nodes by the node controller.
Extra labels will only be applied where the cloud provider integration implements this.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

To use this feature, cloud providers must use the [InstancesV2 interface](https://github.com/kubernetes/kubernetes/blob/ad19beaa83363de89a7772f4d5af393b85ce5e61/staging/src/k8s.io/cloud-provider/cloud.go#L210) and
update the InstanceMetadata method to set the `AdditionalLabels` field with desired labels.
If no action is taken by cloud providers, the node controller behavior will be no different.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 9, 2024
@k8s-ci-robot k8s-ci-robot added area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 9, 2024
@cartermckinnon
Contributor

So AdditionalLabels can overwrite the standard Zone and Region labels. I think that's fine, just noting it for discussion. 😄

@dims
Member

dims commented Feb 10, 2024

/assign @elmiko @andrewsykim

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

So AdditionalLabels can overwrite the standard Zone and Region labels. I think that's fine, just noting it for discussion. 😄

Ya, that's true, but this is all coming from the cloud providers, so it should be more tightly controlled.

@andrewsykim
Member

Agreed that risk is probably low because cloud providers should know about the other labels.

But I do wonder if we should ignore or drop any kubernetes.io labels. Technically a cloud provider should not be using them anyways.

@andrewsykim
Member

Either way, can we update the API doc for AdditionalLabels to say it ignores (or overwrites) other node labels

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

I think any of the following are reasonable:

  1. If cloud provider hands us labels, just overwrite them.
  2. If labels are already set, ignore them.
  3. If labels are in the well known labels, ignore them.
  4. If labels include kubernetes.io, ignore them.

With 4, we outlaw cloud providers from using kubernetes.io in their labels, which isn't unreasonable and they probably shouldn't, but maybe it's too opinionated? 3 would only target known labels, and then we'd have to maintain them as a set, which isn't a huge deal. I can't think of a scenario where cloud providers would intentionally override them, because they provide all of the inputs that go into them anyway.

2 would be the easiest to maintain and would not allow labels to be overridden. Can nodes already have labels at this point? I guess if kubelet sent them while registering, the node would have those labels? If that's the case, it potentially gets a little more complicated, though cloud providers should have their own label namespace and users shouldn't be setting those via kubelet anyway.

Personally, I'm still inclined to just override and document. If cloud providers are overriding the k8s well known labels, sounds like a bug and seems very unlikely. This would be the simplest approach, and the easiest for cloud providers to reason about. It would potentially be confusing to have scenarios where they pass in a label and we ignore it. That being said, I don't think there's a wrong choice here. It's a new feature.

@sftim
Contributor

sftim commented Feb 10, 2024

Changelog suggestion

-Added support for cloud providers to supply custom labels that will be applied to nodes by the node controller. Labels will only be applied if implemented by the cloud providers.
+Added support for cloud provider plugins to supply optional, per-node custom labels that will be
+applied to Nodes by the node controller.
+Extra labels will only be applied where the cloud provider integration implements this.

We're talking about allowing code you run to set these labels, rather than about granting access for a firm like Google or Microsoft or AWS to do so. If the right term isn't "plugins", please swap it for something more accurate.

@sftim
Contributor

sftim commented Feb 10, 2024

  • Is there a way for a cloud provider integration to specify that a label shouldn't be set, where the node controller normally would set it?
  • How about a similar mechanism for annotations?

@sftim
Contributor

sftim commented Feb 10, 2024

The rule I'd like:

  • if any labels are inside kubernetes.io or k8s.io and aren't known to the code (matching against an allow list), create an Event against the node (so that the problem is more visible)
    • as much as is feasible, only do this once per Node; not once per label!

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

Updated the changelog per @sftim 's suggestion, albeit with some tweaks. I'm not sure whether plugin is the right word, so I just went with the super-generic integrations.

Is there a way for a cloud provider integration to specify that a label shouldn't be set, where the node controller normally would set it?

No. What would be the use case for that? If there's a need, may just need to run a patched version of k8s.

How about a similar mechanism for annotations?

My use case is labels, but I could see use for adding support for annotations or taints. I assume it'd be similar, and I don't think it would impact the model. My preference would be to keep this change focused on labels, and if there's a desire to extend the functionality to annotations and/or taints, would put up a follow up PR for that as well.

if any labels are inside kubernetes.io or k8s.io and aren't known to the code (matching against an allow list), create an Event against the node (so that the problem is more visible)

Interesting idea. In that case, are you suggesting ignoring or overriding the original value? NVM, I see you weighed in here in support of allowing overriding.

@mmerkes
Contributor Author

mmerkes commented Feb 10, 2024

/retest

@aojea
Member

aojea commented Feb 14, 2024

Overriding labels is probably fine, but I still think we should drop kubernetes.io or k8s.io labels from cloud providers.

I feel overriding is not fine per #123223 (comment)

Not overriding them and logging seems better to me, because it is the cloud provider who owns the problem and can fix it, and it is completely backwards compatible. If we break compatibility by overriding, I expect this to be feature flagged at least.

@elmiko
Contributor

elmiko commented Feb 15, 2024

@mmerkes i'm a little curious about this

If labels include kubernetes.io, ignore them.

my concern is that we run the risk of breaking a cloud provider that might be expecting to use these labels. i'm curious if we, as a community, are moving towards ensuring that the kubernetes.io/k8s.io prefix is only used with the well-known labels?

@sftim
Contributor

sftim commented Feb 15, 2024

if we, as a community, are moving towards ensuring that the kubernetes.io/k8s.io prefix is only used with the well-known labels?

That is something I've seen contributors pushing. My take on what Kubernetes might do in this case is that people would see warnings if they use unrecognized labels / annotations, rather than anything stronger (if we wanted to enforce it, we'd likely use an actual API field). We also might opt to do nothing and silently accept the unrecognized keys (i.e., the current behavior). A final option is to have a warnlist of labels that we know are not registered, are not used by any in-project code, and that folks should stop using.

Another approach is to warn for unrecognized taints only, but leave labels and annotations out. We have a much smaller set of official taints and we make changes less often.

Signed-off-by: Matt Merkes <merkes@amazon.com>

Emits event when overriding labels in node controller

Signed-off-by: Matt Merkes <merkes@amazon.com>

Discard kubernetes.io additional labels in node controller

Signed-off-by: Matt Merkes <merkes@amazon.com>

Exclude kubernetes reserved label namespaces
@mmerkes
Contributor Author

mmerkes commented Feb 16, 2024

There have been a number of conversations about the right behavior here, so I'm going to try to summarize the open topics to work toward resolving them.

Should we allow cloud provider implementations to use the k8s.io or kubernetes.io namespaces?

No. There's been discussion that contributors are pushing to keep those namespaces restricted to k8s well known labels. If we were to allow them, cloud provider implementations could potentially be setting a label that k8s/k8s will implement in the future. Allowing labels in those namespaces would mean that restricting them later would be backwards incompatible. IMO, we should recommend that cloud provider implementations use their own namespace, like k8s.aws, to make it clear where the labels are coming from.

How do we treat labels that are already set in the node?

Labels that are already set in the node will be treated as immutable, and the labels provided by the cloud provider will be ignored in those cases. Per this discussion, overriding the labels could create a hot loop situation where the controller and some other actor keep competing with each other to set the label.

How will users know that labels are not getting applied as expected?

If the cloud provider supplies a label in a k8s/k8s namespace, or a label that's already applied to a node but with a different value, we'll discard those labels and log a warning. Cluster admins will need to use these logs to spot the potentially unexpected issue. When cloud provider implementations are setting new labels via this mechanism, I don't think it's unreasonable to expect that they actually do the bare minimum testing of those changes and verify that the labels get applied.

A more confusing case might be where labels are passed into kubelet and applied before the node controller gets those labels, though if cloud provider implementations are taking ownership of a label namespace, users should be avoiding using those namespaces directly and will need to depend on the logs. We discussed emitting an event when labels don't get applied, but we can't restrict the event to a single occasion and don't want to fill up the node events when the controller runs again.

What about taints and annotations?

This PR is scoped to handling labels. If there's appetite for taints and annotations in the future, we can add support via the same mechanism.

If I missed any other open discussions, let me know. Otherwise, let me know we're close to aligning on this. @dims @cartermckinnon @andrewsykim @aojea @sftim

@sftim
Contributor

sftim commented Feb 16, 2024

Should we allow cloud provider implementations to use the k8s.io or kubernetes.io namespaces?

No. There's been discussion that contributors are pushing to keep those namespaces restricted to k8s well known labels. If we were to allow them, cloud provider implementations could potentially be setting a label that k8s/k8s will implement in the future. Allowing labels in those namespaces would mean that restricting them later would be backwards incompatible. IMO, we should recommend that cloud provider implementations use their own namespace, like k8s.aws, to make it clear where the labels are coming from.

What if a cloud provider integration wants to set, say, topology.kubernetes.io/region, or node.kubernetes.io/instance-type?

@mmerkes
Contributor Author

mmerkes commented Feb 16, 2024

What if a cloud provider integration wants to set, say, topology.kubernetes.io/region, or node.kubernetes.io/instance-type?

There's already a mechanism for that. Region and instance type are passed into the InstanceMetadata as fields and those are set explicitly. If a cloud provider implementation is trying to set those directly in AdditionalLabels, it's trying to circumvent formal mechanisms already in place and shouldn't be allowed to do that.

@dims
Member

dims commented Feb 21, 2024

Thanks! Looks like everyone had a say :) and no new comments for a while. It looks sane to me. I am sure @mmerkes is happy to iterate as needed.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 99bcbe3662437eff362b5a421d57ca1339d75add

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, mmerkes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2024
@dims
Member

dims commented Feb 21, 2024

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 21, 2024
@mmerkes
Contributor Author

mmerkes commented Feb 21, 2024

/retest

@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@aojea
Member

aojea commented Feb 21, 2024

lgtm
/assign @andrewsykim

9 participants