
node-role labeling fails #4007

Open
hryamzik opened this issue Jul 22, 2021 · 18 comments
Labels
blocked/aws kind/bug priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases

Comments

@hryamzik

I'm trying to apply node labels using the following configuration:

  - name: test
    instanceType: m5.large
    minSize: 1
    maxSize: 2
    desiredCapacity: 2
    labels:
      role: test
      node-role.kubernetes.io/test: ok

According to #580 and #582, using the node-role subdomain should be the way to apply such labels; however, I still get a reserved-prefix error:

AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "Label cannot contain reserved prefixes [kubernetes.io/, k8s.io/, eks.amazonaws.com/] (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: *** Proxy: null)"

Here's eksctl info:

eksctl version: 0.57.0
kubectl version: v1.21.2
OS: darwin
@hryamzik hryamzik added the kind/help Request for help label Jul 22, 2021
@Callisto13
Contributor

Thanks for asking, @hryamzik.

I cannot find that code in eksctl anymore, so it looks like it was removed. I will look into why, and if it happened by accident I will put it back 👍

@AnthonyPoschen

The same issue also exists with:

eksctl version: 0.56.0
kubectl version: v1.21.2
OS: darwin

@Callisto13 Callisto13 added kind/bug good first issue Good for newcomers and removed kind/help Request for help labels Jul 23, 2021
@Callisto13
Contributor

I was not able to dig into this further today, and I am away from now until the week after next. This is free for anyone on the team or in the community to pick up 👍

@gohmc

gohmc commented Jul 24, 2021

While eksctl allows node-role.kubernetes.io, the CloudFormation EKS resource AWS::EKS::Nodegroup/ManagedNodeGroup rejects it. @TBBle has explained the issue well here.

@TBBle
Contributor

TBBle commented Jul 24, 2021

Indeed, this is the same core issue as #2363, but using Managed Node Groups, so the error is different.

Hence, the solution is blocked on development work elsewhere. Managed Node Groups might be solved by the same thing that solves #2363, e.g. kubernetes/cloud-provider-aws#110, but it's also possible that AWS could implement a Managed Node Groups-specific solution that would resolve this problem without helping #2363.

Interestingly, there's no feature request ticket open on https://github.com/aws/containers-roadmap/ for this. The ticket I linked to before was actually for setting the node-role label to the node group name, rather than the label specified in the config, which is a more-specific use-case than what we have here.

Edit: I opened a feature request upstream for EKS Managed Node Groups to support this: aws/containers-roadmap#1451

@cPu1
Collaborator

cPu1 commented Jul 26, 2021

This probably has something to do with
https://github.com/weaveworks/eksctl/blob/69c1a78f12017f47bcb247f388e9f9442f7da11c/pkg/apis/eksctl.io/v1alpha5/validation.go#L477

which eventually drills down to allowing
https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/staging/src/k8s.io/api/core/v1/well_known_labels.go#L51

I'm not sure how this now allows setting the role.

As @gohmc and @TBBle have pointed out, the error is from the Managed Nodegroups API itself and not from a validation performed in eksctl. eksctl is blocked on the upstream API adding support for reserved prefixes in labels.

@Callisto13
Contributor

dupe: #1373

@TBBle
Contributor

TBBle commented Oct 8, 2021

I'd say this isn't a dupe, as that ticket was looking at unmanaged nodegroups (i.e. closer to #2363), and its proposed solution approach solves a different problem. For both this ticket and #2363, the use case is setting the node role, which I think would not be considered "solved" if it only applied to nodes created while using eksctl to create the nodegroup, as nodes created later, e.g. by the Cluster Autoscaler, would still have an empty role.

@Skarlso
Contributor

Skarlso commented Nov 15, 2021

This issue is in a blocked state until AWS provides us with a reasonable solution for keeping labels across autoscaling. Maybe through Karpenter.

@TBBle
Contributor

TBBle commented Nov 16, 2021

I expect that Karpenter, being a slot-in replacement for Cluster Autoscaler, will sadly have the same timing problem as a 'node label controller'.

@Skarlso
Contributor

Skarlso commented Nov 16, 2021

It's not entirely slot-in. It has some extra features and a different algorithm. Are you sure it couldn't have the ability to reapply existing labels to nodes?

@TBBle
Contributor

TBBle commented Nov 17, 2021

It could, but it'd be doing it at the same time as any other Controller (except an Admission Controller), i.e. after the Node object has been persisted, and racing with the kube-scheduler to apply labels before any other Controller tries to make decisions about them.

So it can do it, but so could Cluster Autoscaler, and so could a dedicated Controller. They all have the same chance of success, AFAIK. (Unless somehow Karpenter is creating the Node resource and then kubelet only updates it, but I didn't see any suggestion of that in its docs, and I'm not sure if k8s is robust against that anyway...)

The core issue is that in the Node admission workflow, the Node object is created by the kubelet, and after passing the Node Restriction Admission Controller, persisted and available to the scheduler. It's possible the Cloud Provider Manager can get involved and add labels to the Node before it's marked 'Ready', I'm not sure. I didn't see a place for it last time I looked, but opened kubernetes/cloud-provider-aws#110 in case someone else could see what I couldn't.

Annoyingly, even a mutating Admission Controller (my other thought) might not work, because the Node Restriction Admission Controller would run after it, and might reject the labels even though kubelet itself didn't add them; I'm not sure if it can distinguish labels added by kubelet in its request from labels added to that request by a mutating Admission Controller.

@Skarlso
Contributor

Skarlso commented Nov 17, 2021

@TBBle thanks for the explanation!

Reading about how Kubernetes handles nodes and how AWS does its own thing, I have no idea how this could ever be resolved other than with a continuously checking/updating controller, which, sadly, we won't do. :/

@TBBle
Contributor

TBBle commented Nov 17, 2021

If all you're trying to do is apply the node-role label as something other than master, and you're using it only for informative output in kubectl get Nodes (and not nodeSelector, or metrics tagging for example), then a controller that reacts to new nodes and applies that label would be fine, because then it doesn't materially matter if someone sees the node in the kubectl output for the few seconds before the label has been applied (a minimal sketch of such a controller follows this comment).

Otherwise, it's best to use your own label for such behaviour, which is what I've done in the past: used a mycompany.com/node-role label (and taint) to group nodes from multiple node groups under one of a few categories, e.g., backend, publicIP (for publicly reachable NodePort or HostPort Pods) and locust (to isolate the load-testing Pods from the load being tested).

I honestly think if they taught kubectl to also recognise a different label for the Node Role field, and rejected attempts to set that value to master, it'd be sufficiently secure for the use-case, and would resolve all this friction in the node-management community.
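
For illustration, here is a minimal sketch of the "label new Nodes" controller idea described above, using client-go. It assumes the nodegroup carries a non-reserved label such as role (as in the config at the top of this issue) that the controller copies into a node-role.kubernetes.io/<role> label; the source label key, deployment, and RBAC details are illustrative assumptions, not anything eksctl provides.

// node-role-labeler: watches Nodes and copies a non-reserved "role" label
// into the reserved node-role.kubernetes.io/<role> key.
package main

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes the controller runs in-cluster
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	factory := informers.NewSharedInformerFactory(client, 0)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*corev1.Node)
			role, ok := node.Labels["role"] // non-reserved label set via the eksctl config (assumption)
			if !ok {
				return
			}
			roleKey := "node-role.kubernetes.io/" + role
			if _, done := node.Labels[roleKey]; done {
				return // already labeled
			}
			// Patch from a regular API client rather than the kubelet, so the
			// reserved prefix is not rejected.
			patch := []byte(fmt.Sprintf(`{"metadata":{"labels":{%q:""}}}`, roleKey))
			if _, err := client.CoreV1().Nodes().Patch(context.TODO(), node.Name,
				types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
				log.Printf("labeling node %s: %v", node.Name, err)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop // run until the process is stopped
}

The controller would need RBAC permission to watch and patch Nodes. Because the patch comes from an ordinary client rather than the kubelet, the NodeRestriction admission plugin does not reject the reserved prefix, but the timing caveat discussed above still applies: the label only appears some moments after the Node object has been persisted.
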

@Himangini Himangini removed the priority/backlog Not staffed at the moment. Help wanted. label Sep 28, 2022
@Himangini Himangini removed the good first issue Good for newcomers label Mar 29, 2023
@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Apr 29, 2023
@github-actions
Contributor

github-actions bot commented May 5, 2023

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 5, 2023
@cPu1 cPu1 reopened this May 5, 2023
@cPu1 cPu1 added priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases and removed stale labels May 5, 2023
@enver

enver commented May 20, 2023

Check out a simple workaround here: https://hub.docker.com/repository/docker/enver/label-nodes/general
