
k3s CCM must allow for custom patching during node join #1644

Closed
sandys opened this issue Apr 16, 2020 · 13 comments

@sandys

sandys commented Apr 16, 2020

hi
we are attempting a production deploy of k3s on AWS, but with auto-scaling groups and spot instances.

To do this, we need to use cluster-autoscaler - which expects the ProviderID in a certain format (aws:///eu-west-3a/<EC2_INSTANCE_ID>). The issue is that the k3s built-in CCM sets the ProviderID in a different format (k3s://).

Now generally speaking, we don't need the AWS CCM. We are not doing much with it. All that we need is cluster-autoscaler. This is true for most people.
cluster-autoscaler will work if k3s is set up like this:

k3s server --disable-cloud-controller --kubelet-arg cloud-provider=aws
kubectl patch node <NODE_NAME> -p '{"spec":{"providerID":"aws://whatever"}}'

If the k3s CCM allows a custom providerID to be set during node join, everything should work just fine.
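
For illustration, a minimal sketch of the post-join patch in the format cluster-autoscaler expects, assuming the node can reach the EC2 instance metadata service (IMDSv1) and that NODE_NAME is supplied by the bootstrap:

# Build the AWS-format providerID from instance metadata and patch it in (sketch)
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
kubectl patch node "$NODE_NAME" -p "{\"spec\":{\"providerID\":\"aws:///${AZ}/${INSTANCE_ID}\"}}"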

@brandond
Member

If you disable the cloud controller, it shouldn't set the providerID at all. That's the behavior I see, at least - are you seeing something different? That should let you patch it out-of-band to whatever you need it to be.
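
For example, a quick way to check (a sketch; substitute your node name):

# Show the providerID, if any, currently set on the node
kubectl get node <NODE_NAME> -o jsonpath='{.spec.providerID}'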

@sandys
Author

sandys commented Apr 16, 2020 via email

@stale

stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 31, 2021
@jawabuu

jawabuu commented Aug 11, 2021

@sandys Did you get a workaround for this?

@stale stale bot removed the status/stale label Aug 11, 2021
@brandond
Member

I don't think it makes sense for the K3s cloud provider to contain functionality already covered by the AWS cloud-provider. If you want AWS ProviderIDs, disable the built-in k3s cloud-provider and install the out-of-tree aws cloud provider.

@jawabuu

jawabuu commented Aug 11, 2021

@brandond Understood.
In my case I'm actually deploying to Hetzner & Linode.
k3s works perfectly, including the CCM. What we're looking at is adding cluster-autoscaler abilities to k3s.
Unfortunately cluster-autoscaler has a hard requirement that the provider-id match the format for the cloud provider.
As @sandys commented, a feature request to override the provider-id on cluster creation would be very helpful in this case.
There are obviously manual workarounds, but these must run after creation, which impacts any bootstrapping process.
Since the k3s:// prefix does not really affect current functionality, I would ask that this be considered.

@brandond
Member

brandond commented Aug 11, 2021

cluster-autoscaler has a hard requirement that provider-id matches the format for the cloud provider.

So you want the K3s cloud provider to spoof other provider names so that the cluster autoscaler will think you're using the correct cloud provider? It seems like we'd need to also embed additional logic to set the provider ID; right now it just sets the node ID to the hostname, but other cloud providers do different things, like setting it to an instance ID or something. Would K3s be expected to do that too?

We recently made it possible to run the k3s cloud provider as a standalone pod - you might take a look at https://github.com/rancher/image-build-rke2-cloud-provider/blob/main/main.go and see if you can tweak the code to do what you want.

@jawabuu

jawabuu commented Aug 11, 2021

We recently made it possible to run the k3s cloud provider as a standalone pod - you might take a look at https://github.com/rancher/image-build-rke2-cloud-provider/blob/main/main.go and see if you can tweak the code to do what you want.

This is great news. I will definitely take a look.
In a simple case, the CCM could just honor such a flag; k3s should not be responsible for anything other than setting the ID.
Acquiring and formatting the ID should be the user's responsibility.
For example:

--kubelet-arg="provider-id=hcloud://$(curl -s http://169.254.169.254/hetzner/v1/metadata/instance-id)"
--kubelet-arg="provider-id=aws:///$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)/$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

@sandys
Author

sandys commented Aug 12, 2021

@brandond I think the way this bug can be looked at is that it's not about the CCM. It's just that if you could allow us to set the providerID on node join, that would solve everything.

Other people have the same issue btw - https://liquidreply.net/scale-out-your-raspberry-pi-kubernetes-cluster-to-the-cloud?cookie-state-change=1628751131057

Now, someone may ask: if "kubectl patch" exists, why do we need this particular feature request?
Because the patch increases the complexity of the infrastructure massively. Since I need to patch the node post-join, I need an entire monitoring infrastructure that waits for a new node to come up and become healthy, and patches it only after that. This is a big, problematic thing to do (and is where we see frequent failures). If k3s allows the patch while joining, I can completely do away with this post-join monitoring infrastructure.
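
To make that concrete, a sketch of the kind of post-join watcher this currently forces on us (illustrative; NODE_NAME and PROVIDER_ID come from elsewhere in the bootstrap):

# Wait for the node to register and go Ready, then patch it - the extra
# moving part this feature request would eliminate (sketch)
until kubectl get node "$NODE_NAME" >/dev/null 2>&1; do sleep 5; done
kubectl wait --for=condition=Ready "node/$NODE_NAME" --timeout=600s
kubectl patch node "$NODE_NAME" -p "{\"spec\":{\"providerID\":\"${PROVIDER_ID}\"}}"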

There are lots of these restrictions when it comes to deploying k3s in the cloud.

e.g.

Make sure you set the hostname before attempting to bootstrap the Kubernetes cluster, or you’ll end up with nodes whose name in Kubernetes doesn’t match up, and you’ll see various “permission denied”/“unable to enumerate” errors in the logs. For what it’s worth, preliminary testing indicates that this step—setting the hostname to the FQDN—is necessary for Ubuntu but may not be needed for CentOS/RHEL.

It is a HUGE benefit.

So please don't look at this issue as a CCM customization request. Please look at it as a kubectl-patch-while-node-join request.

Also, the k3s agent config gives a lot of customization options - even "--node-name" and so on. Being able to set K3S_NODE_NAME="${EC2_INTERNAL_DNS}" is a godsend. There are a bunch of things that need to be done BEFORE join or else the cluster doesn't behave well. Some things are mentioned here - https://blog.scottlowe.org/2018/09/28/setting-up-the-kubernetes-aws-cloud-provider/
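
For example, a sketch of the pre-join setup we do today (assumes IMDSv1; local-hostname is the instance's internal DNS name):

# Set the node name to the EC2 internal DNS name before joining (sketch)
export K3S_NODE_NAME="$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)"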

In general, this is a popular request with any Kubernetes tool - e.g. kubernetes/kubeadm#202. Rancher has had similar requests as well: rancher/rancher#13076 and rancher/rancher#13835.

@brandond
Member

brandond commented Aug 12, 2021

--kubelet-arg="provider-id=hcloud://$(curl -s http://169.254.169.254/hetzner/v1/metadata/instance-id)"

If we did allow something like this, it would probably be called --provider-id, as --kubelet-arg is for passing args directly to the kubelet, which isn't how you would want to do this.
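
If such a flag existed, a join might look like this (purely hypothetical - no such flag exists at the time of writing):

# Hypothetical --provider-id flag on the agent (sketch)
k3s agent --server https://<SERVER_IP>:6443 --token <TOKEN> \
  --provider-id "hcloud://$(curl -s http://169.254.169.254/hetzner/v1/metadata/instance-id)"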

If you want to take a shot at a PR to do this, you can find the code here:
https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/instances.go#L34-L44

Note that the agent and cloud provider communicate via annotations - the agent sets annotations on the node that declare its desired hostname and IP addresses, and the cloud provider reads and returns those when the cloud controller initializes the node. You'd need to add a new annotation for the desired providerid, and return that instead of the nodename. You might also need to do something with the InstanceType if you don't want them to come up with a k3s:// prefix on the providerid.
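
For anyone digging into that code, the annotations in question can be inspected on a live node (a sketch; it dumps all annotations rather than assuming specific keys):

# Show the annotations the agent set on its node at registration (sketch)
kubectl get node <NODE_NAME> -o jsonpath='{.metadata.annotations}'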

@jawabuu

jawabuu commented Aug 12, 2021

@brandond Thanks for this.
A --provider-id flag makes sense.

@sandys
Author

sandys commented Aug 13, 2021

Note that the agent and cloud provider communicate via annotations - the agent sets annotations on the node that declare its desired hostname and IP addresses, and the cloud provider reads and returns those when the cloud controller initializes the node. You'd need to add a new annotation for the desired providerid, and return that instead of the nodename. You might also need to do something with the InstanceType if you don't want them to come up with a k3s:// prefix on the providerid.

@brandond This is all the more reason why k3s must set things like the providerID, node name, etc. BEFORE joining. Once a node joins, there is a race condition between what the CCM (running on the existing k3s cluster) does and any kubectl patch that we apply manually.

That is why almost any documentation of node joins in a cloud scenario STRONGLY recommends doing all of this before joining, with strong disclaimers that setting the providerID, node name, etc. after joining may result in unknown situations.

@stale

stale bot commented Feb 9, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
