
Update handling of custom cluster-cidr #902

Merged · 2 commits merged into staging on Jul 26, 2023

Conversation

thomasprade (Contributor)

Concerning #894

Due to k3s's defaults for cluster-cidr (10.42.0.0/16) and service-cidr (10.43.0.0/16), the creation of more than 42 nodepools and their respective subnets conflicted with these IP ranges.

Changes:

  • Added a variable for setting the service-cidr
  • Added cluster- and service-cidr to k3s server start arguments
  • Extended documentation on theoretical maximum of nodepools

Changes to the `cluster_ipv4_cidr` variable in the kube.tf file were not fully reflected in the cluster config, and the default pod IP range `10.42.0.0/16` conflicted with the creation of more than 41 agent nodepools.
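For illustration, a minimal sketch of how these variables might be set in a kube.tf. The CIDR values below are arbitrary examples chosen to avoid the 10.x nodepool subnets, not values recommended by this PR:

```hcl
module "kube-hetzner" {
  # ... source, provider, and nodepool configuration ...

  # cluster_ipv4_cidr already existed; service_ipv4_cidr is the variable
  # added by this PR. Both are now forwarded to the k3s server start
  # arguments, so changing them here actually takes effect in the cluster.
  cluster_ipv4_cidr = "10.224.0.0/16" # pod network (k3s --cluster-cidr)
  service_ipv4_cidr = "10.225.0.0/16" # service network (k3s --service-cidr)
}
```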
M4t7e (Contributor) commented Jul 24, 2023

Concerning #894 (comment)

I think my assumptions from the referenced comment are right, that K3s should be in charge of IP Address Management (IPAM) and not the CNI, if we want to have a fully working integration into Hetzner with their CCM: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/main/README.md?plain=1#L122

> When you use the Cloud Controller Manager with networks support, the CCM is in favor of allocating the IPs (& setup the routing) (Docs: https://kubernetes.io/docs/concepts/architecture/cloud-controller/#route-controller). The CNI plugin you use needs to support this k8s native functionality (Cilium does it, I don't know about Calico & WeaveNet), so basically you use the Hetzner Cloud Networks as the underlying networking stack.

Btw, they also use `ipam.mode=kubernetes` in their testing setup with Cilium (see https://docs.cilium.io/en/stable/network/concepts/ipam/kubernetes/#kubernetes-host-scope):

> The Kubernetes host-scope IPAM mode is enabled with ipam: kubernetes and delegates the address allocation to each individual node in the cluster. IPs are allocated out of the PodCIDR range associated to each node by Kubernetes.

It should be the same story for Calico: https://docs.tigera.io/calico/latest/reference/configure-cni-plugins#using-host-local-ipam

> Host local IPAM is generally only used on clusters where integration with the Kubernetes route controller is necessary.

This is definitely the reason why the RouteController makes a total mess of the Hetzner network routes: the CNI is decoupled from K3s IPAM. I am still not sure about the impact, and with the default values (cluster_ipv4_cidr untouched) it looks good at first glance, because K3s and Cilium are using the same network range. But they still work independently of each other, and if the Hetzner network routes for the nodes' pod ranges match, it is pure coincidence, or merely the result of the same assignment approach. This really feels like it stands on wobbly legs and I think it should be fixed.

I would suggest fixing it either in this PR or in a new one that implicitly covers this PR. cluster-cidr needs to be configurable, and Calico and Cilium need an adjustment to use the Kubernetes host-scope IPAM mode, so that the cluster-cidr from K3s is honored.
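For concreteness, a sketch of what that Cilium adjustment could look like, expressed as Helm values embedded in a Terraform local. The wiring is hypothetical; the point is only the `ipam.mode` switch to Kubernetes host-scope:

```hcl
locals {
  # Hypothetical: Helm values handed to the Cilium chart. With
  # ipam.mode=kubernetes, Cilium allocates pod IPs from each node's
  # Kubernetes-assigned PodCIDR (which k3s derives from --cluster-cidr),
  # so the CCM route controller and the CNI can no longer diverge.
  cilium_values = yamlencode({
    ipam = {
      mode = "kubernetes"
    }
  })
}
```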

@aleksasiriski aleksasiriski self-requested a review July 24, 2023 17:01
mysticaltech (Collaborator) left a comment:

Thanks for that fix @thomasprade, it's important!

@mysticaltech mysticaltech changed the base branch from master to staging July 25, 2023 06:25
thomasprade (Contributor, Author)

@M4t7e afaik the cluster_ipv4_cidr is already passed to Cilium in locals.tf:356 (and to Calico in locals.tf:390, for that matter), so hopefully fixing the way the cluster- and service-cidr are passed to k3s also fixes your problem.

@mysticaltech glad to help 👍

M4t7e (Contributor) commented Jul 25, 2023

@thomasprade Yes, it is, and that already seems to be the problem. That explicit configuration decouples the CNI IPAM from K3s IPAM. They then run independently of each other, and only K3s IPAM will set the needed Hetzner network routes to the distinct pod networks for the nodes (you can see them if you open the Hetzner console, select your network, and switch to the "Routes" tab).

I have tested your change with the current settings in Cilium and it does not hurt. This PR is already the first step towards solving that decoupling problem, but it will not fix it. It just "looks" fixed because of the similar IP assignment rules in K3s and Cilium (and maybe also Calico?). They can still diverge, and if that happens, non-masqueraded setups (which are usually preferred for performance reasons) can break even internally (this depends on many other settings). At least the current defaults lead to masquerading as far as I can see and test, so hopefully it's just a theoretical problem in most cases.

I'm currently crafting something for Cilium to solve that decoupling issue, and it already looks very promising. I'll treat this change as a precondition, as I can only fix it after this is merged. So I'm looking forward to the merge 🙂

```diff
@@ -66,7 +66,7 @@ locals {
   apply_k3s_selinux = ["/sbin/semodule -v -i /usr/share/selinux/packages/k3s.pp"]

   install_k3s_server = concat(local.common_pre_install_k3s_commands, [
-    "curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_SKIP_SELINUX_RPM=true INSTALL_K3S_CHANNEL=${var.initial_k3s_channel} INSTALL_K3S_EXEC='server ${var.k3s_exec_server_args}' sh -"
+    "curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_SKIP_SELINUX_RPM=true INSTALL_K3S_CHANNEL=${var.initial_k3s_channel} INSTALL_K3S_EXEC='server ${var.k3s_exec_server_args} --cluster-cidr=${var.cluster_ipv4_cidr} --service-cidr=${var.service_ipv4_cidr}' sh -"
```
A reviewer (Contributor) commented on this diff:

Wouldn't it be better to put that in the k3s server config files (specified in init.tf and control_planes.tf), together with the other config params?

thomasprade (Contributor, Author) replied:

I tried setting the two arguments as an additional parameter in the file provisioner in init.tf:11, but that didn't work. From what I understand of the server documentation and the configuration documentation, these options must be passed as arguments to the k3s server command.

The reviewer (Contributor) replied:

Starting with K3s v1.19.1+k3s1, all CLI parameters can be set in a configuration file:

> You can now configure K3s using a configuration file rather than environment variables or command line arguments.

I think it is not enough to set it only in init.tf:11, because that only covers the first control plane node (the first_control_plane resource). It also needs to be in control_planes.tf:95 for the other control plane nodes. See also the note at https://docs.k3s.io/installation/configuration#configuration-with-binary:

> It is important to match critical flags on your server nodes. For example, if you use the flag --disable servicelb or --cluster-cidr=10.200.0.0/16 on your master node, but don't set it on other server nodes, the nodes will fail to join.

Maybe @mysticaltech or somebody else could verify that, as I am pretty new to the project, but this is what I understood from reverse engineering it.
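A sketch of the config-file alternative being discussed (the local below is hypothetical; the PR as merged appends CLI flags instead). The same rendered file would have to land on every server node, i.e. in both init.tf and control_planes.tf, so that all control plane nodes agree on these critical flags:

```hcl
locals {
  # Hypothetical: the two critical flags rendered as a k3s config file
  # (supported since v1.19.1+k3s1), to be written to
  # /etc/rancher/k3s/config.yaml on every control plane node.
  k3s_server_config = yamlencode({
    "cluster-cidr" = var.cluster_ipv4_cidr
    "service-cidr" = var.service_ipv4_cidr
  })
}
```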

A Collaborator replied:

@thomasprade What you did here is good, but just for consistency, it would be best to implement it indeed like @M4t7e mentioned, in both init.tf and control_planes.tf, then we merge! 🙏

The Collaborator added:

I'll do it myself, it's super quick!

@mysticaltech mysticaltech merged commit 41b909c into kube-hetzner:staging Jul 26, 2023
1 check passed
mysticaltech (Collaborator)

Released in v2.4.0 🚀
