
Multi machines kind cluster #1928

Closed
dblas opened this issue Nov 17, 2020 · 17 comments

@dblas

dblas commented Nov 17, 2020

Hello all,
I didn't find exactly what I'm looking for in the issues.
I saw the complex-network documentation in draft form, but it didn't answer my question, and I wonder why there has been no attempt at this at all. Or is it so obvious that everyone considers it pointless to talk about?

What about configuring kind nodes spread over several physical machines, as we usually do with a classical k8s deployment (in VMs)?
There doesn't seem to be anything to prevent such a network configuration: kind nodes can be seen as classical k8s nodes, they have kubeadm inside, and it's possible to build a node image with flannel or calico inside.
So what is the problem? An issue with the kindnet CNI? Configurations that are not well tested?

My goal would be to use 3 physical machines, each running 4 kind Docker containers.
Imagine these 3 machines as columns.
The first row of containers would be used for a production-grade control plane (3 nodes from the k8s point of view).
The second row would be used for production-grade worker nodes (3 nodes).
The third row for a test-grade control plane (3 nodes or fewer; it's test, after all).
The last row for test-grade worker nodes (3 nodes or fewer, also because it's test).

It would be easy for an application to move from test to production using a shared file system (a SAN, for example).

What do you think about it?
Thank you for the work accomplished.

db

@BenTheElder
Member

There are lots of other Kubernetes deployment tools; kind is built for creating local test clusters. Currently, multi-machine is just completely out of scope for the use case.

If you want a multi-machine production deployment, there are plenty of mature options in SIG Cluster Lifecycle (e.g. kOps).

@BenTheElder BenTheElder self-assigned this Nov 17, 2020
@BenTheElder
Member

Also, kind is not intended to be production grade 😬
https://kind.sigs.k8s.io/docs/user/configuration/#api-server

I would be extremely skeptical of "containerized" Kubernetes in production regardless of the tool, but it is doubly not the intended use of this project.
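
For reference, the address and port the API server listens on are settable in the kind cluster config documented at that link (a minimal sketch; the file name is arbitrary, and the docs warn against binding to anything other than localhost):

cat <<EOF > kind-localhost.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # the listen address defaults to 127.0.0.1; the port is random unless pinned here
  apiServerAddress: "127.0.0.1"
  apiServerPort: 6443
EOF
kind create cluster --config kind-localhost.yaml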

@Morriz

Morriz commented Nov 24, 2021

I think a lot of people would love to spin up a multi-machine kind cluster for testing heavy deployments (e2e, anybody?). Being able to utilize spare metal for that would be awesome.

A local (single-machine) kind cluster just does not provide enough resources for those tests.

So what would be needed to make kind multi-machine aware? A gossip round with leader election to get the list of containerized nodes before kubeadm starts its work? That could work, no?

Saying that we need to tell kubeadm to target VMs or real machines is going back in time. Kind could lead the way forward by facilitating fast multi-machine containerized k8s as a service.

@BenTheElder
Member

This is completely out of scope for the project: https://kind.sigs.k8s.io/docs/contributing/project-scope/

Also, something that doesn't work today and will not be trivial to resolve is resource limitations (e.g. memory limits, see #1963); kind is not an appropriate way to run heavy workloads even if they can fit on a single host machine. It's most useful for integration / e2e testing of the APIs.

kubeadm is an appropriate tool for bare-metal multi-node Kubernetes clusters. Note that kind is built on kubeadm, and most of the rest of what it does is about shoe-horning things "into" container "nodes".

@Morriz

Morriz commented Nov 24, 2021

Also, something that doesn't work today and will not be trivial to resolve is resource limitations (e.g. memory limits, see #1963); kind is not an appropriate way to run heavy workloads even if they can fit on a single host machine. It's most useful for integration / e2e testing of the APIs.

Exactly, and then being able to quickly spin up kind over multiple machines makes it even better to test e2e scenarios that require more resources. k8s knows how to schedule and take what it needs.

kubeadm is an appropriate tool for bare metal multi node kubernetes clusters.

Multi-node or multi-container, kubeadm only cares about TCP access and permissions. It just wants a list of IPs to whip into a cluster.

Note that kind is built on kubeadm and most of the rest of what happens is about dealing with shoe-horning things "into" container "nodes".

It's great that kind has come so far and that many use it, since it works for what it is intended for: testing. Except for the next step: multi-machine ;)

If what I am observing is correct, then the added work would be for kind to:

  • check if a list of machines was provided (defaulting to local mode)
  • if a list is provided, kind would ssh into those machines and tell docker to start a container that broadcasts its presence (webhook back into kind?) to the master
  • once some time has passed, the array of containerized "host nodes" is expected to be complete and kubeadm can continue doing its work

But since this is out of scope for kind, as you say, I am hoping to find an attempt in user space, as I can imagine more people seeing the use case for this.

@BenTheElder
Member

Exactly, and then being able to quickly spin up kind over multiple machines makes it even better to test e2e scenarios that require more resources. k8s knows how to schedule and take what it needs.

No, because even within one machine a heavy workload is not appropriate: kind does not do proper resource management. Scheduling depends only on requests, but resource management also requires functional limits (which kind does not have at the moment; see for example #1963 mentioned above). This is also not the purpose of the project:

"kind is a tool for running local Kubernetes clusters using Docker container "nodes". kind was primarily designed for testing Kubernetes itself, but may be used for local development or CI."

(from https://github.com/kubernetes-sigs/kind/blob/main/README.md)

Multi-node or multi-container, kubeadm only cares about TCP access and permissions. It just wants a list of IPs to whip into a cluster.

That's not really accurate. kubeadm + kubernetes requires things like an installed and configured container runtime at the node level, unique machine IDs, sufficient resources available, and other bits.

You can see more of this here https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
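
To make that concrete, here is a rough sketch of the per-host checks the linked kubeadm install guide walks through (commands adapted from that guide; the control-plane address is a placeholder). kind bakes all of this into its node image, which is part of what it abstracts away:

# every prospective node needs a container runtime, kubelet, and kubeadm installed
# verify each host has a unique machine identity (MAC addresses and product_uuid):
ip link
sudo cat /sys/class/dmi/id/product_uuid
# verify the API server port on the control plane is reachable from the node:
nc -v <control-plane-ip> 6443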

It's great that kind has come so far and that many use it, since it works for what it is intended for: testing. Except for the next step: multi-machine ;)

This is not a step for the project. There are plenty of multi-machine cluster tools in the upstream Kubernetes organization owned by SIG Cluster Lifecycle or SIG Testing (kubeadm, kOps, kubespray, cluster-API, kube-up.sh, ...).

We are not building production multi-machine clusters here. It takes enough work to keep the existing feature set working smoothly, and doing so would further increase unnecessary overlap with the multitude of subprojects that already cover this within the Kubernetes project.

If what I am observing is correct, then the added work would be for kind to:

  • check if a list of machines was provided (defaulting to local mode)
  • if a list is provided, kind would ssh into those machines and tell docker to start a container that broadcasts its presence (webhook back into kind?) to the master
  • once some time has passed, the array of containerized "host nodes" is expected to be complete and kubeadm can continue doing its work

That's not all that would be required. E.g., going multi-machine would completely change the networking, for starters; kind has optimized for simple, low-cost networking under the assumption of a single host kernel and a flat node network topology.

But since this is out of scope for kind, as you say, I am hoping to find an attempt in user space, as I can imagine more people seeing the use case for this.

This is definitely out of scope. Similarly, the cluster-API "docker" provider based on kind only supports a single host machine, as does the minikube docker driver (also based on kind). But there are many options for bare-metal multi-node clusters. I would suggest starting with one of these: https://kubernetes.io/docs/setup/production-environment/tools/

@Morriz

Morriz commented Nov 25, 2021

Ok, thanks again for the elaborate response. There is always more involved than meets the eye at first sight. I do hope we can agree that it might be useful to someday see a tool that can spin up k8s clusters in containerized nodes over multiple machines, as that would serve our need to spin up k8s clusters for testing purposes quickly (the link you sent targets slower solutions that all do old-school provisioning, so they don't quickly boot up preinstalled images), while still being able to max out the hardware available.

@Coqueiro

I want to add another use case. Currently, my company has a solution where we provide e2e tests using kind, running on self-hosted GitHub Actions runners. The solution allows users to run their pipelines over the cloud on CI, but the same tool is also available for the local environment. That said, in the local environment, it takes a toll on the machine running a heavy kind container + zoom + slack + (other stuff). So delegating part of the workload to a worker node in the cloud would be awesome, and doing so without having to have different tools would be even greater.

@BenTheElder
Member

That said, in the local environment, it takes a toll on the machine running a heavy kind container + zoom + slack + (other stuff). So delegating part of the workload to a worker node in the cloud would be awesome, and doing so without having to have different tools would be even greater.

If your workload is too heavy for the local machine and you have to bring in remote resources anyhow, I would recommend fully offloading the workload to the remote resources, either with a sufficiently large cloud machine running a kind cluster wholly on that machine, or testing against some other conformant "actual" remote test cluster.
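
A minimal sketch of that "fully offloading" approach, assuming a remote Linux machine reachable over SSH as ci-box with docker and kind installed (the host name and cluster name here are made up):

# create the cluster on the remote machine and keep everything there
ssh ci-box 'kind create cluster --name e2e'
ssh ci-box 'kubectl cluster-info --context kind-e2e'
# run the test suite on ci-box as well, then tear down
ssh ci-box 'kind delete cluster --name e2e'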

It is out of scope for this project to do multi-machine clusters; we are focused on quick, conformant local clusters.
https://kind.sigs.k8s.io/docs/contributing/project-scope/

The moment we start moving towards multi-machine "production" clusters, we're duplicating effort versus the multitude of tools offered by Kubernetes SIG Cluster Lifecycle (not to mention third-party offerings), and things like networking become much more complex (e.g. our lightweight CNI / overlay network solution, kindnetd, is simple because it can assume flat, evenly weighted IP connectivity between nodes), making the project at least more complex to maintain, if not also more bloated for local usage.

@obriensystems

obriensystems commented Feb 27, 2023

Thanks team for the comments - I too am looking for a faster way to get a dev/staging kubernetes cluster up on multiple M1/M2 macOS machines (I have 4 deprecated 32-64 GB MacBook Pros).

In the past Rancher/RKE worked fine - however it required installing VMware Fusion to host Ubuntu 16 VMs.
Docker Desktop's single Kubernetes cluster essentially replicates most of what KIND does - but I like KIND's portability and k8s-spec-level overlay.

To the KIND team - thank you for all the work - it is greatly appreciated.

michaelobrien@mbp7 ObrienlabsDev % kind create cluster                                       
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.25.3) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊
michaelobrien@mbp7 ObrienlabsDev % docker ps  
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                       NAMES
7ef51479872f   kindest/node:v1.25.3   "/usr/local/bin/entr…"   41 seconds ago   Up 38 seconds   127.0.0.1:64151->6443/tcp   kind-control-plane

michaelobrien@mbp7 ObrienlabsDev % kubectl get nodes
NAME                 STATUS   ROLES           AGE   VERSION
kind-control-plane   Ready    control-plane   93s   v1.25.3
michaelobrien@mbp7 ObrienlabsDev % kubectl get pods --all-namespaces
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE
kube-system          coredns-565d847f94-5xpdx                     1/1     Running   0          87s
kube-system          coredns-565d847f94-gpmdx                     1/1     Running   0          87s
kube-system          etcd-kind-control-plane                      1/1     Running   0          101s
kube-system          kindnet-x9t4h                                1/1     Running   0          87s
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0          103s
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   0          102s
kube-system          kube-proxy-wklkk                             1/1     Running   0          87s
kube-system          kube-scheduler-kind-control-plane            1/1     Running   0          102s
local-path-storage   local-path-provisioner-684f458cdd-77bgn      1/1     Running   0          87s

@ctrought

ctrought commented Mar 1, 2023

faster way to get a dev/staging kubernetes cluster up on multiple

In case this is of interest to you, hot off the press.

https://kwok.sigs.k8s.io/

@BenTheElder
Member

BenTheElder commented Mar 1, 2023

KWOK doesn't address the request here; it allows simulating many fake nodes without any kubelet (i.e. without actual nodes or the ability to run pods), sometimes in conjunction with KIND. You can also use KIND on its own to create multiple nodes.

However, KIND and KWOK both run locally, not across multiple machines.

To run an actual multi-host cluster you should use something like kubeadm directly, which kind uses internally.
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
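
For comparison, the multi-host flow in that guide boils down to something like the following sketch (the pod CIDR depends on the CNI you choose, and the token/hash placeholders come from the init output):

# on the machine that will run the control plane
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# install a CNI (e.g. flannel or calico), then on each additional machine
# run the join command that kubeadm init printed:
sudo kubeadm join <control-plane-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>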

@jcus0006

@BenTheElder could you confirm that "multi-node" in this link refers to firing up containers on the same machine, and not actually across multiple machines? I read this thread and while it is stated multiple times that multi-host KinD is out-of-scope, I don't know what to make of the "multi-node" reference in that link, and whether something has changed since the last activity within this thread that would have potentially made "true multi node" KinD possible. This is perhaps a long shot before going for the suggested multi-host kubeadm way :)

@BenTheElder
Member

BenTheElder commented Aug 25, 2023

@BenTheElder could you confirm that "multi-node" in this link refers to firing up containers on the same machine, and not actually across multiple machines?

It refers to multiple containers on the same machine, like creating multiple kind clusters. You can confirm by creating a cluster with the sample config and then checking docker ps.
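
For example, with a minimal multi-node config (the file name is arbitrary), docker ps will show one kindest/node container per "node", all on the local machine:

cat <<EOF > multi-node.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
kind create cluster --config multi-node.yaml
docker ps   # three kindest/node containers, all on this host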

I read this thread and while it is stated multiple times that multi-host KinD is out-of-scope, I don't know what to make of the "multi-node" reference in that link, and whether something has changed since the last activity within this thread that would have potentially made "true multi node" KinD possible. This is perhaps a long shot before going for the suggested multi-host kubeadm way :)

Every kind node is a container (docker or podman currently), doing its best to emulate what kubernetes would expect for a node (including running systemd, containerd, etc.).

They are still local clusters; multi-machine "real" clusters have many offerings in SIG Cluster Lifecycle, and we do not recommend even exposing kind clusters to other machines (see the warning at https://kind.sigs.k8s.io/docs/user/configuration/#api-server). It would be wasteful to implement this again in the face of N different options already maintained within the organization.

Scope is covered at https://kind.sigs.k8s.io/docs/contributing/project-scope/

Multi-node clusters have been a requirement for testing Kubernetes (P0, and why this project originally exists) since day one, before we had a dedicated repo and long before this issue. Those nodes are local containers, though.

@Sebssekk

Sebssekk commented May 2, 2024

Probably it's useless now, but I think I may have found a workaround to this problem.
I needed a super light and fast K8s for teaching, so students could play with it, and I thought of kind, but spread over more machines to keep them small.

The workaround (roughly sketched after this list):

  • join the machines in a docker swarm cluster
  • create an attachable overlay network
  • create a global service on that network to force it onto every machine
  • there is an experimental feature in kind to change its default network, using the KIND_EXPERIMENTAL_DOCKER_NETWORK variable, so I created a kind node attached to the overlay network
  • following the idea found in this blog, I went to the other machines and manually added worker nodes, attaching them to the overlay network
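
A rough, untested sketch of those steps (the network and service names are made up, and the manual worker-join step from the blog is omitted):

# on the first machine
docker swarm init
# on the other machines: docker swarm join ... (use the token printed by swarm init)
# back on the first machine: create an attachable overlay network
docker network create -d overlay --attachable kind-overlay
# force the overlay to exist on every machine with a trivial global service
docker service create --name overlay-keepalive --mode global --network kind-overlay alpine sleep infinity
# then point kind at the overlay network instead of its default bridge
KIND_EXPERIMENTAL_DOCKER_NETWORK=kind-overlay kind create cluster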

And it just works 😅
(at least it's enough for my low-requirements needs...)

If anyone is interested, I can give more info and an Ansible playbook to automate it.

@BenTheElder
Member

BenTheElder commented May 2, 2024

I would really recommend learning multi-host clusters using a supported solution (e.g. kubeadm, which kind uses internally), and I would not recommend networking kind beyond the local host for security reasons (consider, for example, that there's no process to keep security patches deployed); that's just not something we're scoped to support, and there are so many other options where energy is already being spent on these problems.

@Sebssekk

Sebssekk commented May 2, 2024

You are absolutely right; my workaround fits my low-requirements scenario, where I created a temp environment with containerized applications, using KasmVNC to stream them to the browser so students can access the labs directly from their PCs with nothing to install. I started with plain kind, but this way I can create multiple smaller machines instead of one giant one. (I attached the students' access containers to the overlay network too.)

Yet it's not for anything real. Just to list some limits: if you need to import images into the cluster you have to do it on every machine (something like the sketch below), and the kind cloud provider (for load balancing) does not work.
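
A hedged sketch of what "importing on every machine" could look like in this setup, since the local kind CLI only reaches node containers on its own host (the image and container names below are hypothetical):

# on each host, for each node container running there, stream the image into containerd
docker save my-app:dev | docker exec -i <node-container> ctr --namespace k8s.io images import -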
