
Joining nodes to local development cluster #115319

Closed
mhmxs opened this issue Jan 25, 2023 · 17 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@mhmxs
Contributor

mhmxs commented Jan 25, 2023

I created a new cluster by executing local-up-cluster.sh, and I tried to join additional nodes via kubeadm. My goal is to be able to test anything on my local box against a distributed system as fast as possible. I had to change the script to solve all the joining problems, and I think it makes sense to extend the local cluster's capabilities for every developer.

Here is the diff:

diff --git a/hack/local-up-cluster.sh b/hack/local-up-cluster.sh
index 20355a5074d..2426955706f 100755
--- a/hack/local-up-cluster.sh
+++ b/hack/local-up-cluster.sh
@@ -36,6 +36,7 @@ KUBELET_IMAGE=${KUBELET_IMAGE:-""}
 FAIL_SWAP_ON=${FAIL_SWAP_ON:-"false"}
 # Name of the dns addon, eg: "kube-dns" or "coredns"
 DNS_ADDON=${DNS_ADDON:-"coredns"}
+POD_CIDR=${POD_CIDR:-10.88.0.0/16}
 CLUSTER_CIDR=${CLUSTER_CIDR:-10.1.0.0/16}
 SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/24}
 FIRST_SERVICE_CLUSTER_IP=${FIRST_SERVICE_CLUSTER_IP:-10.0.0.1}
@@ -547,6 +548,7 @@ EOF
       "${node_port_range}" \
       --v="${LOG_LEVEL}" \
       --vmodule="${LOG_SPEC}" \
+      --enable-bootstrap-token-auth \
       --audit-policy-file="${AUDIT_POLICY_FILE}" \
       --audit-log-path="${LOG_DIR}/kube-apiserver-audit.log" \
       --authorization-webhook-config-file="${AUTHORIZATION_WEBHOOK_CONFIG_FILE}" \
@@ -844,6 +846,7 @@ clientConnection:
   kubeconfig: ${CERT_DIR}/kube-proxy.kubeconfig
 hostnameOverride: ${HOSTNAME_OVERRIDE}
 mode: ${KUBE_PROXY_MODE}
+clusterCIDR: ${CLUSTER_CIDR}
 conntrack:
 # Skip setting sysctl value "net.netfilter.nf_conntrack_max"
   maxPerCore: 0
@@ -1063,7 +1066,7 @@ function install_cni {
         "type": "host-local",
         "ranges": [
           [{
-            "subnet": "10.88.0.0/16"
+            "subnet": "${POD_CIDR}"
           }],
           [{
             "subnet": "2001:4860:4860::/64"

What do you think?

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 25, 2023
@mhmxs
Contributor Author

mhmxs commented Jan 25, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 25, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 25, 2023
@MathurUtkarsh

assign this issue to me.

@mhmxs
Contributor Author

mhmxs commented Jan 26, 2023

@MathurUtkarsh I'm glad to fix this (based on your instructions) if you think it makes sense for the community.

@BenTheElder
Member

local-up-cluster is meant to be a minimum viable tool for a single-node cluster on the current host, as opposed to being kubeadm compatible (for which you should use kubeadm init?)

I don't think you need to change the pod CIDR to use kubeadm; you can configure kubeadm to match instead?
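
For example, a sketch of what that could look like on the kubeadm side (not something local-up-cluster generates; the values below are the script's existing defaults):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.88.0.0/16       # the subnet hard-coded in local-up-cluster's CNI config
  serviceSubnet: 10.0.0.0/24    # SERVICE_CLUSTER_IP_RANGE default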

cc @dims

@BenTheElder
Member

/remove-sig node
/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 27, 2023
@mhmxs
Contributor Author

mhmxs commented Jan 27, 2023

@BenTheElder I understand that the scope of this local cluster is limited. But on the other hand, these two lines, --enable-bootstrap-token-auth (we can make it optional) and clusterCIDR: ${CLUSTER_CIDR} (this one doesn't make any significant change), allow me to join any number of nodes via kubeadm join.

The end result is that I'm able to test any Kubernetes code change on a multi-node (virtualized) Kubernetes cluster in 5-6 minutes. My flow is the following:

  • change code
  • run make all WHAT=x,y only once on my dev machine
  • execute local-up-cluster.sh on the master node
  • generate join token
  • start kubelet, kube-proxy on worker nodes

I think this is the fastest way to develop Kubernetes. In summary: small change, huge benefit :)
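
For concreteness, a condensed sketch of that flow as shell commands (the component list, paths, and per-node details are spelled out later in this thread):

# on the dev machine, once per code change
make all WHAT=<the components you changed>

# on the master node
./hack/local-up-cluster.sh -O
kubeadm token create --print-join-command > /var/run/kubernetes/join.sh

# on each worker node
systemctl restart kube-proxy
sh /var/run/kubernetes/join.sh    # kubeadm join (re)starts the kubelet service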

I agree that I don't have to change the pod CIDR, except that I would like to test some network features for which I do have to change it. It isn't a game changer, but it makes things easier.

cc @dims

@dims
Member

dims commented Jan 27, 2023

@mhmxs please go ahead and file a PR; if you can add a small markdown doc about this flow as well, let's see how that looks and decide? (yes, +1 to really small changes to local-up-cluster)

@mhmxs
Contributor Author

mhmxs commented Jan 27, 2023

@dims Let me first describe my environment here, to give a better understanding of how I solved all the issues around the multi-node setup.

I have a bunch of environment variables on all the nodes:

# MASTER_IP (and later MASTER_NAME / NODE_IP) are assumed to be set per node
export NET_PLUGIN=cni
export ALLOW_PRIVILEGED=1
export ETCD_HOST=${MASTER_IP}
export API_HOST=${MASTER_IP}
export ADVERTISE_ADDRESS=${MASTER_IP}
export API_CORS_ALLOWED_ORIGINS=".*"
# bootstrapsigner and tokencleaner controllers are required for token-based kubeadm join
export KUBE_CONTROLLERS="*,bootstrapsigner,tokencleaner"
export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
export POD_CIDR="10.88.0.0/16"
export SERVICE_CLUSTER_IP_RANGE="10.0.0.0/24"

I start the single-node instance as usual:

KUBELET_HOST=0.0.0.0 HOSTNAME_OVERRIDE=${MASTER_NAME} ./hack/local-up-cluster.sh -O

In the next step I install a CNI driver; I prefer Calico:

curl -Ls https://docs.projectcalico.org/manifests/calico.yaml | kubectl apply -f -
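
Optionally (not strictly part of the flow), one can wait for Calico to become Ready before generating the join token; this assumes the standard k8s-app=calico-node label from the manifest above:

kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=calico-node --timeout=180s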

Time to generate the join token:

kubeadm token create --print-join-command > /var/run/kubernetes/join.sh
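
The generated join.sh contains something along these lines (values illustrative), and its token id is what the ${token_id} placeholder in the RBAC manifests below stands for:

# join.sh roughly looks like:
#   kubeadm join ${MASTER_IP}:6443 --token <token-id>.<token-secret> \
#       --discovery-token-ca-cert-hash sha256:<hash>
# one way to extract the token id (the part before the dot):
token_id=$(grep -o -- '--token [^ ]*' /var/run/kubernetes/join.sh | awk '{print $2}' | cut -d. -f1)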

Allow kubeadm to read config maps like cluster-info, kubeadm-config, etc.:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeadm:bootstrap-signer-clusterinfo
  namespace: kube-public
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubeadm:bootstrap-signer-clusterinfo
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeadm:bootstrap-signer-clusterinfo
  namespace: kube-public
rules:
- apiGroups:
  - ''
  resources:
  - configmaps
  verbs:
  - get
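
These RBAC manifests (this one and the ones further down) are applied from the master with plain kubectl; for example, assuming this block is saved as clusterinfo-rbac.yaml (a hypothetical file name):

kubectl apply -f clusterinfo-rbac.yaml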

Then generate the configs (a cluster-info kubeconfig and a kubeadm ClusterConfiguration):

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: $(base64 -iw0 /var/run/kubernetes/server-ca.crt)
    server: https://${MASTER_IP}:6443/
  name: ''
contexts: []
current-context: ''
kind: Config
preferences: {}
users: []
---
apiServer:
  timeoutForControlPlane: 2m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: local-up-cluster
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: ${KUBE_VERSION}
networking:
  dnsDomain: cluster.local
  serviceSubnet: ${SERVICE_CLUSTER_IP_RANGE}
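
A sketch (assumed here, not spelled out in this comment) of how these two documents would land in the cluster as the ConfigMaps kubeadm join looks up, assuming they are saved as cluster-info.yaml and kubeadm-config.yaml (hypothetical file names):

kubectl create configmap cluster-info -n kube-public \
  --from-file=kubeconfig=cluster-info.yaml
kubectl create configmap kubeadm-config -n kube-system \
  --from-file=ClusterConfiguration=kubeadm-config.yaml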

The kubelet also needs some permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet:operate
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet:operate
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet:operate
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
  namespace: kube-system
rules:
- apiGroups:
  - ''
  resourceNames:
  - kubeadm-config
  - kube-proxy
  - kubelet-config
  resources:
  - configmaps
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubeadm:bootstrap-signer-kubeadm-config
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:bootstrap:${token_id}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
rules:
- apiGroups:
  - ''
  resources:
  - nodes
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeadm:bootstrap-signer-kubeadm-config
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:bootstrap:${token_id}

Create config maps based on the existing config:

  sed "s/master-node/''/" /var/run/kubernetes/kube-proxy.yaml > /var/run/kubernetes/config.conf
  kubectl delete cm -n kube-system kube-proxy |:
  kubectl create cm -n kube-system --from-file=/var/run/kubernetes/config.conf kube-proxy

  cp -f /var/run/kubernetes/kubelet.yaml /var/run/kubernetes/kubelet
  kubectl delete cm -n kube-system kubelet-config |:
  kubectl create cm -n kube-system --from-file=/var/run/kubernetes/kubelet kubelet-config

On the worker nodes we need two systemd service files, kube-proxy first and then kubelet:

[Unit]
Description=Kube proxy

Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/vagrant/github.com/kubernetes/kubernetes/_output/local/bin/linux/amd64/kube-proxy \
--v=3 \
--config=/var/run/kubernetes/config.conf \
--master="https://${MASTER_IP}:6443"
Restart=on-failure
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

[Unit]
Description=Kubelet

Wants=kube-proxy
After=kube-proxy

[Service]
ExecStart=/vagrant/github.com/kubernetes/kubernetes/_output/local/bin/linux/amd64/kubelet \
--address=0.0.0.0 \
--hostname-override=$(hostname) \
--pod-cidr="${POD_CIDR}" \
--node-ip="${NODE_IP}" \
--register-node=true \
--v=3 \
--bootstrap-kubeconfig=/var/run/kubernetes/admin.kubeconfig \
--kubeconfig=/var/run/kubernetes/admin.kubeconfig \
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
--client-ca-file=/var/run/kubernetes/client-ca.crt \
--config=/var/run/kubernetes/kubelet.yaml
Restart=no
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
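
Assuming the two units above are installed as /etc/systemd/system/kube-proxy.service and /etc/systemd/system/kubelet.service (illustrative paths), systemd has to be told about them once before the final step:

systemctl daemon-reload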

Finally, start the components on the worker node:

systemctl restart kube-proxy
sh /var/run/kubernetes/join.sh

The end result:

kubectl get no -o wide
NAME            STATUS   ROLES    AGE     VERSION                                    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
master-node     Ready    <none>   3m51s   v1.27.0-alpha.0.367+1fe7f09b46850f-dirty   192.168.56.10   <none>        Ubuntu 22.04.1 LTS   5.15.0-57-generic   containerd://1.6.8
worker-node01   Ready    <none>   2m22s   v1.27.0-alpha.0.367+1fe7f09b46850f-dirty   192.168.56.11   <none>        Ubuntu 22.04.1 LTS   5.15.0-57-generic   containerd://1.6.8
kubectl get po -Ao wide
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE            NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-57b57c56f-chbnq   1/1     Running   0          2m49s   192.168.0.3     master-node     <none>           <none>
kube-system   calico-node-hlxqg                         1/1     Running   0          2m49s   192.168.56.10   master-node     <none>           <none>
kube-system   calico-node-jx4gr                         1/1     Running   0          2m29s   192.168.56.11   worker-node01   <none>           <none>
kube-system   coredns-6846b5b5f-qqhx8                   1/1     Running   0          4m27s   192.168.0.2     master-node     <none>           <none>

I know it is a bit complicated, but automation is the key here. I have created a Vagrant-based setup with a few simple commands: start, network && join, and member (on the worker node): https://github.com/mhmxs/vagrant-kubeadm-kubernetes

My environment does a few other tricks, like creating an NFS share on the master, configuring the network and DNS, and downloading dependencies to decrease startup time. I have spent a few days figuring these out :)

Does it make sense to write this handbook? Where should the docs be located?

@BenTheElder
Member

I think this is the fastest way to develop Kubernetes. In summary: small change, huge benefit :)

FWIW: https://kind.sigs.k8s.io/ can run a local multi-node cluster and runs a faster subset of the build. kind build node-image && kind create cluster --image=kindest/node:latest --config=multinode.yaml
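
For reference, the multinode.yaml referred to there can be as small as this (a sketch using kind's documented cluster config format):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker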


Does it make sense to write this handbook? Where should the docs be located?

Why would we use kubeadm join and not kubeadm init? I'm not sure this is a supported use-case for kubeadm either.

I'm not objecting to the cluster-up script changes, I leave that to @dims, but I do think cluster-up serves an important role as a minimum viable bootstrap without kubeadm etc. Whereas I'd expect fully-kubeadm when using kubeadm ... and I think we have docs for this. cc @neolit123

@mhmxs
Contributor Author

mhmxs commented Jan 27, 2023

FWIW: https://kind.sigs.k8s.io/ can run a local multi-node cluster and runs a faster subset of the build. kind build node-image && kind create cluster --image=kindest/node:latest --config=multinode.yaml

On the other hand, I'm a storage engineer and I need a separate kernel on each node. I can agree that my use-case doesn't fit everybody. The only significant change here is the --enable-bootstrap-token-auth part.

I'm not sure this is a supported use-case for kubeadm either.

That's why I'm not sure the documentation makes any sense for the community. (And this part is located in my private repo.)

But I'm sure my change request would help other kernel-space devs create multi-node clusters.

Correct me if I'm wrong @BenTheElder, but kubeadm uses images rather than the raw binaries, so I would have to distribute locally built images to the nodes or build the images on the nodes.

@BenTheElder
Member

On the other hand, I'm a storage engineer and I need a separate kernel on each node. I can agree that my use-case doesn't fit everybody. The only significant change here is the --enable-bootstrap-token-auth part.

That's a good use-case for other solutions indeed. We used to have a vagrant local-devel option in-tree before minikube, but at this point they're all out of tree. I'm not sure where we'd put this.

Fix me if i'm wrong @BenTheElder, but kubeadm uses images not the raw binaries, so i have to distribute locally built images to the nodes or have to build images on nodes.

From your prior comments I thought you were spinning up all nodes from source and joining the additional nodes to local-up-cluster with kubeadm join, in which case you'd still need that for the kubeadm-joined nodes ...?

@mhmxs
Contributor Author

mhmxs commented Jan 27, 2023

From your prior comments I thought you were spinning up all nodes from source and joining the additional nodes to local-up-cluster with kubeadm join, in which case you'd still need that for the kubeadm-joined nodes ...?

I spin up the single-node cluster with local-up-cluster.sh and join additional nodes via kubeadm join. This solution came up first when I was looking for options to join a node. If there is an easier way, I'm glad to give it a try, but this is how we do it in production.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Jun 26, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
