# Bare Metal Support

***Bare metal support is experimental, and may be removed at any time***

## Introduction

kOps has some experimental bare-metal support, specifically for nodes. The idea
we are exploring is that you run your control plane in a cloud, but join
physical machines as nodes to that control plane, even though those nodes are
not located in the cloud.

This approach has some limitations and complexities - for example, the
cloud-controller-manager for the control plane won't be able to attach volumes
to the nodes, because they aren't cloud VMs. The advantage is that we can
implement bare-metal support for nodes first, before tackling the complexities
of the control plane.

## Walkthrough

Create a "normal" kOps cluster; here we are using GCE: | ||
|
||
``` | ||
kops create cluster foo.k8s.local --cloud gce --zones us-east4-a | ||
kops update cluster --yes --admin | ||
``` | ||
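
Optionally, wait for the control plane to become healthy before continuing; one way to do this (the 10 minute timeout is an arbitrary choice) is:

```
kops validate cluster --wait 10m
```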

Create a kops-system namespace to hold the secrets that are generated as part
of joining a machine. Although these are sensitive, they aren't technically
secrets, because they only hold public keys, so we will create a CRD for this
in future:

```
kubectl create ns kops-system
```

Create a RoleBinding and Role to allow kops-controller to read secrets:

```
kubectl apply --server-side -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kops-controller
  namespace: kops-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kops-controller:pki-verifier
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:serviceaccount:kube-system:kops-controller
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kops-controller:pki-verifier
  namespace: kops-system
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
EOF
```
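
To verify the permissions, you can impersonate the kops-controller service account; a quick check along these lines should print "yes":

```
kubectl auth can-i get secrets -n kops-system \
  --as system:serviceaccount:kube-system:kops-controller
```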

### Create a VM

When first trying this out, we recommend creating a local VM instead of a true
bare-metal machine.

```
mkdir vm1
cd vm1
wget -O debian11.qcow2 https://cloud.debian.org/images/cloud/bullseye/20231013-1532/debian-11-nocloud-amd64-20231013-1532.qcow2
qemu-img create -o backing_file=debian11.qcow2,backing_fmt=qcow2 -f qcow2 vm1-root.qcow2 10G
qemu-system-x86_64 \
  -smp 2 \
  -enable-kvm \
  -netdev user,id=net0,net=192.168.76.0/24,dhcpstart=192.168.76.9,hostfwd=tcp::2222-:22 \
  -device rtl8139,netdev=net0 \
  -m 4G \
  -drive file=vm1-root.qcow2,if=virtio,format=qcow2 \
  -nographic -serial mon:stdio
```
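
The `-enable-kvm` flag requires hardware virtualization support on the host. If QEMU refuses to start, a quick sanity check (assuming a Linux host) is:

```
# A non-zero count means the CPU exposes VT-x/AMD-V; /dev/kvm must also exist
grep -c -E 'vmx|svm' /proc/cpuinfo
ls -l /dev/kvm
```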

Now log in as root (no password is required), and set up SSH and the machine name:

```
ssh-keygen -A
systemctl restart sshd
echo "vm1" > /etc/hostname
hostname vm1
```
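
If you want to confirm that sshd came up before continuing, you can check its status from inside the VM:

```
systemctl status sshd --no-pager
```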

Currently the `kops toolbox enroll` command only supports SSH agents for the
private key; get your public key from `ssh-add -L` on the host, and then
manually add it to the `authorized_keys` file on the VM.

```
mkdir ~/.ssh/
vim ~/.ssh/authorized_keys
```
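
On the host, print the key(s) currently loaded in your agent, then copy the relevant line and paste it into that file:

```
ssh-add -L
```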

After you've done this, open a new terminal on the host; SSH should now work:
`ssh root@127.0.0.1 -p 2222 uptime`

### Joining the VM to the cluster

From a checkout of the kOps source tree, enroll the machine:

```
go run ./cmd/kops toolbox enroll --cluster foo.k8s.local --instance-group nodes-us-east4-a --ssh-user root --host 127.0.0.1 --ssh-port 2222
```

Within a minute or so, the node should appear in `kubectl get nodes`.
If it doesn't, first check the kops-configuration log:
`ssh root@127.0.0.1 -p 2222 journalctl -u kops-configuration`

If that looks OK (it should end in "success"), check the kubelet log:
`ssh root@127.0.0.1 -p 2222 journalctl -u kubelet`.

### The state of the node

You should observe that the node is running, and pods are scheduled to the node.

```
kubectl get pods -A --field-selector spec.nodeName=vm1
```
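
To see the addresses, OS image, and container runtime that the node reports (assuming it registered as `vm1`, per this walkthrough):

```
kubectl get node vm1 -o wide
```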

Cilium will likely be running on the node.

The GCE PD CSI driver is scheduled, but is likely crash-looping because it
can't reach the GCE metadata service. You can see this from the container logs
on the VM in `/var/log/containers`
(e.g. `ssh root@127.0.0.1 -p 2222 cat /var/log/containers/*gce-pd-driver*.log`).

If you try to use `kubectl logs`, you will see an error like the one below,
which indicates another problem - that the control plane cannot reach the
kubelet:
`Error from server: Get "https://192.168.76.9:10250/containerLogs/gce-pd-csi-driver/csi-gce-pd-node-l2rm8/csi-driver-registrar": dial tcp 192.168.76.9:10250: i/o timeout`

### Cleanup

Quit the qemu VM with `Ctrl-a x`.

Delete the node and its secret:

```
kubectl delete node vm1
kubectl delete secret -n kops-system vm1
```
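
To confirm the cleanup, check that the node and its secret are gone:

```
kubectl get nodes
kubectl get secrets -n kops-system
```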

If you're also done with the cluster, delete it:

```
kops delete cluster foo.k8s.local --yes
```