This repo was templated from https://github.com/k8s-at-home/template-cluster-k3s and reduced/modified to match my own needs.
The following components will be installed in the k3s cluster by default.
- cert-manager - Operator to request SSL certificates and store them as Kubernetes resources
- calico - CNI (container network interface)
- flux - GitOps tool for deploying manifests from the `cluster` directory
- hajimari - start page with ingress discovery
- kube-vip - layer 2 load balancer for the Kubernetes control plane
- local-path-provisioner - local storage class provided by k3s
- nfs-subdir-external-provisioner - NFS-backed default storage class (an existing NFS server must be set up beforehand)
- metallb - bare metal load balancer
- reloader - restarts pods when a Kubernetes `configmap` or `secret` changes
- system-upgrade-controller - upgrades k3s
- traefik - ingress controller
- kube-prometheus-stack - Prometheus operator with Alertmanager
- blackbox-exporter - Prometheus exporter to monitor HTTP/ICMP endpoints
- alertmanager-discord - allows alerting to Discord
- grafana - web dashboard to visualize Prometheus metrics
- home-assistant - Open source home automation platform
- Tandoor Recipes - recipe manager for storing and organizing digital recipes
- pi-hole - a DNS sinkhole that protects your devices from unwanted content without installing any client-side software
- Argo CD - a declarative, GitOps continuous delivery tool for Kubernetes
- Miniflux - a minimalist and opinionated feed reader
For provisioning, the following tools will be used:
- Ubuntu - this is a pretty universal operating system that supports running all kinds of home related workloads in Kubernetes
- Ansible - this will be used to provision the Ubuntu operating system to be ready for Kubernetes and also to install k3s
- One or more Raspberry Pis with a fresh install of Ubuntu Server 20.04.
- Some experience in debugging problems and a positive attitude ;)
Tools that need to be installed on the local workstation:
| Tool | Purpose |
|---|---|
| `ansible` | Preparing Ubuntu for Kubernetes and installing k3s |
| `direnv` | Exports env vars based on the present working directory |
| `flux` | Operator that manages your k8s cluster based on your Git repository |
| `age` | A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability |
| `go-task` | A task runner / simpler Make alternative written in Go (`snap install task --classic`) |
| `ipcalc` | Used to verify settings in the configure script |
| `jq` | Used to verify settings in the configure script |
| `kubectl` | Allows you to run commands against Kubernetes clusters |
| `sops` | Encrypts k8s secrets with Age |
| `helm` | Manage Kubernetes applications |
| `kustomize` | Template-free way to customize application configuration |
| `pre-commit` | Runs checks before each git commit |
| `prettier` | An opinionated code formatter |
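To illustrate how direnv fits in, an `.envrc` at the repo root might look like the following sketch. The variable values are assumptions for illustration, not this repo's actual file:

```shell
# Hypothetical .envrc for this repo, loaded automatically by direnv
# when you cd into the directory. Paths are illustrative.
export KUBECONFIG="$(pwd)/provision/kubeconfig"
export SOPS_AGE_KEY_FILE="$HOME/.config/sops/age/keys.txt"
```

After editing `.envrc`, run `direnv allow` once to authorize it.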
It is advisable to install pre-commit and the pre-commit hooks that come with this repository. The sops-pre-commit hook will make sure you are not accidentally committing your secrets unencrypted.
After pre-commit is installed on your machine, run:

```shell
pre-commit install-hooks
```
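For reference, the sops-pre-commit hook is wired up via `.pre-commit-config.yaml`; a minimal sketch might look like this (the `rev` and hook `id` shown are assumptions — check the upstream repo for the current values):

```yaml
# Sketch of a .pre-commit-config.yaml entry; rev and hook id are assumptions.
repos:
  - repo: https://github.com/k8s-at-home/sops-pre-commit
    rev: v2.1.1
    hooks:
      - id: forbid-secrets
```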
The Git repository contains the following directories under `cluster`, ordered below by how Flux will apply them.

- **base** directory is the entrypoint to Flux
- **crds** directory contains custom resource definitions (CRDs) that need to exist globally in your cluster before anything else is deployed
- **core** directory (depends on **crds**) contains important infrastructure applications (grouped by namespace) that should never be pruned by Flux
- **apps** directory (depends on **core**) is where your common applications (grouped by namespace) are placed; Flux will prune resources here if they are no longer tracked by Git
```
cluster
├── apps
│   ├── default
│   ├── kube-system
│   ├── monitoring
│   ├── networking
│   └── system-upgrade
├── base
│   └── flux-system
├── core
│   ├── cert-manager
│   ├── metallb-system
│   ├── namespaces
│   ├── nfs-subdir-external-provisioner
│   └── system-upgrade
└── crds
    ├── traefik
    └── cert-manager
```
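The `crds` → `core` → `apps` ordering is typically expressed with Flux `Kustomization` resources and `dependsOn`; a sketch of what the `core` Kustomization might contain (resource names and the interval are illustrative, not copied from this repo):

```yaml
# Illustrative Flux Kustomization; names and interval are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: core
  namespace: flux-system
spec:
  interval: 10m
  path: ./cluster/core
  prune: false           # core apps should never be pruned by Flux
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: crds         # wait for CRDs before applying core
```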
Create an Age private/public key pair for encrypting and decrypting secrets.

- Create an Age private/public key pair

```shell
age-keygen -o age.agekey
```

- Set up the directory for the Age key and move the key file into it

```shell
mkdir -p ~/.config/sops/age
mv age.agekey ~/.config/sops/age/keys.txt
```

- Export `SOPS_AGE_KEY_FILE` in `~/.zshrc` and source it

```shell
echo "export SOPS_AGE_KEY_FILE=~/.config/sops/age/keys.txt" >> ~/.zshrc
source ~/.zshrc
```

- Fill out the Age public key in `.config.env` under `BOOTSTRAP_AGE_PUBLIC_KEY`; note the public key should start with `age`...
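SOPS picks up the Age recipient from a `.sops.yaml` at the repo root; a sketch of such a file (the regexes and placeholder key are illustrative — the configure script generates the real one):

```yaml
# Illustrative .sops.yaml; path_regex and the placeholder recipient are assumptions.
creation_rules:
  - path_regex: .*\.sops\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyreplaceme
```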
The `.config.env` file contains the configuration variables needed by Ansible and Flux.

- Start filling out all the environment variables. All are required; read the comments, as they explain what each one needs.
- Copy the `tmpl` folder from k8s-at-home/template-cluster-k3s into the root directory of this repository
- Verify the configuration is correct:

```shell
./configure.sh --verify
```

(If the ssh test fails, make sure your ssh key is copied onto the servers.)
- Run the configure script to wire up the templated files and place them where they need to be:

```shell
./configure.sh
```
(Nodes are not security hardened by default; you can do this with dev-sec/ansible-collection-hardening or something similar.)
- Install the dependencies:

```shell
task ansible:deps
```
- Verify Ansible can view your config:

```shell
task ansible:list
```
- Verify Ansible can ping your nodes:

```shell
task ansible:adhoc:ping
```
- Finally, run the Ubuntu Prepare playbook:

```shell
task ansible:playbook:ubuntu-prepare
```
- If everything goes as planned, you should see Ansible running the Ubuntu Prepare playbook against your nodes.
- Must be done manually for now: on every Raspberry Pi, open `/boot/firmware/cmdline.txt` and add `cgroup_enable=memory cgroup_memory=1` to the kernel command line, then reboot. See https://rancher.com/docs/k3s/latest/en/advanced/#enabling-cgroups-for-raspbian-buster
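If you prefer to script the edit, here is a minimal idempotent sketch. The helper name is hypothetical, and `/boot/firmware/cmdline.txt` is the usual location on Ubuntu for Raspberry Pi, but verify on your image; run as root on each Pi:

```shell
# add_cgroup_flags is a hypothetical helper: appends the cgroup flags
# to the (single-line) kernel command line file exactly once.
add_cgroup_flags() {
  file="$1"
  flags="cgroup_enable=memory cgroup_memory=1"
  if ! grep -q 'cgroup_enable=memory' "$file"; then
    # cmdline.txt is a single line; append the flags to its end
    sed -i "s/\$/ $flags/" "$file"
  fi
}
```

Usage: `add_cgroup_flags /boot/firmware/cmdline.txt`, then reboot. Re-running it is safe because the `grep` guard skips files that already contain the flags.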
📍 Here we will be running an Ansible playbook to install k3s with this wonderful k3s Ansible galaxy role. After completion, Ansible will drop a kubeconfig in `./provision/kubeconfig` for interacting with your cluster via `kubectl`. Copy the kubeconfig to `~/.kube/`.
- Verify Ansible can view your config:

```shell
task ansible:list
```

- Verify Ansible can ping your nodes:

```shell
task ansible:adhoc:ping
```

- Run the k3s install playbook:

```shell
task ansible:playbook:k3s-install
```
- Verify the nodes are online:

```shell
kubectl --kubeconfig=./provision/kubeconfig get nodes
# NAME     STATUS   ROLES                  AGE     VERSION
# k8s-0    Ready    control-plane,master   4d20h   v1.21.5+k3s1
# k8s-1    Ready    <none>                 4d20h   v1.21.5+k3s1
# k8s-2    Ready    <none>                 4d20h   v1.21.5+k3s1
```
- Optionally label the workers so their role is displayed correctly:

```shell
kubectl --kubeconfig=./provision/kubeconfig label node k8s-1 node-role.kubernetes.io/worker=''
kubectl --kubeconfig=./provision/kubeconfig label node k8s-2 node-role.kubernetes.io/worker=''
kubectl --kubeconfig=./provision/kubeconfig get nodes
# NAME     STATUS   ROLES                  AGE     VERSION
# k8s-0    Ready    control-plane,master   4d20h   v1.21.5+k3s1
# k8s-1    Ready    worker                 4d20h   v1.21.5+k3s1
# k8s-2    Ready    worker                 4d20h   v1.21.5+k3s1
```
- Verify Flux can be installed:

```shell
flux --kubeconfig=./provision/kubeconfig check --pre
# ► checking prerequisites
# ✔ kubectl 1.21.5 >=1.18.0-0
# ✔ Kubernetes 1.21.5+k3s1 >=1.16.0-0
# ✔ prerequisites checks passed
```
- Pre-create the `flux-system` namespace:

```shell
kubectl --kubeconfig=./provision/kubeconfig create namespace flux-system --dry-run=client -o yaml | kubectl --kubeconfig=./provision/kubeconfig apply -f -
```
- Add the Age key so Flux can decrypt SOPS secrets:

```shell
cat ~/.config/sops/age/keys.txt |
kubectl --kubeconfig=./provision/kubeconfig -n flux-system create secret generic sops-age \
    --from-file=age.agekey=/dev/stdin
```
- Verify all the above files are encrypted with SOPS. Commands for encryption / decryption:

```shell
# Decrypt secrets
sops --decrypt cluster/base/cluster-secrets.sops.yaml > cluster/base/cluster-secrets.yaml
# Encrypt secrets
sops --encrypt cluster/base/cluster-secrets.yaml > cluster/base/cluster-secrets.sops.yaml
```
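A quick way to verify is to check for the `sops:` metadata block that SOPS writes into every file it encrypts; a minimal sketch (the helper name is an assumption):

```shell
# check_encrypted is a hypothetical helper: succeeds only if the file
# contains the top-level "sops:" metadata block added on encryption.
check_encrypted() {
  grep -q '^sops:' "$1"
}

# Example: report any *.sops.yaml files that were never encrypted.
find cluster -name '*.sops.yaml' 2>/dev/null | while read -r f; do
  check_encrypted "$f" || echo "NOT ENCRYPTED: $f"
done
```

This is only a heuristic; the sops-pre-commit hook from earlier is the more robust guard.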
- Push your changes to git:

```shell
git add -A
git commit -m "initial commit"
git push
```
- Install Flux. Due to race conditions with the Flux CRDs you will have to run the command below twice; there should be no errors on the second run.

```shell
# Optional:
cp ./provision/kubeconfig ~/.kube/config && chmod 0600 ~/.kube/config

kubectl --kubeconfig=./provision/kubeconfig apply --kustomize=./cluster/base/flux-system
# namespace/flux-system configured
# customresourcedefinition.apiextensions.k8s.io/alerts.notification.toolkit.fluxcd.io created
# ...
# unable to recognize "./cluster/base/flux-system": no matches for kind "Kustomization" in version "kustomize.toolkit.fluxcd.io/v1beta1"
# unable to recognize "./cluster/base/flux-system": no matches for kind "GitRepository" in version "source.toolkit.fluxcd.io/v1"
# unable to recognize "./cluster/base/flux-system": no matches for kind "HelmRepository" in version "source.toolkit.fluxcd.io/v1"
# unable to recognize "./cluster/base/flux-system": no matches for kind "HelmRepository" in version "source.toolkit.fluxcd.io/v1"
# unable to recognize "./cluster/base/flux-system": no matches for kind "HelmRepository" in version "source.toolkit.fluxcd.io/v1"
# unable to recognize "./cluster/base/flux-system": no matches for kind "HelmRepository" in version "source.toolkit.fluxcd.io/v1"
```
Workaround for the error `GitRepository/flux-system.flux-system - Reconciler error auth secret error: Secret "flux-system" not found`:

```shell
export GITHUB_USER=<GITHUB_USER>
export GITHUB_TOKEN=<GITHUB_TOKEN>
flux bootstrap github --owner=$GITHUB_USER --repository=k3s-rpi-gitops --branch=main --path=./cluster/base --personal --private=false
```
- Verify Flux components are running in the cluster:

```shell
kubectl --kubeconfig=./provision/kubeconfig get pods -n flux-system
# NAME                                       READY   STATUS    RESTARTS   AGE
# helm-controller-5bbd94c75-89sb4            1/1     Running   0          1h
# kustomize-controller-7b67b6b77d-nqc67      1/1     Running   0          1h
# notification-controller-7c46575844-k4bvr   1/1     Running   0          1h
# source-controller-7d6875bcb4-zqw9f         1/1     Running   0          1h
```
Update the `channel` spec of `cluster/apps/system-upgrade-system-upgrade-controller/server-plan.yaml` and `cluster/apps/system-upgrade-system-upgrade-controller/agent-plan.yaml` to the desired version.
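For reference, the `channel` field lives in the system-upgrade-controller `Plan` spec; a sketch of what the server plan might contain (the metadata, concurrency, and node selector are illustrative — only `channel` is the field being updated here):

```yaml
# Illustrative system-upgrade-controller Plan; names and concurrency are assumptions.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  # Point this at the desired k3s release channel, or pin a specific version
  channel: https://update.k3s.io/v1-release/channels/stable
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/master
        operator: Exists
  upgrade:
    image: rancher/k3s-upgrade
```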
- Download the `v<UPDATE_VERSION>` operator manifest:

```shell
curl -L -o tigera-operator.yaml https://raw.githubusercontent.com/projectcalico/calico/<VERSION>/manifests/tigera-operator.yaml
```

- Use the following command to initiate the upgrade:

```shell
kubectl replace -f tigera-operator.yaml
```
TODO
```shell
# Force reconciliation
flux reconcile helmrelease <HELMRELEASE> -n <NAMESPACE>
flux reconcile kustomization apps

# Get statuses of flux resources
flux get all
flux get helmrelease -A

# Follow flux logs
flux logs --level=error
flux logs --follow

# Monitor helm-controller (klf is an alias for `kubectl logs --follow`)
kubectl get pods -n flux-system
klf -n flux-system helm-controller-55896d6ccf-d9w8p
# klf -n flux-system $(kubectl get pods -n flux-system | grep -E 'helm-controller.*Running' | cut -d ' ' -f1)
```
```shell
# Fix message: "Helm upgrade failed: another operation (install/upgrade/rollback) is in progress"
helm delete <HELMRELEASE> -n <NAMESPACE>
flux reconcile helmrelease <HELMRELEASE> -n <NAMESPACE>
flux delete helmrelease <HELMRELEASE> -n <NAMESPACE>
flux reconcile source helm <HELMRELEASE>
```
```shell
# Fix error message (re-create the kustomization):
# ✗ Kustomization reconciliation failed: Secret/flux-system/cluster-secrets is SOPS encrypted, configuring decryption is required for this secret to be reconciled
flux create kustomization cluster-secrets --source=flux-system --path=./cluster/base --prune=true --interval=10m --decryption-provider=sops --decryption-secret=sops-age
```