This is an Azure Resource Manager template that automatically deploys a k3s cluster atop Ubuntu 18.04. The cluster has a single master VM and a VM scale set for workers/agents, plus the required network infrastructure.
The template defaults to deploying B-Series VMs (`B1ls`) with the smallest possible managed disk size (S4, 32GB). It also deploys (and mounts) an Azure File Share on all machines with (very) permissive access at `/srv`, which makes it quite easy to run stateful services.
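Since the share is mounted at the same path on every node, a stateful pod can use a plain `hostPath` volume and survive being rescheduled to another agent. A minimal sketch (the pod and directory names are illustrative, not part of the template):

```yaml
# Illustrative pod spec: data written to /data lands on the shared /srv mount,
# so it persists regardless of which agent the pod is scheduled on.
apiVersion: v1
kind: Pod
metadata:
  name: stateful-demo
spec:
  containers:
    - name: app
      image: nginx:alpine
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      hostPath:
        path: /srv/stateful-demo   # lives on the Azure File Share
        type: DirectoryOrCreate
```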
The key aspect of this template is that you can add and remove agents at will simply by resizing the VM scale set - the cluster comes with a few (very simple) helper scripts that allow nodes to join and leave the cluster as they are created/destroyed, and the `k3s` scheduler will redeploy pods as needed.
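Resizing the scale set is a single CLI call; a hedged example, assuming the resource group and scale set are named `k3s` and `agents` (check the `Makefile` for the names actually used):

```shell
# Scale the agent scale set to 5 instances (group/name are assumptions; see the Makefile).
az vmss scale --resource-group k3s --name agents --new-capacity 5

# List the resulting instances; new agents fetch a token and join on first boot.
az vmss list-instances --resource-group k3s --name agents --output table
```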
This was originally built as a Docker Swarm template, and even though Azure has a perfectly serviceable Kubernetes managed service, I enjoy the challenge of building my own stuff and fine-tuning it.
k3s is a breath of fresh air, and an opportunity to play around with a simpler, slimmer version of Kubernetes (and break it to see what happens).
Also, a lot of the ARM templating involved (for metrics, managed identities, etc.) lacks comprehensive samples, so this was also a way for me to provide a fully working one.
- air-gapped (i.e., standalone) install without
- upgrade to `k3s` 0.6.0 and test its metrics server
- WIP: sample deployments/pods/charts
- WIP: simple Python scale-down helper inspired by this C# sample
- re-usable user-assigned service identity instead of system (per-machine)
- Managed Service Identity for master and role allocations to allow it to manage the scaleset (and the rest of the resource group)
- add Linux Monitoring Extension (3.x) to master and agents (visible in the "Guest (classic)" metrics namespace in Azure Portal)
- scratch folder on agents' temporary storage volume (on-hypervisor SSD), available as
- set timezone and `kubectl` in master node
- remove scale set load balancer (everything must go through `traefik` on the master)
- re-enable first-time reboot after OS package updates
- private registry on master node
- trivial ingress through master node (built-in)
- Set node role labels
- `k3s` token to agents
- remove unused packages from
- remove unnecessary commands from
- remove unnecessary files from repo and trim history
- fork, new
- `make keys` - generates an SSH key for provisioning
- `make deploy-storage` - deploys shared storage
- `make params` - generates ARM template parameters
- `make deploy-compute` - deploys cluster resources and pre-provisions Docker on all machines
- `make view-deployment` - view deployment progress
- `make list-agents` - lists all agent VMs
- `make scale-agents-<number>` - scales the agent VM scale set to `<number>` instances, e.g. `make scale-agents-10` will resize it (up or down) to 10 VMs
- `make stop-agents` - stops all agents
- `make start-agents` - starts all agents
- `make reimage-agents-parallel` - nukes and paves all agents
- `make reimage-agents-serial` - reimages all agents in sequence
- `make chaos-monkey` - restarts all agents in random order
- `make proxy` - opens an SSH session to `master0` and sets up TCP forwarding
- `make tail-helper` - opens an SSH session to `master0` and tails the helper log
- `make list-endpoints` - list DNS aliases
- `make destroy-cluster` - destroys the entire cluster
```bash
az login
make keys
make deploy-storage
make params
make deploy-compute
make view-deployment
# Go to the Azure portal and check the deployment progress
# Clean up after we're done working
make destroy-cluster
```
- The Azure CLI (`pip install -U -r requirements.txt` will install it)
- `make` (you can just read through the `Makefile` and type the commands yourself)
`master0` runs a very simple HTTP server (only accessible inside the cluster) that provides tokens for new VMs to join the cluster and an endpoint for them to signal that they're leaving. That server also cleans up the node table once agents are gone.
Upon provisioning, all agents try to obtain a token and join the cluster. Upon rebooting, they signal that they're leaving the cluster and then re-join it.
This is done in the simplest possible way, by using `cloud-init` to bootstrap a few helper scripts that are invoked upon shutdown and (re)boot. Check the YAML files for details.
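The boot/shutdown hooks amount to a couple of HTTP calls against the helper server. A rough sketch - the port, paths, and query parameter here are assumptions for illustration; the real ones live in the `cloud-init` YAML files:

```shell
# on-boot (sketch): fetch a join token from master0's helper and join the cluster.
TOKEN=$(curl -fsS http://master0:1337/token)   # port and path are assumptions
k3s agent --server https://master0:6443 --token "$TOKEN" &

# on-shutdown (sketch): tell the helper this node is leaving so it can
# remove it from the node table.
curl -fsS "http://master0:1337/leave?node=$(hostname)"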
To avoid using VM extensions (which are nice, but opaque to most people used to working with `cloud-init`) and to ensure each fresh deployment runs the latest Docker version, VMs are provisioned using `customData` in their respective ARM templates. `cloud-init` files and SSH keys are then packed into the JSON parameters file and submitted as a single provisioning transaction, and upon first boot Ubuntu takes the `cloud-init` file and provisions the machine accordingly.
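The packing step is just base64 encoding - ARM decodes `customData` and hands it to `cloud-init` on first boot. A minimal sketch (the file name is illustrative):

```shell
# Sketch: produce a customData value from a cloud-init file.
# The file name here is an assumption; the repo's Makefile does the real packing.
printf '#cloud-config\npackage_update: true\n' > cloud-init-agent.yml

CUSTOM_DATA=$(base64 -w0 cloud-init-agent.yml)   # goes into the parameters JSON

# Round-trip check: decoding yields the original cloud-init header.
echo "$CUSTOM_DATA" | base64 -d | head -n1       # prints "#cloud-config"
```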
See `azure-docker-swarm-cluster` for more details.
Keep in mind that this was written for conciseness and ease of experimentation; look to AKS for a production service.