
Kubernetes not the hardest way (or "Provisioning a Kubernetes Cluster on AWS using Terraform and Ansible")

A worked example that provisions a Kubernetes cluster on AWS from scratch, using Terraform and Ansible. It is a scripted version of the well-known tutorial Kubernetes the hard way.

See the companion article for details about goals, design decisions and simplifications.

  • 3 EC2 instances for HA Kubernetes Control Plane: Kubernetes API, Scheduler and Controller Manager
  • 3 EC2 instances for etcd cluster
  • 3 EC2 instances as Kubernetes Workers (aka Minions or Nodes)
  • Kubenet Pod networking (using CNI)
  • HTTPS between components and control API
  • Sample nginx service deployed to check everything works

This is a learning tool, not a production-ready setup.


Requirements on control machine:

  • Terraform (tested with Terraform 0.7.0; NOT compatible with Terraform 0.6.x)
  • Python (tested with Python 2.7.12; may not be compatible with older versions; requires Jinja2 2.8)
  • Python netaddr module
  • Ansible (tested with Ansible
  • cfssl and cfssljson
  • Kubernetes CLI
  • SSH Agent
  • (optionally) AWS CLI
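
Before starting, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the project: the helper name `missing_tools` is hypothetical, and the binary names assume default installations of each requirement.

```shell
# Print the name of every given command that is not found on PATH.
# `missing_tools` is a hypothetical helper, not part of this project.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Binary names assume default installs of the requirements listed above.
missing_tools terraform python ansible-playbook cfssl cfssljson kubectl ssh-add
```

An empty result means every listed command was found. This checks presence only, not the versions mentioned above.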

AWS Credentials

AWS KeyPair

You need a valid AWS Identity (.pem) file and the corresponding Public Key. Terraform imports the KeyPair into your AWS account. Ansible uses the Identity to SSH into machines.

Please read AWS Documentation about supported formats.

Terraform and Ansible authentication

Both Terraform and Ansible expect AWS credentials set in environment variables:

$ export AWS_ACCESS_KEY_ID=<access-key-id>
$ export AWS_SECRET_ACCESS_KEY="<secret-key>"

If you plan to use the AWS CLI, you also have to set AWS_DEFAULT_REGION.

Ansible expects the SSH identity loaded by SSH agent:

$ ssh-add <keypair-name>.pem
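
A missing credential usually surfaces only once Terraform or Ansible fails, so a quick pre-flight check can save a round trip. This is a sketch under the assumption that the two variables above are the only ones needed; `check_aws_env` is a hypothetical helper, not part of the project.

```shell
# Return non-zero and name the first missing variable if the AWS
# credentials Terraform and Ansible expect are not exported.
# `check_aws_env` is a hypothetical helper, not part of this project.
check_aws_env() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "AWS_ACCESS_KEY_ID is not set" >&2; return 1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_SECRET_ACCESS_KEY is not set" >&2; return 1
  fi
}

check_aws_env || echo "export the AWS credentials before running Terraform or Ansible"

# To confirm the SSH identity is loaded, list the agent's keys:
#   ssh-add -l
```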

Defining the environment

Terraform expects some variables to define your working environment:

  • control_cidr: The CIDR of your IP. All instances accept traffic only from this address. Note this is a CIDR, not a single IP. (mandatory)
  • default_keypair_public_key: Valid public key corresponding to the Identity you will use to SSH into VMs. e.g. "ssh-rsa" (mandatory)

Note that Instances and Kubernetes API will be accessible only from the "control IP". If you fail to set it correctly, you will not be able to SSH into machines or run Ansible playbooks.
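
Because control_cidr must be a CIDR rather than a bare address, a single control machine is typically written with a /32 suffix. The sketch below illustrates that conversion; `to_cidr` is a hypothetical helper and 203.0.113.10 is a documentation-range placeholder, not a value from this project.

```shell
# Turn a bare IPv4 address into a single-host CIDR; leave a value that
# already contains a prefix length untouched.
# `to_cidr` is a hypothetical helper, not part of this project.
to_cidr() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;
    *)   printf '%s/32\n' "$1" ;;
  esac
}

to_cidr 203.0.113.10   # prints 203.0.113.10/32
```

The helper does not validate the address; it only appends the suffix when none is present.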

You may optionally redefine:

  • default_keypair_name: AWS key-pair name for all instances. (Default: "k8s-not-the-hardest-way")
  • vpc_name: VPC Name. Must be unique in the AWS Account (Default: "kubernetes")
  • elb_name: ELB Name for Kubernetes API. Can only contain characters valid for DNS names. Must be unique in the AWS Account (Default: "kubernetes")
  • owner: Owner tag added to all AWS resources. No functional use. It becomes useful to filter your resources on AWS console if you are sharing the same AWS account with others. (Default: "kubernetes")

The easiest way is to create a terraform.tfvars variable file in the ./terraform directory. Terraform imports it automatically.

Sample terraform.tfvars:

default_keypair_public_key = "ssh-rsa AAA...zzz"
control_cidr = ""
default_keypair_name = "lorenzo-glf"
vpc_name = "Lorenzo ETCD"
elb_name = "lorenzo-etcd"
owner = "Lorenzo"

Changing AWS Region

By default, the project uses eu-west-1. To use a different AWS Region, set additional Terraform variables:

  • region: AWS Region (default: "eu-west-1").
  • zone: AWS Availability Zone (default: "eu-west-1a")
  • default_ami: Pick the AMI for the new Region from Ubuntu 16.04 LTS (xenial), HVM:EBS-SSD

You also have to edit ./ansible/hosts/ec2.ini, changing regions = eu-west-1 to the new Region.
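
The inventory edit can be scripted. This is a minimal sketch that assumes ec2.ini contains a single line starting with `regions =`, as the text above implies; `set_ec2_region` is a hypothetical helper, and us-east-1 is only an illustration. The Terraform variables region, zone and default_ami still have to be set separately.

```shell
# Rewrite the `regions = ...` line of an EC2 inventory ini file.
# `set_ec2_region` is a hypothetical helper; it writes to a temporary
# file and moves it into place, avoiding non-portable `sed -i`.
set_ec2_region() {
  file=$1 region=$2
  sed "s/^regions *=.*/regions = $region/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Example (run from the project root):
#   set_ec2_region ./ansible/hosts/ec2.ini us-east-1
```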

Provision infrastructure, with Terraform

Run Terraform commands from the ./terraform subdirectory.

$ terraform plan
$ terraform apply

Terraform outputs the public DNS name of the Kubernetes API and the public IPs of the Workers.

Apply complete! Resources: 12 added, 2 changed, 0 destroyed.

  kubernetes_api_dns_name =
  kubernetes_workers_public_ip =,,

You will need them later (you may show them at any moment with terraform output).
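
Since both values are needed in later steps, they can be captured into shell variables. The sketch below assumes local state (a terraform.tfstate file in the current directory) and that `terraform output <name>` prints the bare value, as with the Terraform version noted in the requirements; newer releases may format the value differently. `split_ips` is a hypothetical helper.

```shell
# Split a comma-separated list of IPs into one address per line.
# `split_ips` is a hypothetical helper, not part of this project.
split_ips() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Only query Terraform when it is installed and local state exists
# (assumption: the script is run from ./terraform after `terraform apply`).
if command -v terraform >/dev/null 2>&1 && [ -f terraform.tfstate ]; then
  api_dns=$(terraform output kubernetes_api_dns_name)
  workers=$(terraform output kubernetes_workers_public_ip)
  echo "API endpoint: $api_dns"
  split_ips "$workers"
fi
```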

Generated SSH config

Terraform generates ssh.cfg, an SSH configuration file, in the project directory. It is convenient for manually connecting to machines over SSH using node names (controller0...controller2, etcd0...2, worker0...2), but it is NOT used by Ansible.


$ ssh -F ssh.cfg worker0
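
The node names follow a regular pattern, so a command can be run on every machine with a small loop. This is a sketch only; `node_names` is a hypothetical helper that reproduces the naming scheme described above.

```shell
# Print the node names the generated ssh.cfg defines:
# controller0..2, etcd0..2, worker0..2.
# `node_names` is a hypothetical helper, not part of this project.
node_names() {
  for role in controller etcd worker; do
    for i in 0 1 2; do
      printf '%s%s\n' "$role" "$i"
    done
  done
}

# Example: print each machine's hostname (run from the project directory):
#   for n in $(node_names); do ssh -F ssh.cfg "$n" hostname; done
node_names
```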

Install Kubernetes, with Ansible

Run Ansible commands from the ./ansible subdirectory.

We have multiple playbooks.

Install and set up Kubernetes cluster

Install Kubernetes components and etcd cluster.

$ ansible-playbook infra.yaml

Set up Kubernetes CLI

Configure Kubernetes CLI (kubectl) on your machine, setting the Kubernetes API endpoint (as returned by Terraform).

$ ansible-playbook kubectl.yaml --extra-vars "kubernetes_api_endpoint=<kubernetes-api-dns-name>"

Verify all components and minions (workers) are up and running, using Kubernetes CLI (kubectl).

$ kubectl get componentstatuses
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-2               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
etcd-0               Healthy   {"health": "true"}

$ kubectl get nodes
NAME                                       STATUS    AGE
                                           Ready     6m
                                           Ready     6m
                                           Ready     6m
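
To check node readiness from a script rather than by eye, the STATUS column can be counted. This is a minimal sketch that assumes the three-column layout shown above, with the node name in the first column; `count_ready` is a hypothetical helper.

```shell
# Count the nodes whose STATUS column is exactly "Ready" in the output of
# `kubectl get nodes` read from standard input (the header row is skipped).
# `count_ready` is a hypothetical helper, not part of this project.
count_ready() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example:
#   kubectl get nodes | count_ready
```

With all three Workers up, the example would be expected to print 3.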

Set up Pod cluster routing

Set up additional routes for traffic between Pods.

$ ansible-playbook kubernetes-routing.yaml
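
The playbooks run so far have a fixed order, which can be written down as a dry run. The sketch below only prints the commands from this section for a given API DNS name and executes nothing; `playbook_commands` is a hypothetical helper.

```shell
# Print, in order, the playbook commands described so far for a given
# Kubernetes API DNS name. Nothing is executed.
# `playbook_commands` is a hypothetical helper, not part of this project.
playbook_commands() {
  printf 'ansible-playbook infra.yaml\n'
  printf 'ansible-playbook kubectl.yaml --extra-vars "kubernetes_api_endpoint=%s"\n' "$1"
  printf 'ansible-playbook kubernetes-routing.yaml\n'
}

playbook_commands "<kubernetes-api-dns-name>"
```

Printing rather than running keeps the verification step with kubectl between the second and third playbook a manual decision.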

Smoke test: Deploy nginx service

Deploy an nginx service inside Kubernetes.

$ ansible-playbook kubernetes-nginx.yaml

Verify pods and service are up and running.

$ kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP           NODE
nginx-2032906785-9chju   1/1       Running   0          3m
nginx-2032906785-anu2z   1/1       Running   0          3m
nginx-2032906785-ynuhi   1/1       Running   0          3m

$ kubectl get svc nginx --output=json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "nginx",
        "namespace": "default",
...

Retrieve the port nginx has been exposed on:

$ kubectl get svc nginx --output=jsonpath='{.spec.ports[0].nodePort}'

Now you should be able to access nginx default page:

$ curl http://<worker-0-public-ip>:<exposed-port>
<!DOCTYPE html>
<title>Welcome to nginx!</title>

The service is exposed on all Workers using the same port (see Workers public IPs in Terraform output).
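
Because every Worker exposes the same port, the smoke test can be repeated across all of them. This is a sketch under the assumption that the default nginx page contains the title shown above; `is_nginx_welcome` is a hypothetical helper, and the placeholders must be replaced by hand with the values from Terraform and kubectl.

```shell
# Succeed when standard input contains the nginx welcome page title.
# `is_nginx_welcome` is a hypothetical helper, not part of this project.
is_nginx_welcome() {
  grep -q 'Welcome to nginx!'
}

# Example, with the worker IPs and exposed port filled in by hand:
#   for ip in <worker-0-public-ip> <worker-1-public-ip> <worker-2-public-ip>; do
#     curl -s "http://$ip:<exposed-port>" | is_nginx_welcome && echo "$ip ok"
#   done
```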

Known simplifications

There are many known simplifications, compared to a production-ready solution:

  • Networking setup is very simple: ALL instances have a public IP (though only accessible from a configurable Control IP).
  • Infrastructure managed by direct SSH into instances (no VPN, no Bastion).
  • Very basic Service Account and Secret (to change them, modify: ./ansible/roles/controller/files/token.csv and ./ansible/roles/worker/templates/kubeconfig.j2)
  • No actual integration between Kubernetes and AWS.
  • No additional Kubernetes add-on (DNS, Dashboard, Logging...)
  • Simplified Ansible lifecycle. Playbooks support changes in a simplistic way, including possibly unnecessary restarts.
  • Instances use static private IP addresses
  • No stable private or public DNS naming (only dynamic DNS names, generated by AWS)