Warning This is a research prototype. Think before you deploy 😈
This Terraform configuration deploys a sandbox for experimenting with GKE Autopilot private clusters.
Because it is meant for exploration and demos, some parts are configured differently from what you'd expect to see in a production system. The most prominent deviations are:
- A lot of telemetry is collected. Logging and monitoring levels are set well above their default values.
- All Google Cloud resources for the cluster are deployed directly from this Terraform module with no extra dependencies.
- The latest versions of Terraform and the Terraform Google provider are used (see the sketch below for illustrative version constraints).
- Some resources are deployed using the google-beta provider.
- Input validation is done on a "best-effort" basis.
- No backwards compatibility should be expected.
You have been warned! It's good fun, though, so feel free to fork it and play around. GKE is pretty cool tech, in my opinion.
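To illustrate the version pinning mentioned above, here is a minimal sketch of a terraform settings block. The version constraints shown are assumptions for illustration only; the actual constraints live in 000-versions.tf.

terraform {
  required_version = ">= 1.5" # illustrative constraint, not the module's actual pin

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0" # illustrative constraint
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 5.0" # illustrative constraint
    }
  }
}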
GKE best practices and other related resources:
- Terraform for opinionated GKE clusters
- Autopilot vs Standard clusters feature comparison
- Best practices for GKE networking
- Harden your cluster's security
- Best practices for running cost-optimized Kubernetes applications on GKE. Includes a great summary checklist.
Although this deployment is meant for proof-of-concept and experimental work, it implements many of Google's cluster security recommendations.
- It is a private cluster, so the cluster nodes do not have public IP addresses.
- Cloud NAT is configured to allow the cluster nodes and pods to access the Internet, so container registries located outside Google Cloud can be used (see the sketch after this list).
- The cluster nodes use a user-managed least privilege service account.
- The cluster is subscribed to the Rapid release channel.
- VPC Flow Logs are enabled by default on the cluster subnetwork.
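The following is a minimal sketch of the kind of Cloud NAT configuration this implies. The resource names and the network reference are illustrative assumptions, not the module's actual definitions, which live in 050-nat.tf.

# Cloud Router to host the NAT gateway (names are illustrative)
resource "google_compute_router" "nat" {
  name    = "nat-router"
  region  = var.region
  network = google_compute_network.vpc.id # assumed VPC resource name
}

# NAT gateway so private nodes and pods can reach the Internet
resource "google_compute_router_nat" "nat" {
  name                               = "nat-gateway"
  router                             = google_compute_router.nat.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}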
Some other aspects, which had to be configured explicitly when this sandbox deployed Standard GKE clusters, are now pre-configured by GKE Autopilot, but it's still useful to remember what they are:
- The cluster is VPC-native.
- It has regional availability.
- The Shielded GKE Nodes feature is enabled.
- Secure Boot and Integrity Monitoring are enabled.
- Workload Identity is enabled.
- A hardened node image with the containerd runtime is used.
To deploy this sandbox, you will need:
- Terraform, obviously.
- A Google Cloud project with the necessary permissions granted to you.
- The project must be linked to an active billing account.
Given that this is a research prototype, I am not that fussy about scoping every admin role that is needed to deploy this module. The roles/owner IAM basic role on the project would work. The roles/editor IAM basic role might work, but I have not tested it.
If you fancy doing it the hard way – and there is a time and place for such adventures, indeed – I hope this starting list of roles will help:
- Kubernetes Engine Admin (roles/container.admin)
- Service Account Admin (roles/iam.serviceAccountAdmin)
- Compute Admin (roles/compute.admin)
- Service Usage Admin (roles/serviceusage.serviceUsageAdmin)
- Monitoring Admin (roles/monitoring.admin)
- Private Logs Viewer (roles/logging.privateLogViewer)
- Moar?!
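If you go down that route, a sketch like the following could grant the listed roles to a deployer identity. This is not part of the module, and the member value is a placeholder.

locals {
  deployer_roles = [
    "roles/container.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/compute.admin",
    "roles/serviceusage.serviceUsageAdmin",
    "roles/monitoring.admin",
    "roles/logging.privateLogViewer",
  ]
}

# Grant each admin role on the project to the deployer identity
resource "google_project_iam_member" "deployer" {
  for_each = toset(local.deployer_roles)
  project  = var.project
  role     = each.value
  member   = "user:you@example.com" # placeholder, substitute your own identity
}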
Clone the repo and you are good to go! You can provide the input variables' values as command-line parameters to the Terraform CLI:
terraform init && terraform apply -var="project=infernal-horse" -var="region=europe-west4"
- You must set the Google Cloud project ID and Google Cloud region.
- You may set authorized_networks to enable access to the cluster's endpoint from a public IP address. You would still have to authenticate.

Note The default value for authorized_networks does not allow any public access to the cluster endpoint.
To avoid having to provide the input variable values on the command line, you can create a variable definitions file, such as env.auto.tfvars, and define the values therein.
project = "<PROJECT_ID>"
region = "<REGION>"
authorized_networks = [
{
cidr_block = "1.2.3.4/32"
display_name = "my-ip-address"
},
]
Note that you'd have to provide your own values for the variables 😉
Note To find your public IP, you can run the following command
dig -4 TXT +short o-o.myaddr.l.google.com @ns1.google.com
Now you can run Terraform (init ... plan ... apply) to deploy.
Happy hacking!
This module accepts the following input variables.
- project is the Google Cloud project ID.
- region is the Google Cloud region for all deployed resources.
- (Optional) enable_flow_log toggles VPC Flow Logs on the cluster subnetwork.
- (Optional) node_cidr_range is the primary IP range of the cluster subnetwork, used by the nodes.
- (Optional) pod_cidr_range is the secondary IP range used by the pods.
- (Optional) service_cidr_range is the secondary IP range used by the services.
- (Optional) authorized_networks is the list of CIDR blocks allowed to access the cluster's control plane.
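For illustration, here is a sketch of how the authorized_networks variable might be declared; the actual definition lives in the module, and the empty default is an assumption consistent with the note above.

variable "authorized_networks" {
  description = "CIDR blocks allowed to access the cluster's control plane."
  type = list(object({
    cidr_block   = string
    display_name = string
  }))
  default = [] # assumed: no public access to the cluster endpoint by default
}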
Warning This section is grossly out-of-date!
Once the infrastructure is provisioned with Terraform, you can deploy the example workload.
- Online Boutique application by Google: GoogleCloudPlatform/microservices-demo
- (out-of-date) My custom, hardened version of the deployment manifests: olliefr/gke-microservices-demo
Warning This section is grossly out-of-date!
This module runs in two stages, using two (aliased) instances of Terraform Google provider.
The first stage, named the seed, is self-contained in 010-seed.tf. It runs with user credentials via ADC and sets up the foundation for the deployment that follows. The required services are enabled at this stage, and a least privilege IAM service account is provisioned and configured. At the end of the seed stage, a second instance of the Terraform Google provider is initialised with the service account's credentials.
The following stage deploys the cluster resources using service account impersonation.
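A minimal sketch of this two-provider arrangement follows. The alias and the service account reference are illustrative assumptions; the actual configuration lives in 010-seed.tf.

# Seed provider: runs as the user via Application Default Credentials
provider "google" {
  project = var.project
  region  = var.region
}

# Second provider instance: impersonates the deployer service account
# provisioned by the seed stage (resource name is illustrative)
provider "google" {
  alias                       = "impersonated"
  project                     = var.project
  region                      = var.region
  impersonate_service_account = google_service_account.deployer.email
}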
This deployment architecture serves three aims:
- Short feedback loop. Everything is contained in a single Terraform module, so it is simple to deploy and update.
- Deploying using a least privilege service account. This reduces the risk of hitting a permission error on deployment into "production", which is usually done by a locked-down service account, as compared to deployment into a "development" environment, which was done with the user's Google account identity that usually has very broad permissions on the project (Owner or Editor). Inspiration.
- The module can be used with "long-life" Google Cloud projects that are "repurposed" from one experiment to another. The explicit declaration of dependencies, where it was necessary, allows Terraform to destroy the resources in the right order when requested (see the sketch after this list).
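As an illustration of such an explicit dependency, a sketch along these lines makes Terraform enable the Kubernetes Engine API before creating the cluster, and tear things down in the reverse order. The resource names are assumptions, not the module's actual definitions.

# Enable the Kubernetes Engine API on the project
resource "google_project_service" "container" {
  service = "container.googleapis.com"
}

resource "google_container_cluster" "sandbox" {
  name     = "sandbox" # illustrative
  location = var.region

  # ... cluster configuration elided ...

  # Explicit dependency: create the API first, destroy it last
  depends_on = [google_project_service.container]
}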
# 000-versions: Terraform and provider versions and configuration
# 010-seed: configure the project and provision a least privilege service account for deploying the cluster
# 030-cluster-node-sa: provision and configure a least privilege service account for cluster nodes
# 040-network: create a VPC and a subnet, and configure network and firewall logs
# 050-nat: provide NAT functionality to cluster nodes with private IP addresses
# 060-cluster: create a GKE cluster (Standard)
Just some ideas for future explorations.
- Deploy by impersonating a service account to validate the list of required admin roles;
- Create a private cluster with no public endpoint and access the endpoint using IAP for TCP forwarding;
- Provide an option for Secret management;
- Configure Artifact Registry;
- Enable Binary Authorization;
- Shared VPC set-up;
- VPC Service Controls;
- Enable intranode visibility on a cluster;
- Set up the Config Connector (or use the Config Controller);
- Explore Cloud DNS for GKE option;
- IPv6 set-up;
- Explore Anthos Service Mesh (managed Istio);