terraform-gke-kubeflow-cluster

A Terraform module for creating a GKE cluster to run Kubeflow on.

This module creates a GKE cluster similar to how the kfctl tool does, with a few changes:

  • adds a Cloud SQL instance to use for the metadata store/databases
  • creates a GCE Persistent Disk to use for the artifact store

This module was originally created by the ML Infrastructure team at Spotify to create and manage long-lived GKE clusters used by many Kubeflow-using teams at Spotify. In contrast, the kfctl tool and the documentation around creating a cluster for Kubeflow tend to assume that individual clusters are quickly spun up and torn down by engineers using Kubeflow. For more details on Spotify's centralized Kubeflow platform, see this talk from KubeCon North America 2019.

Usage

To use this within Terraform, add a module block like:

module "kubeflow-cluster" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"
}

For more details, see https://registry.terraform.io/modules/spotify/kubeflow-cluster/gke/0.0.1

Module details

The terraform-gke-kubeflow-cluster module creates the following resources:

  • a GKE cluster (attached to a Shared VPC if the relevant parameters for networks/subnetworks are set)
  • a Cloud SQL instance to use for the metadata store/databases
  • a GCE Persistent Disk to use for Argo's artifact store
  • GCP service accounts for Kubeflow to use (distinct accounts per cluster):
    • an "admin" service account (used for IAP, which is not included in this module)
    • the "user" service account for Kubeflow pipelines to use
    • the VM service account used by the GKE cluster and its nodes
  • IAM bindings for the above service accounts
  • Kubernetes secrets for:
    • cloudsql-instance-credentials for the cloudsql-proxy connected to the metadata SQL instance
    • admin-gcp-sa containing the "admin" GCP service account for Kubeflow
    • user-gcp-sa containing the "user" GCP service account for Kubeflow

Each "instantiation" of the module creates a new set of all of these resources; the intent of the module is to automate the entire setup of all of the GCP resources needed to run a Kubeflow cluster.
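
As a minimal sketch of what multiple instantiations could look like, the configuration below declares the module twice under different names. The module names "kubeflow-cluster-a" and "kubeflow-cluster-b" are hypothetical, and the module's input variables are omitted here just as in the Usage example; a real configuration would need to set them (see variables.tf).

```hcl
# Hypothetical module names. Each block yields its own GKE cluster,
# Cloud SQL instance, persistent disk, service accounts, IAM bindings,
# and Kubernetes secrets.
module "kubeflow-cluster-a" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"

  # Input variables omitted, as in the Usage example.
}

module "kubeflow-cluster-b" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"

  # Input variables omitted, as in the Usage example.
}
```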

This repo does not currently install the Kubeflow system components on the cluster; use kfctl or another tool for that.

Local development

Run the following commands from the root of the project:

  1. brew install tfenv -- install tfenv
  2. tfenv install -- install the version of Terraform specified in .terraform-version in source control
  3. terraform init -- set up the Terraform providers

Note on master and node version values

The expected behavior of fuzzy versions for min_master_version and node_version is undocumented (GitHub issue). Empirically, the most recent version that matches the fuzzy version is used. For example, node_version = "1.11" results in GKE nodes running 1.11.7-gke.6 if that is the most recent matching version.
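
One way to check which versions are currently available for a given prefix is the google provider's google_container_engine_versions data source, which accepts a version_prefix argument. This is only a sketch, written in Terraform 0.12+ expression syntax: the location "us-central1" and the output names are assumptions for illustration, and it is not confirmed that GKE resolves fuzzy versions the same way this data source does.

```hcl
# Look up the GKE versions currently available for a given prefix.
data "google_container_engine_versions" "v1_11" {
  location       = "us-central1" # assumed location, replace with your own
  version_prefix = "1.11."       # trailing dot keeps the match to the 1.11 series
}

# Most recent master version matching the prefix.
output "latest_master_version" {
  value = data.google_container_engine_versions.v1_11.latest_master_version
}

# Most recent node version matching the prefix.
output "latest_node_version" {
  value = data.google_container_engine_versions.v1_11.latest_node_version
}
```

Comparing these outputs with the versions actually running on the cluster after an apply is a cheap way to confirm the empirical behavior described above.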

Releasing new versions of the module

See https://www.terraform.io/docs/registry/modules/publish.html#releasing-new-versions

A webhook has been automatically added to the repo, so a new "release" becomes visible in the Terraform Registry whenever a tag that looks like a semantic version (e.g. "v1.2.3") is pushed. To cut a release, tag a commit and push the tag to GitHub with git push --tags.

Code of Conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.
