An (almost) production-ready set of modules for provisioning DC/OS in AWS.
These modules implement the advanced installer (https://dcos.io/docs/1.10/administration/installing/custom/advanced/)
and, using some clever Terraform tricks, are able to do it in a single terraform apply.
- Builds the prerequisite resources: an S3 bucket for Exhibitor and an internal ELB for master discovery
- Creates an S3 VPC endpoint that replaces the need for a bootstrap server
- Passes those resources into a script that builds and uploads a DC/OS package via docker and waits for it to finish
- Provisions ASGs and ELBs for masters, slaves, etc
- Creates ECR repos and a Lambda function for writing Docker credentials for private images
- Gives you extension points for customizing as you see fit
- Modules for creating initial VPC
- CloudWatch alerts for monitoring
There are some decisions I don't want to make for you. This is intended to be flexible enough to extend to fit your needs, but you will probably want to consider:
- Log aggregation for DC/OS components
- Host level metric collection
To use these modules, you should:
- Be familiar with Terraform
- Have Docker installed with proper volumes support (see Docker Notes below)
- Have a VPC provisioned that you want to deploy in
The example below creates a complete DC/OS cluster using the default setup of:
- 3 masters in private subnets, with an external admin ELB and an internal discovery ELB
- 5 agents in private subnets, with an internal ELB for exposing VPC only apps
- 2 public agents in private subnets, with an external ELB for publicly exposing apps
It also creates an ECR repo with a Lambda function that drops credentials into a bucket, where Marathon can read them to pull private images.
variable "aws_region" {
  # example region, change as needed
  default     = "us-east-1"
  description = "the aws region you want to provision into"
}

variable "vpc_id" {
  default     = "vpc-xxxxxxx"
  description = "the vpc you want to provision into"
}

variable "route_table_ids" {
  default     = "rtb-xxxxxx1,rtb-xxxxxx2"
  description = "comma separated list of route tables"
}

variable "env_name" {
  default     = "test_env"
  description = "the name of the environment, allows for multiple DC/OS clusters in a VPC"
}

variable "network" {
  default     = "10.0.0.0/16"
  description = "the cidr of your VPC"
}

variable "public_subnets" {
  default     = ["subnet-xxxxxx1", "subnet-xxxxxx2"]
  type        = "list"
  description = "a list of public subnets in your VPC"
}

variable "private_subnets" {
  default     = ["subnet-xxxxxx1", "subnet-xxxxxx2"]
  type        = "list"
  description = "a list of private subnets in your VPC"
}

variable "key_name" {
  default     = "my_key"
  description = "the ssh key name you want to use when provisioning hosts"
}

variable "region_azs" {
  default     = ["a", "b"]
  type        = "list"
  description = "the availability zones corresponding to your subnets"
}

provider "aws" {
  region = "${var.aws_region}"
}

module "dcos_region" {
  source          = "github.com/instructure/aws_dcos_terraform//modules/dcos_region"
  vpc_id          = "${var.vpc_id}"
  aws_region      = "${var.aws_region}"
  route_table_ids = "${var.route_table_ids}"
}

module "dcos_core" {
  source          = "github.com/instructure/aws_dcos_terraform//modules/dcos_core"
  cluster_name    = "${var.env_name}"
  dcos_version    = "1.10"
  vpc_id          = "${var.vpc_id}"
  aws_region      = "${var.aws_region}"
  network         = "${var.network}" # cidr of VPC
  public_subnets  = "${var.public_subnets}"
  private_subnets = "${var.private_subnets}"

  # use the same bucket for both
  bootstrap_bucket = "${module.dcos_region.bucket}"
  exhibitor_bucket = "${module.dcos_region.bucket}"

  key_name   = "${var.key_name}"
  region_azs = "${var.region_azs}"
}

variable "namespace" {
  default     = "my_group"
  description = "the namespace you want to create repos under"
}

variable "repo_names" {
  default     = ["my_app1", "myapp_2"]
  type        = "list"
  description = "a list of ECR repos you want to create"
}

variable "account_id" {
  # placeholder, AWS account ids are 12 digits
  default     = "123456789012"
  description = "your aws account id"
}

variable "allowed_users" {
  default     = ["bob", "lisa", "mark"]
  type        = "list"
  description = "a list of IAM users who should be able to push to the given repos"
}

module "ecr" {
  source   = "github.com/instructure/aws_dcos_terraform//modules/ecr"
  env_name = "${var.env_name}"

  # we reuse this bucket for also storing docker creds
  docker_cred_bucket = "${module.dcos_region.bootstrap_bucket}"

  namespace  = "${var.namespace}"
  repo_names = "${var.repo_names}"
  account_id = "${var.account_id}"
  users      = "${var.allowed_users}"
}
Running terraform plan and then terraform apply should bring up a full DC/OS cluster.
The modules are highly modular, which allows for customization. If you don't want public agents, you don't
have to have them. See the modules folder for the individual modules. Additionally, you may want to
use different user_data for setting up nodes, such as adding users or running your own config management. To facilitate
that, any resources that use templates or provisioner scripts take paths that allow for injecting custom functionality.
Here are descriptions of each module. Mix and match as you please to build your customized cluster:
- dcos_agent_group: creates a role, ELB, and ASG of private agents; used in dcos_core, just combines other modules
- dcos_agent_role: the IAM role that agents use by default
- dcos_asg: an ASG and launch configuration that are used for all DC/OS roles
- dcos_bootstrap: the module that creates a DC/OS bootstrap package
- dcos_core: the 'default' DC/OS setup, which is simply composed of other modules
- dcos_lb: an ELB for association with an ASG
- dcos_master_group: creates a role, public ELB, and ASG for masters; used in dcos_core, just combines other modules
- dcos_master_internal_lb: creates the internal ELB, required for master_discovery: master_http_loadbalancer
- dcos_master_role: the IAM role that masters use by default
- dcos_public_agent_group: creates a role, public ELB, and ASG of agents marked as public_slave
- dcos_region: creates S3 buckets and an S3 VPC endpoint; you only need one of these per region
- dcos_spot_asg: the same as dcos_asg but sets a spot price
- default_sec_group: the default security group that all DC/OS components (nodes and ELBs) have to allow for communication
- ecr: creates the ecr_cred_lambda as well as ecr_repo
- ecr_cred_lambda: creates a Lambda function for writing Docker credentials to S3
- ecr_repo: creates a number of ECR repos and makes the images readable by all
The files directory contains a few different scripts. If you want to override the behavior of these, you should generally
be able to:
- Copy the files locally
- Customize as you see fit
- Pass this path to the relevant module, overriding the included config
The three current places this technique is used are:
- files/scripts/build_upload.sh: dcos_bootstrap calls this script to build and upload the DC/OS package. Set build_script_path to override.
- files/ecr_writer/build_docker.sh: gets called to build the Lambda function used in ecr_cred_lambda and upload it to S3. To override this, provide lambda_package_path with a script that uploads a Lambda package to the specified S3 path.
- files/user_data.tpl: dcos_asg and dcos_spot_asg use this template for user data. Set cloud_config_template to a custom template to override.
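As a rough sketch of the copy, customize, and pass-the-path steps above, overriding the user data template might look like the fragment below. The local path custom/user_data.tpl and the module label custom_agents are hypothetical, and the module's other required inputs are deliberately left out (see the docs/ folder for the real list), so this is a fragment to adapt rather than a complete configuration.

```hcl
# Fragment only: shows the override input, not a full module call.
module "custom_agents" {
  source = "github.com/instructure/aws_dcos_terraform//modules/dcos_asg"

  # Hypothetical local copy of files/user_data.tpl, edited to taste.
  cloud_config_template = "${path.module}/custom/user_data.tpl"

  # The remaining inputs that dcos_asg requires are omitted here;
  # consult the generated docs for this module before applying.
}
```

The same pattern should apply to build_script_path and lambda_package_path on their respective modules.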
See the docs/ folder for generated markdown of the inputs and outputs of each module (built using https://github.com/segmentio/terraform-docs).
Currently, the dcos_generate_config.sh script only runs on Linux and itself makes use of Docker. To make this
work on OSX, the script that builds the bootstrap package runs a privileged Docker image and
requires a Docker setup where the host can properly use volumes in arbitrary locations. On OSX, this means
using either Docker for Mac or dinghy; on Linux, it should just work.
- Add in a VPC module for a basic VPC setup
- Create a 'root' module that replicates almost everything the default DC/OS cloudformation does
- Add in CloudWatch monitoring
- Add in more machine options, such as spot block and spot fleet support
Breaking changes are handled by... not really handling them. It is recommended that you pin your import of these modules to a commit SHA.
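One way to pin is the ref argument that Terraform accepts on Git and GitHub module sources. The sketch below reuses the dcos_region call from the example above. The SHA shown is a placeholder, not a real commit in this repository.

```hcl
module "dcos_region" {
  # ?ref= selects a specific revision of a Git-hosted module.
  # Placeholder SHA: substitute a real commit from this repository.
  source = "github.com/instructure/aws_dcos_terraform//modules/dcos_region?ref=0123456789abcdef0123456789abcdef01234567"

  vpc_id          = "${var.vpc_id}"
  aws_region      = "${var.aws_region}"
  route_table_ids = "${var.route_table_ids}"
}
```

After changing the ref, re-run terraform init -upgrade so the new revision is fetched. Pinning every module source to the same SHA keeps the modules consistent with each other.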
All contributions are welcome! To contribute:
- Open an issue discussing any major changes you want to make (for smaller changes, feel free to skip this)
- Fork the project
- Make your changes and run the validate.sh script to ensure it doesn't have any errors
- Open a pull request