Attach EBS volumes to etcd nodes for persistence #98

Open · wants to merge 5 commits into base: master
3 changes: 3 additions & 0 deletions README.md
@@ -49,6 +49,8 @@ creation
* Bastion Host
* Multi-AZ Auto-Scaling Worker Nodes
* [NAT Gateway](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html)
* Cluster state persisted using EBS
* Automatic snapshotting of all EBS volumes, including dynamically generated persistent volumes

### CoreOS (1122.3.0, 1185.2.0, 1192.2.0)
* etcd DNS Discovery Bootstrap
@@ -107,6 +109,7 @@ Terraform v0.7.7
- Route 53 internal zone for VPC
- Etcd cluster bootstrapped from Route 53
- High Availability Kubernetes configuration (masters running on etcd nodes)
- EBS volumes for etcd cluster with automatic snapshots
- Autoscaling worker node group across subnets in selected region
- kube-system namespace and addons: DNS, UI, Dashboard

9 changes: 9 additions & 0 deletions modules.tf
@@ -122,6 +122,15 @@ module "worker" {
worker-name = "general"
}

module "snapshot" {
  source = "./modules/snapshot"

  iam-role-snapshot-arn = "${ module.iam.iam-role-snapshot-arn }"
  name                  = "${ var.name }"
  security-groups       = "${ module.security.etcd-id },${ module.security.worker-id }"
  subnet-ids            = "${ module.vpc.subnet-ids-private },${ module.vpc.subnet-ids-public }"
}

/*
module "worker2" {
source = "./modules/worker"
38 changes: 38 additions & 0 deletions modules/etcd/cloud-config.tf
@@ -11,6 +11,7 @@ coreos:
    advertise-client-urls: http://${ fqdn }:2379
    # cert-file: /etc/kubernetes/ssl/k8s-etcd.pem
    # debug: true
    data-dir: /media/etcd2
    discovery-srv: ${ internal-tld }
    initial-advertise-peer-urls: https://${ fqdn }:2380
    initial-cluster-state: new
@@ -25,6 +26,43 @@ coreos:
    peer-key-file: /etc/kubernetes/ssl/k8s-etcd-key.pem

  units:
    - name: format-ebs-volume.service
      command: start
      content: |
        [Unit]
        Description=Formats the ebs volume
        After=dev-xvdf.device
        Requires=dev-xvdf.device
        [Service]
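        # Format only if /dev/xvdf does not already hold an ext4 filesystem,
        # so existing etcd data survives reboots and instance replacement.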
        ExecStart=/bin/bash -c "(/usr/sbin/blkid -t TYPE=ext4 | grep /dev/xvdf) || (/usr/sbin/wipefs -fa /dev/xvdf && /usr/sbin/mkfs.ext4 /dev/xvdf)"
        RemainAfterExit=yes
        Type=oneshot

    - name: media-etcd2.mount
      command: start
      content: |
        [Unit]
        Description=Mount ebs to /media/etcd2
        Requires=format-ebs-volume.service
        After=format-ebs-volume.service
        [Mount]
        What=/dev/xvdf
        Where=/media/etcd2
        Type=ext4

    - name: prepare-etcd-data-dir.service
      command: start
      content: |
        [Unit]
        Description=Prepares the etcd data directory
        Requires=media-etcd2.mount
        After=media-etcd2.mount
        Before=etcd2.service
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/usr/bin/chown -R etcd:etcd /media/etcd2

    - name: etcd2.service
      command: start
      drop-ins:
33 changes: 32 additions & 1 deletion modules/etcd/ec2.tf
@@ -1,3 +1,30 @@
resource "aws_ebs_volume" "etcd" {
  count = "${ length( split(",", var.azs) ) }"

  # Each volume must live in the same AZ as the instance it attaches to,
  # so pick the AZ by count.index rather than always taking element 0.
  availability_zone = "${ element( split(",", var.azs), count.index ) }"

  type = "gp2"
  size = 100

  tags {
    builtWith         = "terraform"
    Cluster           = "${ var.name }"
    depends-id        = "${ var.depends-id }"
    KubernetesCluster = "${ var.name }"
    Name              = "etcd${ count.index + 1 }-${ var.name }"
    role              = "etcd,apiserver"
    version           = "${ var.coreos-hyperkube-tag }"
  }
}

resource "aws_volume_attachment" "etcd" {
count = "${ length( split(",", var.azs) ) }"

device_name = "/dev/xvdf"
@wellsie (Member) commented on Oct 25, 2016:
@adambom - Where are you mounting xvdf? Or am I missing something?

@adambom (Author) replied:
I forgot to include some changes to cloud-config.tf. On startup we format /dev/xvdf, mount it at /media/etcd2, and instruct etcd to use that location to persist state. I chose xvdf because that's the device we're using on worker nodes for ephemeral storage. See 8e73f42.

  volume_id   = "${ element(aws_ebs_volume.etcd.*.id, count.index) }"
  instance_id = "${ element(aws_instance.etcd.*.id, count.index) }"
}

resource "aws_instance" "etcd" {
count = "${ length( split(",", var.etcd-ips) ) }"

@@ -32,5 +59,9 @@ resource "aws_instance" "etcd" {
}

resource "null_resource" "dummy_dependency" {
depends_on = [ "aws_instance.etcd" ]
depends_on = [
"aws_instance.etcd",
"aws_ebs_volume.etcd",
"aws_volume_attachment.etcd"
]
}
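An EBS volume can only attach to an instance in the same availability zone, which is why the volume's availability_zone must track count.index. A quick post-apply sanity check, as a sketch: it assumes default AWS credentials, and 'my-cluster' is a placeholder for the var.name value.

```python
import boto3

ec = boto3.client('ec2')

# Look up the cluster's volumes; 'my-cluster' stands in for var.name.
vols = ec.describe_volumes(Filters=[{
    'Name': 'tag:KubernetesCluster', 'Values': ['my-cluster']}])

for vol in vols['Volumes']:
    for att in vol.get('Attachments', []):
        res = ec.describe_instances(InstanceIds=[att['InstanceId']])
        inst = res['Reservations'][0]['Instances'][0]
        # Volume and instance must share an AZ or the attachment fails.
        assert vol['AvailabilityZone'] == inst['Placement']['AvailabilityZone']
```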
1 change: 1 addition & 0 deletions modules/iam/io.tf
@@ -5,5 +5,6 @@ variable "name" {}
output "depends-id" { value = "${ null_resource.dummy_dependency.id }" }
output "aws-iam-role-etcd-id" { value = "${ aws_iam_role.master.id }" }
output "aws-iam-role-worker-id" { value = "${ aws_iam_role.worker.id }" }
output "iam-role-snapshot-arn" { value = "${ aws_iam_role.snapshot.arn }" }
output "instance-profile-name-master" { value = "${ aws_iam_instance_profile.master.name }" }
output "instance-profile-name-worker" { value = "${ aws_iam_instance_profile.worker.name }" }
62 changes: 62 additions & 0 deletions modules/iam/snapshot.tf
@@ -0,0 +1,62 @@
resource "aws_iam_role" "snapshot" {
name = "snapshot"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "lambda.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}

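# The network-interface actions below are required because the snapshot
# Lambda runs inside the VPC (vpc_config in modules/snapshot/lambda.tf)
# and must create and clean up its own ENIs.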
resource "aws_iam_role_policy" "snapshot" {
name = "snapshot-k8s-${ var.name }"

policy = <<EOS
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["logs:*"],
"Resource": "arn:aws:logs:*:*:*"
},
{
"Effect": "Allow",
"Action": "ec2:Describe*",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DetachNetworkInterface",
"ec2:DeleteNetworkInterface"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateSnapshot",
"ec2:CreateTags",
"ec2:ModifySnapshotAttribute",
"ec2:ResetSnapshotAttribute"
],
"Resource": ["*"]
}
]
}
EOS

role = "${ aws_iam_role.snapshot.id }"
}
11 changes: 11 additions & 0 deletions modules/snapshot/cloudwatch.tf
@@ -0,0 +1,11 @@
resource "aws_cloudwatch_event_rule" "snapshot" {
name = "snapshot-${ var.name }"
description = "Schedule snapshots of ebs volumes"

schedule_expression = "rate(2 hours)"
}

resource "aws_cloudwatch_event_target" "snapshot" {
rule = "${ aws_cloudwatch_event_rule.snapshot.name }"
arn = "${ aws_lambda_function.snapshot.arn }"
}
4 changes: 4 additions & 0 deletions modules/snapshot/io.tf
@@ -0,0 +1,4 @@
variable "iam-role-snapshot-arn" {}
variable "name" {}
variable "security-groups" {}
variable "subnet-ids" {}
37 changes: 37 additions & 0 deletions modules/snapshot/lambda.tf
@@ -0,0 +1,37 @@
resource "aws_lambda_permission" "snapshot" {
statement_id = "AllowExecutionFromCloudWatchForSnapshots"
action = "lambda:InvokeFunction"
function_name = "${ aws_lambda_function.snapshot.arn }"
principal = "events.amazonaws.com"
source_arn = "${ aws_cloudwatch_event_rule.snapshot.arn }"
}

resource "aws_lambda_function" "snapshot" {
filename = "${ path.module }/../../tmp/snapshot.zip"
function_name = "snapshot-${ var.name }"
handler = "snapshot.snapshot"
runtime = "python2.7"
source_code_hash = "${ base64sha256(data.template_file.init.rendered) }"

role = "${ var.iam-role-snapshot-arn }"

vpc_config {
subnet_ids = ["${ split(",", var.subnet-ids) }"]
security_group_ids = ["${ split(",", var.security-groups) }"]
}
}

data "template_file" "init" {
template = "${ file("${ path.module }/snapshot.py.tpl") }"

vars {
name = "${ var.name }"
}
}

resource "archive_file" "init" {
type = "zip"
source_content_filename = "snapshot.py"
source_content = "${ data.template_file.init.rendered }"
output_path = "${ path.module }/../../tmp/snapshot.zip"
}
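To exercise the function without waiting for the two-hour schedule, it can be invoked directly. A minimal sketch, assuming default AWS credentials and 'snapshot-my-cluster' as a placeholder for the rendered "snapshot-${ var.name }":

```python
import boto3

lam = boto3.client('lambda')

# One-off synchronous invocation of the scheduled snapshot function.
resp = lam.invoke(FunctionName='snapshot-my-cluster',
                  InvocationType='RequestResponse')
print(resp['StatusCode'])  # 200 on success
```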
15 changes: 15 additions & 0 deletions modules/snapshot/snapshot.py.tpl
@@ -0,0 +1,15 @@
import boto3


ec = boto3.client('ec2')

def snapshot(event, context):
    # Find every EBS volume tagged as belonging to this cluster.
    volumes = ec.describe_volumes(Filters=[{
        'Name': 'tag:KubernetesCluster',
        'Values': ['${ name }'],
    }])

    # Snapshot each volume and copy its tags across, so snapshots can be
    # traced back to their cluster, node, and role.
    for vol in volumes.get('Volumes', []):
        snap = ec.create_snapshot(VolumeId=vol.get('VolumeId'))
        ec.create_tags(
            Resources=[snap.get('SnapshotId')],
            Tags=vol.get('Tags'))
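
Because the snapshots inherit the volume's tags, they can be listed per cluster afterwards. A sketch, again with 'my-cluster' standing in for the rendered ${ name } value:

```python
import boto3

ec = boto3.client('ec2')

# List the snapshots created by the Lambda, newest first.
snaps = ec.describe_snapshots(Filters=[{
    'Name': 'tag:KubernetesCluster', 'Values': ['my-cluster']}])
for s in sorted(snaps['Snapshots'], key=lambda s: s['StartTime'], reverse=True):
    print(s['SnapshotId'], s['VolumeId'], s['StartTime'])
```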