HashiCorp Vault is becoming one of the most popular tools for secret management; every company wants to improve its security, but setting up Vault can take time and a deep understanding of how to configure it. To ease the journey to the AWS Cloud and raise the security level of all applications, I've decided to create an out-of-the-box solution that configures the AWS infrastructure and sets up Vault in one click.
This implementation of a Vault cluster is based on the Raft storage backend, announced as a tech preview in 1.2.0 (July 30th, 2019), introduced as a beta in 1.3.0 (November 14th, 2019) and promoted out of beta in 1.4.0 (April 7th, 2020). It relies on native AWS services such as AWS KMS, AWS S3 and AWS CloudWatch.
The Raft storage backend is used to persist Vault's data. Unlike other storage backends, Raft storage does not operate from a single source of data. Instead, all the nodes in a Vault cluster hold a replicated copy of Vault's data. Data gets replicated across all the nodes via the Raft consensus algorithm.
- High Availability – the Raft storage backend supports high availability.
- HashiCorp Supported – the Raft storage backend is officially supported by HashiCorp.
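For context, here is a minimal sketch of the kind of Vault server configuration such a setup renders. All paths, addresses and the KMS key ID below are placeholder assumptions; the module generates the real file from its userdata template:

```hcl
# Illustrative Vault server configuration — values are placeholders,
# the actual file is rendered by the module's userdata script.

# Raft integrated storage: each node keeps a replicated copy of the data.
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1"
}

# Auto-Unseal with AWS KMS (see the AWSKMS Auto-Unseal feature below).
seal "awskms" {
  region     = "eu-central-1"           # placeholder region
  kms_key_id = "alias/vault-unseal"     # placeholder key
}

# TLS listener; certificates issued by the local private CA.
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/vault.crt"
  tls_key_file  = "/opt/vault/tls/vault.key"
}

api_addr     = "https://10.0.0.10:8200" # this node's address (placeholder)
cluster_addr = "https://10.0.0.10:8201"
```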
Created using CloudCraft
- Packer script to create a Golden Image with Vault
- AWS Auto Scaling group with userdata to configure Vault and the AWS CloudWatch agent
- AWS CloudWatch dashboard for monitoring
- Vault with AWS KMS Auto-Unseal
- AWS S3 for storing the Raft snapshots
- Export of Vault sensitive parameters to AWS Parameter Store
- AWS KMS to encrypt all sensitive parameters
- AWS Application Load Balancer with AWS ACM integration
- Vault end-to-end encryption using a local private CA and dynamic SSL certificate creation
- Vault leader election integrated with the AWS Auto Scaling group lifecycle using AWS Lambda
- (Optional) AWS ARM instances to save cost
- (Optional) AWS WAF to protect from malicious attacks
This module supports Terraform >= 1.0.
Name | Version |
---|---|
terraform | >= 1.0.0 |
aws | ~> 3.0 |
Name | Version |
---|---|
archive | n/a |
aws | 3.22.0 |
random | 3.0.0 |
template | 2.2.0 |
tls | n/a |
No modules.
Name | Description | Type | Default | Required |
---|---|---|---|---|
actions_alarm | A list of actions to take when alarms are triggered. Will likely be an SNS topic for event distribution. | list(string) | [] | no |
actions_ok | A list of actions to take when alarms are cleared. Will likely be an SNS topic for event distribution. | list(string) | [] | no |
admin_cidr_blocks | Admin CIDR blocks allowed to access SSH and internal application ports | list(string) | [] | no |
alb_ssl_policy | ALB SSL policy | string | "ELBSecurityPolicy-FS-1-2-Res-2020-10" | no |
app_name | Application name (e.g. vault, secure, store, etc.) | string | "vault" | no |
arch | EC2 architecture, arm64 or x86_64 (arm64 is suggested) | string | "x86_64" | no |
create_asg_service_linked_role | Automatic creation of the Auto Scaling service-linked role | bool | true | no |
default_cooldown | ASG cooldown time | string | "60" | no |
ebs_optimized | If true, the launched EC2 instance will be EBS-optimized. | bool | false | no |
ec2_subnets | ASG subnets | list(string) | [] | no |
environment | Environment name (e.g. dev, test, uat, prod, etc.) | string | "dev" | no |
extra_tags | Additional tags to add | map(string) | n/a | yes |
health_check_type | 'EC2' or 'ELB'. Controls how health checking is done. | string | "ELB" | no |
instance_type | EC2 instance size | string | n/a | yes |
internal | ALB internal/public flag | bool | false | no |
kms_key_deletion_window_in_days | The waiting period, in days, after which AWS KMS deletes the KMS key. If specified, it must be between 7 and 30, inclusive. | string | "7" | no |
kms_key_id | KMS key ID for Vault Auto-Unseal | string | "" | no |
lb_subnets | ALB subnets | list(string) | [] | no |
prefix | Prefix to add to all resources | string | "" | no |
protect_from_scale_in | n/a | bool | false | no |
public_key | SSH public key to install on the Vault instances | string | null | no |
root_volume_size | EC2 ASG disk size | string | "50" | no |
root_volume_type | The volume type. Can be standard, gp2, gp3, io1, io2, sc1 or st1. | string | "gp2" | no |
s3_bucket_name | S3 bucket name for the Raft backups; if empty, it will be generated automatically | string | "" | no |
size | ASG size | string | "3" | no |
sns_email | List of email addresses for SNS alarms | list(string) | [] | no |
suffix | Suffix to add to all resources | string | "" | no |
termination_policies | ASG termination policy | list(string) | [ | no |
vault_telemetry | Enable Vault telemetry (warning: AWS custom metrics will increase the cost of the solution) | string | "false" | no |
vault_version | Vault version to install | string | n/a | yes |
vpc_id | VPC ID | string | n/a | yes |
zone_name | Public Route 53 zone name for DNS and ACM validation | string | n/a | yes |
Name | Description |
---|---|
admin_pass_arn | SSM vault root password ARN |
alb_arn | ALB ARN |
alb_hostname | ALB DNS |
ec2_iam_role_arn | IAM EC2 role ARN |
kms_key_id | KMS key ID |
root_token_arn | SSM vault root token ARN |
sns_arn | SNS ARN |
vault_fqdn | Vault DNS |
vault_version | Vault Version |
Fargate is an AWS serverless technology for running Docker containers. It was considered for this project but rejected for several reasons:
- No support for `IPC_LOCK`. Vault tries to lock its memory so that secret data is never swapped to disk. Although it seems unlikely Fargate swaps to disk, the lock capability is not provided.
- Running on EC2 makes configuring Vault easier. The Ansible playbooks or bash scripts included with this Terraform build the Vault configuration for each server. It would be much harder to do this in a Fargate environment with sidecar containers or custom Vault images.
- Running on EC2 makes DNS configuration easier. The Vault redirection method means you need to know the separate DNS endpoint names, and doing this on Fargate is complicated. With EC2 we register some Elastic IPs and use those for the individual servers.
Many of these problems could be solved by running Vault in a custom image. However, it seemed valuable to use the official HashiCorp Vault image instead of relying on custom-built ones, so EC2 was chosen as the compute platform.
Please check the `example` folder.
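As a hedged sketch, a minimal invocation could look like the following. The module `source` and all values are assumptions for illustration; the required inputs match the table above:

```hcl
# Illustrative module usage — source path and values are placeholders.
module "vault" {
  source = "../.."   # adjust to where this module lives in your setup

  # Required inputs
  vault_version = "1.12.3"                   # example version
  instance_type = "t4g.small"                # pair with arch = "arm64"
  vpc_id        = "vpc-0123456789abcdef0"    # placeholder VPC
  zone_name     = "example.com"              # public Route 53 zone
  extra_tags    = { Project = "vault" }

  # Common optional inputs
  arch        = "arm64"
  environment = "dev"
  ec2_subnets = ["subnet-aaa", "subnet-bbb", "subnet-ccc"]
  lb_subnets  = ["subnet-ddd", "subnet-eee", "subnet-fff"]
}
```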
Do you want to test your deployment? Just open your shell, adjust the DNS and kill the primary Vault node:
```shell
for i in {1..500}
do
  RES=$(curl -s -o /dev/null -w "%{http_code}" https://vault.[ YOUR DOMAIN ]/ui/)
  echo "[$(date +%T)] HTTP:$RES attempt:$i"
  sleep 1
done
```
In less than a minute the standby instance will become available, and in a few minutes the ASG will launch a new node.
This repo uses pre-commit hooks. To trigger them manually, run:
```shell
pre-commit install
pre-commit run --all-files
```
- Auto Scaling Group with non-encrypted EBS volumes: encrypting the volumes requires a dedicated, already-encrypted AMI and the proper service-linked role so that the ASG is able to encrypt/decrypt the EBS volumes.
- ASG service-linked role: during the destroy phase, Terraform can emit `Error: error waiting for IAM Service Linked Role (arn:aws:iam::XXXXXXX:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling) delete: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: %!s(<nil>)` — there is an open issue on terraform-provider-aws.
- ACM soft limit: if you see the error `Error requesting certificate: LimitExceededException: Error: you have reached your limit of 20 certificates in the last year.` please increase the limit through AWS Support or AWS Service Quotas.
- CloudWatch Logs KMS error: on `Error: Creating CloudWatch Log Group failed: InvalidParameterException: The specified KMS Key Id could not be found.` double-check that the KMS key policy allows the regional CloudWatch Logs service principal (e.g. `logs.eu-central-1.amazonaws.com`).
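For the last point, a key policy statement along these lines grants the regional CloudWatch Logs service principal access to the key. The region and exact statement are assumptions following the AWS documentation for encrypting log data with KMS; adjust them to your setup:

```hcl
# Sketch of a KMS key policy statement allowing CloudWatch Logs to use the key.
data "aws_iam_policy_document" "cw_logs_kms" {
  statement {
    sid = "AllowCloudWatchLogs"

    principals {
      type        = "Service"
      identifiers = ["logs.eu-central-1.amazonaws.com"] # match your region
    }

    actions = [
      "kms:Encrypt*",
      "kms:Decrypt*",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:Describe*",
    ]

    resources = ["*"] # within a key policy, "*" means this key
  }
}
```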
This repo is licensed under the WTFPL.