giuliocalzolari/terraform-aws-vault-raft

Hashicorp Vault using AWS Native

Overview

HashiCorp Vault is becoming one of the most popular tools for secret management, and every company wants to improve its security, but setting up Vault takes time and a deep understanding of how to configure it. To ease the journey to the AWS Cloud and raise the security level of all applications, I've decided to create an out-of-the-box solution that configures the AWS infrastructure and sets up Vault in one click.

This Vault cluster implementation is based on the Raft Storage Backend, which was announced as a tech preview in 1.2.0 (July 30th, 2019), introduced as a beta in 1.3.0 (November 14th, 2019), and promoted out of beta in 1.4.0 (April 7th, 2020). It relies on native AWS tools such as AWS KMS, AWS S3, and AWS Cloudwatch.

The Raft storage backend is used to persist Vault's data. Unlike other storage backends, Raft storage does not operate from a single source of data. Instead, every node in a Vault cluster holds a replicated copy of Vault's data. Data is replicated across all the nodes via the Raft Consensus Algorithm.

  • High Availability – the Raft storage backend supports high availability.
  • HashiCorp Supported – the Raft storage backend is officially supported by HashiCorp.
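
To make the replication model above more concrete, here is a minimal sketch of a Vault server configuration fragment that uses Raft storage. It is not the configuration this module renders; the data path, node ID, and host names are hypothetical placeholders, and a complete configuration also needs a listener.

```hcl
# Sketch only: paths, node_id and host names are hypothetical placeholders.
storage "raft" {
  path    = "/opt/vault/data" # local directory holding this node's replicated copy
  node_id = "vault-node-1"    # must be unique within the cluster

  # Peer that this node may try to join; Vault keeps retrying until one answers.
  retry_join {
    leader_api_addr = "https://vault-node-2.example.internal:8200"
  }
}

# Address advertised to clients, and address used for node-to-node Raft traffic.
api_addr     = "https://vault-node-1.example.internal:8200"
cluster_addr = "https://vault-node-1.example.internal:8201"
```

Because every node keeps its own copy under `path`, losing a single instance does not lose data as long as a quorum of nodes survives.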

Diagram

Created using CloudCraft

The solution

  • Packer script to create a Golden Image with Vault
  • AWS Autoscaling group with Userdata to configure Vault and AWS Cloudwatch Agent.
  • AWS Cloudwatch Dashboard for monitoring
  • Vault with AWSKMS Auto-Unseal
  • AWS S3 for storing the raft snapshot
  • Export of Vault sensitive parameters in AWS Parameters Store
  • AWS KMS To encrypt all sensitive Parameters
  • AWS Application LoadBalancer with AWS ACM integration
  • Vault End-to-End encryption using local private CA and SSL dynamic creation
  • Vault leader election integrated with AWS Autoscaling group lifecycle using AWS Lambda
  • (Optional) Using AWS ARM instances to save cost
  • (Optional) AWS WAF to protect from malicious attack

Terraform Version

This module supports Terraform >= 1.0

Module Overview

Requirements

Name Version
terraform >= 1.0.0
aws ~> 3.0
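
The constraints in the table above can be expressed in the calling configuration as shown in this short sketch. The provider source address `hashicorp/aws` is the standard registry address and is assumed here.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws" # assumed registry source
      version = "~> 3.0"        # any 3.x release
    }
  }
}
```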

Providers

Name Version
archive n/a
aws 3.22.0
random 3.0.0
template 2.2.0
tls n/a

Modules

No modules.

Resources

Name Type
aws_acm_certificate.vault resource
aws_acm_certificate_validation.vault resource
aws_alb.main resource
aws_alb_listener.https resource
aws_alb_listener.main resource
aws_alb_listener.redirect_http_to_https resource
aws_alb_target_group.main resource
aws_autoscaling_group.asg resource
aws_autoscaling_lifecycle_hook.ec2terminate resource
aws_cloudwatch_dashboard.dashboard resource
aws_cloudwatch_event_rule.asg_cw_rule resource
aws_cloudwatch_event_target.event_target resource
aws_cloudwatch_event_target.lambda_rule resource
aws_cloudwatch_log_group.lambda_log resource
aws_cloudwatch_log_group.logs resource
aws_cloudwatch_metric_alarm.alb_unhealty_host resource
aws_cloudwatch_metric_alarm.asg_alarm_inservice resource
aws_cloudwatch_metric_alarm.high_cpu_utilization resource
aws_cloudwatch_metric_alarm.httpcode_lb_5xx_count resource
aws_cloudwatch_metric_alarm.httpcode_target_5xx_count resource
aws_cloudwatch_metric_alarm.lambda_alarm resource
aws_cloudwatch_metric_alarm.low_cpu_credit_balance resource
aws_cloudwatch_metric_alarm.target_response_time_average resource
aws_cloudwatch_metric_alarm.vault_core_unsealed resource
aws_cloudwatch_metric_alarm.vault_raft_backup resource
aws_iam_instance_profile.ec2_instance resource
aws_iam_role.cw_role resource
aws_iam_role.ec2_instance resource
aws_iam_role.lambda_role resource
aws_iam_role_policy.cw_role_policy resource
aws_iam_role_policy.ec2_role_policy resource
aws_iam_role_policy.lambda_policy resource
aws_iam_role_policy_attachment.lambda_policy resource
aws_iam_role_policy_attachment.this resource
aws_iam_service_linked_role.asg_service_role resource
aws_key_pair.key resource
aws_kms_alias.key resource
aws_kms_key.key resource
aws_lambda_function.main resource
aws_lambda_permission.self resource
aws_launch_template.tpl resource
aws_route53_record.cname resource
aws_route53_record.validation resource
aws_s3_bucket.bucket resource
aws_s3_bucket_policy.s3_alb_logs resource
aws_s3_bucket_public_access_block.example resource
aws_security_group.alb_sg resource
aws_security_group.ec2 resource
aws_security_group.lambda_sg resource
aws_security_group_rule.alb_egress_rule resource
aws_security_group_rule.alb_http_rule resource
aws_security_group_rule.alb_https_rule resource
aws_security_group_rule.alb_vault_rule resource
aws_security_group_rule.asg_app_admin_rule resource
aws_security_group_rule.asg_egress_rule resource
aws_security_group_rule.asg_ssh_admin_rule resource
aws_security_group_rule.asg_vault_rule resource
aws_security_group_rule.lambda_asg_vault_rule resource
aws_security_group_rule.lambda_egress_rule resource
aws_security_group_rule.vault_node_ig resource
aws_sns_topic.topic resource
aws_sns_topic_policy.default resource
aws_sns_topic_subscription.topic_email_subscription resource
aws_ssm_parameter.admin_pass resource
aws_ssm_parameter.root_token resource
aws_ssm_parameter.ssh_key resource
aws_ssm_parameter.vault_backup resource
aws_ssm_parameter.vault_ca resource
aws_ssm_parameter.vault_ca_key resource
aws_ssm_parameter.vault_init resource
random_uuid.uuid resource
tls_private_key.key resource
tls_private_key.vault-ca resource
tls_self_signed_cert.vault-ca resource
archive_file.lambda data source
aws_ami.ami data source
aws_caller_identity.current data source
aws_elb_service_account.main data source
aws_iam_policy_document.bucket_policy data source
aws_iam_policy_document.cw data source
aws_iam_policy_document.cw_role_policy data source
aws_iam_policy_document.ec2_instance data source
aws_iam_policy_document.ec2_role_policy data source
aws_iam_policy_document.kms_key_policy_document data source
aws_iam_policy_document.sns_topic_policy data source
aws_kms_key.by_id data source
aws_region.current data source
aws_route53_zone.zone data source
template_file.vault data source

Inputs

Name Description Type Default Required
actions_alarm A list of actions to take when alarms are triggered. Will likely be an SNS topic for event distribution. list(string) [] no
actions_ok A list of actions to take when alarms are cleared. Will likely be an SNS topic for event distribution. list(string) [] no
admin_cidr_blocks Admin CIDR Block to access SSH and internal Application ports list(string) [] no
alb_ssl_policy ALB ssl policy string "ELBSecurityPolicy-FS-1-2-Res-2020-10" no
app_name Application name N.1 (e.g. vault, secure, store, etc..) string "vault" no
arch EC2 Architecture arm64/x86_64 (arm64 is suggested) string "x86_64" no
create_asg_service_linked_role Automatic creation of Autoscaling Service Linked Role bool true no
default_cooldown ASG cooldown time string "60" no
ebs_optimized If true, the launched EC2 instance will be EBS-optimized. bool false no
ec2_subnets ASG Subnets list(string) [] no
environment Environment Name (e.g. dev, test, uat, prod, etc..) string "dev" no
extra_tags Additional Tag to add map(string) n/a yes
health_check_type 'EC2' or 'ELB'. Controls how health checking is done. string "ELB" no
instance_type EC2 Instance Size string n/a yes
internal ALB internal/public flag bool false no
kms_key_deletion_window_in_days The waiting period, specified in number of days. After the waiting period ends, AWS KMS deletes the KMS key. If you specify a value, it must be between 7 and 30, inclusive string "7" no
kms_key_id KMS Key Id for vault Auto-Unseal string "" no
lb_subnets ALB Subnets list(string) [] no
prefix Prefix to add on all resources string "" no
protect_from_scale_in n/a bool false no
public_key SSH public key to install in vault string null no
root_volume_size EC2 ASG Disk Size string "50" no
root_volume_type The volume type. Can be standard, gp2, gp3, io1, io2, sc1 or st1 (Default: gp2). string "gp2" no
s3_bucket_name S3 Backup name for Raft backup if empty will be automatically generated string "" no
size ASG Size string "3" no
sns_email list of email for SNS alarm list(string) [] no
suffix Suffix to add on all resources string "" no
termination_policies ASG Termination Policy list(string) ["Default"] no
vault_telemetry enabling Vault Telemetry (Warning!!! AWS custom metric will increase the cost of the solution) string "false" no
vault_version Vault version to install string n/a yes
vpc_id VPC Id string n/a yes
zone_name Public Route53 Zone name for DNS and ACM validation string n/a yes

Outputs

Name Description
admin_pass_arn SSM vault root password ARN
alb_arn ALB ARN
alb_hostname ALB DNS
ec2_iam_role_arn IAM EC2 role ARN
kms_key_id KMS key ID
root_token_arn SSM vault root token ARN
sns_arn SNS ARN
vault_fqdn Vault DNS
vault_version Vault Version

Why Not Fargate?

Fargate is a new AWS serverless technology for running Docker containers. It was considered for this project but rejected for several reasons:

  1. No support for IPC_LOCK. Vault tries to lock its memory so that secret data is never swapped to disk. Although it seems unlikely Fargate swaps to disk, the lock capability is not provided.

  2. Running on EC2 makes configuring Vault easier. The Ansible playbooks or bash included with this terraform build the Vault configuration for each server. It would be much harder to do this in a Fargate environment with sidecar containers or custom Vault images.

  3. Running on EC2 makes DNS configuration easier. The Vault redirection method means you need to know the separate DNS endpoint names and doing this on Fargate is complicated. With EC2 we register some ElasticIPs and use those for the individual servers.

Many of these problems could be solved by running Vault in a custom image. However, it seemed valuable to use the Hashicorp Vault image instead of relying on custom built ones, so EC2 was chosen as the ECS technology.

Example

Please check the example folder.
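
As a rough starting point, a minimal invocation might look like the sketch below. It is built only from the Inputs and Outputs tables above; the module source address (taken from the repository name) and every value are assumptions to replace with your own, and the example folder remains the authoritative reference.

```hcl
module "vault" {
  # Assumption: sourced straight from the GitHub repository; pin a tag with ?ref=... in real use.
  source = "github.com/giuliocalzolari/terraform-aws-vault-raft"

  # Required inputs (no default in the Inputs table). All values are hypothetical.
  vault_version = "1.4.0"
  instance_type = "t3.small"
  vpc_id        = "vpc-0123456789abcdef0"
  zone_name     = "example.com"
  extra_tags    = { Owner = "platform-team" }

  # Optional inputs that most deployments will likely want to set.
  environment       = "dev"
  size              = "3"
  ec2_subnets       = ["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb", "subnet-0cccccccccccccccc"]
  lb_subnets        = ["subnet-0dddddddddddddddd", "subnet-0eeeeeeeeeeeeeeee", "subnet-0ffffffffffffffff"]
  admin_cidr_blocks = ["10.0.0.0/8"]
}

# Expose the DNS name of the cluster and the SSM parameter holding the root token.
output "vault_fqdn" {
  value = module.vault.vault_fqdn
}

output "root_token_arn" {
  value = module.vault.root_token_arn
}
```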

Test your solution

Do you want to test your deployment? Open your shell, adjust the DNS name in the loop below, run it, and kill the primary Vault node.

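# Poll the Vault UI once per second and print the HTTP status code.
# Replace [ YOUR DOMAIN ] with your own domain before running.
for i in {1..500}
do
   RES=$(curl -s -o /dev/null -w "%{http_code}" "https://vault.[ YOUR DOMAIN ]/ui/")
   echo "[$(date +%T)] HTTP:$RES attempt:$i"
   sleep 1
done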

In less than a minute the standby instance will become available, and within a few minutes the ASG will launch a new node.

pre-commit hook

This repo uses pre-commit hooks; to know more, see the pre-commit documentation. To trigger the hooks manually, use these commands:

pre-commit install
pre-commit run --all-files

Troubleshooting / Known Issue

  • Autoscaling Group with unencrypted EBS volume: encrypting the EBS volume requires a dedicated AMI that is already encrypted, plus the proper service role so that the ASG is able to encrypt/decrypt the EBS volume.

  • ASG Service Linked Role: during the destruction phase Terraform can fail with Error: error waiting for IAM Service Linked Role (arn:aws:iam::XXXXXXX:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling) delete: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: %!s(<nil>). This is an open issue on terraform-provider-aws.

  • ACM soft limit: if you see the error Error requesting certificate: LimitExceededException: Error: you have reached your limit of 20 certificates in the last year., please increase the limit through AWS Support or AWS Service Quotas.

  • Cloudwatch Logs KMS Error: if you see Error: Creating CloudWatch Log Group failed: InvalidParameterException: The specified KMS Key Id could not be found., double-check that the KMS key has a policy allowing the regional Cloudwatch Logs service principal (e.g. logs.eu-central-1.amazonaws.com).
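
For the Cloudwatch Logs KMS error in the list above, the following sketch shows one way a key policy can grant access to the regional Cloudwatch Logs service principal. The resource and data source names (`logs_kms`, `logs`) are hypothetical and do not correspond to this module's internal resources; adapt the statements to your own key policy.

```hcl
data "aws_region" "current" {}
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "logs_kms" {
  # Keep account-level administration of the key.
  statement {
    sid       = "EnableIAMUserPermissions"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Let the regional Cloudwatch Logs service use the key for log group encryption.
  statement {
    sid       = "AllowCloudWatchLogs"
    actions   = ["kms:Encrypt*", "kms:Decrypt*", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:Describe*"]
    resources = ["*"]

    principals {
      type        = "Service"
      identifiers = ["logs.${data.aws_region.current.name}.amazonaws.com"]
    }
  }
}

# Hypothetical key that carries the policy above.
resource "aws_kms_key" "logs" {
  description = "Key for Cloudwatch Logs encryption (example)"
  policy      = data.aws_iam_policy_document.logs_kms.json
}
```

When you pass your own key through `kms_key_id`, the same service principal statement needs to be present in that key's policy.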

License

This repo is licensed under the WTFPL.