enable tiered storage in AWS via IAM policy (#86)
vuldin committed Jan 18, 2023
1 parent 25c04d4 commit 231ee44
Showing 9 changed files with 246 additions and 125 deletions.
57 changes: 29 additions & 28 deletions README.md
@@ -1,6 +1,6 @@
# Terraform and Ansible Deployment for Redpanda

Terraform and Ansible configuration to easily provision a [Redpanda](https://www.redpanda.com/) cluster on AWS, GCP, Azure, or IBM.

## Installation Prerequisites

@@ -11,12 +11,12 @@ Terraform and Ansible configuration to easily provision a [Redpanda](https://vec

### On Mac OS X:
You can use brew to install the prerequisites. You will also need to install gnu-tar:
```commandline
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
brew install ansible
brew install gnu-tar
ansible-galaxy install -r ansible/requirements.yml
```

## Usage
@@ -65,22 +65,24 @@ You can pass the following variables as `-e var=value`:
| `skip_node` | false | Per-node config that prevents the `redpanda_broker` role from being applied to that specific node. Use with care when adding new nodes to avoid reconfiguring existing ones. |
| `restart_node` | false | Per-node config that prevents Redpanda brokers from being restarted after updating. Use with care: `rpk` may end up reconfigured while the node is never restarted, leaving it in an inconsistent state. |
| `rack` | `undefined` | Per-node config to enable rack awareness. N.B. Rack awareness will be enabled cluster-wide if at least one node has the `rack` variable set. |
| `tiered_storage_bucket_name` | | Set the bucket name to enable tiered storage. |
| `aws_region` | | The AWS region to use when tiered storage is enabled. |
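For example, tiered storage could be enabled at provisioning time by passing both variables; the bucket name and region below are placeholders:

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini \
  -e tiered_storage_bucket_name=my-redpanda-bucket \
  -e aws_region=us-west-2
```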

You can also specify any available Redpanda configuration value (or set of values) by passing a JSON dictionary as an Ansible extra-var. These values will be spliced with the calculated configuration and only override those values that you specify.
There are two sub-dictionaries that you can specify, `redpanda.cluster` and `redpanda.node`. Check the Redpanda docs for the available [Cluster configuration properties](https://docs.redpanda.com/docs/platform/reference/cluster-properties/) and [Node configuration properties](https://docs.redpanda.com/docs/platform/reference/node-properties/).

An example overriding specific properties would be as follows:

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini --extra-vars '{
"redpanda": {
"cluster": {
"auto_create_topics_enabled": "true"
},
"node": {
"developer_mode": "false"
}
}
}'
```

@@ -89,13 +91,13 @@ ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini --extra-vars

## Configure TLS

There are two options for configuring TLS. The first option would be to use externally provided and signed certificates (possibly via a corporately provided Certmonger) and re-run the `provision_node` playbook but specifying the relevant locations and `tls=true`. For example:

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini --extra-vars redpanda_key_file='<path to key file>' --extra-vars redpanda_cert_file='<path to cert file>' --extra-vars redpanda_truststore_file='<path to truststore file>' --extra-vars tls=true
```

The second option is to deploy a private certificate authority using the playbooks provided below and generating private keys and signed certificates. For this approach, follow the steps below.

### Optional: Create a Local Certificate Authority

@@ -130,10 +132,10 @@ The playbooks can be used to add nodes to an existing cluster however care is re
1. Add the new host(s) to the `hosts.ini` file. You may add `skip_node=true` to the existing hosts to avoid the playbooks being re-run on those nodes.
2. `install-node-deps.yml` - this will set up the Prometheus node_exporter and install package dependencies.
3. `prepare-data-dir.yml` - this will create any RAID devices required and format devices as XFS. Note: This playbook looks for devices presented to the operating system as NVMe devices (which can include EBS volumes built on the Nitro System). You may replace this playbook with your own method of formatting devices and presenting disks.
4. If managing TLS with the Redpanda playbooks:
1. `generate-csrs.yml` - will create private key and CSR and bring the CSR back to the Ansible host.
2. If using the Redpanda provided CA: `issue-certs.yml` - signs the CSR and issues a certificate.
3. `install-certs.yml` - Installs the certificate and also applies the `redpanda_broker` role to the cluster nodes. Note: This will install and start Redpanda (and restart any brokers that do not have `skip_node=true` set)
5. If `install-certs.yml` was not run in step iii above, you will need to run `provision-node.yml`, which will apply the `redpanda_broker` role to any nodes without `skip_node=true` set. **Note: If TLS is enabled on the cluster, make sure that `-e tls=true` is set, otherwise this playbook will disable TLS across any nodes that don't have `skip_node=true` set.**
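For a new node on a cluster without TLS, steps 2, 3, and 5 above might be run as follows (inventory path is illustrative):

```commandline
ansible-playbook ansible/playbooks/install-node-deps.yml -i hosts.ini
ansible-playbook ansible/playbooks/prepare-data-dir.yml -i hosts.ini
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini
```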

## Building a cluster with TLS enabled in one execution
@@ -144,9 +146,9 @@ A similar process can be used to build a cluster with TLS in one execution as to
2. `install-node-deps.yml` - this will set up the Prometheus node_exporter and install package dependencies.
3. `prepare-data-dir.yml` - this will create any RAID devices required and format devices as XFS. Note: This playbook looks for devices presented to the operating system as NVMe devices (which can include EBS volumes built on the Nitro System). You may replace this playbook with your own method of formatting devices and presenting disks.
4. If managing TLS with the Redpanda playbooks run the following steps. If you're using externally provided certificates, skip to step 5 remembering to set `tls=true`:
1. `generate-csrs.yml` - will create private key and CSR and bring the CSR back to the Ansible host.
2. If using the Redpanda provided CA: `issue-certs.yml` - signs the CSR and issues a certificate.
3. `install-certs.yml` - Installs the certificate and also applies the `redpanda_broker` role to the cluster nodes. Note: This will install and start Redpanda (and restart any brokers that do not have `skip_node=true` set)
5. If `install-certs.yml` was not run in step iii above, you will need to run `provision-node.yml`, which will apply the `redpanda_broker` role. **Note: If TLS is enabled on the cluster, make sure that `-e tls=true` is set, otherwise this playbook will disable TLS across any nodes that don't have `skip_node=true` set.**


@@ -166,4 +168,3 @@ You might try resolving by setting an environment variable:
`export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`

See: https://stackoverflow.com/questions/50168647/multiprocessing-causes-python-to-crash-and-gives-an-error-may-have-been-in-progr

@@ -0,0 +1,10 @@
cluster:
cloud_storage_access_key: THISVALUENOTUSED
cloud_storage_bucket: {{ tiered_storage_bucket_name if tiered_storage_bucket_name is defined }}
cloud_storage_enable_remote_read: true
cloud_storage_enable_remote_write: true
cloud_storage_region: {{ aws_region if aws_region is defined }}
cloud_storage_secret_key: THISVALUENOTUSED
cloud_storage_credentials_source: aws_instance_metadata
# cloud_storage_enabled must be after other cloud_storage parameters
cloud_storage_enabled: {{ true if tiered_storage_bucket_name is defined and tiered_storage_bucket_name|d('')|length > 0 else false }}
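The final line enables tiered storage only when `tiered_storage_bucket_name` is defined and non-empty. A minimal shell sketch of that check, with a placeholder bucket name:

```shell
# Mirrors the Jinja2 condition: enabled only when the bucket name is set and non-empty.
bucket_name="redpanda-demo-bucket"   # placeholder; in practice Terraform writes this into hosts.ini

if [ -n "${bucket_name}" ]; then
  cloud_storage_enabled=true
else
  cloud_storage_enabled=false
fi
echo "cloud_storage_enabled: ${cloud_storage_enabled}"   # prints: cloud_storage_enabled: true
```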
4 changes: 3 additions & 1 deletion ansible/playbooks/roles/redpanda_broker/vars/main.yml
@@ -3,4 +3,6 @@
custom_config_templates:
- template: configs/defaults.j2
- template: configs/tls.j2
condition: "{{ tls | default(False) | bool }}"
- template: configs/tiered_storage.j2
condition: "{{ tiered_storage_bucket_name is defined | default(False) | bool }}"
104 changes: 89 additions & 15 deletions aws/cluster.tf
@@ -3,9 +3,10 @@ resource "random_uuid" "cluster" {}
resource "time_static" "timestamp" {}

locals {
uuid = random_uuid.cluster.result
timestamp = time_static.timestamp.unix
deployment_id = length(var.deployment_prefix) > 0 ? var.deployment_prefix : "redpanda-${substr(local.uuid, 0, 8)}-${local.timestamp}"
tiered_storage_bucket_name = "${local.deployment_id}-bucket"
# tags shared by all instances
instance_tags = {
@@ -14,15 +15,73 @@
}
}
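The `deployment_id` logic above can be sketched in shell: a non-empty `deployment_prefix` wins, otherwise the name combines the first eight characters of a UUID with a Unix timestamp (the UUID below is a fixed example; Terraform uses `random_uuid`):

```shell
# Sketch of the deployment_id naming scheme from the locals block.
deployment_prefix=""                          # set to a non-empty value to override the generated name
uuid="f81d4fae-7dec-11d0-a765-00a0c91e6bf6"   # fixed example UUID
timestamp=$(date +%s)
short_uuid=$(printf '%s' "${uuid}" | cut -c1-8)

if [ -n "${deployment_prefix}" ]; then
  deployment_id="${deployment_prefix}"
else
  deployment_id="redpanda-${short_uuid}-${timestamp}"
fi
echo "${deployment_id}"
```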
resource "aws_iam_policy" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
path = "/"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
"Effect": "Allow",
"Action": [
"s3:*",
"s3-object-lambda:*",
],
"Resource": [
"arn:aws:s3:::${local.tiered_storage_bucket_name}/*"
]
},
]
})
}

resource "aws_iam_role" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Sid = ""
Principal = {
Service = "ec2.amazonaws.com"
}
},
]
})
}

resource "aws_iam_policy_attachment" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
roles = [aws_iam_role.redpanda[count.index].name]
policy_arn = aws_iam_policy.redpanda[count.index].arn
}

resource "aws_iam_instance_profile" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
role = aws_iam_role.redpanda[count.index].name
}
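After `terraform apply`, the role, policy, and instance profile (all named after the deployment ID; the name below is illustrative) could be spot-checked with the AWS CLI:

```commandline
aws iam get-instance-profile --instance-profile-name redpanda-f81d4fae-1674000000
aws iam list-attached-role-policies --role-name redpanda-f81d4fae-1674000000
```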

resource "aws_instance" "redpanda" {
count = var.nodes
ami = var.distro_ami[var.distro]
instance_type = var.instance_type
key_name = aws_key_pair.ssh.key_name
iam_instance_profile = var.tiered_storage_enabled ? aws_iam_instance_profile.redpanda[0].name : null
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
placement_group = var.ha ? aws_placement_group.redpanda-pg[0].id : null
placement_partition_number = var.ha ? (count.index % aws_placement_group.redpanda-pg[0].partition_count) + 1 : null
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-node-${count.index}",
}
)

connection {
user = var.distro_ssh_user[var.distro]
@@ -53,7 +112,12 @@ resource "aws_instance" "prometheus" {
instance_type = var.prometheus_instance_type
key_name = aws_key_pair.ssh.key_name
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-prometheus",
}
)

connection {
user = var.distro_ssh_user[var.distro]
@@ -68,7 +132,12 @@ resource "aws_instance" "client" {
instance_type = var.client_instance_type
key_name = aws_key_pair.ssh.key_name
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-client",
}
)

connection {
user = var.distro_ssh_user[var.client_distro]
@@ -176,20 +245,25 @@ resource "aws_placement_group" "redpanda-pg" {
resource "aws_key_pair" "ssh" {
key_name = "${local.deployment_id}-key"
public_key = file(var.public_key_path)
tags = local.instance_tags
}

resource "local_file" "hosts_ini" {
content = templatefile("${path.module}/../templates/hosts_ini.tpl",
{
rack = aws_instance.redpanda.*.placement_partition_number
aws_region = var.aws_region
client_count = var.clients
client_public_ips = aws_instance.client.*.public_ip
client_private_ips = aws_instance.client.*.private_ip
enable_monitoring = var.enable_monitoring
monitor_public_ip = var.enable_monitoring ? aws_instance.prometheus[0].public_ip : ""
monitor_private_ip = var.enable_monitoring ? aws_instance.prometheus[0].private_ip : ""
rack = aws_instance.redpanda.*.placement_partition_number
redpanda_public_ips = aws_instance.redpanda.*.public_ip
redpanda_private_ips = aws_instance.redpanda.*.private_ip
ssh_user = var.distro_ssh_user[var.distro]
tiered_storage_bucket_name = local.tiered_storage_bucket_name
tiered_storage_enabled = var.tiered_storage_enabled
}
)
filename = "${path.module}/../hosts.ini"
2 changes: 1 addition & 1 deletion aws/provider.tf
@@ -2,7 +2,7 @@ terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "4.35.0"
}
local = {
source = "hashicorp/local"
13 changes: 7 additions & 6 deletions aws/readme.md
@@ -15,15 +15,15 @@ Example: `terraform apply -var="instance_type=i3.large" -var="nodes=3"`

| Name | Version |
|------|---------|
| aws | 4.35.0 |
| local | 2.1.0 |
| random | 3.1.0 |

## Providers

| Name | Version |
|------|---------|
| aws | 4.35.0 |
| local | 2.1.0 |
| random | 3.1.0 |

@@ -35,10 +35,10 @@ No Modules.

| Name |
|--------------------------------------------------------------------------------------------------------------------|
| [aws_instance](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/instance) |
| [aws_key_pair](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/key_pair) |
| [aws_security_group](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/security_group) |
| [aws_placement_group](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/placement_group) |
| [local_file](https://registry.terraform.io/providers/hashicorp/local/2.1.0/docs/resources/file) |
| [random_uuid](https://registry.terraform.io/providers/hashicorp/random/3.1.0/docs/resources/uuid) |
| [timestamp_static](https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/static) |
@@ -57,6 +57,7 @@ No Modules.
| nodes | The number of nodes to deploy | `number` | `"3"` | no |
| prometheus\_instance\_type | Instance type of the prometheus/grafana node | `string` | `"c5.2xlarge"` | no |
| public\_key\_path | The public key used to ssh to the hosts | `string` | `"~/.ssh/id_rsa.pub"` | no |
| tiered\_storage\_enabled | Enables or disables tiered storage | `bool` | `false` | no |

### Client Inputs
By default, no client VMs are provisioned. If you want to also provision client
19 changes: 19 additions & 0 deletions aws/s3.tf
@@ -0,0 +1,19 @@
resource "aws_s3_bucket" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = local.tiered_storage_bucket_name
tags = local.instance_tags
}

resource "aws_s3_bucket_acl" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = aws_s3_bucket.tiered_storage[count.index].id
acl = "private"
}

resource "aws_s3_bucket_versioning" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = aws_s3_bucket.tiered_storage[count.index].id
versioning_configuration {
status = "Disabled"
}
}
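With these resources in place, a tiered-storage-enabled cluster could be brought up along these lines (variable values are illustrative):

```commandline
terraform apply -var="tiered_storage_enabled=true" -var="aws_region=us-west-2"
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini
```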