
Enable tiered storage in AWS via IAM policy #86

Merged 2 commits on Jan 18, 2023.
57 changes: 29 additions & 28 deletions README.md
@@ -1,6 +1,6 @@
# Terraform and Ansible Deployment for Redpanda

Terraform and Ansible configuration to easily provision a [Redpanda](https://www.redpanda.com/) cluster on AWS, GCP, Azure, or IBM.

## Installation Prerequisites

@@ -11,12 +11,12 @@

### On Mac OS X:
You can use brew to install the prerequisites. You will also need to install gnu-tar:
```commandline
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
brew install ansible
brew install gnu-tar
ansible-galaxy install -r ansible/requirements.yml
```

## Usage
@@ -65,22 +65,24 @@
You can pass the following variables as `-e var=value`:
| `skip_node` | false | Per-node config to prevent the `redpanda_broker` role from being applied to this specific node. Use carefully when adding new nodes so that existing nodes are not reconfigured. |
| `restart_node` | false | Per-node config to prevent Redpanda brokers from being restarted after updating. Use with care: this can leave `rpk` reconfigured but the node not restarted, and therefore in an inconsistent state. |
| `rack` | `undefined` | Per-node config to enable rack awareness. N.B. Rack awareness will be enabled cluster-wide if at least one node has the `rack` variable set. |
| `tiered_storage_bucket_name` | | Set a bucket name to enable tiered storage (see the example below). |
| `aws_region` | | The AWS region to use if tiered storage is enabled. |
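For example, to provision with tiered storage pointed at a specific bucket (the bucket name and region below are illustrative placeholders, not values from this repo):

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini \
  -e tiered_storage_bucket_name=my-redpanda-tiered-storage -e aws_region=us-west-2
```

In the normal flow terraform writes these two values into `hosts.ini` for you (see the `aws/cluster.tf` changes below), so passing them by hand is only needed for ad-hoc runs.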

You can also specify any available Redpanda configuration value (or set of values) by passing a JSON dictionary as an Ansible extra-var. These values will be spliced with the calculated configuration and only override those values that you specify.
There are two sub-dictionaries that you can specify, `redpanda.cluster` and `redpanda.node`. Check the Redpanda docs for the available [Cluster configuration properties](https://docs.redpanda.com/docs/platform/reference/cluster-properties/) and [Node configuration properties](https://docs.redpanda.com/docs/platform/reference/node-properties/).

An example overriding specific properties would be as follows:

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini --extra-vars '{
"redpanda": {
"cluster": {
"auto_create_topics_enabled": "true"
},
"node": {
"developer_mode": "false"
}
}
}'
```

@@ -89,13 +91,13 @@

## Configure TLS

There are two options for configuring TLS. The first is to use externally provided and signed certificates (possibly via a corporately provided Certmonger) and re-run the `provision_node` playbook, specifying the relevant file locations and `tls=true`. For example:

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini --extra-vars redpanda_key_file='<path to key file>' --extra-vars redpanda_cert_file='<path to cert file>' --extra-vars redpanda_truststore_file='<path to truststore file>' --extra-vars tls=true
```

The second option is to deploy a private certificate authority using the playbooks provided below and generating private keys and signed certificates. For this approach, follow the steps below.

### Optional: Create a Local Certificate Authority

@@ -130,10 +132,10 @@
The playbooks can be used to add nodes to an existing cluster; however, care is required:
1. Add the new host(s) to the `hosts.ini` file. You may add `skip_node=true` to the existing hosts to avoid the playbooks being re-run on those nodes.
2. `install-node-deps.yml` - this will set up the Prometheus node_exporter and install package dependencies.
3. `prepare-data-dir.yml` - this will create any RAID devices required and format devices as XFS. Note: This playbook looks for devices presented to the operating system as NVMe devices (which can include EBS volumes built on the Nitro System). You may replace this playbook with your own method of formatting devices and presenting disks.
4. If managing TLS with the Redpanda playbooks:
   1. `generate-csrs.yml` - will create a private key and CSR and bring the CSR back to the Ansible host.
   2. If using the Redpanda provided CA: `issue-certs.yml` - signs the CSR and issues a certificate.
   3. `install-certs.yml` - Installs the certificate and also applies the `redpanda_broker` role to the cluster nodes. Note: This will install and start Redpanda (and restart any brokers that do not have `skip_node=true` set).
5. If `install-certs.yml` was not run in step iii above, you will need to run `provision-node.yml`, which will install the `redpanda_broker` role onto any nodes without `skip_node=true` set (see the example below). **Note: If TLS is enabled on the cluster, make sure that `-e tls=true` is set, otherwise this playbook will disable TLS across any nodes that don't have `skip_node=true` set.**
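A minimal sketch of that final step, assuming the inventory generated by the terraform run:

```commandline
ansible-playbook ansible/playbooks/provision-node.yml -i hosts.ini -e tls=true
```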

## Building a cluster with TLS enabled in one execution
@@ -144,9 +146,9 @@
A similar process can be used to build a cluster with TLS in one execution as to add nodes to an existing cluster:
2. `install-node-deps.yml` - this will set up the Prometheus node_exporter and install package dependencies.
3. `prepare-data-dir.yml` - this will create any RAID devices required and format devices as XFS. Note: This playbook looks for devices presented to the operating system as NVMe devices (which can include EBS volumes built on the Nitro System). You may replace this playbook with your own method of formatting devices and presenting disks.
4. If managing TLS with the Redpanda playbooks run the following steps. If you're using externally provided certificates, skip to step 5 remembering to set `tls=true`:
   1. `generate-csrs.yml` - will create a private key and CSR and bring the CSR back to the Ansible host.
   2. If using the Redpanda provided CA: `issue-certs.yml` - signs the CSR and issues a certificate.
   3. `install-certs.yml` - Installs the certificate and also applies the `redpanda_broker` role to the cluster nodes. Note: This will install and start Redpanda (and restart any brokers that do not have `skip_node=true` set).
5. If `install-certs.yml` was not run in step iii above, you will need to run `provision-node.yml`, which will install the `redpanda_broker` role. **Note: If TLS is enabled on the cluster, make sure that `-e tls=true` is set, otherwise this playbook will disable TLS across any nodes that don't have `skip_node=true` set.**


@@ -166,4 +168,3 @@
You might try resolving by setting an environment variable:
`export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`

See: https://stackoverflow.com/questions/50168647/multiprocessing-causes-python-to-crash-and-gives-an-error-may-have-been-in-progr

10 changes: 10 additions & 0 deletions ansible/playbooks/roles/redpanda_broker/templates/configs/tiered_storage.j2
@@ -0,0 +1,10 @@
cluster:
  cloud_storage_access_key: THISVALUENOTUSED
**Contributor:** can you help me understand why this is here if it's not used? Are we expecting users to override this with their own storage key in the TF config?

**Member Author:** No, this is just a placeholder. Unfortunately Redpanda requires this to be something even when we are using `aws_instance_metadata` (IAM permissions applied to the EC2 instance).

  cloud_storage_bucket: {{ tiered_storage_bucket_name if tiered_storage_bucket_name is defined }}
  cloud_storage_enable_remote_read: true
  cloud_storage_enable_remote_write: true
**Member Author** (on lines +4 to +5): This makes all topics tiered-storage-enabled by default if the `tiered_storage_enabled` terraform variable is enabled.

  cloud_storage_region: {{ aws_region if aws_region is defined }}
  cloud_storage_secret_key: THISVALUENOTUSED
  cloud_storage_credentials_source: aws_instance_metadata
**Contributor:** Need to pass this in as a param from terraform otherwise we're assuming AWS only.

  # cloud_storage_enabled must be after other cloud_storage parameters
  cloud_storage_enabled: {{ true if tiered_storage_bucket_name is defined and tiered_storage_bucket_name|d('')|length > 0 else false }}
4 changes: 3 additions & 1 deletion ansible/playbooks/roles/redpanda_broker/vars/main.yml
@@ -3,4 +3,6 @@
custom_config_templates:
  - template: configs/defaults.j2
  - template: configs/tls.j2
    condition: "{{ tls | default(False) | bool }}"
  - template: configs/tiered_storage.j2
    condition: "{{ tiered_storage_bucket_name is defined | default(False) | bool }}"
**Member Author:** The `tiered_storage_bucket_name` ansible variable is pulled from `hosts.ini`, and is only defined if the `tiered_storage_enabled` terraform variable is true.

104 changes: 89 additions & 15 deletions aws/cluster.tf
@@ -3,9 +3,10 @@
resource "random_uuid" "cluster" {}
resource "time_static" "timestamp" {}

locals {
  uuid          = random_uuid.cluster.result
  timestamp     = time_static.timestamp.unix
  deployment_id = length(var.deployment_prefix) > 0 ? var.deployment_prefix : "redpanda-${substr(local.uuid, 0, 8)}-${local.timestamp}"
**Member Author:** Terraform complained about names being too long, so I shortened both the UUID and the timestamp.
**Contributor:** how safe is it to just grab the first 8 characters off the uuid? I haven't looked at how go-uuid works to see if it's one of the time-based uuids.
**Contributor:** also, why do we use a timestamp here instead of just the larger uuid?

  tiered_storage_bucket_name = "${local.deployment_id}-bucket"

  # tags shared by all instances
  instance_tags = {
@@ -14,15 +15,73 @@
  }
}
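To make the name-length issue above concrete: S3 bucket names are capped at 63 characters and IAM role names at 64, which is what the shortened UUID and unix timestamp are working around. A hedged sketch of a guard on `deployment_prefix` (its real definition is not part of this diff; the block below is illustrative only):

```hcl
variable "deployment_prefix" {
  type        = string
  default     = ""
  description = "Optional override for the generated deployment_id (assumed definition)."

  validation {
    # Leave room for the "-bucket" suffix appended in locals while staying
    # under the 63-character S3 bucket name limit.
    condition     = length(var.deployment_prefix) <= 56
    error_message = "deployment_prefix must be 56 characters or fewer."
  }
}
```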

resource "aws_iam_policy" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
path = "/"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
"Effect": "Allow",
"Action": [
"s3:*",
"s3-object-lambda:*",
**Member Author** (on lines +28 to +29): This needs to be more limited to list only those needed permissions.

**Member Author:** Discussed this with the broader team. From that conversation: "Having a policy that tightly limits redpanda to its own bucket is clearly the right thing to do; it's less obvious to me that restricting the verbs is helpful: it creates maintenance burden to keep those up to date as/when we change redpanda, and doesn't meaningfully change security if redpanda is the owner of the bucket." So limiting this policy to the specific bucket (as shown in this file below) is the best approach, and avoids a maintenance burden as tiered storage capabilities are expanded.
        ],
        "Resource": [
          "arn:aws:s3:::${local.tiered_storage_bucket_name}/*"
        ]
      },
    ]
  })
}
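If the verbs were ever tightened as first suggested, note that `s3:ListBucket` authorizes against the bucket ARN itself, so the `Resource` list would need both forms. A sketch of what that could look like (the action list is illustrative, not a vetted minimum for Redpanda):

```hcl
resource "aws_iam_policy" "redpanda_scoped" {
  count = var.tiered_storage_enabled ? 1 : 0
  name  = "${local.deployment_id}-scoped"
  path  = "/"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        # Hypothetical narrowed verbs; the PR keeps s3:* to avoid maintenance burden.
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListBucket",
        ]
        Resource = [
          "arn:aws:s3:::${local.tiered_storage_bucket_name}",   # bucket-level actions
          "arn:aws:s3:::${local.tiered_storage_bucket_name}/*", # object-level actions
        ]
      },
    ]
  })
}
```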

resource "aws_iam_role" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Sid = ""
Principal = {
Service = "ec2.amazonaws.com"
}
},
]
})
}

resource "aws_iam_policy_attachment" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
roles = [aws_iam_role.redpanda[count.index].name]
policy_arn = aws_iam_policy.redpanda[count.index].arn
}

resource "aws_iam_instance_profile" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
role = aws_iam_role.redpanda[count.index].name
}

resource "aws_instance" "redpanda" {
count = var.nodes
ami = var.distro_ami[var.distro]
instance_type = var.instance_type
key_name = aws_key_pair.ssh.key_name
iam_instance_profile = var.tiered_storage_enabled ? aws_iam_instance_profile.redpanda[0].name : null
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
placement_group = var.ha ? aws_placement_group.redpanda-pg[0].id : null
placement_partition_number = var.ha ? (count.index % aws_placement_group.redpanda-pg[0].partition_count) + 1 : null
tags = local.instance_tags
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-node-${count.index}",
}
)

  connection {
    user = var.distro_ssh_user[var.distro]

@@ -53,7 +112,12 @@
resource "aws_instance" "prometheus" {
  instance_type          = var.prometheus_instance_type
  key_name               = aws_key_pair.ssh.key_name
  vpc_security_group_ids = [aws_security_group.node_sec_group.id]
  tags = merge(
    local.instance_tags,
    {
      Name = "${local.deployment_id}-prometheus",
    }
  )

  connection {
    user = var.distro_ssh_user[var.distro]

@@ -68,7 +132,12 @@
resource "aws_instance" "client" {
  instance_type          = var.client_instance_type
  key_name               = aws_key_pair.ssh.key_name
  vpc_security_group_ids = [aws_security_group.node_sec_group.id]
  tags = merge(
    local.instance_tags,
    {
      Name = "${local.deployment_id}-client",
    }
  )

  connection {
    user = var.distro_ssh_user[var.client_distro]

@@ -176,20 +245,25 @@
resource "aws_placement_group" "redpanda-pg" {
resource "aws_key_pair" "ssh" {
key_name = "${local.deployment_id}-key"
public_key = file(var.public_key_path)
tags = local.instance_tags
}

resource "local_file" "hosts_ini" {
content = templatefile("${path.module}/../templates/hosts_ini.tpl",
{
redpanda_public_ips = aws_instance.redpanda.*.public_ip
redpanda_private_ips = aws_instance.redpanda.*.private_ip
monitor_public_ip = var.enable_monitoring ? aws_instance.prometheus[0].public_ip : ""
monitor_private_ip = var.enable_monitoring ? aws_instance.prometheus[0].private_ip : ""
ssh_user = var.distro_ssh_user[var.distro]
enable_monitoring = var.enable_monitoring
client_public_ips = aws_instance.client.*.public_ip
client_private_ips = aws_instance.client.*.private_ip
rack = aws_instance.redpanda.*.placement_partition_number
aws_region = var.aws_region
client_count = var.clients
client_public_ips = aws_instance.client.*.public_ip
client_private_ips = aws_instance.client.*.private_ip
enable_monitoring = var.enable_monitoring
monitor_public_ip = var.enable_monitoring ? aws_instance.prometheus[0].public_ip : ""
monitor_private_ip = var.enable_monitoring ? aws_instance.prometheus[0].private_ip : ""
rack = aws_instance.redpanda.*.placement_partition_number
redpanda_public_ips = aws_instance.redpanda.*.public_ip
redpanda_private_ips = aws_instance.redpanda.*.private_ip
ssh_user = var.distro_ssh_user[var.distro]
tiered_storage_bucket_name = local.tiered_storage_bucket_name
tiered_storage_enabled = var.tiered_storage_enabled
}
)
filename = "${path.module}/../hosts.ini"
Expand Down
2 changes: 1 addition & 1 deletion aws/provider.tf
@@ -2,7 +2,7 @@
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "4.35.0"
    }
    local = {
      source = "hashicorp/local"
13 changes: 7 additions & 6 deletions aws/readme.md
@@ -15,15 +15,15 @@
Example: `terraform apply -var="instance_type=i3.large" -var="nodes=3"`

| Name | Version |
|------|---------|
| aws | 4.35.0 |
| local | 2.1.0 |
| random | 3.1.0 |

## Providers

| Name | Version |
|------|---------|
| aws | 4.35.0 |
| local | 2.1.0 |
| random | 3.1.0 |

@@ -35,10 +35,10 @@
No Modules.

| Name |
|--------------------------------------------------------------------------------------------------------------------|
| [aws_instance](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/instance) |
| [aws_key_pair](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/key_pair) |
| [aws_security_group](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/security_group) |
| [aws_placement_group](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/placement_group) |
| [local_file](https://registry.terraform.io/providers/hashicorp/local/2.1.0/docs/resources/file) |
| [random_uuid](https://registry.terraform.io/providers/hashicorp/random/3.1.0/docs/resources/uuid) |
| [timestamp_static](https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/static) |
@@ -57,6 +57,7 @@
| nodes | The number of nodes to deploy | `number` | `"3"` | no |
| prometheus\_instance\_type | Instance type of the prometheus/grafana node | `string` | `"c5.2xlarge"` | no |
| public\_key\_path | The public key used to ssh to the hosts | `string` | `"~/.ssh/id_rsa.pub"` | no |
| tiered\_storage\_enabled | Enables or disables tiered storage | `bool` | `false` | no |
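Enabling the feature end to end is then a single flag on the terraform side (the region value below is illustrative):

```commandline
terraform apply -var="tiered_storage_enabled=true" -var="aws_region=us-west-2"
```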

### Client Inputs
By default, no client VMs are provisioned. If you want to also provision client
19 changes: 19 additions & 0 deletions aws/s3.tf
@@ -0,0 +1,19 @@
resource "aws_s3_bucket" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = local.tiered_storage_bucket_name
tags = local.instance_tags
}

resource "aws_s3_bucket_acl" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = aws_s3_bucket.tiered_storage[count.index].id
acl = "private"
}

resource "aws_s3_bucket_versioning" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = aws_s3_bucket.tiered_storage[count.index].id
versioning_configuration {
status = "Disabled"
}
}
**Contributor** (on lines +13 to +19): Can we get a comment added on here on why we disable versioning (and maybe what happens, good or bad, if the user overrides this)?

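A possible resolution, with the rationale hedged as an assumption rather than a confirmed design decision:

```hcl
resource "aws_s3_bucket_versioning" "tiered_storage" {
  count  = var.tiered_storage_enabled ? 1 : 0
  bucket = aws_s3_bucket.tiered_storage[count.index].id
  versioning_configuration {
    # Assumed rationale: Redpanda manages segment lifecycle itself, so enabling
    # versioning would retain deleted segments as noncurrent versions and grow
    # storage costs. Overriding to "Enabled" is safe for Redpanda but costs
    # money unless a lifecycle rule expires noncurrent versions.
    # Note: "Disabled" is only valid for buckets that have never had versioning
    # enabled; once enabled, versioning can only be "Suspended".
    status = "Disabled"
  }
}
```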