Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable tiered storage in AWS via IAM policy #86

Merged
merged 2 commits into from
Jan 18, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 48 additions & 48 deletions README.md

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@

- name: Check if restart needed
ansible.builtin.shell:
cmd: "rpk cluster config status | awk '{ print $3 }' | grep -E 'true|false'"
cmd: "rpk cluster config status | awk '{ print $3 }' | grep -E 'true|false'"
register: restart_required_rc
changed_when: False
run_once: true
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ node:
- address: {{ hostvars[inventory_hostname].advertised_ip }}
port: {{ redpanda_kafka_port }}
advertised_rpc_api:
- address: {{ hostvars[inventory_hostname].advertised_ip }}
address: {{ hostvars[inventory_hostname].advertised_ip }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resulted in an error due to the fact that advertised_rpc_api (unlike other listeners) isn't a list.

port: {{ redpanda_rpc_port }}
data_directory: "{{ redpanda_data_directory }}"
rpc_server:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
cluster:
cloud_storage_access_key: THISVALUENOTUSED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you help me understand why this is here if it's not used? Are we expecting users to override this with their own storage key in the TF config?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is just a placeholder. Unfortunately Redpanda requires this to be something even when we are using aws_instance_metadata (IAM permissions applied to the EC2 instance).

cloud_storage_bucket: {{ tiered_storage_bucket_name if tiered_storage_bucket_name is defined }}
cloud_storage_enable_remote_read: true
cloud_storage_enable_remote_write: true
Comment on lines +4 to +5
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes all topics tiered-storage-enabled by default if the tiered_storage_enabled terraform variable is enabled.

cloud_storage_region: {{ aws_region if aws_region is defined }}
cloud_storage_secret_key: THISVALUENOTUSED
cloud_storage_credentials_source: aws_instance_metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to pass this in as a param from terraform otherwise we're assuming AWS only

# cloud_storage_enabled must be after other cloud_storage parameters
cloud_storage_enabled: {{ true if tiered_storage_bucket_name is defined and tiered_storage_bucket_name|d('')|length > 0 else false }}
4 changes: 3 additions & 1 deletion ansible/playbooks/roles/redpanda_broker/vars/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,6 @@
custom_config_templates:
- template: configs/defaults.j2
- template: configs/tls.j2
condition: "{{ tls | default(False) | bool }}"
condition: "{{ tls | default(False) | bool }}"
- template: configs/tiered_storage.j2
condition: "{{ tiered_storage_bucket_name is defined | default(False) | bool }}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tiered_storage_bucket_name ansible variable is pulled from hosts.ini, and is only defined if the tiered_storage_enabled terraform variable is true.

104 changes: 89 additions & 15 deletions aws/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,10 @@ resource "random_uuid" "cluster" {}
resource "time_static" "timestamp" {}

locals {
uuid = random_uuid.cluster.result
timestamp = time_static.timestamp.rfc3339
deployment_id = "redpanda-${local.uuid}-${local.timestamp}"
uuid = random_uuid.cluster.result
timestamp = time_static.timestamp.unix
deployment_id = length(var.deployment_prefix) > 0 ? var.deployment_prefix : "redpanda-${substr(local.uuid, 0, 8)}-${local.timestamp}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terraform complained about names being too long, so I shortened both the UUID and the timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how safe is it to just grab the first 8 characters off the uuid? I haven't look at how go-uuid works to see if it's one of the time-based uuids.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, why do we use a timestamp here instead of just the larger uuid?

tiered_storage_bucket_name = "${local.deployment_id}-bucket"

# tags shared by all instances
instance_tags = {
Expand All @@ -14,15 +15,73 @@ locals {
}
}

resource "aws_iam_policy" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
path = "/"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
"Effect": "Allow",
"Action": [
"s3:*",
"s3-object-lambda:*",
Comment on lines +28 to +29
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be more limited to list only those needed permissions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with the broader team. From that conversation:

Having an policy that tightly limits redpanda to its own bucket is clearly the right thing to do, it's less obvious to me that restricting the verbs is helpful: it creates maintenance burden to keep those up to date as/when we change redpanda, and doesn't meaningfully change security if redpanda is the owner of the bucket.

So limiting this policy to the specific bucket (as shown in this file below) is the best approach, and avoids a maintenance burden as tiered storage capabilities are expanded.

],
"Resource": [
"arn:aws:s3:::${local.tiered_storage_bucket_name}/*"
]
},
]
})
}

resource "aws_iam_role" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Sid = ""
Principal = {
Service = "ec2.amazonaws.com"
}
},
]
})
}

resource "aws_iam_policy_attachment" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
roles = [aws_iam_role.redpanda[count.index].name]
policy_arn = aws_iam_policy.redpanda[count.index].arn
}

resource "aws_iam_instance_profile" "redpanda" {
count = var.tiered_storage_enabled ? 1 : 0
name = local.deployment_id
role = aws_iam_role.redpanda[count.index].name
}

vuldin marked this conversation as resolved.
Show resolved Hide resolved
resource "aws_instance" "redpanda" {
count = var.nodes
ami = var.distro_ami[var.distro]
instance_type = var.instance_type
key_name = aws_key_pair.ssh.key_name
iam_instance_profile = var.tiered_storage_enabled ? aws_iam_instance_profile.redpanda[0].name : null
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
placement_group = var.ha ? aws_placement_group.redpanda-pg[0].id : null
placement_partition_number = var.ha ? (count.index % aws_placement_group.redpanda-pg[0].partition_count) + 1 : null
tags = local.instance_tags
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-node-${count.index}",
}
)

connection {
user = var.distro_ssh_user[var.distro]
Expand Down Expand Up @@ -53,7 +112,12 @@ resource "aws_instance" "prometheus" {
instance_type = var.prometheus_instance_type
key_name = aws_key_pair.ssh.key_name
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
tags = local.instance_tags
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-prometheus",
}
)

connection {
user = var.distro_ssh_user[var.distro]
Expand All @@ -68,7 +132,12 @@ resource "aws_instance" "client" {
instance_type = var.client_instance_type
key_name = aws_key_pair.ssh.key_name
vpc_security_group_ids = [aws_security_group.node_sec_group.id]
tags = local.instance_tags
tags = merge(
local.instance_tags,
{
Name = "${local.deployment_id}-client",
}
)

connection {
user = var.distro_ssh_user[var.client_distro]
Expand Down Expand Up @@ -167,20 +236,25 @@ resource "aws_placement_group" "redpanda-pg" {
resource "aws_key_pair" "ssh" {
key_name = "${local.deployment_id}-key"
public_key = file(var.public_key_path)
tags = local.instance_tags
}

resource "local_file" "hosts_ini" {
content = templatefile("${path.module}/../templates/hosts_ini.tpl",
{
redpanda_public_ips = aws_instance.redpanda.*.public_ip
redpanda_private_ips = aws_instance.redpanda.*.private_ip
monitor_public_ip = var.enable_monitoring ? aws_instance.prometheus[0].public_ip : ""
monitor_private_ip = var.enable_monitoring ? aws_instance.prometheus[0].private_ip : ""
ssh_user = var.distro_ssh_user[var.distro]
enable_monitoring = var.enable_monitoring
client_public_ips = aws_instance.client.*.public_ip
client_private_ips = aws_instance.client.*.private_ip
rack = aws_instance.redpanda.*.placement_partition_number
aws_region = var.aws_region
client_count = var.clients
client_public_ips = aws_instance.client.*.public_ip
client_private_ips = aws_instance.client.*.private_ip
enable_monitoring = var.enable_monitoring
monitor_public_ip = var.enable_monitoring ? aws_instance.prometheus[0].public_ip : ""
monitor_private_ip = var.enable_monitoring ? aws_instance.prometheus[0].private_ip : ""
rack = aws_instance.redpanda.*.placement_partition_number
redpanda_public_ips = aws_instance.redpanda.*.public_ip
redpanda_private_ips = aws_instance.redpanda.*.private_ip
ssh_user = var.distro_ssh_user[var.distro]
tiered_storage_bucket_name = local.tiered_storage_bucket_name
tiered_storage_enabled = var.tiered_storage_enabled
}
)
filename = "${path.module}/../hosts.ini"
Expand Down
2 changes: 1 addition & 1 deletion aws/provider.tf
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "3.73.0"
version = "4.35.0"
vuldin marked this conversation as resolved.
Show resolved Hide resolved
}
local = {
source = "hashicorp/local"
Expand Down
13 changes: 7 additions & 6 deletions aws/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,15 +15,15 @@ Example: `terraform apply -var="instance_type=i3.large" -var="nodes=3"`

| Name | Version |
|------|---------|
| aws | 3.73.0 |
| aws | 4.35.0 |
| local | 2.1.0 |
| random | 3.1.0 |

## Providers

| Name | Version |
|------|---------|
| aws | 3.73.0 |
| aws | 4.35.0 |
| local | 2.1.0 |
| random | 3.1.0 |

Expand All @@ -35,10 +35,10 @@ No Modules.

| Name |
|--------------------------------------------------------------------------------------------------------------------|
| [aws_instance](https://registry.terraform.io/providers/hashicorp/aws/3.73.0/docs/resources/instance) |
| [aws_key_pair](https://registry.terraform.io/providers/hashicorp/aws/3.73.0/docs/resources/key_pair) |
| [aws_security_group](https://registry.terraform.io/providers/hashicorp/aws/3.73.0/docs/resources/security_group) |
| [aws_placement_group](https://registry.terraform.io/providers/hashicorp/aws/3.73.0/docs/resources/placement_group) |
| [aws_instance](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/instance) |
| [aws_key_pair](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/key_pair) |
| [aws_security_group](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/security_group) |
| [aws_placement_group](https://registry.terraform.io/providers/hashicorp/aws/4.35.0/docs/resources/placement_group) |
| [local_file](https://registry.terraform.io/providers/hashicorp/local/2.1.0/docs/resources/file) |
| [random_uuid](https://registry.terraform.io/providers/hashicorp/random/3.1.0/docs/resources/uuid) |
| [timestamp_static](https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/static) |
Expand All @@ -57,6 +57,7 @@ No Modules.
| nodes | The number of nodes to deploy | `number` | `"3"` | no |
| prometheus\_instance\_type | Instant type of the prometheus/grafana node | `string` | `"c5.2xlarge"` | no |
| public\_key\_path | The public key used to ssh to the hosts | `string` | `"~/.ssh/id_rsa.pub"` | no |
| tiered\_storage\_enabled | Enables or disables tiered storage | `bool` | `false` | no |

### Client Inputs
By default, no client VMs are provisioned. If you want to also provision client
Expand Down
19 changes: 19 additions & 0 deletions aws/s3.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
resource "aws_s3_bucket" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = local.tiered_storage_bucket_name
tags = local.instance_tags
}

resource "aws_s3_bucket_acl" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = aws_s3_bucket.tiered_storage[count.index].id
acl = "private"
}

resource "aws_s3_bucket_versioning" "tiered_storage" {
count = var.tiered_storage_enabled ? 1 : 0
bucket = aws_s3_bucket.tiered_storage[count.index].id
versioning_configuration {
status = "Disabled"
}
}
Comment on lines +13 to +19
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a comment added on here on why we disable versioning (and maybe what happens, good or bad, if the user overrides this)?

Loading