Enable tiered storage in AWS via IAM policy #86

Merged
merged 2 commits into from
Jan 18, 2023

Conversation

vuldin
Member

@vuldin vuldin commented Oct 24, 2022

This PR adds the tiered storage feature to AWS deployments. Once it is merged, follow-up PRs will add the same feature for other cloud providers. Tiered storage is disabled by default (tiered_storage_enabled in vars.tf). When it is enabled, this PR turns it on for all topics by default.

Steps to run

git clone https://github.com/redpanda-data/deployment-automation.git
cd deployment-automation
git checkout enable-si
cd aws
terraform init -upgrade
terraform apply -var "enable_monitoring=false" -var "tiered_storage_enabled=true"
cd ..
ansible-playbook --private-key ~/.ssh/id_rsa -i hosts.ini -v ansible/playbooks/provision-node.yml -e '{ "redpanda": { "cluster": { "cloud_storage_segment_max_upload_interval_sec": 30 }}}'

Now connect via ssh to a broker, create a topic, produce some data, and check the S3 bucket for remote segments. If you need the IP address of one of the nodes, run terraform output.

ssh ubuntu@<redpanda-ec2-ip>
rpk topic create alog
BATCH=$(date) ; printf "$BATCH %s\n" {1..1000} | rpk topic produce alog

Run the last command 20 or more times, then view the contents of the bucket in the AWS console to see the remote segments.
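
As an alternative to the AWS console, the bucket can also be checked from a script. The sketch below is illustrative only and is not part of this PR. It assumes boto3 is installed, that AWS credentials with read access to the bucket are configured, and that you supply the bucket name yourself (for example the tiered_storage_bucket_name value written to hosts.ini). The `count_objects` helper is a hypothetical name; it accepts any client that exposes boto3's `get_paginator`.

```python
def count_objects(s3_client, bucket):
    """Return (object_count, total_bytes) across every object in the bucket."""
    # list_objects_v2 returns at most 1000 keys per call, so use the paginator.
    paginator = s3_client.get_paginator("list_objects_v2")
    count = 0
    total_bytes = 0
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the page when nothing matched.
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   print(count_objects(boto3.client("s3"), "<bucket-name>"))
```

If uploads are working, the object count should grow as more batches are produced and the upload interval elapses.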

@vuldin vuldin added the enhancement and v2-branch labels Oct 24, 2022
@vuldin vuldin requested a review from r-vasquez October 24, 2022 17:36
@vuldin vuldin linked an issue Oct 24, 2022 that may be closed by this pull request
@vuldin vuldin force-pushed the enable-si branch 2 times, most recently from 43cceed to 4847065 Compare October 28, 2022 03:00
@vuldin vuldin marked this pull request as ready for review October 28, 2022 03:00
@vuldin
Member Author

vuldin commented Oct 28, 2022

Question: Should shadow indexing (an enterprise feature) be enabled by default in this project? Right now it is. We decided via Slack to disable it by default.

@vuldin vuldin force-pushed the enable-si branch 2 times, most recently from e1ed30f to 021416b Compare October 31, 2022 15:14
README.md (outdated, resolved)
aws/cluster.tf (resolved)
aws/vars.tf (resolved)
@rkruze
Contributor

rkruze commented Oct 31, 2022

Let's rename this to tiered storage instead of SI, since that is the term the docs use.

ansible/playbooks/start-redpanda.yml (outdated, resolved)
ansible/playbooks/start-redpanda.yml (outdated, resolved)
aws/vars.tf (outdated, resolved)
vuldin added a commit that referenced this pull request Nov 1, 2022
- deployment_id = "redpanda-${local.uuid}-${local.timestamp}"
  uuid = random_uuid.cluster.result
  timestamp = time_static.timestamp.unix
+ deployment_id = length(var.deployment_prefix) > 0 ? var.deployment_prefix : "redpanda-${substr(local.uuid, 0, 8)}-${local.timestamp}"
Member Author

Terraform complained about names being too long, so I shortened both the UUID and the timestamp.

Contributor

How safe is it to just grab the first 8 characters of the UUID? I haven't looked at how go-uuid works to see whether it's one of the time-based UUIDs.

Contributor

also, why do we use a timestamp here instead of just the larger uuid?
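
To put rough numbers on the truncation question, here is a back-of-the-envelope sketch. It assumes the UUID is a random (version 4) UUID, which this thread does not confirm; in that case the first 8 hex characters carry 32 random bits. The function names are hypothetical, and the estimate ignores the timestamp suffix, which makes a clash of the full deployment_id less likely still.

```python
import math

UUID_PREFIX_HEX_CHARS = 8  # matches substr(local.uuid, 0, 8) in the diff


def prefix_collision_probability(deployments, hex_chars=UUID_PREFIX_HEX_CHARS):
    """Birthday-bound approximation that at least two prefixes collide."""
    space = 16 ** hex_chars  # number of distinct prefixes
    pairs = deployments * (deployments - 1) / 2
    return 1.0 - math.exp(-pairs / space)


def deployment_id_length(uuid_chars, timestamp_digits=10):
    """Length of 'redpanda-<uuid part>-<unix timestamp>'.

    Assumes a 10-digit unix timestamp (seconds), which holds for current dates.
    """
    return len("redpanda-") + uuid_chars + len("-") + timestamp_digits


for n in (10, 1_000, 100_000):
    p = prefix_collision_probability(n)
    print(f"{n:>7} deployments: p(prefix collision) ~ {p:.2e}")

print("id length with full 36-char uuid:", deployment_id_length(36))
print("id length with 8-char prefix:", deployment_id_length(8))
```

Under these assumptions the shortened ID is 28 characters instead of 56, at the cost of a small but nonzero chance of prefix collisions across many deployments.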

vuldin added a commit that referenced this pull request Nov 1, 2022
@vuldin vuldin changed the title from "Enable shadow indexing in AWS" to "Enable tiered storage in AWS via IAM policy" Nov 1, 2022
@vuldin vuldin removed the v2-branch label Nov 1, 2022
vuldin added a commit that referenced this pull request Nov 1, 2022
vuldin added a commit that referenced this pull request Nov 2, 2022
@vuldin
Member Author

vuldin commented Nov 2, 2022

I'll squash the PR feedback commits once the PR is approved (let me know if you think this is unneeded).

tmgstevens added a commit that referenced this pull request Nov 9, 2022
Allow for configuration of arbitrary node and cluster configuration items.
N.B. Further work to be done on idempotence and integration of TLS. #74 and #86 will need some rework
Comment on lines +4 to +5
cloud_storage_enable_remote_read: true
cloud_storage_enable_remote_write: true
Member Author

This makes all topics tiered-storage-enabled by default when the tiered_storage_enabled terraform variable is set to true.

condition: "{{ tls | default(False) | bool }}"
- template: configs/tiered_storage.j2
condition: "{{ tiered_storage_bucket_name is defined | default(False) | bool }}"
Member Author

The tiered_storage_bucket_name ansible variable is pulled from hosts.ini, and is only defined if the tiered_storage_enabled terraform variable is true.

Comment on lines +28 to +29
"s3:*",
"s3-object-lambda:*",
Member Author

This needs to be narrowed to list only the permissions that are actually needed.

Member Author

Discussed this with the broader team. From that conversation:

Having a policy that tightly limits redpanda to its own bucket is clearly the right thing to do. It's less obvious to me that restricting the verbs is helpful: it creates a maintenance burden to keep those up to date as/when we change redpanda, and doesn't meaningfully change security if redpanda is the owner of the bucket.

So limiting this policy to the specific bucket (as shown in this file below) is the best approach, and avoids a maintenance burden as tiered storage capabilities are expanded.
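
For illustration, a bucket-scoped policy of the kind described above could be built like this. This is a sketch, not the policy file from this PR: the `bucket_scoped_policy` helper and the bucket name are hypothetical, and only the two actions are taken from the diff. The point it shows is that the actions stay broad while the Resource list is limited to the bucket and the objects inside it.

```python
import json


def bucket_scoped_policy(bucket_name):
    """Build an IAM policy document limited to a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Broad verbs, as in the diff under review.
                "Action": ["s3:*", "s3-object-lambda:*"],
                # Scope: the bucket itself plus every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# "redpanda-example-bucket" is a placeholder, not a name used by this project.
print(json.dumps(bucket_scoped_policy("redpanda-example-bucket"), indent=2))
```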

@@ -10,7 +10,7 @@ node:
     - address: {{ hostvars[inventory_hostname].advertised_ip }}
       port: {{ redpanda_kafka_port }}
   advertised_rpc_api:
-    - address: {{ hostvars[inventory_hostname].advertised_ip }}
+    address: {{ hostvars[inventory_hostname].advertised_ip }}
Member Author

This resulted in an error because advertised_rpc_api (unlike other listeners) isn't a list.
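
The shape difference can be shown with a small check over the rendered node config. This is a sketch under assumptions: `advertised_kafka_api` is used as the example of a list-valued listener (the diff above does not show that key name), the address and ports are placeholder values, and `check_listener_shapes` is a hypothetical helper rather than anything in this repo.

```python
def check_listener_shapes(node_cfg):
    """Return a list of problems found in the listener sections."""
    problems = []
    # Assumed example of a list-valued listener section.
    if not isinstance(node_cfg.get("advertised_kafka_api"), list):
        problems.append("advertised_kafka_api should be a list of listeners")
    # advertised_rpc_api is a single mapping, as noted in the comment above.
    if not isinstance(node_cfg.get("advertised_rpc_api"), dict):
        problems.append("advertised_rpc_api should be a single mapping, not a list")
    return problems


# Placeholder address and ports, for illustration only.
fixed = {
    "advertised_kafka_api": [{"address": "10.0.0.1", "port": 9092}],
    "advertised_rpc_api": {"address": "10.0.0.1", "port": 33145},
}
broken = {
    "advertised_kafka_api": [{"address": "10.0.0.1", "port": 9092}],
    "advertised_rpc_api": [{"address": "10.0.0.1", "port": 33145}],
}

print(check_listener_shapes(fixed))
print(check_listener_shapes(broken))
```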

cloud_storage_enable_remote_write: true
cloud_storage_region: {{ aws_region if aws_region is defined }}
cloud_storage_secret_key: THISVALUENOTUSED
cloud_storage_credentials_source: aws_instance_metadata
Contributor

Need to pass this in as a param from terraform; otherwise we're assuming AWS only.

@WesWWagner WesWWagner linked an issue Jan 10, 2023 that may be closed by this pull request
@WesWWagner WesWWagner self-requested a review January 10, 2023 17:04
@hcoyote hcoyote self-requested a review January 12, 2023 23:59
@hcoyote
Contributor

hcoyote commented Jan 13, 2023

Temporarily cherry-picked the Ansible 2.14+ node_exporter fixes into my local repo in order to get this to run. FWIW, it seems to run OK with the ansible 7.1.0 bottle in homebrew (or, at least, it's behaving the same way so far as other clusters I've run this week from a cluster provisioning standpoint).

Contributor

@hcoyote hcoyote left a comment

Added some questions for clarity. I don't see anything specific that should hold this back as is.

@@ -0,0 +1,10 @@
cluster:
cloud_storage_access_key: THISVALUENOTUSED
Contributor

can you help me understand why this is here if it's not used? Are we expecting users to override this with their own storage key in the TF config?

Member Author

No, this is just a placeholder. Unfortunately Redpanda requires this to be something even when we are using aws_instance_metadata (IAM permissions applied to the EC2 instance).

Comment on lines +13 to +19
resource "aws_s3_bucket_versioning" "tiered_storage" {
  count  = var.tiered_storage_enabled ? 1 : 0
  bucket = aws_s3_bucket.tiered_storage[count.index].id
  versioning_configuration {
    status = "Disabled"
  }
}
Contributor

Can we get a comment added here on why we disable versioning (and maybe what happens, good or bad, if the user overrides this)?

@tmgstevens
Contributor

Thanks for the review @hcoyote

@tmgstevens tmgstevens merged commit 231ee44 into main Jan 18, 2023
@gene-redpanda gene-redpanda deleted the enable-si branch March 9, 2023 16:33
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable SI on AWS s3 setup for archival storage
5 participants