-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable tiered storage in AWS via IAM policy #86
Conversation
43cceed
to
4847065
Compare
|
e1ed30f
to
021416b
Compare
Let's rename this to tiered storage instead of SI. As this is what the docs are focused around. |
deployment_id = "redpanda-${local.uuid}-${local.timestamp}" | ||
uuid = random_uuid.cluster.result | ||
timestamp = time_static.timestamp.unix | ||
deployment_id = length(var.deployment_prefix) > 0 ? var.deployment_prefix : "redpanda-${substr(local.uuid, 0, 8)}-${local.timestamp}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terraform complained about names being too long, so I shortened both the UUID and the timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how safe is it to just grab the first 8 characters off the uuid? I haven't look at how go-uuid works to see if it's one of the time-based uuids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why do we use a timestamp here instead of just the larger uuid?
I'll squash the PR feedback commits once the PR is approved (let me know if you think this is unneeded). |
cloud_storage_enable_remote_read: true | ||
cloud_storage_enable_remote_write: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes all topics tiered-storage-enabled by default if the tiered_storage_enabled
terraform variable is enabled.
condition: "{{ tls | default(False) | bool }}" | ||
- template: configs/tiered_storage.j2 | ||
condition: "{{ tiered_storage_bucket_name is defined | default(False) | bool }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tiered_storage_bucket_name
ansible variable is pulled from hosts.ini
, and is only defined if the tiered_storage_enabled
terraform variable is true
.
"s3:*", | ||
"s3-object-lambda:*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be more limited to list only those needed permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with the broader team. From that conversation:
Having an policy that tightly limits redpanda to its own bucket is clearly the right thing to do, it's less obvious to me that restricting the verbs is helpful: it creates maintenance burden to keep those up to date as/when we change redpanda, and doesn't meaningfully change security if redpanda is the owner of the bucket.
So limiting this policy to the specific bucket (as shown in this file below) is the best approach, and avoids a maintenance burden as tiered storage capabilities are expanded.
@@ -10,7 +10,7 @@ node: | |||
- address: {{ hostvars[inventory_hostname].advertised_ip }} | |||
port: {{ redpanda_kafka_port }} | |||
advertised_rpc_api: | |||
- address: {{ hostvars[inventory_hostname].advertised_ip }} | |||
address: {{ hostvars[inventory_hostname].advertised_ip }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resulted in an error due to the fact that advertised_rpc_api
(unlike other listeners) isn't a list.
cloud_storage_enable_remote_write: true | ||
cloud_storage_region: {{ aws_region if aws_region is defined }} | ||
cloud_storage_secret_key: THISVALUENOTUSED | ||
cloud_storage_credentials_source: aws_instance_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to pass this in as a param from terraform otherwise we're assuming AWS only
Temporarily cherry picked the ansible2.14+ node_exporter fixes into my local repo in order to get this to run. fwiw, it seems to run ok with the ansible 7.1.0 bottle in homebrew (or, at least, it's behaving the same way so far as other clusters I've run this week from a cluster provisioning standpoint). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some questions for clarity, I don't see anything specific that should hold this back as is.
@@ -0,0 +1,10 @@ | |||
cluster: | |||
cloud_storage_access_key: THISVALUENOTUSED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you help me understand why this is here if it's not used? Are we expecting users to override this with their own storage key in the TF config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is just a placeholder. Unfortunately Redpanda requires this to be something even when we are using aws_instance_metadata
(IAM permissions applied to the EC2 instance).
deployment_id = "redpanda-${local.uuid}-${local.timestamp}" | ||
uuid = random_uuid.cluster.result | ||
timestamp = time_static.timestamp.unix | ||
deployment_id = length(var.deployment_prefix) > 0 ? var.deployment_prefix : "redpanda-${substr(local.uuid, 0, 8)}-${local.timestamp}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why do we use a timestamp here instead of just the larger uuid?
resource "aws_s3_bucket_versioning" "tiered_storage" { | ||
count = var.tiered_storage_enabled ? 1 : 0 | ||
bucket = aws_s3_bucket.tiered_storage[count.index].id | ||
versioning_configuration { | ||
status = "Disabled" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a comment added on here on why we disable versioning (and maybe what happens, good or bad, if the user overrides this)?
Thanks for the review @hcoyote |
This PR adds tiered storage feature in AWS deployments. Once merged, new PRs will be created for adding the same feature to other cloud providers. Tiered storage is disabled by default (
tiered_storage_enabled
invars.tf
). When tiered storage is enabled, then by default it is enabled across all topics in this PR.Steps to run
Now connect via ssh to a broker, create a topic, produce some data, and check S3 bucket for remote segments. If you need to retrieve the IP address of one of the nodes, run
terraform output
.Run the last command 20+ times. View contents of the bucket in AWS console to see remote segments.