DM-32645: Add alert DB backend resources #373
Conversation
These are copied from the Cloud SQL workflows.
It provides access to the service account ID more cleanly
it's illegal and the Terraform police will get you
You probably want someone wiser in the ways of Terraform than I am. This seems fine but my "what am I looking at?" factor is high.
type = "Delete"
},
condition = {
  age = var.maximum_alert_age
Age has one-day granularity. Should this be >= ? Does this necessarily get run each day?
Yes, this is correct as-is.
The way this works is through Object Lifecycle Management. It's a setting you can apply to a bucket which instructs Google to take care of object cleanup on your behalf. The "age" field acts like you would hope:
For example, if an object's creation time is 2019/01/10 10:00 UTC and the Age condition is 10 days, then the condition is satisfied for the object on and after 2019/01/20 10:00 UTC.
(emphasis added)
A second reason that `>=` isn't permitted is that, despite the appearance, this `condition` value is a Terraform map type - just a string -> anything mapping, equivalent to the Python `{"age": maximum_alert_age}`. Using `>=` would be a syntax error here.
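To make the map-type point concrete, here is a sketch of how such a rule is typically expressed when lifecycle rules are passed as plain maps (the `lifecycle_rules` variable name and list shape are illustrative, not necessarily what this PR uses):

```hcl
# Illustrative sketch: lifecycle rules expressed as Terraform maps.
# Each condition is a string -> value mapping, so "age" can only hold
# a value, never a comparison expression like ">= 30".
lifecycle_rules = [
  {
    action = {
      type = "Delete"
    }
    condition = {
      # Days since object creation; GCS deletes the object once this
      # age is met or exceeded, so ">=" semantics are already built in.
      age = var.maximum_alert_age
    }
  }
]
```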
I'm going to go ahead and merge this after re-reading it quite carefully a few times, since it is blocking my other work.
This change does three things:
The alert database (design doc: https://dmtn-183lsst.io/) uses buckets for backend storage of alert packets and schema data. The first, simplest thing this PR does is add those buckets. It also enables a lifecycle rule for non-production environments so we don't store petabytes of simulation and test data.
There are two applications that interact with those buckets: alert_database_ingester, which reads from Kafka and writes to the buckets, and alert_database_server, which takes HTTP requests and serves up the contents from those buckets.
Those applications will run in Kubernetes on the RSP IDF-INT environment during integration of the alert stream. Therefore, we need to thread credentials all the way through from Google Cloud Storage, through Kubernetes Service Accounts, down to the pods running those applications.
So the second, complex thing this PR does is add the right Google Cloud Service Accounts and role bindings to make it possible for pods in Kubernetes to access these Google Cloud resources. I followed the guidance in the "Workload Identity" docs and the examples set by Gafaelfawr to write this stuff.
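The Workload Identity wiring described here can be sketched roughly as below. All resource and account names are hypothetical stand-ins, not the ones in this PR; the shape follows the standard Google provider resources for binding a Kubernetes Service Account to a Google service account:

```hcl
# Hypothetical sketch of Workload Identity wiring; names are illustrative.

# Google service account the pods will act as.
resource "google_service_account" "alert_database" {
  account_id   = "alert-database"
  display_name = "Alert database ingester/server"
}

# Let the Kubernetes Service Account (namespace/name in brackets)
# impersonate the Google service account via Workload Identity.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.alert_database.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[alert-database/alert-database]"
}

# Grant the Google service account read/write access to a bucket.
resource "google_storage_bucket_iam_member" "packets_access" {
  bucket = google_storage_bucket.alert_packets.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.alert_database.email}"
}
```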
Finally, I structured this as a separate submodule within the science-platform deployment. Maybe this is right, maybe it's wrong; I'm fairly ambivalent - it could all go in the science-platform `main.tf`. But I split it out, which then necessitates making more tfvars and GitHub workflow files, so that's the third big thing here. I wrote the tfvars myself, but I copied the workflows from the -cloudsql-tf.yaml versions.