DM-32645: Add alert DB backend resources #373

Merged
merged 8 commits into from Dec 16, 2021

spenczar (Contributor) commented Dec 15, 2021

This change does three things:

  • Adds two GCS buckets for storing alert packets and schemas in an archive
  • Adds GCP IAM machinery to allow an application in Kubernetes to access the buckets
  • Adds GitHub and Terraform boilerplate to deploy this stuff

The alert database (design doc: https://dmtn-183.lsst.io/) uses buckets for backend storage of alert packets and schema data. The first, simplest thing this PR does is add those buckets. It also enables a lifecycle rule for non-production environments so we don't store petabytes of simulation and test data.
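For readers less familiar with the pattern, a minimal sketch of a bucket that gets a cleanup rule only outside production might look like the following. This is an illustration, not the code in this PR: the `environment` variable, the bucket name, the location, and the 30-day default are all assumptions, and it uses the plain `google_storage_bucket` resource rather than whatever module the PR relies on.

```hcl
# Hypothetical variables; the PR's real tfvars may use different names.
variable "environment" {
  type        = string
  description = "Deployment environment, e.g. \"dev\", \"int\", or \"production\"."
}

variable "maximum_alert_age" {
  type        = number
  description = "Days to keep alert packets in non-production buckets."
  default     = 30 # assumed value, for illustration only
}

resource "google_storage_bucket" "alert_packets" {
  name     = "example-alert-packets-${var.environment}" # illustrative name
  location = "US"                                       # assumed location

  # Emit the lifecycle rule only when this is not the production bucket.
  dynamic "lifecycle_rule" {
    for_each = var.environment == "production" ? [] : [1]
    content {
      action {
        type = "Delete"
      }
      condition {
        age = var.maximum_alert_age
      }
    }
  }
}
```

The `dynamic` block is one way to make the rule conditional; keeping separate tfvars per environment and passing an empty rule list for production would work as well.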

There are two applications that interact with those buckets: alert_database_ingester, which reads from Kafka and writes to the buckets, and alert_database_server, which takes HTTP requests and serves up the contents from those buckets.

Those applications will run in Kubernetes on the RSP IDF-INT environment during integration of the alert stream. Therefore, we need to thread credentials all the way through from Google Cloud Storage, through Kubernetes Service Accounts, down to the pods running those applications.

So the second, complex thing this PR does is add the right Google Cloud Service Accounts and role bindings to make it possible for pods in Kubernetes to access these Google Cloud resources. I followed the guidance in the "Workload Identity" docs and the examples set by Gafaelfawr to write this stuff.
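As a rough sketch of the Workload Identity wiring described above, assuming hypothetical variable and account names (none of them are taken from this PR), the chain from bucket to pod could be expressed like this:

```hcl
# All names below are illustrative assumptions, not the PR's actual values.
variable "project_id" {
  type = string
}

variable "bucket_name" {
  type        = string
  description = "Name of the alert packet bucket to grant access to."
}

variable "namespace" {
  type        = string
  description = "Kubernetes namespace the application runs in."
}

variable "ksa_name" {
  type        = string
  description = "Kubernetes Service Account used by the application's pods."
}

# Google Cloud Service Account that the pods will act as.
resource "google_service_account" "alert_database" {
  project      = var.project_id
  account_id   = "alert-database-example"
  display_name = "Alert database bucket access (example)"
}

# Let that Google Service Account read and write objects in the bucket.
resource "google_storage_bucket_iam_member" "bucket_access" {
  bucket = var.bucket_name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.alert_database.email}"
}

# Let the Kubernetes Service Account impersonate the Google Service Account.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.alert_database.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/${var.ksa_name}]"
}
```

On the Kubernetes side, the Service Account also needs the `iam.gke.io/gcp-service-account` annotation set to the Google Service Account's email for the binding to take effect. The `roles/storage.objectAdmin` role is a guess at what a writer would need; a read-only server could use a narrower role.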

Finally, I structured this as a separate submodule within the science-platform deployment. Maybe this is right, maybe it's wrong; I'm pretty ambivalent, since it could all go in the science-platform main.tf. But I split it out, which then necessitates making more tfvars and GitHub workflow files, so that's the third big thing here. I wrote the tfvars myself but I copied the workflows from the -cloudsql-tf.yaml versions.

athornton (Contributor) left a comment

You probably want someone wiser in the ways of Terraform than I am. This seems fine but my "what am I looking at?" factor is high.

type = "Delete"
},
condition = {
age = var.maximum_alert_age
Contributor

Age has one-day granularity. Should this be >= ? Does this necessarily get run each day?

Contributor Author

Yes, this is correct as-is.

The way this works is through Object Lifecycle Management. It's a setting you can apply to a bucket which instructs Google to take care of object cleanup on your behalf. The "age" field acts like you would hope:

For example, if an object's creation time is 2019/01/10 10:00 UTC and the Age condition is 10 days, then the condition is satisfied for the object on and after 2019/01/20 10:00 UTC.

(emphasis added)

A second reason that >= isn't permitted is that, despite the appearance, this condition value is a Terraform map type - just a string -> anything mapping, equivalent to the Python {"age": maximum_alert_age}. Using >= would be a syntax error here.
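To make the distinction concrete, here is a small sketch of where a comparison operator can and cannot appear in Terraform. The variable declaration and its validation rule are hypothetical, written only for illustration; the PR's actual declaration of `maximum_alert_age` may look different.

```hcl
# Hypothetical declaration, not copied from this PR.
variable "maximum_alert_age" {
  type        = number
  description = "Age in whole days after which GCS deletes an object."

  validation {
    # Here ">=" is part of an ordinary expression, so it is valid.
    condition     = var.maximum_alert_age >= 1
    error_message = "The maximum_alert_age value must be at least 1 day."
  }
}

locals {
  # Inside a map, the left side of "=" is only a key name.
  # Writing "age >= var.maximum_alert_age" here would not parse.
  lifecycle_condition = {
    age = var.maximum_alert_age
  }
}
```

Terraform only hands Google a number of days; the "on and after" comparison is done by Object Lifecycle Management itself.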

@spenczar
Contributor Author

I'm going to go ahead and merge this after re-reading it quite carefully a few times, since it is blocking my other work.

@spenczar spenczar merged commit 831f428 into main Dec 16, 2021
@spenczar spenczar deleted the tickets/DM-32645 branch December 16, 2021 19:45