DM-32645: Add alert DB backend resources #373
Conversation
These are copied from the Cloud SQL workflows.
It provides access to the service account ID more cleanly
it's illegal and the Terraform police will get you
You probably want someone wiser in the ways of Terraform than I am. This seems fine but my "what am I looking at?" factor is high.
type = "Delete"
},
condition = {
  age = var.maximum_alert_age
Age has one-day granularity. Should this be >= ? Does this necessarily get run each day?
Yes, this is correct as-is.
The way this works is through Object Lifecycle Management. It's a setting you can apply to a bucket which instructs Google to take care of object cleanup on your behalf. The "age" field acts like you would hope:
For example, if an object's creation time is 2019/01/10 10:00 UTC and the Age condition is 10 days, then the condition is satisfied for the object on and after 2019/01/20 10:00 UTC.
(emphasis added)
A second reason that `>=` isn't permitted is that, despite the appearance, this `condition` value is a Terraform map type - just a string -> anything mapping, equivalent to the Python `{"age": maximum_alert_age}`. Using `>=` would be a syntax error here.
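To make the map-type point concrete, here is a sketch of how such a rule is typically expressed when lifecycle rules are passed as plain maps (the `lifecycle_rules` variable name and list shape are illustrative, not necessarily what this PR uses):

```hcl
# Illustrative sketch: lifecycle rules expressed as Terraform maps.
# Each condition is a string -> value mapping, so "age" can only hold
# a value, never a comparison expression like ">= 30".
lifecycle_rules = [
  {
    action = {
      type = "Delete"
    }
    condition = {
      # Days since object creation; GCS deletes the object once this
      # age is met or exceeded, so ">=" semantics are already built in.
      age = var.maximum_alert_age
    }
  }
]
```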
I'm going to go ahead and merge this after re-reading it quite carefully a few times, since it is blocking my other work.
This change does three things:
The alert database (design doc: https://dmtn-183lsst.io/) uses buckets for backend storage of alert packets and schema data. The first, simplest thing this PR does is add those buckets. It also enables a lifecycle rule for non-production environments so we don't store petabytes of simulation and test data.
There are two applications that interact with those buckets: alert_database_ingester, which reads from Kafka and writes to the buckets, and alert_database_server, which takes HTTP requests and serves up the contents from those buckets.
Those applications will run in Kubernetes on the RSP IDF-INT environment during integration of the alert stream. Therefore, we need to thread credentials all the way through from Google Cloud Storage, through Kubernetes Service Accounts, down to the pods running those applications.
So the second, complex thing this PR does is add the right Google Cloud Service Accounts and role bindings to make it possible for pods in Kubernetes to access these Google Cloud resources. I followed the guidance in the "Workload Identity" docs and the examples set by Gafaelfawr to write this stuff.
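The Workload Identity wiring described here can be sketched roughly as below. All resource and account names are hypothetical stand-ins, not the ones in this PR; the shape follows the standard Google provider resources for binding a Kubernetes Service Account to a Google service account:

```hcl
# Hypothetical sketch of Workload Identity wiring; names are illustrative.

# Google service account the pods will act as.
resource "google_service_account" "alert_database" {
  account_id   = "alert-database"
  display_name = "Alert database ingester/server"
}

# Let the Kubernetes Service Account (namespace/name in brackets)
# impersonate the Google service account via Workload Identity.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.alert_database.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[alert-database/alert-database]"
}

# Grant the Google service account read/write access to a bucket.
resource "google_storage_bucket_iam_member" "packets_access" {
  bucket = google_storage_bucket.alert_packets.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.alert_database.email}"
}
```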
Finally, I structured this as a separate submodule within the science-platform deployment. Maybe this is right, maybe it's wrong; I'm fairly ambivalent - it could all go in the science-platform `main.tf`. But I split it out, which then necessitates making more tfvars and GitHub workflow files, so that's the third big thing here. I wrote the tfvars myself, but I copied the workflows from the -cloudsql-tf.yaml versions.