Support workload identity in flush manager service #1630
Merged
We use workload identity in newer GKE deployments. In order to actually flush queued messages to pubsub, flush jobs spawned by flush manager need to be annotated with the appropriate k8s service account information (mapped to a GCP SA by workload identity).
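For context, the workload identity mapping itself lives on the k8s service account (handled on the ops side in cloudops-infra rather than in this service); a rough sketch of what that binding looks like using the kubernetes Python client, where the SA names, namespace, and project are placeholders rather than the real ops values:

```python
from kubernetes import client, config

# Assumes the code runs in-cluster; use config.load_kube_config() locally.
config.load_incluster_config()

core = client.CoreV1Api()

# Annotate the k8s SA so GKE workload identity maps it to a GCP SA.
# The annotation key is the standard GKE workload identity key; the SA,
# namespace, and project names below are placeholders.
core.patch_namespaced_service_account(
    name="flush-manager",
    namespace="ingestion",
    body={
        "metadata": {
            "annotations": {
                "iam.gke.io/gcp-service-account": (
                    "flush-manager@my-project.iam.gserviceaccount.com"
                )
            }
        }
    },
)
```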
In the old stack that flush manager was developed on, we used `GOOGLE_APPLICATION_CREDENTIALS` via an env var and a mounted secret volume to pass credentials to containers. In point of fact, I'm not sure the flush manager config ever worked in the old stage either, since the current code passes neither GCP SA nor k8s SA information. I vaguely recall discussing this with :relud, so it might have been known and I simply didn't catch that flushes always failed until doing some more exhaustive testing (flushes of empty disks do succeed, however, which is the common case).

See `service_account_name`. When this is left unspecified, k8s falls back to the `default` service account, so this shouldn't be a change in behavior. In ops logic we generally prefer not to use the `default` k8s SA in annotations and instead annotate explicit service accounts within namespaces.
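For illustration, a rough sketch of how a spawned flush job could carry the k8s SA through `service_account_name` using the kubernetes Python client; the function name, image, entrypoint, and volume details below are placeholders, not the actual flush manager code:

```python
from kubernetes import client


def build_flush_job(pv_name: str, service_account_name: str = "") -> client.V1Job:
    """Build a flush Job spec; an empty service_account_name leaves the field
    unset so k8s falls back to the namespace's `default` SA."""
    pod_spec = client.V1PodSpec(
        restart_policy="Never",
        # When set, this is the k8s SA that workload identity maps to a GCP SA.
        service_account_name=service_account_name or None,
        containers=[
            client.V1Container(
                name="flush",
                image="mozilla/ingestion-edge:latest",  # placeholder image
                command=["python", "-m", "ingestion_edge.flush"],  # placeholder entrypoint
                volume_mounts=[client.V1VolumeMount(name="queue", mount_path="/data")],
            )
        ],
        volumes=[
            client.V1Volume(
                name="queue",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name=pv_name  # hypothetical PVC bound to the detached PV
                ),
            )
        ],
    )
    return client.V1Job(
        metadata=client.V1ObjectMeta(name=f"flush-{pv_name}"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(spec=pod_spec),
            backoff_limit=2,
        ),
    )
```

With `service_account_name=""` the field stays unset and the pod runs as the namespace `default` SA, matching the pre-existing behavior described above.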
Tested with https://github.com/mozilla-services/cloudops-infra/pull/2695/commits/b92de758dbb751a77dda8d25146b93ddf56a7d4e in stage.
Separately, while testing this I was able to induce data loss by simply deleting a pod associated with a flush job while the job was in an induced error state (flush manager would delete the `pv` even though the flush job did not succeed, or at least should not have returned a successful status). This is mildly concerning, but I may be misunderstanding the expectation here (`kubectl delete pod` is not going to happen on production stacks in practice). I will double check on this with :relud next week.