Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add default_service_account resource #876

Merged
merged 14 commits into from
Aug 28, 2020

Conversation

bendrucker
Copy link
Contributor

Description

This PR adds a kubernetes_default_service_account resource. This follows the lead of aws_default_vpc and other aws_default_* resources. It doesn't create any infrastructure, instead setting minimal state (ID + default secret) and then calling out to Update so that any new user-specified config is persisted.

It shares most of its functionality/code with the main kubernetes_service_account resource. It overrides:

  • The metadata.name schema, forcing the value to be "default"
  • The Create method which doesn't actually create any resources, and just finds the default service account token secret and then calls Update

I factored out getServiceAccountDefaultSecret so that both resources' Create methods could use it. I also updated that to use a field selector to filter service account tokens on the server side. That should help performance in namespaces with large numbers of secrets.

Acceptance tests

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch? (If so, please include the test log in a gist)

https://gist.github.com/bendrucker/03b3483eefbd1094aa262c6fa0309b56

References

Closes #302

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

@joemiller
Copy link

Would love to see this merged. Any feedback or changes from the hashicorp folks on next steps?

@alexsomesan
Copy link
Member

I tested this manually using the following configuration and got a failure on apply:

resource "kubernetes_namespace" "test" {
	metadata {
		name = "test-876"
	}
}

resource "kubernetes_default_service_account" "test" {
  metadata {
	namespace = kubernetes_namespace.test.metadata.0.name
  }

  secret {
    name = kubernetes_secret.one.metadata.0.name
  }

}

resource "kubernetes_secret" "one" {
  metadata {
    name      = "one"
    namespace = kubernetes_namespace.test.metadata.0.name
  }
}

The output looks like this:

kubernetes_namespace.test: Creating...
kubernetes_namespace.test: Creation complete after 0s [id=test-876]
kubernetes_secret.one: Creating...
kubernetes_secret.one: Creation complete after 0s [id=test-876/one]
kubernetes_default_service_account.test: Creating...
kubernetes_default_service_account.test: Still creating... [10s elapsed]
kubernetes_default_service_account.test: Still creating... [20s elapsed]
kubernetes_default_service_account.test: Still creating... [30s elapsed]

Error: Waiting for default secret of "test-876/default" to appear

  on test.tf line 7, in resource "kubernetes_default_service_account" "test":
   7: resource "kubernetes_default_service_account" "test" {

The secret is created correctly. The service account however is missing the secret association. I'm guessing this is due to a race condition during resourceKubernetesDefaultServiceAccountCreate where the update at the end of it gets called too soon and the default service account is not yet there. I'd suggest trying to add retries around getServiceAccountDefaultSecret to make sure it retrieves a valid default service account.

TIP: running Terraform with TRACE debug level will show you the API calls being made and their contents. You can follow the order of operations there.

TF_LOG=TRACE terraform apply

@bendrucker
Copy link
Contributor Author

bendrucker commented Aug 13, 2020

Looking into it, will have an update by next week at the latest

@bendrucker
Copy link
Contributor Author

Thanks for testing @alexsomesan, definitely an oversight on my part. Should be fixed now!

  • I broke the acceptance tests into "basic" (labels/annotations) and "secrets" (secrets/image pull secrets)
  • I'm now trying trying to get the ServiceAccount before checking for its secret, retrying if a "not found" error is encountered

I was able to reproduce the race with the ServiceAccount controller via the acceptance tests, but found it was a lot more intermittent. Running in a loop until failure eventually triggered the error:

while true; do
  TF_ACC=1 go test "./kubernetes" -v -run 'TestAccKubernetesDefaultServiceAccount' -timeout 120m -count 1 || break
done
=== RUN   TestAccKubernetesDefaultServiceAccount_basic
    testing.go:654: Step 0 error: errors during apply:

        Error: serviceaccounts "default" not found

          on /var/folders/7p/z_kw6fc9693fly126tljpml40000gn/T/tf-test719696721/main.tf line 8:
          (source code not available)


--- FAIL: TestAccKubernetesDefaultServiceAccount_basic (6.70s)

I broke up the test so that the ServiceAccount can be created immediately after the namespace. If it also has to wait on several secrets, the race is presumably much less likely.

@bendrucker
Copy link
Contributor Author

I've also added a similar wait to the TestAccKubernetesDefaultServiceAccount_importBasic test. The Importer itself doesn't need to handle a race, since importing implies an already existing namespace and SA. But in an acceptance test, I was able to reproduce a race. So now in addition to applying the configuration, the first import test step also performs a check that waits for the default ServiceAccount's existence.

@bendrucker
Copy link
Contributor Author

The Travis error is coming from make website-test which I can't imagine is related. Passes locally as well.

/home/travis/gopath/src/github.com/hashicorp/terraform-website/content/scripts/check-links.sh "http://127.0.0.1:4567/docs/providers/kubernetes/"
526Crawling site...
527http://127.0.0.1:4567/docs/providers/kubernetes/:
528Remote file does not exist -- broken link!!!
529Found 1 broken link.
530
531http://127.0.0.1:4567/docs/providers/kubernetes/

@dak1n1 dak1n1 self-assigned this Aug 18, 2020
Copy link
Contributor

@dak1n1 dak1n1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for this very thorough PR! I'm on-duty this week and taking over for Alex. I left one minor style comment in the test. Everything else looks great. I did run into an issue with testing though, running the same test as Alex mentioned.

The acceptance tests are passing consistently for me, both in our CI environment (so there are no regressions) and locally testing the new acceptance tests in a loop like you showed us. Those are passing consistently.

However, there seems to be an issue with the new acceptance tests not reporting errors. The tests need to look for the specific data that the resource is injecting into the default service account (for example, checking for the existence of any configured secrets being added to the default service account).

The manual test that Alex mentioned is failing for me, so that's a good place to start testing. Basically I just copy/pasted it into main.tf in a new empty directory and ran TF_LOG=trace terraform apply. (I also copied the provider binary into that directory after compiling it from your branch). Here is all my terminal output from testing, which includes setup, and debug output from the provider. It shows that the default service account never gets updated with the new secrets.

https://gist.githubusercontent.com/dak1n1/c6c7ed628c3d56080431180043e1a133/raw/a44c47f92a58794a3da3701e806b9413db51dcba/gistfile1.txt

We just need to make sure that the configs we're setting actually get applied to the default service account.

@bendrucker
Copy link
Contributor Author

Awesome, thank you! Seems like whatever is causing Error: Waiting for default secret of "test-876/default" to appear is preventing Update from then running, so the additional secrets are never set. I fixed the more obvious race condition (default SA doesn't exist yet for a new namespace) but it seems like there's an addition bug with associating the default secret. Will look into it—that hasn't been at all reproducible in my acceptance test runs.

TestAccKubernetesDefaultServiceAccount_secrets does have coverage against the secrets and image being present, so as long as the apply succeeds I think everything should work.

@bendrucker
Copy link
Contributor Author

Ok, can now confirm the failing example above is now working!

kubernetes_namespace.test: Creating...
kubernetes_namespace.test: Creation complete after 0s [id=test-876]
kubernetes_secret.one: Creating...
kubernetes_secret.one: Creation complete after 0s [id=test-876/one]
kubernetes_default_service_account.test: Creating...
kubernetes_default_service_account.test: Creation complete after 0s [id=test-876/default]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Leaving notes below for each change.

Wait Error

Error: Waiting for default secret of "test-876/default" to appear

Turns out this was not related to acceptance tests versus a full end-to-end TF run, but rather a symptom of having exactly 1 secret specified in configuration. The wait logic checks whether the number of actual secrets matches an input:

len(resp.Secrets) == len(config.Secrets)

When config is first used to to create a ServiceAccount, this is sound logic. A default ServiceAccount was created by a Kubernetes controller, so we instead need to compare against an empty SA object rather than build up an object from the configuration. Otherwise, the above expression is liable to be permanently true (and cause the resource to time out) when the response contains the default SA token secret and the configuration specifies exactly one secret.

I added some logging to help confirm the problem initially and to aid in future debugging.

.metadata.0.name vs. .id

Converted the latter to the former.

I confirmed that these attributes behave equivalently when it comes to the ordering of a Terraform apply graph. The fact that the former is known and the latter is not only affects the plan output, where referencing id will cause the target attribute to be (known after apply). As far as the apply graph is concerned, dependencies are drawn between resources, irrespective of their attributes:

2020/08/27 14:40:57 [TRACE] Executing graph transform *terraform.ReferenceTransformer
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "provider.kubernetes" references: []
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "kubernetes_namespace.test (prepare state)" references: []
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "kubernetes_default_service_account.test (prepare state)" references: []
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "kubernetes_secret.one (prepare state)" references: []
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "kubernetes_namespace.test" references: []
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "kubernetes_secret.one" references: [kubernetes_namespace.test (prepare state) kubernetes_namespace.test kubernetes_namespace.test]
2020/08/27 14:40:57 [DEBUG] ReferenceTransformer: "kubernetes_default_service_account.test" references: [kubernetes_namespace.test (prepare state) kubernetes_namespace.test kubernetes_namespace.test kubernetes_secret.one (prepare state) kubernetes_secret.one kubernetes_secret.one]
2020/08/27 14:40:57 [TRACE] Completed graph transform *terraform.ReferenceTransformer with new graph:
  kubernetes_default_service_account.test - *terraform.NodeApplyableResourceInstance
    kubernetes_default_service_account.test (prepare state) - *terraform.NodeApplyableResource
    kubernetes_namespace.test - *terraform.NodeApplyableResourceInstance
    kubernetes_namespace.test (prepare state) - *terraform.NodeApplyableResource
    kubernetes_secret.one - *terraform.NodeApplyableResourceInstance
    kubernetes_secret.one (prepare state) - *terraform.NodeApplyableResource
    provider.kubernetes - *terraform.NodeApplyableProvider
  kubernetes_default_service_account.test (prepare state) - *terraform.NodeApplyableResource
    provider.kubernetes - *terraform.NodeApplyableProvider
  kubernetes_namespace.test - *terraform.NodeApplyableResourceInstance
    kubernetes_namespace.test (prepare state) - *terraform.NodeApplyableResource
    provider.kubernetes - *terraform.NodeApplyableProvider
  kubernetes_namespace.test (prepare state) - *terraform.NodeApplyableResource
    provider.kubernetes - *terraform.NodeApplyableProvider
  kubernetes_secret.one - *terraform.NodeApplyableResourceInstance
    kubernetes_namespace.test - *terraform.NodeApplyableResourceInstance
    kubernetes_namespace.test (prepare state) - *terraform.NodeApplyableResource
    kubernetes_secret.one (prepare state) - *terraform.NodeApplyableResource
    provider.kubernetes - *terraform.NodeApplyableProvider
  kubernetes_secret.one (prepare state) - *terraform.NodeApplyableResource
    provider.kubernetes - *terraform.NodeApplyableProvider
  provider.kubernetes - *terraform.NodeApplyableProvider
  ------
2020/08/27 14:40:57 [TRACE] Executing graph transform *terraform.AttachDependenciesTransformer
2020/08/27 14:40:57 [TRACE] AttachDependenciesTransformer: kubernetes_namespace.test depends on []
2020/08/27 14:40:57 [TRACE] AttachDependenciesTransformer: kubernetes_secret.one depends on [kubernetes_namespace.test]
2020/08/27 14:40:57 [TRACE] AttachDependenciesTransformer: kubernetes_default_service_account.test depends on [kubernetes_namespace.test kubernetes_secret.one]

Formatting

The review suggestions were missing a few deprecated interpolations ("${foo}") so I just let terrafmt handle it:

terrafmt upgrade012 ./kubernetes/resource_kubernetes_default_service_account_test.go

@dak1n1 dak1n1 merged commit c3fcb85 into hashicorp:master Aug 28, 2020
@dak1n1
Copy link
Contributor

dak1n1 commented Aug 28, 2020

Looks good! I ran the acceptance tests and did some local testing. It all works as planned. Thanks for this contribution!

@bendrucker bendrucker deleted the default-service-account branch August 28, 2020 20:48
@ghost
Copy link

ghost commented Oct 10, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Oct 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow to update default service account
4 participants