Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ACL secret_id #17

Merged
merged 3 commits into from
Dec 9, 2017
Merged

Add support for ACL secret_id #17

merged 3 commits into from
Dec 9, 2017

Conversation

koiuo
Copy link
Contributor

@koiuo koiuo commented Dec 1, 2017

acctest was failing even before the changes, first commit fixes that (in the easiest way I could figure out)

nomad api had to updated to 0.7.0 for this.
It's basically govendor fetch ...@=v0.7.0

@apparentlymart
Copy link
Member

Thanks for working on this, @edio!

We can't look at this immediately due to other work going on but we should be working on this provider more in the near future so we'll plan to review/test this as part of other work. Sorry for the delay...

@ashald
Copy link

ashald commented Dec 1, 2017

@apparentlymart we would highly appreciate if you could reconsider that and take a look at this pull request. At this moment Nomad provider for Terraform is literally unusable with upstream version of Nomad once you enable ACL (which is an obvious thing to do in any production-like environment). Also, given that the provider does validate the job definitions we essentially cannot use any of the job features added since version 0.6. Regardless of huge improvements that are planned for the provider it's a disaster to not be abel to use official version of the provider with the official version of Nomad.

Copy link
Contributor

@paddycarver paddycarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some code-level comments on this, but I have more of a design question here. And please forgive me, I'm relatively inexperienced with Nomad ACLs, so I beg your patience if I get some of the details wrong here.

From what I can surmise from the documentation, the issue we ran afoul of is that from Nomad 0.7 onward, we got a new NOMAD_TOKEN env var that must be set, otherwise it's default deny for all operations. Our provider didn't have the ability to set that, so the provider couldn't do anything.

If that's an accurate summary, that is indeed a problem, and something we should resolve. And adding a new field to the provider is a great way to go about that.

My concern largely comes in with the changes in provider_test.go. I don't really understand why we're obtaining a secret in the init function. From what I understand, the initial bootstrap function to obtain the secret can only be performed once, until it's reset. Which means if we're on a cluster that is reused across multiple runs, or is used for multiple runs in parallel, it'll refuse to give out a secret after the first. But if I understand correctly, the secret ID is essentially a credential, something the user is providing to authenticate themselves. Traditionally, we've had credentials passed into our providers' tests using environment variables, instead of having the providers bootstrap them. That allows each environment to obtain the credentials in whatever way is appropriate for it, and provide them to the tests. It also means we could probably reduce this PR in size (we don't need the code for obtaining the bootstrap credential anymore, we probably don't even need the vendor updates) which makes it a lot easier and less risky to merge and get a quick release out.

I'd love to hear your thoughts on all this, and appreciate your patience and your willingness to contribute to Terraform.

GNUmakefile Outdated
@@ -12,6 +12,8 @@ test: fmtcheck
xargs -t -n4 go test $(TESTARGS) -timeout=30s -parallel=4

testacc: fmtcheck
@echo "Make sure nomad is running"
@echo " nomad agent -dev -acl-enabled"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a rule of thumb, we tend to have the environment setup required for our acceptance tests happen outside the testacc target. This is so our CI and contributors have the option to reuse stable environments, and to avoid hardcoding the environment in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood you correctly, you're suggesting, that testacc should depend on, say, run_nomad, which will start it for the user, correct?

If that's the case, I totally agree with you, but I just kept original approach.

Please confirm if I udnerstood you correctly, and I'll fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically what I'm suggesting is that testacc not try to start Nomad at all. Instead, the user running testacc should start Nomad themselves, then run testacc. A Makefile target can be added to run Nomad, but I think "running the tests" and "setting up the environment the tests run against" should be two separate commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so it's the opposite of what I understood :)

Well, this does not setup the environment, but just merely a reminder for the user. But now that you pointed this out, I think, that a right place for this is README.md.

I'm going to address all this today.

CHANGELOG.md Outdated

* provider: Update Nomad to v0.7.0
* provider: Allow setting ACL in hcl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it doesn't really matter, but we tend to update the CHANGELOG as a separate step, outside the PR process. Otherwise, merge conflicts quickly get out of control.


func init() {
testProvider = Provider().(*schema.Provider)
testProviders = map[string]terraform.ResourceProvider{
"nomad": testProvider,
}

if v := os.Getenv("TF_ACC"); v == "1" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: resource.TestEnvVar can be used to avoid hardcoding this.

}

func setupAcl() {
client, _ := api.NewClient(&api.Config{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're ignoring an error here. I'd be a lot more comfortable if we checked and panicked on it.

@@ -208,6 +223,7 @@ func testResourceJob_checkExists(s *terraform.State) error {

func testResourceJob_checkDestroy(jobID string) r.TestCheckFunc {
return func(*terraform.State) error {
time.Sleep(time.Second) // let things happen in background
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What things? What are we waiting for here? Why is this necessary?

Sometimes Sleeps are necessary, and it's always unfortunate when that happens, but if we document why, at least we can remove them in the future when they stop being necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing on the recent commit of the master branch of this repo for me.

I'm not sure, whether this test was written in such way itnentionally. I might have misunderstood the whole idea of this test.

But, anyway, my understanding is:

  • in test we create a job with command /bin/sleep 1
  • right after we create the job we check that the job is "dead"
  • apparently, by the time we check job state, it is still not "dead", so test fails

So I introduced sleep before we check.

Perhaps, I'd better change job definition to /bin/true, so the test will pass (in most cases, explained below) without introducing delay before the check.

This solution will practically work, but is still conceptually wrong and hypothetically may fail on a very slow machine.

But, I don't see, how the previous version of the test have been working at all.

Let me know, what do you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the clarification, and thanks for fixing the test!

I'm uncomfortable with the Sleep, and think a better idea would probably be to introduce some polling to make things a bit more robust (and agree that /bin/true is probably a better test job to run), but it's a bit removed from the main problem this PR is addressing, so I'm ok just letting it pass for the moment and circling back around to fix it after we get this resolved.

@koiuo
Copy link
Contributor Author

koiuo commented Dec 7, 2017

I started responding from bottom to top, and finally got to your main question :)

My knowledge is probably way more limited than yours :) But here's what I think.

With these acceptance tests we have 2 mutually exclusive options:

  1. user runs nomad, bootstraps ACL, extracts token manually and passes it to tests with env var
  2. we do all, or part of this, automactically (this is also relevant to your suggestion in the Makefile)

Why these are exclusive: when we bootstrap ACL we can't just tell nomad to use our own token for that, token is being generated by nomad automatically. Hence, we either do it manually, or we do it in init.

I didn't really understand your concern about parallel tests, sorry.

I'll just emphasize one thing, maybe it'll clarify something: bootstrap ACL token and nomad instance is a single invariant. If we bootstrapped nomad instance, token won't change. So if your CI is going to run couple of different tests against the same instance of nomad, it, of course, will fail.

I saw the original comment on this test, that test expects fresh instance of nomad, so I concluded, that this is something I can safely rely on.

@koiuo
Copy link
Contributor Author

koiuo commented Dec 7, 2017

@paddycarver I addressed all your comments in the code.
Do you still want me to change the code, so the token for tests is passed as env var?
The downside of that would be more work for user or CI to setup environment.

@paddycarver
Copy link
Contributor

@edio I think this would be better with the token passed as an env var, and that seems more inline with how our providers tend to operate. I think we could also have a script that automates the setup and token retrieval, but I think we'd be best served by keeping that separate from the running of the tests. I'm not quite sure why the comment says the cluster needs to be empty, but I think if we can avoid reenforcing that requirement where possible, that's probably for the best.

Our support engineers, @angrycub in particular, mentioned that you'd prefer I just go ahead and make the changes and merge it in over making them yourself, and I'm willing to do that. I'll go ahead and do that now, but I do appreciate your work, effort, and patience on this!

paddycarver added a commit that referenced this pull request Dec 9, 2017
This is just a cleanup of #17, which did all the hard work. I just
cleaned it up quickly to get it merged quickly, largely just removing
the code that bootstraps the credentials, and adding instructions for
users on how to pass the credentials in.
@paddycarver paddycarver mentioned this pull request Dec 9, 2017
@paddycarver paddycarver merged commit 087bdb3 into hashicorp:master Dec 9, 2017
paddycarver added a commit that referenced this pull request Dec 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants