
Destroy 'provisioner' for instance resources #386

Closed
kforsthoevel opened this issue Oct 10, 2014 · 86 comments · Fixed by #11329

Comments

@kforsthoevel

It would be great to have some sort of 'provisioner' for destroying an instance resource.

Example:
When creating an instance, I bootstrap it with Chef and the node is registered with the Chef server. Now I need a way to automatically delete the node from the Chef server after Terraform destroys the instance.
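
For illustration, here is a minimal sketch of this use case written against the destroy-time provisioner syntax Terraform eventually shipped (a local-exec provisioner with when = "destroy"); the AMI, node name, and knife commands are illustrative assumptions, not anything specified in this issue:

```hcl
resource "aws_instance" "web" {
  ami           = "ami-12345678"   # illustrative AMI and instance type
  instance_type = "t2.micro"

  # Chef bootstrapping happens at create time (omitted here). The block below
  # runs on the machine executing Terraform just before the instance is
  # destroyed, removing the node and client from the Chef server.
  provisioner "local-exec" {
    when    = "destroy"
    command = "knife node delete web-01 -y && knife client delete web-01 -y"
  }
}
```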

@pmoust
Contributor

pmoust commented Oct 10, 2014

Yes, a provisioner that would be called on_destroy.
Would be useful for Puppet as well, to clean up certificates by doing the same sort of thing (getting onto the puppetmaster and issuing puppet cert clean ${aws_route53_record.foobar.*.cname}).
But more importantly, such an on_destroy event has some critical use cases:

  • doing general cleanup before destroying an instance (or pushing logs to storage)
  • gracefully deregistering it from the continuous delivery stack
  • shutting off services gracefully
  • notifying monitoring tools that it is OK that the instance is dying, etc.

@mitchellh
Contributor

What is the behavior if the provisioner fails? Does the destroy fail, and is it run again until it succeeds?

I'm open to the idea, but I'm not sure I see how Terraform could ever safely do this.

With provisioning on create, it is much simpler (for reference): The resource is created, the provisioners are run. If any provisioner fails, the resource is "tainted", and the entire thing is destroyed/created and tried again on a subsequent run.
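
For reference, a minimal sketch of that create-time flow in configuration; the bootstrap command, node name, and SSH user are illustrative assumptions:

```hcl
resource "aws_instance" "web" {
  ami           = "ami-12345678"   # illustrative values
  instance_type = "t2.micro"

  # Runs once the instance has been created. If this (or any) provisioner
  # fails, the resource is marked "tainted" and is destroyed and recreated
  # on a subsequent run, exactly as described above.
  provisioner "local-exec" {
    command = "knife bootstrap ${self.public_ip} -N web-01 -x ubuntu --sudo"
  }
}
```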

Destroy provisioners don't have the same "taint" concept, which results in them going into an uncertain state that Terraform can't reason about. If Terraform can't reason about it, we can't safely change infrastructure (one of the core tenets of the project). This might imply that this feature is not fit for this project.

@mitchellh mitchellh added the waiting-response An issue/pull request is waiting for a response from the community label Oct 10, 2014
@kforsthoevel
Author

I totally understand your point and honestly I don't have a smart answer. But I can offer two options which would work for me and hopefully for others as well.

I see the destroy provisioner as a post-destroy hook. So the instance should be destroyed by Terraform no matter what. After this, Terraform should try to execute the destroy provisioner, and in case it fails:

  1. just print out a warning and let me handle the fallout manually, or

  2. remember the state and re-execute only the destroy provisioner on the next run, maybe until it succeeds.

Does this make sense? I really think Terraform is awesome and I hope to use it together w/ Chef.

@mitchellh mitchellh added thinking and removed waiting-response An issue/pull request is waiting for a response from the community labels Oct 11, 2014
@armon
Member

armon commented Oct 12, 2014

I think we can run the destroy provisioner before the destroy step itself. This way if the provisioner fails, we abort the destroy, and the user can re-attempt on another Terraform run.

@mitchellh
Contributor

@armon What if there are multiple provisioners though? It would force us to keep track of provisioner state. Maybe that is part of the feature, but just worth pointing out.

@armon
Member

armon commented Oct 13, 2014

I think they have to be idempotent, or at least expected to run multiple times in some cases. We don't have to track their state; we just abort the Apply() on that node and retry it later. I think for most cases (de-registering servers) this should be fine.
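
To make the idempotency point concrete, a destroy-time command can be written so that re-running it after an aborted destroy is safe. This is only a sketch under the assumptions of the earlier example (a local-exec destroy provisioner placed inside the instance resource, with an illustrative node name):

```hcl
# Idempotent cleanup: if the node is already gone, the command succeeds and
# does nothing; if the node exists but deletion fails, the command fails, the
# destroy is aborted, and it is retried on a later run.
provisioner "local-exec" {
  when    = "destroy"
  command = "! knife node show web-01 >/dev/null 2>&1 || knife node delete web-01 -y"
}
```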

@pmoust
Contributor

pmoust commented Oct 14, 2014

I am with @armon on this one.

Three rules:

  • provisioners' actions must be idempotent
  • if a provisioner fails, the destroy of the resource is aborted
  • a force option should taint/destroy the resource even if the on_destroy provisioner fails

@sethvargo sethvargo changed the title Feature request: Destroy 'provisioner' for instance resources Destroy 'provisioner' for instance resources Nov 19, 2014
@woodhull

Agree that this would be a useful feature, and that idempotency should keep it safe. The options from @pmoust seem reasonable.

@tehnorm

tehnorm commented Apr 23, 2015

This would be an excellent feature - we've found a need to be able to clean up before a resource is destroyed.

@mlrobinson

+1

1 similar comment
@menicosia

+1

@dalehamel

With the addition of the 'chef' provisioner, IMHO this is a must-have.

Right now Terraform can provision instances with Chef, but not actually clean them up on destroy. This leaves stale clients and nodes on the Chef server.

@mitchellh I see this being implemented as a 'destroy' block within the provisioner block itself. If it's there, then on_destroy is called before destroy is called.

Without this feature, Terraform is going to keep causing garbage to pile up on our Chef server. We'd be happy to spend some time on a POC PR.

@dalehamel

cc @thegedge

@dhoer

dhoer commented Jul 24, 2015

👍 +1

I'm surprised packer does this, but terraform doesn't: https://www.packer.io/docs/provisioners/chef-client.html#skip_clean_client

@vjanelle

@mitchellh do what puppet does? have a test to see if it needs to execute the on_destroy?

@queeno

queeno commented Oct 1, 2015

👍 +1

1 similar comment
@grosendorf

👍 +1

@josephholsten
Contributor

So what's the next step here? Create a proof-of-concept no-op destroy provisioner and add the hooks? I like @pmoust's three rules, which can become these test cases:

  • provisioner succeeds, destroy the resource
  • provisioner fails, destroy of resource is aborted
  • force option provided, provisioner fails, still destroy the resource

Should there be an option to skip the on_destroy provisioner altogether?

@apparentlymart
Contributor

The "junk in Chef Server" problem (which I also have!) makes me think that we should consider letting a single provisioner hook in to both the create and destroy parts of the lifecycle. Possibly to update, too.

It'd be nice if you could just add provisioner "chef" and it would integrate with both the create and destroy actions and tidy up during destroy.

Of course at that point the provisioner starts to look quite a lot like a resource. In #3084 I included a chef_node resource that creates the chef server object but isn't able to actually get the server set up. I spent some time trying to figure out a suitable workflow there and didn't hit one; a provisioner being able to hook into destroying could be the missing link to make that work well.

@FergusNelson

Another use case for this feature is to allow an unmount command to run before an aws_volume_attachment is destroyed (i.e. before the volume is detached from a running instance).
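
A hedged sketch of how this could be expressed with the destroy-time provisioner that later shipped; the resource names, device path, SSH user, and the reference to the instance's public IP are all assumptions for illustration:

```hcl
resource "aws_volume_attachment" "data" {
  device_name = "/dev/xvdf"
  volume_id   = "${aws_ebs_volume.data.id}"
  instance_id = "${aws_instance.web.id}"

  # Unmount the filesystem on the running instance before Terraform detaches
  # the volume; if the unmount fails, the detach is aborted.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo umount /dev/xvdf"]

    connection {
      type = "ssh"
      user = "ubuntu"
      host = "${aws_instance.web.public_ip}"
    }
  }
}
```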

@w1lnx

w1lnx commented Jan 6, 2016

+1

1 similar comment
@trawler

trawler commented Jan 6, 2016

+1

@mitchellh
Contributor

@kris-nova Looks good! My thoughts (that may overlap with yours, just trying to be as detailed as possible):

  • Config changes, looks like you have those on lockdown.
  • Change EvalTree for apply operation: ignore destroy provisioners
  • Change EvalTree for destroy operation: run destroy provisioners prior to resource apply
  • Probably need to modify the EvalApplyProvisioner EvalNode to understand the on_destroy semantics and decide whether or not to return an error (hence halting the evaluation process, and with it the destruction of the resource)
  • Copious tests in context_destroy_test.go and context_apply_test.go to be able to test this stuff in-memory.

Those together would probably do it!

Let me know @kris-nova where your comfort level is and when you do or don't want me to jump in, and I'm happy to provide input. I think your RFC is solid (I actually missed it before you mentioned it!). The only reaction I'd have is maybe splitting up on_destroy into two fields later: a when field (when to run) and a field to specify behavior (attempt/require). This would allow us to build future provisioner lifecycle features (only on update, for example). HOWEVER, that is a nitpick and bikeshedding in the sense that those changes are easy relative to the core feature itself, so I'll ignore them. :)

@dmourati

Woohoo. Reviewing this thread after a similar discussion in #649.

@Techcadia
Contributor

Is this still targeted for v0.8?

@mitchellh
Contributor

It didn't make it, sorry! This was my fault for not starting this up earlier. But that's okay, we'll get it into 0.9 for sure. It may even sneak into a 0.8.x release depending on the type of changes necessary.

@Techcadia
Contributor

Thanks for the update @mitchellh. We are starting to work through the issues with Chef AD and Consul and how we remove old stale servers.

@imduffy15
Contributor

Would love to hear what you end up with @Techcadia

We ended up just having a job that runs every X minutes to detect stale objects.

@stack72
Contributor

stack72 commented Nov 25, 2016

@Techcadia / @imduffy15 I usually baked a script into the instance that ran a removal from our chef server on box termination.

@imduffy15
Contributor

@stack72 How are you telling the difference between shutdown, reboot and terminate?

@ashald
Contributor

ashald commented Dec 2, 2016

@mitchellh we're missing this sooo much for managing our Consul instances in AWS. Would be very happy to see the feature! Hope it will make its way into the next release.

@eedwardsdisco

@ashald have you tried using the consul leave_on_terminate feature?

@mitchellh mitchellh removed the thinking label Dec 4, 2016
@ashald
Contributor

ashald commented Dec 21, 2016

@eedwardsdisco that's a dangerous option for server nodes.

It's Christmas time, so let's hope a miracle happens and this sneaks into one of the 0.8.x releases! :)

@akaspin

akaspin commented Dec 21, 2016

Any chance of getting provisioning on destroy?

@nbering

nbering commented Jan 18, 2017

Aside from the branch on @kris-nova's fork - which is currently over two thousand commits behind - is work on this being done anywhere? Is there an open PR?

@asciifaceman

I've long since given up hope on this ever being worked on.

@mitchellh
Contributor

No open PR at the moment, but it is on the roadmap for 0.9. :)

@nbering

nbering commented Jan 18, 2017

@mitchellh Is there an official roadmap published somewhere else?

https://github.com/hashicorp/terraform/milestone/2

@mitchellh
Contributor

There is not. Years and years ago (pre-Terraform) I used to publish one, but I've been burned too many times by people being very upset when something doesn't make it in. I've learned since not to make such commitments.

That being said, it's on the roadmap. We hope to ship it.

@mitchellh
Contributor

mitchellh commented Jan 21, 2017

Howdy folks, I've continued the fantastic RFC from this issue and this is the final RFC that I'm going to run with to build this: https://docs.google.com/document/d/15nEcV7fxskDgYrXoNMl6RYIo10PCiZGle7TP8xitrFE/edit#

The work is in f-destroy-prov and is just about complete now. It is just missing some polish work around validation and docs. So I can confirm this will 100% make it into TF 0.9. Thanks so much to @kris-nova for the original RFC work, that helped save copious time during the design phase so I could just start going with implementation!

@krisnova

Wow! This is fantastic work @mitchellh 🥇

Your RFC 2.0 looks like exactly what the community needs, and I am so glad to see this coming to life. Again, can't thank you enough for all that you do. Looking forward to the PR - and to one day having a long awaited closure on the issue.

Go Terraform!

@mitchellh
Contributor

If on_failure = "continue" is set (not the default) then we'll just continue and the failure will only show up in the output. The output may be noisy, but we don't have a better way to surface that information at the moment (bonus: it'll be in red, so that helps if you have colors on). If this becomes a real issue then we can consider some other options, but I'd rather get the working feature out into the hands of the community and see where the pain points end up being. 🤢
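
A short sketch of how when and on_failure combine in the shipped syntax (placed inside a resource block as in the earlier sketches; the command and node name are illustrative assumptions):

```hcl
provisioner "local-exec" {
  when       = "destroy"
  on_failure = "continue"   # default is "fail", which aborts the destroy
  # A failure of this command is reported in the output but does not block
  # the resource from being destroyed.
  command    = "knife node delete web-01 -y"
}
```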

@krisnova

Ah, thanks - I nuked my original question (as I read on, I found more information in the implementation section), which was:

How will Terraform handle a failed destroy provisioner? Will this be tracked anywhere?

+1 to iterating on this.. I think you are nailing the feature exactly as we want, and I am pumped to see where this evolves to!

I am curious though.. what if we have an on_failure = "continue" and the provisioner itself breaks the program? (Might be a question for another forum, so feel free to defer me) Will terraform recover from the potential fatal, and continue destroying?

@mitchellh
Contributor

I think the discussion here is fine. 👍

If a provisioner returns any error (connection error, execution error, etc.) it will continue destruction. If the provisioner causes Terraform to crash we won't, but I think that's a reasonable tradeoff (provisioners should not crash Terraform 😛).

The broadness of the "failure" type is necessary due to the current core <=> provisioner API interface. Changing that would be pretty messy/annoying and a deeper change. Again, it's possible if it ends up adding high value, but not something I'd casually do unless there was enough of a proven use case.

Hope I answered your Q right... I didn't fully understand it (late Friday night reading is perhaps an excuse; I dunno, but probably my own fault).

@krisnova

Great explanation! Thanks! You're right.. maybe I should get back to Friday night. Cheers

@ghost

ghost commented Apr 17, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 17, 2020