ctrl-c shows "graceful" shutdown message, then long pause, then terraform still destroys resources #30918

Closed
josh-m-sharpe opened this issue Apr 22, 2022 · 5 comments · Fixed by #30979
Labels
bug, confirmed (a Terraform Core team member has reproduced this issue)

Comments

@josh-m-sharpe

josh-m-sharpe commented Apr 22, 2022

Terraform Version

v1.1.7

Screenshot of destruction:

[screenshot: josh_ip-10-20-11-55__]

Expected Behavior

Resources are not destroyed.

Actual Behavior

resources destroyed. site down. 💥

Steps to Reproduce

Press ctrl-c, see the graceful shutdown message, and wait.

Additional Context

As noted in the screenshot, there was a long pause between the graceful shutdown message appearing and Terraform continuing on to destroy resources. That is, it had plenty of time to act on the interrupt and still didn't shut down.

Counterintuitively, I'm pretty sure if I held ctrl-c down and sent the process 10s or 100s of kill signals it would've died "ungracefully" and the resource would not have been destroyed. This would've been a much better scenario.

josh-m-sharpe added the "bug" and "new" (new issue not yet triaged) labels Apr 22, 2022
@apparentlymart
Member

apparentlymart commented Apr 25, 2022

Hi @josh-m-sharpe! Thanks for reporting this.

It does seem like the graceful shutdown began at a bad time here, when Terraform wasn't prepared to deal with it. I guess it probably cancelled an operation whose result wasn't actually needed in order to continue, so Terraform didn't get its usual opportunity to see an operation fail with a cancellation error and thus didn't stop running early as it should have.

You mentioned this went on to actually apply the plan. Did Terraform fail to show the confirmation prompt in this case, or had you intentionally disabled confirmation with -auto-approve?

Since this seems to be a timing-sensitive problem, I bet it will be hard to reproduce reliably. It looks like the cancellation here might've arrived right at the end of the planning process, when Terraform had already done all of the cancelable work. The plan operation was still active (and subject to cancellation) but was already in the process of returning the successful plan, so it didn't check the cancellation state again.

If so, I expect we need to find some way to propagate the cancellation channel from the plan operation into the confirmation prompt and subsequent apply operation so that once it is cancelled it stays cancelled and will not be able to start any new cancelable operations without immediately encountering the cancellation error.

@josh-m-sharpe
Author

I ride the lightning -- so, yes, I did run apply -auto-approve in this case.

I'm not 100% sure if it had completed planning when I ctrl-c'd.

It would seem that a ctrl-c any time before the operation/execution phase should prevent that phase from starting.

@apparentlymart
Member

I think probably what we'll need to do here is assume my theory above is correct for the moment, and find some way to modify the code to simulate that situation by artificially triggering the cancellation sequence at just the wrong moment (after planning has completed all of the network operations but before the apply walk has started) and see if that causes the cancellation to be ignored.

If it does, that would confirm that my hunch is valid and we can work on figuring out why the cancellation state doesn't survive between the plan phase and the apply phase. (My initial guess: we're creating a new context for each walk and so the fact that the context for the plan walk was cancelled doesn't affect the freshly-created context for the apply walk. But I've not checked that.)

apparentlymart added the "waiting for reproduction" label (unable to reproduce issue without further information) and removed the "new" label (new issue not yet triaged) Apr 28, 2022
@jbardin
Member

jbardin commented May 2, 2022

The behavior shown here can be explained by the combination of a couple of factors.
The "long pause" is the result of the aws provider not cancelling operations which are already in progress. While we usually leave the decision of how to cancel up to the providers, I believe we do have an open issue to be more forceful in the cancellation of a plan where the result is going to be discarded anyway.

After the plan is cancelled, however, we hit the root cause of this bug: when running apply with no prior plan, the next place the context is checked is during the plan confirmation, so when the confirmation is skipped, so is the check for cancellation.

jbardin added the "confirmed" label (a Terraform Core team member has reproduced this issue) and removed the "waiting for reproduction" label (unable to reproduce issue without further information) May 2, 2022
@github-actions

github-actions bot commented Jun 2, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 2, 2022