Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in writing state during hard cancelation #17323

Merged
merged 4 commits into from Feb 13, 2018
Merged

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Feb 12, 2018

If the user wishes to interrupt the running operation, only the first
interrupt was communicated to the operation by canceling the provided
context. A second interrupt would start the shutdown process, but not
communicate this to the running operation. This order of events could
cause partial writes of state.

What would happen is that once the command returns, the plugin system
would stop the provider processes. Once the provider processes dies, all
pending Eval operations would return return with an error, and quickly
cause the operation to complete. Since the backend code didn't know that
the process was shutting down imminently, it would continue by
attempting to write out the last known state. Under the right
conditions, the process would exit part way through the writing of the
state file.

Add Stop and Cancel CancelFuncs to the RunningOperation, to allow it to
easily differentiate between the two signals. The backend will then be
able to detect a shutdown and abort more gracefully.

In order to ensure that the backend is not in the process of writing the
state out, the command will always attempt to wait for the process to
complete after cancellation.

fixes #17282

If the user wishes to interrupt the running operation, only the first
interrupt was communicated to the operation by canceling the provided
context. A second interrupt would start the shutdown process, but not
communicate this to the running operation. This order of event could
cause partial writes of state.

What would happen is that once the command returns, the plugin system
would stop the provider processes. Once the provider processes dies, all
pending Eval operations would return return with an error, and quickly
cause the operation to complete. Since the backend code didn't know that
the process was shutting down imminently, it would continue by
attempting to write out the last known state. Under the right
conditions, the process would exit part way through the writing of the
state file.

Add Stop and Cancel CancelFuncs to the RunningOperation, to allow it to
easily differentiate between the two signals. The backend will then be
able to detect a shutdown and abort more gracefully.

In order to ensure that the backend is not in the process of writing the
state out, the command will always attempt to wait for the process to
complete after cancellation.
Create a single command method for running and operation with
cancellation.
The error was being silently dropped before.

There is an interpolation error, because the plan is canceled before
some of the resources can be evaluated. There might be a better way to
handle this in the walk cancellation, but the behavior has not changed.

Make the plan and apply shutdown match implementation-wise
Moves the nested select statements for backend operations into a single
function. The only difference in this part was that apply called
PersistState, which should be harmless regardless of the type of
operation being run.
Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I'm a big fan of factoring out the details of this from all the individual commands into a single function.

@jbardin jbardin merged commit a65089f into master Feb 13, 2018
@jbardin jbardin deleted the jbardin/shutdown branch February 13, 2018 15:14
@ghost
Copy link

ghost commented Apr 4, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@hashicorp hashicorp locked and limited conversation to collaborators Apr 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aborting a long running terraform apply operation results in empty state file
2 participants