New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable state locking for plan/apply/destroy/refresh/taint/untaint #11686
Conversation
Have the LocalBackend lock the state during operations, and enble this for the apply comand.
This makes it more apparent that the information passed in isn't required nor will it conform to any standard. There may be call sites that can't provide good contextual info, and we don't want to count on that value.
Previously when runnign a plan with no exitsing state, the plan would be written out and then backed up on the next WriteState by another BackupState instance. Since we now maintain a single State instance thoughout an operation, the backup happens before any state exists so no backup file is created. This is OK, as the backup state the tests were checking for is from the plan file, which already exists separate from the state.
We are not going to handle lock expiration, at least at this time, so remove the Expires fields to avoid any confusion.
Depending on the implementation, local state locks may be reentrant within the same process. Use a separate process to test locked state files.
Verify that these operations fail when a state file is locked.
this way we can signal it directly to amke sure it exits cleanly.
add missing lock-state flag to untaint
Close and remove the file descriptor from LocalState if we Unlock the state. Also remove an empty state file if we created it and it was never written to. This is mostly to clean up after tests, but doesn't hurt to not leave empty files around.
👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes, overall looks amazing.
@@ -22,7 +22,8 @@ type Backend interface { | |||
|
|||
// State returns the current state for this environment. This state may | |||
// not be loaded locally: the proper APIs should be called on state.State | |||
// to load the state. | |||
// to load the state. If the state.State is a state.Locker, it's up to the | |||
// caller to call Lock and Unlock as needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the comment, this is the correct behavior I wanted!
backend/local/backend_apply.go
Outdated
defer func() { | ||
if s, ok := opState.(state.Locker); op.LockState && ok { | ||
if err := s.Unlock(); err != nil { | ||
log.Printf("[ERROR]: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should multierror append the error to runningOp.Err
so that the error shows up to the end user. I would make a long const (like other error messages in the package) for it so that the user knows what to do: verify everything is okay, manually call terraform unlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. I was thinking this would only apply to LocalState, where Unlock shouldn't error, and the lock is gone on exit anyway.
command/meta.go
Outdated
statePath string | ||
stateOutPath string | ||
backupPath string | ||
parallelism int | ||
shadow bool | ||
provider string | ||
lockState bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: let's name this stateLock
just to match the other state-related fields above (statePath, stateOutPath)
command/apply.go
Outdated
@@ -272,6 +274,8 @@ Options: | |||
modifying. Defaults to the "-state-out" path with | |||
".backup" extension. Set to "-" to disable backup. | |||
|
|||
-lock-state=true Lock the state file when locking is supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bike shed: Let's just use -lock
. Maybe we'll lock more in the future maybe we won't but I think its clear regardless and I'd prefer the aesthetic of it.
@@ -99,6 +103,10 @@ type Operation struct { | |||
// Input/output/control options. | |||
UIIn terraform.UIInput | |||
UIOut terraform.UIOutput | |||
|
|||
// If LockState is true, the Operation must Lock any | |||
// state.Lockers for its duration, and Unlock when complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add to the comment: if using backend.Local
, it is up to the caller to unlock the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think backend.Local
is an exception here. The state is acquired and used solely within Enhanced.Operation
, which is done for backend.Local
too. I did note that Backend.State
expects the caller to lock the state as needed, and that's called from within an Operation.
Have the defer'ed State.Unlock call append any error to the RunningOperation.Err field. Local error would be rare and self-correcting, but when the backend.Local is using a remote state the error may require user intervention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good. One thing I want to just leave as a note here but shouldn't block this merge: we should think through a UX if locking is taking awhile (for whatever reason, the network).
In Otto I had created a package that was basically "do this, but if it takes longer than N (time.Duration) then show this message". I think bringing that as a helper package here and using that for cases like this would be ideal. In the average case, state locking should be fast enough, if its taking longer than 100ms or something we should probably inform the user that we're trying to acquire a state lock. I could see some users terraform <op>
hanging and being curious why.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
This enables the locking of state through the command UI.
This does change the behavior of 2 tests. Previously when running a plan with no existing state, the plan would be written out and then backed up on the next WriteState by another BackupState instance. Since we now maintain a single State instance throughout an operation, the backup happens before any state exists so no backup file is created. This shouldn't be a problem, as there really was nothing that required backing up. Now those tests will create the state file before running.
The lock/unlock terraform commands will be added in another PR. Only local state is supported so far, so the commands aren't yet required.