
terraform lock resource #26422

Open
steeling opened this issue Sep 29, 2020 · 7 comments
Labels: enhancement, new (new issue not yet triaged)

Comments

@steeling
Contributor

Hi there,

Apologies if this is already possible, but I don't see a command listed here

I'd like to propose adding a

terraform lock -target=resource

command so that a user can lock a resource to prevent both automation and other users from making changes during outages. This is a principle taken from lock-out/tag-out, used in industrial equipment maintenance, applied here to software maintenance.

steeling added the enhancement and new (new issue not yet triaged) labels on Sep 29, 2020
@steeling
Contributor Author

I'm not sure if Terraform supports locking by resource, or just locks the entire state file. If it's currently the latter, even adding a lock command for the entire file would be helpful.

Additionally, adding a lock command could help if I want to run multiple commands transactionally, e.g. building on #26423:

tf lock
tf check-for-diff -lock=false
tf apply -lock=false
tf force-unlock
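
To make the idea concrete, here's a rough wrapper sketch. Note that terraform lock and check-for-diff are the proposed commands and don't exist today, and LOCK_ID is a placeholder for whatever ID the lock command would report:

#!/usr/bin/env bash
set -euo pipefail

terraform lock                                      # proposed: take the lock and exit, leaving it held
trap 'terraform force-unlock -force LOCK_ID' EXIT   # always release, even if a later step fails

terraform check-for-diff -lock=false                # proposed in #26423; runs under the lock held above
terraform apply -auto-approve -lock=false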

@apparentlymart
Member

Hi @steeling! Thanks for this enhancement request.

Terraform does indeed currently model locking as a whole-workspace idea (usually implemented by locking the object that's storing the state, as you mentioned). Some of the locking implementations are also unable to hold a lock without keeping a terraform process running to hold it, which is why Terraform doesn't currently have a command to just create a lock without its lifecycle being connected to some other operation.

A possible compromise here could be a command that takes the lock and then blocks at the terminal until it is interrupted by something like Ctrl+C: you could then hold a Terraform lock even though Terraform isn't actually doing anything, because the terraform process still exists to hold it.

I think you could emulate this today by making a throwaway change to your configuration, running terraform apply, and then leaving Terraform waiting for confirmation while you do something else; Terraform holds the lock while it awaits approval for the plan, so you can in principle use it as a weird way to grab a lock and then eventually just say "no" at the confirmation prompt to release the lock without changing anything.
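
In shell terms the emulation looks something like this (assuming some throwaway change is already staged in the configuration, e.g. a bumped tag value on a low-risk resource):

terraform apply     # computes the plan, then waits at the approval prompt,
                    # holding the state lock the entire time it waits
# ...do whatever needs the lock while the prompt is open...
# answer "no" at the prompt to release the lock without changing anything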

With all of that said, it would of course not help very much with the "running multiple Terraform commands transactionally" idea because in that case you explicitly want the lock to outlive a particular terraform process, and for those other commands to somehow pick up the same lock rather than trying to create a new one (which would otherwise deadlock).

@steeling
Contributor Author

steeling commented Oct 2, 2020

Hey @apparentlymart, thanks for the detailed response! Would you mind explaining how a separate process determines whether the lock is currently being held? Is it implementation-specific depending on where the state is stored (i.e. one implementation for Azure Blob Storage and another for GCS), or something more generic?

Yeah, I don't think the running-process approach would meet our needs, unfortunately. Also, instead of passing the lock from one process to another, I think we could model it like code: I grab the lock, and the other Terraform actions do things without the lock (or even without knowledge that the lock is held, i.e. they supply the -lock=false flag). Consider the following Go pseudocode:

var mu sync.Mutex
mu.Lock()
defer mu.Unlock()

// The calls below pass lock=false, i.e. they don't know an outer lock is held.
diffs, err := terraform.Reconcile(false /* lock */)
if err != nil {
  return err
}
if !diffs {
  terraform.Apply(false /* lock */)
}
return nil

Here's a thought on how this could be accomplished given the current locking mechanisms:

Every command that currently grabs the lock would do the following. Supplying -lock=false would skip* steps 1 & 2:

  1. Grab the lock via the running process (as is currently done)
  2. Check a new field on the state, lock_status, to determine if it is locked asynchronously
  3. If not, continue with the operation.
  4. Release the lock (same as is currently done)

The lock/unlock command would be a special case:

  1. Grab the lock via the running process (as is currently done)
  2. Check a new field on the state, lock_status, to determine if it is locked asynchronously
  3. Set the lock_status to lock/unlock (return error if dest lock_status == src lock_status)
  4. Release the lock (same as is currently done)

*Note: on skipping steps 1 & 2, it might make more sense to skip just step 2. I find it hard to imagine a scenario where one would want commands to race with each other, although maybe I'm just not thinking hard enough :)

Eventually, lock_status could also be moved to each individual Terraform resource.
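
To make steps 1-4 concrete, here's a rough Go sketch. The Backend interface, ReadLockStatus, and WriteLockStatus names are invented for this discussion and are not real Terraform internals:

package locksketch // hypothetical sketch only, not real Terraform code

import "errors"

// Hypothetical backend shape; the real backends expose implementation-specific locking APIs.
type Backend interface {
  Lock() (id string, err error)
  Unlock(id string) error
  ReadLockStatus() (locked bool, err error)  // reads the proposed lock_status field
  WriteLockStatus(locked bool) error         // used by the proposed lock/unlock command
}

// runOperation is what every lock-taking command (plan, apply, ...) would do.
func runOperation(b Backend, skipLockCheck bool, op func() error) error {
  if !skipLockCheck { // -lock=false would skip this whole block
    id, err := b.Lock() // 1. grab the lock via the running process, as today
    if err != nil {
      return err
    }
    defer b.Unlock(id) // 4. release the lock, same as is currently done

    locked, err := b.ReadLockStatus() // 2. check the new lock_status field
    if err != nil {
      return err
    }
    if locked {
      return errors.New("state is locked asynchronously; unlock it first")
    }
  }
  return op() // 3. if not locked, continue with the operation
}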

Thanks in advance for entertaining this discussion!

@steeling
Contributor Author

steeling commented Oct 2, 2020

Looking at some of the implementations, I can answer my own question above: the locking mechanism is indeed implementation-specific. Following up on that, the pseudocode above is only necessary for those specific implementations, while the rest (the majority?) can just grab the lock and return.
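
For anyone following along: each backend supplies a state manager that implements a small locking interface, roughly like this (paraphrased from memory, so check the states/statemgr package for the exact definition):

type Locker interface {
  Lock(info *LockInfo) (string, error) // returns an ID used to unlock later
  Unlock(id string) error
}

Whether the lock can outlive the calling process is then up to each implementation, which is why the ability to "just grab the lock and return" varies by backend.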

@steeling
Contributor Author

@apparentlymart, looking into this more, it seems like Terraform is doing something more complicated than simply grabbing the resource lock. For example, on an azurerm backend, if I grab the blob lease and run tf plan -lock=false, I get:

Error: Error loading state: failed to lock azure state: 2 errors occurred:
* state blob is already locked
* blob metadata "terraformlockid" was empty
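
(For context, the manual lease grab was roughly the following, with placeholder names; the second error suggests the azurerm backend also expects its lock ID in the blob's terraformlockid metadata, which a bare lease doesn't set.)

az storage blob lease acquire \
  --account-name STORAGE_ACCOUNT \
  --container-name tfstate \
  --blob-name terraform.tfstate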

This seems like a pretty basic feature to ensure transactionality between multiple requests, and to give on-call ops a simple mechanism to prevent automation from rolling forward.

@apparentlymart
Member

Hi @steeling,

The backends all have pretty different implementations of the locking interfaces with different requirements and tradeoffs, and all of them have been through many iterations to get their behavior right against the quirks of each service, so unfortunately I don't think we can consider any change to the locking model to be a "basic feature". That doesn't mean it isn't a valid feature request, but it does mean it will require a considerable design effort and is something we're unlikely to tackle in the near future due to our focus being elsewhere.

@steeling
Contributor Author

steeling commented Oct 13, 2020

Hi @apparentlymart, thanks for the reply! That's very reasonable :)

Submitted #26572 to see if I can poke around in this space.

Also submitted #26561 to fix Azure force-unlock, which doesn't work in non-default workspaces.
