
terraform lock resource #26422

Open
steeling opened this issue Sep 29, 2020 · 7 comments
Labels: enhancement, new (new issue not yet triaged)

Comments

@steeling
Contributor

Hi there,

Apologies if this is already possible, but I don't see a command listed here

I'd like to propose adding a

terraform lock -target=resource

command so that a user can lock a resource to prevent both automation and other users from making changes during outages. This is a principle taken from lock-out/tag-out, used in industrial equipment maintenance, applied here to software maintenance.

steeling added the enhancement and new (new issue not yet triaged) labels on Sep 29, 2020
@steeling
Contributor Author

I'm not sure if Terraform supports locking by resource, or just locks the entire state file. If it's currently the latter, even adding a lock command for the entire file would be helpful.

Additionally, adding a lock command could help if I want to run multiple commands transactionally, e.g. building on #26423:

tf lock
tf check-for-diff -lock=false
tf apply -lock=false
tf force-unlock
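
To make the idea concrete, here's a rough wrapper sketch. Note that terraform lock and check-for-diff are the proposed commands and don't exist today, and LOCK_ID is a placeholder for whatever ID the lock command would report:

#!/usr/bin/env bash
set -euo pipefail

terraform lock                                      # proposed: take the lock and exit, leaving it held
trap 'terraform force-unlock -force LOCK_ID' EXIT   # always release, even if a later step fails

terraform check-for-diff -lock=false                # proposed in #26423; runs under the lock held above
terraform apply -auto-approve -lock=false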

@apparentlymart
Member

Hi @steeling! Thanks for this enhancement request.

Terraform does indeed currently model locking as a whole-workspace idea (usually implemented by locking the object that's storing the state, as you mentioned). Some of the locking implementations are also unable to hold a lock without keeping a terraform process running to hold it, which is why Terraform doesn't currently have a command to just create a lock without its lifecycle being connected to some other operation.

A possible compromise here could be a command that takes the lock and then blocks at the terminal until it is interrupted by something like Ctrl+C: you could then hold a Terraform lock even though Terraform isn't actually doing anything, because the terraform process still exists to hold it.

I think you could emulate this today by making a throwaway change to your configuration, running terraform apply, and then leaving Terraform waiting for confirmation while you do something else; Terraform holds the lock while it awaits approval for the plan, so you can in principle use it as a weird way to grab a lock and then eventually just say "no" at the confirmation prompt to release the lock without changing anything.
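
In shell terms the emulation looks something like this (assuming some throwaway change is already staged in the configuration, e.g. a bumped tag value on a low-risk resource):

terraform apply     # computes the plan, then waits at the approval prompt,
                    # holding the state lock the entire time it waits
# ...do whatever needs the lock while the prompt is open...
# answer "no" at the prompt to release the lock without changing anything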

With all of that said, it would of course not help very much with the "running multiple Terraform commands transactionally" idea because in that case you explicitly want the lock to outlive a particular terraform process, and for those other commands to somehow pick up the same lock rather than trying to create a new one (which would otherwise deadlock).

@steeling
Contributor Author

steeling commented Oct 2, 2020

Hey @apparentlymart, thanks for the detailed response! Would you mind explaining how a separate process determines whether the lock is currently being held? Is it implementation-specific depending on where the state is stored (i.e. one implementation for Azure Blob Storage and another for GCS), or something more generic?

Yeah, I don't think the running-process approach would meet our needs, unfortunately. Also, instead of passing the lock from one process to another, I think we could model it like code: I grab the lock, and the other Terraform actions do things without the lock (or even without knowledge that the lock is held, i.e. they supply the -lock=false flag). Consider the following Go pseudocode:

var mu sync.Mutex
mu.Lock()
defer mu.Unlock()

// The calls below pass lock=false, i.e. they don't know an outer lock is held.
diffs, err := terraform.Reconcile(false /* lock */)
if err != nil {
  return err
}
if !diffs {
  terraform.Apply(false /* lock */)
}
return nil

Here's a thought on how this could be accomplished given the current locking mechanisms:

Every command that currently grabs the lock would do the following. Supplying -lock=false would skip* steps 1 & 2:

  1. Grab the lock via the running process (as is currently done)
  2. Check a new field on the state, lock_status, to determine if it is locked asynchronously
  3. If not, continue with the operation.
  4. Release the lock (same as is currently done)

The lock/unlock command would be a special case:

  1. Grab the lock via the running process (as is currently done)
  2. Check a new field on the state, lock_status, to determine if it is locked asynchronously
  3. Set the lock_status to lock/unlock (return error if dest lock_status == src lock_status)
  4. Release the lock (same as is currently done)

*Note: on skipping steps 1 & 2, it might make more sense to skip just step 2. I find it hard to imagine a scenario where one would want commands to race with each other, although maybe I'm just not thinking hard enough :)

Eventually, lock_status could also be moved to each individual Terraform resource.
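
To make steps 1-4 concrete, here's a rough Go sketch. The Backend interface, ReadLockStatus, and WriteLockStatus names are invented for this discussion and are not real Terraform internals:

package locksketch // hypothetical sketch only, not real Terraform code

import "errors"

// Hypothetical backend shape; the real backends expose implementation-specific locking APIs.
type Backend interface {
  Lock() (id string, err error)
  Unlock(id string) error
  ReadLockStatus() (locked bool, err error)  // reads the proposed lock_status field
  WriteLockStatus(locked bool) error         // used by the proposed lock/unlock command
}

// runOperation is what every lock-taking command (plan, apply, ...) would do.
func runOperation(b Backend, skipLockCheck bool, op func() error) error {
  if !skipLockCheck { // -lock=false would skip this whole block
    id, err := b.Lock() // 1. grab the lock via the running process, as today
    if err != nil {
      return err
    }
    defer b.Unlock(id) // 4. release the lock, same as is currently done

    locked, err := b.ReadLockStatus() // 2. check the new lock_status field
    if err != nil {
      return err
    }
    if locked {
      return errors.New("state is locked asynchronously; unlock it first")
    }
  }
  return op() // 3. if not locked, continue with the operation
}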

Thanks in advance for entertaining this discussion!

@steeling
Contributor Author

steeling commented Oct 2, 2020

Looking at some of the implementations, I can answer my own question above: the locking mechanism is indeed implementation-specific. Following up on that, the pseudocode above is only necessary for those specific implementations, while the rest (the majority?) can just grab the lock and return.
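
For anyone following along: each backend supplies a state manager that implements a small locking interface, roughly like this (paraphrased from memory, so check the states/statemgr package for the exact definition):

type Locker interface {
  Lock(info *LockInfo) (string, error) // returns an ID used to unlock later
  Unlock(id string) error
}

Whether the lock can outlive the calling process is then up to each implementation, which is why the ability to "just grab the lock and return" varies by backend.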

@steeling
Contributor Author

@apparentlymart, looking into this more, it seems like Terraform is doing something more complicated than simply grabbing the resource lock. For example, on an azurerm backend, if I grab the blob lease and run tf plan -lock=false, I get:

Error: Error loading state: failed to lock azure state: 2 errors occurred:
* state blob is already locked
* blob metadata "terraformlockid" was empty
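
(For context, the manual lease grab was roughly the following, with placeholder names; the second error suggests the azurerm backend also expects its lock ID in the blob's terraformlockid metadata, which a bare lease doesn't set.)

az storage blob lease acquire \
  --account-name STORAGE_ACCOUNT \
  --container-name tfstate \
  --blob-name terraform.tfstate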

This seems like a pretty basic feature to ensure transactionality between multiple requests, and to give on-call ops a simple mechanism to prevent automation from rolling forward.

@apparentlymart
Member

Hi @steeling,

The backends all have pretty different implementations of the locking interfaces with different requirements and tradeoffs, and all of them have been through many iterations to get their behavior right against the quirks of each service, so unfortunately I don't think we can consider any change to the locking model to be a "basic feature". That doesn't mean it isn't a valid feature request, but it does mean it will require a considerable design effort and is something we're unlikely to tackle in the near future due to our focus being elsewhere.

@steeling
Contributor Author

steeling commented Oct 13, 2020

Hi @apparentlymart, thanks for the reply! That's very reasonable :)

Submitted #26572 to see if I can poke around in this space.

Also submitted #26561 to fix Azure force-unlock, which doesn't work in non-default workspaces.
