Preconditions and postconditions don't run during apply when their associated resource instance has no planned changes #31261

apparentlymart · 2022-06-17T14:38:33Z

Terraform Version

Terraform v1.3.0-dev
on linux_amd64
+ provider registry.terraform.io/hashicorp/null v3.1.1

I'm using a build from source in my local work tree here, but I've also minimally confirmed that this is reproducible with the v1.2.3 release build.

Terraform Configuration Files

The following is the final configuration that exhibits the bug, but see "Steps to Reproduce" below because this bug is only visible if we reach this configuration gradually over multiple steps:

resource "null_resource" "a" {
  triggers = {
    hello = "Hello!"
  }
}

resource "null_resource" "b" {
  lifecycle {
    precondition {
      condition     = null_resource.a.id == ""
      error_message = "The other resource should have an empty ID, for some iexplicable reason."
    }
  }
}

Debug Output

I've already root-caused this, so I'm going to skip this step and will post a follow-up comment after I open the issue explaining what's going on here.

Steps to Reproduce

Let's start with the following contrived configuration:

resource "null_resource" "a" {

}

resource "null_resource" "b" {
  lifecycle {
    precondition {
      condition     = null_resource.a.id == ""
      error_message = "The other resource should have an empty ID, for some iexplicable reason."
    }
  }
}

This models the situation where a precondition on one resource depends on an attribute of another resource that can't be known until the apply step. null_resource fakes this by having id appear as unknown during planning and then filling in a generated value during apply. I intentionally wrote the precondition above so that it fails, in order to demonstrate this issue.

If we plan and apply this all at once then we can see Terraform check the precondition at the appropriate time:

$ terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # null_resource.a will be created
  + resource "null_resource" "a" {
      + id = (known after apply)
    }

  # null_resource.b will be created
  + resource "null_resource" "b" {
      + id = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

null_resource.a: Creating...
null_resource.a: Creation complete after 0s [id=2015836518349445544]
╷
│ Error: Resource precondition failed
│ 
│   on checks.tf line 41, in resource "null_resource" "b":
│   41:       condition     = null_resource.a.id == ""
│     ├────────────────
│     │ null_resource.a.id is "2015836518349445544"
│ 
│ The other resource should have an empty ID, for some iexplicable reason.
╵

However, things get more interesting if we arrive at this destination over multiple steps.

Let's remove the terraform.tfstate file to start fresh here and then use the following simpler configuration as the first step:

resource "null_resource" "a" {

}

resource "null_resource" "b" {
}

$ terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # null_resource.a will be created
  + resource "null_resource" "a" {
      + id = (known after apply)
    }

  # null_resource.b will be created
  + resource "null_resource" "b" {
      + id = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

null_resource.b: Creating...
null_resource.a: Creating...
null_resource.b: Creation complete after 0s [id=5480810909147783652]
null_resource.a: Creation complete after 0s [id=3935487483259785894]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

So far so good! We have two useless resource instances.

Now let's return to the configuration from the start of this section, which adds the precondition to null_resource.b:

resource "null_resource" "a" {

}

resource "null_resource" "b" {
  lifecycle {
    precondition {
      condition     = null_resource.a.id == ""
      error_message = "The other resource should have an empty ID, for some iexplicable reason."
    }
  }
}

$ terraform apply
null_resource.a: Refreshing state... [id=3935487483259785894]
null_resource.b: Refreshing state... [id=5480810909147783652]
╷
│ Error: Resource precondition failed
│ 
│   on checks.tf line 41, in resource "null_resource" "b":
│   41:       condition     = null_resource.a.id == ""
│     ├────────────────
│     │ null_resource.a.id is "3935487483259785894"
│ 
│ The other resource should have an empty ID, for some iexplicable reason.
╵

This time Terraform was able to catch the problem during the planning phase, because we already know from the prior state that the id value is not the empty string. This is also expected behavior: Terraform eagerly checks the conditions as soon as it has enough information to do so, aiming to raise a problem during the plan phase whenever possible so that we can avoid bailing out partway through apply.
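The eager-checking rule described above can be sketched as a three-way outcome. This is a toy model in Python, not Terraform's actual implementation; `UNKNOWN` and `check_condition` are hypothetical names introduced only for illustration.

```python
# Toy model only: UNKNOWN and check_condition are hypothetical names.
UNKNOWN = object()  # stand-in for a value shown as "(known after apply)"


def check_condition(a_id):
    """Evaluate the example condition `null_resource.a.id == ""`."""
    if a_id is UNKNOWN:
        # Not enough information yet, so the check must wait for the apply step.
        return "deferred"
    return "passed" if a_id == "" else "failed"


# The id is already known from the prior state, so the check fails during planning.
print(check_condition("3935487483259785894"))
# The id is unknown during planning, so the check is postponed.
print(check_condition(UNKNOWN))
```

The "deferred" outcome is the interesting one for this issue: something has to pick the check up again during apply.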

However, now let's see what happens if I also add triggers to null_resource.a at the same time, which simulates my having changed the configuration of that resource in a way that can only be resolved by replacing the remote object with a fresh one:

resource "null_resource" "a" {
  triggers = {
    hello = "Hello!"
  }
}

resource "null_resource" "b" {
  lifecycle {
    precondition {
      condition     = null_resource.a.id == ""
      error_message = "The other resource should have an empty ID, for some iexplicable reason."
    }
  }
}

$ terraform apply
null_resource.a: Refreshing state... [id=3935487483259785894]
null_resource.b: Refreshing state... [id=5480810909147783652]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # null_resource.a must be replaced
-/+ resource "null_resource" "a" {
      ~ id       = "3935487483259785894" -> (known after apply)
      + triggers = {
          + "hello" = "Hello!"
        } # forces replacement
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

null_resource.a: Destroying... [id=3935487483259785894]
null_resource.a: Destruction complete after 0s
null_resource.a: Creating...
null_resource.a: Creation complete after 0s [id=1106553121951240691]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

Expected Behavior

Terraform should've checked the precondition on null_resource.b during the apply step, once the new null_resource.a.id became known, and raised an error about it not being an empty string.

Actual Behavior

Terraform didn't check the condition in either the plan phase or the apply phase. Instead, I need to re-run terraform apply to catch the problem during the next plan:

$ terraform apply
null_resource.a: Refreshing state... [id=1106553121951240691]
null_resource.b: Refreshing state... [id=5480810909147783652]
╷
│ Error: Resource precondition failed
│ 
│   on checks.tf line 43, in resource "null_resource" "b":
│   43:       condition     = null_resource.a.id == ""
│     ├────────────────
│     │ null_resource.a.id is "1106553121951240691"
│ 
│ The other resource should have an empty ID, for some iexplicable reason.
╵

If this condition were checking something real that affects the behavior of my infrastructure, I could have a problem without knowing about it. That could also confuse someone downstream who tries to make another change, because their plan will fail for a reason unrelated to what they modified.


apparentlymart · 2022-06-17T14:41:10Z

I happen to know why this happens because I discovered this problem from reading the code as part of working on something else, and noticing the architectural problem before confirming that it led to this bug.

The root problem is that Terraform tries to optimize the apply step by only including graph nodes for resource instances that have actual changes (not "no-op" changes) in the plan. However, that doesn't take into account the fact that some resource behaviors ought to happen even if there isn't a pending change to a particular object, because that object must react to some changes made upstream that aren't reflected in the resource's own configuration arguments.
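As a rough illustration of that pruning, here is a toy model in Python. It is not Terraform's Go code: the action constants and `apply_graph_nodes` are hypothetical stand-ins, and the plan mirrors the last step of the reproduction above.

```python
# Toy model only: these names are hypothetical, not Terraform's internals.
NO_OP, REPLACE = "no-op", "replace"


def apply_graph_nodes(planned_actions):
    """Return the resource instances that get a node in the apply graph,
    dropping every instance whose planned action is a no-op."""
    return [addr for addr, action in planned_actions.items() if action != NO_OP]


plan = {"null_resource.a": REPLACE, "null_resource.b": NO_OP}
nodes = apply_graph_nodes(plan)
# null_resource.b has no node, so nothing is left to evaluate its precondition.
print(nodes)
```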

I think we could address this by just always putting every resource instance from the plan into the graph (even the ones marked as "no-op") and then handling the no-op-ness of the action during the evaluation of the graph node itself, skipping over the actions that would actually modify the remote object but still running all of the ancillary logic which deals with concerns like preconditions and postconditions.
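A minimal sketch of that idea, again as a toy Python model rather than Terraform's real graph walk. `PlannedInstance` and `apply_all` are hypothetical, instances are assumed to be listed in dependency order, and failures are collected instead of aborting the run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy model only: these names are hypothetical, not Terraform's internals.
NO_OP, REPLACE = "no-op", "replace"


@dataclass
class PlannedInstance:
    addr: str
    action: str
    # Each precondition receives the current state (address -> id).
    preconditions: List[Callable[[Dict[str, str]], bool]] = field(default_factory=list)


def apply_all(instances, state, new_ids):
    """Visit every planned instance, including no-op ones, and return the
    addresses whose preconditions failed."""
    failures = []
    for inst in instances:
        if not all(cond(state) for cond in inst.preconditions):
            failures.append(inst.addr)
            continue  # a failed precondition blocks the action
        if inst.action == NO_OP:
            continue  # conditions ran, but the remote object is left alone
        state[inst.addr] = new_ids[inst.addr]  # stand-in for the provider call
    return failures


state = {
    "null_resource.a": "3935487483259785894",
    "null_resource.b": "5480810909147783652",
}
instances = [
    PlannedInstance("null_resource.a", REPLACE),
    PlannedInstance("null_resource.b", NO_OP, [lambda s: s["null_resource.a"] == ""]),
]
failures = apply_all(instances, state, {"null_resource.a": "1106553121951240691"})
print(failures)
```

In this model the precondition on null_resource.b is evaluated after null_resource.a receives its new id, so the failure surfaces during the same apply instead of on the next plan.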

However, our current apply node evaluation process wasn't designed to skip the real action so surgically, so I expect it'll require at least a little refactoring to pull that off. I've not yet investigated exactly what that might look like.

My current work exploring some new condition-related capabilities also requires resolving this, so I may develop a possible fix as part of that. However, I'm currently prototyping in a significantly modified Terraform Core, so it may take some work to adapt my prototype solution into something we could backport in isolation into the v1.2 series.

apparentlymart · 2022-06-20T18:27:52Z

I mentioned that I would need at least a hacky solution to this bug for another thing I was working on. That other thing turned out to be #31268, and there are now two commits in that branch which seem to address this problem, though at the expense of a non-trivial change to the separation of concerns for who deals with a resource instance having plans.NoOp as its action in the plan:

1e75266 (core: Create apply graph nodes even for no-op "changes")
f8e3286 (core: Do everything except the actual action for plans.NoOp)

At the very least we'd need to add some tests to these if we want to use them as the basis for a direct solution to this bug. If we intend to backport the fix to v1.2 then we'll probably also want to look for a less invasive way to get there, since the solution I used here is probably a bit too risky for a v1.2 patch release.

apparentlymart · 2022-07-21T16:54:53Z

I pulled the changes I mentioned in my previous comment, along with some new test cases, into a new PR #31491 so that we can consider it separately from the checks work, which is still in an exploratory phase.

github-actions · 2022-08-22T02:39:45Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

apparentlymart added the bug, core, custom-conditions, confirmed, and v1.2 labels Jun 17, 2022

apparentlymart added the explained label Jun 17, 2022

alisdair mentioned this issue Jun 21, 2022

Data source postconditions using timestamp() do not reevaluate at apply time #31289

Closed

apparentlymart mentioned this issue Jul 21, 2022

Evaluate resource preconditions and postconditions during apply even if they have no planned change #31491

Merged

apparentlymart closed this as completed in #31491 Jul 22, 2022

github-actions bot locked as resolved and limited conversation to collaborators Aug 22, 2022
