Preconditions and postconditions don't run during apply when their associated resource instance has no planned changes #31261
Comments
I happen to know why this happens: I discovered the problem while reading the code as part of working on something else, and noticed the architectural problem before confirming that it led to this bug.

The root problem is that Terraform tries to optimize the apply step by only including graph nodes for resource instances that have actual changes (not "no-op" changes) in the plan. However, that doesn't take into account the fact that some resource behaviors ought to happen even if there isn't a pending change to a particular object, because that object must react to changes made upstream that aren't reflected in the resource's own configuration arguments.

I think we could address this by always putting every resource instance from the plan into the graph (even the ones marked as "no-op") and then handling the no-op-ness of the action during the evaluation of the graph node itself: skipping the actions that would actually modify the remote object, but still running all of the ancillary logic that deals with concerns like preconditions and postconditions. However, our current apply node evaluation process wasn't designed to skip only the real action so surgically, and so I expect it'll require at least a little refactoring to pull that off. I've not yet investigated exactly what that might look like.

My current work exploring some new condition-related capabilities also requires resolving this, so I may develop a possible fix as part of that. However, I'm currently working in a prototyping capacity in a significantly-modified Terraform Core, so it may take some work to adapt my prototype solution into something we could backport in isolation into the v1.2 series.
I mentioned that I would need at least a hacky solution to this bug for another thing I was working on. That other thing turned out to be #31268, and so there are now two commits in that branch which seem to address this problem, though at the expense of a non-trivial change to the separation of concerns for who deals with a resource instance having
At the very least we'd need to add some tests to these if we want to use them as the basis for a direct solution to this bug. If we intend to backport the fix to v1.2, then we'll probably also want to look for a less invasive way to get there, since the solution I used here is probably a bit too risky for a v1.2 patch release.
I pulled the changes I mentioned in my previous comment, along with some new test cases, into a new PR #31491 so that we can consider it separately from the checks work, which is still in an exploratory phase.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Terraform Version
I'm using a build from source in my local work tree here, but I've also minimally confirmed that this is reproducible with the v1.2.3 release build.
Terraform Configuration Files
The following is the final configuration that exhibits the bug, but see "Steps to Reproduce" below because this bug is only visible if we reach this configuration gradually over multiple steps:
Debug Output
I've already root-caused this, so I'm going to skip this step and will post a follow-up comment after I open the issue explaining what's going on here.
Steps to Reproduce
Let's start with the following contrived configuration:
This is modelling the situation where a precondition of one resource depends on an attribute of another resource that can't be known until the apply step.
`null_resource` fakes this by having `id` appear as unknown during planning and then filling in a timestamp during apply, and so I intentionally wrote the `precondition` above to fail in order to demonstrate this issue.

If we plan and apply this all at once then we can see Terraform check the precondition at the appropriate time:
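The timing described here can be modelled as a condition check with three outcomes rather than two. The following Go sketch is hypothetical (`CheckResult`, `Value`, and `checkIDIsEmpty` are illustrative names, not Terraform's), and it assumes the precondition requires the upstream `id` to be the empty string, which is how the issue describes the intentionally failing condition.

```go
package main

import "fmt"

// CheckResult is a hypothetical three-valued outcome of a condition check.
type CheckResult int

const (
	Pass CheckResult = iota
	Fail
	Deferred // depends on a value unknown at plan time; re-check during apply
)

func (r CheckResult) String() string {
	switch r {
	case Pass:
		return "pass"
	case Fail:
		return "fail"
	default:
		return "deferred"
	}
}

// Value models a resource attribute that may be unknown until apply.
type Value struct {
	Known bool
	Str   string
}

// checkIDIsEmpty models a precondition that requires the upstream id to be
// the empty string (assumed from the issue's description of the condition).
func checkIDIsEmpty(id Value) CheckResult {
	if !id.Known {
		return Deferred
	}
	if id.Str != "" {
		return Fail
	}
	return Pass
}

func main() {
	// Plan: null_resource.a is being created, so its id is still unknown.
	fmt.Println("plan:", checkIDIsEmpty(Value{Known: false}))
	// Apply: the provider has filled in a (placeholder) non-empty id.
	fmt.Println("apply:", checkIDIsEmpty(Value{Known: true, Str: "placeholder-id"}))
}
```

This prints `plan: deferred` and then `apply: fail`. The same function also covers the second step of the reproduction: when `id` is already known from the prior state, it returns `Fail` during planning, matching the eager plan-time error the issue describes for that step.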
However, things get more interesting if we arrive at this destination over multiple steps.
Let's remove the `terraform.tfstate` file to start fresh here and then use the following simpler configuration as the first step:

So far so good! We have two useless resource instances.
Now let's return to the original configuration I opened with:
This time Terraform was able to catch the problem during the planning phase, because we already know from the prior state that the `id` value is not the empty string. This is also expected behavior: Terraform eagerly checks the conditions as soon as it has enough information to do so, aiming to raise a problem during the plan phase whenever possible so that we can avoid bailing out partway through apply.

However, now let's see what happens if I also add `triggers` to `null_resource.a` at the same time, which simulates my having changed the configuration of that resource in a way that can only be resolved by replacing the remote object with a fresh one:

Expected Behavior
Terraform should've checked the precondition on `null_resource.b` during the apply step, once the new `null_resource.a.id` became known, and raised an error about it not being an empty string.

Actual Behavior
Terraform didn't check the condition in either the plan phase or the apply phase. Instead, I need to re-run `terraform apply` to catch the problem during the next plan:

If this condition were checking something real that affects the behavior of my infrastructure, I might have a problem I'm unaware of. That might confuse someone downstream who tries to make another change, because their plan will fail for a reason unrelated to what they modified.
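To make the failure mode concrete, here is a hypothetical Go model of the third step of the reproduction. It assumes a reduced action set (no-op or replace), treats any argument change as forcing replacement (which is how `null_resource` handles a change to `triggers`), and uses a made-up trigger key; none of the names are Terraform's own. It shows that `null_resource.b` is planned as a no-op because its own arguments are unchanged, while its precondition is deferred because it reads the unknown new `id` of `null_resource.a`.

```go
package main

import "fmt"

// Action is a hypothetical, reduced set of planned actions.
type Action string

const (
	NoOp    Action = "no-op"
	Replace Action = "replace"
)

// Instance is a hypothetical plan-time view of one resource instance.
type Instance struct {
	Addr       string
	PriorArgs  map[string]string // arguments recorded in the prior state
	ConfigArgs map[string]string // arguments in the new configuration
	// CheckDependsOn names an instance whose id this instance's precondition
	// reads; empty when there is no such precondition.
	CheckDependsOn string
}

// plannedAction diffs only the instance's own arguments, so condition inputs
// never influence the action. Simplification: any argument change forces
// replacement, as changing `triggers` does for null_resource.
func plannedAction(inst Instance) Action {
	if len(inst.PriorArgs) != len(inst.ConfigArgs) {
		return Replace
	}
	for k, v := range inst.ConfigArgs {
		if prior, ok := inst.PriorArgs[k]; !ok || prior != v {
			return Replace
		}
	}
	return NoOp
}

// plan computes each action and which preconditions must be deferred to
// apply because they read the id of an instance that is being replaced.
func plan(instances []Instance) (actions map[string]Action, deferred []string) {
	actions = make(map[string]Action)
	for _, inst := range instances {
		actions[inst.Addr] = plannedAction(inst)
	}
	for _, inst := range instances {
		if dep := inst.CheckDependsOn; dep != "" && actions[dep] == Replace {
			deferred = append(deferred, inst.Addr)
		}
	}
	return actions, deferred
}

// strandedChecks lists deferred checks on no-op instances: an apply graph
// that omits no-op instances never gets a chance to evaluate them.
func strandedChecks(actions map[string]Action, deferred []string) []string {
	var stranded []string
	for _, addr := range deferred {
		if actions[addr] == NoOp {
			stranded = append(stranded, addr)
		}
	}
	return stranded
}

func main() {
	// Step 3: `triggers` added to a (the key name is hypothetical); b's
	// configuration is unchanged apart from its precondition.
	instances := []Instance{
		{Addr: "null_resource.a", ConfigArgs: map[string]string{"triggers.example": "1"}},
		{Addr: "null_resource.b", CheckDependsOn: "null_resource.a"},
	}
	actions, deferred := plan(instances)
	fmt.Println("a:", actions["null_resource.a"], "b:", actions["null_resource.b"])
	fmt.Println("deferred but never checked:", strandedChecks(actions, deferred))
}
```

This prints `a: replace b: no-op` followed by `deferred but never checked: [null_resource.b]`. Any address returned by `strandedChecks` is a condition that, under the current apply graph, is only caught by the next plan.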