Significant performance degradation with Terraform 1.3.x and local state #32060
Comments
Thanks for reporting this, @danischm.

I expect that the root cause here is that the local state implementation doesn't implement the state storage API as documented. The state storage API distinguishes between creating a new state snapshot in memory and persisting the latest snapshot to durable storage. Terraform Core repeatedly calls the first of these operations during its work, but calls the "persist" operation much less often on the assumption that it is significantly more expensive. Unfortunately, the local backend treats the first operation as a request to persist to local disk and the second as a no-op.

We've been aware of this for some time but have not acted to correct it, because writing to local disk is generally faster than accessing an API over the network, and local state is primarily for initial development until someone is ready to activate a proper state storage mechanism. However, we can see here that the change in treatment of "no-op" operations has caused Terraform Core to call the "update snapshot in memory" operation significantly more often than before when there are many resource instances, which makes the local state storage design flaw more significant.

As a short-term fix for the v1.3 series, I expect we can lightly modify the behaviour to skip creating a new in-memory snapshot for "no-op" changes, although we will need to watch out for the special situation where the resource instance itself isn't changing but its precondition and postcondition results are: in that case we should still record the updated check results table.

It might also be time to fix this long-standing design flaw in the local state implementation so that it behaves like all of the other storage implementations. Then we are less likely to be caught out by local-storage-specific regressions in future work.

Thanks again!
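To make the distinction above concrete, here is a rough Python model of the two behaviours. It is only a sketch, not Terraform's actual code: the class names, method names, and the `needs_new_snapshot` helper are all hypothetical, and the "disk write" is just a counter. It assumes Terraform Core updates the in-memory snapshot once per resource instance and persists once at the end.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredStateManager:
    """Hypothetical manager that follows the documented contract."""
    snapshot: dict = field(default_factory=dict)
    disk_writes: int = 0

    def write_state(self, state: dict) -> None:
        # Cheap: only replaces the in-memory snapshot.
        self.snapshot = dict(state)

    def persist_state(self) -> None:
        # Expensive: one write to durable storage.
        self.disk_writes += 1


@dataclass
class EagerLocalStateManager:
    """Hypothetical model of the local-state behaviour described above."""
    snapshot: dict = field(default_factory=dict)
    disk_writes: int = 0

    def write_state(self, state: dict) -> None:
        # Every in-memory update is also treated as a write to disk.
        self.snapshot = dict(state)
        self.disk_writes += 1

    def persist_state(self) -> None:
        # No-op: everything has already been written.
        pass


def apply_no_op(manager, resource_count: int) -> int:
    """Simulate an apply where every resource instance is a no-op."""
    state = {f"res_{i}": "unchanged" for i in range(resource_count)}
    for _ in range(resource_count):
        manager.write_state(state)  # assumed: one call per resource instance
    manager.persist_state()
    return manager.disk_writes


def needs_new_snapshot(is_no_op: bool, check_results_changed: bool) -> bool:
    """Sketch of the short-term fix: skip no-op changes unless the
    precondition/postcondition results changed."""
    return (not is_no_op) or check_results_changed
```

In this model, `apply_no_op(DeferredStateManager(), 1000)` counts a single disk write, while `apply_no_op(EagerLocalStateManager(), 1000)` counts one per resource, which matches the once-per-resource rewrites reported in this issue.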
Terraform does try to prevent writing state with no changes. Taking a quick look at the state output during the apply process, it seems we are alternating empty checks from
I can confirm that this fixes the issue. Thanks for the quick turnaround!
Terraform Version
Terraform Configuration Files
https://github.com/danischm/tf-perf-test
Debug Output
Expected Behavior
The second terraform apply should be completed in less than a second (using the provided configuration) with no changes to the local state file. This is the observed behavior with TF <1.3.0.
Actual Behavior
With TF 1.3.0+ we can see that the local state file is rewritten once for every resource in the configuration even though there are no changes. This is also visible in the local state file, where serial gets incremented by the number of resources. Compared to TF <1.3.0, where the second no-op terraform apply takes less than a second, with TF 1.3.0+ the second terraform apply takes 3+ minutes to complete. Increasing the number of resources and/or the size of the state file worsens the issue.
Steps to Reproduce
terraform apply -auto-approve -refresh=false
terraform apply -auto-approve -refresh=false
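One way to check the serial increment around the second apply is a small helper like the one below. It is a minimal sketch that assumes the default local state file name, terraform.tfstate, in the working directory and reads the top-level "serial" field of the state JSON. The function names are made up for illustration.

```python
import json
from pathlib import Path


def read_serial(state_path: str = "terraform.tfstate") -> int:
    """Return the top-level "serial" value from a local state file."""
    with Path(state_path).open(encoding="utf-8") as f:
        return json.load(f)["serial"]


def serial_delta(before: int, after: int) -> int:
    """Number of state snapshots written between two readings."""
    return after - before
```

Call `read_serial()` after the first apply and again after the second, then pass both values to `serial_delta`. With the behaviour described in this issue, the delta should be roughly the number of resources with TF 1.3.0+ and zero with TF <1.3.0.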
Additional Context
A git bisect has revealed the following 'bad' commit: 72dd14c
References
No response