Significant performance degradation with Terraform 1.3.x and local state #32060

Closed
danischm opened this issue Oct 21, 2022 · 4 comments · Fixed by #32123

Terraform Version

Terraform v1.3.3

Terraform Configuration Files

https://github.com/danischm/tf-perf-test

Debug Output

2022-10-21T13:06:59.469+0200 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2022-10-21T13:06:59.567+0200 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 29
2022-10-21T13:06:59.567+0200 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2022-10-21T13:06:59.639+0200 [TRACE] vertex "null_resource.test3": visit complete
2022-10-21T13:06:59.640+0200 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2022-10-21T13:06:59.737+0200 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 30
2022-10-21T13:06:59.738+0200 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2022-10-21T13:06:59.804+0200 [TRACE] vertex "null_resource.test0": visit complete
2022-10-21T13:06:59.805+0200 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2022-10-21T13:06:59.900+0200 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 31
2022-10-21T13:06:59.900+0200 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2022-10-21T13:06:59.968+0200 [TRACE] vertex "null_resource.test6": visit complete

Expected Behavior

The second terraform apply should complete in under a second (using the provided configuration), with no changes written to the local state file. This is the observed behavior with TF <1.3.0.

Actual Behavior

With TF 1.3.0+ the local state file is rewritten once per resource in the configuration even though there are no changes. This is also visible in the local state file, where serial gets incremented by the number of resources. Where the second no-op terraform apply completes in less than a second with TF <1.3.0, with TF 1.3.0+ it takes 3+ minutes. Increasing the number of resources and/or the size of the state file makes the issue worse.

Steps to Reproduce

  1. terraform apply -auto-approve -refresh=false
  2. terraform apply -auto-approve -refresh=false

Additional Context

A git bisect has revealed the following 'bad' commit: 72dd14c

References

No response

danischm added the bug and new (new issue not yet triaged) labels on Oct 21, 2022
apparentlymart (Member) commented on Oct 21, 2022

Thanks for reporting this, @danischm.

I expect that the root cause here is that the local state implementation doesn't implement the state storage API as documented:

The state storage API distinguishes between creating a new state snapshot in memory vs. persisting the latest snapshot to durable storage. Terraform Core repeatedly calls the first of these operations during its work, but calls the "persist" operation much less often on the assumption that it is significantly more expensive.

Unfortunately the local backend treats the first operation as a request to persist to local disk and the second as a no-op. We've been aware of this for some time but have not acted to correct it, because writing to local disk is generally much faster than accessing an API over the network, and local state is primarily for initial development until someone is ready to activate a proper state storage mechanism.
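
For illustration only, here is a minimal Go sketch of the contract described above, together with a filesystem-style manager that exhibits the flaw; the type and method names are simplified stand-ins, not Terraform's actual internal API:

package statesketch

// State stands in for an in-memory state snapshot; the real structure in
// Terraform is far richer (resource instances, check results, serial, ...).
type State struct {
    Serial uint64
}

// SnapshotManager is a simplified stand-in for the documented contract:
// WriteState only updates the latest snapshot in memory (cheap, called
// frequently by Terraform Core), while PersistState flushes that snapshot
// to durable storage (expensive, called rarely).
type SnapshotManager interface {
    WriteState(s *State) error
    PersistState() error
}

// filesystemLike mirrors the flaw described above: it writes to disk on
// every WriteState call and treats PersistState as a no-op.
type filesystemLike struct {
    path   string
    latest *State
}

func (m *filesystemLike) WriteState(s *State) error {
    m.latest = s
    return m.flushToDisk() // expensive work on the code path meant to be cheap
}

func (m *filesystemLike) PersistState() error {
    return nil // nothing to do; the snapshot was already written in WriteState
}

func (m *filesystemLike) flushToDisk() error {
    // Serialize m.latest to m.path, maintain the .backup file, bump the
    // serial, and so on. Elided here.
    return nil
}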

However, we can see here that the change in treatment of "no-op" operations has caused Terraform Core to call the "update snapshot in memory" operation significantly more often than before when there are many resource instances, which makes this design flaw in local state storage more significant.

As a short-term fix for the v1.3 series I expect we can lightly modify the behaviour to skip creating a new in-memory snapshot for "no-op" changes. We will need to watch out for the special situation where the resource instance itself isn't changing but its precondition and postcondition results are: in that case we should still record the updated check results table.
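
A rough sketch of that short-circuit, again with hypothetical names rather than Terraform's real types:

package statesketch

// Action is an illustrative stand-in for a planned change action.
type Action int

const (
    NoOp Action = iota
    Create
    Update
    Delete
)

// shouldWriteSnapshot captures the proposed rule: for a "no-op" instance,
// only record a new in-memory snapshot when the precondition/postcondition
// (check) results differ from what is already recorded; any other action
// always writes.
func shouldWriteSnapshot(action Action, checksChanged bool) bool {
    if action != NoOp {
        return true
    }
    return checksChanged
}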

It might also be time to fix this long-standing design flaw in the local state implementation so that it behaves the same way as all of the other storage implementations. Then we are less likely to be caught out by local-storage-specific regressions in future work.

Thanks again!

jbardin (Member) commented on Oct 21, 2022

Terraform does try to avoid writing state when nothing has changed. Taking a quick look at the state output during the apply process, it seems the empty checks value is alternating between null and [], which causes each NoOp instance to trigger a full state write. State values are normally normalized with respect to empty vs null to prevent exactly these kinds of spurious changes, so there is probably something which can be changed at that layer too.
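
To illustrate the kind of normalization being suggested (a sketch with made-up names, not the actual change):

package statesketch

import "reflect"

// CheckResults stands in for a per-instance check results collection. The
// point of normalization is that a nil collection and an empty one should
// compare as equal, instead of flapping between null and [] on every apply.
type CheckResults []string

// normalizeChecks maps nil to an empty slice so repeated serializations of
// "no checks" are identical.
func normalizeChecks(c CheckResults) CheckResults {
    if c == nil {
        return CheckResults{}
    }
    return c
}

// checksEqual shows where the normalization matters: without it, a nil vs
// empty difference looks like a state change and triggers a full state write.
func checksEqual(a, b CheckResults) bool {
    return reflect.DeepEqual(normalizeChecks(a), normalizeChecks(b))
}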

danischm (Author) commented on Nov 1, 2022

I can confirm that this fixes the issue. Thanks for the quick turnaround!

github-actions bot commented on Dec 2, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Dec 2, 2022