Significant performance degradation with Terraform 1.3.x and local state #32060

Closed
danischm opened this issue Oct 21, 2022 · 4 comments · Fixed by #32123

Terraform Version

Terraform v1.3.3

Terraform Configuration Files

https://github.com/danischm/tf-perf-test

Debug Output

2022-10-21T13:06:59.469+0200 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2022-10-21T13:06:59.567+0200 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 29
2022-10-21T13:06:59.567+0200 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2022-10-21T13:06:59.639+0200 [TRACE] vertex "null_resource.test3": visit complete
2022-10-21T13:06:59.640+0200 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2022-10-21T13:06:59.737+0200 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 30
2022-10-21T13:06:59.738+0200 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2022-10-21T13:06:59.804+0200 [TRACE] vertex "null_resource.test0": visit complete
2022-10-21T13:06:59.805+0200 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2022-10-21T13:06:59.900+0200 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 31
2022-10-21T13:06:59.900+0200 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2022-10-21T13:06:59.968+0200 [TRACE] vertex "null_resource.test6": visit complete

Expected Behavior

The second terraform apply should complete in under a second (using the provided configuration), with no changes written to the local state file. This is the observed behavior with TF <1.3.0.

Actual Behavior

With TF 1.3.0+ the local state file is rewritten once per resource in the configuration even though there are no changes. This is also visible in the local state file, where serial gets incremented by the number of resources. Where the second no-op terraform apply completes in less than a second with TF <1.3.0, with TF 1.3.0+ it takes 3+ minutes. Increasing the number of resources and/or the size of the state file makes the issue worse.

Steps to Reproduce

  1. terraform apply -auto-approve -refresh=false
  2. terraform apply -auto-approve -refresh=false

Additional Context

A git bisect has revealed the following 'bad' commit: 72dd14c

References

No response

danischm added the bug and new (new issue not yet triaged) labels on Oct 21, 2022
apparentlymart (Member) commented on Oct 21, 2022

Thanks for reporting this, @danischm.

I expect that the root cause here is that the local state implementation doesn't implement the state storage API as documented:

The state storage API distinguishes between creating a new state snapshot in memory vs. persisting the latest snapshot to durable storage. Terraform Core repeatedly calls the first of these operations during its work, but calls the "persist" operation much less often on the assumption that it is significantly more expensive.

Unfortunately the local backend treats the first operation as a request to persist to local disk and the second as a no-op. We've been aware of this for some time but have not acted to correct it, because writing to local disk is generally much faster than accessing an API over the network, and local state is primarily for initial development until someone is ready to activate a proper state storage mechanism.
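
For illustration only, here is a minimal Go sketch of the contract described above, together with a filesystem-style manager that exhibits the flaw; the type and method names are simplified stand-ins, not Terraform's actual internal API:

package statesketch

// State stands in for an in-memory state snapshot; the real structure in
// Terraform is far richer (resource instances, check results, serial, ...).
type State struct {
    Serial uint64
}

// SnapshotManager is a simplified stand-in for the documented contract:
// WriteState only updates the latest snapshot in memory (cheap, called
// frequently by Terraform Core), while PersistState flushes that snapshot
// to durable storage (expensive, called rarely).
type SnapshotManager interface {
    WriteState(s *State) error
    PersistState() error
}

// filesystemLike mirrors the flaw described above: it writes to disk on
// every WriteState call and treats PersistState as a no-op.
type filesystemLike struct {
    path   string
    latest *State
}

func (m *filesystemLike) WriteState(s *State) error {
    m.latest = s
    return m.flushToDisk() // expensive work on the code path meant to be cheap
}

func (m *filesystemLike) PersistState() error {
    return nil // nothing to do; the snapshot was already written in WriteState
}

func (m *filesystemLike) flushToDisk() error {
    // Serialize m.latest to m.path, maintain the .backup file, bump the
    // serial, and so on. Elided here.
    return nil
}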

However, we can see here that the change in treatment of "no-op" operations has caused Terraform Core to call the "update snapshot in memory" operation significantly more often than before when there are many resource instances, which makes this design flaw in local state storage more significant.

As a short-term fix for the v1.3 series I expect we can lightly modify the behaviour to skip creating a new in-memory snapshot for "no-op" changes. We will need to watch out for the special situation where the resource instance itself isn't changing but its precondition and postcondition results are: in that case we should still record the updated check results table.
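
A rough sketch of that short-circuit, again with hypothetical names rather than Terraform's real types:

package statesketch

// Action is an illustrative stand-in for a planned change action.
type Action int

const (
    NoOp Action = iota
    Create
    Update
    Delete
)

// shouldWriteSnapshot captures the proposed rule: for a "no-op" instance,
// only record a new in-memory snapshot when the precondition/postcondition
// (check) results differ from what is already recorded; any other action
// always writes.
func shouldWriteSnapshot(action Action, checksChanged bool) bool {
    if action != NoOp {
        return true
    }
    return checksChanged
}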

It might also be time to fix this long-standing design flaw in the local state implementation so that it behaves the same way as all of the other storage implementations. Then we are less likely to be caught out by local-storage-specific regressions in future work.

Thanks again!

jbardin (Member) commented on Oct 21, 2022

Terraform does try to avoid writing state when nothing has changed. Taking a quick look at the state output during the apply process, it seems the empty checks value is alternating between null and [], which causes each NoOp instance to trigger a full state write. State values are normally normalized with respect to empty vs null to prevent exactly these kinds of spurious changes, so there is probably something which can be changed at that layer too.
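
To illustrate the kind of normalization being suggested (a sketch with made-up names, not the actual change):

package statesketch

import "reflect"

// CheckResults stands in for a per-instance check results collection. The
// point of normalization is that a nil collection and an empty one should
// compare as equal, instead of flapping between null and [] on every apply.
type CheckResults []string

// normalizeChecks maps nil to an empty slice so repeated serializations of
// "no checks" are identical.
func normalizeChecks(c CheckResults) CheckResults {
    if c == nil {
        return CheckResults{}
    }
    return c
}

// checksEqual shows where the normalization matters: without it, a nil vs
// empty difference looks like a state change and triggers a full state write.
func checksEqual(a, b CheckResults) bool {
    return reflect.DeepEqual(normalizeChecks(a), normalizeChecks(b))
}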

danischm (Author) commented on Nov 1, 2022

I can confirm that this fixes the issue. Thanks for the quick turnaround!

github-actions bot commented on Dec 2, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Dec 2, 2022