Every terraform import command triggers reading all data sources – leads to module refactoring taking an inordinately long time #32385
Comments
Thanks for the issue. In the v1.3 release we made some internal changes to import that make it share more code with plan and fix some bugs. This was not expected to have any negative user-facing consequences, since import is usually a one-off operation. Now that import is constructing the plan graph, I think adding a flag along the lines of your first proposal could make sense. The second of your proposals is covered by #22219.
Having now actually checked the code, it seems import is not running a full refresh. @andersthorbeck, why do you believe a full refresh is being run? Can you share trace logs? It's possible that another v1.3-related change is the cause of the behaviour you are observing. Could it be that reading all of the data sources takes a long time?
I'm seeing this in my scripts that I use to bulk-update task definitions in my ECS clusters. I do a `terraform import` for each one.
@kmoe it's a lot of data source reads.
@kmoe Upon closer inspection, you're right: the import is not running a full refresh, but it is reading all of the data sources. I created a dummy example with the following Terraform configuration (with the Azure subscription ID pseudonymized):

```hcl
terraform {
required_version = "1.3.6"
backend "azurerm" {
subscription_id = "12345678-1234-1234-1234-1234567890ab"
resource_group_name = "thorbecka-clickops"
storage_account_name = "thorbeckasandbox"
container_name = "tfstate"
key = "import-refresh.tfstate"
}
required_providers {
random = {
source = "hashicorp/random"
}
azurerm = {
source = "hashicorp/azurerm"
version = "3.35.0"
}
}
}
provider "azurerm" {
subscription_id = "12345678-1234-1234-1234-1234567890ab"
features {}
}
data "azurerm_resource_group" "sandbox" {
name = "thorbecka-clickops"
}
data "azurerm_storage_account" "sandbox" {
name = "thorbeckasandbox"
resource_group_name = data.azurerm_resource_group.sandbox.name
}
resource "azurerm_storage_container" "dummy_data" {
name = "dummydata"
storage_account_name = data.azurerm_storage_account.sandbox.name
container_access_type = "private"
}
resource "random_integer" "foo" {
min = 1
max = 1000
}
resource "random_integer" "bar" {
min = 1
max = 1000
}
resource "random_integer" "baz" {
min = 1
max = 1000
}
resource "azurerm_storage_blob" "foo" {
name = "foo.txt"
storage_account_name = data.azurerm_storage_account.sandbox.name
storage_container_name = azurerm_storage_container.dummy_data.name
type = "Block"
source_content = random_integer.foo.result
}
resource "azurerm_storage_blob" "bar" {
name = "bar.txt"
storage_account_name = data.azurerm_storage_account.sandbox.name
storage_container_name = azurerm_storage_container.dummy_data.name
type = "Block"
source_content = random_integer.bar.result
}
resource "azurerm_storage_blob" "baz" {
name = "baz.txt"
storage_account_name = data.azurerm_storage_account.sandbox.name
storage_container_name = azurerm_storage_container.dummy_data.name
type = "Block"
source_content = random_integer.baz.result
}
```

With this configuration, every `terraform import` command I ran produced trace logs showing the data blocks being read again. As a (redacted) real-world example, see the following, but with 10× as many modules with nested data blocks, and multiplied by hundreds of resources to be imported.

As mentioned in the issue description, the import script from which this comparatively short redacted log snippet was taken ran for 6 hours before we cancelled it. Yes, restructuring the use of nested data blocks in child modules may solve some of this, but even taking that into account, the amount of wait time per resource to import (which for module refactoring will be a great number) is untenable.
(Issue title changed from "terraform import command triggers a full state refresh – leads to module refactoring taking an inordinately long time" to "terraform import command triggers reading all data sources – leads to module refactoring taking an inordinately long time".)
Thanks for the example - that makes sense. Leaving this issue open as it is well described.
Hi @kmoe, any updates on when the proposed flag might be added?
Yeah, this would be very useful!
Thanks for your interest in this issue! This is just a reminder to please avoid "+1" comments, and to use the upvote mechanism (click or add the 👍 emoji to the original post) to indicate your support for this issue. As an addendum, I am not aware of a timeline for this issue to be resolved. Thanks again for your continued interest.
Hitting the same problem in the following scenario: a Terraform Cloud project creates a GCP project for other Terraform projects to import into an empty state, but that cannot be done, because Terraform tries to read secrets during import and we get errors because those secrets do not exist yet.
Please see the new feature released in 1.5: config-driven import via the `import` block.
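For reference, a minimal sketch of such an `import` block (the resource address and ID below are illustrative, reusing the storage container from the example configuration earlier in this thread):

```hcl
# Declared in configuration; `terraform plan` then proposes the import
# alongside any other changes, and `terraform apply` executes it.
import {
  # Address of the resource block to import into.
  to = azurerm_storage_container.dummy_data

  # Provider-specific ID of the existing object (illustrative value).
  id = "https://thorbeckasandbox.blob.core.windows.net/dummydata"
}
```

Because the import happens as part of an ordinary plan, you can also have Terraform generate configuration for the imported objects with `terraform plan -generate-config-out=generated.tf`.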
Fantastic and elegant solution! Tried it now; it works perfectly.
@andersthorbeck do we have any blog post which shows this with Terraform 1.5, using your example above? I am also seeing lag in 1.4, but this thread seems to say that 1.5 solves the issue, so I want to try it too. Let me know the exact steps to follow to speed things up.
I want to see a complex import block for multiple resources to be imported.
@thatsk The documentation was linked in the comment which closed this issue: #32385 (comment). Imports can now be done in the Terraform configuration itself, via the new `import` block. Importing multiple resources would be multiple `import` blocks, one per resource; see the sketch below.
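To illustrate (the addresses and IDs are hypothetical, modeled on the example configuration earlier in this thread), several `import` blocks are processed together in one plan, so the data sources are read once per run rather than once per imported resource:

```hcl
import {
  to = azurerm_storage_blob.foo
  id = "https://thorbeckasandbox.blob.core.windows.net/dummydata/foo.txt"
}

import {
  to = azurerm_storage_blob.bar
  id = "https://thorbeckasandbox.blob.core.windows.net/dummydata/bar.txt"
}

import {
  to = azurerm_storage_blob.baz
  id = "https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt"
}
```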
In case it helps, we have a tutorial showing how to import a resource, which covers some edge cases and exceptions: https://developer.hashicorp.com/terraform/tutorials/state/state-import
Can the
(You may have noticed that Terraform does tend to leave behind results from reading data resources in the state after apply is complete, but those are there only to support unusual situations like debugging in
@onetwopunch, could you open a new issue (or two) for the following?
Which resources? It should work for all resources where
Does this only happen with the
Whilst I understand that data objects need to be read or they have no data, I am not clear why their data is needed for an unrelated resource to be successfully imported. The common case of a CLI import should just check that the destination resource address does not already have state; it doesn't use anything else from the existing state, plan, or data resources.
@kmoe this is not the same case as onetwopunch is referring to, but import can fail entirely because it tries to read data regardless of dependencies. Consider a scenario where a resource is created and later read back as a data source (required to get the allocated IP from azurerm_public_ip). Terraform will try to read the data source immediately on import and fail, because obviously neither the resource nor the data source exists yet. A sketch of the pattern follows.
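A minimal sketch of that pattern (names and values are illustrative; the data source re-reads the managed resource to obtain the allocated IP):

```hcl
resource "azurerm_public_ip" "gateway" {
  name                = "gateway-ip"
  resource_group_name = "example-rg"
  location            = "westeurope"
  allocation_method   = "Dynamic"
}

# Re-reads the resource above; with dynamic allocation, the ip_address
# attribute is only known after the IP has been attached and read back.
data "azurerm_public_ip" "gateway" {
  name                = azurerm_public_ip.gateway.name
  resource_group_name = azurerm_public_ip.gateway.resource_group_name
}

# When importing an unrelated resource into an empty state, Terraform
# still tries to read this data source, which fails because the public
# IP does not exist yet.
```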
As for v1.5+, I haven't tried to import all resources at once, because it is a complex single-shot operation and I just prefer to see how the plan changes after one or a few imports. And this is possible in v1.0.8. Might be a bit off-topic, but if you're struggling with imports that fail when refreshing state, give it a try with an older Terraform version.
As a workaround, you can comment out or remove all resources except the ones you are trying to import; a sketch follows. Worked for me in a pinch.
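A minimal sketch of that workaround, assuming the example configuration from earlier in the thread (the hardcoded value temporarily replaces the data source reference while importing):

```hcl
# Data sources temporarily commented out so `terraform import` has nothing to read:
# data "azurerm_resource_group" "sandbox" { ... }
# data "azurerm_storage_account" "sandbox" { ... }

# Only the resource being imported stays active, with the data source
# reference replaced by a literal value for the duration of the import.
resource "azurerm_storage_container" "dummy_data" {
  name                  = "dummydata"
  storage_account_name  = "thorbeckasandbox" # was data.azurerm_storage_account.sandbox.name
  container_access_type = "private"
}
```

After the import succeeds, restore the commented-out blocks and the original reference.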
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Terraform Version
Use Cases
As Terraform root modules/states grow in size, they get to a point where they become too unwieldy and need to be split up into several smaller modules. In order to split up such modules, you need to run the `terraform import` command on hundreds of resources in the new module, and `terraform state rm` on the corresponding resources in the old module. Moreover, working as part of a larger team/organization, you want this migration to be completed as quickly as possible, to minimize the chances that others might run interfering `terraform apply` commands against these two modules at the same time, which may lead to deletion of the resources you were trying to move.

Since some weeks or months ago, there seems to have been a change whereby the `terraform import` command automatically refreshes the entire state before it actually performs the requested import. This did not use to be the case. When splitting very large Terraform states, this is very detrimental, as it triggers reads against remote state backends and providers for hundreds of resources, for every single resource to be imported. In other words, importing N resources has become an O(N^2) operation: each of the N imports re-reads a state whose size is itself proportional to N. We recently attempted to split out a part of a large module into a smaller cohesive module, but aborted the attempt after the import script had run for 6 hours without an end in sight.

From my recollection, this automatic refresh as part of `terraform import` has been observed with the `github`, `azurerm`, and `azuread` providers, so the behaviour seems to come from Terraform Core. I have not found any mention of this change in the Terraform Core CHANGELOG, so I cannot pinpoint exactly when and how it was introduced, but I believe it landed at some point between 2022-05-03 and 2022-11-25 (probably closer to the latter end).

Attempted Solutions
I haven't really found any good workarounds for this, short of accepting that splitting up and migrating parts of Terraform states will take ages, and that you're susceptible to ruining your resources in the meantime. There doesn't seem to be any option to opt out of the automatic refresh on every single call to `terraform import`.

Proposal
I envision two possible solutions to this:

1. Add a new flag `-no-refresh` (name up for debate) to the `terraform import` command to disable the automatic refresh before attempting the import, and leave it up to the caller to have manually performed a refresh before the call to `terraform import`.
2. Allow the `terraform import` command to take a list of resources to be imported, not just one at a time, and perform the automatic refresh only once per call to the command, not before each supplied resource. The resources could be supplied via multiple (`ADDR` `ID`) argument pairs, or (perhaps more sensibly) via an input file where each line contains one `ADDR` `ID` argument pair; see the sketch below.
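For illustration, a sketch of the input file envisioned in the second proposal (the file name and format are hypothetical, part of the proposal rather than an existing Terraform feature; the addresses and IDs reuse the example configuration from this thread):

```text
# resources-to-import.txt (hypothetical): one ADDR ID pair per line
azurerm_storage_blob.foo  https://thorbeckasandbox.blob.core.windows.net/dummydata/foo.txt
azurerm_storage_blob.bar  https://thorbeckasandbox.blob.core.windows.net/dummydata/bar.txt
azurerm_storage_blob.baz  https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt
```

A single invocation consuming such a file would refresh once and then import all listed resources.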
References
I have not found any related GitHub issues or pull requests.