
Data Resource Lifecycle Adjustments #17034

Closed
apparentlymart opened this issue Jan 4, 2018 · 11 comments

@apparentlymart
Member

apparentlymart commented Jan 4, 2018

Background Info

Back in #6598 we introduced the idea of data sources, allowing us to model reading data from external sources as a first-class concept. This has generally been a successful addition, with some great new patterns emerging around it.

However, the current lifecycle for data sources creates some minor problems. As currently implemented, data sources are read during the "refresh" phase that runs prior to creating a plan, except in two situations:

  • If any of the configuration arguments in the corresponding data block have <computed> values.
  • If depends_on is non-empty for the data resource.

In both of the above cases, the read action is deferred until the "apply" phase, which in turn causes all of the result attributes to appear as <computed> in the plan.
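For illustration, a 0.11-era configuration like the following (the resource type and names are hypothetical, chosen only to demonstrate the two cases) would trigger both deferrals:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
}

# Case 1: aws_instance.example.id is <computed> until the instance
# exists, so this read is deferred until the apply phase.
data "aws_instance" "read_back" {
  instance_id = "${aws_instance.example.id}"
}

# Case 2: a non-empty depends_on also forces the read to be deferred,
# regardless of whether any of the arguments are <computed>.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["self"]
  depends_on  = ["aws_instance.example"]
}
```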

Unfortunately both of the above situations are problematic today, as a consequence of data sources being processed during "refresh". These problems are described in more detail in the following sections.

When Data Resource Arguments change

Because data resources are read during the "refresh" phase, references to attributes of resources are resolved from their values in state rather than their values in the resulting diff. This results in a "change lag", where certain changes to configuration require two runs of terraform apply to fully take effect. The first run reads the data source with the old resource values and then updates the resource, while the second run reads the data source using the new resource values, possibly causing further cascading changes to other resources.
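As a concrete sketch of the lag (the names and the tag-based filter are hypothetical): suppose a data source filters on a tag of a managed resource. Changing the tag in configuration takes two applies to propagate, because the refresh-time read still sees the old tag value from state:

```hcl
resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  tags {
    Role = "web" # changing this to "api" takes two applies to propagate
  }
}

# This read happens during refresh, before the diff for aws_instance.app
# is created: on the first apply it still sees Role = "web" from state,
# and only a second apply re-reads using the updated tag value.
data "aws_instance" "by_role" {
  filter {
    name   = "tag:Role"
    values = ["${aws_instance.app.tags.Role}"]
  }
}
```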

This is particularly tricky for situations where a resource has custom diff logic (via the mechanism added in #14887) that detects and reports changes to Computed attributes that are side-effects of the requested changes, since this can result in additional value changes that are not reflected in the data source read.

The most problematic case is when an attribute is marked as <computed> during an update: this should cause any dependent data resource to be deferred until apply time, but instead the old value is used to do the read and the computed value resulting from the change is not detected at all.

Trouble with depends_on

The current behavior for depends_on for data resources is essentially useless, since it always results in a "perma-diff". The reason for this is that depends_on doesn't give Terraform enough information to know what aspect of the dependency is important, and so it must conservatively always defer the read until the "apply" phase to preserve the guarantee that it happens after the resource changes are finalized.

Ideally we'd like the data resource read to be deferred until apply time only if there are pending changes to a dependent resource, but that is not currently possible because we process data resources during the "refresh" phase where resource diffs have not yet been created, and thus we cannot determine if a change is pending.
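The perma-diff can be reproduced with a minimal configuration like this (resource type and names are hypothetical): every plan shows the data source's attributes as <computed>, even when nothing about the bucket is changing:

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "example-assets"
}

# Terraform cannot tell which aspect of the bucket this read depends on,
# so it conservatively defers the read to apply time on every run, and
# every plan therefore shows these attributes as <computed>.
data "aws_s3_bucket" "assets" {
  bucket     = "${aws_s3_bucket.assets.bucket}"
  depends_on = ["aws_s3_bucket.assets"]
}
```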

Proposed Change

The above issues can be addressed by moving data source processing into the "plan" phase.

This was seen as undesirable during the original data source design because it would cause the "plan" phase to, for the first time, require reaching out to external endpoints. However, we have since made that compromise in order to improve the robustness of planning in #14887. In this new context, reading from data sources during plan is consistent with our goal of having Terraform make whatever requests it needs to make in order to produce an accurate, comprehensive plan. #15895 proposes some adjustments to the behavior of terraform validate so that it can be used as the "offline static check" command, allowing terraform plan to be more complex and require valid credentials for remote APIs even when the implicit refresh is disabled.

Including data source reads in the plan graph means that Terraform will produce diffs for resources before attempting to read data sources that depend on them, which addresses all of the problems described above: the data source arguments can be interpolated from the new values as defined in the diff, rather than the old values present in state.

In particular, the diff can be consulted in order to decide whether a data resource must be deferred until the "apply" phase, allowing any new <computed> values to be considered, and allowing depends_on to defer only if there is a non-empty diff for the referenced resource(s).

Effect on the Managed Resource Lifecycle

This change does not greatly affect the lifecycle for managed resources, but it does restore the original property that the "refresh" phase is a state-only operation with the exception of provider configuration.

An interesting implication of this is that it is in principle safe to disregard any inter-resource dependencies for refresh purposes and instead construct a flatter graph where each resource depends only on its provider. This in turn can permit greater parallelism in read calls, and more opportunities for API request consolidation once #7388 is addressed.

Effect on terraform refresh

Since data sources are currently processed in the "refresh" phase, the terraform refresh command currently updates them. This can be useful in situations where a root module output depends on a data source attribute and its state is being consumed with terraform_remote_state.

Moving data source reads to the plan phase will mean that terraform refresh will no longer update them. The change proposed in #15419 can partially mitigate this by making data resource updates -- and corresponding output updates -- an explicit part of the normal terraform apply flow, which is a more desirable outcome for the reasons described in that issue.

To retain the ability to update only data resources, without applying other changes, we can add a new -read-only argument to terraform plan (and, by extension, to terraform apply when run without an explicit plan file), which produces a reduced plan graph that includes only the data resources.

Bringing data source refresh into the main plan+apply workflow is superior to the current terraform refresh approach because it allows the user to evaluate and approve the resulting changes to outputs, rather than just blindly accepting these updates and potentially disrupting downstream remote state consumers.

For users that still want to accept data source updates without a confirmation step, the command line terraform apply -read-only -auto-approve would be equivalent to the current terraform refresh behavior.

@bbakersmith

this would really help some of my use cases for data sources, great to see it being considered!

@finferflu

Are there any updates on the progress of this? My projects would really benefit from this. Thanks!

@alwaysastudent

I hope this is added into the new terraform releases.

apparentlymart added a commit that referenced this issue Apr 25, 2019
The count for a data resource can potentially depend on a managed resource
that isn't recorded in the state yet, in which case references to it will
always return unknown.

Ideally we'd do the data refreshes during the plan phase as discussed in
#17034, which would avoid this problem by planning the managed resources
in the same walk, but for now we'll just skip refreshing any data
resources with an unknown count during refresh and defer that work to the
apply phase, just as we'd do if there were unknown values in the main
configuration for the data resource.
@paulashbourne

Any updates on this proposed change? This would be very useful to avoid having plan spit out a bunch of proposed changes that don't actually result in anything once the data sources are read.

@jclynny

jclynny commented Jan 26, 2020

Are there any updates that can be shared here? I've mentioned this in other issues, but the depends_on really shouldn't do a diff on a local file forever. It would be great if it could understand that it has run and doesn't need to run more than once unless there's a change in the underlying template for example.

@odee30

odee30 commented Mar 29, 2020

On first apply the following works fine. If a key is added to the list then this also works.

locals {
  keys = [
    "key1",
    "key2",
    "key3"
  ]
}

resource "random_uuid" "values" {
  for_each = toset(local.keys)
}

output "zipped_map" {
  value = length(local.keys) < 1 ? null : zipmap(
    local.keys,
    [for uuid in random_uuid.values : uuid.result]
  )
}

But, removing any key from the key list generates the following error:

Error: Error in function call

  on zipmaptest.tf line 19, in output "zipped_map":
  19:   value = length(local.keys) < 1 ? null : zipmap(
  20:     local.keys,
  21:     [for uuid in random_uuid.values : uuid.result]
  22:   )
    |----------------
    | local.keys is tuple with 2 elements
    | random_uuid.values is object with 3 attributes

Call to function "zipmap" failed: number of keys (2) does not match number of
values (3).

A couple of issues on GitHub (this and this) led me to this post. Can you please confirm that this is indeed the same issue as suggested above, as it seems to refer to problems with data sources? Are there any known workarounds?

I have found that using a conditional check on the length of local.keys, returning null if it is 0, allows all the items to be removed from the keys list. This lets me update the list by first removing all the items and then re-adding those that are needed, but this is not ideal. Without the conditional it is not possible to remove all the items from the list without generating the error.

Any help would be appreciated. Thanks.

@syedhassaanahmed

Any update on this one? My original issue is #22005

@binarymist

Any update? Also hashicorp/terraform-provider-archive#11

@danieldreier
Contributor

I'm very excited to announce that beta 1 of Terraform 0.13.0 will be available on June 3rd, and will include these changes. I've pinned an issue with more details about the beta program, and posted a Discuss thread for folks who want to talk about it more.

@apparentlymart
Member Author

Terraform v0.13.0 will include some changes that address parts of the problem statement as I wrote it up originally:

  • Most notably, using depends_on in a data resource will no longer cause a "perma-diff". Instead, Terraform will generate a deferred read plan for the data resources only if depends_on includes (directly or indirectly) a managed resource that has a planned change action.
  • Terraform v0.13 can now read data resources during the plan phase, though we retained the read during the refresh phase for now because of an interaction the original proposal didn't cover: we must configure a provider before we can refresh with it, and a provider configuration can depend on a data resource. This part is therefore only partially solved.
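In v0.13 syntax, the first point means a configuration like the following (resource type and names are hypothetical) no longer produces a perma-diff: the read is deferred to apply time only when aws_instance.example itself has a planned change:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
}

# v0.13: deferred to apply time only if aws_instance.example has a
# planned change action; otherwise the read happens at plan time and
# the results appear directly in the plan.
data "aws_ami" "current" {
  owners      = ["self"]
  most_recent = true

  depends_on = [aws_instance.example]
}
```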

We are planning in a future release to merge the refresh and plan phases entirely, so that they will both happen in a single graph walk. That will address the rest of the quirks my original write-up described, because it will allow refreshing and planning actions to depend on each other in a fully-general way, rather than our current situation where refreshing actions are artificially forced to always precede planning actions.

This old issue has outlived its usefulness as a design proposal because the plan to merge refresh and plan phases will achieve the same results in a different (better) way. For that reason, I'm going to close this even though 0.13 doesn't implement exactly what I described, and we'll create future issues/PRs for the newer incarnation of this set of work once we complete some more detailed design work for it.

@ghost

ghost commented Jul 25, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jul 25, 2020