
Data Resource Lifecycle Adjustments #17034

Closed
apparentlymart opened this issue Jan 4, 2018 · 11 comments

@apparentlymart
Member

apparentlymart commented Jan 4, 2018

Background Info

Back in #6598 we introduced the idea of data sources, allowing us to model reading data from external sources as a first-class concept. This has generally been a successful addition, with some great new patterns emerging around it.

However, the current lifecycle for data sources creates some minor problems. As currently implemented, data sources are read during the "refresh" phase that runs prior to creating a plan, except in two situations:

  • If any of the configuration arguments in the corresponding data block have <computed> values.
  • If depends_on is non-empty for the data resource.

In both of the above cases, the read action is deferred until the "apply" phase, which in turn causes all of the result attributes to appear as <computed> in the plan.
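For illustration, a 0.11-era configuration like the following (the resource type and names are hypothetical, chosen only to demonstrate the two cases) would trigger both deferrals:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
}

# Case 1: aws_instance.example.id is <computed> until the instance
# exists, so this read is deferred until the apply phase.
data "aws_instance" "read_back" {
  instance_id = "${aws_instance.example.id}"
}

# Case 2: a non-empty depends_on also forces the read to be deferred,
# regardless of whether any of the arguments are <computed>.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["self"]
  depends_on  = ["aws_instance.example"]
}
```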

Unfortunately both of the above situations are problematic today, as a consequence of data sources being processed during "refresh". These problems are described in more detail in the following sections.

When Data Resource Arguments change

Because data resources are read during the "refresh" phase, references to attributes of resources are resolved from their values in state rather than their values in the resulting diff. This results in a "change lag", where certain changes to configuration require two runs of terraform apply to fully take effect. The first run reads the data source with the old resource values and then updates the resource, while the second run reads the data source using the new resource values, possibly causing further cascading changes to other resources.
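As a concrete sketch of the lag (the names and the tag-based filter are hypothetical): suppose a data source filters on a tag of a managed resource. Changing the tag in configuration takes two applies to propagate, because the refresh-time read still sees the old tag value from state:

```hcl
resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  tags {
    Role = "web" # changing this to "api" takes two applies to propagate
  }
}

# This read happens during refresh, before the diff for aws_instance.app
# is created: on the first apply it still sees Role = "web" from state,
# and only a second apply re-reads using the updated tag value.
data "aws_instance" "by_role" {
  filter {
    name   = "tag:Role"
    values = ["${aws_instance.app.tags.Role}"]
  }
}
```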

This is particularly tricky for situations where a resource has custom diff logic (via the mechanism added in #14887) that detects and reports changes to Computed attributes that are side-effects of the requested changes, since this can result in additional value changes that are not reflected in the data source read.

The most problematic case is when an attribute is marked as <computed> during an update: this should cause any dependent data resource to be deferred until apply time, but instead the old value is used to do the read and the computed value resulting from the change is not detected at all.

Trouble with depends_on

The current behavior for depends_on for data resources is essentially useless, since it always results in a "perma-diff". The reason for this is that depends_on doesn't give Terraform enough information to know what aspect of the dependency is important, and so it must conservatively always defer the read until the "apply" phase to preserve the guarantee that it happens after the resource changes are finalized.

Ideally we'd like the data resource read to be deferred until apply time only if there are pending changes to a dependent resource, but that is not currently possible because we process data resources during the "refresh" phase where resource diffs have not yet been created, and thus we cannot determine if a change is pending.
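The perma-diff can be reproduced with a minimal configuration like this (resource type and names are hypothetical): every plan shows the data source's attributes as <computed>, even when nothing about the bucket is changing:

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "example-assets"
}

# Terraform cannot tell which aspect of the bucket this read depends on,
# so it conservatively defers the read to apply time on every run, and
# every plan therefore shows these attributes as <computed>.
data "aws_s3_bucket" "assets" {
  bucket     = "${aws_s3_bucket.assets.bucket}"
  depends_on = ["aws_s3_bucket.assets"]
}
```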

Proposed Change

The above issues can be addressed by moving data source processing into the "plan" phase.

This was seen as undesirable during the original data source design because it would cause the "plan" phase to, for the first time, require reaching out to external endpoints. However, we have since made that compromise in order to improve the robustness of planning in #14887. In this new context, reading from data sources during plan is consistent with our goal of having Terraform make whatever requests it needs to make in order to produce an accurate, comprehensive plan. #15895 proposes some adjustments to the behavior of terraform validate so that it can be used as the "offline static check" command, allowing terraform plan to be more complex and require valid credentials for remote APIs even when the implicit refresh is disabled.

Including data source reads in the plan graph means that Terraform will produce diffs for resources before attempting to read data sources that depend on them, which addresses all of the problems described above: the data source arguments can be interpolated from the new values as defined in the diff, rather than the old values present in state.

In particular, the diff can be consulted in order to decide whether a data resource must be deferred until the "apply" phase, allowing any new <computed> values to be considered, and allowing depends_on to defer only if there is a non-empty diff for the referenced resource(s).

Effect on the Managed Resource Lifecycle

This change does not greatly affect the lifecycle for managed resources, but it does restore the original property that the "refresh" phase is a state-only operation with the exception of provider configuration.

An interesting implication of this is that it is in principle safe to disregard any inter-resource dependencies for refresh purposes and instead construct a flatter graph where each resource depends only on its provider. This in turn can permit greater parallelism in read calls, and more opportunities for API request consolidation once #7388 is addressed.

Effect on terraform refresh

Since data sources are currently processed in the "refresh" phase, the terraform refresh command currently updates them. This can be useful in situations where a root module output depends on a data source attribute and its state is being consumed with terraform_remote_state.

Moving data source reads to the plan phase will mean that terraform refresh will no longer update them. The change proposed in #15419 can partially mitigate this by making data resource updates -- and corresponding output updates -- an explicit part of the normal terraform apply flow, which is a more desirable outcome for the reasons described in that issue.

To retain the ability to update only data resources, without applying other changes, we can add a new -read-only argument to terraform plan (and, by extension, to terraform apply when run without an explicit plan file), which produces a reduced plan graph that includes only the data resources.

Bringing data source refresh into the main plan+apply workflow is superior to the current terraform refresh approach because it allows the user to evaluate and approve the resulting changes to outputs, rather than just blindly accepting these updates and potentially disrupting downstream remote state consumers.

For users that still want to accept data source updates without a confirmation step, the command line terraform apply -read-only -auto-approve would be equivalent to the current terraform refresh behavior.

@bbakersmith

this would really help some of my use cases for data sources, great to see it being considered!

@finferflu

Are there any updates on the progress of this? My projects would really benefit from this. Thanks!

@alwaysastudent

I hope this is added into the new terraform releases.

apparentlymart added a commit that referenced this issue Apr 25, 2019
The count for a data resource can potentially depend on a managed resource
that isn't recorded in the state yet, in which case references to it will
always return unknown.

Ideally we'd do the data refreshes during the plan phase as discussed in
#17034, which would avoid this problem by planning the managed resources
in the same walk, but for now we'll just skip refreshing any data
resources with an unknown count during refresh and defer that work to the
apply phase, just as we'd do if there were unknown values in the main
configuration for the data resource.
@paulashbourne

Any updates on this proposed change? This would be very useful to avoid having plan spit out a bunch of proposed changes that don't actually result in anything once the data sources are read.

@jclynny

jclynny commented Jan 26, 2020

Are there any updates that can be shared here? I've mentioned this in other issues, but the depends_on really shouldn't do a diff on a local file forever. It would be great if it could understand that it has run and doesn't need to run more than once unless there's a change in the underlying template for example.

@odee30

odee30 commented Mar 29, 2020

On first apply the following works fine. If a key is added to the list then this also works.

locals {
  keys = [
    "key1",
    "key2",
    "key3"
  ]
}

resource "random_uuid" "values" {
  for_each = toset(local.keys)
}

output "zipped_map" {
  value = length(local.keys) < 1 ? null : zipmap(
    local.keys,
    [for uuid in random_uuid.values : uuid.result]
  )
}

But, removing any key from the key list generates the following error:

Error: Error in function call

  on zipmaptest.tf line 19, in output "zipped_map":
  19:   value = length(local.keys) < 1 ? null : zipmap(
  20:     local.keys,
  21:     [for uuid in random_uuid.values : uuid.result]
  22:   )
    |----------------
    | local.keys is tuple with 2 elements
    | random_uuid.values is object with 3 attributes

Call to function "zipmap" failed: number of keys (2) does not match number of
values (3).

A couple of issues on GitHub (this and this) led me to this post. Can you please confirm that this is indeed the same issue as suggested above, as it seems to refer to problems with data sources? Are there any known workarounds?

I have found that using a conditional check on the length of local.keys, returning null if it is 0, allows all the items to be removed from the keys list. This lets me update the list by first removing all the items and then re-adding those that are needed, but this is not ideal. Without the conditional it is not possible to remove all the items from the list without generating the error.

Any help would be appreciated. Thanks.

@syedhassaanahmed

Any update on this one? My original issue is #22005

@binarymist

Any update? Also hashicorp/terraform-provider-archive#11

@danieldreier
Contributor

I'm very excited to announce that beta 1 of Terraform 0.13.0 will be available on June 3rd, and will include these changes. I've pinned an issue with more details about the beta program, and posted a Discuss thread for folks who want to talk about it more.

@apparentlymart
Member Author

Terraform v0.13.0 will include some changes that address parts of the problem statement as I wrote it up originally:

  • Most notably, using depends_on in a data resource will no longer cause a "perma-diff". Instead, Terraform will generate a deferred read plan for the data resources only if depends_on includes (directly or indirectly) a managed resource that has a planned change action.
  • Terraform v0.13 can now read data resources during the plan phase, though we retained the read during the refresh phase for now because of an interaction the original proposal didn't cover: we must configure a provider before we can refresh with it, and a provider configuration can depend on a data resource. This part is therefore only partially solved.
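In v0.13 syntax, the first point means a configuration like the following (resource type and names are hypothetical) no longer produces a perma-diff: the read is deferred to apply time only when aws_instance.example itself has a planned change:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
}

# v0.13: deferred to apply time only if aws_instance.example has a
# planned change action; otherwise the read happens at plan time and
# the results appear directly in the plan.
data "aws_ami" "current" {
  owners      = ["self"]
  most_recent = true

  depends_on = [aws_instance.example]
}
```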

We are planning in a future release to merge the refresh and plan phases entirely, so that they will both happen in a single graph walk. That will address the rest of the quirks my original write-up described, because it will allow refreshing and planning actions to depend on each other in a fully-general way, rather than our current situation where refreshing actions are artificially forced to always precede planning actions.

This old issue has outlived its usefulness as a design proposal because the plan to merge refresh and plan phases will achieve the same results in a different (better) way. For that reason, I'm going to close this even though 0.13 doesn't implement exactly what I described, and we'll create future issues/PRs for the newer incarnation of this set of work once we complete some more detailed design work for it.

@ghost

ghost commented Jul 25, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jul 25, 2020