Data-driven Terraform Configuration #6598

apparentlymart · 2016-05-10T20:27:43Z

This is where I'm working on the implementation of the proposal from #4169.

This has been re-opened a bunch of times by this point as it's moved from my fork to the main repo, from master to dev-0.7, and now back from dev-0.7 to master again. 😑

Since this change spans multiple Terraform layers, the sections that follow summarize the changes in each layer, in the hope of making this changeset easier to review. The PR is broken into a sequence of commits which, as far as possible, change only one layer at a time so that each change can be understood in isolation.

Configuration (`config` package)

In the config layer, data sources are introduced by extending the existing Resource concept with a new Mode field, which indicates which set of operations (and therefore which lifecycle) the resource follows:

- ManagedResourceMode: previously the only mode; Terraform creates and "owns" this resource, updating its configuration and eventually destroying it.
- DataResourceMode: Terraform only reads from this resource.

In the configuration language, resource blocks map to ManagedResourceMode resources and data blocks map to DataResourceMode resources.

data blocks don't permit provisioner or lifecycle sub-blocks because these concepts do not make sense for a resource that only has a "read" action. Internally, data resources always have an empty Provisioners slice and a zero-value ResourceLifecycle instance.

A similar extension has been made to ResourceVariable, which can now represent both the existing TYPE.NAME.ATTR variables and the new data.TYPE.NAME.ATTR variables, again using a Mode field as the discriminator.
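To make the discriminator concrete, here is a minimal sketch of splitting a variable name into mode, type, name and attribute. The `ResourceVariable` struct and `parseResourceVariable` function are simplified, hypothetical stand-ins, not the actual `config` package code:

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceMode distinguishes managed resources from data resources.
type ResourceMode int

const (
	ManagedResourceMode ResourceMode = iota
	DataResourceMode
)

// ResourceVariable is a hypothetical, trimmed-down version of the
// config-layer type representing TYPE.NAME.ATTR references.
type ResourceVariable struct {
	Mode  ResourceMode
	Type  string
	Name  string
	Field string
}

// parseResourceVariable accepts "TYPE.NAME.ATTR" or "data.TYPE.NAME.ATTR".
func parseResourceVariable(key string) (*ResourceVariable, error) {
	mode := ManagedResourceMode
	if strings.HasPrefix(key, "data.") {
		mode = DataResourceMode
		key = strings.TrimPrefix(key, "data.")
	}
	// Split into at most three parts so the attribute keeps any nested
	// dots, e.g. "tags.Name".
	parts := strings.SplitN(key, ".", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("invalid resource variable %q", key)
	}
	return &ResourceVariable{Mode: mode, Type: parts[0], Name: parts[1], Field: parts[2]}, nil
}

func main() {
	for _, k := range []string{"aws_instance.web.id", "data.aws_ami.ubuntu.id"} {
		v, err := parseResourceVariable(k)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> mode=%d type=%s name=%s field=%s\n", k, v.Mode, v.Type, v.Name, v.Field)
	}
}
```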

Since traditional resources and data resources are both kinds of resource, they both appear in the Resources slice within the configuration struct. The Resource.Id() implementation keeps them distinct by adding a data. prefix to data resource ids, which is a convention that will continue through to the core layer.
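A minimal sketch of the id convention, assuming a trimmed-down `Resource` struct with only `Mode`, `Type` and `Name` (the real `config.Resource` has many more fields). It shows how a managed and a data resource with the same type and name can coexist in one slice:

```go
package main

import "fmt"

type ResourceMode int

const (
	ManagedResourceMode ResourceMode = iota
	DataResourceMode
)

// Resource is a hypothetical, simplified version of config.Resource.
type Resource struct {
	Mode ResourceMode
	Name string
	Type string
}

// Id returns TYPE.NAME for managed resources and data.TYPE.NAME for data
// resources, so the two namespaces never clash.
func (r *Resource) Id() string {
	switch r.Mode {
	case ManagedResourceMode:
		return fmt.Sprintf("%s.%s", r.Type, r.Name)
	case DataResourceMode:
		return fmt.Sprintf("data.%s.%s", r.Type, r.Name)
	default:
		panic(fmt.Errorf("unknown resource mode %d", r.Mode))
	}
}

func main() {
	resources := []*Resource{
		{Mode: ManagedResourceMode, Type: "aws_instance", Name: "web"},
		{Mode: DataResourceMode, Type: "aws_instance", Name: "web"},
	}
	for _, r := range resources {
		fmt.Println(r.Id())
	}
	// Prints:
	// aws_instance.web
	// data.aws_instance.web
}
```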

- ResourceMode enumeration and Mode attribute on config.Resource
- Parsing of data blocks from configuration files
- Parsing of data.TYPE.NAME.ATTR variables and Mode attribute on config.ResourceVariable

Core changes

Within core is where we find the biggest divergence of codepaths for managed vs. data resources, since data resources have a simpler lifecycle.

The ResourceProvider interface has a new method DataSources, which is analogous to Resources. The Validate phase is consistent between the two, except that the provider abstraction distinguishes between ValidateResource and ValidateDataSource, both of which are supported by EvalValidate depending on mode.
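The validation dispatch can be sketched as follows. The `Validator` interface, the `ResourceConfig` map type and the `toyProvider` are hypothetical simplifications; the real ResourceProvider methods take core types such as `*ResourceConfig`, and the interface has many more methods:

```go
package main

import (
	"errors"
	"fmt"
)

type ResourceMode int

const (
	ManagedResourceMode ResourceMode = iota
	DataResourceMode
)

// ResourceConfig is a hypothetical simplification: raw attribute values.
type ResourceConfig map[string]interface{}

// Validator holds only the two validation methods discussed above.
type Validator interface {
	ValidateResource(typeName string, c ResourceConfig) ([]string, []error)
	ValidateDataSource(typeName string, c ResourceConfig) ([]string, []error)
}

// evalValidate picks the provider method according to the resource mode,
// in the spirit of EvalValidate.
func evalValidate(p Validator, mode ResourceMode, typeName string, c ResourceConfig) ([]string, []error) {
	switch mode {
	case ManagedResourceMode:
		return p.ValidateResource(typeName, c)
	case DataResourceMode:
		return p.ValidateDataSource(typeName, c)
	default:
		return nil, []error{fmt.Errorf("unsupported resource mode %d", mode)}
	}
}

// toyProvider supports one managed resource type and one data source.
type toyProvider struct{}

func (toyProvider) ValidateResource(t string, c ResourceConfig) ([]string, []error) {
	if t != "toy_thing" {
		return nil, []error{fmt.Errorf("unknown resource type %q", t)}
	}
	return nil, nil
}

func (toyProvider) ValidateDataSource(t string, c ResourceConfig) ([]string, []error) {
	if t != "toy_lookup" {
		return nil, []error{fmt.Errorf("unknown data source %q", t)}
	}
	if _, ok := c["key"]; !ok {
		return nil, []error{errors.New(`"key" is required`)}
	}
	return nil, nil
}

func main() {
	p := toyProvider{}
	_, errs := evalValidate(p, DataResourceMode, "toy_lookup", ResourceConfig{"key": "x"})
	fmt.Println(len(errs)) // 0
	// toy_thing exists only in the managed namespace, so this fails.
	_, errs = evalValidate(p, DataResourceMode, "toy_thing", ResourceConfig{})
	fmt.Println(errs)
}
```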

The remainder of the workflow is completely distinct and handled by two different codepaths, switching on the resource mode inside terraform/transform_resource.go.

Even though ultimately data resources support only a "read" operation, the standard plan/apply model is supported by splitting a read into two steps in the ResourceProvider interface:

- ReadDataDiff: takes the config and returns a diff as if the data resource were being "created", allowing core to know about the data source's computed attributes without actually reading any data.
- ReadDataApply: takes the diff, uses it to obtain the configuration attributes, actually loads the data and returns a state.
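The two-step read can be illustrated with a toy data source that "reads" the upper-cased form of its input. The `AttrDiff`, `InstanceDiff` and `InstanceState` types and the `upperDataSource` are hypothetical simplifications of the core types, meant only to show how the diff carries configuration forward while marking computed attributes as unknown:

```go
package main

import (
	"fmt"
	"strings"
)

// AttrDiff records one attribute's planned value; NewComputed means
// "not known until the read actually happens".
type AttrDiff struct {
	New         string
	NewComputed bool
}

type InstanceDiff struct {
	Attributes map[string]*AttrDiff
}

type InstanceState struct {
	ID         string
	Attributes map[string]string
}

// upperDataSource reads the upper-cased "input" into computed "result".
type upperDataSource struct{}

// ReadDataDiff behaves like planning a create: arguments come from config
// and computed attributes are marked unknown, without reading anything.
func (upperDataSource) ReadDataDiff(config map[string]string) *InstanceDiff {
	return &InstanceDiff{Attributes: map[string]*AttrDiff{
		"input":  {New: config["input"]},
		"result": {NewComputed: true},
	}}
}

// ReadDataApply recovers the configuration from the diff, performs the
// actual read and returns a brand-new state; no prior state is consulted.
func (upperDataSource) ReadDataApply(d *InstanceDiff) *InstanceState {
	in := d.Attributes["input"].New
	return &InstanceState{
		ID: in,
		Attributes: map[string]string{
			"input":  in,
			"result": strings.ToUpper(in),
		},
	}
}

func main() {
	ds := upperDataSource{}
	diff := ds.ReadDataDiff(map[string]string{"input": "hello"})
	fmt.Println(diff.Attributes["result"].NewComputed) // true
	state := ds.ReadDataApply(diff)
	fmt.Println(state.Attributes["result"]) // HELLO
}
```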

The important special behavior for data resources is that during the "refresh" walk they will check to see if their config contains computed values, and if it doesn't then the diff/apply steps are run immediately, rather than waiting until the real plan and apply phases. This ensures that non-computed data source attributes can be safely used inside provider configurations, bypassing the chicken-and-egg problems that are caused by computed provider arguments.
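A minimal sketch of the refresh-time decision, assuming a hypothetical `"<computed>"` placeholder for unknown values (Terraform uses its own internal sentinel) and a `read` callback standing in for the ReadDataDiff/ReadDataApply pair:

```go
package main

import "fmt"

// unknown is a hypothetical placeholder for a value that can't be known
// until apply time.
const unknown = "<computed>"

// hasComputed reports whether any configuration argument is still unknown.
func hasComputed(config map[string]string) bool {
	for _, v := range config {
		if v == unknown {
			return true
		}
	}
	return false
}

// refreshDataResource reads immediately when the config is fully known,
// so the result is in state before plan (and usable by providers);
// otherwise it defers the read to the plan/apply walks.
func refreshDataResource(config map[string]string, read func(map[string]string) map[string]string) (state map[string]string, deferred bool) {
	if hasComputed(config) {
		return nil, true
	}
	return read(config), false
}

func main() {
	read := func(c map[string]string) map[string]string {
		return map[string]string{"id": "ami-" + c["name"]}
	}
	s, deferred := refreshDataResource(map[string]string{"name": "ubuntu"}, read)
	fmt.Println(s["id"], deferred) // ami-ubuntu false
	_, deferred = refreshDataResource(map[string]string{"name": unknown}, read)
	fmt.Println(deferred) // true
}
```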

A significant difference compared to managed resources is that a data source "read" does not get access to any previous state; we always create an entirely new instance on each refresh. The intended user-facing mental model for data resources is that they are not stateful at all, and we persist them in the on-disk state file only so that -refresh=false can act as expected without breaking the rest of the workflow.

`helper/schema` support for data sources

In the helper/schema layer, the new map of supported data sources is kept separate from the existing map of supported resources. Data sources use the familiar schema.Resource type but with only a Read implementation required and Create, Update, and Delete functions forbidden.
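The data-source rules can be sketched as a validation check. `ResourceData`, `CRUDFunc`, `Resource` and `validateDataSource` are hypothetical stand-ins for the helper/schema types and InternalValidate logic, reduced to the CRUD function fields:

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceData is an empty stand-in for schema.ResourceData in this sketch.
type ResourceData struct{}

type CRUDFunc func(d *ResourceData, meta interface{}) error

// Resource is a hypothetical, trimmed-down schema.Resource.
type Resource struct {
	Create CRUDFunc
	Read   CRUDFunc
	Update CRUDFunc
	Delete CRUDFunc
}

// validateDataSource requires Read and forbids Create, Update and Delete,
// because a data source only supports the "read" operation.
func validateDataSource(r *Resource) error {
	if r.Read == nil {
		return errors.New("data source must implement Read")
	}
	if r.Create != nil || r.Update != nil || r.Delete != nil {
		return errors.New("data source must not implement Create, Update or Delete")
	}
	return nil
}

func main() {
	read := func(d *ResourceData, meta interface{}) error { return nil }
	fmt.Println(validateDataSource(&Resource{Read: read}))               // <nil>
	fmt.Println(validateDataSource(&Resource{Read: read, Delete: read})) // error
}
```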

The Read implementation works in essentially the same way as it does for managed resources, getting access to its configuration attributes via d.Get(...) and setting computed attributes with d.Set(...). The only notable differences are that d.Get(...) won't return values of computed attributes set on previous runs, and calling d.SetId(...) is optional.

To help us migrate existing "logical resources" to instead be data sources, a helper is provided to wrap a data source implementation and shim it to work as a resource implementation. In this case, the Read implementation must call d.SetId(...) in order to meet the expectations of a managed resource implementation.
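A sketch of such a shim, under stated assumptions: all types and the `dataSourceResourceShim` function are hypothetical simplifications, and "adjusting the schema" is assumed here to mean marking every non-computed argument as ForceNew, since the shimmed resource has no Update. Create runs the data source's Read and insists on an id; Delete just forgets the id:

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceData is a minimal hypothetical stand-in that tracks only the id.
type ResourceData struct{ id string }

func (d *ResourceData) SetId(id string) { d.id = id }
func (d *ResourceData) Id() string      { return d.id }

type CRUDFunc func(d *ResourceData, meta interface{}) error

type Schema struct {
	Computed bool
	ForceNew bool
}

type Resource struct {
	Schema               map[string]*Schema
	Create, Read, Delete CRUDFunc
	DeprecationMessage   string
}

// dataSourceResourceShim wraps a data source so it can act as a resource.
func dataSourceResourceShim(name string, ds *Resource) *Resource {
	sch := make(map[string]*Schema, len(ds.Schema))
	for k, s := range ds.Schema {
		c := *s // copy, so the data source's own schema is untouched
		if !c.Computed {
			c.ForceNew = true
		}
		sch[k] = &c
	}
	read := ds.Read
	return &Resource{
		Schema: sch,
		// "Creating" is just reading; Read must call SetId here.
		Create: func(d *ResourceData, meta interface{}) error {
			if err := read(d, meta); err != nil {
				return err
			}
			if d.Id() == "" {
				return errors.New("shimmed data source Read must call SetId")
			}
			return nil
		},
		Read: read,
		// Nothing real exists, so "destroying" only clears the id.
		Delete: func(d *ResourceData, meta interface{}) error {
			d.SetId("")
			return nil
		},
		DeprecationMessage: fmt.Sprintf("using %s as a resource is deprecated; use the data source instead", name),
	}
}

func main() {
	ds := &Resource{
		Schema: map[string]*Schema{
			"backend": {},
			"output":  {Computed: true},
		},
		Read: func(d *ResourceData, meta interface{}) error {
			d.SetId("example")
			return nil
		},
	}
	r := dataSourceResourceShim("terraform_remote_state", ds)
	d := &ResourceData{}
	if err := r.Create(d, nil); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	fmt.Println(d.Id())                                                     // example
	fmt.Println(r.Schema["backend"].ForceNew, r.Schema["output"].ForceNew) // true false
	_ = r.Delete(d, nil)
	fmt.Println(d.Id() == "") // true
}
```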

- DataSourcesMap within schema.Provider
- Implementations of DataSources, ValidateDataSource, ReadDataDiff and ReadDataApply
- Backward-compatibility shim for using data sources as logical resources
  - Deprecation warning when using a resource-shimmed data source

`provider/terraform`: example remote state data source

As an example to show things working end-to-end, the terraform_remote_state resource is transformed into a data source, and the backward-compatibility shim is used to maintain the now-deprecated resource.

- terraform_remote_state data source

Targeting Data Resources

ResourceAddress is extended with a ResourceMode to handle the distinct managed and data resource namespaces. data.TYPE.NAME can be used to target data resources, for consistency with how data resources are referenced elsewhere.
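A sketch of parsing the two address forms with a regular expression. The `ResourceAddress` struct (without module paths) and `parseResourceAddress` are hypothetical simplifications, not the real parser:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type ResourceMode int

const (
	ManagedResourceMode ResourceMode = iota
	DataResourceMode
)

// ResourceAddress is a hypothetical simplification without module paths.
type ResourceAddress struct {
	Mode  ResourceMode
	Type  string
	Name  string
	Index int // -1 means "all instances"
}

// addrRe matches "[data.]TYPE.NAME[INDEX]".
var addrRe = regexp.MustCompile(`^(data\.)?([^.\[\]]+)\.([^.\[\]]+)(?:\[(\d+)\])?$`)

func parseResourceAddress(s string) (*ResourceAddress, error) {
	m := addrRe.FindStringSubmatch(s)
	if m == nil {
		return nil, fmt.Errorf("invalid resource address %q", s)
	}
	addr := &ResourceAddress{Mode: ManagedResourceMode, Type: m[2], Name: m[3], Index: -1}
	if m[1] != "" {
		addr.Mode = DataResourceMode
	}
	if m[4] != "" {
		i, err := strconv.Atoi(m[4])
		if err != nil {
			return nil, err
		}
		addr.Index = i
	}
	return addr, nil
}

func main() {
	for _, s := range []string{"aws_instance.web", "data.aws_ami.ubuntu", "aws_instance.web[2]"} {
		a, err := parseResourceAddress(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: mode=%d type=%s name=%s index=%d\n", s, a.Mode, a.Type, a.Name, a.Index)
	}
}
```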

- ResourceAddress support for data.TYPE.NAME syntax and ResourceMode.

UI Changes

When data resource reads appear in plan output, we show them using a distinct presentation to make it clear that no real infrastructure will be altered by this operation.

Since a data resource read is internally just a "create" diff for the resource, this is just some sleight of hand in the UI layer to present it differently.

A "read" diff will appear only if the read operation cannot be completed during the "refresh" phase due to computed configuration.
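The "sleight of hand" amounts to choosing a different marker for a create diff whose name uses the data resource naming scheme. The `DiffChangeType` enum and `planLinePrefix` function are hypothetical; the `<=` marker for reads is illustrative, and module path prefixes are ignored for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

// DiffChangeType is a hypothetical enum of instance-level changes.
type DiffChangeType int

const (
	DiffCreate DiffChangeType = iota
	DiffUpdate
	DiffDestroy
)

// planLinePrefix picks the marker for one plan entry. The UI only sees the
// resource name, so a "create" whose name starts with "data." is shown as
// a read instead.
func planLinePrefix(name string, change DiffChangeType) string {
	switch {
	case change == DiffCreate && strings.HasPrefix(name, "data."):
		return "<="
	case change == DiffCreate:
		return "+"
	case change == DiffUpdate:
		return "~"
	default:
		return "-"
	}
}

func main() {
	fmt.Println(planLinePrefix("data.aws_ami.ubuntu", DiffCreate), "data.aws_ami.ubuntu")
	fmt.Println(planLinePrefix("aws_instance.web", DiffCreate), "aws_instance.web")
}
```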

- Change diff output to show "reads" differently
- Hide the "(ID: ...)" suffix when refreshing data sources, since a data source doesn't get an id until after it is "refreshed".

Other stuff

- User-oriented documentation
- Prevent data resources from being tainted explicitly with terraform taint ("tainting" is not meaningful for data resources because they are not created or destroyed).

apparentlymart · 2016-05-10T20:35:46Z

@phinze, @jen20: I checked off all the items on my list, so this is now feature-complete according to my original plan.

Along with running the automated tests many times along the way 😀, I have done some ad-hoc manual testing to exercise the various different combinations of computed/non-computed configs, dependent resources, dependent providers, etc. It seems to work as I expected.

Previously resources were assumed to always support the full set of create, read, update and delete operations, and Terraform's resource management lifecycle. Data sources introduce a new kind of resource that only supports the "read" operation. To support this, a new "Mode" field is added to the Resource concept within the config layer, which can be set to ManagedResourceMode (to indicate the only mode previously possible) or DataResourceMode (to indicate that only "read" is supported). To support both managed and data resources in the tests, the stringification of resources in config_string.go is adjusted slightly to use the Id() method rather than the unusual type[name] serialization from before, causing a simple mechanical adjustment to the loader tests' expected result strings.

This allows the config loader to read "data" blocks from the config and turn them into DataSource objects. This just reads the data from the config file; it doesn't validate the data or do anything useful with it.

This allows ${data.TYPE.NAME.FIELD} interpolation syntax at the configuration level, though since there is no special handling of them in the core package this currently just acts as an alias for ${TYPE.NAME.FIELD}.

This is a breaking change to the ResourceProvider interface that adds the new operations relating to data sources. DataSources, ValidateDataSource, ReadDataDiff and ReadDataApply are the data source equivalents of Resources, Validate, Diff and Apply (respectively) for managed resources. The diff/apply model seems at first glance a rather strange workflow for read-only resources, but implementing data resources in this way allows them to fit cleanly into the standard plan/apply lifecycle in cases where the configuration contains computed arguments and thus the read must be deferred until apply time. Along with breaking the interface, we also fix up the plugin client/server and helper/schema implementations of it, which are all of the callers used when provider plugins use helper/schema. This would be a breaking change for any provider plugin that directly implements the provider interface, but no known plugins do this and it is not recommended. At the helper/schema layer the implementer sees ReadDataApply as a "Read", as opposed to "Create" or "Update" as in the managed resource Apply implementation. The planning mechanics are handled entirely within helper/schema, so that complexity is hidden from the provider implementation itself.

In the "schema" layer a Resource is just any "thing" that has a schema and supports some or all of the CRUD operations. Data sources introduce a new use of Resource to represent read-only resources, which require some different InternalValidate logic.

Historically we've had some "read-only" and "logical" resources. With the addition of the data source concept these will gradually become data sources, but we need to retain backward compatibility with existing configurations that use the now-deprecated resources. This shim is intended to allow us to easily create a resource from a data source implementation. It adjusts the schema as needed and adds stub Create and Delete implementations. This would ideally also produce a deprecation warning whenever such a shimmed resource is used, but the schema system doesn't currently have a mechanism for resource-specific validation, so that remains just a TODO for the moment.

As a first example of a real-world data source, the pre-existing terraform_remote_state resource is adapted to be a data source. The original resource is shimmed to wrap the data source for backward compatibility.

For backward compatibility we will continue to support using the data sources that were formerly logical resources as resources for the moment, but we want to warn the user about it since this support is likely to be removed in future. This is done by adding a new "deprecation message" feature to schema.Resource, but for the moment this is done as an internal feature (not usable directly by plugins) so that we can collect additional use-cases and design a more general interface before creating a compatibility constraint.

This will undoubtedly evolve as implementation continues, but this is some initial documentation based on the design doc.

Once a data resource gets into the state, the state system needs to be able to parse its id to match it with resources in the configuration. Since data resources live in a separate namespace from managed resources, the extra "mode" discriminator is required to specify which namespace we're talking about, just as in the resource configuration.

Data resources live in a separate namespace from managed resources, so we need to call a different provider method depending on the mode of the resource being visited. Managed resources use ValidateResource, while data resources use ValidateDataSource, since at the provider level of abstraction each provider has separate sets of resources and data sources.

The key difference between data and managed resources is in their respective lifecycles. Now the expanded resource EvalTree switches on the resource mode, generating a different lifecycle for each mode. For this initial change only managed resources are implemented, using the same implementation as before; data resources are no-ops. The data resource implementation will follow in a subsequent change.

This implements the main behavior of data resources, including both the early read in cases where the configuration is non-computed and the split plan/apply read for cases where full configuration can't be known until apply time.

The handling of data "orphans" is simpler than for managed resources because the only thing we need to deal with is our own state, and the validation pass guarantees that by the time we get to refresh or apply the instance state is no longer needed by any other resources and so we can safely drop it with no fanfare.

Previously, data resources would be left behind in the state because we had no support for planning their destruction. Now we create a "destroy" plan for them and act on it by simply producing an empty state on apply, ensuring that data resources don't linger in the state after everything else is gone.
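Both the orphan handling and the destroy step described above reduce to dropping our own state entry, with no provider call. A minimal sketch, assuming a hypothetical `State` map keyed by resource id; `dropDataOrphans` and `destroyDataResource` are illustrative names:

```go
package main

import (
	"fmt"
	"strings"
)

// State is a hypothetical map from resource id to instance attributes.
type State map[string]map[string]string

// dropDataOrphans removes data resources that are no longer in the config.
// Only our own state entry has to go; nothing else depends on it by now.
func dropDataOrphans(s State, inConfig map[string]bool) {
	for id := range s {
		if strings.HasPrefix(id, "data.") && !inConfig[id] {
			delete(s, id) // deleting during range is safe in Go
		}
	}
}

// destroyDataResource "applies" a destroy diff for a data resource by
// producing an empty state for it, i.e. removing the entry.
func destroyDataResource(s State, id string) error {
	if !strings.HasPrefix(id, "data.") {
		return fmt.Errorf("%s is a managed resource; its destroy goes through the provider", id)
	}
	delete(s, id)
	return nil
}

func main() {
	s := State{
		"aws_instance.web":    {"id": "i-123"},
		"data.aws_ami.ubuntu": {"id": "ami-456"},
		"data.aws_ami.old":    {"id": "ami-789"},
	}
	// data.aws_ami.old has been removed from the configuration.
	dropDataOrphans(s, map[string]bool{"aws_instance.web": true, "data.aws_ami.ubuntu": true})
	// During a destroy, the remaining data resource simply leaves the state.
	if err := destroyDataResource(s, "data.aws_ami.ubuntu"); err != nil {
		fmt.Println(err)
	}
	fmt.Println(len(s)) // 1: only aws_instance.web remains
}
```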

The ResourceAddress struct grows a new "Mode" field to match Resource, and its parser learns to recognize the "data." prefix so it can set that field. This allows -target to be applied to data sources, although that is arguably not a very useful thing to do; other future uses of resource addressing, like the state plumbing commands, may be better uses of this.

Since the data resource lifecycle contains no steps to deal with tainted instances, we must make sure that they never get created. Doing this out in the command layer is not the best, but this is currently the only layer that has enough information to make this decision and so this simple solution was preferred over a more disruptive refactoring, under the assumption that this taint functionality eventually gets reworked in terms of StateFilter anyway.
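The guard itself is small. A sketch of the idea, where `checkTaintable` and its error message are hypothetical and the address is assumed to use the data.TYPE.NAME form:

```go
package main

import (
	"fmt"
	"strings"
)

// checkTaintable rejects data resources: they have no create/destroy
// lifecycle, so a tainted instance could never be replaced.
func checkTaintable(address string) error {
	if strings.HasPrefix(address, "data.") {
		return fmt.Errorf("%s: data resources cannot be tainted", address)
	}
	return nil
}

func main() {
	for _, a := range []string{"aws_instance.web", "data.aws_ami.ubuntu"} {
		if err := checkTaintable(a); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ok to taint:", a)
	}
}
```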

Data resources don't have ids when they refresh, so we'll skip showing the "(ID: ...)" indicator for these. Showing it with no id makes it look like something is broken.

New resources logically don't have "old values" for their attributes, so showing them as updates from the empty string is misleading and confusing. Instead, we'll skip showing the old value in a creation diff.

Internally a data source read is represented as a creation diff for the resource, but in the UI we'll show it as a distinct icon and color so that the user can more easily understand that these operations won't affect any real infrastructure. Unfortunately by the time we get to formatting the plan in the UI we only have the resource names to work with, and can't get at the original resource mode. Thus we're forced to infer the resource mode by exploiting knowledge of the naming scheme.

The null_data_source data source is a companion to the null_resource resource, here primarily to enable quick manual testing of data source workflows without depending on any external services. The "inputs" map gets copied to the computed "outputs" map on read, "rand" gives a random number to exercise cases with constantly-changing values (an anti-pattern!), and "has_computed_default" is settable in config but computed if not set.
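The described read behaviour can be sketched with plain maps instead of helper/schema. `nullDataSourceRead` is a hypothetical function, and the `"default"` value used when "has_computed_default" is unset is a made-up placeholder:

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
)

// nullDataSourceRead copies "inputs" to "outputs", generates a new "rand"
// value and fills "has_computed_default" only when config left it unset.
func nullDataSourceRead(config map[string]interface{}) map[string]interface{} {
	state := map[string]interface{}{}
	inputs, _ := config["inputs"].(map[string]string)
	outputs := make(map[string]string, len(inputs))
	for k, v := range inputs {
		outputs[k] = v
	}
	state["inputs"] = inputs
	state["outputs"] = outputs
	// A fresh random number, so the value differs between reads.
	state["rand"] = strconv.Itoa(rand.Int())
	if v, ok := config["has_computed_default"]; ok {
		state["has_computed_default"] = v
	} else {
		state["has_computed_default"] = "default" // hypothetical computed value
	}
	return state
}

func main() {
	state := nullDataSourceRead(map[string]interface{}{
		"inputs": map[string]string{"greeting": "hello"},
	})
	fmt.Println(state["outputs"])              // map[greeting:hello]
	fmt.Println(state["has_computed_default"]) // default
}
```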

Provider nodes interpolate their config during the input walk, but this is very early and so it's pretty likely that any resources referenced are entirely absent from the state. As a special case then, we tolerate the normally-fatal case of having an entirely missing resource variable so that the input walk can complete, albeit skipping the providers that have such interpolations. If these interpolations end up still being unresolved during refresh (e.g. because the config references a resource that hasn't been created yet) then we will catch that error on the refresh pass, or indeed on the plan pass if -refresh=false is used.
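A sketch of the special case, assuming a hypothetical `walkOperation` enum, a sentinel `errSkipProvider` error and a state map keyed by resource id (none of these are the real core names). A missing resource is tolerated only on the input walk and reported on any later walk:

```go
package main

import (
	"errors"
	"fmt"
)

type walkOperation int

const (
	walkInput walkOperation = iota
	walkRefresh
	walkPlan
)

// errSkipProvider signals "cannot interpolate yet; skip this provider".
var errSkipProvider = errors.New("resource not in state yet; skipping during input walk")

// interpolateResourceVar resolves id.attr from state. A reference to a
// resource entirely absent from state is normally fatal, but is tolerated
// during the early input walk.
func interpolateResourceVar(op walkOperation, state map[string]map[string]string, id, attr string) (string, error) {
	inst, ok := state[id]
	if !ok {
		if op == walkInput {
			return "", errSkipProvider
		}
		return "", fmt.Errorf("resource %s referenced in config is not in state", id)
	}
	v, ok := inst[attr]
	if !ok {
		return "", fmt.Errorf("resource %s has no attribute %q", id, attr)
	}
	return v, nil
}

func main() {
	state := map[string]map[string]string{}
	_, err := interpolateResourceVar(walkInput, state, "aws_instance.web", "public_ip")
	fmt.Println(err == errSkipProvider) // true
	_, err = interpolateResourceVar(walkRefresh, state, "aws_instance.web", "public_ip")
	fmt.Println(err) // resource aws_instance.web referenced in config is not in state
}
```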

jen20 · 2016-05-16T18:00:36Z

Hi @apparentlymart! Both @phinze and I have reviewed this and decided to merge as-is. This is an amazing piece of work, and a fantastic OSS contribution! It will be in the first 0.7 beta along with a couple of additional sources. Hopefully by the time 0.7 actually hits we will have been able to expand the range of data sources available significantly. Thanks for all your work both on the initial proposal and on a solid implementation!

mkuzmin · 2016-08-03T09:12:33Z

@apparentlymart Martin, the amount and quality of work you did for Terraform is just incredible.
Congratulations on the release, and thank you!

ghost · 2020-04-23T02:33:21Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

apparentlymart added enhancement core labels May 10, 2016

apparentlymart force-pushed the f-data-sources branch from 1711372 to 16b57ad May 10, 2016 20:31

apparentlymart force-pushed the f-data-sources branch 2 times, most recently from 89b8d2d to c13d0d7 May 10, 2016 21:56

vancluever mentioned this pull request May 11, 2016

Data-driven Terraform Configuration #4169

Closed

apparentlymart force-pushed the f-data-sources branch from c13d0d7 to b48374c May 11, 2016 21:47

apparentlymart added 21 commits May 14, 2016 08:26

config: Data source loading

8601400

config: Parsing of data.TYPE.NAME.FIELD variables

718cdda

provider/terraform: remote state resource becomes a data source

3eb4a89

website: Initial documentation about data sources

64f2651

core: lifecycle for data resources

3605447

command: Show id only when refreshing managed resources

5d27a5b

command: don't show old values for create diffs in plan

bfee4b0

apparentlymart force-pushed the f-data-sources branch from b48374c to f95dccf May 14, 2016 15:26

apparentlymart force-pushed the f-data-sources branch from b093de4 to 453fc50 May 14, 2016 16:25

jen20 mentioned this pull request May 14, 2016

[WIP] provider/aws: Availability Zone helper resource #6669

Closed

apparentlymart mentioned this pull request May 14, 2016

[WIP]: provider/aws: Add aws_availability_zones data source #6671

Closed

jen20 merged commit a2950c7 into master May 16, 2016

apparentlymart mentioned this pull request May 16, 2016

Convert Legacy Logical and Read-only Resources to Data Sources #6688

Closed

apparentlymart deleted the f-data-sources branch May 17, 2016 13:40

jen20 mentioned this pull request May 23, 2016

Request: Custom Outputs from null_resource/local-exec #6830

Closed

vancluever mentioned this pull request May 25, 2016

Search for AWS AMIs #6862

Closed

apparentlymart mentioned this pull request Jan 4, 2018

Data Resource Lifecycle Adjustments #17034

Closed

ghost locked and limited conversation to collaborators Apr 23, 2020
