
Feature: Allow circular dependencies in resources #27188

Open
dansimau opened this issue Dec 8, 2020 · 24 comments
Labels
enhancement · new (new issue not yet triaged)

Comments

@dansimau

dansimau commented Dec 8, 2020

Current Terraform Version

v0.14.0-rc1

Use-cases

  • I have an aws_cognito_user_pool resource
  • On this resource there is a block where you can configure the ARN of Lambda triggers
  • I want to configure a Lambda trigger to have the Cognito user pool ID as an environment variable

When I configure this in Terraform, it obviously doesn't work. I get:

$ terraform apply

Error: Cycle: module.auth.aws_lambda_function.presignup, module.auth.aws_cognito_user_pool.default

However, the use case above is a real-world circular dependency that is legitimate. Outside of Terraform, it would be a 3-step process to configure this, e.g. one of the ways would be:

  1. Create Cognito user pool
  2. Create Lambda function, using Cognito user pool ID as an input
  3. Update Cognito user pool to add Lambda trigger

(You could also create the other resource first, but the steps are the same: Create resource A; Create resource B; Update resource A).
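For reference, a minimal configuration that produces this cycle might look like the sketch below (names, the IAM role, and the exact attribute values are illustrative, not my real config):

```hcl
# Each resource references the other, so Terraform cannot order them.
resource "aws_cognito_user_pool" "default" {
  name = "example-pool"

  lambda_config {
    pre_sign_up = aws_lambda_function.presignup.arn
  }
}

resource "aws_lambda_function" "presignup" {
  function_name = "presignup"
  role          = aws_iam_role.lambda.arn # assumed to be defined elsewhere
  handler       = "index.handler"
  runtime       = "nodejs14.x"
  filename      = "presignup.zip"

  environment {
    variables = {
      USER_POOL_ID = aws_cognito_user_pool.default.id
    }
  }
}
```

Terraform rejects this at plan time with the cycle error shown above, since neither resource can be created before the other.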

Attempted Solutions

  • As far as I know today, there is no built-in way for Terraform to handle this type of situation
  • Providers can do things like create a "virtual" resource in Terraform, that connects two resources and performs some kind of update. However, this requires a common use case to appear and dev work to happen to support it in each case.
  • Otherwise, this problem is just pushed to the user to do workarounds: they can configure a data source, for example, but it will require two Terraform runs and the first one may result in an error. The user then has to know to ignore this and run Terraform again, which is not very elegant or good for CI.

Proposal

Is there a way in which Terraform could attempt to resolve cycles automatically by doing a create A, create B, then update A?

Sorry if this suggestion seems naïve, I admit I'm not familiar with Terraform internals. However, I imagine it to require:

  • Tracking specific attributes that are causing the cycle and deferring setting that during create
  • Taking into consideration which attributes can be updated in-place (e.g. attributes that require the resource to be recreated would not be candidates for this)
  • Being able to manage this behaviour per provider or per resource to overcome edge cases

The idea here is that this would be a general solution. The observation is that resource cycles are a legitimate, real-world use case that needs to be dealt with in a general way.

References

I did a search to try and find prior discussions on this but I couldn't find any specific feature request around representing or allowing resource dependency cycles.

@dansimau added the enhancement and new (new issue not yet triaged) labels Dec 8, 2020
@apparentlymart
Member

Hi @dansimau! Thanks for sharing this use-case.

As you've noted, the typical way to deal with this today is for the provider to explain to Terraform that "update cognito user pool to add lambda trigger" is a separate operation by representing it as a separate resource. That creates a relatively easy to explain execution model: there is only one action for each resource per plan (with the special exception of "replace", which is internally a combined destroy/create), and the ordering of those actions is derived from the dependencies between those resources.

Off the top of my head I'm not able to imagine a general solution to this which doesn't require the provider to give Terraform enough information to understand that, in your case, it's allowable and reasonable to create a cognito user pool without a lambda trigger at first and then update it later. Any design that requires additional information in the provider schema would not meet the use-case as you framed it, where additional work in the provider was your criteria for failing the current design as a suitable solution.

Since your request here explicitly excludes the current design as a possible answer, but there isn't yet a candidate new design to evaluate, I'm going to leave this open for the moment but I want to be explicit that it will likely be closed unless someone suggests a concrete technical design for further discussion, because we (the Terraform team at HashiCorp) consider this problem already "solved" in the sense that there is a way for a provider to represent the sequence of three operations you described.

In our conception of Terraform's architecture, we consider it the provider's primary responsibility to map from the concepts of the remote API onto Terraform's workflow, and so although it would be nice to find some way to "automate away" this design problem, architecturally there is no particular need to do so, and if the AWS provider doesn't offer a way to associate a lambda trigger with a cognito pool as a separate operation then I expect it will be far more expedient to work on a specific technical design for the AWS provider to address that than to try to design a generalized solution for hypothetical additional problems that we are not yet aware of. At the very least, we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize.

@dansimau dansimau changed the title Feature: Allow cyclic dependencies for resources Feature: Allow circular dependencies for resources Dec 9, 2020
@dansimau dansimau changed the title Feature: Allow circular dependencies for resources Feature: Allow circular dependencies in resource attributes Dec 9, 2020
@dansimau dansimau changed the title Feature: Allow circular dependencies in resource attributes Feature: Allow circular dependencies in resources Dec 9, 2020
@dansimau
Author

dansimau commented Dec 9, 2020

Thanks for the considered reply @apparentlymart.

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize.

Indeed, I'd be interested to know how often this comes up. Judging by the fact that nobody filed an issue before, maybe not as much as I originally assumed when I hit this use case.

@okaros

okaros commented Dec 10, 2020

I don't have a proposed solution, but I can provide another example. This circular dependency scenario happens in the AzureRM provider with app services and Azure-managed SSL certificates.

The azurerm_app_service resource can be given a custom hostname and SSL certificate via the azurerm_app_service_custom_hostname_binding resource. You normally specify the SSL certificate to use via the 'fingerprint' attribute, which is the SSL fingerprint of the desired certificate.

If you wish to use a free Azure Managed Certificate via the azurerm_app_service_managed_certificate resource, a circular dependency is created: azurerm_app_service_managed_certificate requires an azurerm_app_service_custom_hostname_binding, but azurerm_app_service_custom_hostname_binding requires the fingerprint from azurerm_app_service_managed_certificate in order to attach the certificate.

(I work around the problems with a judicious use of a local-exec provisioner and ignoring changes to some attributes, so I bring this up just to provide another example use-case)

Edit: Ironically, as I was typing this out a new release of the AzureRM provider eliminated this particular circular dependency. 🤣

@okaros

okaros commented Dec 10, 2020

There's also the more general use-case of wanting to build resources that communicate with each other in Terraform.
Consider:
You want to create two Azure App Services that need to communicate with each other via connection strings configured in their environment variables. This is a circular dependency in Terraform, regardless of provider/platform, since you cannot have their resource objects reference each other unless they already exist, and they don't. You can cheat this in any number of ways (multi-stage Terraform deployments with variables to control how far along you are in the process, for example, or by importing manually-created resources, or...) but it would be terribly nice to not have this limitation.

From a what-does-a-solution-look-like perspective, perhaps a configuration block similar in structure/functionality to a provisioner that fires after resource creation, but used to provide for a delayed attribute update/change instead?

As a vague example for such a post_create block:

resource "azurerm_app_service" "example1" {
  name                = "example-app-service-1"
...
...
  app_settings = {
    "SOME_KEY" = "some-initialvalue"
  }

  post_create {
    app_settings = merge(self.app_settings, { "SOME_OTHER_KEY" = azurerm_app_service.example2.default_site_hostname })
  }
}

resource "azurerm_app_service" "example2" {
  name                = "example-app-service-2"
...
...
  app_settings = {
    "SOME_KEY" = "some-value"
  }

  post_create {
    app_settings = merge(self.app_settings, { "SOME_OTHER_KEY" = azurerm_app_service.example1.default_site_hostname })
  }
}

The end-result being that each resource gets created first without SOME_OTHER_KEY being present in app_settings{}, then updated post-creation in the same plan to add it.

Referencing the other resource like this would allow for appropriate dependency ordering, hopefully? And after successful creation the results of the post_create could be merged into the regular state so future plans work normally? This would solve almost all of the use-cases for circular dependencies I've run into, I think, including the original Cognito/Lambda case presented here, and would also allow for more natively-Terraform workarounds for cases where providers haven't caught up with addressing circular resource dependencies like the one I mentioned in my previous comment.

@apparentlymart
Member

That's an interesting idea, @okaros, and reminds me a bit of functional reactive programming where programs react to events by merging the event data in with a previous value.

It does seem like an idea worth researching in some more detail. Some initial thoughts I have for questions to consider would be:

  • How might this work if the value to be updated later is in a nested block rather than a top-level attribute? That would presumably require a way to refer to a specific nested block, and nested blocks aren't always directly addressable.
  • Are there situations that might require more than one subsequent update, where two separate operations elsewhere in the configuration contribute separate updates?
  • How would this be best presented in the plan output? Currently we show only one operation per resource instance but this creates the possibility of multiple. (It also means changing Terraform's internal models a bit, but I think better to figure out how it looks externally and then work backwards to the internals from there.)

It does seem like a promising direction to investigate, but also not an easy thing to prototype with Terraform as it exists today. 🤔 I would like to consider it more though, so thanks for suggesting it.

@okaros

okaros commented Dec 12, 2020

I don't have full answers, @apparentlymart, but some thoughts from my end-user perspective:

An initial or even final implementation might simply say "You can't do that" with regards to nested blocks that aren't addressable, or even nested blocks altogether. Other Terraform functionality has limitations on what can be interacted with ("destroy" provisioners come immediately to mind as an example of something with heavy restrictions), so such limitations wouldn't be unprecedented. A solution that worked with everything would of course be ideal, but for me, at least, even a limited solution would be a welcome improvement.

Multiple post_create updates would be interesting, although I'm not sure I can see a use-case where they'd be needed (at least, not without introducing additional layers and the concept of post_post_create, which strikes me as being... too much). But if they are, perhaps they could be handled in the same fashion as provisioners, with multiple blocks simply being handled serially both in the written HCL blocks and in planning/execution? I'd envisioned the post_create block as being limited to addressing attributes on the attached resource and not being capable of adjusting other resources, and two different resources with post_create blocks would only be able to reference attribute values available during the initial creation. i.e. if my example2 app service from above tries to reference example1.app_settings, it only sees SOME_KEY and not SOME_OTHER_KEY (but, once the resources were created, SOME_OTHER_KEY would be available and indistinguishable from SOME_KEY).

Currently provisioners aren't shown at plan-time at all, and those are my closest analogue to this idea, so... 🤣
More reasonably, I think that the standard (known after apply) message would probably be appropriate? My limited understanding of how the plan is built suggests it ought to be possible to know which attributes will be modified in the second pass and simply include those as attributes being changed/set on the resource, but with the circular-dependency scenarios we're talking about here I don't think there would be many cases where the final values could be known at plan time. After all, if they were knowable it really wouldn't be a circular dependency...
And generally, I'm not sure there would be much use for knowing details of the intermediate stage except during a failed apply, which should probably just be treated the same as a provisioner that failed (taint the entire resource so it's recreated on the next attempt and that recreation can cascade changes out to other, dependent resources appropriately). I think that from the perspective of the person running Terraform the fact that it's two disparate operations "under the hood" rather than one to get to the end-state probably doesn't matter very much, and the actual details of how the execution behaves can simply be in the documentation for the post_create block.

@lmmattr
Copy link

lmmattr commented Apr 26, 2021

@okaros I know it doesn't solve this particular issue but I feel it needs pointing out that the event data passed to the Lambda trigger does include the Cognito user pool id.

@jeffg-hpe

jeffg-hpe commented May 5, 2021

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize

I have two Okta orgs managed via https://registry.terraform.io/providers/oktadeveloper/okta and I want to set up a SAML based trust between them.

Creating the resources in step 1 and 2 generates unique identifiers that must be exchanged, and cannot be known in advance.

Steps:

  1. Create resource okta_saml_idp in org1
  2. Create resource okta_app_saml in org2, using values from step 1
  3. Update okta_saml_idp, using values from step 2

Using the post_create proposed above, it might look something like this.

# idp is created first, with a placeholder for argB
resource "okta_saml_idp" "external-idp" {
  argA = "value"
  argB = "placeholder"
  post_create {
    argB = okta_app_saml.idp-provider.some.value
  }
}

# sp is created later, due to its dependency on the idp
resource "okta_app_saml" "idp-provider" {
  argA = okta_saml_idp.external-idp.someother.value
  argB = "value"
}

# finally, post_create can execute as its dependency is satisfied now.

Note, I simplified for brevity (removed the needed multiple providers, used example arg/attrib names).

I see three alternatives to solving this via post_create block:

  1. ignore_changes + local-exec block and call out APIs directly for step 3
  2. multistage terraform deployment, with a resource import
  3. new provider resource that configures a subset of okta_saml_idp for an existing instance
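For what it's worth, alternative 1 could be sketched roughly like this (the update-idp.sh script is hypothetical and would call the Okta API directly to perform step 3):

```hcl
resource "okta_saml_idp" "external-idp" {
  argA = "value"
  argB = "placeholder" # real value is set out-of-band below

  lifecycle {
    ignore_changes = [argB] # step 3 happens outside Terraform
  }
}

resource "null_resource" "link-idp" {
  triggers = {
    sp_id = okta_app_saml.idp-provider.id # re-run when the SP changes
  }

  # Hypothetical script that pushes the SP's values back into the idp.
  provisioner "local-exec" {
    command = "./update-idp.sh ${okta_saml_idp.external-idp.id} ${okta_app_saml.idp-provider.id}"
  }
}
```

The ignore_changes is needed so that subsequent plans don't try to revert argB back to the placeholder.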

@dzrtc

dzrtc commented Jun 22, 2021

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize

AWS Transit Gateway provides routing between multiple VPCs, replacing VPC Peering. Setting this up involves circular dependencies because the TGW must be explicitly attached to the VPCs (requiring knowledge of the vpc_id) while the VPCs must setup routes through the TGW (requiring knowledge of the ec2_transit_gateway_id).

It makes a lot of sense to manage the (many) VPCs with their TGW routes independently (note1) of the (one) TGW with its VPC attachments. However, if you break the dependency cycle by setting up the VPC route tables after the VPCs and TGW exist, then you can't manage the VPC because the "new" routes are discovered in subsequent plans.

On the other hand, if you setup the TGW first without any attachments, then manage the attachments and route tables from inside each VPC, then that undermines the value of using TGW to centrally administer routes between VPCs.

I'm not sure how I could use the proposed post_create to solve this problem.

note1: By "independently", I mean, "resources managed in distinct tfstate files".
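For context, within a single state the separate resource types already keep these references one-directional, roughly like the sketch below (CIDRs and names are illustrative); the cycle described above is really a consequence of splitting the VPCs and the TGW across distinct tfstate files:

```hcl
# aws_vpc.a, aws_subnet.a, and aws_route_table.a are assumed to be
# defined elsewhere in the same configuration.
resource "aws_ec2_transit_gateway" "hub" {
  description = "central routing hub"
}

# Attachment references both the TGW and the VPC (downward only).
resource "aws_ec2_transit_gateway_vpc_attachment" "a" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.a.id
  subnet_ids         = [aws_subnet.a.id]
}

# VPC route references the TGW id, again downward only.
resource "aws_route" "via_tgw" {
  route_table_id         = aws_route_table.a.id
  destination_cidr_block = "10.0.0.0/8"
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}
```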

@bmilesp

bmilesp commented Sep 23, 2022

I've been working around this issue using blue/green and dev environments for my app, but then I ran into an AWS issue that left AppSync domains in a state where they were unusable for hours (separate Terraform issue regarding custom domain disassociation).

These were the original stacks:

GlobalStack (route53 hosted zone, SES, IAM, ACM)
  |       \------------------------------
  |                                      | 
LiveDataStack                      DevDataStack (databases and Cognito user pools)
  |                                      |
  |                                      |
Blue/GreenAppStacks                DevAppStack (Appsync, StepFunctions, Lambdas, Cloudwatch)

This worked well as I could change a config var and point the Route 53 domain name to either the blue or green stack easily, but to help illustrate the point below, notice that the global resources and data-stack resources are required by other resources in the stacks downstream.

So now we're at the key problem. I wanted to safeguard against the aforementioned issue (and potential others like downed resources between regions, etc.) by creating a "region" stack layer, so that I could replicate the LiveDataStack, DevDataStack, and Blue/Green/Dev AppStacks into another/multiple regions like this:

GlobalStack
    |  \----------------------------------
    |                                     |
RegionalStack us-east-2          RegionalStack us-west-1
    |                                     |
Live/Dev DataStacks                  Live/Dev DataStacks 
    |                                     |
Blue/Green/Dev AppStacks       Blue/Green/Dev AppStacks       

But because of the circular dependencies, this is not possible (or at least I have not been able to find a way to do this).

@griffinator76

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize

I tried to use Terraform to set up a Snowflake "Storage Integration" object that links to an AWS S3 bucket using the "chanzuckerberg" Snowflake provider from the Terraform registry in addition to the standard AWS provider.

Part of the process to create the integration requires the following sequence of actions:

  • Create an IAM Role with S3 access policy
  • Create a Snowflake Storage Integration object, specifying the IAM Role created in step 1
  • Modify the IAM Role access policy using values from the Storage Integration object
    (complete list of steps here)

Hence there is a circular dependency between the IAM Role and Storage Integration. Steps 1 and 2 are straightforward but step 3 involves modifying an object's state after it has been created.

The IAM Role access policy cannot be modified separately from the role itself.

@spectria-limina

I've run into another use case: trying to manage content with a series of messages, each with a "back to top" link which would link to the table of contents. The table of contents, of course, needs to be able to link to all the other posts. This is another instance of "I need mutually referential identifiers".

My alternative suggestion is that two-phase creation could be explicitly supported at the platform level, as there are APIs that allow reservation of resources much more cheaply than full creation. Something like:

  • A Resource can optionally be a ResourceWithReserve. If it is, it must have at least one Attribute with a new Reservable flag, and may also set a new RequiredForReserve flag (or maybe this one should be inverted).
  • When the resource is to be created, it yields two nodes in the dependency graph: one for the reservation step and one for the creation step. Dependencies through RequiredForReserve or Reservable are applied to the reservation node; dependencies through other attributes are applied to the creation node. The creation node depends on the reservation node. I hope this is similar to how an update can sometimes be internally represented as a create and a delete.
  • If there are any nodes strictly between the reservation and creation on the dependency graph, then the creation must be two-phase. Variables filled in by the reservation are marked as "(known after reservation)" in the plan.
  • If there are no such nodes, then the planner might still do a two-phase create as an optimization, if it would unblock other resources' application. Perhaps a resource could hint when it can reserve much more quickly than it can actually create.
  • When a resource supports reservation, there is a new reservation meta-argument. It can have some depends_on and lifecycle meta arguments applied, as well as be the target of depends_on—these are simply ignored if the reservation step is never performed. There could also be reserve = always or reserve = never to control behaviour when needed (the latter being a hard error if it creates a loop).

This design could be extended to multiple phases, but it's not immediately clear you'd want that.
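To illustrate, under this design the original Cognito/Lambda example might be written with a hypothetical reservation meta-argument (none of this syntax exists in Terraform today):

```hcl
resource "aws_cognito_user_pool" "default" {
  name = "example-pool"

  lambda_config {
    # If the pool's id is a Reservable attribute, this mutual reference
    # forces a two-phase create: reserve pool (yielding its id), create
    # the function, then complete the pool's creation.
    pre_sign_up = aws_lambda_function.presignup.arn
  }

  reservation {
    reserve = always # hypothetical; "never" would make the cycle a hard error
  }
}
```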

@mm-col

mm-col commented Dec 2, 2022

I'm running into this in OCI.

Creating a custom route table and assigning that route table to the subnet works without issue. The problem comes when I also want to create route rules in that route table.

For example, I have a subnet defined and that subnet will have an Ubuntu instance in it along with a Palo Alto firewall instance. I need a route table assigned to the subnet that makes the trust interface IP of the firewall the default gateway for the subnet.

Here are the components that need to work together:
subnet
private IP
route table
route rules

The problem is the circular dependencies. The route rule depends on the network_entity_id of the private IP. That private IP depends on the subnet. The subnet depends on the route table at creation.

Everything works until a route rule is specified that includes the id of the private IP as that creates the circular reference. The subnet can't use the route table because the route table has a rule in it that points to the private IP which can't be created before the subnet is created.

@Huang-W

Huang-W commented Dec 16, 2022

hashicorp/terraform-provider-aws#1824

Another valid use case is cycles in AWS security groups or prefix lists.

@BWeesy

BWeesy commented Feb 28, 2023

We bumped into this while trying to pass the invokeURL of an AWS gateway resource to a lambda as an environment variable because the gateway has endpoints that route to the lambda.

@sgal-dm

sgal-dm commented Apr 3, 2023

I have the same use case that jeffg-hpe posted above. To expand on it a little, the provider can't cleanly handle this one because building it requires multiple instances of the provider, one targeting the IdP tenant and the other targeting the SP tenant, each with distinct API endpoints and auth tokens.

So the typical approach of adding a virtual resource to the provider that manages multiple resources under the hood doesn't work here because those resources exist in disparate environments.

His third alternative approach, while novel, seems sloppy for a provider. It'd require something like:

  • Define resourceA, ignoring attributes w & x.
  • Define resourceB, which depends on A, and reads attributes y & z from it.
  • Define the new resourceC, which depends on A & B, and under the hood is an imported replica of A that sets attributes w & x based on the output of B.

So we're left with local-exec or multi-stage deployment unless this can be handled as a feature of Terraform.

@patmaddox

the typical way to deal with this today is for the provider to explain to Terraform that "update cognito user pool to add lambda trigger" is a separate operation by representing it as a separate resource. That creates a relatively easy to explain execution model: there is only one action for each resource per plan (with the special exception of "replace", which is internally a combined destroy/create), and the ordering of those actions is derived from the dependencies between those resources.

Do you have an example of this typical way? I am researching and not sure how two separate operations (to the same resource I assume?) are modeled as separate resources.

Our use case is configuring snowflake. The manual process is:

  1. Create an AWS role with a dummy account ID
  2. Create a Snowflake integration, referring to the role ARN
  3. Update the role with the account ID generated by the integration

Some possible mechanisms I've heard referenced in my research are dynamic data sources, dynamic variables, and now this multiple operations. But I haven't worked out yet how to implement any of them.
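The manual steps above can be sketched as follows (attribute names follow my understanding of the Snowflake provider's storage integration resource, so treat them as illustrative):

```hcl
resource "aws_iam_role" "snowflake" {
  name = "snowflake-integration"
  # Step 1: trust a dummy principal. Step 3 would replace this with the
  # IAM user ARN and external ID generated by the integration, which is
  # exactly the feedback edge Terraform cannot express in one plan.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::000000000000:root" } # dummy
    }]
  })
}

resource "snowflake_storage_integration" "s3" {
  name                      = "S3_INTEGRATION"
  type                      = "EXTERNAL_STAGE"
  storage_provider          = "S3"
  storage_aws_role_arn      = aws_iam_role.snowflake.arn # step 2
  storage_allowed_locations = ["s3://example-bucket/"]
  # Step 3 needs this resource's generated IAM user / external ID
  # fed back into the role's trust policy above.
}
```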

@apparentlymart
Member

One example of this pattern that I can think of quickly is in the hashicorp/aws provider:

There are separate resource types for aws_s3_bucket and aws_s3_bucket_policy, which allows the policy to refer to the arn attribute of the bucket itself when describing rules about specific sub-paths inside the bucket, which typically involves writing an ARN whose prefix is the arn attribute of the bucket as a whole.
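For example, a minimal sketch of that pattern (bucket name, account ID, and policy contents are illustrative):

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}

# The policy is a separate resource, so it can reference the bucket's
# ARN without creating a cycle: all references point one way.
resource "aws_s3_bucket_policy" "logs" {
  bucket = aws_s3_bucket.logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowRead"
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::123456789012:root" } # example account
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.logs.arn}/*" # bucket ARN as prefix
    }]
  })
}
```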

@dbaynard

There are separate resource types for aws_s3_bucket and aws_s3_bucket_policy, which allows the policy to refer to the arn attribute of the bucket itself when describing rules about specific sub-paths inside the bucket, which typically involves writing an ARN whose prefix is the arn attribute of the bucket as a whole.

Oh, is that why so many aws features have separate resource types?

Does that mean that in places where there are blocks that could be separate resources, the direction of travel is towards the latter?

@apparentlymart
Member

apparentlymart commented May 12, 2023

There is a separate team responsible for the hashicorp/aws provider and so I don't know all of what motivates their design decisions, but in this particular case (the S3 operations) the structure with separate resource types for different features matches the structure of the underlying API, which has separate write operations for the two resource types I mentioned: s3:CreateBucket for aws_s3_bucket and s3:PutBucketPolicy for aws_s3_bucket_policy.

I suspect you're recalling that earlier versions of the provider just had a single aws_s3_bucket resource type which covered a large portion of the Amazon S3 API surface. And indeed, the lesson learned from that initial design is that providers should typically follow as closely as possible the separation of concerns in the underlying API, because the finer details of the API typically rely on characteristics of the coarser decisions. We can see that in the example I shared, where the underlying API assumes you can create a bucket to find out its ARN before you create a policy for that bucket. The Terraform provider merging those two into a single operation therefore made that particular detail of the API design not work properly in Terraform.

From discussions from the provider teams my understanding is that their modern design approach is to closely match the structure of the underlying API to avoid this sort of design inconsistency in the fine details. That goal might explain other API changes where certain single resource types were split into many separate resource types in later releases, but I'm not involved with the detailed planning of that and I only know about the S3 example because I've previously helped folks in the community who had problems caused by the old design.

If you'd like to discuss more about how the hashicorp/aws provider is designed then I suggest doing so in its own repository, because the folks who monitor this repository are not directly involved in the design or implementation of that provider.

Thanks!

@glerb

glerb commented May 23, 2023

Another use case analogous to @apparentlymart's S3 case above: locking KMS keys to any resource that uses them with a Resource restriction in the key policy:

resource "aws_sns_topic" "log_processing" {
  name              = "LogProcessingTopic"
  kms_master_key_id = aws_kms_key.log_processing.arn
}

with a key policy for the KMS key of:

data "aws_iam_policy_document" "log_processing_kms_key" {
  statement {
    actions = [
      "kms:GenerateDataKey*",
      "kms:Decrypt"
    ]
    resources = [aws_sns_topic.log_processing.arn]
    effect    = "Allow"

    principals {
      type = "Service"
      identifiers = [
        "sns.amazonaws.com",
      ]
    }
  }
}


@nibblesnbits

3 years later. Any updates here?

@crw
Collaborator

crw commented Dec 13, 2023

@nibblesnbits Based on scanning @apparentlymart's comments, I would not expect this behavior to change in Terraform v1.x. This is the type of issue the team likes to leave open to generate ideas and use cases for a "hypothetical v2."

For future viewers, if you are viewing this issue and would like to indicate your interest, please use the 👍 reaction on the issue description to upvote this issue. Thanks!
